Michèle Finck - Blockchains and Data Protection in the EU (2017)
Introduction
- Blockchain: append-only decentralized database that is maintained by a consensus algorithm and stored on multiple nodes
- Data protection mechanisms developed for centralized data silos cannot be easily reconciled with a decentralized method of data storage and protection
- Even where data is encrypted or hashed it qualifies as personal data under EU law
- Theory: could be compatible on a meta-level, as, if properly designed, blockchains can pursue the GDPR’s underlying goal of giving a data subject more control over her data
- Blockchains by no means automatically support data sovereignty but rather must be purposefully designed to do so
- Huge variance in distributed ledgers and their internal governance structures; shared synchronized digital database
- Technically, true blockchains are the variants of DLT that record data in packages (‘blocks’) that are hashed (‘chained’) to one another; the article refers to all other variants as blockchains too, for simplicity
- Data is usually grouped into blocks that, upon reaching a certain size, are chained to the existing ledger through a hashing process; data is chronologically ordered in a manner that makes it difficult to tamper with information without altering subsequent blocks -> tamper-resistance -> often described as ‘immutable’, but information can be modified in exceptional circumstances through human intervention, which requires collusion between a majority of the network’s nodes
- DLTs rely on a two-step verification process with asymmetric encryption; every user has a public key and a private key; the private key can decrypt data that is encrypted with the public key, and the public key hides the identity of the individual unless it is linked to additional identifiers
- The nodes are the computers on which the ledger is stored
- Miners aggregate transactions into candidate blocks and hash a new block to the chain on the basis of a predetermined consensus protocol (such as proof-of-work or proof-of-stake)
- On a permissionless blockchain anyone can read the data, which is undesirable from a privacy perspective when documents are stored in plaintext; documents can be encrypted or hashed beforehand
- Blocks have limited storage capacity and storage is often expensive, so storing documents on-chain may not be an economical solution
- Under the common SHA-256 hashing algorithm, any amount of data will be reduced to a 32-byte hash value
- A cryptographic hash is a one-way function that cannot feasibly be reversed; there is no key that can unlock data that has been hashed
- Blockchains can be private and permissioned; they can run on a private network such as an intranet or a VPN, with an administrator who grants permission; focus of paper is on permissionless, public ones as they pose the most problems
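
To make the chaining and tamper-resistance points above concrete, here is a minimal sketch using only Python's standard library. It is a toy model, not how any particular blockchain is implemented: the block layout and the names `block_hash`, `build_chain` and `first_invalid_block` are made up for illustration, and there is no consensus protocol or network.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS_PREV = "0" * 64  # assumed placeholder predecessor for the first block


def block_hash(index: int, prev_hash: str, data: str) -> str:
    """SHA-256 over the block's fields, returned as a hex string."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "data": data}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: List[str]) -> List[Dict]:
    """Chain each record to its predecessor through the predecessor's hash."""
    chain = []
    prev_hash = GENESIS_PREV
    for index, data in enumerate(records):
        current = block_hash(index, prev_hash, data)
        chain.append(
            {"index": index, "prev_hash": prev_hash, "data": data, "hash": current}
        )
        prev_hash = current
    return chain


def first_invalid_block(chain: List[Dict]) -> Optional[int]:
    """Return the index of the first block that fails verification, else None."""
    prev_hash = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return block["index"]
        recomputed = block_hash(block["index"], block["prev_hash"], block["data"])
        if recomputed != block["hash"]:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain(["tx A", "tx B", "tx C"])
intact = first_invalid_block(chain)      # nothing has been changed yet

chain[1]["data"] = "tx B (edited)"       # tamper with a block in the middle
broken_at = first_invalid_block(chain)   # the edited block no longer verifies

# SHA-256 output length does not depend on the input size.
digest_len = len(hashlib.sha256(b"x" * 1_000_000).digest())
```

Hiding the edit would require recomputing the hash of the edited block and of every block after it, which is why changes on a real network need the cooperation of a majority of nodes.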
Blockchains promise ...
- decentralized handling of data and data sovereignty, a concept that focuses on giving individuals control over their personal data and allowing them to share such information only with trusted parties; overlaps with GDPR's interest of data sovereignty, such as Art. 20 about data portability
- selective data sharing through adequate applications
- new forms of identity management by enabling individuals to control access to their identity information and to create, manage and use a self-sovereign identity
- remains to be seen whether this is true (probably not, and this is from 2017)
Affected data
- Two sets of data stored on blockchains can potentially be personal data under the GDPR: transactional data and public keys
- Transactional data: data related to individual behaviour in IoT use cases, digital identities, financial and medical data -> centers around transactions; can be stored on the chain in plaintext, encrypted, or hashed
- Can this data be sufficiently anonymized to allow it to evade the GDPR scope of application? -> needs to "irreversibly prevent identification"; plaintext and encryption don't hold up as encryption is a pseudonymization; even hashing is considered a pseudonymisation by the Article 29 Working Party; means all transactional data falls under the scope of GDPR for now, unless a successor to SHA-256 or SHA-3 is announced that can anonymize
- Potential routes to go: personal data could be stored off-chain and merely linked to the blockchain through a hash pointer and the personal data is recorded in a referenced encrypted and modifiable database; Problem: Introduces the issue of finding a trusted third party which defeats the motivation for relying on DLT
- Attempts to design GDPR-compliant blockchains that hold data in a private store where the blockchain merely holds proof that the data is valid; challenge response patterns, off-chain signature patterns; delegated computing patterns; low contract footprint patterns; content addressable storage patterns
- Public Keys: String of letters and numbers for pseudonymous identification; cannot be attributed to a specific person unless matched with additional information such as a name or address -> pseudonymous, not anonymous, data; public keys can be traced back to IP addresses and more (also see: [[C-582 14 Breyer]] on why IP addresses are considered personal data)
- Problem: Public Keys cannot be moved off-chain to evade GDPR compliance
- The way Monero works with hiding recipient, generating new dedicated address and secret key, one-time accounts etc. is not a solution because the transaction must completely empty one or more accounts and create one or more new accounts; merge avoidance -> highly porous, no high guarantee of privacy protection; some linking is unavoidable with multi-input transactions
- Zero-knowledge proofs: binary true/false without access to underlying data and without public key or what value was transferred
- Two-party smart contracts can involve state channels that only share information when a dispute occurs
- Ring signatures hide transactions within other transactions by tying a single transaction to multiple private keys even though only one of them initiated the transaction
- Unclear if any of these count as anonymization
- Noise: several transactions are grouped together so from the outside it is impossible to discern the identity of the respective senders and recipients; Art. 29 WP has recognized that this could be an acceptable anonymization technique
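
One common argument for treating a hash as pseudonymisation rather than anonymisation is that hashes of guessable inputs can be matched by simply trying every candidate. The sketch below illustrates that argument only; the date-of-birth value, the candidate range and the helper names are hypothetical and not taken from the paper.

```python
import hashlib
from datetime import date, timedelta
from typing import Iterable, Iterator, Optional


def sha256_hex(value: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def birth_dates(first_year: int, last_year: int) -> Iterator[str]:
    """Yield every calendar date in the range as an ISO string (YYYY-MM-DD)."""
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


def reidentify(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate whose hash matches the target, or None."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical value that someone hashed to the chain instead of storing it in plaintext.
on_chain_hash = sha256_hex("1987-03-14")

# The hash is never reversed mathematically; the roughly 44,000 candidates are just enumerated.
recovered = reidentify(on_chain_hash, birth_dates(1900, 2020))

# A candidate set that does not contain the input recovers nothing.
not_found = reidentify(on_chain_hash, ["2000-01-01"])
```

The one-way property of the hash function stays intact here. What fails is the assumption that the input is hard to guess, which is usually not the case for names, dates or identifiers drawn from a small space.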
Legal Consequences
- Private blockchains can have an identifiable controller and the users are more like data processors than controllers, but for other DLTs there is no central point of control due to decentralization; that most likely means all nodes qualify as data controllers
- Nodes don't qualify as joint controllers under Art. 26, as they don't jointly determine the purposes and means of processing; they also cannot meet the GDPR requirements that are placed on centralized agents since they cannot see the data and are unable to make changes
- It would also be unclear how fines are calculated as there is no annual worldwide turnover in that sense
- Also poses the interesting question: can the data subject also be the data controller, if the user hashes their personal information to the blockchain?
Territorial Scope
- DLTs are transnational in nature; jurisdictional issues
- GDPR handles transnationality by applying to processing in the context of a controller's or processor's establishment in the EU, regardless of whether the processing takes place in the EU, and to the processing of personal data of data subjects who are in the EU (not only EU citizens)
- Third country transfers are a problem; there is likely always an element of cross-border data processing on permissionless ledgers; data stored in blocks is hashed to the chain by a randomly selected miner that can be based anywhere, and then updated on each node no matter where it is; a node may be in a country that hasn't been declared as having an adequate level of protection, and no SCCs or BCRs are in place
- Possible solution: Art. 49 I a) GDPR: explicit consent for the transfer, informed about possible risks (sidenote: this seems like it could also be the basis for decentralized social media)
Enforcing Rights
- How does a data subject enforce their rights - against each node individually? How? And how can the node controllers correct, erase or restrict data based on a request from a data subject?
Data minimization
- DLTs and data minimization are at odds; the GDPR requires specified, explicit and legitimate purposes, storage limitation, and no processing beyond what is needed or for other purposes; but data once added to a blockchain will perpetually remain part of it (if append-only database) and integral copies of the chain are stored on each full node
- Possible solution: Transactional data that is stored off-chain can be modified and minimized in line with legal requirements without touching the ledger itself
- Data subject cannot identify any or all of a blockchain's full nodes, so cannot address a claim
- Due to being immutable, information cannot be corrected; but it can be rectified by means of providing a supplementary statement
- Art. 19 technically says you need to inform other recipients of the data of the change, but seems to not apply here due to disproportionate effort of identifying and reaching out to all the nodes
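
The idea of rectifying by supplementary statement can be pictured as an append-only log in which a correction is a new entry pointing at the old one. This is only an illustrative sketch; the entry layout and the names `append_entry` and `effective_value` are invented for the example and say nothing about whether a regulator would accept the approach.

```python
from typing import Dict, List, Optional

ledger: List[Dict] = []  # append-only: entries are never edited or removed


def append_entry(kind: str, body: str, refers_to: Optional[int] = None) -> int:
    """Append an entry and return its position in the ledger."""
    ledger.append({"kind": kind, "body": body, "refers_to": refers_to})
    return len(ledger) - 1


def effective_value(index: int) -> str:
    """Return the latest supplementary statement for an entry, else its original body."""
    value = ledger[index]["body"]
    for entry in ledger[index + 1:]:
        if entry["kind"] == "rectification" and entry["refers_to"] == index:
            value = entry["body"]
    return value


original = append_entry("record", "address: Old Street 1")
append_entry("record", "unrelated entry")
append_entry("rectification", "address: New Street 2", refers_to=original)

current = effective_value(original)       # the corrected value
still_stored = ledger[original]["body"]   # the inaccurate original is still there
```

The inaccurate original stays readable by anyone who holds a copy of the ledger, so the correction exists only for readers who follow the reference.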
Right to confirmation, copy etc.
- Right to obtain confirmation (Art. 15) difficult as controllers of nodes don't know which data is stored on the blockchain as they often only handle the encrypted or hashed version; data subject could join a permissionless network and obtain a copy of all data but this is likely not a good solution under GDPR; obtaining a copy of their data from controllers would also be impossible due to the cryptography
Right to be forgotten
- The right to be forgotten (Art. 17) especially difficult due to the immutable nature; the previously mentioned solution of the referenced encrypted database elsewhere than the blockchain might work because that can be deleted without deleting the blockchain reference to it
- Important to remember: Right to be forgotten is not an absolute right! Available technology and cost of implementation need to be taken into account -> might mean the technical limitations of deletion in blockchains need to be considered when judging if blockchains violate it
- Maybe deletion of the keys could account for this since the information would become inaccessible; or chameleon hashes could let authorized authorities rewrite the content of blocks on a DLT under specific constraints and with full transparency and accountability (unlikely)
- Controversial: Pruning (can be used to delete obsolete transactions in older blocks that are no longer necessary for the continuation of the chain)
- Hard forks cannot be a viable GDPR compliance tool
- National implementations might be softer; see German framework that accepts limiting the processing where deletion isn't possible
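
The off-chain approach mentioned above (personal data in a modifiable store, only a hash pointer on the ledger) can be sketched as follows. This is a toy model under stated assumptions: the two in-memory containers stand in for a real ledger and a real database, and the function names `record`, `resolve` and `erase` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional

chain: List[str] = []             # append-only ledger: holds hash pointers only
off_chain: Dict[str, bytes] = {}  # modifiable store: holds the personal data


def record(personal_data: bytes) -> str:
    """Store the data off-chain and append only its SHA-256 pointer to the chain."""
    pointer = hashlib.sha256(personal_data).hexdigest()
    off_chain[pointer] = personal_data
    chain.append(pointer)
    return pointer


def resolve(pointer: str) -> Optional[bytes]:
    """Fetch the off-chain data and check that it still matches the on-chain hash."""
    data = off_chain.get(pointer)
    if data is not None and hashlib.sha256(data).hexdigest() != pointer:
        raise ValueError("off-chain record does not match on-chain hash")
    return data


def erase(pointer: str) -> bool:
    """Delete the off-chain data; the ledger itself is left untouched."""
    return off_chain.pop(pointer, None) is not None


pointer = record(b"name=Alice Example; city=Berlin")
before = resolve(pointer)   # the data is retrievable and verified
erased = erase(pointer)     # erasure request handled off-chain
after = resolve(pointer)    # the pointer now dangles
```

After erasure the ledger still contains the hash. Whether that leftover hash is itself personal data is the open legal question, and whoever operates the off-chain store is the trusted party the notes flag as a drawback.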
Data protection by design + by default
- Art. 25 + 32; TOMs, encryption -> mostly fulfilled by DLTs except for the public keys and the immutability
- Safest advice for blockchain developers is that transactional data should never be stored on a blockchain
- Regarding public keys, the necessary risk-management solutions must be adopted and detailed Data Protection Impact Assessments must be carried out
- The GDPR cannot readily be transposed for decentralized and distributed databases as it was designed for centralized models of data collection
Reconciling Protection of Rights vs Promotion of Innovation
- Tension between Art. 8(1) Charter/Art. 16(1) TFEU and Art. 173 TFEU; human rights vs. supporting innovation and the free movement of data
- Data processing is designed to "serve mankind"; data protection is not an absolute right but must be considered in relation to its function in society
- Rights to amendment and erasure cannot be easily applied to new technologies for data storage and processing; but new technologies do have some overlap with the goals of GDPR (giving data subjects more control over their data, for example)
- Regardless, GDPR is a technologically neutral legislation otherwise, so has potential to grow and adjust to different contexts and uses
- European Data Protection Supervisor recognizes that advanced technologies increase the risk to privacy and data protection, but may also integrate technological solutions for better transparency and control for the persons whose data is processed -> fighting tech with tech? :/
De Filippi:
"[Decentralized structures] might turn out to be much more vulnerable to governmental or corporate surveillance than their centralized counterparts."