Machine Learning & EU Data Sharing Practices

Stanford-Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust and IPR Developments, Stanford University, Issue No. 1/2020

New multidisciplinary research article: ‘Machine Learning & EU Data Sharing Practices’.

In short, the article connects intellectual property (IP) rights in data, data ownership and data protection (GDPR and FFD) in an easy-to-understand way. It also offers AI and data policy recommendations, including regulatory recommendations, to the EU legislature.

Machine learning and data science can help accelerate many aspects of the development of drugs, antibody prophylaxis, serology tests and vaccines.

Supervised machine learning needs annotated training datasets

Data sharing is a prerequisite for a successful transatlantic AI ecosystem. Hand-labelled, annotated training datasets (corpora) are a sine qua non for supervised machine learning. But how do IP and data protection law apply to these datasets?

Data that represent IP subject matter are protected by IP rights. Unlicensed (uncleared) use of machine learning input data can therefore result in an avalanche of copyright (reproduction right) and database right (extraction right) infringements. The article offers three solutions that address this input (training) data copyright clearance problem and create breathing room for AI developers.

The article contends that introducing an absolute data property right, or a (neighbouring) data producer right, for augmented machine learning training corpora or other classes of data is not advisable.

Legal reform and data-driven economy

In an era of exponential innovation, it is urgent that the European Commission reform the Trade Secrets Directive (TSD), the Directive on Copyright in the Digital Single Market (CDSM) and the Database Directive (DD) with the data-driven economy in mind.

Freedom of expression and information, public domain, competition law

Implementing a sui generis system of protection for AI-generated creations and inventions is, in most industrial sectors, not necessary, since machines do not need incentives to create or invent. Where incentives are needed, IP alternatives exist. Autonomously generated non-personal data should fall into the public domain. The article argues that strengthening and articulating competition law is more appropriate than extending IP rights.

Data protection and privacy

More and more datasets consist of both personal and non-personal machine-generated data. Both the General Data Protection Regulation (GDPR) and the Regulation on the free flow of non-personal data (FFD) apply to these ‘mixed datasets’.

Besides the legal dimensions, the article describes the technical dimensions of data in machine learning and federated learning.

Modalities of future AI-regulation

Society should actively shape technology for good. The alternative is that other societies, with different social norms and democratic standards, impose their values on us through the design of their technology. With built-in public values, including Privacy by Design that safeguards data protection, data security and data access rights, the federated learning model is consistent with Human-Centered AI and the European Trustworthy AI paradigm.
