European Parliament legal affairs committee pushes for strong exception for text and data mining

MEP Therese Comodini Cachia, Rapporteur for the European Parliament’s influential Committee on Legal Affairs (JURI), finally released the official version of her already-leaked draft opinion on the Commission’s Directive on Copyright in the Digital Single Market.

As we explained yesterday, Comodini’s draft misses the opportunity to introduce more forward-looking provisions that would strengthen the position of users, such as a much-needed exception for user-generated content and freedom of panorama. At the same time, it contains positive amendments, including the removal of the ill-advised ancillary right for press publishers.

The JURI draft amendments are quite positive with regard to the exception for text and data mining (TDM). The Commission’s original proposal limited the beneficiaries of the TDM exception to research organisations, and only for purposes of scientific research. Comodini’s amendments would expand the exception to apply to anyone, for any purpose. In addition, they would require publishers to provide a mechanism through which research organisations that do not otherwise have lawful access to the corpus of works could carry out TDM on the publisher’s content, possibly after paying a fee to those publishers. Finally, the amendments would direct Member States to set up a secure facility to ensure the accessibility and verifiability of research made possible through TDM.

TDM for anyone, for any purpose

While the Commission’s original proposal limited the beneficiaries of the text and data mining exception to “research organisations”, and only “for purposes of scientific research”, Comodini’s amended text widens the exception by removing these specific references. Instead, TDM may be conducted by “a person who has lawful access to works and other subject matter provided that reproduction or extraction is used for the sole purpose of text and data mining.”

Contemplating other potential users

JURI’s draft opinion is even more expansive with regard to text and data mining because it contemplates a mechanism by which potential miners can get access to content for purposes of TDM if they don’t already have it through something like a university’s institutional subscription to scholarly journals.

“Member States shall provide for rightholders who market works or other subject-matter primarily for research purposes, to have an obligation to allow research organisations not having lawful access to those works or other subject-matter access to datasets that enable them to carry out only text and data mining.”

However, the amendment would also permit rights holders (such as publishers that hold large sets of texts) to charge for such services.

“Member States may also provide for rightholders to have a right to request compensation for meeting this obligation as long as that compensation is related to the cost of formatting these datasets.”

Under this amendment, Member States would have to compel publishers to give access to their content to research organisations that don’t already have it (for example, through institutional subscriptions). This is an interesting feature, especially since the amended recital text also clarifies that users who already have lawful access would not be required to pay compensation if they are willing to take the time to normalise the content to the needs of their particular TDM technology. This is a welcome addition, and the JURI draft opinion could be even stronger if it opened this avenue to anyone who wishes to take advantage of TDM, beyond the target audience of smaller research organisations.

Verification of results

The final portion of the amendment would direct Member States to set up a secure facility to ensure the accessibility and verifiability of research made possible through TDM.

“Member States shall designate a facility to store datasets used in research by text and data mining technologies securely and to make such datasets accessible only for verification purposes.”

This is another interesting and useful mechanism that none of the other draft opinions has yet addressed. Keeping the underlying corpus of content available for future verification of results would support the long-term reproducibility of research and experiments conducted using TDM methods.
