
What the diary of Anne Frank can tell us about Text and Data mining

Recently, everybody has been busy discussing the question of whether the Diary of Anne Frank will enter (or by now, has entered) the public domain on January 1st this year (Answer: It’s complicated). Surprisingly, the discussions surrounding the copyright in Anne Frank’s writings may shed some light on another contentious copyright policy issue: text and data mining. These insights are the result of a recent ruling by the District Court of Amsterdam in a dispute between the Anne Frank Stichting (owner of the physical diaries and operator of the Anne Frank House in Amsterdam) and the Anne Frank Fonds (owner of the copyrights in Anne Frank’s writings).

The Anne Frank Stichting announced plans to publish an edition of Anne Frank’s texts online after the presumed expiration of the copyright on January 1, 2016. In response, the Anne Frank Fonds sued the Stichting over what it considered unauthorised reproductions of Anne Frank’s writings. The reproductions had been made by the Stichting as part of its preparatory research for the online publication after the new year. Initially, this seemed to be an attempt by the Fonds to thwart or delay the Stichting’s plans for an online edition.

However, during the course of the legal arguments it became clear that under Dutch law (which governs uses made by the Stichting), Anne Frank’s original writings would not enter the public domain in 2016. This is due to a transitional rule in the Dutch copyright act which states that works posthumously published before 1995 will retain copyright; in this case, the copyright in large parts of the original writings will only expire in 2037.

While this means that the Stichting had to shelve its plan to publish an online edition, the Fonds continued to pursue its claims over the reproductions (XML-TEI files) that the Stichting had made in order to carry out its textual and historical research. The Stichting was sued alongside its research partner, the Dutch Royal Academy of Science (KNAW). Both maintained that they did not require permission to make reproductions intended solely to enable their internal scholarship, arguing that copyright law should not be used to thwart scientific research.

On the 23rd of December the District Court of Amsterdam handed down its ruling in the case. After establishing that the writings of Anne Frank are indeed protected by copyright (and, in the Netherlands, will continue to be protected for the foreseeable future), the court also ruled on the legality of the research reproductions made by the Stichting.

While the court dismissed arguments that the creation and use of these reproductions were covered by a number of exceptions and limitations to copyright, the court did agree with the claim that the requirement to obtain permission from the rights holder for making such copies is in conflict with the freedom of scientific research as established by Article 13 of the Charter of Fundamental Rights of the European Union. In its ruling (Dutch, translation mine) the district court argues (emphasis added):

From [the previous arguments] it follows that the creation of the XML-TEI file that has been made available to third parties constitutes an infringement of the copyright held by the [Anne Frank] Fonds. It needs to be judged if the circumstances of this particular act provide a reason to reject the demands made by the Fonds because this would, in the light of the principle of proportionality, put unreasonable restrictions on the freedom of scientific research. […]

It goes without saying that in order to carry out proper textual scientific research the researchers must have access to some copies of the texts that are being researched. Without these reproductions it is impossible to access the source materials, which makes the research virtually impossible. This includes the XML-TEI file produced by the Huygens ING [Institute]. After all, this file has been created […] for the sole purpose of carrying out scientific research.

The Fonds has only broadly stated that it does not have to tolerate everything that happens with the texts. Insofar as the Fonds tries to obtain control over what research should take place or not, this is not a right that is protected by copyright.

It is also clear that the infringement of the copyright of the Fonds taking place as part of the research does not extend beyond the provision of only a few reproductions of the works, and to a limited number of researchers directly involved in the research. The copyright infringement thus has minimal impact.

Under these circumstances, the court concludes that enforcement of the copyright by the Fonds is subordinate to the fundamental right of the Stichting et al. to freedom of scientific research.

Anyone who is familiar with the current discussion about the copyright status of text and data mining will quickly recognize that this case—which started as a dispute about the length of copyright protection—offers some valuable insights into the legal status of text and data mining in Europe.

The actions of the Stichting and the KNAW, namely creating a machine-readable version of the text (the XML-TEI file), are an excellent example of text mining. Research organisations and research libraries have long claimed that the making of reproductions of works that happens as part of the process of text and data mining should not require permission from the rights holders as long as the researchers have legal access to the works in question.

The court supports this line of reasoning by recognizing that requiring permission from the rights holders before machine-readable reproductions can be made would make TDM-based research ‘virtually impossible’. In addition, by tying the issue of text and data mining to the freedom of scientific research, the court provides a strong normative justification for the position that TDM should not require permission from rights holders.

Since text and data mining is one of the issues that will be dealt with during the upcoming modernization of the EU copyright rules, we hope that European lawmakers will pay close attention to the reasoning of the court in this case. As the Anne Frank Stichting and the KNAW rightly point out, copyright law should not be used to thwart scientific research. Such an outcome is unfortunately a very real danger, given the approach presented by the European Commission in December.
