The Limits of Copyright: Text and Data Mining

This post was originally published on the Creative Commons blog under CC BY 4.0.

This week is Copyright Week, a series of actions and discussions supporting key principles that should guide copyright policy. Every day this week, various groups are taking on different elements of the law, addressing what’s at stake and what we need to do to make sure that copyright promotes creativity and innovation.

Today’s topic is supporting fair use, a legal doctrine in the United States and a few other countries that permits some uses of copyrighted works without the author’s permission, for purposes such as parody, criticism, teaching, and news reporting. Fair use is considered a “limitation and exception” to copyright, and it is an important check on the bundle of exclusive rights granted to authors under copyright law.

One area of particular importance within limitations and exceptions to copyright is text and data mining. Text and data mining typically involves computers analyzing huge amounts of text or data, and it has the potential to reveal many interesting connections among textual and other types of content. Understanding these connections can enable new research capabilities that lead to novel scholarly discoveries and critical scientific breakthroughs. For this reason, text and data mining is increasingly important for scholarly research.

Recently the United Kingdom enacted legislation specifically excepting noncommercial text and data mining from copyright. And as the European Commission conducts its review of EU copyright rules, some groups have called for adding a specific text and data mining exception. Copyright for Creativity’s manifesto, released Monday, urges the European Commission to add a new exception for text and data mining in order to support new uses of technology and user needs.

Another view holds that text and data mining activities should be considered outside the purview of copyright altogether. Our response to the EU copyright consultation takes this approach, saying “if text and data mining would be authorized by a copyright exception, it would constitute a de facto recognition that text and data mining are not legitimate usages. We believe that mining texts and data for facts is an activity that is not and should not be protected by copyright and therefore introducing a legislative solution that takes the form of an exception should be avoided.” Similarly, there have been several actions advocating that “The right to read should be the right to mine.”

Whether text and data mining falls under a copyright exception or outside the scope of copyright altogether, it is clearly an activity that copyright owners should not be able to control. Unfortunately, that is exactly what some incumbent publishing gatekeepers are trying to do by setting up restrictive contractual agreements. One example is the set of “open access” licenses from the International Association of Scientific, Technical & Medical Publishers (STM), many of which attempt to restrict text and data mining of the licensed publications. In jurisdictions such as the United States, users do not need to ask permission (or be granted permission through a license) to conduct text and data mining, because the activity either falls outside the scope of copyright or is squarely covered by fair use.

Ensuring that licenses give copyright owners no more control over their content than they have under copyright law is a fundamental principle of Creative Commons licensing. That’s why the CC licenses explicitly state that they in no way restrict uses that are under a limitation or exception to copyright. This means that users do not have to comply with the license for uses of the material permitted by an applicable limitation or exception (such as fair use) or uses that are otherwise unrestricted by copyright law, such as text and data mining in many jurisdictions.

Today’s topic of fair use rights reminds us that “for copyright to achieve its purpose of encouraging creativity and innovation, it must preserve and promote ample breathing space for unexpected and innovative uses.” To liberate the massive potential for innovation made possible by existing and future types of text and data mining, we need user-focused copyright policy that enables these new activities.
