How Filters fail (to meet the requirements of the DSM directive)

Article 17 of the DSM directive establishes that Online Content Sharing Service Providers (OCSSPs) are liable for copyright-infringing uploads by their users unless they either obtain a license for the use of such content or take a number of measures designed to prevent the availability of such content on their platforms. While the directive never explicitly talks about filters or automated content recognition (ACR) systems, all sides of the debate assume that, in order to meet this obligation, platforms have little choice but to implement ACR-based filtering systems that scan all user uploads and block or remove those that contain works flagged by their rightholders.

This de-facto requirement to implement upload filters is – by far – the most controversial aspect of the entire copyright directive and it continues to dominate the discussions about the implementation of Article 17 into national legislation.

In this context, it is important to remember that the use of such filters is not new and that their functioning can already be observed in practice. What is new, however, is the de-facto requirement for OCSSPs to implement filters as well as a number of requirements that OCSSPs need to meet to ensure that any measures (including filters) implemented by them are not infringing on the rights of users. This includes the requirement that any such measures “shall not result in the prevention of the availability of works or other subject matter uploaded by users, which do not infringe copyright and related rights, including where such works or other subject matter are covered by an exception or limitation”.

In other words, one of the most important contributions of the DSM directive is that, for the first time, it establishes conditions that need to be met by automated upload filters.

As we have argued many times before, these conditions present a very high hurdle for any technological solution to clear. The fact that upload filters are incapable of determining whether a particular use of a copyrighted work is infringing has been established beyond any doubt. But the failure to assess context is not the only way in which filters based on automated content recognition fall short of the requirements established by the directive. In total, there are at least three distinct ways in which filters fail.

In the remainder of this post we will discuss these three failure modes based on examples collected by Techdirt in the course of a single week: removals caused by incorrect rights information, removals caused by the inability to recognise legitimate uses, and removals caused by the inability to accurately identify works.

Incorrect rights information

Incorrect rights information is probably the most common and best documented cause for the unjustified removal (or demonetisation) of works on YouTube.

ACR systems execute actions specified by whoever is recognised as the owner of a work. For the purposes of these systems, the owner of a work is whoever claims to be the owner and, unless there are conflicting ownership claims, there is no way to check the accuracy of such claims, as there are no authoritative databases of ownership rights. As a result, it is possible to claim ownership of public domain works (which no one owns), of works that have been freely or widely licensed by their owners, or of any copyrighted work that has not already been claimed by someone else.
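To make this failure mode concrete, here is a minimal sketch of a purely claim-based ownership registry. The names are hypothetical and this is not how Content ID or any real system is implemented; it only illustrates the logic described above: any claim is accepted as long as no one has claimed the same reference file first, because there is no authoritative rights database to check it against.

```python
# Hypothetical illustration of claim-based "ownership": whoever registers a
# claim first is treated as the owner, because there is no authoritative
# rights database to verify the claim against.

class ClaimsRegistry:
    def __init__(self):
        self.claims = {}  # reference fingerprint -> claimed owner

    def register_claim(self, fingerprint, claimant):
        """Accept any ownership claim unless the work has already been claimed."""
        if fingerprint in self.claims:
            return False  # only conflicting claims are ever questioned
        # No verification step is possible here -- there is nothing to verify against.
        self.claims[fingerprint] = claimant
        return True

    def owner_of(self, fingerprint):
        return self.claims.get(fingerprint)


registry = ClaimsRegistry()
# Public domain footage (e.g. a NASA launch stream) can be claimed by any
# broadcaster that registers it as part of its own programming:
registry.register_claim("crew-dragon-launch-footage", "SomeBroadcaster")
print(registry.owner_of("crew-dragon-launch-footage"))  # -> SomeBroadcaster
```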

Last week, Techdirt reported that both NBC Universal and National Geographic made wrongful ownership claims over NASA-owned footage of the launch of the SpaceX Crew Dragon, which resulted in the removal and blocking of clips and streams by anyone using the same freely available footage. Other well-documented examples of this failure mode include the takedown by Sony of a freely licensed animated film and takedowns of recordings of white noise.

Note that the culprits tend to be large media companies. This is because YouTube currently limits access to Content ID to large rightholders (ironically, because it considers them especially “trustworthy”). Once Article 17 is in place, platforms will need to cooperate with all rightholders, and it is safe to assume that this will lead to a substantial increase in this type of filter failure.

Inability to recognise legitimate uses

From the perspective of user rights, including freedom of expression, the inability of filters to recognise legitimate uses of a copyrighted work may be the most troublesome failure mode. ACR technology has been designed to recognise (parts of) works by matching them to reference files. It has not been designed to assess whether a given use is legitimate. This means that, by design, automated filters cannot meet the requirement to “in no way affect legitimate uses”.
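To illustrate the point, here is a minimal sketch of the information a fingerprint-matching step actually operates on, assuming a toy similarity function (real matching algorithms are far more sophisticated, but the inputs are the same in kind): a similarity score between the upload and a reference file, and nothing else.

```python
# Toy sketch of an ACR matching step: the only inputs are the fingerprints of
# the upload and of a reference work. Context -- background music in a news
# report, quotation, parody, criticism -- is not an input and therefore
# cannot influence the outcome.

def matches_reference(upload_fingerprint, reference_fingerprint, threshold=0.9):
    """Return True if the upload is 'similar enough' to the reference work."""
    agreement = sum(abs(u - r) < 0.05
                    for u, r in zip(upload_fingerprint, reference_fingerprint))
    similarity = agreement / len(reference_fingerprint)
    return similarity >= threshold

# Note what is *not* a parameter: the purpose of the upload, how much of the
# work is used, or whether an exception or limitation applies.
```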

Again, last week’s Techdirt coverage provides a canonical example: both Facebook and YouTube blocked reporting from an anti-racism protest in the US because one of the recorded interviews had music by 2Pac and Marvin Gaye playing in the background. Censoring political speech because of copyrighted material playing in the background is exactly the type of scenario that the safeguards included in the DSM directive have been designed to prevent. As long as filters fail to meet this obligation, automated removal or blocking actions will need to be subject to human oversight in all but the most unambiguous cases of infringement.

Inability to accurately identify works

The third and last failure mode is the only one in which filters fail to do what they have been designed to do and where mistakes are made at the level of the content recognition technology itself (in the first two failure modes, content is recognised correctly and the mistakes happen at other levels). These so-called “false positives” occur when (parts of) uploaded content are erroneously identified as matching a protected work. If we are to believe the vendors of ACR systems, such false positives account for only a tiny fraction of all content matches (less than 1%), but given the massive amounts of content being uploaded to OCSSPs, even the smallest fraction quickly adds up.
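A quick back-of-the-envelope calculation shows why even a sub-1% error rate matters at this scale. The figures below are assumptions chosen purely for illustration, not actual platform data:

```python
# Illustrative arithmetic only -- the volumes are assumptions, not real figures.
daily_uploads = 5_000_000         # assumed uploads per day on a large OCSSP
match_rate = 0.10                 # assumed share of uploads that trigger a match
false_positive_rate = 0.01        # "less than 1%", as claimed by ACR vendors

wrong_matches_per_day = daily_uploads * match_rate * false_positive_rate
print(f"{wrong_matches_per_day:,.0f} uploads wrongly matched per day")   # 5,000
print(f"{wrong_matches_per_day * 365:,.0f} per year")                    # 1,825,000
```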

False positives generally occur when the uploaded content is very similar to the works against which uploads are being matched. A common example is recordings of classical music, where the underlying work is in the public domain and the differences between performances are too nuanced for filters to tell apart. Techdirt provides us with another example from last week. In this case the Guinness Book of Records claimed to own the video footage of a record-setting Super Mario Bros. speedrun. This claim resulted not only in the original video (uploaded by the speedrunner) being taken down (another example of the first failure mode discussed above), but also in the removal of videos of lots of other speedruns. Again the problem here is similarity: as one of the affected speedrunners points out, “speedrunning is such a methodical art that, by nature, a lot of speedrun footage looks very, very similar”.

In other words, automated content recognition can fail even at the tasks for which it has been designed.

Failure is not an option

Given that the directive explicitly requires that measures taken by rightholders and OCSSPs must “in no way affect legitimate uses”, these structural shortcomings mean that ACR technologies and the filtering systems built on top of them can only be used by OCSSPs to comply with their obligations under Article 17 under very limited circumstances. Member States implementing the directive should include the following safeguards in their national laws:

In order to prevent the first type of failure, implementations of the directive must establish transparency requirements for rights claims and must include sanctions for making incorrect claims of ownership that result in unjustified blocking or removal. Any implementation without sanctions will incentivise widespread abuse of the measures OCSSPs need to implement.

In order to prevent the second type of failure from negatively affecting users’ rights, implementations of the directive should limit automated blocking or takedown actions to matches where the use is evidently infringing (i.e. where an upload contains a work or substantial parts of a work without any modifications or additional context). In all other cases there must be the possibility for the user to override a blocking or removal action and refer the dispute to the OCSSP for human review (see the sketch below).

And in order to minimise the impact of the third type of failure on users’ rights, implementations of the directive must fully implement the complaint and redress mechanisms and ensure that, by default, user uploads remain available while a dispute is under review by the OCSSP.
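Taken together, these safeguards describe a decision flow along the lines of the following sketch (the names and structure are hypothetical; the directive prescribes outcomes, not any particular implementation):

```python
# Hypothetical sketch of the safeguarded decision flow described above: only
# evidently infringing matches are blocked automatically, and disputed uploads
# remain available while the OCSSP reviews the complaint.

from dataclasses import dataclass

@dataclass
class Match:
    claimed_owner: str
    is_whole_or_substantial_part: bool
    is_modified_or_recontextualised: bool

def handle_match(match, user_disputes=False):
    evidently_infringing = (match.is_whole_or_substantial_part
                            and not match.is_modified_or_recontextualised)
    if not evidently_infringing:
        # Second safeguard: no automated blocking outside evident cases.
        return "keep online (may be flagged for review)"
    if user_disputes:
        # Third safeguard: uploads stay available while the dispute is under
        # review through the complaint and redress mechanism.
        return "keep online pending human review"
    return "block (evidently infringing, undisputed)"
```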

As we have argued in our recent input paper for the European Commission’s stakeholder dialogue, these safeguards should also be included in the guidelines to be issued by the Commission later this year.
