Taming the upload filters: Pre-flagging vs. match and flag

October 13, 2020

by Paul Keller

One of the most important elements of any implementation of Article 17 will be how platforms can reconcile the use of automated content filtering with the requirement not to prevent the availability of legitimate uploads. While most implementation proposals that we have seen so far are silent on this crucial question, both the German discussion proposal and the Commission’s consultation proposal contain specific mechanisms that are intended to ensure that automated content filters do not block legitimate uploads, and that uploads are subject to human review if they are not obviously/likely infringing.

In order to achieve this objective, the German discussion draft published in June relies on the idea of “pre-flagging”: users would be allowed to flag uploads containing third party works as legitimate. Platforms would then be prevented from automatically blocking pre-flagged uploads unless they determine that the flag is incorrect because the upload is “obviously infringing”.

By contrast, the Commission’s implementation guidance consultation proposes a “match-and-flag” mechanism: if upload filters detect the presence of a third party work in an upload and the use is not deemed to be “likely infringing”, then the uploader is notified and given the ability to state that the use is legitimate. If the user flags the upload as legitimate, the platform will have to initiate a human review of the upload, which remains available from the moment of upload until the review has been concluded. This type of mechanism was first suggested by a group of copyright academics in October of last year. It is also at the core of the proposal that we had presented during the last meeting of the stakeholder dialogue.
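The flow described above can be sketched in a few lines of Python. This is a simplified illustration of our reading of the consultation proposal, not a specification: the function name, the outcome labels and the boolean inputs are our own assumptions. The sketch deliberately does not decide what happens to an upload while the uploader has not yet responded to the notification.

```python
from enum import Enum


class Outcome(Enum):
    # All labels are hypothetical names chosen for this sketch.
    PUBLISH = "no match, upload goes online"
    AUTOMATED_BLOCK_ALLOWED = "match is likely infringing, automated blocking permitted"
    AWAITING_UPLOADER = "uploader notified, no flag received (yet)"
    ONLINE_PENDING_HUMAN_REVIEW = "flagged as legitimate, stays online until human review ends"


def match_and_flag(match_found: bool, likely_infringing: bool,
                   flagged_legitimate: bool) -> Outcome:
    """Simplified match-and-flag decision, as we read the Commission's proposal."""
    if not match_found:
        return Outcome.PUBLISH
    if likely_infringing:
        # In this reading, the flag is only offered when the use is
        # NOT deemed likely infringing, so a flag is ignored here.
        return Outcome.AUTOMATED_BLOCK_ALLOWED
    if flagged_legitimate:
        # The upload remains available from the moment of upload
        # until the human review has been concluded.
        return Outcome.ONLINE_PENDING_HUMAN_REVIEW
    return Outcome.AWAITING_UPLOADER


if __name__ == "__main__":
    print(match_and_flag(match_found=True, likely_infringing=False,
                         flagged_legitimate=True).value)
```

The point of the sketch is that the filter only makes a pre-selection: the one branch that leads to fully automated blocking is the "likely infringing" branch.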

Both approaches provide a mechanism that limits the application of fully automated upload filters (while implicitly acknowledging the fact that many platforms will deploy upload filters). In the Commission’s proposal, filters are limited to making a pre-selection (“is the upload likely infringing?”); in the German proposal, they can only operate on unflagged content and to filter out “obviously incorrect” pre-flags.

Convergence on “match-and-flag”?

Both approaches have been criticised by rightholders, who claim that they undermine the “original objective of the directive” without providing alternative proposals on how automated filtering can be reconciled with the requirement not to block legitimate uploads. The German discussion proposal has also been criticised by platforms such as Google and Facebook. The platforms argue that giving users the ability to pre-flag every single upload would likely lead to substantial numbers of unnecessary (where the content in question is already licensed) or unjustified (users making excessive use of the pre-flagging tool) pre-flags, which would make such a system impractical to operate at scale.

Netzpolitik.org has now published a leak of a new version (“Referentenentwurf”) of the German implementation law proposal. This version abandons the pre-flagging mechanism and replaces it with a “match-and-flag” approach similar to what the Commission has proposed (it also closely resembles a suggestion made by Google in its response to the German consultation). However, there are also important differences between the two proposals, and a closer analysis shows that the new German proposal offers considerably less protection against unjustified blocking or removal of uploads than either the initial pre-flagging approach or the approach proposed by the Commission. To understand why, we need to look at the details of the proposed mechanisms.

Both approaches clearly assume that platforms are able to identify, in (near) real time, matches between uploads and works that rightholders have requested to be blocked. Both the Commission’s proposal and § 8 of the German Referentenentwurf assume that users can be notified of a match during the upload process and can thus prevent legitimate uploads from being blocked at upload. While some technology vendors claim to have the ability to reliably match content during the upload, it is currently unclear whether the ability to match in (near) real time is widely available to all platforms.

Given the uncertainty about the availability of real-time matching solutions for all types and sizes of platforms, it must be ensured that the use of automated filters is not imposed de facto by national legislators if this could be disproportionate for smaller platforms. The new German proposal does seem to require the use of real-time filters, which would make it incompatible with the proportionality requirements in Article 17(5).

The limits of “match-and-flag”

But even if we assume that platforms have the ability to match in real time during the upload, the approach still has limitations. The requirement to make best efforts to prevent the availability of works in Article 17(4)b does not apply only to new uploads: it also applies to uploads that are already on a platform. In situations where rightholders provide platforms with new blocking requests, the platforms will need to make best efforts to identify and remove matching uploads that are already online as well (this problem will be especially acute at the moment when the directive comes into force). Notifying the uploader of a match and giving her the possibility to flag the upload as legitimate does not offer the same protection here, because it cannot be assumed that the user is able to react immediately. This would mean that the upload in question would become unavailable until the uploader has had a chance to object.

This problem is much more pronounced in the new German proposal. The Commission’s proposal makes it clear that platforms are only allowed to automatically remove uploads if a match is “likely infringing”. This means that already uploaded works that do not meet this requirement cannot be removed until either the user has had a chance to react to a notification or until the platform has concluded a human review of the upload in question. The German proposal does not contain such a safeguard, as it requires the automated removal of uploads unless these have been flagged as legitimate during the upload.

This is regardless of whether the match is likely to be infringing or not. In situations where users cannot react to notifications right away, this will result in the removal of substantial amounts of legitimate uploads. Under the previous German pre-flagging mechanism this would not be an issue (with the exception of uploads already on the platform when the German implementation enters into force), because users would have had the ability to flag any legitimate upload as legitimate. The new German proposal only gives them the possibility to flag works as legitimate that are already on a blocklist at the moment of upload.

Towards a combined approach?

As long as this blind spot persists, the new German proposal does not adequately implement the requirement in Article 17(7) that the availability of uploads that do not infringe copyright must not be prevented by measures deployed to implement Article 17(4)b. To fix this, the German legislator should extend the mechanism provided in § 8 of the new proposal in two ways: users should be able to flag any upload as legitimate after it has been uploaded, and uploads flagged in this way must not be blocked automatically.

This combined approach would provide even stronger safeguards than the Commission’s proposal, which hinges on the idea that it is possible to automatically differentiate between likely infringing and likely legitimate content based on technical parameters.

As we have pointed out in our response to the Commission’s consultation, this approach, while viable in principle, is flawed as long as defining those technical parameters is left to platforms and rightholders without any involvement from users’ organisations. In addition, the proposed “likely infringing” standard does not set a high enough bar for preventing automated removal of potentially legitimate content. Instead, the “identical or equivalent” standard proposed in the academic statement that introduced the idea of “match-and-flag” should be a point of departure. In the case of time-based media, this could be operationalised as matches that are at least 20 seconds long and where the match consists of at least 90% of the original work and at least 90% of the upload in question. In addition, matches of indivisible works (such as pictures) and short works (such as short poems) should never be assumed to be infringing, even when they correspond to 100% of an upload.
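To make the suggested threshold concrete, here is a minimal sketch in Python. It reflects one possible reading of the rule, in which the matched segment must cover at least 90% of the reference work and also make up at least 90% of the upload. The class, field and function names are hypothetical, and a real system would need to define how overlapping or fragmented matches are measured.

```python
from dataclasses import dataclass

MIN_MATCH_SECONDS = 20.0  # minimum length of the matched segment
MIN_SHARE = 0.90          # minimum share of both the work and the upload


@dataclass(frozen=True)
class Match:
    # Hypothetical structure describing a single match in time-based media.
    matched_seconds: float  # length of the matched segment
    work_seconds: float     # total length of the reference work
    upload_seconds: float   # total length of the upload
    indivisible_or_short: bool = False  # e.g. a picture or a short poem


def may_presume_infringement(m: Match) -> bool:
    """Return True only if the match clears every threshold named above."""
    if m.indivisible_or_short:
        # Never presumed infringing, even at 100% of the upload.
        return False
    if m.work_seconds <= 0 or m.upload_seconds <= 0:
        return False  # avoid division by zero on malformed input
    return (
        m.matched_seconds >= MIN_MATCH_SECONDS
        and m.matched_seconds / m.work_seconds >= MIN_SHARE
        and m.matched_seconds / m.upload_seconds >= MIN_SHARE
    )


if __name__ == "__main__":
    full_copy = Match(matched_seconds=180, work_seconds=190, upload_seconds=195)
    excerpt_in_review = Match(matched_seconds=30, work_seconds=200, upload_seconds=600)
    print(may_presume_infringement(full_copy))
    print(may_presume_infringement(excerpt_in_review))
```

Under this reading, a near-complete copy of a song clears the bar, while a 30-second excerpt embedded in a ten-minute video does not and would therefore have to be protected from automated blocking.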

Meaningful protection for Public Domain and openly licensed works

A final advantage of such a combined approach is that it would also offer real protection from automated blocking for works that are in the public domain or available under open licenses. While such works are free to use for anybody, they are frequently blocked or removed as the result of wrongful ownership claims. In this situation it must be possible for anyone at any time to flag such works as being in the public domain or openly licensed. Given that this status will be the same across all (types of) platforms, such flags should not be recorded by individual platforms but in a public database that must be consulted by any system as part of assessing the status of an upload.

While it may make sense for platforms to use their own private databases when it comes to matching uploads to reference files of works to be blocked, the effective protection of public domain and openly licensed works requires a fully transparent public database that reflects their status as public goods. This database should be maintained by an independent trusted entity that also offers a mechanism for resolving conflicting claims.
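As a rough illustration of what "consulting the public database" could mean in practice, the following Python sketch checks a shared registry before any automated blocking decision is taken. Everything in it is an assumption made for the sake of the example: the registry is modelled as a simple mapping, and the status labels, identifiers and function name are invented. No such database or interface exists today.

```python
from typing import Mapping

# Hypothetical status labels for works that are free for anyone to use.
FREE_STATUSES = frozenset({"public_domain", "open_licence"})


def automated_blocking_allowed(work_id: str,
                               public_registry: Mapping[str, str]) -> bool:
    """Consult the shared public registry before acting on a match.

    Returns False when the matched work is recorded as public domain
    or openly licensed, whatever a platform's private reference
    database says about ownership.
    """
    return public_registry.get(work_id) not in FREE_STATUSES


if __name__ == "__main__":
    # Invented entries, for illustration only.
    registry = {
        "example-work-1": "public_domain",
        "example-work-2": "open_licence",
    }
    for work in ("example-work-1", "example-work-3"):
        print(work, automated_blocking_allowed(work, registry))
```

The design choice that matters here is the order of operations: the public registry is checked first and overrides private ownership claims, so a wrongful claim cannot lead to automated removal of a work that has been flagged as free to use.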

Summary

At this stage, there seems to be some level of convergence towards “match-and-flag” mechanisms as a practical approach to reconciling 17(4) and 17(7). While still exhibiting shortcomings, such an approach would reflect the internal balance of Article 17 that the EU legislator arrived at. How far a “match-and-flag” mechanism will be able to put this balance into practice depends on its practical implementation. As we have outlined above, this means that:

There must be high thresholds to presume infringement and consequently permit fully automated blocking of uploads.
These thresholds should be based on fully transparent criteria, which users can challenge in court.
All matched uploads that do not meet these thresholds must be protected from blocking and flagged uploads must not be removed while under review by the platform.
In addition, there must be the ability for anyone to pre-flag works that are in the Public Domain or available under an open license via a decentralised public database that must be consulted by any (automated) measures used to comply with Article 17(4).
National implementation must contain safeguards that ensure that already existing uploads cannot be blocked automatically.

Finally, it must also be ensured that the use of automated filters is not imposed by national legislators if this would be disproportionate for the platform in question.