On December 19, 2024, the European AI Office published a second draft (download as a PDF file) of the General Purpose AI Code of Practice. This is the second in a series of drafting rounds planned to run until April 2025. An analysis of the first draft is also available on our blog.
Following the release of this new version, the AI Office organised working group meetings with the stakeholders that participate in the Code of Practice Plenary. Stakeholders were also invited to provide written feedback on specific sections of the draft Code, according to the working groups they take part in. COMMUNIA takes part in Working Group 1 on transparency and copyright-related rules (WG1) and was granted a speaking slot in that group's meeting. Our intervention at the meeting and our written submission both focused on the copyright-related commitments applicable to GPAI model and system providers (download as a PDF file).
From a users' rights and fundamental rights perspective, the copyright section shows some improvements. However, the scope of some measures is still unduly broad. In this blogpost, we highlight some of our concerns with the second draft of the Code, as well as other insights into the positions expressed by the chairs and other stakeholders during the meeting.
Reasonable and proportionate efforts to ensure lawful access
One of the measures that raised the most opposition in this second round of discussions was measure 2.4 (“Ensure lawful access to copyright-protected content”), which reads as follows:
If Signatories engage in text and data mining according to Article 2(2) of Directive (EU) 2019/790 for the training of their general-purpose AI models, they commit to making reasonable and proportionate efforts to ensure that they have lawful access to copyright-protected content in accordance with Article 4(1) of Directive (EU) 2019/790. (emphasis added)
Measure 2.4 does not simply re-state Article 4(1) of the CDSM Directive. The CDSM Directive allows the reproduction and extraction of “lawfully accessible works and other subject matter”, whereas the proposed measure would require GPAI model providers to commit to make “reasonable and proportionate efforts to ensure that they have lawful access”.
At the WG1 meeting, many rightholders, GPAI model providers and civil society organisations questioned whether the measure was compliant with the CDSM Directive. Rightholders took the view that introducing a “reasonable and proportionate efforts” standard would weaken the protection offered to rightholders, whereas other stakeholders questioned whether it would reduce the scope of protection offered to the beneficiaries of the exception.
Arguably, the condition imposed by the CDSM Directive can be satisfied simply by not employing illegal means to access the content, whereas the proposed measure introduces a new requirement to proactively ensure that one has lawful access to such content. This would result in a more restrictive framework.
“Lawfully accessible” is a special condition for the enjoyment of the general purpose TDM exception, and introducing a commitment standard risks creating an inconsistent and conflicting interpretation of what that open-ended term means. As an autonomous concept of EU law, the contours of this term can only be defined by the Court of Justice. The AI Office has a mandate to draw up codes that contribute to the proper application of the AI Act. Introducing interpretative elements into EU copyright law clearly pushes beyond that mandate. These elements should therefore be deleted.
Compliance with rights reservation
Measures 2.6 and 2.7 lay down the commitments that GPAI model providers are expected to adopt to ensure that they recognise machine-readable identifiers used to opt-out from AI training. Similarly to the approach followed in the first draft, the second draft requires model providers to commit to employ crawlers that respect the Robot Exclusion Protocol (REP) and to make best efforts to comply with other rights reservation approaches.
Despite the strong criticism that this framework received from rightholders and civil society organisations (COMMUNIA included), the chairs maintain that the REP is currently the only technical solution used at a large scale and, thus, the only full commitment that is possible at the present moment. We believe that a different approach is possible to future-proof these measures.
The relevant stakeholders have not yet agreed on rights reservation standards. In fact, measure 2.7 encourages Signatories to engage in those standardisation efforts. Given that the REP has a number of conceptual shortcomings that make it unsuitable as an expression of rights reservations (e.g. REP has limited usefulness for types of content that are not predominantly distributed via the open internet, such as music and AV content), it is to be expected that other standards may soon emerge that are more effective at expressing rights reservations.
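To make concrete what "respecting the REP" means in practice, the sketch below uses Python's standard-library `robotparser` to check whether a crawler may fetch a page. The crawler name `ExampleAIBot` and the robots.txt content are hypothetical, chosen purely for illustration; the draft Code does not prescribe any particular crawler or directive.

```python
from urllib import robotparser

# Hypothetical robots.txt expressing a rights reservation via the
# Robot Exclusion Protocol: one assumed AI crawler is excluded,
# while all other crawlers remain allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A REP-respecting crawler checks before fetching a URL:
print(rp.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(rp.can_fetch("OtherBot", "https://example.com/article"))      # True
```

As the example suggests, the REP only works where content is served from a website that the rightholder controls, which is precisely why it has limited usefulness for content that is not predominantly distributed via the open internet.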
Under the current draft, any standard other than the REP will automatically be subject to a lower level of commitment than that required for the REP. While it may be sensible at this time to introduce the minimum requirement defined by measure 2.6, the Code must also require GPAI model providers to commit to comply with all standardised machine-readable means to express rights reservations that may emerge over time. In order to provide further legal certainty, consideration should be given to making the AI Office responsible for maintaining an up-to-date list of such standards.
Copyright compliance obligations
The first draft of the Code required GPAI model providers to commit to pass on to downstream system providers certain copyright compliance obligations, including the introduction of system-level measures to prevent output similarity. We heavily criticised that provision, since those system-level measures would effectively require the use of output filters, threatening users' rights and fundamental freedoms. In this second iteration, the draft Code no longer requires GPAI model providers to introduce such measures. Yet measures 2.9 and 2.10 still target copyright infringement at the output level.
Let’s start by looking at measure 2.9 (“Prevent copyright-related overfitting”), which reads as follows:
Signatories that train a generative general-purpose AI model that will allow for the flexible generation of content, such as in the form of text, audio, images or video, commit to making best efforts to prevent an overfitting of their general-purpose AI model in order to mitigate the risk that a downstream AI system, into which the general-purpose AI model is integrated, generates copyright infringing output that is identical or recognisably similar to protected works used in the training stage. This commitment applies irrespective of whether a Signatory vertically integrates the model into its own AI system(s) or whether the model is provided to another entity based on contractual relations. (emphasis added)
Due to the intricate language used, it is not clear whether this model-level measure targets a) similar outputs only on the condition that they infringe copyright, or b) any similar outputs, incorrectly assuming that they are necessarily copyright-infringing.
We stress that a similar output can only be qualified as an infringing output if 1) the output triggers the application of copyright law, which is not always the case (e.g. stylistic similarity has no copyright relevance, since artistic style is not protected), 2) the copyright-relevant output does not qualify as an independent similar creation and 3) no copyright exception or limitation (e.g. quotation, caricature, parody and pastiche) applies to the similar output.
Although model-level measures to prevent output similarity do not entail the same risks to end users as system-level measures, they may still prevent the lawful development of models that could support substantial legitimate uses. As highlighted by some participants during the WG1 meeting, so-called “copyright-related overfitting” may actually be needed in certain circumstances in order for the model to fulfil its purpose.
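One crude way to illustrate what "identical or recognisably similar" output detection might involve is a verbatim n-gram overlap check between a generated output and a training text. This is a simplified, hypothetical sketch, not a method proposed in the draft Code, and it deliberately shows why such checks cannot distinguish lawful similarity (style, quotation, parody) from infringement.

```python
def ngram_overlap(output: str, reference: str, n: int = 5) -> float:
    """Fraction of word n-grams in `output` that also appear verbatim
    in `reference` -- a rough proxy for verbatim memorisation only."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    out_grams = ngrams(output)
    if not out_grams:
        return 0.0
    return len(out_grams & ngrams(reference)) / len(out_grams)


training_passage = "the quick brown fox jumps over the lazy dog"

# Verbatim reproduction scores 1.0 ...
print(ngram_overlap(training_passage, training_passage))  # 1.0

# ... but a stylistically similar, independently worded text scores 0.0,
# even though a court might still need to assess it, and conversely a
# lawful quotation would score high despite being non-infringing.
print(ngram_overlap("a swift tawny fox leaps across a sleeping hound",
                    training_passage))  # 0.0
```

The gap between what such a metric measures (textual overlap) and what copyright law requires (the three-step assessment described above) is exactly why output-level prevention duties sit uneasily in a model-level code.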
In addition to requiring GPAI model providers to prevent copyright-related overfitting, the second draft contains a new commitment aimed at prohibiting copyright-infringing uses of the model. Measure 2.10 reads as follows:
In order to further mitigate the risk that a downstream AI system, into which a generative general-purpose AI model is integrated, generates copyright infringing output, Signatories commit to prohibiting copyright-infringing uses of their model in their acceptable use policy, terms and conditions, or other equivalent documents.
Like measure 2.9, this measure goes beyond the scope of protection of Article 53(1)(c) of the AI Act, which targets copyright infringement at the input level. These measures should therefore be removed.
Other measures
The second draft of the Code requires GPAI model providers to commit to include in their internal copyright policy information that is essential to enable parties with legitimate interests, including rightholders, to exercise and enforce their rights under Union law. However, unless this information is publicly disclosed, these measures will not be meaningful for those parties. While due consideration should be given to the need to protect trade secrets and confidential business information, the Code must ensure that model providers commit to include in the publicly available summary of the copyright policy at least the following elements: 1) a list of so-called “piracy websites” excluded from crawling, 2) a list of all crawlers deployed by the model providers, and 3) a list of other solutions for expressing rights reservations honoured by the model providers, including information on the date since which each solution has been honoured.
Despite its shortcomings, the second draft is, in many respects, a better version of the Code than the version delivered in the first drafting round. Some improvements include the introduction of distinct due diligence obligations concerning private, non-publicly accessible datasets, on the one hand, and publicly accessible datasets, on the other; a commitment to exclude from crawling activities only websites that are “widely known” for making available to the public copyright-infringing content “on a commercial scale” and that “have no substantial legitimate uses”; and the deletion of the reference to the Commission Counterfeit and Piracy Watch List.