Two Books by Katsushika Hôtei Hokuga (cropped)

3rd draft of the GPAI Code of Practice: copyright transparency is unwanted, and it shows

On March 11, 2025, the Chairs and Vice-Chairs of the General Purpose AI Code of Practice presented the third draft of the Code (download as a PDF file). As in previous drafting rounds, the release of the new version was followed by a series of working group meetings with the stakeholders who take part in the Code of Practice Plenary, who were also invited to provide written feedback. The third draft is the last drafting round subject to consultation; the fourth and final version of the Code is planned for May 2025.

The new version of the Code is a mixed bag. The revised copyright compliance obligations regarding lawful access and output similarity answer the user rights concerns raised by COMMUNIA. However, the copyright transparency obligations have been watered down, and the measures aimed at ensuring compliance with rights reservations are also disappointing. In this blog post, we share some highlights from our response to the consultation (download our submission as a PDF file), as well as other stakeholders’ positions stated during the meeting. Our analysis of the first draft and the second draft is also available on our blog.

Measures affecting users’ rights

From a users’ rights perspective, the third draft of the Code contains significant improvements. The most notable change is the deletion of the model-level measures targeting output similarity. In the previous version, GPAI model providers were required to make best efforts to prevent copyright-related overfitting, in order to mitigate the risk that an AI system “generates copyright infringing output that is identical or recognisably similar to protected works used in the training stage”. Model providers are now only required to make reasonable efforts “to mitigate the risk that a model memorizes copyrighted training content to the extent that it repeatedly produces copyright-infringing outputs.” As we explained in previous blog posts, targeting output similarity carries significant risks to users’ rights and fundamental freedoms. This revision is therefore highly appreciated.

A further improvement to the measures aimed at preventing copyright-infringing outputs relates to open source AI. Open source AI model providers are now explicitly excluded from the commitment to prohibit copyright-infringing uses in their acceptable use policy, terms and conditions, or other equivalent documents. This is a welcome change, since these providers cannot introduce such contractual limitations on the use of their models without their licenses ceasing to qualify as open source.

With regard to the input side of the equation, the most important change from a users’ rights perspective is that the proposed Code no longer requires GPAI model providers to commit to make “reasonable and proportionate efforts to ensure that they have lawful access to copyright-protected content”. Instead, it introduces a commitment not to circumvent effective technological measures as defined in Article 6(3) of Directive 2001/29/EC. This is a welcome change since, as we noted before, the previous commitment standard risked creating an inconsistent and conflicting interpretation of what lawful access means in EU copyright law.

Compliance with rights reservation

Despite the above-mentioned improvements, the new draft text continues to fall short of expectations when it comes to introducing adequate measures to protect rightholders’ opt-out rights. The revised measures regarding rights reservation compliance are far from satisfactory. As in the previous drafts, the new version requires model providers to commit to employ crawlers that respect the Robots Exclusion Protocol (REP) and to make best efforts to comply with other rights reservation approaches.
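In practice, a REP-based rights reservation is expressed in a site’s robots.txt file. A minimal sketch of what such a reservation might look like is below; the user-agent tokens shown are illustrative examples of real AI-related crawlers (OpenAI’s GPTBot, Google’s Google-Extended token, Common Crawl’s CCBot), and any given model provider may use different tokens:

```
# robots.txt — illustrative rights reservation under the Robots Exclusion Protocol.
# Each "User-agent" line names a specific crawler; "Disallow: /" asks that
# crawler not to fetch any path on this site.

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers (e.g. ordinary search indexing) remain allowed:
User-agent: *
Disallow:
```

The sketch also illustrates the protocol’s limits as an opt-out mechanism: REP rules are addressed to named crawlers, so a rightholder must know and list every relevant crawler in advance, and the protocol has no way to express a reservation against text and data mining as such.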

While there are some attempts to future-proof the commitment to comply with other standards that may emerge over time, these remain subject to a lower level of commitment than that required for the REP. Under measure I.2.3, paragraph (1)(b), Signatories commit to:

make best efforts to identify and comply with other appropriate machine-readable protocols to express rights reservations pursuant to Article 4(3) of Directive (EU) 2019/790, for example through asset-based or location-based metadata, that have either resulted from a cross-industry standard-setting process as referred to in paragraph 3 of this Measure or are state-of-the-art and widely adopted by rightsholders, considering different cultural sectors, and generally agreed through an inclusive process based on bona fide discussions to be facilitated at EU level with the involvement of rightsholders, AI providers and other relevant stakeholders as a more immediate solution, while anticipating the development of cross-industry standards referred in paragraph 3.

As discussed in previous blog posts, we believe that the Code should apply the same level of commitment to all standardized machine-readable means to express rights reservations, since the REP was never designed to opt out of text and data mining, including the use of protected works to train GPAI models. The Code should simply require Signatories to commit to comply with all standardised machine-readable means to express rights reservations. However, the Chairs argue that the Code can only mandate compliance with rights reservations that are based on an international standard. In our opinion, this argument is unconvincing, since such a requirement can neither be derived from the relevant provisions contained in the CDSM Directive (which only requires machine readability) nor from the provisions in the AI Act (which refer back to the CDSM Directive).
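An example of an alternative machine-readable protocol of the kind the measure contemplates is the W3C community-developed TDM Reservation Protocol (TDMRep), which expresses an Article 4(3) reservation at a location level rather than per crawler. A minimal sketch, published at the well-known path /.well-known/tdmrep.json, might look like this (field names follow the TDMRep community group report; the exact values are illustrative):

```json
[
  {
    "location": "/*",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/tdm-policy.json"
  }
]
```

Unlike robots.txt, such a reservation applies to any party engaging in text and data mining, regardless of which crawler they operate, which is why we see no reason to hold it to a weaker “best efforts” standard.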

Another problem with this group of measures aimed at ensuring compliance with rights reservations is that they are limited to training data that has been obtained by “crawling the World Wide Web.” As explained in more detail here, this means that training data obtained in any other way is not covered by the Code. Since web crawling is just one of many data acquisition strategies for AI model developers, the final version of the Code should seek to broaden these measures beyond the web crawling context.

Copyright transparency

In addition to failing rightholders with regard to rights reservation compliance measures, the third draft has also regressed on the topic of copyright transparency. Whereas previously GPAI model providers committed to publicly disclose a summary of their internal copyright policy, in the new draft they are only encouraged to make such information public (see paragraph 2 of measure I.2.1). Similarly, the public disclosure measure regarding rights reservation compliance was replaced by a milder commitment (see paragraph 4 of measure I.2.3), which reads as follows:

Signatories will take reasonable measures to enable affected rightsholders to obtain information about the web crawlers employed and their robot.txt features and the measures that a Signatory adopts to identify and comply with rights reservations expressed pursuant to Article 4(3) of Directive (EU) 2019/790 at the time of crawling, for example by making public such information and syndicating a web feed that covers every update of the website informing about the rights reservation compliance.

The second draft’s approach to copyright transparency was far from satisfactory, but at least it protected some of the legitimate information needs of rightholders without creating an excessive burden on the model providers. The third draft exacerbates the concerns expressed by rightholders and civil society organisations that the Code will not serve as a vehicle to fulfil those needs.

COMMUNIA has been advocating for the introduction of public disclosure measures regarding AI training data for the past two years. In our view, copyright transparency is essential to uphold the EU copyright framework for AI training, which relies on creators being able to exercise and enforce their opt-out rights if they so wish. The Code was, therefore, seen as an opportunity to engage stakeholders in the design of reasonable and proportionate copyright transparency standards beyond the disclosure of training data.

While the AI Act only introduces one public disclosure measure in Article 53(1)(d) (the disclosure of a “sufficiently detailed summary” of the training data), the Code of Practice could, in our opinion, provide additional public disclosure commitments, as long as these contributed to the proper application of the AI Act, as foreseen in Article 56. This same understanding was initially shared by the drafters, who proposed in the first and second versions of the Code the introduction of various copyright transparency commitments related to the application of Article 53(1)(c) of the AI Act. However, seeing how these commitments have been diluted in the third draft, it is clear that AI companies do not agree with this point of view.

This tension is also visible in the results of the AI Office consultations on the outline of the template for a summary of the training data. The main takeaways of this consultation were presented at the last WG meeting and clearly show that GPAI providers have a view of transparency that is diametrically opposed to the position of civil society organisations and rightholders. For instance, the proposed template included a threshold-based reporting obligation, under which GPAI providers were not required to list all domain names and datasets used to train the model, but only a subset that exceeded a certain threshold (top 10% of domains and “main/large” datasets). In the feedback provided to the AI Office, civil society organisations and rightholders shared the view that it was important for interested parties to have access to a complete list of the domain names and datasets used, whereas GPAI providers positioned themselves against the disclosure of any scraped domain names and in favour of being granted discretion to determine whether a dataset qualified as “main/large.”

While the AI Office has the power to determine the contours of the template for the summary of the training data, the commitments proposed in the Code of Practice can only go as far as the GPAI providers are willing to sign on to. This does not augur well for the final formulation of the copyright transparency measures and, by extension, for the future of the EU opt-out system.
