Cropped etching of an old woman standing next to a doctor holding a liquid in a glass sphere in the light by Quirin Boel.

A proposal for the implementation of the AI Act’s training data transparency requirement

Earlier this week, COMMUNIA member Open Future, together with Mozilla, published a proposal for the implementation of the AI Act’s training data transparency requirement. The paper includes a blueprint for the transparency template for developers of GPAI models that the EU AI Office will need to develop over the next 12 months.

The AI Act creates a legal obligation for providers of general-purpose AI models to publish “a sufficiently detailed summary of the content used to train the general-purpose AI, in accordance with a template provided by the AI Office” (Article 53(1)(d)). The stated purpose of this obligation is to “facilitate the exercise and enforcement of rights by parties with legitimate interests, including copyright holders” (Recital 107).

As argued in the Open Future/Mozilla paper, the scope of this provision should extend beyond copyright and apply to a range of legitimate stakeholder concerns. In addition to copyright, the paper identifies privacy and data protection, freedom of the arts and sciences, and fair competition as legitimate interests that the AI Act’s transparency obligation should address.

This position builds on what we argued in our Policy Paper #15 on using copyrighted works for teaching the machine, in which we emphasized that “the EU should enact a robust general transparency requirement for developers of generative AI models.”

From a copyright perspective, creators have a legitimate interest in knowing whether their works have been used to train AI. They need to be able to verify a) whether their works have been lawfully accessed, and b) whether opt-outs under Article 4 of the DSM Directive have been respected.

But, as Open Future and Mozilla write, at a more general level, transparency of training data is an essential tool for ensuring accountability in AI development. The detailed blueprint proposed in the paper would go a long way toward effectively operationalizing the transparency requirement by collecting available information in a way that is useful to a wide range of stakeholders with varying levels of expertise, including but not limited to copyright holders.
