Earlier this week, COMMUNIA member Open Future, together with Mozilla, published a proposal for the implementation of the AI Act’s training data transparency requirement. The paper includes a blueprint for the transparency template for developers of GPAI models that the EU AI Office will need to develop over the next 12 months.
The AI Act creates a legal obligation for providers of general-purpose AI models to publish “a sufficiently detailed summary of the content used to train the general-purpose AI, in accordance with a template provided by the AI Office” (Article 53(1)(d)). The stated purpose of this obligation is to “facilitate the exercise and enforcement of rights by parties with legitimate interests, including copyright holders” (Recital 107).
As argued in the Open Future/Mozilla paper, the scope of this provision should extend beyond copyright and apply to a range of legitimate stakeholder concerns. The paper argues that, in addition to copyright, the legitimate interests that the AI Act’s transparency obligation should address include privacy and data protection, freedom of the arts and sciences, and fair competition.
This position builds on what we argued in our Policy Paper #15 on using copyrighted works for teaching the machine, in which we emphasized that “the EU should enact a robust general transparency requirement for developers of generative AI models.”
From a copyright perspective, creators have a legitimate interest in knowing whether their works have been used to train AI. They need to be able to verify a) whether their works have been lawfully accessed, and b) whether opt-outs under Article 4 of the DSM Directive have been respected.
But, as Open Future and Mozilla write, at a more general level, transparency of training data is an essential tool for ensuring accountability in AI development. The detailed blueprint proposed in the paper would go a long way toward effectively operationalizing the transparency requirement by collecting available information in a way that is useful to a wide range of stakeholders with varying levels of expertise, including but not limited to copyright holders.