
AI Act: transparency is no threat to business secrets

As we approach the final decisions on the AI Act, with the member state vote scheduled for Friday of this week, there are still rumblings of discontent among member states. Surprisingly, one of the most prominent sticking points is the French government’s dissatisfaction with parts of the copyright transparency provisions in the final compromise text. According to Le Monde, the French government remains reluctant to introduce transparency obligations for general-purpose AI (GPAI) models, arguing that the “sufficiently detailed summary” of the data used to train models would violate the business secrets of model developers (such as French start-up darling Mistral). Instead of the current approach, which requires that the summary be made “publicly available”, France would like to see a third party made responsible for centralising requests from rights holders to tech companies.

What may sound like a technicality is in fact a deeply flawed idea that would fundamentally change the nature of the provision and undermine one of the key transparency measures in the act. As we have argued since the release of our Policy Paper #15 on Using Copyrighted Works for Teaching the Machine, the copyright transparency provision is important not only because it allows rights holders to understand whether and how their works have been used to train AI models, but also because it ensures an absolute minimum level of transparency to the public about how those models have been trained. This matters to researchers, policymakers, and anyone trying to understand the behaviour of these models. Transparency of training data is essential to society’s ability to understand, scrutinise, and regulate AI models.

Limiting transparency to rights holders alone would reduce the provision to a privilege for a single set of stakeholders, who would do little more with it than use it as leverage in negotiations with AI developers.

The French proposal is even more problematic because it responds to a nonexistent problem. The issue of trade secrets has already been addressed in the text of the act, which makes it clear that the sufficiently detailed summary should be “comprehensive in its scope instead of technically detailed”. The relevant recital further indicates that this can be achieved by “listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used”.

In other words, the copyright transparency provision does not require model developers to publish their recipes (the Le Monde article quotes a French government source expressing concern about “recettes de fabrication”, or manufacturing recipes) but rather, to stay with the metaphor, a list of ingredients. Such regulation is not new and reflects existing practice. One example that comes to mind is the famously secret Coca-Cola recipe, whose secrecy coexists perfectly well with the fact that Coca-Cola has to include a list of ingredients on every bottle it ships in the EU.

There is another reason why the French proposal to limit the transparency of training data is a terrible idea: in an age where everyone who publishes on the internet is a rights holder whose works can end up in AI models trained on publicly available sources, the idea that the AI Office could mediate between rights holders and AI developers is simply impractical. In a situation where everyone can be a rights holder, the only practical way to ensure transparency is to make the sufficiently detailed summaries publicly available, as agreed by the co-legislators in December.
