Yarmill: How to get legally safe data for an AI assistant

Pavel Čech 07.02.2025

Yarmill is an app developed in the Czech Republic: a professional training diary and data platform for sports associations and teams. To increase the competitiveness of Czech athletes, our client Yarmill embarked on a joint project with the Czech Olympic Committee (COC) and began building a large professional AI library. From the beginning, Yarmill struggled with the complexity of the legal issues involved.

How did we help them put this complex issue on a firm legal footing?

Problem: How and where to get legal data?

The AI tool will work with data in the form of professional publications and other sports-related material from Yarmill users.

The primary goal of the AI tool is to provide answers to specific questions in the field of elite sport, using the RAG method, embedded data and publications, together with global large language models (LLMs).

"The client was faced with the problem of how to obtain permission to upload third-party scientific articles, scientific publications and other information focused on sports, training, sports nutrition and other sports-related topics to the AI tool," explains Tereza Formanová, an expert in intellectual property law.

Challenge 1: Verification of provided industry data for AI 

First of all, we had to verify the data provided to make sure that Yarmill could handle it when uploading it to the AI assistant, thus allowing the Czech Olympic Committee and the sports associations cooperating with the COC to use the AI tool to answer questions.

Challenge 2: Data acquisition and upload by users

At the same time, it proved difficult to ensure that the data provided by users is legal. It would be difficult to check, for each publication, how the individual user obtained it and whether they have the authority to provide it to Yarmill to populate the AI library.

"When we first fully realized what a labyrinth the whole copyright issue is (from the perspective of users, authors, AI processing, research, etc.), it seemed like it had no solution. It completely ruined our mood and motivation. We didn't want to concede to the game of some apps that operate under conditions on the edge of legality. It sucked. We wanted to stop the project," recalls Tomas Pošepný, co-founder and CEO of Yarmill.

A comprehensive and secure solution in three steps

Our goal was to advise the client and, as a next step, to ensure that the joint Yarmill and COC project was built on an unshakeable legal foundation.

Representatives of the COC and Yarmill agreed on the need to put their cooperation on a legally safe footing and to find a solution that would remain safe in the long term. The primary responsibility lies with our client, so the copyright issues had to be handled in several steps.

1. Setting up transparent cooperation with the COC and others

In the first step, we negotiated the terms between the client and the Czech Olympic Committee. We set up clear cooperation, and the result is comprehensive contractual documentation (e.g. an AI assistant development contract and the Yarmill standard application use agreement) that governs the relationships with the COC and Yarmill customers (sports associations), as well as the relationships with authors of publications and with users who upload documents to the application for AI analysis.

2. Extensive analysis of the possibilities of using industry data

In the second step, we performed a comprehensive analysis of the possibilities of using the professional publications that users will upload to the app. The key question was whether such use of publications is legal at all. Do users have the right to upload publications to the app? And what may Yarmill do with them? We had to consider the so-called statutory licenses under the Copyright Act (exceptions where the author's consent is not required for certain uses of copyrighted works).

As part of this analysis, we also looked at whether the RAG (Retrieval-Augmented Generation) method that Yarmill needs to use to analyse peer-reviewed publications fulfils the criteria for any of the statutory licences. We examined all the technical processes, applied the legislation in force and considered the possible impact of recent court decisions in some EU Member States.
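For readers unfamiliar with RAG, the following is a minimal sketch of the general idea: stored passages are ranked against a question, and the best matches are placed into the prompt sent to a language model. It is not Yarmill's implementation. The function names and the sample passages are made up for illustration, and a simple word-count similarity stands in for the embeddings a real system would use.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k stored passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, retrieved: List[str]) -> str:
    """Assemble the text that would be sent to a language model."""
    context = "\n".join(f"- {p}" for p in retrieved)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Made-up sample passages standing in for uploaded publications.
PASSAGES = [
    "Interval training improves aerobic capacity in swimmers.",
    "Carbohydrate intake before a race supports endurance.",
    "Ski wax selection depends on snow temperature.",
]

if __name__ == "__main__":
    question = "What should a swimmer know about interval training?"
    print(build_prompt(question, retrieve(question, PASSAGES, k=1)))
```

The sketch shows why the legal question arises at all: the passages have to be stored and reproduced in the prompt, so the tool works with copies of the uploaded publications.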

We also had to minimize the risk of our client's liability for copyright infringement in case the user provides the application with pirated data.

"The crucial assessment was whether the user gets a legal copy of the publication (buys an e-book) and whether he can provide it to train the AI assistant. Because the fact that I have an e-

book, does not necessarily automatically mean that I can upload it to any AI tool and extract data from it for my own commercial product," says Pavel Čech, an attorney and expert in IT and intellectual property law.
 

Although the Copyright Act allows data mining for AI training, it sets specific conditions for it.

We found that we cannot use the general statutory licence for automated analysis in the private sector, because the processing of publications by the client's AI tool does not meet all the statutory conditions. Another possible statutory licence, the one for scientific research, which could be used by the Czech Olympic Academy as part of the COC, was in turn limited by the prohibition on commercial use. Because Yarmill could not rule out future commercial use, we prepared a concept for obtaining rights directly from authors or other rights holders (e.g. heirs, employers of authors, publishers). In addition, Yarmill can use works under selected Creative Commons licenses that allow the use of publications to the extent necessary for the AI tool.

"What has been created is great. 't believe. Pavel, Terka and a few others did an incredible job, they studied the situation, foreign case law, the specifics of the project, the Czech Olympic Committee, big language models and existing AI applications. They revealed to us the complete space of possibilities and legal exceptions to the variants of the establishment of the institute libraries. Everything is structured and very well managed analytically," says Tomáš Pošepný about the collaboration.

3. Setting conditions for uploading data

In the third step, the conditions under which users upload publications needed to be set correctly so that Yarmill could get the necessary processing rights. This included preparing license consent from authors or other rights holders of publications, preparing terms and conditions for uploading and using professional publications, and modifying and supplementing the EULA (end-user license agreement). Finally, we translated all the documentation into English to make it possible to obtain foreign publications, and we also tweaked the wording for the checkboxes in the application.
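The upload conditions described above can be pictured as a simple gate in the application. The following is a minimal sketch under stated assumptions, not Yarmill's actual code: every class, field and function name is hypothetical, and the Creative Commons allow-list contains a placeholder value only, because the article does not say which licenses were selected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class RightsBasis(Enum):
    """Hypothetical ways an uploader can declare the right to upload."""
    AUTHOR_CONSENT = "author_consent"
    RIGHTS_HOLDER_CONSENT = "rights_holder_consent"  # e.g. heir, employer, publisher
    CREATIVE_COMMONS = "creative_commons"


# Placeholder only: the set of accepted licenses is a legal decision
# that the article does not specify.
ACCEPTED_CC_LICENSES: FrozenSet[str] = frozenset({"CC-EXAMPLE-A"})


@dataclass(frozen=True)
class UploadRequest:
    title: str
    rights_basis: Optional[RightsBasis]
    cc_license: Optional[str] = None
    consent_document_id: Optional[str] = None
    terms_accepted: bool = False  # the checkbox in the application


def check_upload(
    request: UploadRequest,
    accepted_cc: FrozenSet[str] = ACCEPTED_CC_LICENSES,
) -> List[str]:
    """Return a list of problems; an empty list means the upload may proceed."""
    problems: List[str] = []
    if not request.terms_accepted:
        problems.append("terms and conditions not accepted")
    if request.rights_basis is None:
        problems.append("no rights basis declared")
    elif request.rights_basis is RightsBasis.CREATIVE_COMMONS:
        if request.cc_license not in accepted_cc:
            problems.append("Creative Commons license not on the accepted list")
    elif not request.consent_document_id:
        problems.append("license consent document missing")
    return problems


if __name__ == "__main__":
    request = UploadRequest(
        title="Sample publication",
        rights_basis=RightsBasis.AUTHOR_CONSENT,
        consent_document_id="doc-1",
        terms_accepted=True,
    )
    print(check_upload(request))
```

Returning a list of problems rather than a single yes/no lets the application tell the user exactly what is missing before a publication reaches the AI library.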

Conclusion: Minimising risks in data handling 

Thanks to our cooperation, Yarmill has gained legal certainty for the use of publications by its AI assistant. We minimized the risks associated with unlawful handling of publications, including allegations of copyright infringement, and excluded even the mere suspicion of possible infringement on the part of the authors of the publications used.

"They guided us to the final solution, combed it, refined it, including the auxiliary explanatory materials for users and in-app texts. I'm very happy that they found such a final legal framework for us that we ended up with the whole module built, launched and which allows us to develop the project in further phases," adds Tomáš Pošepný.

Thanks to a precise legal framework and clearly defined rules, Yarmill can build a professional AI library for Czech professional sport and, together with the COC, offer athletes an innovative solution that will make them internationally competitive.

Yarmill: Training diary for pro sports

Yarmill is a Czech sportstech startup, a group of people who combine data analytics know-how with professional sports experience. They are building a sports data warehouse that allows coaches and athletes to plan their season, record race and test results, evaluate the success of training and integrate data from smartwatches, rings and other devices. The app is tailored to the specific needs of individual sports and sports associations, including swimming, skiing, climbing, biathlon and more than 30 other organisations from around the world.

The new AI assistant is being developed in cooperation with the Czech Olympic Committee as part of the Scientific Research Support Programme for the Czech National Team* to achieve better performances at the most important competitions. Its aim will be to provide expert and easy-to-understand answers to specific questions in the field of elite sport.

-
* Programme of Scientific and Research Support of the Czech Representation (application registration number OH2024-00001).
