Dr. Pradeep Mahapatra
Soon after the public launch of the Artificial Intelligence-powered Large Language Model product ChatGPT on November 30, 2022, controversy surfaced over the unauthorised use of large amounts of online content to train the software. The tussle continued throughout 2023. A few authors and a photo agency went to court complaining of infringement of copyright in their works. However, the lawsuit filed in the Federal District Court in Manhattan by The New York Times (NYT) on December 27, 2023 against OpenAI, maker of ChatGPT, and Microsoft, maker of Bing Chat, over the unauthorised use of published work to train LLMs (Large Language Models) opened a new front in the legal battle.
Each LLM is a separate computer programme that understands and generates human-like text based on the input it receives. LLMs are trained on vast amounts of data to predict and create coherent sentences, which makes them capable of tasks such as answering questions, writing and recognising language patterns. ChatGPT has been trained on a diverse dataset that includes text in multiple languages, not just English. This multilingual exposure allows the LLM to understand and generate text in various languages to some extent.
NYT is the first major American media establishment to sue the makers of LLMs. It contends that millions of articles published in the newspaper appear to have been used to train the LLM software, and that ChatGPT and Bing Chat can produce content nearly identical to the newspaper's published articles. NYT has invested massively in its journalism. It complained in its lawsuit that LLMs build substitutive products based upon its intellectual property without permission or payment.
NYT has complained that the general public increasingly approaches chatbots with prompts to learn about current affairs. The LLMs generate answers that rely on the newspaper's past journalism, which was used as training material for the software. As a result, reader traffic to the newspaper's website is declining, because consumers are satisfied with the answers offered by chatbots. Fewer visitors to the website translates into a restricted flow of advertisement and subscription revenue. On the one hand, the newspaper's copyright is undermined, and on the other hand, it confronts business loss.
In its petition, NYT has not specified the monetary compensation it seeks from OpenAI or Microsoft for copyright infringement. However, it claimed that the defendants should be held responsible for a huge amount in statutory and actual damages to the newspaper. It asked the court to order the destruction of LLM models that used copyright-protected newspaper content for training. NYT approached OpenAI and Microsoft in April 2023 to put forth its concerns about the use of its intellectual property and to explore negotiations. But the effort reached no amicable resolution.
It is difficult to predict the future of the dispute, as the litigation is at an early stage. US legal experts comment that copyright litigation may take a long time. The ruling of the Federal District Court may be appealed, and the appellate decisions may be challenged in the US Supreme Court. The whole process may take roughly a decade. Generally, disputes in copyright matters are resolved through settlements. OpenAI has already reached data licensing agreements with the Associated Press and with Axel Springer, publisher of Politico and Business Insider.
A question arises: if the copyright litigation by NYT against OpenAI and Microsoft drags on for 10 years, the suit may lose its relevance, since, according to calculations made during 2023, advancement in the field of AI doubles every six months. AI technologies have spread to every nook and corner across geographical barriers and have become an important part of people's day-to-day life. The concept of 'copyright' was developed as a business arrangement during the expansion of printing technology, and the very first copyright law, the Statute of Anne, was enacted in Britain in 1710. However, human civilization had passed through earlier modes of communication, including gesture, language and literature, for thousands of years without copyright protection. So, is it appropriate to restrict the training of LLMs on online content and deprive humankind of the wonderful benefits of generative AI?
Generally, one learns language by imitating others. LLMs are trained on available online content to learn human-like language. Business deals and political equations are temporary manifestations. ChatGPT is a modern technological innovation. Since it is dedicated to public use, should the public interest be grounds for legal protection of the software?
“A Large Language Model is a computer programme that understands and generates human-like text based on the input it receives. It does not have personal experiences, opinions, or intentions. The responsibility for its use lies with the users. If someone uses ChatGPT to generate content that is illegal, unethical, or infringes on someone else’s rights, the responsibility would typically fall on the user rather than the model itself. ChatGPT is a tool or software developed by OpenAI, and it does not have legal personality. Any actions or consequences resulting from the use of ChatGPT would typically be attributed to the individuals or entities using it, rather than the tool itself,” the LLM explains when prompted.
On the other hand, NYT has claimed in its petition before the court that in some cases ChatGPT generates text almost identical to previously published, copyright-protected articles of the newspaper. To illustrate the charge, it attached a hundred examples. Experts in the field remark that empirical studies confirm that LLMs at times copy from their training content. A question arises: how can consumers know the extent of plagiarism in the content generated by LLMs? Will they not be entangled in legal consequences? The NYT lawsuit against OpenAI and Microsoft over copyright is expected to open up a new chapter in the AI landscape.
(English translation by the author of the original Odia newsletter circulated on January 12, 2024. https://tinyletter.com/pradeepmahapatra/letters/message-335. It is open-access content, free for translation and reproduction.)
Dr. Pradeep Mahapatra is a retired faculty member of Journalism, Berhampur University, Odisha. https://about.me/pradeepmahapatra
Grynbaum, Michael M. & Ryan Mac. NYT sues Microsoft, Open AI over AI’s use of copyright work. Business Standard (Bhubaneswar Edition). December 20, 2023.
Moreno, J. Edward. Boom in AI prompts a test of copyright law. Business Standard (Bhubaneswar Edition). January 1, 2024.
Marcus, Gary & Reid Southen. Generative AI has a visual plagiarism problem. IEEE Spectrum. January 6, 2024.