UK – Data mining: Government backs creative industries over AI developers

On 1 February 2023, a UK Government Minister confirmed that a recent proposal to broaden existing copyright exceptions to allow commercial text and data mining will not go ahead. Text and data mining is often used in training AI systems (such as the chatbot ChatGPT), and the proposed copyright exception was part of a plan to incentivise investment in AI in the UK. However, the proposal received a backlash from the creative industries and has now been scrapped. This development will be welcomed by content creators but leaves unanswered questions about the legality of many AI systems.


For the last few years, the UK Government has been keen to incentivise investment in AI in the UK. In 2021, the National AI Strategy was published, with the express aim of making the UK “a global AI superpower”. Against that backdrop and following two public consultations, in June 2022, the UK Intellectual Property Office (UKIPO) announced a proposal to broaden existing copyright exceptions to permit text and data mining for any purpose.

“Text and data mining” means using computational techniques to analyse existing digital works for the purpose of identifying patterns, trends and other useful information. These methods are often significant in AI use and development, as mined datasets are widely used to train AI systems, e.g. machine learning solutions that rely on access to high quality data to learn.

However, data mining methods typically involve copying the underlying works as part of the data extraction process and can therefore constitute copyright and/or database right infringement. Under the existing UK regime, text and data mining can only legitimately be performed on third party materials for non-commercial research[1] or with the permission of the rightsholder, e.g. under an open access scheme or bespoke licence.

Under the UKIPO’s proposal, text and data mining would have been permitted for any purpose. Rightsholders would not have been able to charge for text and data mining licences and would have had no option to contract out or opt out. However, rightsholders would still have been able to choose whether and how to give access to their datasets.

The UKIPO’s proposal was welcomed by those involved in AI and other research and development activities in the UK. However, it also received swift and forceful criticism from traditional content creators on the basis that it would undermine licensing opportunities and drastically weaken copyright. It was also described as a “landgrab” feeding Whitehall’s “obsession with AI”.

The latest Government announcement

On 1 February 2023, George Freeman MP, the Minister for Science, Research and Innovation, confirmed in a House of Commons debate that the proposal would not proceed. He said:

“Although the Government need to be on the front foot in anticipating the regulatory framework and getting it right, the proposals have clearly elicited a response that we did not hear when they were being drafted… I have made it clear that we do not want to proceed with the original proposals. We will engage seriously, cross-party and with the industry, through the IPO, to ensure that we can, when needed, frame proposals that will command the support required.”

This does not come as a complete surprise. Back in November 2022, Julia Lopez MP, the Minister responsible at the Department for Culture, Media and Sport, was also keen to distance herself from the proposal, telling a select committee that she was “not convinced of the value of this piece of work”. Moreover, earlier this year, the House of Lords Communications and Digital Committee recommended that all work on the proposal cease while an impact assessment on the implications for the creative industries was conducted.


This announcement comes amid a veritable explosion in generative AI platforms, including DALL-E 2 and Stable Diffusion, which generate images from text prompts, and OpenAI’s chatbot ChatGPT. As these platforms proliferate, the conflict between them and the creative industries is also intensifying. While 2022 saw generative AI take off, 2023 might be the year to test its legality. Legal challenges have been brought in the UK and US against Stability AI, the company behind Stable Diffusion, alleging that its training data inputs and image outputs infringe IP. These conflicts bring into sharp relief the disruptive nature of these platforms and the threat they pose to traditional content creators – and the importance of getting the legislative framework in which they operate right.

The UKIPO’s proposal was not unprecedented internationally, but many regimes, including the EU exception for text and data mining (introduced in its 2019 Directive on Copyright in the Digital Single Market), allow rightsholders to opt out. In the US, the doctrine of “fair use” of copyright works has generally been viewed as favourable to text and data mining practices, but this will be put to the test in the legal challenges brought against Stability AI before the US courts.

For now it is back to the drawing board for the UKIPO’s proposal, but I suspect we have not heard the last of it.

[1] Under an existing exception in s29A Copyright, Designs and Patents Act 1988.