Media

Investors back start-ups aiding copyright deals to AI groups

Investors are backing a crop of start-ups helping the creative industries sell content to artificial intelligence groups, as OpenAI, Meta and Google face scrutiny over the use of copyrighted material to train AI models.

Fledgling groups such as Pip Labs, Vermillio, Created by Humans, ProRata, Narrativ and Human Native are building tools and marketplaces where writers, publishers, music studios and moviemakers can be paid for allowing their content to be used for AI training purposes.

These content licensing and data marketplace start-ups have secured $215mn in funding since 2022, according to data from Dealroom.co. Over this time, AI companies have sought media deals to obtain high-quality training data, which can also help them avoid being sued over copyright claims or being targeted by regulators.

“The licensing of content that doesn’t exist on the open internet is going to be a big business,” said Dan Neely, chief executive and co-founder of Vermillio, which works with major studios and music labels such as Sony Pictures and Sony Music.

The start-up, which licenses content and detects whether AI outputs contain copyrighted material, projects the AI licensing market to expand from about $10bn in 2025 to $67.5bn by 2030. Sony Music and DNS Capital led Vermillio’s latest $16mn funding round in March. 

The number of AI licensing deals has risen in the past year, with 16 agreed in December 2024 — a record number, according to data from the Centre for Regulation of the Creative Economy at the University of Glasgow.


ChatGPT maker OpenAI and AI search engine Perplexity have each made more than 20 deals with media groups since 2023, particularly with news organisations.

[Bar chart: publicly known AI licensing deals]

“You need three things to build AI models: talent, compute and data,” said James Smith, chief executive and co-founder of UK-based Human Native. “[AI companies] have spent millions on the first two. They’re just getting around [to] spending millions on the third.” 

Andreessen Horowitz raised $80mn for Pip Labs in August. In November, ProRata was valued at $130mn after signing licensing deals with major UK publishers such as The Guardian and the Daily Mail owner DMG Media. 

The investment deals come amid global scrutiny over what data is used to train AI models. The UK is weighing relaxing copyright rules for AI training, but tech companies such as OpenAI and Google are facing attempts to force them to pay more for valuable content through lawsuits in the US and new regulations in the EU.

Meta this month faced authors in a US court in one of the first big tests over whether AI groups should pay for copyrighted training data that has been scraped from the internet. 

OpenAI, which has done numerous data-sharing deals, including with the Financial Times, is still facing copyright lawsuits from media groups including the New York Times.

Jason Zhao, the co-founder of Pip Labs, which uses blockchain technology to track and license intellectual property, said: “Instead of trying to spend a ton of time changing the law to fit, what we’re trying to do is show that this is just a better solution that both AI companies and IP holders would rather use.”


Stability AI, which is also being sued by artists who claim the company used their intellectual property to train its models, is looking into starting its own licensing marketplace, said its chief executive Prem Akkaraju.

“[It’s] something we’re working on, where artists can actually have a marketplace or a portal where they can say, ‘hey, you could train on this’,” said Akkaraju. “I think it’s really smart.” 

The nascent AI training data marketplace faces several challenges. The start-ups need to find enough data set providers to create a working business model.

They also need to find data at a high enough quality, and make it easily and quickly available. Many online data sets include unwanted content, such as child sexual abuse material or other harmful material, which could expose companies to reputational harm or litigation.

Another obstacle will be convincing artists and creatives that selling their content to train AI models will be beneficial.

“So many of the companies and creators we talk to don’t yet have confidence in the technical solutions that are either out there or being developed,” said Gina Neff, professor of responsible AI at Queen Mary University of London. “It feels like a really bad trade-off to them.” 

But Human Native’s Smith said: “We can’t have a situation where we decimate industries that we hold dear, like journalism or music. We have to find a way to make this work.” 
