Sony Puts AI Companies ‘On Notice’

We're about to test the limits of existing copyright law.

James Joyner · Friday, May 17, 2024 · 9 comments

FT (“Sony Music warns global tech and streamers over AI use of its artists”):

Sony Music is sending warning letters to more than 700 artificial intelligence developers and music streaming services globally in the latest salvo in the music industry’s battle against tech groups ripping off artists.

The Sony Music letter, which has been seen by the Financial Times, expressly prohibits AI developers from using its music — which includes artists such as Harry Styles, Adele and Beyoncé — and opts out of any text and data mining of any of its content for any purposes such as training, developing or commercialising any AI system.

Sony Music is sending the letter to companies developing AI systems including OpenAI, Microsoft, Google, Suno and Udio, according to those close to the group.

The world’s second-largest music group is also sending separate letters to streaming platforms, including Spotify and Apple, asking them to adopt “best practice” measures to protect artists and songwriters and their music from scraping, mining and training by AI developers without consent or compensation. It has asked them to update their terms of service, making it clear that mining and training on its content is not permitted.

Sony Music declined to comment further.

The letter, which is being sent to tech companies around the world this week, marks an escalation of the music group’s attempts to stop the melodies, lyrics and images from copyrighted songs and artists being used by tech companies to produce new versions or to train systems to create their own music.

The letter says that Sony Music and its artists “recognise the significant potential and advancement of artificial intelligence” but adds that “unauthorised use . . . in the training, development or commercialisation of AI systems deprives [Sony] of control over and appropriate compensation”.

It says: “This letter serves to put you on notice directly, and reiterate, that [Sony’s labels] expressly prohibit any use of [their] content.”

It’ll be fascinating to watch this play out. Offhand, it seems to me that either using copyrighted content to train AI models is fair use or it isn’t. Given the radically transformative nature of it, I would lean toward it qualifying under existing law.

If it’s not fair use, large language models simply can’t work under American law. If the only content available is government publications, works released under an unrestrictive Creative Commons model or equivalent, or extremely old works in the public domain, it would be practically useless. If it is fair use, I don’t know why a Beyoncé single differs from an article in the New York Times.

The main precedents that come to mind are the various copyright infringement lawsuits against Google, all of which the company has won. Google’s search engine crawls the Internet and indexes copyrighted materials, monetizing this content to sell ads. That’s fair use. Google Images indexes copyrighted photos and artwork, displaying thumbnails of them. That’s fair use. Google Books directly scans the content of copyrighted, well, books. So long as they only make a limited amount of the text available in any given search, that’s fair use. (And note that the whole book is scanned.) They even won a recent case where they used bits of Oracle’s possibly copyrighted (that court didn’t even decide that question) code in its products.

While this is all “emerging technology” in DoD parlance, the contours of the future and its implications are visible. It may well be that Congress should craft new laws to deal with AI, in that it’s in many ways fundamentally different from previous Internet-based cases. But, of course, it has not been known for effectively legislating in recent years.

Comments

drj says:

Friday, 17 May 2024 at 07:40

If it’s not fair use, large language models simply can’t work under American law without adequately compensating the original copyright holders.

Fixed it for you.

I don’t really see the problem. Why should AI developers get rich from other people’s work without paying suitable compensation?

MarkedMan says:

Friday, 17 May 2024 at 07:44

I think the real concern here is training on specific artists directly leads to something along the lines of “Make a love song in the style and sound of Beyonce”. How close can such a creation come to one of Beyonce’s actual numbers before it is copyright infringement? I imagine a human Beyonce imitator can go pretty darn close as long as they don’t use her name or sing one of her actual songs. The number of people who can do that is severely limited by the talent needed to sound like Beyonce and to write in her style. But a generative AI could crank out thousands of songs that sound exactly like something Beyonce would release, and do so all day every day. I think Sony sees a lot of expensive litigation lasting for years, if not decades, before these issues get sorted out and is hoping to head it off at the pass.

James Joyner says:

Friday, 17 May 2024 at 07:55

@drj: If that were in fact the ruling on the law, it would mean only companies that are already fantastically rich could afford to get into the game (which, in fairness, seems to be what’s happening now). And even then, I don’t know that it’s even logistically possible to seek out all the copyright holders.

Beyond that, it’s not immediately obvious to me why ChatGPT is different from Google. Is there a point at which a difference in degree becomes a difference in kind? If so, what’s that point?

Not the IT Dept. says:

Friday, 17 May 2024 at 08:00

Good for Sony. (I cannot believe I actually typed that.) Artists need to be protected for their work and if it takes a major international corporation to do it, well there have been other ironies in the 21st century. If Sony is successful, this will have an impact on artists further down the remuneration ladder than Beyonce.

MarkedMan says:

Friday, 17 May 2024 at 08:06

@James Joyner:

it’s not immediately obvious to me why ChatGPT is different from Google

Remember, while generative AI can augment or replace traditional search engines, that’s not their breakthrough use. They can create written, audio or visual media, and do so in the style of existing artists. See my comment above for an example. So this is not a difference in degree leading to a difference in kind, it is a difference in kind in and of itself, and a huge difference at that.

I don’t know what the answers are, but can certainly see the problems.

Chris says:

Friday, 17 May 2024 at 09:25

It’s the pot calling the kettle black. Sony, which has a history of ripping off artists, is crying foul over AI ripping off artists. Sony might have a little higher ground to stand on, inasmuch as many of the artists they exploited and continue to exploit entered into contracts that benefited their sound company. However, we should see through both Sony and AI for their unrepentant sinning ways. AI must not prevail in skirting accountability and Sony should be leveled to do better.

Dave Schuler says:

Friday, 17 May 2024 at 10:45

@James Joyner:

it would mean only companies that are already fantastically rich could afford to get into the game

You have lurched uncontrollably into the problem with our system of intellectual property. The law protects the IP of “fantastically rich” companies from infringement while allowing “fantastically rich” companies to infringe on the IP of individuals or organizations not similarly positioned. Its present purpose is to protect established companies from upstarts rather than its constitutional purpose.

There are exceptional instances of the “little guy” defending him- or herself from big companies (cf. Flash of Genius), but they are exceptions.

My question is how will they know that AI has infringed on their IP?

Gustopher says:

Friday, 17 May 2024 at 13:28

Given the radically transformative nature of it, I would lean toward it qualifying under existing law.

Is it transformative without any artistic intent, or is that merely derivative? This is not an easy question.

And we often restrict transformative works that are not transformative enough or transformative in the right way.

If a black barber shop quartet were to sing Lynyrd Skynyrd’s “Sweet Home Alabama” it would absolutely be transformative. The meaning of every line of the song would change because of the context change — on the level of Duchamp putting a urinal on display in a museum, or at least that Obama painting.

And they would still likely owe royalties (or the costs of fighting would be higher than just paying, as all the boundaries are fuzzy)

Offhand, it seems to me that either using copyrighted content to train AI models is fair use or it isn’t.

The current iteration of AI is very bad at citing its sources. But that doesn’t have to be how AI is built. It is a deliberate choice to not track this information so they can then say that they cannot pay royalties or license fees.

This, by the way, is a deliberate choice that needs to change just for the AI to be useful. It turns out there’s a lot of crazy shit on the Internet, and AIs have terrible media literacy skills. They also hallucinate by pulling in unrelated content. So, untraceable sources is a temporary thing.

I can definitely see a version of AI fair use that limits the amount of inputs from any given source. You will find a lot of precedent on “sampling” in music, and this would be the equivalent.

Even there, you would run into issues of AI using trademarked (not just copyrighted) images and sounds where the fair use is much more restricted.

(I’m very much in favor of artists using trademarked material in the most transformative, protected ways possible, so it gets scraped by AI and then output in unprotected ways)

As well as “image or likeness” and its vocal equivalent — things that Sony doesn’t have an exclusive right to, as the artists retain the original.

And on the technical side, there’s also the case of CSAM (child sexual abuse material, for those who don’t recognize the acronym). It exists on the internet, and any generic “slurp up all the data” approach will find it. Whatever tools and methods are put in place to deal with AI training on and outputting CSAM can be put in place for any content.

(CSAM also brings up legal issues, of course. Should AI CSAM be allowed? What if elements are used for other purposes — say, a distinctive hand gesture, that is used for an adult-fun-time image?)

I worked with community generated content in several jobs. The two questions you always have to ask are “what is our CSAM story?” and “how will this be used to harass women?” — the harassing women is probably thornier in the case of generative AI than CSAM (women can consent, or not… kids simply cannot, so boundaries are more defined), but this has dragged on long enough.

Gustopher says:

Friday, 17 May 2024 at 13:45

They even won a recent case where they used bits of Oracle’s possibly copyrighted (that court didn’t even decide that question) code in its products.

It’s worth understanding what Google was doing here: the “code” they copied was the APIs, not the implementations.

To attempt to put that in layman’s terms: Oracle has a recipe for Eggs Benedict. In fact, Oracle bought the recipe from Sun Microsystems, and the qualities of Eggs Benedict — gooey sauce, poached eggs, etc — were published and standardized. Not what made up the gooey sauce, but how that gooey sauce tastes, how lumpy and viscous it is, etc.

Google examines the Eggs Benedict and makes their own. Some elements are going to be the same, by virtue of eggs being a known quantity. Google might spin the water clockwise rather than counter-clockwise while poaching the eggs. The gooey sauce may be very different in its creation, but the poaching of the egg is straightforward.

Oracle was claiming ownership over the concept and name “eggs Benedict”, and spinning the water while poaching an egg.
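
For readers who think in code, the API-versus-implementation distinction can be sketched roughly as follows. This is a hypothetical Python illustration, not the Java code at issue in the case; every class and function name here is invented for the example.

```python
from typing import List, Protocol


class EggsBenedict(Protocol):
    """The 'API': only the name and signature that callers rely on."""

    def poach(self, eggs: int) -> List[str]:
        ...


class OriginalKitchen:
    """One implementation (hypothetical): builds the result with a loop."""

    def poach(self, eggs: int) -> List[str]:
        result = []
        for _ in range(eggs):
            result.append("poached egg")
        return result


class IndependentKitchen:
    """A separately written implementation of the same signature."""

    def poach(self, eggs: int) -> List[str]:
        return ["poached egg"] * eggs


def serve(kitchen: EggsBenedict, eggs: int) -> List[str]:
    # Callers depend only on the shared declaration, so either kitchen works.
    return kitchen.poach(eggs)
```

The two kitchens share nothing but the method name and signature, yet `serve` cannot tell them apart. In this analogy, the shared declaration stands in for the API and the method bodies stand in for the implementations.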