Garrett T. Capps’ cosmic cowboy sound is included among datasets being used to train generative AI models, according to The Atlantic. Credit: Oscar Moreno

Work by some of San Antonio’s best-known musicians has been found in datasets being used to train generative AI models to create music, according to an online database offered by The Atlantic magazine.

The Atlantic uncovered four large datasets used for AI development — one with 12 million tracks, one with 9 million and two with more than 100,000 each. The magazine’s staff compiled the results into a searchable database, which turned up the names of multiple Alamo City musicians.

Active San Antonio-area acts in the database include cosmic country and krautrock artist Garrett T. Capps, conjunto stalwart Santiago Jimenez Jr., corrido singer-songwriter and congressional candidate Bobby Pulido and indie-rock bands Girl In A Coma and INOHA.

“Publishing rights are incredibly important, and situations like this highlight why artists need to understand and protect their work,” members of Girl In A Coma, a band that’s released music on Joan Jett’s Blackheart Records, told the Current in a private message. “We’re not supporters of using AI to create music and believe human creativity should remain at the heart of the art form.”

Legacy San Antonio acts whose music has been used to train AI, according to The Atlantic’s database, include the Sir Douglas Quintet, Texas Tornados, Flaco Jimenez, Butthole Surfers and the Royal Jesters.

“I can’t keep track of all the technological ‘progress’ taking place in these bizarre times. But I will say that I believe in my music, and if robots wanna listen to it and do something with it, then maybe something positive will come from it,” Capps told the Current. “Maybe it will lead to more songs about breakfast tacos? Algorithmic breakfast tacos. Or maybe it will all just lead to the end of the world as we know it.”

The San Antonio artists’ works were just some of millions of songs available to developers as they train their AI models to mash up existing music into more and more convincing dupes of human output, according to an accompanying article by The Atlantic.

Users can also search the database for screenwriters, actors, YouTube channels and books.

San Antonio authors whose works can be found in the datasets include Sandra Cisneros, Jonny Garza Villa, Johnny Compton and Xavier Garza.

The Atlantic discovered the datasets as they were passed around online within the AI development community, though the magazine was unable to identify exactly which models use them or to what extent. Even so, the news outlet’s investigation found that the datasets have been downloaded thousands of times.

The magazine also offered up a disclaimer that the presence of a work in a dataset isn’t definitive proof that it’s been used to train AI. Companies also typically use multiple datasets — and not all are included in the database — so the absence of a work doesn’t constitute proof that it hasn’t been used, either.

AI music generators can mimic human music with uncanny accuracy, but to do so, they must be trained with enormous quantities of human-created recordings. The actual recordings fed into any model are closely guarded and typically treated as proprietary information by the companies that build them.

One of the smaller datasets obtained by The Atlantic, called the Free Music Archive, is distributed as 100,000 MP3s. Google has admitted to using it.

The other three datasets are distributed as lists of links to songs on YouTube or Spotify, The Atlantic reports.

AI developers download the actual audio from YouTube or Spotify using automated tools, which allow them to bypass logins, advertisements and other mechanisms that might earn the musicians or their labels money or subscribers.

Such tools violate the terms of service of the online platforms, according to The Atlantic.

But it’s not new for AI companies to scrape the internet for human creative output to train their models.

In 2022, Google trained such a model on 44 million tracks, amounting to 42 years of music.

After being sued by several labels, AI firm Suno said in a 2024 court filing that it trained its models on “essentially all music files of reasonable quality” that it could download from the internet.

In 2020, OpenAI scraped 1.2 million songs from the web to train a model called Jukebox, designed for the sole purpose of generating variations on existing music.

Musicians and labels have filed at least 12 lawsuits against AI companies for training models on copyrighted music, The Atlantic reports.

Such companies typically defend their right to train models on unlicensed music by arguing the training is “fair use” under copyright law. For example, Google wrote in a March blog post that its Lyria 3 Pro model uses “materials that YouTube and Google has a right to use under our terms of service, partner agreements and applicable law.”

Are San Antonio musicians’ songs being used to train generative AI models?

Stephanie Koithan, Digital Content Editor
