An activist group says it has scraped tens of millions of tracks from [Spotify](https://amzn.to/4ausZac), raising fresh concerns about copyright enforcement and the use of creative work to train artificial intelligence systems.
The group, called Anna's Archive, claims it has collected 86 million audio files from Spotify, along with 256 million rows of metadata, including artist names, album titles, and track information. Spotify, which hosts more than 100 million tracks, confirmed that the material taken does not represent its full catalogue.
Spotify said it had already acted against those responsible. The Stockholm-based company, which has more than 700 million users worldwide, said it had "identified and disabled the nefarious user accounts that engaged in unlawful scraping".
In a statement, Spotify said an investigation found that a third party had scraped public metadata and used illicit methods to bypass digital rights management protections to access some audio files. The company added that it does not believe the music has been released publicly so far.
Anna's Archive, best known for hosting links to pirated books, said in a blog post that its aim was to create what it called a "preservation archive" of music. The group claimed the files represented 99.6% of all music listened to by Spotify users and said it planned to distribute them via torrents, a file-sharing method commonly used for large datasets.
"Of course Spotify doesn't have all the music in the world, but it's a great start," the group wrote. It describes its mission as preserving humanity's knowledge and culture, arguing that music should be protected from loss caused by disasters, conflict, or funding cuts.
Critics say the likely outcome is not cultural preservation but widespread reuse of the material by AI developers. Ed Newton-Rex, a composer and campaigner for artists' rights, said the scraped music would almost certainly be used to train AI models.
"Training on pirated material is sadly common in the AI industry, so this stolen music is almost certain to end up training AI models," he said. Newton-Rex argued that governments should require AI companies to disclose the data used to build their systems.
Anna's Archive also points to Library Genesis, or LibGen, a vast repository of pirated books. LibGen has been linked to past legal disputes involving Meta, whose founder and chief executive, Mark Zuckerberg, approved the use of the dataset for AI training, according to US court filings, despite internal warnings that the material was pirated. Meta later successfully defended a copyright infringement claim, though the authors involved are seeking to amend their case.
Some in the tech sector have openly discussed the potential value of a mass music dataset. Yoav Zimmerman, a co-founder of the startup Third Chair, wrote on LinkedIn that such a collection could allow individuals to create "their own personal free version of Spotify" or enable companies to train AI systems on modern music at scale.
"The only thing stopping them is copyright law and the deterrent of enforcement," he wrote.
Spotify said it has since introduced new safeguards to prevent similar attacks and is actively monitoring for suspicious behaviour across its platform.
The episode highlights a growing clash between creative industries and AI developers. Artists, authors, and musicians argue that their work is being absorbed into training datasets without consent or compensation. At the same time, AI companies say large-scale access to data is essential for innovation.
In the UK, the debate has become increasingly political. Creative professionals have protested against proposals that would allow AI companies to use copyright-protected material unless rights holders explicitly opt out. Almost every respondent to a government consultation backed the concerns raised by artists.
Liz Kendall, the UK secretary of state for science, innovation and technology, told parliament this month that there was "no clear consensus" on the issue and said ministers would take more time before finalising policy. The government has promised to publish proposals on AI and copyright by 18 March next year.
For now, the claimed Spotify scrape sits at the intersection of piracy, technology, and cultural ownership, a reminder of how quickly digital archives can become raw material in the race to build more powerful AI systems.