Meta CEO Mark Zuckerberg appears to have cited YouTube’s fight against pirated content to defend his company’s use of a data set containing copyrighted e-books to train AI models, newly released deposition excerpts show.
The deposition, which was part of a complaint submitted to the court by the plaintiffs’ attorneys, relates to the AI copyright case Kadrey v. Meta. It is one of many cases in the US court system pitting AI companies against authors and other IP holders. For the most part, the defendants in these cases, the AI companies, claim that training on copyrighted content is “fair use.” Many copyright holders disagree.
“For example, YouTube, I think, may have been hosting some stuff that people pirated for a while, but YouTube is trying to take that stuff down,” Zuckerberg said during the deposition, according to a portion of the transcript made available Wednesday night. “And the majority of the stuff on YouTube, I think, is good and they have a license to do that.”
The excerpts provide some clues about Zuckerberg’s thinking on copyrighted content and fair use. It should be noted, however, that the full transcript of the deposition was not released. TechCrunch has reached out to Meta for additional context and will update the article if the company responds.
Based on the deposition excerpts, Zuckerberg appeared to defend Meta’s use of a data set of e-books called LibGen to train its family of AI models known as Llama. Meta’s Llama competes with flagship models from AI companies like OpenAI.
LibGen, which describes itself as a “link aggregator,” provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. LibGen has been sued numerous times, ordered to shut down, and fined tens of millions of dollars for copyright infringement.
According to court filings unsealed this week, Zuckerberg allegedly approved the use of LibGen to train at least one of Meta’s Llama models despite concerns among the company’s AI executives and research team about the legal implications. Counsel for the plaintiffs, who include best-selling authors Sarah Silverman and Ta-Nehisi Coates, quoted a Meta employee as describing LibGen as a data set known to be pirated and warning that its use could damage Meta’s negotiating position with regulators.
According to the legal filing, Zuckerberg said during the deposition that he “hasn’t really heard” of LibGen. “It’s just that I don’t have any knowledge of that,” he said.
In response to a question from one of the plaintiffs’ lawyers, David Boies, Zuckerberg explained why it would not make sense to ban the use of data sets like LibGen outright. He said he would not want to rule out using YouTube just because some of its content may be copyrighted, and that there are cases where a blanket ban may not be appropriate.
Zuckerberg also stated that Meta should be “careful” in how it uses copyrighted material.
“You know, [if there’s] people who provide websites and intentionally try to violate people’s rights … obviously we want to be cautious or careful about how we participate or maybe prevent teams from participating,” Zuckerberg said in his deposition, according to the transcript.
New allegations
Plaintiffs in the Kadrey v. Meta case have amended the complaint several times since it was filed in the United States District Court for the Northern District of California in 2023. The most recent amended complaint, filed by the plaintiffs’ attorneys on Wednesday, contains new allegations against Meta, including that the company cross-referenced certain books in LibGen with copyrighted books available for licensing.
Meta allegedly used LibGen to train its latest family of Llama models, Llama 3, according to the revised filing. Plaintiffs also say that Meta used the data set to train its next-generation Llama 4 models.
According to the amended filing, Meta researchers allegedly tried to hide the fact that Llama models were trained on copyrighted material by inserting “supervised examples” into Llama’s fine-tuning. Meta also downloaded pirated e-books from another source, Z-Library, for Llama’s training in April 2024, according to the amended complaint.
Z-Library, or Z-Lib, has been the subject of several legal actions by publishers, including domain seizures and takedowns. In 2022, the Russian nationals accused of maintaining it were charged with copyright infringement, wire fraud, and money laundering.