Elon Musk agrees with other AI experts that there is little real-world data left to train AI models on. “We have now exhausted the cumulative amount of human knowledge … in AI training,” Musk said during a livestreamed conversation with Stagwell chairman Mark Penn that aired on X late Wednesday. “That happened last year.”

Musk, who owns AI company xAI, echoed themes that former OpenAI chief scientist Ilya Sutskever raised in December during a talk at NeurIPS, a machine learning conference. Sutskever, who said the AI industry has reached what he calls “peak data,” predicted that the lack of training data will force a shift away from the way models are developed today.

Indeed, Musk suggested that synthetic data (data generated by AI models themselves) is the way forward. “The only way to supplement [real-world data] is with synthetic data, where AI creates [training data],” he said. “With synthetic data … [AI] will sort of grade itself and go through this process of self-learning.”

Other companies, including tech giants like Microsoft, Meta, OpenAI, and Anthropic, are already using synthetic data to train leading AI models. Gartner estimates that 60% of the data used for AI and analytics projects in 2024 was synthetically generated.

Microsoft’s Phi-4, which was recently open-sourced, was trained on synthetic data alongside real-world data, as were Google’s Gemma models. Anthropic used some synthetic data to develop one of its best systems, Claude 3.5 Sonnet, and Meta fine-tuned its latest Llama models using AI-generated data.

Training on synthetic data has other advantages, such as cost savings. The Palmyra X 004 model from AI startup Writer, developed using almost entirely synthetic sources, reportedly cost only $700,000 to develop, compared to an estimated $4.6 million for an OpenAI model of comparable size.

But there are also drawbacks. Some research suggests that synthetic data can lead to model collapse, where a model becomes less “creative” (and more biased) in its outputs, ultimately impairing its functionality. Because models create the synthetic data, any biases and limitations in the data used to train those models will carry over into their outputs.
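
The loss of variety behind model collapse can be illustrated with a toy simulation. The sketch below is not from any of the companies or studies mentioned above; it assumes a deliberately simple stand-in for an AI model (a bell curve fitted to a handful of numbers), and the function name `simulate_collapse` and its parameters are hypothetical. Each "generation" is fitted only to samples produced by the previous one, and the spread of the fitted curve tends to shrink over time, a rough analogue of outputs becoming less diverse.

```python
import random
import statistics


def simulate_collapse(generations=500, sample_size=20, seed=0):
    """Return the fitted standard deviation at each generation (toy model)."""
    rng = random.Random(seed)
    # Generation 0: "real" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = []
    for _ in range(generations):
        # "Train" the toy model: estimate mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        spreads.append(sigma)
        # The next generation sees only samples produced by this fitted model.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return spreads


if __name__ == "__main__":
    spreads = simulate_collapse()
    for gen in (0, 100, 200, 300, 400, 499):
        print(f"generation {gen:3d}: fitted std = {spreads[gen]:.3g}")
```

Real language models are far more complex than a fitted bell curve, so this only hints at the mechanism: small estimation errors compound when each generation learns solely from the previous generation's output, and the rarer parts of the original data are gradually lost.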