OpenAI Data Fuels Suspected Chinese AI Clone

Author: Aria Mar 13, 2025

OpenAI has voiced concerns that China's DeepSeek AI models, known for their remarkably low cost, may have been developed using data from OpenAI. The claim follows a turbulent market reaction to DeepSeek's emergence, which triggered a sharp decline in the stock prices of major AI-focused companies. Nvidia, whose GPUs are crucial for running AI models, suffered the largest single-day loss of market value in Wall Street history: its shares fell 16.86%, erasing nearly $600 billion. Microsoft, Meta Platforms, Alphabet, and Dell Technologies also experienced significant declines. Donald Trump called DeepSeek a "wake-up call" for the U.S. tech industry.

DeepSeek promotes its R1 model as a significantly more affordable alternative to Western AI models like ChatGPT. Built on the open-source DeepSeek-V3, it reportedly requires far less computing power and was trained for an estimated $6 million. While that figure has been disputed, it has raised questions about the massive AI investments made by American tech companies and unsettled investors. Fueled by discussion of its effectiveness, DeepSeek surged in popularity and became a top downloaded free app in the U.S.

Bloomberg reported that OpenAI and Microsoft are investigating whether DeepSeek used OpenAI's API to integrate OpenAI's AI models into its own. OpenAI confirmed it is aware of such attempts by Chinese and other companies to use data from leading U.S. AI companies. The process, known as distillation, involves training a smaller model on the outputs of a larger one; doing so with OpenAI's models violates OpenAI's terms of service. OpenAI emphasized its commitment to protecting its intellectual property and to working with the U.S. government to safeguard its technology.
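For readers unfamiliar with the term, distillation in its textbook form can be sketched in a few lines of Python. This is only an illustration of the general technique, not a description of what DeepSeek or any other company actually did: the function names, the example scores, and the temperature value are all assumptions made for the example. A "teacher" model's output scores are softened into probabilities, and a "student" model is penalized according to how far its own probabilities diverge from them.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    The loss is zero when the student matches the teacher exactly and grows
    as the student's predictions drift away from the teacher's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate answers (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different answer

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

Training a student means adjusting it to drive this loss down across many examples. When only a model's text responses are available through an API, rather than its internal scores, the same idea would be applied by treating the generated responses themselves as training targets.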

David Sacks, President Donald Trump's AI czar, stated there's substantial evidence suggesting DeepSeek used distillation to extract knowledge from OpenAI models. He anticipates leading AI companies will implement measures to prevent similar incidents.

DeepSeek is accused of using distillation on OpenAI's models to train its competing model. Image credit: Andrey Rudakov/Bloomberg via Getty Images.

The situation has highlighted the irony of OpenAI's position, given previous accusations that it used copyrighted material to develop ChatGPT. Ed Zitron, a tech PR writer, pointed out this hypocrisy on Twitter. OpenAI previously stated in a submission to the UK's House of Lords that creating AI models like ChatGPT without copyrighted material is impossible. That statement aligns with its defense against lawsuits from the New York Times and 17 authors alleging copyright infringement. OpenAI maintains that its training practices constitute "fair use."

The legal landscape surrounding AI training data remains complex, as evidenced by a 2018 U.S. Copyright Office ruling that AI-generated art cannot be copyrighted because it lacks the nexus between the human mind and creative expression.