Several industry insiders and DeepSeek researchers describe Liang Wenfeng as a rare talent in China’s AI community today — someone who combines strong infrastructure engineering capabilities with cutting-edge model research skills, while also being able to mobilize resources. He is a person who can "make precise judgments from a high-level perspective while surpassing frontline researchers in detail-oriented tasks." Liang possesses a "terrifying ability to learn" and "does not resemble a traditional boss at all but rather comes across as a geek." This interview provides a unique perspective. Liang offers a voice sorely missing in China’s tech scene: he is one of the rare individuals who prioritizes a sense of "right and wrong" over "profit and loss" and reminds everyone to recognize the inertia of the times and put "original innovation" on the agenda.
The Birth of a Price War
Waves: After DeepSeek V2’s release, the industry was swept into a large-model price war. Some say you’ve become a disruptive "catfish" in the market.
Liang Wenfeng: We didn’t intend to be a catfish; it happened accidentally.
Waves: Were you surprised by the outcome?
Liang: Very surprised. We didn’t expect such sensitivity to pricing. We simply set prices based on costs—our principle is neither to subsidize nor to profit excessively. The pricing reflects our cost plus a modest margin.
Waves: Five days later, Zhipu AI cut its prices too, and tech giants like ByteDance, Alibaba, Baidu, and Tencent soon did the same.
Liang: Zhipu reduced prices on an entry-level product, but its models comparable to ours remain expensive. ByteDance was the first to match our flagship model prices, triggering other tech giants to follow. Since the cost structures of major companies are much higher, we didn’t expect anyone to lose money to compete. This has become similar to the Internet Era’s logic of burning money to provide subsidies.
Waves: From an external perspective, price cuts seem like a way to grab users, much like the typical price wars of the Internet Era.
Liang: User acquisition wasn’t the primary objective. We lowered prices because our next-gen model structure reduced costs, and we believe AI APIs should be affordable and accessible to everyone.
Waves: Why did you choose to innovate the model structure instead of following Llama's established framework like most Chinese companies?
Liang: If the goal is applications, following Llama’s structure for fast deployment makes sense. But our destination is AGI, which requires research on model structures to achieve greater capability with limited resources. This is one of the fundamental research tasks required for scaling up models.
Beyond structural innovation, we’ve explored data construction, human-like behavior modeling, and more. These advancements are embedded in our released models. Additionally, Llama’s structure is estimated to lag two generations behind leading global standards in training efficiency and inference costs.
Waves: Where does this gap primarily come from?
Liang: First, there’s a gap in training efficiency. We estimate that compared to the best international standards, even the best domestic efforts face about a twofold gap in terms of model structure and training dynamics. This means we need twice the computing power to achieve the same results. Additionally, there’s about a twofold gap in data efficiency, meaning we need twice the training data and computing power to reach comparable outcomes. Combined, this requires four times the computing power. Our goal is to continuously work on narrowing these gaps.
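As an illustrative aside, the arithmetic in this answer can be sketched in a few lines of Python. This is a minimal sketch, not anything DeepSeek published: the function name `compute_multiplier` is hypothetical, and it assumes the two gaps compound multiplicatively, which is how the "four times" figure in the answer is reached.

```python
def compute_multiplier(structure_gap: float, data_gap: float) -> float:
    """Combined compute multiplier, assuming the two gaps compound.

    structure_gap: extra compute needed due to model structure and
                   training dynamics (2.0 means "twice the compute").
    data_gap:      extra data, and hence compute, needed due to lower
                   data efficiency (2.0 means "twice the data").
    """
    # Assumption: the gaps are independent, so they multiply.
    return structure_gap * data_gap


# The figures quoted in the interview: two twofold gaps.
combined = compute_multiplier(2.0, 2.0)
print(f"Combined compute requirement: {combined:.0f}x")
```

Under this reading, halving either gap alone would cut the combined requirement from four times to two times.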
Waves: Most Chinese companies choose to focus on both models and applications. Why has DeepSeek chosen to focus solely on research and exploration for now?
Liang: Because we believe the most important thing right now is to participate in the global wave of innovation. For many years, Chinese companies have been accustomed to leveraging others’ technological innovations for application and monetization. But this isn’t something that should be taken for granted. In this wave, our starting point isn’t to capitalize on an opportunity but to push to the forefront of technology and contribute to the development of the broader ecosystem.
Waves: In the Internet Era and the Mobile Internet Era, the common perception is that the U.S. excels at technological innovation, while China is better at applications.
Liang: We believe that with economic development, China must gradually become a contributor rather than simply a free rider. Over the past 30 years of the IT wave, we have largely missed out on genuine technological innovation. We’ve grown accustomed to Moore’s Law falling into our laps, where better hardware and software arrive every 18 months without effort. Scaling laws are being treated the same way.
But in reality, these advancements are the result of generations of relentless effort by Western-dominated tech communities. Because we didn’t participate in this process, we’ve overlooked its existence.
True Gaps: Originality vs. Imitation
Waves: Why did DeepSeek V2 surprise so many people in Silicon Valley?
Liang: Within the U.S., this level of innovation happens daily, so it’s quite ordinary for them. What surprised them was that this came from a Chinese company contributing as an innovator in their game. Most Chinese companies are accustomed to following, not innovating.
Waves: But within the Chinese context, such a choice might seem extravagant. Building large models is a capital-intensive game, and not all companies can afford to focus solely on research and innovation without prioritizing commercialization.
Liang: Innovation is certainly expensive, and the past habit of "borrowism" (拿来主义, taking what others have already built) was tied to the national conditions of that time. But China’s current economic scale and the profits of major companies like ByteDance and Tencent now rank among the highest in the world. What we lack in innovation is certainly not capital, but rather confidence and the ability to effectively organize high-density talent to drive meaningful innovation.
Waves: Why do Chinese companies, even those with abundant capital, prioritize quick commercialization?
Liang: For the past 30 years, we’ve been focused on making money, often at the expense of innovation. True innovation is driven not only by commercial incentives but also by curiosity and the desire to create. We’ve been constrained by past habits, but these are transitional.
Waves: But after all, you are a commercial organization, not a nonprofit research institute. If you choose to innovate and then share it through open source, where will you build your moat? For instance, the MLA architecture innovation this May will likely be quickly copied by others, right?
Liang: In the face of disruptive technology, a closed-source moat is temporary. Even OpenAI’s closed-source approach hasn’t stopped others from catching up. Our value lies in our team, which grows and accumulates know-how through this process. Building an organization and culture that can consistently innovate is our real moat. Open sourcing and publishing papers don’t mean we lose anything. For technologists, being followed is an achievement. Open sourcing is more of a cultural act than a commercial one. Giving is a form of honor, and it attracts talent by fostering a unique culture.
Waves: How do you view market-driven beliefs like those held by Zhu Xiaohu?
Liang: Zhu’s perspective is self-consistent and works well for companies focused on rapid monetization. But if you look at America’s most profitable companies, they are those that have invested deeply and patiently in high-tech.
Waves: With large models, pure technical leadership is unlikely to form an absolute advantage. What’s the bigger bet you’re making?
Liang: We believe Chinese AI can’t stay in a perpetual following position. People often say there’s a one- or two-year gap between Chinese and American AI, but the real gap lies between originality and imitation. If this doesn’t change, China will remain a follower. Some explorations are inevitable. NVIDIA’s leadership, for example, is not the result of a single company’s effort but the collective work of an entire Western technological community and industry. They can anticipate future technology trends and have a roadmap. Similarly, Chinese AI needs such an ecosystem. Many domestic chips fail to develop because they lack the supporting technical community, relying only on secondhand knowledge. This makes it imperative for someone to stand at the forefront of technology in China.
More Investment Doesn't Always Yield More Innovation
Waves: DeepSeek exudes an early OpenAI-like idealism and openness. Will you consider going closed-source in the future, like OpenAI and Mistral?
Liang: No, we won’t. We believe building a strong technological ecosystem is more important at this stage.
Waves: Do you have fundraising plans? Reports suggest that High-Flyer has plans to spin off DeepSeek for an independent listing. In Silicon Valley, AI startups inevitably tie themselves to major firms.
Liang: We don’t have short-term fundraising plans. Our problem has never been funding; it’s the embargo on high-end chips.
Waves: Many argue that AGI and quant trading are fundamentally different pursuits. While quant trading can be done quietly, AGI may require forming alliances to amplify investments.
Liang: More investment doesn’t always lead to more innovation. If it did, tech giants could monopolize all innovation.
Waves: Are you avoiding applications because you lack operational expertise?
Liang: We believe this is a period of technological innovation, not application explosion. In the long run, we aim to create an ecosystem where the industry directly uses our technology and output. We’ll focus on foundational models and frontier innovations, while other companies build To-B (business-facing) and To-C (consumer-facing) businesses on DeepSeek’s foundation. If a complete industrial value chain forms, we won’t need to do applications ourselves.
Of course, if necessary, there are no obstacles for us to develop applications, but research and technological innovation will always be our top priority.
Waves: If companies are choosing an API, why should they pick DeepSeek over a major corporation?
Liang: The future is likely to feature specialized divisions of labor. Foundational large models require continuous innovation. Big companies have capability boundaries, and they might not be the best fit for this need.
Waves: But can technology really create a significant gap? You’ve also said there are no absolute technological secrets.
Liang: There are no secrets in technology, but resetting and catching up require time and cost. For example, NVIDIA’s GPUs theoretically have no secret ingredients—they’re easy to replicate. However, reorganizing a team and catching up with the next generation of technology requires time, so the practical moat remains quite wide.
Waves: After your price cuts, ByteDance was the first to follow. This indicates they felt some level of threat. How do you view new strategies for startups to compete with large companies?
Liang: To be honest, we don’t care much about this—it’s something we did incidentally. Providing cloud services isn’t our primary goal. Our focus remains on achieving AGI.
So far, I haven’t seen any new strategies. Major companies don’t have a clear advantage either. They have existing users, but their cash-flow-dependent businesses can also act as a burden, making them vulnerable to disruption at any time.
Waves: How do you see the future of the six other large-model startups outside of DeepSeek?
Liang: I think 2 to 3 of them will survive. Right now, everyone is still in the money-burning stage. Those startups with clear self-positioning and better refined operations will have a better chance of survival. Others might have to reinvent themselves. Valuable efforts won’t disappear but may take on a different form.
Waves: During the High-Flyer era, your competitive stance was described as “sticking to your own path,” with little concern for lateral comparisons. How do you approach competition?
Liang: My focus is always on whether something can increase societal operational efficiency and whether you can find your unique strengths within the division of labor in the industry chain. If the endgame results in higher societal efficiency, then it’s a valid approach. Many things are transitional in nature, and paying too much attention to them leads to confusion.
Young Innovators Tackling "Mysterious" Challenges
Waves: Jack Clark, former policy director at OpenAI and co-founder of Anthropic, described DeepSeek as employing a group of enigmatic geniuses. What kind of people built DeepSeek V2?
Liang: There’s no mystery. Our team is made up of recent graduates from top universities, Ph.D. candidates in their fourth or fifth year who are interning with us, and a few young professionals just a few years into their careers.
Waves: Many large-model companies are fixated on recruiting overseas talent, believing the top 50 experts in this field aren’t in Chinese companies. Where do your people come from?
Liang: None of the team members for V2 were from overseas—they’re all local talent. While the top 50 experts might not be in China, maybe we can cultivate such people ourselves.
Waves: How did the MLA innovation come about? I heard the idea originated from the personal interest of a young researcher.
Liang: After summarizing the mainstream evolutionary trends of attention architectures, the researcher had a sudden inspiration to design an alternative solution. However, turning the idea into reality was a long process. We formed a team specifically for this purpose, and it took several months of work before we made it functional.
Waves: The emergence of such divergent inspiration seems to be tied to your completely innovation-focused organizational structure. During the High-Flyer era, you rarely assigned goals or tasks from the top down. But for cutting-edge AGI research with so much uncertainty, has the approach required more management effort?
Liang: DeepSeek remains fully bottom-up. We generally avoid pre-defining roles, opting instead for natural divisions of labor. Everyone has a unique growth trajectory and inherent ideas; there’s no need to push them. During exploration, if someone encounters an issue, they’ll proactively pull in others to discuss it. However, when an idea shows promise, we do allocate resources top-down as needed.
Waves: DeepSeek seems to be very flexible with allocating computing power and personnel.
Liang: There’s no upper limit on computing resources or personnel allocation for any team member. If someone has an idea, they can freely access the training cluster without approval. Similarly, since we have no hierarchical structure or departmental boundaries, people can mobilize anyone as long as the other party is also interested.
Waves: Such a loose management style relies on selecting individuals who are strongly driven by passion. I’ve heard you’re skilled at identifying exceptional talent using non-traditional criteria.
Liang: Our hiring standards have always been based on passion and curiosity. As a result, many of our team members have unique, fascinating experiences. Their desire to conduct research often far outweighs their focus on financial rewards.
Waves: Transformers were born in Google’s AI Lab, while ChatGPT emerged from OpenAI. What’s the difference in the value big-company AI labs and startups bring to innovation?
Liang: Whether it’s Google Labs, OpenAI, or even the AI labs of major Chinese companies, they all bring significant value. OpenAI ultimately succeeded, but that also involved some historical serendipity.
Waves: Is innovation largely a matter of chance? I noticed your office has doors on both sides of the central conference rooms that can be opened freely. Your colleagues say this is to leave room for serendipity, similar to how a passerby overheard the idea of the Transformer framework and joined the discussion, ultimately turning it into a general architecture.
Liang: I think innovation first comes down to a matter of belief. Why is Silicon Valley so innovative? It’s because they dare to try. When ChatGPT launched, the domestic scene lacked confidence in frontier innovation—from investors to big companies, many felt the gap was too great and preferred to focus on applications. But innovation requires confidence, and this confidence is often more evident in young people.
Waves: Since you don’t seek funding and rarely make public statements, your societal presence isn’t as loud as those companies actively fundraising. How do you ensure DeepSeek becomes the top choice for those working on large models?
Liang: Because we’re tackling the hardest problems. The biggest draw for top talent is definitely to solve the world’s toughest challenges. In fact, top-tier talent in China is often undervalued because society offers so few opportunities for hardcore innovation, making it hard for them to be recognized. By working on the hardest problems, we naturally attract them.
Waves: In OpenAI’s recent announcement, GPT-5 wasn’t unveiled, leading many to believe the technology curve is slowing down. Some are even questioning Scaling Laws. What’s your take?
Liang: We’re relatively optimistic. The entire industry seems to be progressing as expected. OpenAI isn’t omnipotent—they can’t always lead the charge.
Waves: How far off do you think AGI is? Before releasing DeepSeek V2, you worked on code generation and mathematical models and switched from dense models to MoE. What’s your roadmap to AGI?
Liang: It could be 2, 5, or 10 years away, but it will definitely happen in our lifetime. As for the roadmap, even within our company, there’s no unified vision. However, we are placing our bets on three main directions: mathematics and code, multimodality, and natural language itself.
Mathematics and code serve as natural testing grounds for AGI, much like Go—a closed, verifiable system where high intelligence can potentially be achieved through self-learning. On the other hand, multimodality and learning from interaction within the real human world may also be essential for AGI. We remain open to all possibilities.
Waves: What do you think the final pattern of large models will be?
Liang: There will be specialized companies providing foundational models and services, forming long chains of professional divisions of labor. More players will build upon these foundations to meet the diverse needs of society.
All the Tricks are Products of the Previous Generation
Waves: Over the past year, Chinese startups in large models have seen many changes—for example, Wang Huiwen exited, and new companies are beginning to differentiate themselves.
Liang: Wang took on all the losses himself, allowing others to withdraw without harm. He made a choice that was disadvantageous to him but beneficial to others. I admire his integrity.
Waves: Where is most of your energy focused now?
Liang: Primarily on researching the next generation of large models. There are still many unresolved problems.
Waves: Other startups insist on “having it all,” balancing technology and product development, on the view that technology alone won’t guarantee permanent leadership and that it is crucial to seize the time window to turn technological advantages into products. Is DeepSeek focused on model research because the model’s capabilities aren’t mature enough yet?
Liang: All the tricks are products of the past generation and may not hold true for the future. Using the business logic of the Internet Era to discuss the future profitability model of AI is like discussing General Electric and Coca-Cola when Ma Huateng was starting Tencent. It’s very likely a case of clinging to the past in a changing context.
Waves: High-Flyer had a strong foundation in technology and innovation and grew smoothly. Is that why you’re so optimistic?
Liang: High-Flyer reinforced our belief in technology-driven innovation, but it wasn’t all smooth sailing. It took us a long time to accumulate expertise. People often see High-Flyer’s progress post-2015, but in reality, we’d been at it for 16 years.
Waves: Let’s return to the topic of original innovation. With the economy entering a downturn and capital in a cooling cycle, do you think this will suppress original innovation?
Liang: I don’t think so. Adjustments in China’s industrial structure will increasingly rely on hardcore technological innovation. When people realize that the quick money they made in the past was likely due to the luck of the times, they’ll be more willing to commit to genuine innovation.
Waves: So you’re optimistic about this too?
Liang: I grew up in the 1980s in a fifth-tier city in Guangdong. My father was an elementary school teacher. In the 1990s, when opportunities to make money were plentiful in Guangdong, many students’ parents came to my house saying they thought education was useless. But if you look at it now, those attitudes have changed. Making money is no longer easy—even the opportunity to make a living by driving a taxi has likely disappeared now. These shifts occur within a single generation.
In the future, hardcore innovation will only grow. It’s hard to understand now because society as a whole needs to be educated by facts. When society rewards and celebrates innovators, collective mindsets will shift. We just need a lot more tangible examples and time for the process to unfold.