News 2024-11-02

Next for Large Model Competition?

Recently, the 2024 Yunqi Conference captured significant attention as Alibaba once again emerged as a focal point in the ongoing revolution of artificial intelligence. In May, Alibaba Cloud made headlines with a groundbreaking decision to drastically cut prices on its Tongyi Qianwen models, with price reductions reaching up to an astonishing 97%. The conference saw yet another wave of price cuts, with three of its primary models seeing reductions of up to 85%.

This aggressive pricing strategy prompted a ripple effect throughout the industry, with key players like ByteDance's Volcano Engine, Baidu Intelligent Cloud, Tencent Cloud, and iFlytek following suit, marking an industry-wide decline of nearly 90% in model pricing. Moreover, in July, OpenAI contributed to the price wars by introducing GPT-4o mini, priced over 60% less than its predecessors, further emphasizing a shift that has begun within the AI landscape.

As Alibaba reinvigorates the "price war," AI model prices are expected to keep falling, potentially even into "negative margin" territory. Throughout the history of the internet sector, "selling at a loss to gain scale" has been a common tactic: reshaping an entire industry's business model often means absorbing higher operational costs upfront.

Yet, this journey raises significant questions about how companies can balance pricing, quality, and customer service. In an environment where survival dictates a shift in strategy, businesses must consider that merely capitalizing on "easy gains" won't suffice.


The prevailing unit of large model pricing in China has shifted from the "fen" (0.01 yuan) to the even smaller "li" (0.001 yuan). The cost of API calls for Alibaba's Tongyi Qianwen model dropped from 0.02 yuan per thousand tokens to an eye-opening 0.0005 yuan. Following another round of cuts in September, the minimum calling prices for the Qwen-Turbo, Qwen-Plus, and Qwen-Max models fell to historical lows of 0.0003 yuan, 0.0008 yuan, and 0.02 yuan per thousand tokens, respectively.
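To put these figures in perspective, here is a back-of-the-envelope cost comparison in Python. The monthly token volume is a hypothetical assumption for illustration; only the two per-thousand-token prices come from the article.

```python
# Illustrative API cost arithmetic using the per-thousand-token prices
# quoted above; the monthly token volume is a hypothetical assumption.

def api_cost_yuan(tokens: int, price_per_1k_tokens: float) -> float:
    """Cost in yuan for processing `tokens` at a per-1,000-token price."""
    return tokens / 1000 * price_per_1k_tokens

monthly_tokens = 1_000_000_000  # hypothetical: 1 billion tokens per month

old_cost = api_cost_yuan(monthly_tokens, 0.02)    # pre-cut Tongyi Qianwen price
new_cost = api_cost_yuan(monthly_tokens, 0.0005)  # post-cut price

print(f"old: {old_cost:,.0f} yuan, new: {new_cost:,.0f} yuan")
print(f"reduction: {1 - new_cost / old_cost:.1%}")
# old: 20,000 yuan, new: 500 yuan -> a 97.5% reduction, consistent with
# the roughly 97% headline cut reported in May.
```

At this scale the same workload drops from 20,000 yuan to 500 yuan a month, which is exactly why per-li pricing changes who can afford to build on these APIs.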

Alibaba Cloud CTO Zhou Jingren said that each pricing decision undergoes rigorous evaluation grounded in industry growth and user feedback, stressing that the reductions are not merely a "price war": the market still perceives large model costs as relatively high, and a maturing industry requires falling costs, much as Moore's Law predicts processing capability doubling roughly every two years while unit costs drop significantly.

Remarkably, large model prices have fallen even faster than Moore's Law would suggest, with cumulative cuts approaching 100%. This raises the question of whether companies can remain profitable while operating at such scale. For now, the large model sector appears to prize scale over profit.

The current consensus points to an industry willing to sacrifice immediate profits for scale, and there are signs of a "loss-leading" phase. Reports suggest that before May of this year, domestic large model inference margins exceeded 60%, on par with international competitors. After the recent rounds of drastic cuts, however, margins have turned negative.

As usage surges after each price cut, heavier traffic compounds the losses, since every model invocation consumes expensive computational resources. Even as companies slash prices, their serving costs keep climbing.

On the flip side, the positive impact of the price cuts is palpable. Alibaba's Bailian platform, for instance, saw paid users grow by more than 200% after the reductions, and more companies have pivoted to hosted services like Bailian instead of private deployments; the platform now serves over 300,000 customers.

Over the past year, Baidu's Wenxin model has likewise seen prices fall by more than 90%, and Baidu's second-quarter earnings call revealed that Wenxin's daily call volume had surpassed 600 million, a tenfold increase in just six months.

This trend suggests that rather than clinging to immediate profits, large model enterprises are trading short-term gains for expected long-term returns.

Rough estimates suggest that model-invocation revenue at most companies remains under one billion yuan, dwarfed by total revenues in the hundreds of billions. Yet within the next one to two years, invocation volume is projected to grow at least tenfold. In the short term, a larger user base means higher computational expenses; in the long term, unit costs in cloud services should fall as demand expands, ushering in a "return phase."

As the industry evolves, the driving force of AI on computational demand will become increasingly pronounced. Alibaba's CEO Wu Yongming has noted that over half of new market needs are propelled by AI, with large models accelerating commercialization.

Beyond significantly lowering the threshold for enterprise clients and reducing trial-and-error risk, large models stand to benefit traditional industries such as government, manufacturing, and energy, where business scale and incremental growth potential are substantial.

When large models become as accessible as conventional infrastructure, a massive market expansion is anticipated, though current realities necessitate some price concession from large model developers towards both enterprises and creators.

Furthermore, while the existing revenue may decline with continued reductions, incremental income is on the rise. Baidu, for example, has witnessed its AI-native application ecosystem bolstered by Wenxin’s direct income alongside indirect growth in its cloud services.

Despite skepticism surrounding Baidu's cloud strategy in prior years, its position in the AI cloud market is beginning to gain traction, with large models' contribution to cloud revenue rising from 4.8% in Q4 2023 to 9% in Q2 2024.

This evolving perspective underscores the broader sentiment in the large model ecosystem: the emphasis on scaling over profits. This notion resonates strongly across various internet initiatives, historically exemplified by competitive battles in ridesharing, e-commerce, and more. As large model companies confront the realities of a price war, survival hinges on emerging as the beneficiaries post-consolidation.

Additionally, Alibaba's recent dialogue surrounding "AI infrastructure" highlights its awareness of the current dynamics. Vice President of Alibaba Cloud, Zhang Qi, illustrated that today's AI landscape is akin to the early days of the internet circa 1996. At that time, access fees were prohibitively expensive, limiting internet growth. Lowering costs is crucial for fostering the potential for an application explosion.

Apart from the recent model price drops announced at the 2024 Yunqi Conference, Alibaba also introduced a new generation of open-source models, launching over 100 different models encompassing a range of large language models, multi-modal models, mathematical models, and code models, thus achieving a record in open-source model variety.

Further commenting, CTO Zhou Jingren reaffirmed Alibaba Cloud's commitment to an open-source strategy aimed at allowing developers the autonomy to optimize model capabilities according to their unique business scenarios, catering to specific enterprise needs.

As of mid-September 2024, downloads of the open-source Tongyi Qianwen models had surpassed 40 million, with over 50,000 derivative models in the Qwen series, positioning it as one of the leading open-source model collections worldwide, second only to the Llama series, which boasts nearly 350 million downloads globally.

In the aftermath of what some are calling a "model war," industry leaders have begun to pivot focus, arguing that building robust applications matters more than model proliferation. Baidu Chairman Robin Li (Li Yanhong) emphasized that foundational models hold substantial value only if a rich AI-native application ecosystem is built on top of them.

Despite over 190 registered large models indicated by the National Cyberspace Administration and exceeding 600 million registered users, the industry continues to struggle with the so-called "last mile" issue: the scarcity of practical applications and the challenge of making models workable and relevant, especially in specialized sectors like healthcare and finance. Pure data feeding won't suffice to yield applicable models.

Large enterprises cannot feasibly dive into every niche sector to complete this last mile but could instead cultivate an encompassing application ecosystem, allowing downstream enterprises or developers to create tailored models that meet specific needs, optimizing resource allocation while gathering valuable data for foundational model improvements.

By implementing price cuts and promoting open-source initiatives, Alibaba effectively reduces entry barriers for utilizing large models, hoping to validate their application potential through increased accessibility, encouraging participation from more enterprises and creators. Only when large models can truly address the intricate needs of businesses can the ecosystem flourish, culminating in an industry evolution into a new phase.

However, it seems likely that the end of the "model war" may result in only a handful of dominant players remaining, poised to establish themselves as the backbone of the large model industry. Consequently, prominent firms in this sector are unlikely to step back from the price competition, protecting their market share fiercely. Additionally, numerous unicorns are striving to innovate and carve out their niche through price strategy, with some believing smaller models might offer better cost-effectiveness.

Notably, Alibaba's latest round was not the first spark of this price war; it was ignited by DeepSeek V2, whose API was priced at 1 yuan per million input tokens and 2 yuan per million output tokens, at a time when mainstream offerings charged on the order of tens to hundreds of yuan per million tokens.

Looking ahead, the expected consolidation phase for large models might unfold over the next 2-3 years. Although the ultimate survivors in this space will be few, each firm is compelled to exhaust available strategies for survival. The pivotal question remains: once the easy-to-reach gains have been harvested, will current paradigms suffice as viable strategies going forward?

While discussion of the ongoing price war abounds, perspectives within the industry vary substantially. Kai-Fu Lee, founder of 01.AI, remarked that an aggressive price war is unnecessary: assessing large models goes beyond affordability, and understanding the technology underpinning them is vital. Treating subpar technology as a bargain won't build a sustainable business.

Tan Dai, president of Volcano Engine, likewise stressed that the primary focus should be expanding application coverage rather than revenue figures, since robust model capabilities are what unlock new applications.

In this light, the price war stems partly from undifferentiated products: with most models' capabilities converging, meaningful gaps between competitors are hard to establish, so firms lean on pricing to win adoption and market share.

Nonetheless, after low-hanging fruits are picked clean, new challenges await: Will enterprises withstand subsequent price wars? Can large models differentiate themselves from competitors? Will they be among the few to endure? These questions must be answered.

As such, while engaging in price wars, large model firms are also acutely aware of the critical nature of product quality, technological advancement, and cash flow management. Balancing the pressure to lower prices while striving to enhance technological distinctions and model performance will be requisite for building a sustainable business cycle.

Typically, large model firms do not rely on price aggression alone. Model inference hinges on three variables: latency, cost, and the number of tokens generated, with call frequency and concurrency mattering greatly. As real operational scenarios grow more complex, the need for concurrent processing escalates. Yet the headline price cuts largely apply to standard offerings with limited concurrency support; truly scalable, high-concurrency deployments have yet to see comparable reductions.
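A toy serving-cost model illustrates why concurrency dominates the unit economics described above. This is a sketch under assumed numbers: the GPU hourly price and per-stream throughput are hypothetical, not figures from the article.

```python
# Toy model of inference unit economics: serving cost per million generated
# tokens for one GPU handling several requests concurrently. All numeric
# inputs below are hypothetical assumptions for illustration.

def cost_per_million_tokens(gpu_yuan_per_hour: float,
                            tokens_per_sec_per_stream: float,
                            concurrency: int) -> float:
    """Serving cost (yuan) per million tokens at a given concurrency."""
    tokens_per_hour = tokens_per_sec_per_stream * concurrency * 3600
    return gpu_yuan_per_hour / tokens_per_hour * 1_000_000

low = cost_per_million_tokens(gpu_yuan_per_hour=60.0,
                              tokens_per_sec_per_stream=50.0,
                              concurrency=1)
high = cost_per_million_tokens(gpu_yuan_per_hour=60.0,
                               tokens_per_sec_per_stream=50.0,
                               concurrency=16)

# Cost per token falls in proportion to concurrency, so a low headline
# price that caps concurrency can still mean poor economics at scale.
print(f"1 stream: {low:.2f} yuan/M tokens; 16 streams: {high:.2f} yuan/M tokens")
```

Under these assumed numbers, sixteen concurrent streams cut the per-token serving cost sixteenfold, which is why a discounted tier that limits concurrency is a very different offer from a cheap high-concurrency one.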

Employing technology to optimize inference costs is also essential. Baidu's Zhizuo platform, for example, has optimized model training processes to achieve over 98.8% effective utilization across its compute clusters, which both enhances performance and mitigates shortfalls in computing capacity.

Microsoft's CEO Satya Nadella spotlighted a similar trajectory, revealing that GPT-4's performance had escalated sixfold over the past year, while costs reduced to merely one-twelfth of previous figures, exemplifying transformative gains in performance-to-cost ratios.

Ultimately, the emphasis lies in fostering unique and differentiated products. Low-price strategies generate ecosystem development prospects, yet with AI's constant evolution, the rapid pace of technological advancements and product lifecycle shortening compel a focus on addressing user challenges beyond mere competitive pricing.

Currently, the commercial rationale underpinning the large model sector has transitioned from merely competing on costs to fostering ecosystems that embrace both innovation and performance enhancements. While price reductions may play a role in shaping ecological barriers, leveraging technology as a means to decrease costs is fundamental to accelerating large models into generating tangible value.

In the period ahead, the new battleground for large model enterprises will be the cost-to-performance ratio: firms must raise quality and robustness even amid the current pricing pressure while cultivating richer, more diverse model capabilities. This shift may not yield immediate "super applications," but it can draw in more small enterprises and startups, giving large model players opportunities for explosive growth in the evolving landscape.
