For many ordinary users, the term "AI smartphone" still feels abstract, even though more than seven years have passed since manufacturers first began branding their devices as "AI smartphones." In all that time, "AI" has yet to emerge as a significant driving force in the smartphone market.
In contrast, even the relatively newer concept of "foldable screens" appears to resonate more with consumers, establishing itself as a mainstay in the high-end Android market. The introduction of "triple foldable phones" this year has sparked widespread discussion online, creating a buzz that underscores interest in innovative hardware. So why has a groundbreaking technology into which global tech giants like Nvidia and Microsoft have poured resources failed to make a substantial impact in mobile devices?
At the heart of this question lies the nature of "AI smartphones" themselves. Similar to AI in PCs, they have not significantly altered the physical form factor of devices. The advancements attributed to AI mostly stem from functional enhancements rather than revolutionary changes in user experience. The foundational driving force of this AI wave can be traced back to that small yet powerful "chip" housed within every smartphone.
The integration of smartphones and AI dates to September 2017, a pivotal month in the tech landscape. Within a matter of weeks, the smartphone industry saw two significant announcements. On September 2, Huawei unveiled its Kirin 970 chip, billed as the world's first smartphone AI computing platform equipped with an integrated Neural Processing Unit (NPU).
The excitement surrounding this development coincided with the launch of Apple's iPhone X, which would shape the company's design ethos for the following five years. While much attention was paid to Apple's foray into the "full-screen era" and the introduction of the "notch" design, fewer people recognized that the A11 chip inside the iPhone X also enhanced the device's "AI capabilities."
The built-in neural engine, designed by Apple specifically for machine learning, used a dual-core architecture capable of up to 600 billion operations per second (0.6 TOPS) and was tasked primarily with machine-learning workloads such as facial recognition and Animoji. This design allowed the chip to offload those tasks from the CPU and GPU, improving computational efficiency while minimizing energy consumption.
Indeed, terms like "NPU," "AI computing power," and "machine learning," which have become ubiquitous in discussions by tech manufacturers in recent years, were already being employed in smartphone chips seven years ago. At that time, however, NPUs were primarily utilized to accelerate everyday functions rather than handle complex "AI tasks." Functions like scene recognition during photography, color optimization, and facial recognition in low-light conditions were some applications of these early technologies.
Dr. Lu Zhongli, Deputy General Manager of MediaTek's Computing and AI Technology Division, shared insights with Tai Media App: "The most perceptible AI capabilities users experience typically manifest during photography, encompassing multiple functionalities in both image capture and video display. For instance, AI technology has been widely applied to features like automatic frame rate switching and dynamic range enhancement in photos and videos, as well as intelligent noise reduction."
Nevertheless, these innovations represent early-stage AI applications, often referred to as "analytical AI." This contrasts sharply with the flourishing "generative AI" discourse today. The former primarily focuses on enhancing specific experiences in predetermined contexts, which can be likened to a "minor upgrade" of traditional fixed intelligent algorithms.
Contemporary conversations around "generative AI" highlight its capabilities to use deep learning and big data analytics to generate entirely new content—be it text, images, or audio. This technology not only mimics existing data patterns but also innovatively extrapolates from them to produce outputs characterized by diversity and unpredictability.
Compared to traditional AI functionalities reliant on fixed algorithms or enhancements to existing content, generative AI encompasses a far broader application scope—from natural language processing to artistic creation—offering efficient solutions that markedly elevate automation and overall productivity.
The transformations in AI capabilities are closely tied to chip performance advancements. Early NPUs and mobile chips exhibited limited computational power; attempting to run generative AI on these chips would result in delays, with image generation taking several minutes and text processing taking even longer, making real-world applications virtually unfeasible.
In recent years, the rise of "AI chips" has emphasized dedicated AI capabilities. For example, with the launch of the iPhone 16 series last month, Apple introduced the A18 Pro chip, showcasing a leap in performance metrics. While the A11 chip featured a dual-core neural network engine capable of 600 billion operations per second, the A18 Pro’s neural engine boasts a stunning 16-core architecture, delivering 35 TOPS (trillion operations per second).
Rough calculations suggest that the AI power of the A18 Pro is approximately 58 times that of the A11 chip. Similarly, MediaTek recently launched the Dimensity 9400, enhancing its AI capabilities by integrating the company's cutting-edge eighth-generation AI processor, NPU 890.
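That rough calculation can be reproduced directly from the two figures quoted above. The snippet below is only a back-of-the-envelope comparison of stated peak throughput; real-world gains depend on numeric precision, memory bandwidth, and workload, which vendor TOPS figures do not capture.

```python
# Rough comparison of the two chips' stated neural-engine throughput,
# using only the figures quoted in the article.
a11_tops = 0.6        # A11: 600 billion operations per second = 0.6 TOPS
a18_pro_tops = 35.0   # A18 Pro: 35 TOPS (trillion operations per second)

ratio = a18_pro_tops / a11_tops
print(f"A18 Pro vs. A11 neural engine: ~{ratio:.0f}x")  # ~58x
```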
The Dimensity 9400 significantly boosts the understanding of lengthy texts and extends AI model support, enabling the device to operate models at a remarkable speed of 50 tokens per second, while also accommodating multimodal AI applications. This broadens the practical AI application scenarios significantly for smartphones.
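To put the 50 tokens-per-second figure in perspective, a quick calculation shows what that decode rate implies for response times. The token counts below are illustrative assumptions, not benchmarks, and prompt-processing time is ignored.

```python
# What a decode rate of 50 tokens/s means for on-device response times.
DECODE_RATE = 50  # tokens per second, the figure quoted for the Dimensity 9400

examples = [
    ("short reply", 100),            # assumed token counts, for illustration
    ("paragraph summary", 300),
    ("step-by-step solution", 750),
]
for label, tokens in examples:
    seconds = tokens / DECODE_RATE
    print(f"{label}: {tokens} tokens ≈ {seconds:.1f} s")
```

At this rate even a long, tutor-style answer arrives in well under a minute, which is roughly the threshold at which on-device generation starts to feel usable rather than experimental.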
To illustrate this with a real-world example, consider how current smartphone voice assistants handle the command "I'm hungry." They typically respond by mechanically opening delivery apps, maps, or search engines. However, the AI capabilities supported by the Dimensity 9400 empower the assistant to learn your preferences and habits, facilitating smarter restaurant suggestions nearby.
Additionally, if you photograph a math problem, traditional smart assistants may only identify it as a math problem or search and provide a standardized answer online. In contrast, an assistant powered by the Dimensity 9400 can perform local inference, acting like a tutor to walk users through the entire process from understanding the problem to arriving at the solution.
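The "I'm hungry" scenario boils down to ranking candidates by learned preference rather than returning a fixed response. The toy sketch below illustrates only that idea; every name, tag, weight, and the scoring rule are invented for illustration, and a real on-device assistant would learn preferences from usage history with an ML model rather than a hand-written dictionary.

```python
# Toy sketch: preference-aware restaurant ranking.
# All data and the scoring rule are hypothetical, for illustration only.
user_prefs = {"noodles": 0.9, "sichuan": 0.7, "salad": 0.1}

nearby = [
    {"name": "Lanzhou Noodle House", "tags": ["noodles"], "distance_km": 0.4},
    {"name": "Green Bowl Salads",    "tags": ["salad"],   "distance_km": 0.2},
    {"name": "Chuan Xiang Kitchen",  "tags": ["sichuan"], "distance_km": 0.8},
]

def score(place):
    # Preference match minus a small penalty for distance.
    taste = sum(user_prefs.get(tag, 0.0) for tag in place["tags"])
    return taste - 0.2 * place["distance_km"]

for place in sorted(nearby, key=score, reverse=True):
    print(f"{place['name']}: {score(place):.2f}")
```

The point of the sketch is the contrast with a fixed-algorithm assistant: the ranking changes as the preference weights change, which is exactly the kind of personalization the article attributes to on-device AI.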
This evolution shows that powerful, versatile AI capabilities do exist, but previously required the extensive computational resources of cloud services or workstations. Tools like ChatGPT and other AI-powered assistants exhibit exceptional content generation and text summarization capabilities thanks to robust server infrastructures that underpin them.
Looking ahead, competition among "AI smartphones" will likely hinge less on marketing concepts and more on chip capabilities. For device manufacturers, heavy investment in large-scale cloud computing resources may not pay off, as users are typically unwilling to pay extra for what they assume should be standard "AI functionality."
Moreover, relying heavily on cloud-based AI can make the user experience inconsistent because of network latency and fluctuations, and cloud features that are identical across devices give consumers little reason to upgrade. Moving AI capabilities on-device is therefore increasingly imperative.
This trend isn't exclusive to "AI smartphones"—adjacent "AI PCs" have been following a similar trajectory. Over the next few years, Apple, MediaTek, Qualcomm, and other smartphone chipmakers are expected to keep building AI capabilities into each chip iteration, a shift that will not only enable standout applications but also lay the hardware-level groundwork for further functional innovation.