By: JC Hsu, Corporate Vice President and General Manager of MediaTek’s Wireless Communications Business Unit
The revolution in AI technology continues to enable users to create and benefit from more sophisticated content. Generative AI is leading this revolution by uncovering new perspectives and possibilities, and by introducing a new level of creativity into our daily lives that is easier to access than ever before. The user experience is being transformed, from AI-generated musical performances, artwork, and text-based documents to AI-assisted programming and code generation.
The trending areas for generative AI development are natural language processing (NLP) models like ChatGPT, a chatbot built on a large language model, and text-to-image generation models such as Midjourney and DALL-E. With these innovative solutions, previously impossible things are now within our reach. We can now generate stunning visuals based on text descriptions and create realistic conversations through simple chatbot interfaces. The possibilities extend far beyond these examples: generative AI allows us to tap into new sources of creativity, from both machines and humans, and its potential is just beginning to be explored.
Generative AI is based on Transformer
Transformer, a breakthrough AI model architecture in natural language processing, was introduced in 2017 and has since become the foundation of generative AI. In 2020, Transformer was extended to vision and voice, where it demonstrated better accuracy and quality than existing solutions such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
Since 2021, MediaTek’s AI Processing Unit (APU) has been optimized to handle Transformer models and has been utilized by smartphone brands to bring vision and voice applications to the market. Through its collaboration with leading mobile companies, MediaTek is proactively improving the way we interact with our smartphones with AI apps that use Transformer models.
MediaTek APU & NeuroPilot are ready for Transformer
NeuroPilot, MediaTek’s AI Platform, is a comprehensive solution for deploying Transformer-based AI applications. It is designed to handle the complex computation flow associated with these models, while also taking advantage of MediaTek’s APU design, which reduces DRAM bandwidth usage to ensure optimal SoC performance and power efficiency. NeuroPilot includes an integrated suite of powerful tools that help simplify the development and deployment of AI models, with end-to-end execution of Transformer AI models on the APU. With NeuroPilot, developers have everything they need to quickly and easily create cutting-edge Transformer-based applications.
MediaTek NeuroPilot is already enabling manufacturers to take advantage of the proven Vision Transformer (ViT) and Voice Transformer capabilities of the APU.
Real-world implementation and advantages
The vivo X90 Pro, an incredible new smartphone, sets a new standard for mobile device photography and voice recognition through its innovative use of Vision and Voice Transformer technology. It uses the MediaTek Dimensity 9200, our latest flagship 5G smartphone chip, which includes the new MediaTek APU 690.
By leveraging ViT technology, the vivo X90 Pro can achieve unprecedented accuracy in object segmentation, enabling it to adjust and correct photography and videography at the object level, drastically improving low-light photography. The ViT technology is also capable of accurately extracting a person from the background (portrait capture), even down to their hair, and then applying different background filters in real time to create stunning special effects that truly set the X90 Pro apart from the competition in video capture and live-streaming.
The Dimensity 9200 platform also boasts Transformer-based Voice AI, which provides on-device automatic speech recognition features, greatly improving response speed and protecting user privacy by ensuring that data is not sent to the cloud for processing. This cutting-edge technology marks the first time that Transformer Voice AI models have been optimized for use on a mobile APU, providing a 30% improvement in power consumption and a 50% improvement in performance compared to the previous-generation CPU solution.