David Imel / Android Authority
Smartphone chipsets have come a long way since the early days of Android. While the vast majority of budget phones were woefully underpowered just a few years ago, today’s mid-range smartphones perform just as well as flagships from one or two years ago.
With the average smartphone more than capable of handling everyday tasks, both chipmakers and developers have set their sights higher. Seen in that light, it’s clear why assistive technologies like artificial intelligence and machine learning (ML) are now taking center stage. But what does on-device machine learning actually mean, especially for end users like you and me?
In the past, machine learning tasks required sending data to the cloud for processing. That approach has plenty of downsides, ranging from slow response times to privacy concerns and bandwidth limitations. Modern smartphones, however, can generate predictions completely offline thanks to advances in chipset design and ML research.
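For developers, this usually means bundling a pre-trained model with the app and running it locally. As a rough illustration (not any specific manufacturer’s pipeline), here is a minimal sketch using TensorFlow Lite on Android; the model file name, input size, and class count are placeholders:

```kotlin
import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

// Hypothetical example: classify a 224x224 RGB image entirely on the device.
// "mobilenet.tflite" and the tensor shapes are placeholders, not a specific shipped model.
class OnDeviceClassifier(context: Context) {
    // Load the model bundled in the app's assets folder; no network access is needed.
    private val interpreter = Interpreter(FileUtil.loadMappedFile(context, "mobilenet.tflite"))

    fun classify(pixels: Array<Array<Array<FloatArray>>>): FloatArray {
        // Output buffer sized for a model with 1,000 classes (placeholder value).
        val output = Array(1) { FloatArray(1000) }
        interpreter.run(pixels, output) // Inference runs locally, typically in milliseconds.
        return output[0]
    }
}
```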
To understand the impact of this breakthrough, let’s examine how machine learning has changed the way we use our smartphones on a daily basis.
The birth of on-device machine learning: Improved photo and text predictions
Jimmy Westenberg / Android Authority
In the mid-2010s, there was an industry-wide race to improve camera image quality year over year. This, in turn, proved to be a key catalyst for the adoption of machine learning. Manufacturers realized that the technology could help bridge the gap between smartphones and dedicated cameras, even if the former had inferior hardware.
To this end, almost every major tech company began making its chips more efficient at machine learning-related tasks. By 2017, Qualcomm, Google, Apple, and Huawei had all released SoCs or smartphones with dedicated machine learning accelerators. In the years since, smartphone cameras have improved across the board, particularly in terms of dynamic range, noise reduction, and low-light photography.
More recently, manufacturers like Samsung and Xiaomi have found novel use cases for the technology. The former’s Single Take feature, for example, uses machine learning to automatically create a high-quality album from a single 15-second video clip. Xiaomi’s use of the technology, meanwhile, has evolved from merely detecting objects in the camera app to swapping out the entire sky if you so desire.
By 2017, almost every major tech company had started making its chips more efficient at machine learning-related tasks.
Many Android OEMs now also use on-device machine learning to automatically tag faces and objects in your smartphone’s gallery, a feature previously offered only by cloud-based services like Google Photos.
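The exact models vary from OEM to OEM, but Google’s ML Kit gives a sense of how this kind of on-device tagging works. Here is a short sketch using its image labeling API; the confidence threshold is an arbitrary choice for illustration:

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeling
import com.google.mlkit.vision.label.defaults.ImageLabelerOptions

// Sketch: tag a single gallery photo with on-device labels such as "Dog" or "Beach".
fun tagPhoto(bitmap: Bitmap, onTagged: (List<String>) -> Unit) {
    val options = ImageLabelerOptions.Builder()
        .setConfidenceThreshold(0.7f) // arbitrary cutoff for this example
        .build()
    val labeler = ImageLabeling.getClient(options)
    val image = InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0)

    labeler.process(image)
        .addOnSuccessListener { labels -> onTagged(labels.map { it.text }) }
        .addOnFailureListener { onTagged(emptyList()) }
}
```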
Of course, machine learning on smartphones extends far beyond photography. It’s safe to say that text-based applications have been around for just as long, if not longer.
SwiftKey was perhaps the first to use a neural network for better keyboard predictions, back in 2015. The company claimed it had trained its model on millions of sentences to better understand the relationships between words.
Another milestone came a few years later, when Android Wear 2.0 (now Wear OS) gained the ability to predict relevant responses to incoming chat messages. Google later branded the feature Smart Reply and brought it into the mainstream with Android 10. You most likely take it for granted every time you reply to a message from your phone’s notification shade.
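Google also exposes an on-device Smart Reply model to third-party developers through ML Kit. It isn’t necessarily the exact model behind Android’s notification replies, but it illustrates the idea; the sample message and user ID below are made up:

```kotlin
import com.google.mlkit.nl.smartreply.SmartReply
import com.google.mlkit.nl.smartreply.SmartReplySuggestionResult
import com.google.mlkit.nl.smartreply.TextMessage

// Sketch: suggest replies to the latest message in a conversation, entirely on the device.
fun suggestReplies(onResult: (List<String>) -> Unit) {
    val conversation = listOf(
        // "friend-1" is a placeholder user ID for this example.
        TextMessage.createForRemoteUser(
            "Are we still on for lunch tomorrow?", System.currentTimeMillis(), "friend-1"
        )
    )

    SmartReply.getClient().suggestReplies(conversation)
        .addOnSuccessListener { result ->
            if (result.status == SmartReplySuggestionResult.STATUS_SUCCESS) {
                onResult(result.suggestions.map { it.text })
            } else {
                onResult(emptyList()) // e.g. the conversation language isn't supported
            }
        }
        .addOnFailureListener { onResult(emptyList()) }
}
```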
Voice and AR: Tougher nuts to crack
While on-device machine learning is mature in text prediction and photography, speech recognition and computer vision are two areas that are still seeing significant and impressive improvements every few months.
Take, for example, Google’s instant camera translation feature, which overlays a real-time translation of foreign-language text directly onto your live camera feed. While the results aren’t as accurate as the online equivalent, the feature is more than usable for travelers on a limited data plan.
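Google hasn’t published the exact pipeline behind the feature, but ML Kit’s on-device text recognition and translation APIs let any app build a rough equivalent. Here is a sketch, with the Spanish-to-English language pair chosen arbitrarily:

```kotlin
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

// Sketch: recognize text in a camera frame and translate it on the device.
fun translateFrame(frame: InputImage, onTranslated: (String) -> Unit) {
    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    val translator = Translation.getClient(
        TranslatorOptions.Builder()
            .setSourceLanguage(TranslateLanguage.SPANISH)
            .setTargetLanguage(TranslateLanguage.ENGLISH)
            .build()
    )

    recognizer.process(frame)
        .addOnSuccessListener { visionText ->
            // downloadModelIfNeeded() fetches a compact language pack once; after that,
            // translation works fully offline.
            translator.downloadModelIfNeeded()
                .continueWithTask { translator.translate(visionText.text) }
                .addOnSuccessListener { translated -> onTranslated(translated) }
        }
}
```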
High-fidelity body tracking is another futuristic-sounding AR feature that can be achieved with powerful on-device machine learning. Imagine the LG G8’s Air Motion gestures, but infinitely smarter and for broader applications like exercise tracking and sign language interpretation.
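That future isn’t entirely out of reach, either. ML Kit’s on-device pose detection API, for instance, can already track body landmarks in a live camera feed. A quick sketch of analyzing a single frame:

```kotlin
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.pose.PoseDetection
import com.google.mlkit.vision.pose.defaults.PoseDetectorOptions

// Sketch: detect body landmarks (wrists, elbows, knees, ...) in a camera frame, on the device.
fun trackBody(frame: InputImage, onLandmarks: (Map<Int, Pair<Float, Float>>) -> Unit) {
    val options = PoseDetectorOptions.Builder()
        .setDetectorMode(PoseDetectorOptions.STREAM_MODE) // optimized for live video
        .build()

    PoseDetection.getClient(options).process(frame)
        .addOnSuccessListener { pose ->
            // Map each landmark type to its position in the frame.
            onLandmarks(pose.allPoseLandmarks.associate {
                it.landmarkType to (it.position.x to it.position.y)
            })
        }
}
```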
As for speech, voice recognition and dictation have both been around for well over a decade at this point. However, it wasn’t until 2019 that smartphones could handle them completely offline. For a quick demo, look no further than Google’s Recorder app, which uses on-device machine learning to automatically transcribe speech in real time. The transcription is stored as editable text and is searchable too – a boon for journalists and students.
The same technology also powers Live Caption, an Android 10 (and higher) feature that automatically generates captions for any media playing on your phone. Besides serving as an accessibility tool, it can come in handy when you’re trying to decipher an audio clip in a noisy environment.
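Neither Recorder’s nor Live Caption’s models are exposed to third-party apps, but since Android 12 the platform has offered an explicitly on-device recognizer through the standard SpeechRecognizer API. A minimal sketch, assuming the RECORD_AUDIO permission has been granted and the device ships the required speech models:

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer

// Sketch: transcribe speech without any network round trip (requires Android 12 or higher).
fun startOfflineDictation(context: Context, onText: (String) -> Unit) {
    val recognizer = SpeechRecognizer.createOnDeviceSpeechRecognizer(context)
    recognizer.setRecognitionListener(object : RecognitionListener {
        override fun onResults(results: Bundle?) {
            results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                ?.firstOrNull()?.let(onText)
        }
        // Remaining callbacks omitted for brevity.
        override fun onReadyForSpeech(params: Bundle?) {}
        override fun onBeginningOfSpeech() {}
        override fun onRmsChanged(rmsdB: Float) {}
        override fun onBufferReceived(buffer: ByteArray?) {}
        override fun onEndOfSpeech() {}
        override fun onError(error: Int) {}
        override fun onPartialResults(partialResults: Bundle?) {}
        override fun onEvent(eventType: Int, params: Bundle?) {}
    })
    recognizer.startListening(Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH))
}
```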
While these are certainly exciting features in their own right, there are also several ways they can evolve going forward. Improved speech recognition could, for example, enable faster interactions with virtual assistants, even for people with atypical accents. While Google Assistant can process voice commands on the device, this functionality is unfortunately exclusive to the Pixel lineup. Still, it offers a glimpse into the future of the technology.
Personalization: The Next Frontier for On-Device Machine Learning?
The vast majority of machine learning applications today rely on pre-trained models that are generated ahead of time on powerful hardware. Running inference on such a pre-trained model – for example, generating a contextual Smart Reply on Android – takes only a few milliseconds.
Right now, developers train a single model and distribute it to every phone that needs it. This one-size-fits-all approach, however, fails to account for each user’s preferences. Nor can it be fed new data collected over time. As a result, most models are relatively static and only receive updates every now and then.
To solve these problems, the model training process needs to move from the cloud to individual smartphones – a huge feat given the performance gap between the two platforms. Nevertheless, doing so would allow a keyboard app, for example, to tailor its predictions specifically to your typing style. Going a step further, it could even take other contextual cues into account, such as your relationships with the people you’re talking to.
Currently, Google’s Gboard uses a mix of on-device and cloud-based training (called federated learning) to improve prediction quality for all users. However, this hybrid approach has its limitations. For example, Gboard can predict your next likely word, but not whole sentences tailored to your individual habits and past conversations.
A still-unrealized idea that SwiftKey envisioned for its keyboard back in 2015
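To make the distinction concrete, here is a deliberately simplified toy sketch of the federated averaging idea (not Gboard’s actual implementation): each phone trains its local copy of the model on its own data and shares only the resulting weight deltas, which a server then averages into the next global model.

```kotlin
// A toy "model" is just a vector of weights.
typealias Weights = DoubleArray

// On-device step: take a few gradient steps on private, local data and return only the delta.
fun localUpdate(
    global: Weights,
    localGradient: (Weights) -> Weights, // computed from data that never leaves the phone
    steps: Int = 5,
    lr: Double = 0.1
): Weights {
    var w = global.copyOf()
    repeat(steps) {
        val g = localGradient(w)
        w = DoubleArray(w.size) { i -> w[i] - lr * g[i] }
    }
    return DoubleArray(w.size) { i -> w[i] - global[i] } // only this delta is uploaded
}

// Server step: average the deltas from many devices into the next global model.
fun federatedAverage(global: Weights, deltas: List<Weights>): Weights =
    DoubleArray(global.size) { i -> global[i] + deltas.map { it[i] }.average() }
```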
It’s imperative that this kind of individualized training happen on the device, as the consequences of sending sensitive user data (such as keystrokes) to the cloud would be catastrophic. Apple acknowledged as much when it announced Core ML 3 in 2019, which for the first time allowed developers to retrain existing models with new data on the device. Even then, however, the bulk of the model must initially be trained on powerful hardware.
On Android, this type of iterative model retraining is best represented by the Adaptive Brightness feature. Ever since Android Pie, Google has used machine learning to “watch a user’s interactions with the screen brightness slider” and retrain a model tailored to each individual’s preferences.
On-device training will evolve in new and exciting ways.
With this feature enabled, Google claimed a noticeable improvement in Android’s ability to predict the right screen brightness within just a week of normal smartphone interaction. I didn’t realize how well the feature worked until I moved from a Galaxy Note 8 with Adaptive Brightness to the newer LG Wing, which amazingly only includes the older “automatic” brightness logic.
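Google hasn’t published the details of that model, but the basic idea can be sketched in a few lines: treat every manual correction of the slider as a fresh training example and nudge a tiny personal model that maps ambient light to your preferred brightness. The linear fit below is made up purely for illustration:

```kotlin
import kotlin.math.ln

// Conceptual sketch (not Google's actual Adaptive Brightness model): a tiny personal model
// that learns your preferred brightness for a given ambient light level.
class PersonalBrightnessModel(private var slope: Double = 0.5, private var bias: Double = 0.1) {

    // Predict a brightness level between 0.0 and 1.0 from the ambient light sensor (lux),
    // using a log scale because perceived brightness is roughly logarithmic.
    fun predict(ambientLux: Double): Double =
        (slope * ln(1.0 + ambientLux) / 10.0 + bias).coerceIn(0.0, 1.0)

    // Every manual slider correction is a training example: nudge the model toward the choice.
    fun onUserCorrection(ambientLux: Double, chosenBrightness: Double, lr: Double = 0.05) {
        val x = ln(1.0 + ambientLux) / 10.0
        val error = chosenBrightness - (slope * x + bias)
        slope += lr * error * x
        bias += lr * error
    }
}
```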
It’s pretty clear why on-device training has so far been limited to a few simple use cases. Aside from the obvious compute, battery, and power constraints of smartphones, there aren’t many training techniques or algorithms designed for the purpose.
While this unfortunate reality won’t change overnight, there are several reasons to be optimistic about the next decade of ML on mobile. As both technology giants and developers focus on ways to improve user experience and privacy, on-device training will continue to evolve in new and exciting ways. Maybe then we can finally call our phones smart in the truest sense of the word.