There’s an extensive post on the Android Developers Blog in which the team details the latest enhancement to the screen reader in the Android Accessibility Suite.
Today, thanks to Gemini Nano with multimodality, TalkBack automatically provides users with blindness or low vision more vivid and detailed image descriptions to better understand the images on their screen.
– Android Developers Blog, September 2024
TalkBack includes a feature that provides image descriptions when developers haven’t added descriptive alt text. Previously, this feature relied on a small machine learning model called Garcon, which generated brief, generic responses that often lacked specific details like landmarks or products.

The introduction of Gemini Nano with multimodal capabilities presented an ideal opportunity to enhance TalkBack’s accessibility features. Now, when users opt in on eligible devices, TalkBack leverages Gemini Nano’s multimodal capabilities to automatically deliver clear and detailed image descriptions in apps like Google Photos and Chrome, even when the device is offline or on an unstable network connection.
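For app developers, the first line of defense is still supplying a description yourself, so TalkBack never has to fall back to automatic captioning. In an Android layout that’s the standard `android:contentDescription` attribute; the drawable and string names below are illustrative:

```xml
<!-- TalkBack announces the contentDescription instead of
     generating an automatic image description. -->
<ImageView
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:src="@drawable/harbour_photo"
    android:contentDescription="@string/harbour_photo_description" />
```

Purely decorative images should instead set `android:importantForAccessibility="no"` so screen readers skip them entirely.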
Google’s team provides an example that illustrates how Gemini Nano improves image descriptions. First, Garcon is presented with a panorama of the Sydney, Australia shoreline at night – and it might read: “Full moon over the ocean”. Gemini Nano with multimodality, however, can paint a richer picture, with a description like: “A panoramic view of Sydney Opera House and the Sydney Harbour Bridge from the north shore of Sydney, New South Wales, Australia”. Sounds far better, right?
Utilizing an on-device model like Gemini Nano was the only practical solution for TalkBack to automatically generate detailed image descriptions, even when the device is offline.
The average TalkBack user comes across 90 unlabeled images per day, and those images weren’t as accessible before this new feature. The feature has gained positive user feedback, with early testers writing that the new image descriptions are a “game changer” and that it’s “wonderful” to have detailed image descriptions built into TalkBack.
– Lisie Lillianfeld, product manager at Google
When implementing Gemini Nano with multimodality, the Android accessibility team had to choose between inference verbosity and speed, a decision partly influenced by image resolution. Gemini Nano currently supports images at either 512 pixels or 768 pixels.
While the 512-pixel resolution generates the first token almost two seconds faster than the 768-pixel option, the resulting descriptions are less detailed. The team ultimately prioritized providing longer, more detailed descriptions, even at the cost of increased latency. To reduce the impact of this delay on the user experience, the tokens are streamed directly to the text-to-speech system, allowing users to begin hearing the response before the entire text is generated.
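The streaming idea can be sketched as a producer feeding a text-to-speech consumer phrase by phrase. This is a simplified illustration, not TalkBack’s actual pipeline: `generate_tokens` stands in for the model, and flushing every four tokens stands in for real clause-boundary detection.

```python
def generate_tokens(description):
    """Stand-in for the model: emit the description one token at a time."""
    for token in description.split():
        yield token

def stream_to_tts(token_iter, speak):
    """Forward tokens to a text-to-speech callback as they arrive,
    so speech can start before generation finishes."""
    buffer = []
    for token in token_iter:
        buffer.append(token)
        # Flush a phrase to TTS (simplified: every 4 tokens instead of
        # detecting clause boundaries).
        if len(buffer) >= 4:
            speak(" ".join(buffer))
            buffer.clear()
    if buffer:
        speak(" ".join(buffer))

# Usage: each flushed phrase would be queued on the TTS engine;
# here we just collect them to show the chunking.
spoken = []
stream_to_tts(
    generate_tokens("A panoramic view of Sydney Opera House and the Sydney Harbour Bridge"),
    spoken.append,
)
# spoken == ["A panoramic view of", "Sydney Opera House and", "the Sydney Harbour Bridge"]
```

The win is latency hiding: the user hears the first phrase after four tokens rather than waiting for the full description, which matters most with the slower 768-pixel setting.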
While I’m not fully on board the AI hype train yet, AI-powered features like this are stunning – just think of the potential! And then there are stories like this one that make you want to tone down this “wonderful” progress of ours.