Be My Eyes uses GPT-4 to transform visual accessibility
Be My Eyes announces new tool powered by OpenAI’s GPT-4 to improve accessibility for people who are blind or have low vision
Be My Eyes, the mobile app that allows anyone to assist visually impaired people through live video calls, today announced Virtual Volunteer™, the first-ever digital visual assistant powered by OpenAI’s new GPT-4 language model.
The new tool integrates OpenAI’s latest generative AI technology, with the goal of providing an unprecedented level of accessibility and power to the 253 million people who are blind or have low vision globally.
Be My Eyes: connecting people
Since 2012, Be My Eyes has been creating technology for the community of over 250 million people who are blind or have low vision. The Danish startup connects people who are blind or have low vision with volunteers for help with hundreds of daily life tasks like identifying a product or navigating an airport.
How the App Works
When a blind or visually impaired user requests assistance through the app, Be My Eyes sends a notification to suitable volunteers, matching them by language and time zone. The first volunteer to respond is connected to that user and receives a live video feed from the rear camera of the user’s phone, while the accompanying audio connection lets the two work through the task together.
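The dispatch step described above can be pictured with a short, purely illustrative sketch in Python; the `Volunteer` type, its fields, and the selection rule are invented for this example, since Be My Eyes has not published its matching logic:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Volunteer:
    name: str
    languages: List[str]   # languages the volunteer speaks
    tz_offset: int         # UTC offset in hours
    is_available: bool     # currently willing to take calls

def pick_candidates(volunteers: List[Volunteer],
                    user_language: str,
                    user_tz_offset: int,
                    max_tz_difference: int = 3) -> List[Volunteer]:
    """Return volunteers who share the user's language and sit in a
    nearby time zone, so the call is likely to be answered quickly."""
    return [
        v for v in volunteers
        if v.is_available
        and user_language in v.languages
        and abs(v.tz_offset - user_tz_offset) <= max_tz_difference
    ]

# The app would notify all candidates at once; whoever accepts first
# is connected to the user's rear camera and microphone.
```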
With the new visual input capability of GPT-4, Be My Eyes began developing a GPT-4-powered Virtual Volunteer™ within the Be My Eyes app that can provide the same level of context and understanding as a human volunteer.
The Virtual Volunteer
The Virtual Volunteer feature will be integrated into the existing app, adding a dynamic new image-to-text generator powered by GPT-4.
How does it work?
- Users send images through the app to the AI-powered Virtual Volunteer.
- The Virtual Volunteer answers questions about that image and provides instantaneous, conversational visual assistance for a wide variety of tasks (a minimal request sketch follows this list).
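As a rough illustration of that request flow, here is a minimal sketch using the OpenAI Python SDK’s chat-completions endpoint with image input; the model name, prompt, and image URL are placeholders, and the actual Virtual Volunteer integration has not been made public:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_about_image(image_url: str, question: str) -> str:
    """Send a single image plus a question and return the model's answer."""
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # placeholder model name
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content

# Hypothetical usage:
# print(ask_about_image("https://example.com/label.jpg",
#                       "What product is this, and what are the instructions?"))
```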
Framing the context: the fridge example
What sets the Virtual Volunteer tool apart from other image-to-text technology available today is the context it provides, with highly nuanced explanations and conversational abilities not yet seen in the digital assistant field. For example, if a user sends a picture of the inside of their refrigerator, the Virtual Volunteer will not only correctly identify the items within, but also extrapolate and analyze what can be prepared with those ingredients. The tool can then offer a number of recipes using those ingredients and send a step-by-step guide for preparing them.
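To make the conversational, multi-turn side of this concrete, here is a hedged sketch of how follow-up questions could reuse the same chat history; again, the model name, prompts, and URL are illustrative assumptions rather than Be My Eyes’ actual implementation:

```python
from openai import OpenAI

client = OpenAI()
messages = []  # running chat history, so each answer builds on the last

def ask(content):
    """Append a user turn, get the model's reply, and keep both in history."""
    messages.append({"role": "user", "content": content})
    reply = client.chat.completions.create(
        model="gpt-4-vision-preview",  # placeholder model name
        messages=messages,
        max_tokens=400,
    )
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

# First turn: the fridge photo plus an open question.
ask([
    {"type": "text", "text": "What ingredients are in my fridge?"},
    {"type": "image_url",
     "image_url": {"url": "https://example.com/fridge.jpg"}},  # placeholder URL
])

# Follow-ups reuse the history, so the model can reason about its own answers.
print(ask("What could I cook with those ingredients?"))
print(ask("Give me step-by-step instructions for the first recipe."))
```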
The use cases are almost unlimited.
This new feature promises not only to better support the blind and low-vision community through the Be My Eyes app, but also to offer businesses a way to serve their customers better by prioritizing accessibility.
Tests to be run soon
Be My Eyes plans to begin beta testing the feature with corporate customers in the coming weeks and to make it broadly available later this year as part of the company’s Accessible CX™ offering.
“In the short time we’ve had access, we have seen performance unmatched by any image-to-text object recognition tool out there,” says Michael Buckley, CEO of Be My Eyes. “The implications for global accessibility are profound. In the not so distant future, the blind and low vision community will utilize these tools not only for a host of visual interpretation needs, but also to have a greater degree of independence in their lives.”
“That’s game changing,” says Buckley. “Ultimately, whatever the user wants or needs, they can re-prompt the tool to get more information that is usable, beneficial and helpful, nearly instantly.”
The company has already shared a case in which a user navigated the railway system, a task that can confound sighted travelers as well, getting not only details about where they were located on the map, but point-by-point instructions on how to safely reach where they wanted to go.
GPT-4: a breakthrough
The difference between GPT-4 and other language and machine learning models, explains Jesper Hvirring Henriksen, CTO of Be My Eyes, is both the ability to have a conversation and the greater degree of analytical prowess offered by the technology. “Basic image recognition applications only tell you what’s in front of you”, he says. “They can’t have a discussion to understand if the noodles have the right kind of ingredients or if the object on the ground isn’t just a ball, but a tripping hazard—and communicate that.”
Fixing the digital difficulties
Traversing the complicated physical world is only half the story. Understanding what’s on a screen can be twice as arduous for a person who isn’t sighted. Screen readers, embedded in most modern operating systems, read through the pieces of a web page or desktop application line by line, section by section, speaking each word. Images, which carry so much of the communication on the web, can be even worse: without descriptive alt text, a screen reader has little useful to say about them.
At Be My Eyes, they can show GPT-4 a webpage, and the system, having learned through countless hours of training in which deep learning algorithms build up the relationships that identify the “important” parts of a page, knows which parts to read or summarize.
This not only simplifies tasks like reading the news online, but also grants people who need visual assistance access to some of the most cluttered pages on the web: shopping and e-commerce sites. GPT-4 can summarize search results the way sighted people naturally scan them, skipping the minor details and jumping between the important data points, and help those who need sight support make the right purchase in real time.
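What that summarization step might look like, assuming the page text has already been extracted (for example by a screen reader or scraper), is sketched below; the prompt wording and model name are illustrative assumptions, not Be My Eyes’ actual pipeline:

```python
from openai import OpenAI

client = OpenAI()

def summarize_results(page_text: str, shopping_goal: str) -> str:
    """Ask the model to read a cluttered results page on the user's behalf
    and report only the details that matter for the stated goal."""
    prompt = (
        f"I am shopping for: {shopping_goal}.\n"
        "Below is the text of a search results page. Summarize the top "
        "options, with prices and key differences, in a few short sentences "
        "suitable for reading aloud.\n\n" + page_text
    )
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```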
This is a fantastic development for humanity, but it also represents an enormous commercial opportunity.