CS Majors Win 'Best Hack for Health' with App for the Visually Impaired
A team of first-year Georgia Tech computer science (CS) majors has created an AI-powered app that empowers visually impaired people to lead more independent lives.
JARVIS (yes, the team members are big Iron Man fans) uses open-source AI tools, machine learning frameworks, and computer vision techniques to give users a richer, more thorough understanding of their immediate surroundings.
CS majors Arnav Chintawar, Dhruv Roongta, and Sahibpreet Singh developed JARVIS as their entry for Cal Hacks 10.0, a hackathon organized by a University of California, Berkeley student group. During the in-person event in late October, which drew more than 2,000 participants, the team won multiple awards, including Best Hack for Health, Best Use of Zilliz, and Best Use of GitHub.
The multifunctional app acts much like a personal assistant for users. It can be integrated with a smartwatch and can:
- Recognize and interpret a person’s environment, offering detailed scene descriptions.
- Read text and make recommendations.
- Recognize friends and family.
Along with these capabilities, JARVIS can perceive, interpret, and describe the non-verbal cues of people near the user.
A visually impaired family member inspired the team to develop JARVIS. Along with volunteering with local organizations, the team consulted advocates from the Center for the Visually Impaired to better understand the difficulties the community faces and how the app could help.
“We set out to bridge the accessibility gap for blind and visually impaired individuals by giving them unprecedented situational awareness of their surroundings. We hope that JARVIS can improve the quality of life for this community,” said Roongta.
The team started by making JARVIS easy to use. Responding to spoken queries, JARVIS can, for example, help a user meet a friend for dinner: the app describes the restaurant’s general setting and layout, estimates how many people are present, and notes what they are doing.
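The article does not name the vision models behind these scene descriptions. As a rough sketch only, a similar description could be assembled from off-the-shelf components, for example an image-captioning model for the overall setting and an object detector for counting people; the specific models below are assumptions, not the team’s confirmed stack.

```python
# Sketch of a scene-description step using off-the-shelf Hugging Face pipelines
# (BLIP for captioning, DETR for detection); not the team's actual pipeline.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
detector = pipeline("object-detection", model="facebook/detr-resnet-50")

def describe_scene(image_path: str) -> str:
    # One-sentence caption of the overall setting.
    caption = captioner(image_path)[0]["generated_text"]
    # Count confidently detected people to approximate how busy the room is.
    people = [d for d in detector(image_path) if d["label"] == "person" and d["score"] > 0.7]
    return f"{caption}. Roughly {len(people)} people are visible."

print(describe_scene("restaurant.jpg"))  # hypothetical photo of the room
```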
“We used a speech-to-text and text-to-speech model similar to Siri or Alexa to ensure JARVIS would be easy to access and seem familiar to users,” said Chintawar.
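The quote above does not specify which models were used. A minimal sketch of such a voice loop, assuming the open-source Whisper model for speech-to-text and the pyttsx3 library for spoken responses (illustrative choices, not confirmed ones), might look like this:

```python
# Minimal voice-query loop, assuming Whisper for speech-to-text and pyttsx3
# for text-to-speech; the article does not confirm the team's actual models.
import whisper
import pyttsx3

stt_model = whisper.load_model("base")   # open-source speech-to-text
tts_engine = pyttsx3.init()              # offline text-to-speech

def transcribe_query(audio_path: str) -> str:
    # Convert the user's spoken request into text for downstream handling.
    return stt_model.transcribe(audio_path)["text"].strip()

def speak(response: str) -> None:
    # Read the app's answer back to the user.
    tts_engine.say(response)
    tts_engine.runAndWait()

query = transcribe_query("user_request.wav")  # hypothetical recorded query
speak(f"You asked: {query}")
```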
The user can then ask JARVIS to scan the room for friends, family, or others, drawing on a database of images the user has uploaded. Once it recognizes someone, the app says their name and where they are in the room. The team estimates that the identity classification model behind this feature is about 95% accurate.
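The team’s “Best Use of Zilliz” award suggests face embeddings stored in a vector database, but the exact pipeline is not described. The sketch below shows the general idea using the open-source face_recognition library with a simple in-memory lookup; the file names and matching threshold are assumptions.

```python
# Sketch of recognizing a known person in a camera frame with the
# face_recognition library; in the real app, embeddings would more likely be
# stored in a vector database (per the Zilliz award) than in a Python dict.
import face_recognition

# Encodings built once from photos the user uploads (hypothetical file path).
known = {
    "Alex": face_recognition.face_encodings(
        face_recognition.load_image_file("uploads/alex.jpg"))[0],
}

def find_people(frame_path: str) -> list[str]:
    frame = face_recognition.load_image_file(frame_path)
    locations = face_recognition.face_locations(frame)
    encodings = face_recognition.face_encodings(frame, locations)
    width = frame.shape[1]
    announcements = []
    for (top, right, bottom, left), encoding in zip(locations, encodings):
        for name, ref in known.items():
            if face_recognition.compare_faces([ref], encoding, tolerance=0.6)[0]:
                # Rough directional cue from the face's horizontal position.
                center = (left + right) / 2
                if center < width / 3:
                    where = "to your left"
                elif center > 2 * width / 3:
                    where = "to your right"
                else:
                    where = "straight ahead"
                announcements.append(f"{name} is {where}.")
    return announcements

print(find_people("room.jpg"))
```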
JARVIS can then analyze the friend’s facial expressions as the pair chats before ordering, conveying the detected emotions through audio descriptions or haptic pulses on the user’s smartwatch. The pulses vary in intensity with the strength of the emotion the app observes.
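The mapping from detected emotion to pulse strength is not spelled out in the article. One simple way to scale pulses with the intensity of a detected emotion is sketched below; `send_pulse` is a hypothetical placeholder for whatever smartwatch haptics API the app targets.

```python
# Sketch of converting a detected emotion into a haptic cue; the emotion
# detector is assumed to return a label plus a confidence score in [0, 1].
EMOTION_BASE_STRENGTH = {"happy": 0.3, "surprised": 0.5, "sad": 0.6, "angry": 0.8}

def send_pulse(strength: float, repeats: int) -> None:
    # Hypothetical placeholder: a real app would call the watch's vibration API.
    print(f"vibrate strength={strength:.2f} x{repeats}")

def convey_emotion(label: str, confidence: float) -> None:
    # Scale a per-emotion base strength by the model's confidence, so stronger
    # expressions produce more noticeable pulses.
    base = EMOTION_BASE_STRENGTH.get(label, 0.4)
    strength = min(1.0, base * (0.5 + confidence))
    send_pulse(strength, repeats=2 if confidence >= 0.8 else 1)

convey_emotion("happy", confidence=0.92)
```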
When the user is ready to review the menu, they can ask JARVIS to list the appetizers or the vegetarian options. This capability pairs optical character recognition with the team’s text-to-speech model, allowing the app to read out relevant items and make recommendations.
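The article does not say which OCR engine the team used; the sketch below assumes Tesseract via pytesseract and reduces the recommendation step to a simple keyword filter on the recognized text.

```python
# Sketch of the menu-reading step, assuming Tesseract OCR via pytesseract; the
# keyword filter is a stand-in for the app's actual recommendation logic.
import pytesseract
from PIL import Image

def read_menu_section(photo_path: str, keyword: str) -> list[str]:
    # Extract all text from a photo of the menu.
    text = pytesseract.image_to_string(Image.open(photo_path))
    # Keep only lines mentioning the requested category, e.g. "vegetarian".
    return [line.strip() for line in text.splitlines() if keyword.lower() in line.lower()]

for item in read_menu_section("menu.jpg", "vegetarian"):
    print(item)  # in the app, each line would be passed to the text-to-speech model
```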
“This project has broadened our technical knowledge and instilled in us a profound sense of empathy and a commitment to enhancing the lives of visually impaired individuals,” said Singh.
Building on its success, the team is pushing JARVIS forward with an eye toward future entrepreneurial competitions. Planned upgrades include extending compatibility to a broader range of wearable computing devices and developing more robust description capabilities.
“We look forward to participating in the 2024 Georgia Tech InVenture Prize competition with an improved version of JARVIS. This will likely include customizing the vision model and fine-tuning it on custom data,” said Roongta.
Additional details about the technologies behind JARVIS and the team’s development approach are available on the team’s Cal Hacks 10.0 hackathon development site.