Project Overview
Project Summary:
Sori Album is a gallery application designed to enhance visual accessibility for individuals
with visual impairments. Through Sori Album, users can upload or capture photos and
receive concise image descriptions generated by Google’s Gemini AI. These descriptions
are automatically saved with the corresponding photos, enabling screen readers to read
each caption as users browse their gallery.
For deeper insights, users can tap the "Detailed Image Descriptions" button, which
activates GPT-4o to provide a richer interpretation—covering not only the visual content but
also the mood, expressions, and contextual elements of the image. Additionally, by
selecting the "Scan Text" feature, users can extract and access printed text within images.
Once a photo is saved to the album, users can easily search for it, edit its description
themselves, and share the image with others or across different platforms.
Sori Album empowers visually impaired users to explore, organize, and take full ownership
of their visual content in a more meaningful and independent way.
Identifying the Challenge
The Social Problem:
Visually impaired individuals access information in digital environments using screen
readers, which convert text into speech. However, most images still lack appropriate
alternative text, leaving visually impaired users unable to perceive their content.
Although various legal regulations and accessibility guidelines have been introduced to
address this issue, they remain largely ineffective in practice.
This gap in accessibility was clearly revealed through interviews conducted by the Sigongan
team with over 150 visually impaired individuals. These interviews showed that most
image-based information in digital settings—such as photos uploaded to social media or
shared via messaging platforms like KakaoTalk—is inaccessible to visually impaired users.
Beyond the inability to perceive image content, they face additional challenges in
situations requiring image-based authentication, storing and retrieving important image
information, or sharing personal photos online. In such cases, visually impaired individuals
experience barriers due to the limitations of the digital environment.
Innovation and Uniqueness
Why Our Project Stands Out:
Sori Album is the first AI-powered gallery app designed exclusively for the visually impaired, enabling
users to independently access and manage their photos. Unlike existing services that provide brief,
one-time descriptions that disappear after viewing, Sori Album stores images alongside detailed
descriptions, allowing users to revisit and organize their visual memories effortlessly.
Leveraging advanced technologies—including Google's Gemini AI for initial captioning, GPT-4o for
in-depth contextual explanations, and NAVER's HyperCLOVA OCR for precise text extraction—Sori
Album offers comprehensive and meaningful access to visual information. Developed through
extensive interviews with over 200 blind individuals and adherence to accessibility guidelines, the
app's user-centered design addresses real-world needs and behaviors.
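The three-model stack above (Gemini for the default caption, GPT-4o for the detailed description, HyperCLOVA OCR for text extraction) amounts to routing each user request to the model suited for it. A minimal sketch of that routing, with the real model calls replaced by stubs and all names (DescriptionMode, describe) hypothetical:

```python
from enum import Enum
from typing import Callable, Dict

class DescriptionMode(Enum):
    CAPTION = "caption"  # Gemini: concise default caption
    DETAIL = "detail"    # GPT-4o: mood, expressions, context
    OCR = "ocr"          # HyperCLOVA OCR: printed text in the image

# Stubs standing in for the real model API calls.
def _gemini_caption(image: bytes) -> str:
    return "Concise caption (stub)"

def _gpt4o_detail(image: bytes) -> str:
    return "Detailed contextual description (stub)"

def _clova_ocr(image: bytes) -> str:
    return "Extracted printed text (stub)"

_BACKENDS: Dict[DescriptionMode, Callable[[bytes], str]] = {
    DescriptionMode.CAPTION: _gemini_caption,
    DescriptionMode.DETAIL: _gpt4o_detail,
    DescriptionMode.OCR: _clova_ocr,
}

def describe(image: bytes, mode: DescriptionMode = DescriptionMode.CAPTION) -> str:
    """Route an image to the model matching the user's request."""
    return _BACKENDS[mode](image)
```

In the app, the default caption runs automatically on save, while the detailed description and OCR correspond to the "Detailed Image Descriptions" and "Scan Text" buttons.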
By transforming passive image consumption into an active, engaging experience, Sori Album bridges
a critical gap in digital inclusivity, empowering visually impaired users to fully own and interact with
their visual content.
Insights and Development
Learning Journey:
Throughout the development of Sori Album, our team gained a deep understanding of how digital
exclusion impacts the visually impaired—especially regarding photo accessibility. Interviews with
over 200 blind users revealed a key insight: they don’t just want to “know” what’s in a photo—they
want to organize, revisit, and share it like sighted users. This shifted our focus from generating simple
descriptions to building a fully navigable gallery.
To achieve this, we optimized every interface for screen reader compatibility, studying how blind
individuals use smartphones in real-life contexts. We also applied image description guidelines to
craft captions that are context-rich and truly helpful. These insights led us to build not just an
accessible app, but a truly usable, user-centered one.
Development Process:
Through more than 150 interviews with visually impaired individuals, as well as feedback
from visually impaired developers, we created a straightforward, intuitive UI/UX design.
The screen layout was designed in Figma, and the app is built with Flutter. The backend
runs on Firebase, and the AI functionality is implemented in Python. After each
deployment, team members systematically walk through the app with screen readers to
review the user flow, checking that focus moves correctly through each button and that
every control is properly labeled. These details, while not critical for sighted users,
directly shape the experience of visually impaired users. Furthermore, we are continuing
to improve the AI-based alternative text generation model through prompt engineering,
using methodologies such as few-shot learning.
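A few-shot prompt of the kind mentioned above prepends example input/output pairs so the model imitates the desired caption style. The sketch below only shows prompt assembly; the example pairs and function name are invented for illustration, and the actual prompts used in Sori Album may differ.

```python
# Hypothetical few-shot examples pairing an image summary with the
# style of alternative text we want the model to produce.
FEW_SHOT_EXAMPLES = [
    ("a dog running on grass",
     "A golden retriever runs across a sunny lawn, ears flying."),
    ("a bowl of noodles on a table",
     "A steaming bowl of noodles with chopsticks resting on the rim."),
]

def build_alt_text_prompt(examples, instruction: str) -> str:
    """Assemble a few-shot prompt: instruction first, then the example
    pairs, then an open slot for the new image."""
    lines = [instruction, ""]
    for summary, alt_text in examples:
        lines.append(f"Image: {summary}")
        lines.append(f"Alt text: {alt_text}")
        lines.append("")
    lines.append("Image: <attached photo>")
    lines.append("Alt text:")
    return "\n".join(lines)

prompt = build_alt_text_prompt(
    FEW_SHOT_EXAMPLES,
    "Write one concise, concrete sentence of alternative text for the image.",
)
```

The examples anchor the model's tone and level of detail, which matters here because captions read aloud by a screen reader should be short, concrete, and free of filler.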