Ocularis

Shaaran Lakshminarayanan

Chennai, Tamil Nadu

A virtual assistant for the visually impaired – a containerized and unified solution to tackle some of the daily challenges faced by millions worldwide who have complete/partial visual impairment.

Project status: Published/In Market

Networking, Internet of Things, Artificial Intelligence

Intel Technologies
DevCloud

Code Samples [1]

Overview / Usage

A central promise of technology and all its advancements is to bring about a better world by solving the inherent problems of human society and its individuals. The predominance of smart devices is so obvious that it is now impossible to imagine a world without them. However, there appears to be a lack of consideration for accessibility in contemporary technology, with the industry heavily biased towards people without impairments of any kind. Nevertheless, factors such as a smaller market size and weak representation in government and industry are no longer viable justifications for this pervasive bias. Apart from non-technical impediments, there are also major technical challenges in providing these people with high-quality services on par with natural human capabilities. However, with the advent of new computing milestones such as machine learning and computer vision, we believe we are in a unique era of human history in which these challenges have become easier to surmount. One major area where we see the potential to capitalize on these technologies and make a difference is visual impairment.

Visual impairment is a decreased ability to see, to a degree that causes problems not fixable by usual means such as glasses [1]. Ordinary tasks therefore become difficult, making those affected less independent. Simple things such as being aware of one's immediate surroundings and recognizing who one is conversing with become exceedingly difficult. We believe we can address these issues using Microsoft Cognitive Services, and eventually Windows IoT Core cognitive capabilities, along with a cheap device we call ‘Ocularis’ (meaning ‘eye’ in Latin) that we intend to design and build. Ocularis is an intelligent digital assistant that understands its user's requests via voice commands and provides thoughtful spoken responses. Ocularis tries to answer questions by gathering information from its integrated knowledge, from the internet, or from the environment it sees through a high-quality built-in camera. The concept for Ocularis is a simple one, with the aim of making it as seamless as possible to integrate into daily life. Hence, we want to build the device as a wearable necklace that provides a good viewpoint for the camera while minimizing interference with the user's activity. Essentially, the device is intended to be a third eye for the user, making tasks, simple or complex, much easier to deal with.

Methodology / Approach

The heart of Ocularis is a Python application hosted on Windows IoT Core. We have also developed a UWP version of the application, available in the repository; however, that platform is still under development. The software drives a lightweight wearable device which is meant to be worn as a necklace around the user's neck. Ocularis consists of a camera, an audio I/O module, Wi-Fi, Bluetooth, and cellular modules, a low-consumption processor, and RAM. These modules enable the device to interact with its user via voice and to delegate processing to the cloud. The variety of connectivity modules ensures that Ocularis stays connected to the internet whenever a connection is available. A high-quality internet connection is crucial to some functionality because Ocularis depends heavily on Microsoft Azure Cognitive Services, such as Computer Vision and several Bing APIs, to deliver its features.

Naturally, the internet connection introduces latency and uncertainty. We have taken two measures to mitigate this problem. First, Ocularis tries to answer the user's request from the Ocularis Internal Knowledge Base, which is sufficient for basic needs such as asking for the time, date, or calendar, or anything accessible from the user's phone such as messages, emails, and contacts. Second, we minimize the number of requests made to Azure Cognitive Services by using a server that acts as a Mediator between Ocularis and Azure. Without this Mediator, serving some requests would require several round trips between Ocularis and Azure Cognitive Services; with it in place, Ocularis sends a single request to the Mediator, which then handles the required conversation with Azure. We have reduced the number of API calls further by keeping pre-loaded audio files, used consistently throughout interactions with Ocularis, in a static audio folder. In addition to the Internal Knowledge Base, the device has a built-in Internal Image Processor that alerts the user when it detects a potential danger of falling, drifting off a sidewalk, or approaching an obstacle. This functionality is not meant to replace guide dogs but to complement them, affording a higher level of independence than previous arrangements. It should be noted that existing depth and obstacle detection approaches do not guarantee 100% accuracy, but we aim to implement and train the model that provides the highest confidence and the most deterministic behavior. This processing must be done on the device and cannot be delegated to the cloud, because a near real-time alerting system is needed for reliability. This feature is, and will remain, under active development and will be released to users in beta phases with accuracy warnings.
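
As a rough illustration of this flow, the sketch below checks the local knowledge base first and falls back to a single Mediator call. The endpoint, payload shape, and helper names are illustrative assumptions, not the actual Ocularis code.

```python
# Minimal sketch of the request flow described above: the internal knowledge
# base is consulted first, and only unanswered requests are forwarded to the
# Mediator in a single call. Endpoint and field names are placeholders.
import datetime
import requests

MEDIATOR_URL = "https://example-mediator.local/api/query"  # placeholder endpoint

def answer_from_internal_knowledge(command):
    """Handle basic requests (time, date, ...) without any network call."""
    if "time" in command:
        return datetime.datetime.now().strftime("It is %H:%M.")
    if "date" in command:
        return datetime.date.today().strftime("Today is %A, %B %d.")
    return None  # not answerable locally

def handle_command(command):
    local_answer = answer_from_internal_knowledge(command)
    if local_answer is not None:
        return local_answer
    # One request to the Mediator; the Mediator handles the back-and-forth
    # with Azure Cognitive Services on the device's behalf.
    response = requests.post(MEDIATOR_URL, json={"command": command}, timeout=10)
    response.raise_for_status()
    return response.json().get("spoken_response", "Sorry, I could not find an answer.")
```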

Lastly, to make conversations between the user and Ocularis feel intelligent and pleasant, Ocularis attaches a context to each user request and responds based on this new context and the previous contexts that have changed the state of the conversation. In other words, Ocularis does not interpret the user's commands in isolation; it takes the conversation's context and state into consideration. For example, when a user asks “Who is the current prime minister of Canada?” and Ocularis notifies the user that it has found 5 relevant pieces of information, Ocularis initiates a state machine (context) for that request-response pair. So, when the user asks follow-up questions such as “give me all your findings”, “give me the first relevant one”, or “repeat it again please”, Ocularis knows how to respond according to the state of the conversation. The Ocularis State Manager is responsible for transitioning through the possible set of states and even for memorizing the states of previous requests. For example, a user can ask “who was in front of me?” after getting the information about the Canadian Prime Minister, and since the device answered “who is in front of me?” just before the Prime Minister question, it can respond quickly without sending a request to Azure Cognitive Services. Moreover, the Ocularis State Manager helps with navigating Ocularis's internal settings.
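
The sketch below illustrates this state-machine idea in minimal Python. The class names and the follow-up matching are simplified assumptions for illustration, not the actual Ocularis State Manager.

```python
# Illustrative sketch of conversational state: each answered request becomes a
# remembered context, so follow-ups such as "repeat it again" or "give me all
# your findings" can be served without a new cloud call.
class ConversationState:
    """One remembered request/response pair plus the findings attached to it."""
    def __init__(self, question, findings):
        self.question = question
        self.findings = list(findings)
        self.last_spoken = self.findings[0] if self.findings else ""

class StateManager:
    """Keeps the conversation history so follow-ups can be answered locally."""
    def __init__(self):
        self.history = []

    def remember(self, question, findings):
        self.history.append(ConversationState(question, findings))

    def handle_follow_up(self, command):
        if not self.history:
            return None
        current = self.history[-1]
        if "repeat" in command:
            return current.last_spoken
        if "all" in command:
            return " ".join(current.findings)
        if "first" in command:
            current.last_spoken = current.findings[0]
            return current.last_spoken
        return None  # not a follow-up; treat as a brand-new request

# Example: a follow-up is answered from memory, without a cloud request.
manager = StateManager()
manager.remember("Who is the current prime minister of Canada?",
                 ["Finding 1 ...", "Finding 2 ..."])
print(manager.handle_follow_up("repeat it again please"))
```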

Technologies Used

Windows IoT Core. We decided to use Windows IoT Core because it is free and its application development model boosts our productivity in comparison to other existing development models. Windows IoT fosters development by providing a rich set of features such as Windows Machine Learning, Windows IoT Edge, a fully compatible speech synthesizer with SSML, and a fully compatible speech recognizer with SRGS grammars. Unlike its Linux counterparts, this operating system is designed for IoT projects from the ground up and is not a full-blown operating system, which we believe results in more efficient power consumption. Lastly, since one of our future goals is to integrate Cortana into Ocularis, we believe Windows IoT is a viable choice for our requirements.

Azure Cognitive Services: Computer Vision. This service enables Ocularis to describe the environment in front of the user and to recognize celebrities in a picture. Furthermore, this service can extract text and handwriting from images so Ocularis can read them to the user.
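
A minimal sketch of how a captured camera frame could be sent to the Computer Vision describe operation is shown below. The endpoint, key, and API version are placeholders; the real values come from the Azure portal, and the response shape may differ between service versions.

```python
# Sketch: send a camera frame to the Computer Vision "describe" operation so
# Ocularis can speak a caption of the scene. Endpoint and key are placeholders.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
SUBSCRIPTION_KEY = "<computer-vision-key>"                        # placeholder

def describe_scene(image_bytes):
    url = ENDPOINT + "/vision/v3.2/describe"
    headers = {
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        "Content-Type": "application/octet-stream",
    }
    response = requests.post(url, headers=headers, data=image_bytes, timeout=10)
    response.raise_for_status()
    captions = response.json()["description"]["captions"]
    if not captions:
        return "I could not describe what is in front of you."
    return captions[0]["text"]

# Usage: pass the raw bytes of a camera frame.
# with open("frame.jpg", "rb") as f:
#     print(describe_scene(f.read()))
```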

Azure Cognitive Services: Face. This service provides a detailed description of the people in a picture. The Face service gives us the gender, age, emotion, accessories, and several other appearance attributes of the detected faces. Moreover, it can recognize familiar faces and return their names. It also returns the exact location of each face in the picture, and Ocularis leverages this information to associate a face description with its location in the frame. For example, it can say “the man on the right seems angry but the one on the left seems calm”.
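
The sketch below shows, under the same placeholder-endpoint assumptions, how the detect operation's face rectangles and attributes could be turned into a positional description like the example above. Attribute availability depends on the Azure resource's access policy, so this is an illustration rather than the project's actual code.

```python
# Sketch: detect faces, sort them left-to-right by their bounding rectangle,
# and build a spoken description such as "The person on the right looks angry."
import requests

FACE_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
FACE_KEY = "<face-api-key>"                                            # placeholder

def describe_faces(image_bytes):
    url = FACE_ENDPOINT + "/face/v1.0/detect"
    params = {"returnFaceAttributes": "age,gender,emotion"}
    headers = {
        "Ocp-Apim-Subscription-Key": FACE_KEY,
        "Content-Type": "application/octet-stream",
    }
    faces = requests.post(url, params=params, headers=headers,
                          data=image_bytes, timeout=10).json()
    # Sort left-to-right using the face rectangle so each description can be
    # tied to a position in the frame.
    faces.sort(key=lambda f: f["faceRectangle"]["left"])
    parts = []
    for i, face in enumerate(faces):
        attrs = face["faceAttributes"]
        emotions = attrs["emotion"]
        dominant = max(emotions, key=emotions.get)
        if len(faces) == 1:
            position = "in front of you"
        elif i == 0:
            position = "on the left"
        elif i == len(faces) - 1:
            position = "on the right"
        else:
            position = "in the middle"
        parts.append("The person %s looks %s, around %d years old."
                     % (position, dominant, int(attrs["age"])))
    return " ".join(parts) if parts else "I do not see anyone in front of you."
```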

Azure Cognitive Services: Speech to Text. If we cannot determine the user's intent on the device, we forward the user's request to the Speech to Text service via the Mediator. In addition to this main functionality, the Bing Speech API can pass the extracted text to LUIS in order to detect the user's actual intent, so we can obtain both the transcribed text and the detected intent from a single request.
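
A minimal sketch of this fallback path follows, assuming a hypothetical Mediator endpoint that performs the Speech to Text and LUIS calls and returns both the transcript and the intent. The endpoint and response fields are illustrative only.

```python
# Sketch: when the on-device grammar cannot resolve the intent, forward the raw
# audio once to the Mediator, which is assumed to call Speech to Text and LUIS
# and return both the transcript and the detected intent.
import requests

MEDIATOR_SPEECH_URL = "https://example-mediator.local/api/speech-intent"  # placeholder

def resolve_intent_in_cloud(audio_bytes):
    response = requests.post(
        MEDIATOR_SPEECH_URL,
        headers={"Content-Type": "audio/wav"},
        data=audio_bytes,
        timeout=15,
    )
    response.raise_for_status()
    # Assumed response shape: {"text": "...", "intent": "...", "entities": [...]}
    return response.json()
```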

Repository

https://github.com/gaurav-karna/Ocularis
