Speaker "Anirudh Koul" Details
Name
Anirudh Koul
Company
Microsoft
Designation
Data Scientist
Topic
How advances in deep learning can empower the blind community
Abstract
Motivated by making technology more accessible, we’ll look at how deep learning can enrich image understanding, which in turn can enable the blind community to experience and interact with the physical world more holistically than has previously been possible. The intersection of vision and language is a ripe area of research and, fueled by advances in deep learning, is shaping the future of artificial intelligence. After tracing how computer vision has evolved and outlining cutting-edge research in this area, we’ll explore object recognition, image captioning, visual question answering, and emotion recognition. Using a 152-layer neural network, we first discuss the successes and pitfalls of object recognition. Going beyond object classification, we then attempt to understand objects in context (as well as their relationships) and describe them in a sentence. We conclude by examining the exciting area of visual question answering, which enables blind users to get answers to questions about their surroundings. We also briefly cover Microsoft’s Cognitive Services, a set of machine-learning APIs for vision, speech, face, and emotion recognition that make it straightforward for developers to integrate state-of-the-art image understanding into their own applications. By the end of the session, you’ll have developed an intuition for what works and what doesn’t, understand the practical limitations that arise during development, and know how to apply these techniques in your own applications.