Thinking about the future is a frustrating experience. The future mocks us, dangling the carrot of technological advancement on a long stick of development torture. And no matter how far we march towards the future, the carrot is always tantalisingly yet infuriatingly just beyond our reach.
Every now and again though, we make a breakthrough. We outfox the future and grab hold of the carrot. I believe we have just seized a huge carrot in Search and are on the brink of something monumental that will change the way we think about our industry.
Developments in machine learning and artificial intelligence mean that the world of Search is finally moving beyond keywords. Our machines are becoming more human, with the ability to process and recognise visual stimuli. What was once science fiction is now scientific reality. Put simply, we are teaching machines to see.
What makes a chair, a chair?
As a master's graduate in philosophy I have spent many long hours discussing the metaphysical merits of what makes a chair a chair. However, philosophy aside, in terms of visual search this is a very real problem that we have to grapple with. How do you give a search engine the parameters to decide what is a chair and what isn't? How do we distil the essence of "chairness" into something an algorithm can recognise? This sounds crazy until you try to describe what a chair is without simply pointing at one.
A quick Bing search will show you that the dictionary definition for chair is “a separate seat for one person, typically with a back and four legs”. But clearly there are so many varieties of chair that this is an inadequate description for a machine to go on.
In my kitchen, for example, I have chairs with four legs and a back, just like the dictionary definition, but my office chair has one central column with five spider legs branching off it. Meanwhile my colleague uses a kneeling posture chair that has neither legs nor a back. And yet, as humans, we can divine fairly easily that these are all chairs, implements for sitting on, without confusing them with other sitting accessories like sofas. The complexity of defining rules for what constitutes even a simple object like a chair is mind-boggling.
Now imagine having to construct a set of rules and definitions not just for chairs, or indeed just for chairs and tables and other furniture but for every object in the world. That is the task of the visual search engine. And believe me it is no small thing.
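To see why hand-written rules collapse so quickly, here is a toy sketch (my own illustration, not anything Bing actually does) that turns the dictionary definition above into a rule and tests it against the three chairs just described:

```python
def looks_like_a_chair(obj):
    """Naive rule built straight from the dictionary definition:
    'a separate seat for one person, typically with a back and four legs'."""
    return obj["seats"] == 1 and obj["has_back"] and obj["legs"] == 4

# Three real chairs, reduced to made-up attributes for illustration
kitchen_chair = {"seats": 1, "has_back": True, "legs": 4}
office_chair = {"seats": 1, "has_back": True, "legs": 5}    # central column, five spider legs
kneeling_chair = {"seats": 1, "has_back": False, "legs": 0}  # no legs, no back

for name, obj in [("kitchen", kitchen_chair),
                  ("office", office_chair),
                  ("kneeling", kneeling_chair)]:
    print(name, looks_like_a_chair(obj))
# Only the kitchen chair passes, even though all three are obviously
# chairs to a human. Every patch to the rule invites a new exception.
```

And that is one rule, for one object, with three counter-examples already. Scaling that approach by hand is hopeless, which is exactly why machine learning takes over.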
Click Consult’s Carmen Jones on her kneeling chair
The breakthrough moment – Project Adam
There is a team based at Microsoft HQ in Redmond, Washington who have attempted to do the impossible: to categorise every known object in the world. They've used the power of Bing in their quest to catalogue the billions of images that we index, and then used the machine learning algorithms within the Bing search engine to learn millions of visual patterns and connections.
What they came up with was Project Adam, a machine which could recognise the breed of any given dog just by taking a photo and then running that image against our vast catalogue of known data to make a correct match. The technology and pattern recognition in Adam is phenomenal and in 2014 we ran this demo to show how visual recognition could really work for a machine.
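The core idea of matching a new photo against a catalogue of known examples can be sketched very simply. This is a heavily simplified illustration of my own: real systems like Project Adam learn rich feature vectors with deep neural networks over billions of images, whereas the vectors and breed labels below are invented toy data:

```python
import math

# Toy "catalogue": each known breed is represented by a feature vector.
# In a real system these vectors come from a learned model, not by hand.
catalogue = {
    "labrador": [0.9, 0.1, 0.3],
    "poodle": [0.2, 0.8, 0.5],
    "dachshund": [0.4, 0.3, 0.9],
}

def closest_match(features):
    """Return the catalogue label whose vector is nearest (Euclidean distance)."""
    return min(catalogue, key=lambda label: math.dist(catalogue[label], features))

# Pretend this vector came from running a photo through the feature extractor
photo_features = [0.85, 0.15, 0.25]
print(closest_match(photo_features))  # → labrador
```

The magic is not in the matching step, which is almost trivial, but in learning feature vectors good enough that visually similar things land close together.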
We’ve now taken that same technology to the next level and created something that I think is close to magic, because what we’ve done is to allow the blind to see…
Giving vision to the blind
The extrapolation of Adam's pattern recognition and visual identification is to move beyond simply recognising dogs and to give an individual spatial awareness of what is happening around them. This has some obvious applications for the blind and partially sighted.
Microsoft’s mission is to empower every individual and organisation on the planet to achieve more. That mission only works if we can truly empower every individual, including those with disabilities. This technology is being used to allow the blind to see the world around them.
The video shows some of the ways this technology allows blind people like Saqib to become more independent: reading menus, navigating busy streets, or even just describing what is happening around them. This technology is incredibly empowering, allowing people to do more and achieve more in their everyday lives.
If you want to try the visual recognition software for yourself, why not upload a photo to CaptionBot and see how the engine describes what is in your image? By doing so you will be making the algorithm smarter and be an active contributor to driving this innovation further.
The evolution of search
It’s tempting to see this new technology as remarkably cool but not really Search. Anything shiny and futuristic is bound to jar with our established concepts and expectations, so whilst we might appreciate the potential of something like visual recognition, when we go back to our desks Search remains a keyword-driven entity.
The world is changing though, and innovations like this are just the beginning. We have to stop thinking of Search as a box and 10 blue links. The evolution of Search is a world where the box doesn’t exist any more, and Search encompasses voice, image, context and myriad other things. Search is the intelligence fabric that will bind everything together, and the future beyond the box is exceptionally exciting.
Our one-day Benchmark Search Conference returns on 12th July 2016 at Manchester’s Bridgewater Hall. As well as James Murray, it features industry experts from Google, Late Rooms and Vodafone. Last year’s event was awesome, but we’re confident this year will be even better. It’s free to attend, so sign-up today!