To my surprise, Google Goggles actually launched last night, less than 12 hours after I posted about it yesterday. I’ve just spent a while playing around with it on my Android handset. Search times are, as expected, well over one second, in the anticipated 5-10 second range. Good to see that even Google can’t break the laws of physics. The app shows a pretty-but-pointless image analysis animation to make the wait seem shorter, almost exactly like my tongue-in-cheek suggestion from yesterday.
The engine covers all the easy verticals (books, DVDs, logos, landmarks, products, text, etc.). The recognition quality is very good, though the landing pages are often not much use. It will take some time living with it to see how useful it is as a tool rather than a tech demo.
The major worry is that it may end up being too broad-but-shallow. For example, they do wine recognition, but the landing pages are generic. Perhaps visual wine recognition would be better built into Snoot or some other dedicated iPhone wine app. Or Google could take the route Bing recently took with recipes, and build rich landing pages for each vertical. Because of the nature of current visual search technology, Goggles is essentially a number of different vertical searches glued together, so this is more feasible than it would be for web search.
The tubes are a-buzz with some news that Google are working on a mobile visual search system. No surprise there really, but it is interesting to see the current state of their prototype. The info comes from a CNBC documentary called Inside Google. A clip is up on YouTube, and the relevant section starts three minutes in.
The main thing that surprised me is that the interviewer mentioned response times of less than a second. I find that somewhat unlikely, and I think it’s probably a case of a journalist misunderstanding some fine (but crucial) distinctions.
In terms of the actual visual search engine, there’s no problem. Our own visual search engine can serve searches in a couple of hundred milliseconds, even for millions of images. The issue is transmitting the user image to the server over the mobile phone network. Cell networks are still pretty slow, and have high latency even when transmitting a single byte. Even after aggressive compression of the image and after playing some other fancy tricks, we see typical user waiting times between 4 and 5 seconds over a typical 3G connection. On a 2G connection, the system is basically unusably slow. I strongly suspect the Google app will perform similarly. In many cases it’s still much faster than pecking away on an on-screen keyboard, though because the user is waiting, it can feel longer. It would almost be worth giving the user a pointless maths puzzle to solve and telling them it’s powering the recognition; they would probably be happier!
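To make the point about where the waiting time goes a bit more concrete, here is a rough back-of-envelope sketch in Python. Only the search time of a couple of hundred milliseconds comes from the figures above. The image size, uplink speed, round-trip time and number of round trips are illustrative assumptions, not measurements, and the function name is made up for this example.

```python
def estimated_wait_seconds(image_bytes, uplink_bits_per_s, round_trip_s,
                           round_trips, server_search_s):
    """Rough user-visible wait: upload time + network round trips + search."""
    upload_s = image_bytes * 8 / uplink_bits_per_s   # time to push the image up
    network_overhead_s = round_trips * round_trip_s  # connection setup, request, reply
    return upload_s + network_overhead_s + server_search_s


# All values below are assumptions for illustration, except the search time,
# which matches the "couple of hundred milliseconds" quoted in the post.
SERVER_SEARCH_S = 0.2          # server-side visual search
IMAGE_BYTES = 40_000           # assumed size of a heavily compressed photo
UPLINK_BITS_PER_S = 128_000    # assumed effective 3G uplink
ROUND_TRIP_S = 0.5             # assumed cell network round-trip latency
ROUND_TRIPS = 3                # assumed number of round trips per search

total = estimated_wait_seconds(IMAGE_BYTES, UPLINK_BITS_PER_S, ROUND_TRIP_S,
                               ROUND_TRIPS, SERVER_SEARCH_S)
search_share = SERVER_SEARCH_S / total

print(f"estimated wait: {total:.1f} s")
print(f"share spent in the search engine: {search_share:.0%}")
```

With these assumed numbers the search itself accounts for only a few percent of the wait; nearly all of it is upload time and network latency, which is why a faster search engine alone would not get anywhere near one second.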
In any case, the Google demo is an interesting, if not unexpected, development in the visual search space. While we’re waiting to see the final version of Google Visual Search, Android users can try out our PlinkArt app, which does visual search for famous paintings. It’s live right now!