To my surprise, Google Goggles actually launched last night, less than 12 hours after I posted about it yesterday. I’ve just spent a while playing around with it on my Android handset. Search times are, as expected, much more than one second, and fall in the anticipated 5-10 second range. Good to see that even Google can’t break the laws of physics. The app shows a pretty-but-pointless image analysis animation to make the wait seem shorter, almost exactly like my tongue-in-cheek suggestion from yesterday.
The engine covers all the easy verticals (books, DVDs, logos, landmarks, products, text, etc). The recognition quality is very good, though the landing pages are often a bit useless. It will take a bit of living with it to see how much use it is as a tool rather than a tech demo.
The major worry is that it may end up being too broad-but-shallow. For example, they do wine recognition, but the landing pages are generic. Perhaps visual wine recognition would be better built into Snoot or some other dedicated iPhone wine app. Or Google could take the route Bing recently took with recipes, and build rich landing pages for each vertical. Because of the nature of current visual search technology, Goggles is essentially a number of different vertical searches glued together, so this is more feasible than it would be for web search.
The tubes are a-buzz with some news that Google are working on a mobile visual search system. No surprise there really, but it is interesting to see the current state of their prototype. The info comes from a CNBC documentary called Inside Google. A clip is up on YouTube, and the relevant section starts three minutes in.
The main thing that surprised me is that the interviewer mentioned response times of less than a second. I find that somewhat unlikely, and I think it’s probably a case of a journalist misunderstanding some fine (but crucial) distinctions.
In terms of the actual visual search engine, there’s no problem. Our own visual search engine can serve searches in a couple of hundred milliseconds, even for millions of images. The issue is transmitting the user image to the server over the mobile phone network. Cell networks are still pretty slow, and have high latency even to transmit 1 byte. Even after aggressive compression of the image and after playing some other fancy tricks, we see typical user waiting times between 4 and 5 seconds over a typical 3G connection. On a 2G connection, the system is basically unusably slow. I strongly suspect the Google app will perform similarly. In many cases it’s still much faster than pecking away on an on-screen keyboard, though because the user is waiting, it can feel longer. It would almost be worth giving the user a pointless maths puzzle to solve and telling them it’s powering the recognition; they would probably be happier!
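To make that arithmetic concrete, here is a rough back-of-envelope sketch. Every number in the example call (a 30 KB compressed image, a 100 kbit/s effective uplink, 0.5 second round trips, three round trips per search) is an illustrative assumption of mine rather than a measurement. Only the 0.2 second server time echoes the "couple of hundred milliseconds" mentioned above.

```python
def estimated_wait_seconds(image_bytes, uplink_bits_per_s,
                           round_trip_s, round_trips, server_s):
    """Rough user-visible wait for one visual search over a cell network.

    All inputs are assumptions supplied by the caller; this is a toy
    model, not a description of any real app's network stack.
    """
    # Time to push the compressed image up the (slow) uplink.
    upload_s = image_bytes * 8 / uplink_bits_per_s
    # Fixed latency cost: connection setup, request, response.
    latency_s = round_trips * round_trip_s
    return upload_s + latency_s + server_s


# Hypothetical 3G-ish figures: 30 KB image, 100 kbit/s uplink,
# 0.5 s round trip, 3 round trips, 0.2 s spent searching on the server.
wait = estimated_wait_seconds(30_000, 100_000, 0.5, 3, 0.2)
print(f"estimated wait: {wait:.1f} s")
```

With these made-up inputs the search itself is about 5% of the wait; the rest is upload time and latency, which is why a faster server does not help much.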
In any case, the Google demo is an interesting, if not unexpected, development in the visual search space. While we’re waiting to see the final version of Google Visual Search, Android users can try out our PlinkArt app, which does visual search for famous paintings. It’s live right now!
So the visual search story of the day is that Amazon has acquired SnapTell. This is a really natural fit – SnapTell have solid technology, and Amazon are one of the best use cases. Not too surprised to hear the deal has been done – SnapTell has been conspicuously quiet for several months, and word was that they either had to exit or secure another funding round before the end of the year. So congratulations are in order to everyone at SnapTell on securing what seems like an ideal exit.
The big question now is how this changes the playing field for other companies in the visual search space. I would assume Amazon will move SnapTell’s focus away from their enhanced print advertising service and concentrate on image recognition for books, CDs, DVDs, etc. (Up to now, Amazon has been doing this with human-powered image recognition, which was nuts.) While this makes perfect sense for Amazon, it’s going to mean more rather than fewer opportunities for companies still focused on the general visual search market.
So I guess this is an ideal point to mention the open secret that I’m currently co-founding Plink, a new visual search engine similar in capability to SnapTell. While our demo shows some familiar use cases, we’re working on taking the technology in some entirely new directions. Visual search is very young, there’s a whole lot still to do! Anyone interested in visual search, feel free to contact me.
When it rains, it pours. Amazon is joining the visual search party, but with a twist.
Today Amazon released an iPhone app with a feature called “Amazon Remembers”. You take a picture of some product you’re interested in and the app uploads it to Amazon. You can then revisit your list later from a browser, e.g. to buy the item. The interesting bit is that Amazon attempts to generate a link to the product page based on your image. Examples here. This isn’t instant, and may take anywhere from five minutes to 24 hours.
Fascinatingly, the back end is apparently not computer vision based. It uses Mechanical Turk, where Amazon is paying people $0.10 per image to generate the links. See here. This is quite amazing to me. I have no idea if Mechanical Turk is deep enough to support this app if it truly becomes popular, but I suppose in that case Amazon could set up a dedicated call-centre type operation to handle the image recognition.
So, visual search companies now have a very direct measure of the value of a correct image search. Judging by the current set of images on Mechanical Turk, a fully automated solution is not possible. However, a hybrid system where some easy categories like books are recognised automatically and harder cases are farmed out to Mechanical Turk would clearly translate into significantly lower costs.
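A quick sketch of the cost argument. The $0.10 per image is the Mechanical Turk price quoted above. The fraction of images handled automatically and the (near zero) cost of an automated match are hypothetical parameters I've made up for illustration.

```python
def hybrid_cost(n_images, auto_fraction, human_cost=0.10, auto_cost=0.0):
    """Total labelling cost when some images are recognised automatically.

    auto_fraction: share of images handled by computer vision (assumed).
    human_cost:    price per image sent to Mechanical Turk ($0.10 above).
    auto_cost:     per-image cost of an automated match (assumed ~0).
    """
    if not 0.0 <= auto_fraction <= 1.0:
        raise ValueError("auto_fraction must be between 0 and 1")
    per_image = auto_fraction * auto_cost + (1.0 - auto_fraction) * human_cost
    return n_images * per_image


# Hypothetical: 1000 images, 40% of them easy cases such as book covers.
print(f"all human: ${hybrid_cost(1000, 0.0):.2f}")
print(f"hybrid:    ${hybrid_cost(1000, 0.4):.2f}")
```

The saving scales directly with whatever fraction of queries falls into the easy categories, so the interesting unknown is how big that fraction really is.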
(Of course it’s possible Amazon are already doing this, though I did see several books in the Mechanical Turk requests).
It seems I missed something fairly major in my round up of mobile visual search companies last week. Nokia have a serious project in the area called Point and Find. You can see a demo here. From MSearchGroove:
“Nokia is committed to owning the visual search space and has committed a staff of 30 to build the business and further develop the technology. The business area has the buy-in of Nokia senior execs and ‘quite large’ funding from the company”
The technology comes from an acquisition of a valley startup called Pixto just over a year ago. Nokia’s service is apparently due for launch soon, initially recognising movie posters only.
This post is an overview of all the companies developing mobile image recognition search engines. For people to whom that means nothing, see these posts. My robotics readers will have to forgive me for my sudden obsession with this topic, but I work on visual search engines for my PhD, and I’m giving serious thought to joining this industry in one form or another in the next six months, so I’m rather excited to see commercial applications taking off.
I finally got a chance to try out SnapTell Explorer, and I have to say that I’m impressed. Almost all of the books and CDs I had lying around were correctly recognised, despite being pretty obscure titles. With 2.5 million objects in their index, SnapTell can recognise just about any book, CD, DVD or game. Once the title is recognised, you get back a result page like this with a brief review and the option to buy it on Amazon, or search Google, Yahoo or Wikipedia. For music, there is a link to iTunes.
I spent a while “teasing” the search engine with badly taken photos, and the recognition is very robust. It has no problems with blur, rotation, oblique views, background clutter or partial occlusion of the object. Below is a real recognition example:
I did find the service pretty slow, despite having a WiFi connection. Searches took about five seconds. I had a similar experience with kooaba earlier. There are published visual search algorithms that would let these services be as responsive as Google, so I do wonder what’s going on here. It’s possible the speed issue is somewhere else in the process, or possibly they’re using brute-force descriptor comparison to ensure high recognition rate. For a compelling experience, they desperately need to be faster.
While the recognition rate was generally excellent, I did manage to generate a few incorrect matches. One failure mode is where multiple titles have a similar cover design (think “X for Dummies”) – a picture of one title randomly returns one of the others. I saw a similar problem with a CD mismatching to another title because both featured the same record company logo. Another failure mode that might be more surprising to people who haven’t worked with these systems was mismatching on font. A few book searches returned completely unrelated titles which happened to use the same font on the cover. This happened particularly when the query image had a very plain cover, so there was no other texture to disambiguate it. The reason this can happen is that the search method relies on local shape information around sets of interest points, rather than attempting to recognise the book title as a whole by OCR.
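Here is a toy illustration of the font failure mode. It is emphatically not SnapTell's algorithm, just a minimal sketch in which each cover is reduced to a set of integer "local feature" IDs and candidates are ranked by how many IDs they share with the query. The titles and IDs are invented.

```python
def shared_feature_votes(query_features, database):
    """Count how many local features each database cover shares with the query."""
    query = set(query_features)
    return {title: len(query & set(features))
            for title, features in database.items()}


# Hypothetical feature IDs: 100-102 come from the lettering (same font on
# both covers); the remaining IDs come from cover artwork.
database = {
    "plain_cover_book": [100, 101, 102, 1],
    "unrelated_book":   [100, 101, 102, 50, 51, 52],
}

# A photo of the plain cover where only the lettering was detected.
votes = shared_feature_votes([100, 101, 102], database)
print(votes)
```

Both titles collect the same number of votes, because the only features found in the query come from the shared font. With a richly textured cover, the artwork features would break the tie.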
My overall impression, however, is that this technology is very much ready for prime time. It’s easy to see visual search becoming the easiest and fastest way to get information about anything around you.
If you haven’t got an iPhone, you can try it by sending a picture to [email protected].
Well well. Hot on the heels of kooaba, competitor SnapTell just released an iPhone client for their visual search engine. A little sleuthing reveals that the index contains 2.5 million items – apparently most books, DVDs, CDs and game covers. If the recognition rate is as high as it should be, that’s a pretty impressive achievement. In principle the service was already available via email/MMS. In practice, an iPhone client changes the experience completely. Image search becomes the fastest way to get information about anything around you.
I really think this technology is going to take off big-time in the near future. The marketing intelligentsia are aware of this too. There is an adoption challenge, but SnapTell in particular are already running an excellent high profile education/promotion campaign with print magazines. They’re not messing about either: the campaign is running in Rolling Stone, GQ, Men’s Health, ESPN, Wired and Martha Stewart Weddings. In short, publications that reach a substantial chunk of the reading public. Whether the message will carry over that the technology is good for more than signing up for free deodorant samples is something I’m a little skeptical about, but in the short term it’s a ready revenue stream for the startup, and a serious quantity of collateral publicity.
Usage report later when I can get hold of an iPhone.
Update: I just got to try it, and it’s really rather good. First impressions here.
I’m still rather excited about yesterday’s kooaba launch. I’ve been thinking about how long this technology will take to break into the mainstream, and it strikes me that getting people to adopt it is going to take some work.
When people first started using the internet, the idea of search engines didn’t need much promotion. People were very clearly lost, and needed some tool to find the interesting content. Adopting search engines was reactive, rather than active.
Visual search is not like that. If kooaba or others do succeed in building a tool that lets you snap a picture of any object or scene and get information, well, people may completely ignore it. They’re not lost – visual search is a useful extra, not a basic necessity. The technology may never reach usage levels seen by search engines. That said, it’s clearly very useful, and I can see it getting mass adoption. It’ll just need education and promotion. Shazam is a great example of a non-essential search engine that’s very useful and massively popular.
So, promotion, and lots of it. What’s the best way? Well, most of the different mobile visual search startups are currently running trial campaigns involving competitions and magazine ads (for example this SnapTell campaign). Revenue for the startups, plus free public education on how to use visual search. Not a bad deal, easy to see why all the companies are doing it. The only problem is that it may get the public thinking that visual search is only about cheap promotions, not useful for anything real. That would be terrible for long-term usage. I rather prefer kooaba’s demo based on movie posters – it reinforces a real use case, plus it’s got some potential for revenues too.
Today kooaba released their iPhone client. It’s a visual search engine – you take a picture of something, and get search results. The YouTube clip below shows it in action. Since this is the kind of thing I work on all day long, I’ve got a strong professional interest. Haven’t had a chance to actually try it yet, but I’ll post an update once I can nab a friend with an iPhone this afternoon to give it a test run.
At the moment it only recognises movie posters. Basically its current form is more of a technology demo than something really useful. Plans are to expand to recognise other things like books, DVDs, etc. I think there’s huge potential for this stuff. Snap a movie poster, see the trailer or get the soundtrack. Snap a book cover, see the reviews on Amazon. Snap an ad in a magazine, buy the product. Snap a restaurant, get reviews. Most of the real world becomes clickable. Everything is a link.
The technology is very scalable – the internals use an inverted index just like normal text search engines. In my own research I’m working with hundreds of thousands of images right now. It’s probably going to be possible to index a sizeable fraction of all the objects in the world – literally take a picture of anything and get search results. The technology is certainly fast enough, though how the recognition rate will hold up with such large databases is currently unknown.
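For anyone who hasn't seen the inverted index idea applied to images, here is a minimal sketch. It assumes each image has already been turned into a list of integer "visual word" IDs (quantised local descriptors). That feature extraction step is not shown, and the class and image names are hypothetical. This is the general technique only, not kooaba's implementation.

```python
from collections import Counter, defaultdict


class VisualWordIndex:
    """Toy inverted index: visual word ID -> set of images containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, image_id, visual_words):
        # Record this image in the posting list of every word it contains.
        for word in set(visual_words):
            self.postings[word].add(image_id)

    def query(self, visual_words, top_k=3):
        # Only posting lists for words present in the query are touched,
        # so the cost depends on the query, not on the whole database.
        votes = Counter()
        for word in set(visual_words):
            for image_id in self.postings.get(word, ()):
                votes[image_id] += 1
        return votes.most_common(top_k)


index = VisualWordIndex()
index.add("poster_a", [1, 2, 3, 4])
index.add("poster_b", [3, 4, 5, 6])

# A noisy photo of poster_a: three real words plus one spurious one.
print(index.query([1, 2, 3, 9]))
```

A real system would weight words by rarity and check the geometry of the matches, but the lookup structure is the same one a text search engine uses, which is where the scalability comes from.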
My only question is – where’s the buzz, and why has it taken them so long?
Update: I gave the app a spin today on a friend’s iPhone, and it basically works as advertised. It was rather slow though – maybe 5 seconds per search. I’m not sure if this was a network issue (though the iPhone had a WiFi connection), or maybe kooaba got more traffic today than they were expecting. The core algorithm is fast – easily less than 0.2 seconds (and even faster with the latest GPU-based feature detection). I am sure the speed issue will be fixed soon. Recognition seemed fine, my friend’s first choice of movie was located no problem. A little internet sleuthing shows that they currently have 5363 movie posters in their database. Recognition shouldn’t be an issue until the database gets much larger.