Progress at Willow Garage

Just came across this new video of the Willow Garage PR2 robot. They’re making rapid progress. When they reach their goal of distributing these platforms to research groups around the world, it will be a good day for robotics. One neat package that comes out of the box with many different near-state-of-the-art capabilities. Right now every research group is independently re-creating platforms from scratch, and it’s a huge obstacle to progress.
If you haven’t heard of Willow Garage, I have an overview here.


Update: Another new video, celebrating two successive days of autonomous runs.


Amazon Remembers

When it rains, it pours. Amazon is joining the visual search party, but with a twist.

Today Amazon released an iPhone app with a feature called “Amazon Remembers”. You take a picture of some product you’re interested in and the app uploads it to Amazon. You can then revisit your list later from a browser, e.g. to buy the item. The interesting bit is that Amazon attempts to generate a link to the product page based on your image. Examples here. This isn’t instant, and may take anywhere from five minutes to 24 hours.

Fascinatingly, the back end is apparently not computer vision based. It uses Mechanical Turk, where Amazon is paying people $0.10 per image to generate the links. See here. This is quite amazing to me. I have no idea if Mechanical Turk is deep enough to support this app if it truly becomes popular, but I suppose in that case Amazon could set up a dedicated call-centre type operation to handle the image recognition.

So, visual search companies now have a very direct measure of the value of a correct image search. Judging by the current set of images on Mechanical Turk, a fully automated solution is not possible. However, a hybrid system where some easy categories like books are recognised automatically and harder cases are farmed out to Mechanical Turk would clearly translate into significantly lower costs.
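To put rough numbers on that hybrid idea, here is a back-of-the-envelope sketch. The $0.10 per image is the Mechanical Turk price quoted above. Everything else is an assumption for illustration: the fractions of images handled automatically are invented, and the per-image cost of automated recognition is taken as zero by default. None of these figures come from Amazon.

```python
TURK_COST_PER_IMAGE = 0.10  # dollars per image, the Mechanical Turk price quoted above


def hybrid_cost_per_image(auto_fraction, auto_cost=0.0):
    """Expected cost per image when a fraction of queries is recognised
    automatically and the rest is farmed out to Mechanical Turk.

    auto_fraction: share of images handled automatically (hypothetical).
    auto_cost: per-image cost of the automated path (assumed, default zero).
    """
    if not 0.0 <= auto_fraction <= 1.0:
        raise ValueError("auto_fraction must be between 0 and 1")
    return auto_fraction * auto_cost + (1.0 - auto_fraction) * TURK_COST_PER_IMAGE


if __name__ == "__main__":
    # Hypothetical shares of "easy" images such as books.
    for fraction in (0.0, 0.25, 0.5):
        cost = hybrid_cost_per_image(fraction)
        print(f"{fraction:.0%} automated: ${cost:.3f} per image")
```

Under these assumptions the saving is simply proportional to the share of images that never reach a human, which is why even a recogniser limited to books would be worth having.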
(Of course it’s possible Amazon are already doing this, though I did see several books in the Mechanical Turk requests).

Nokia Point and Find

It seems I missed something fairly major in my round-up of mobile visual search companies last week. Nokia have a serious project in the area called Point and Find. You can see a demo here. From MSearchGroove:

“Nokia is committed to owning the visual search space and has committed a staff of 30 to build the business and further develop the technology. The business area has the buy-in of Nokia senior execs and ‘quite large’ funding from the company.”

The technology comes from the acquisition of a Valley startup called Pixto just over a year ago. Nokia’s service is apparently due for launch soon, initially recognising movie posters only.

A Round-Up of Mobile Visual Search Companies

This post is an overview of all the companies developing mobile image recognition search engines. For people to whom that means nothing, see these posts. My robotics readers will have to forgive me for my sudden obsession with this topic, but I work on visual search engines for my PhD, and I’m giving serious thought to joining this industry in one form or another in the next six months, so I’m rather excited to see commercial applications taking off.

Silicon Valley Comes to Oxford

I’ll be at Silicon Valley Comes to Oxford all day today. This has been an excellent event in previous years, and there is a strong line-up again this year. Anyone interested in visual search technology, do come say hello.

Update: I heard some great talks today and met lots of interesting people during coffee. Chris Sacca’s pitch workshop was especially good. No bullshit. The most valuable thing was the perspective – all those bits of knowledge that are obvious from the inside but very hard to come by from the outside. And of course hearing Elon Musk was just fantastic.

For those people who were interested in our lab’s visual search engine, there’s an online demo here (scroll down to where it says Real-time Demo). The demo is actually of some older results from about a year ago by a colleague of mine. Things have gotten even better since then.

Computer Vision in the Elastic Compute Cloud

In a datacenter somewhere on the other side of the planet, a rack-mounted computer is busy hunting for patterns in photographs of Oxford.  It is doing this for 10 cents an hour, with more RAM and more horsepower than I can muster on my local machine. This delightful arrangement is made possible by Amazon’s Elastic Compute Cloud.

For the decreasing number of people who haven’t heard of EC2, it’s a pretty simple idea. Via a simple command line interface you can “create” a server running in Amazon’s datacenter. You pick a hardware configuration and OS image, send the request and voilà – about 30 seconds later you get back a response with the IP address of the machine, to which you now have root access and sole use.  You can customize the software environment to your heart’s content and then save the disk image for future use. Of course, now that you can create one instance you can create twenty. Cluster computing on tap.

This is an absolutely fantastic resource for research. I’ve been using it for about six months now, and have very little bad to say about it. Computer vision has an endless appetite for computation. Most groups, including our own, have their own computing cluster but demand for CPU cycles typically spikes around paper deadlines, so having the ability to instantly double or triple the size of your cluster is very nice indeed.

Amazon also have some hi-spec machines available. I recently ran into trouble where I needed about 10GB of RAM for a large learning job. Our cluster is 32-bit, so 4GB RAM is the limit. What might have been a serious headache was solved with a few hours and $10 on Amazon EC2.

The one limitation I’ve found is that disk access on EC2 is a shared resource, so bandwidth to disk tends to be about 10MB/s, as opposed to say 70MB/s on a local SATA hard drive. Disk bandwidth tends to be a major factor in running time for very big out-of-core learning jobs. Happily, Amazon very recently released a new service called Elastic Block Store which offers dedicated disks, though the pricing is a little hard to figure out.
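To see why this matters for out-of-core jobs, here is a quick calculation of how long it takes just to stream a dataset off disk. The two bandwidth figures are the ones quoted above. The 100GB dataset size and the ten passes over it are hypothetical numbers picked for illustration, and the sketch ignores caching and any overlap between computation and disk reads.

```python
def hours_to_stream(dataset_gb, bandwidth_mb_per_s, passes=1):
    """Hours spent reading a dataset from disk at a sustained bandwidth.

    Uses 1 GB = 1000 MB and assumes every pass re-reads the whole dataset.
    """
    seconds = passes * dataset_gb * 1000.0 / bandwidth_mb_per_s
    return seconds / 3600.0


if __name__ == "__main__":
    DATASET_GB = 100  # hypothetical dataset size
    PASSES = 10       # hypothetical number of passes made by the learning job

    for label, bandwidth in (("EC2 shared disk", 10), ("local SATA drive", 70)):
        hours = hours_to_stream(DATASET_GB, bandwidth, PASSES)
        print(f"{label} at {bandwidth}MB/s: {hours:.1f} hours of pure disk reads")
```

Whatever the dataset size, the shared disk spends seven times as long on reads as the local drive, so a job that is I/O-bound locally becomes even more so on EC2.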

I should mention that for UK academics there is a free service called National Grid, though personally I’d rather work with Amazon.

Frankly, the possibilities opened up by EC2 just blow my mind. Every coder in a garage now potentially has access to Google-level computation. For tech startups this is a dream. More traditional companies are playing too. People have been talking about this idea for a long time, but it’s finally here, and it rocks!

Update: Amazon are keen to help their scientific users. Great!

An Insider’s Guide to BigDog

In common with half of YouTube, I was mesmerized by the BigDog videos from Boston Dynamics earlier in the year, though I couldn’t say much about how the robot worked. For everyone hungry for some more technical details, check out the talk by Marc Raibert at Carnegie Mellon’s Field Robotics 25 event. There’s some interesting discussion of the design of the system, where it’s headed, and more great video.

There are a bunch of other worthwhile talks from the event. I particularly enjoyed Hugh Durrant-Whyte’s description of building a fully automated container terminal “without a graduate student in 1000km”.

A Simple Thing Done Perfectly

I’ve been blown away by Dropbox. It’s such a simple thing – online storage easily shared between different computers. The concept is simple, but there are so many ways to do it wrong.  With Dropbox, the execution is pretty near perfect.

Amazon AWS has made scaling so easy that great little tools like this are suddenly popping up everywhere.

Hat tip: Andy Davison

Snaptell Explorer – First Impressions

I finally got a chance to try out SnapTell Explorer, and I have to say that I’m impressed. Almost all of the books and CDs I had lying around were correctly recognised, despite being pretty obscure titles. With 2.5 million objects in their index, SnapTell can recognise just about any book, CD, DVD or game. Once the title is recognised, you get back a result page like this with a brief review and the option to buy it on Amazon, or search Google, Yahoo or Wikipedia. For music, there is a link to iTunes.

I spent a while “teasing”  the search engine with badly taken photos, and the recognition is very robust. It has no problems with blur, rotation, oblique views, background clutter or partial occlusion of the object. Below is a real recognition example:


I did find the service pretty slow, despite having a WiFi connection. Searches took about five seconds. I had a similar experience with kooaba earlier. There are published visual search algorithms that would let these services be as responsive as Google, so I do wonder what’s going on here. It’s possible the speed issue is somewhere else in the process, or possibly they’re using brute-force descriptor comparison to ensure a high recognition rate. For a compelling experience, they desperately need to be faster.

While the recognition rate was generally excellent, I did manage to generate a few incorrect matches. One failure mode is where multiple titles have similar cover design (think “X for Dummies”) – a picture of one title randomly returns one of the others. I saw a similar problem with a CD mismatching to another title because both featured the same record company logo. Another failure mode that might be more surprising to people who haven’t worked with these systems was mismatching on font. A few book searches returned completely unrelated titles which happened to use the same font on the cover. This happened particularly when the query image had a very plain cover, so there was no other texture to disambiguate it. The reason this can happen is that the search method relies on local shape information around sets of interest points, rather than attempting to recognise the book title as a whole by OCR.
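The font failure is easy to reproduce in a toy model. I don’t know what SnapTell actually runs, so treat this as a sketch of the general visual-words idea from the published literature rather than their method: each cover is reduced to a set of quantised local features, and a query votes for every title that shares a feature with it. The titles and feature names below are entirely made up.

```python
from collections import Counter, defaultdict


def build_inverted_index(database):
    """Map each visual word to the set of titles whose cover contains it."""
    index = defaultdict(set)
    for title, words in database.items():
        for word in words:
            index[word].add(title)
    return index


def rank_matches(index, query_words):
    """Give each title one vote per distinct query word it shares,
    then return (title, votes) pairs with the most votes first."""
    votes = Counter()
    for word in set(query_words):
        for title in index.get(word, ()):
            votes[title] += 1
    return votes.most_common()


# Hypothetical covers: "font_*" words come from lettering in a shared typeface,
# "art_*" and "title_*" words come from features unique to one cover.
database = {
    "Plain Cover A": {"font_1", "font_2", "font_3", "title_a"},
    "Plain Cover B": {"font_1", "font_2", "font_3", "title_b"},
    "Busy Cover C": {"font_1", "art_1", "art_2", "art_3", "art_4"},
}
index = build_inverted_index(database)

if __name__ == "__main__":
    # Photo of a plain cover where only the shared font features were detected:
    # A and B collect the same number of votes, so the winner is arbitrary.
    print(rank_matches(index, ["font_1", "font_2", "font_3"]))

    # Photo of a textured cover: the artwork features settle the match.
    print(rank_matches(index, ["font_1", "art_1", "art_2"]))
```

The same lookup structure also explains the speed remark above: an inverted index only touches titles that share a feature with the query, whereas brute-force comparison has to look at every descriptor in the database.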

My overall impression, however, is that this technology is very much ready for prime time. It’s easy to see visual search becoming the easiest and fastest way to get information about anything around you.

If you haven’t got an iPhone, you can try it by sending a picture to [email protected].

SnapTell Explorer – Mobile Visual Search Heats Up

Well well. Hot on the heels of kooaba, competitor SnapTell just released an iPhone client for their visual search engine. A little sleuthing reveals that the index contains 2.5 million items – apparently most books, DVDs, CDs and game covers. If the recognition rate is as high as it should be, that’s a pretty impressive achievement. In principle the service was already available via email/MMS. In practice, an iPhone client changes the experience completely. Image search becomes the fastest way to get information about anything around you.

I really think this technology is going to take off big-time in the near future. The marketing intelligentsia are aware of this too. There is an adoption challenge, but SnapTell in particular are already running an excellent high-profile education/promotion campaign with print magazines. They’re not messing about either: the campaign is running in Rolling Stone, GQ, Men’s Health, ESPN, Wired and Martha Stewart Weddings. In short, publications that reach a substantial chunk of the reading public. Whether the message will carry over that the technology is good for more than signing up for free deodorant samples is something I’m a little skeptical about, but in the short term it’s a ready revenue stream for the startup, and a serious quantity of collateral publicity.

Usage report later when I can get hold of an iPhone.

Update: I just got to try it, and it’s really rather good. First impressions here.