Silicon Valley Comes to Oxford

I’ll be at Silicon Valley Comes to Oxford all day today. This has been an excellent event in previous years, and there is a strong line-up again this year. Anyone interested in visual search technology, do come say hello.

Update: I heard some great talks today and met lots of interesting people during coffee. Chris Sacca’s pitch workshop was especially good. No bullshit. The most valuable thing was the perspective – all those bits of knowledge that are obvious from the inside but very hard to come by from the outside. And of course hearing Elon Musk was just fantastic.

For those people who were interested in our lab’s visual search engine, there’s an online demo here (scroll down to where it says Real-time Demo). The demo is actually of some older results from about a year ago by a colleague of mine. Things have gotten even better since then.

Computer Vision in the Elastic Compute Cloud

In a datacenter somewhere on the other side of the planet, a rack-mounted computer is busy hunting for patterns in photographs of Oxford.  It is doing this for 10 cents an hour, with more RAM and more horsepower than I can muster on my local machine. This delightful arrangement is made possible by Amazon’s Elastic Compute Cloud.

For the decreasing number of people who haven’t heard of EC2, it’s a pretty simple idea. Via a command-line interface you can “create” a server running in Amazon’s datacenter. You pick a hardware configuration and OS image, send the request and voilà – about 30 seconds later you get back a response with the IP address of the machine, to which you now have root access and sole use. You can customize the software environment to your heart’s content and then save the disk image for future use. Of course, once you can create one instance you can create twenty. Cluster computing on tap.
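To make that concrete, here’s a minimal sketch of launching an instance from Python using the boto library rather than the command-line tools. The AMI ID, keypair name and instance type are placeholders, not real values, and the snippet assumes your AWS credentials are already set in the environment.

```python
# Launch a single EC2 instance and wait until it's running.
# Sketch only: the AMI ID, keypair and instance type below are placeholders.
import time

import boto

conn = boto.connect_ec2()  # reads AWS credentials from the environment

# Request one small instance running a saved machine image
reservation = conn.run_instances('ami-12345678',
                                 key_name='my-keypair',
                                 instance_type='m1.small')
instance = reservation.instances[0]

# Poll until the instance has booted, then print the address to ssh into
while instance.state != 'running':
    time.sleep(10)
    instance.update()

print('Root access at: %s' % instance.public_dns_name)
```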

This is an absolutely fantastic resource for research. I’ve been using it for about six months now, and have very little bad to say about it. Computer vision has an endless appetite for computation. Most groups, including our own, have their own computing cluster, but demand for CPU cycles typically spikes around paper deadlines, so the ability to instantly double or triple the size of your cluster is very nice indeed.

Amazon also have some higher-spec machines available. I recently ran into trouble when I needed about 10GB of RAM for a large learning job. Our cluster is 32-bit, so 4GB of RAM is the limit. What might have been a serious headache was solved with a few hours and $10 on Amazon EC2.

The one limitation I’ve found is that disk access on EC2 is a shared resource, so bandwidth to disk tends to be about 10 MB/s, as opposed to, say, 70 MB/s on a local SATA hard drive. Disk bandwidth tends to be a major factor in running time for very big out-of-core learning jobs. Happily, Amazon very recently released a new service called Elastic Block Store which offers dedicated disks, though the pricing is a little hard to figure out.
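For what it’s worth, getting one of those dedicated EBS disks is only a couple of calls – another rough boto sketch, where the volume size, availability zone, instance ID and device name are all made-up placeholders.

```python
# Create an Elastic Block Store volume and attach it to a running instance.
# Sketch only: the size, zone, instance ID and device name are placeholders.
import boto

conn = boto.connect_ec2()

# A 100 GB volume in the same availability zone as the instance
volume = conn.create_volume(100, 'us-east-1a')

# Attach it; the volume then shows up as a block device on the instance,
# ready to be formatted and mounted like a local disk.
conn.attach_volume(volume.id, 'i-12345678', '/dev/sdh')
```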

I should mention that for UK academics there is a free alternative called the National Grid Service, though personally I’d rather work with Amazon.

Frankly, the possibilities opened up by EC2 just blow my mind. Every coder in a garage now potentially has access to Google-level computation. For tech startups this is a dream. More traditional companies are playing too. People have been talking about this idea for a long time, but it’s finally here, and it rocks!

Update: Amazon are keen to help their scientific users. Great!

“But I’m Not Lost!” – Adoption Challenges for Visual Search

I’m still rather excited about yesterday’s kooaba launch. I’ve been thinking about how long this technology will take to break into the mainstream, and it strikes me that getting people to adopt it is going to take some work.

When people first started using the internet, the idea of search engines didn’t need much promotion. People were very clearly lost, and needed some tool to find the interesting content. Adopting a search engine was a reaction to an obvious need, not something anyone had to be actively sold on.

Visual search is not like that. If kooaba or others do succeed in building a tool that lets you snap a picture of any object or scene and get information, well, people may completely ignore it. They’re not lost – visual search is a useful extra, not a basic necessity. The technology may never reach the usage levels seen by search engines. That said, it’s clearly very useful, and I can see it getting mass adoption. It’ll just need education and promotion. Shazam is a great example of a non-essential search engine that’s very useful and massively popular.

So, promotion, and lots of it. What’s the best way? Well, most of the mobile visual search startups are currently running trial campaigns involving competitions and magazine ads (for example this SnapTell campaign). Revenue for the startups, plus free public education on how to use visual search. Not a bad deal, and it’s easy to see why all the companies are doing it. The only problem is that it may get the public thinking that visual search is only about cheap promotions, not useful for anything real. That would be terrible for long-term usage. I rather prefer kooaba’s demo based on movie posters – it reinforces a real use case, plus it’s got some potential for revenues too.

A Visual Search Engine for iPhone

Today kooaba released their iPhone client. It’s a visual search engine – you take a picture of something, and get search results. The YouTube clip below shows it in action.  Since this is the kind of thing I work on all day long, I’ve got a strong professional interest. Haven’t had a chance to actually try it yet, but I’ll post an update once I can nab a friend with an iPhone this afternoon to give it a test run.

[Embedded YouTube video showing the kooaba iPhone app in action]

At the moment it only recognises movie posters, so in its current form it’s more of a technology demo than something really useful. The plan is to expand it to recognise other things like books, DVDs, etc. I think there’s huge potential for this stuff. Snap a movie poster, see the trailer or get the soundtrack. Snap a book cover, see the reviews on Amazon. Snap an ad in a magazine, buy the product. Snap a restaurant, get reviews. Most of the real world becomes clickable. Everything is a link.

The technology is very scalable – the internals use an inverted index, just like a normal text search engine. In my own research I’m working with hundreds of thousands of images right now. It’s probably going to be possible to index a sizeable fraction of all the objects in the world – literally take a picture of anything and get search results. The technology is certainly fast enough, though how well the recognition rate will hold up with such large databases is currently unknown.
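To make the inverted-index idea concrete, here’s a toy sketch in Python. A real system quantises local image descriptors into “visual words” and adds tf-idf weighting and geometric verification on top; here each image is simply assumed to arrive as a set of visual-word IDs, and all the names are my own invention.

```python
# Toy inverted index over "visual words" (quantised local image descriptors).
# A real system adds tf-idf weighting and geometric verification on top.
from collections import defaultdict

class VisualIndex:
    def __init__(self):
        # visual word id -> set of image ids containing that word
        self.postings = defaultdict(set)

    def add_image(self, image_id, visual_words):
        for w in visual_words:
            self.postings[w].add(image_id)

    def query(self, visual_words, top_k=5):
        # Score candidate images by how many query words they share
        scores = defaultdict(int)
        for w in visual_words:
            for image_id in self.postings.get(w, ()):
                scores[image_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Usage: index two "images" (sets of visual-word ids) and run a query.
index = VisualIndex()
index.add_image('poster_a', {3, 17, 42, 99})
index.add_image('poster_b', {5, 17, 63, 128})
print(index.query({3, 17, 42}))  # -> [('poster_a', 3), ('poster_b', 1)]
```

The key property is that query time scales with the number of visual words in the query, not with the total number of images, which is what makes very large databases feasible.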

My only question is – where’s the buzz, and why has it taken them so long?

Update: I gave the app a spin today on a friend’s iPhone, and it basically works as advertised. It was rather slow though – maybe 5 seconds per search. I’m not sure if this was a network issue (though the iPhone had a WiFi connection), or whether kooaba got more traffic today than they were expecting. The core algorithm is fast – easily less than 0.2 seconds (and even faster with the latest GPU-based feature detection). I’m sure the speed issue will be fixed soon. Recognition seemed fine – my friend’s first choice of movie was found without a problem. A little internet sleuthing shows that they currently have 5363 movie posters in their database. Recognition shouldn’t be an issue until the database gets much larger.

Off to ISRR

For the next week I’ll be in Japan attending the International Symposium on Robotics Research (ISRR). Should be lots of fun, and a good time to find out about all the new developments outside of my little corner of the robot research universe. Come say hello if you’re at the conference.

Post #1

So, welcome to yet another blog, clearly the solution to all the world’s problems. Educating Silicon will post news and thoughts on robotics, computer vision, machine learning and related Things Of Interest. Hope you enjoy the show!