Highlights of Robotics: Science and Systems 2012

I spent last week at RSS 2012 in Sydney. Here are a few of the papers that caught my attention. This year I went to more talks on manipulation, but I still find myself picking a SLAM paper as my favourite :)

Robust Estimators for SLAM

For me, the most interesting work at the conference was a pair of related papers, one from Ed Olson and another from Niko Sünderhauf.

Figure 1 from Olson and Agarwal 2012

The Universal Robotic Gripper

I just saw a video of a device that consists of nothing more than a rubber balloon, some coffee grounds and a pump. I’m pretty sure it’s going to change robotics forever. Have a look:

[Embedded YouTube video]

It’s a wonderful design. It’s cheap to make. You don’t need to position it precisely. You need only minimal knowledge of the object you’re picking up. Robotic grasping has always been too hard to be really practical in the wild. Now a whole class of objects just got relatively easy.

Clearly, the design has its limitations. It’s not going to allow for turning the pages of a book, making a cheese sandwich, tying a daisy chain, etc. But for relatively straightforward manipulation of rigid objects, it’s a beautiful solution. This one little idea could help start a whole industry.

The research was a collaboration between Chicago, Cornell and iRobot, with funding from DARPA. It made the cover of PNAS this month. The research page is here.

Autonomous Helicopters!

I’ve let this blog go very quiet while I was working on finishing my thesis (done now!). However, today my brother got a helicopter pilot’s license, so I thought I would mark the occasion by posting some videos showing how his fancy skill might soon be redundant :). Here are some cool results from Nick Roy’s group at MIT:

It’s a pretty cool system. Robots that do the full autonomous shebang, from SLAM to path planning to obstacle avoidance, are still quite rare. To do it all on a helicopter is just showing off.

A Thousand Kilometers of Appearance-Only SLAM

I’m off to RSS 2009 in Seattle next week to present a new paper on FAB-MAP, our appearance-based navigation system. For the last year I’ve been hard at work on pushing the scale of the system. Our initial approach from 2007 could handle trajectories about 1km long. This year, we’re presenting a new system that we demonstrate doing real-time place recognition over a 1,000km trajectory. In terms of accuracy, the 1,000km result seems to be on the edge of what we can do; at around the 100km scale, however, performance is really rather good. Some video results below.

[Embedded YouTube video]

One of the hardest things to get right was simply gathering the 1,000km dataset. The physical world is unforgiving! Everything breaks. I’ll have a few posts about the trials of building the data collection system over the next few days.

Autonomous Marathon!

Congratulations to everyone at Willow Garage for reaching Milestone 2 in the development of the PR2 robot. 26.2 miles of autonomous indoor navigation, including opening eight doors and plugging in to nine power sockets. We’ve been watching the video in the lab with serious robot envy. Very cool!

[Embedded YouTube video]

FAB-MAP in the News

Today’s edition of the New Scientist news feed includes an article about my PhD research. How nice! They called the article ‘Chaos filter stops robots getting lost’. This is a kind of bizarre title – ‘chaos filter’ seems to be a term of their own invention :). Still, they got things mostly right. I guess that’s journalism!

Strange terminology aside, it’s great to see the research getting out there. It’s also nice to see the feedback from Robert Sim, who built a rather impressive fully autonomous vision-only robotic system a few years ago, which is still quite a rare accomplishment.

For anyone interested in the details of the system, have a look at my publications page. New Scientist’s description more or less resembles how our system works, but many of the specifics are a little wide of the mark. In particular, we’re not doing hierarchical clustering of visual words as the article describes – instead we learn a Bayesian network that captures the visual word co-occurrence statistics. This achieves a similar effect in that we implicitly learn about objects in the world, but with none of the hard decisions and awkward parameter tuning involved in clustering.
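For a flavour of what that structure learning involves, here is a small toy sketch (my own illustrative code, not the actual FAB-MAP implementation): given binary occurrence vectors of visual words over a set of training images, estimate the pairwise mutual information between words and keep a maximum-weight spanning tree over them, which gives a tractable Bayesian network capturing the strongest co-occurrence dependencies. The variable names and the tiny example data are made up purely for illustration; the real system works with vocabularies of tens of thousands of words.

```python
import numpy as np

def mutual_information(x, y, eps=1e-9):
    """MI between two binary variables, estimated from samples
    (a small epsilon avoids log(0) on unseen combinations)."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b)) + eps
            p_a = np.mean(x == a) + eps
            p_b = np.mean(y == b) + eps
            mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def cooccurrence_tree(observations):
    """Maximum-weight spanning tree over pairwise mutual information.

    observations: (n_images, n_words) binary matrix of visual word occurrences.
    Returns a list of (parent, child) edges rooted at word 0.
    """
    n_words = observations.shape[1]
    mi = np.zeros((n_words, n_words))
    for i in range(n_words):
        for j in range(i + 1, n_words):
            mi[i, j] = mi[j, i] = mutual_information(observations[:, i],
                                                     observations[:, j])

    # Prim's algorithm: grow the tree from word 0, always adding the
    # out-of-tree word with the strongest link to the tree so far.
    in_tree = {0}
    edges = []
    while len(in_tree) < n_words:
        best = max(((i, j) for i in in_tree for j in range(n_words)
                    if j not in in_tree), key=lambda e: mi[e])
        edges.append(best)          # (parent already in tree, new child)
        in_tree.add(best[1])
    return edges

# Toy example: 6 images, 4 visual words (real vocabularies are far larger).
obs = np.array([[1, 1, 0, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 1, 1, 1]])
print(cooccurrence_tree(obs))
```

The resulting tree is what lets the model exploit the fact that words generated by the same kind of object tend to appear together, without paying the cost of modelling the full joint distribution over the vocabulary.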

Progress at Willow Garage

Just came across this new video of the Willow Garage PR2 robot. They’re making rapid progress. When they reach their goal of distributing these platforms to research groups around the world, it will be a good day for robotics. One neat package that comes out of the box with many different near-state-of-the-art capabilities. Right now every research group is independently re-creating platforms from scratch, and it’s a huge obstacle to progress.
If you haven’t heard of Willow Garage, I have an overview here.

[Embedded YouTube video]

Update: Another new video, celebrating two successive days of autonomous runs.

[Embedded YouTube video]

An Insider’s Guide to BigDog

In common with half of YouTube, I was mesmerized by the BigDog videos from Boston Dynamics earlier in the year, though I couldn’t say much about how the robot worked. For everyone hungry for some more technical details, check out the talk by Marc Raibert at Carnegie Mellon’s Field Robotics 25 event. There’s some interesting discussion of the design of the system, where it’s headed, and more great video.

There are a bunch of other worthwhile talks from the event. I particularly enjoyed Hugh Durrant-Whyte’s description of building a fully automated container terminal “without a graduate student in 1000km”.

“But I’m Not Lost!” – Adoption Challenges for Visual Search

I’m still rather excited about yesterday’s kooaba launch. I’ve been thinking about how long this technology will take to break into the mainstream, and it strikes me that getting people to adopt it is going to take some work.

When people first started using the internet, the idea of search engines didn’t need much promotion. People were very clearly lost, and needed some tool to find the interesting content. Adopting search engines was reactive, rather than active.

Visual search is not like that. If kooaba or others do succeed in building a tool that lets you snap a picture of any object or scene and get information, well, people may completely ignore it. They’re not lost – visual search is a useful extra, not a basic necessity. The technology may never reach the usage levels seen by search engines. That said, it’s clearly very useful, and I can see it getting mass adoption. It’ll just need education and promotion. Shazam is a great example of a non-essential search engine that’s very useful and massively popular.

So, promotion, and lots of it. What’s the best way? Well, most of the mobile visual search startups are currently running trial campaigns involving competitions and magazine ads (for example this SnapTell campaign). Revenue for the startups, plus free public education on how to use visual search. Not a bad deal, and it’s easy to see why all the companies are doing it. The only problem is that it may get the public thinking that visual search is only about cheap promotions, not useful for anything real. That would be terrible for long-term usage. I rather prefer kooaba’s demo based on movie posters – it reinforces a real use case, plus it’s got some potential for revenues too.

A Visual Search Engine for iPhone

Today kooaba released their iPhone client. It’s a visual search engine – you take a picture of something, and get search results. The YouTube clip below shows it in action. Since this is the kind of thing I work on all day long, I’ve got a strong professional interest. I haven’t had a chance to try it yet, but I’ll post an update once I can nab a friend with an iPhone this afternoon for a test run.

[Embedded YouTube video]

At the moment it only recognises movie posters. Basically, its current form is more of a technology demo than something really useful. The plan is to expand to recognise other things like books, DVDs, etc. I think there’s huge potential for this stuff. Snap a movie poster, see the trailer or get the soundtrack. Snap a book cover, see the reviews on Amazon. Snap an ad in a magazine, buy the product. Snap a restaurant, get reviews. Most of the real world becomes clickable. Everything is a link.

The technology is very scalable – the internals use an inverted index, just like normal text search engines. In my own research I’m working with hundreds of thousands of images right now. It’s probably going to be possible to index a sizeable fraction of all the objects in the world – literally take a picture of anything and get search results. The technology is certainly fast enough, though how the recognition rate will hold up with such large databases is currently unknown.
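For anyone curious what that looks like in practice, here is a bare-bones sketch of an inverted index over visual words (my own illustrative code with made-up image names and word IDs – not kooaba’s system or our research code). Each image is reduced to a set of quantized feature IDs (‘visual words’), and the index maps each word to the images containing it, so a query only has to score the images that share at least one word with it.

```python
from collections import defaultdict

class VisualWordIndex:
    """Toy inverted index over quantized image features ('visual words')."""

    def __init__(self):
        self.postings = defaultdict(set)   # word id -> set of image ids
        self.image_words = {}              # image id -> set of word ids

    def add_image(self, image_id, words):
        self.image_words[image_id] = set(words)
        for w in set(words):
            self.postings[w].add(image_id)

    def query(self, words, top_k=3):
        """Score only the images sharing at least one word with the query."""
        votes = defaultdict(int)
        for w in set(words):
            for image_id in self.postings.get(w, ()):
                votes[image_id] += 1
        # Rank by word overlap (a real system would use tf-idf weighting
        # and geometric verification on the top candidates).
        return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Usage: the word IDs would come from quantizing SIFT-like descriptors
# against a large vocabulary; here they are just invented integers.
index = VisualWordIndex()
index.add_image("poster_a", [3, 17, 42, 99])
index.add_image("poster_b", [5, 17, 63, 80])
print(index.query([3, 42, 99, 7]))   # -> [('poster_a', 3)]
```

A production system layers weighting and verification on top, but the inverted index is what keeps query cost tied to the length of a few posting lists rather than a linear scan over the whole database.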

My only question is – where’s the buzz, and why has it taken them so long?

Update: I gave the app a spin today on a friend’s iPhone, and it basically works as advertised. It was rather slow though – maybe 5 seconds per search. I’m not sure if this was a network issue (though the iPhone had a WiFi connection), or maybe kooaba got more traffic today than they were expecting. The core algorithm is fast – easily less than 0.2 seconds (and even faster with the latest GPU-based feature detection).  I am sure the speed issue will be fixed soon. Recognition seemed fine, my friend’s first choice of movie was located no problem. A little internet sleuthing shows that they currently have 5363 movie posters in their database. Recognition shouldn’t be an issue until the database gets much larger.