Pointy

Two and a half years ago I left Google and set out to build a new kind of search engine. This may sound a little crazy, but all the best things are like that :-)

We’ve been avoiding the tech press and trying to build things quietly, but this week we launched our user-facing app. I’m really proud of what the team has built, so it’s exciting to finally be able to say a bit more about it.

The problem we’ve been working on is finding specific items locally. For example, a light bulb just broke and it’s a strange fitting: where’s the nearest place you can get a new one? Or you’re halfway through a recipe and realise you’re missing an ingredient – where do you get it?

Existing search engines do a really bad job of answering questions like this. The reason is that, in order to provide a good answer, you need to know what products are stocked in all the local shops. It turns out that nobody has this data – not even the shops themselves in many cases. It’s kind of strange to think that you can search the entire internet in a fraction of a second, but the contents of the shop around the corner remain a mystery unless you go there in person. But that’s the state of the world in 2015. At Pointy we’ve been working on solving that problem.

The core challenge is data collection – building a scalable way to index the contents of every local shop. Enter the Pointy box:

[Image: the Pointy box]

The Pointy box is a piece of hardware we designed, which sits inline between a shop’s barcode scanner and Point of Sale system. Whenever the shop scans something, we intercept the barcode information and transmit it back to our servers over a built-in cellular connection. From this data we can figure out what products the shop stocks, and get a pretty good estimate of stock levels. How it all works is illustrated on our retailer signup page.

Technically, the magic is in how we interface (or don’t interface) with the POS system. We pull the data directly off the wire between the barcode scanner and the POS. Since we get it at such a low level, we don’t have to worry about integrating with every piece of POS software in the world. There’s still a lot of work to do, but it makes the problem much more tractable. At this stage we can support basically anything we find in the wild [1], everything from ancient cash registers that look like they belong in a Western, right up to iPad-based systems.
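To make the pass-through idea concrete, here is a minimal sketch. It is not our firmware: the function names and the callback interface are hypothetical, and it assumes the scanner delivers each code as a line of text. It forwards every scan to the POS untouched and keeps a copy of anything that passes the standard EAN-13 check-digit rule.

```python
from typing import Callable, List


def is_valid_ean13(code: str) -> bool:
    """Return True if `code` is 13 ASCII digits with a correct EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def tap_scan(raw: str, forward_to_pos: Callable[[str], None], captured: List[str]) -> None:
    """Hypothetical tap: always forward the scan, record it if it is a valid EAN-13."""
    forward_to_pos(raw)  # the POS must see exactly what the scanner sent
    code = raw.strip()   # assumption: the scanner appends a line terminator
    if is_valid_ean13(code):
        captured.append(code)
```

The important design point is the ordering: the scan is forwarded before anything else happens, so the tap can never get in the way of a sale.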

Pointy box

From the retailers’ point of view, it’s extremely simple. They just plug in the box and within a few minutes they have a nice website for their shop, which automatically lists everything they sell. They’re also part of the Pointy local search app. There’s no extra work and no configuration; it just fits in with their existing systems.

[Image: app overview]

Happily, retailers seem to love this. We started to roll it out widely in June this year, and by December roughly 1 in 8 of all shops in our launch city (Dublin, Ireland) are using the system.

map

There’s a vast variety of shops now on the platform, basically the whole range of local shops: bike stores, pharmacies, hardware, convenience, pet shops, delis, supermarkets, wine stores, toy shops, book stores, garden centres, even horse supply shops. There’s a huge data challenge in identifying the right name and picture to go with a barcode, and that actually occupies a big chunk of our engineering team, but that’s a topic for another post.

[Image: app product page]

Infrastructure

Our system is built on Google Cloud Platform, which has let us scale quickly without having to spend time on non-core problems. We use a mixture of services including App Engine, Compute Engine and Cloud Storage. Services like Task Queues, Cron and Logs give us some great tools for building reliable services with minimal effort.

Processing all of the point of sale transaction data at the level of cities or countries is no small task. We need to deal with everything in real time in order to rapidly detect out-of-stock events and keep the search results accurate. We use Cloud Datastore for logging all of our transaction data. With a little bit of initial design work, we have a system that should scale pretty much indefinitely without us having to worry about it. Server load is concentrated around busy lunchtime and evening shopping hours, so being able to dynamically scale with demand is also helpful.
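As a toy illustration of what detecting an out-of-stock event from scan data could look like, here is one simple heuristic. It is an assumption for the sake of example, not a description of our production system: a product is flagged when the time since its last scan is much longer than its typical gap between scans. The `factor` threshold is hypothetical, and the sketch ignores opening hours and seasonality.

```python
from statistics import mean
from typing import List


def looks_out_of_stock(sale_times: List[float], now: float, factor: float = 5.0) -> bool:
    """Flag a product whose current gap since the last sale is unusually long.

    sale_times: ascending timestamps (seconds) of past scans of one product.
    factor: hypothetical threshold, i.e. how many typical gaps count as suspicious.
    """
    if len(sale_times) < 3:
        return False  # too little history to judge
    gaps = [later - earlier for earlier, later in zip(sale_times, sale_times[1:])]
    typical_gap = mean(gaps)
    return (now - sale_times[-1]) > factor * typical_gap
```

For a product that usually sells every ten minutes, an hour of silence during trading hours would trip the flag, while a slow seller would need a much longer quiet period.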

We use Compute Engine to manage our IoT device deployment, including over-the-air firmware updates and remote debugging. Our product search engine is also hosted on Compute Engine. We do a lot of batch processing of our transaction data logs to extract ranking signals, and process lots of external web data for additional product attributes and ranking signals. Everything is backed up nightly from Datastore to Google Cloud Storage for disaster recovery.

I built my last startup on AWS, so it was a little bit of a change to use Google Cloud this time around. However, it’s been a really great choice. It gives us a beautiful combination of scale and agility. We often deploy to production multiple times per day, which is extremely easy with the GCP tools. This lets us iterate rapidly, and focus on our product rather than system administration. I suppose when you’re building a search engine, using Google’s infrastructure seems like an obvious choice :-)

What’s Next

It’s been a great experience so far, but we’re not close to the end. There’s still a long way to go to index every shop on the planet, after all. We’re getting there, and having fun along the way. If you’re interested, we’re always looking for good people.

Footnotes
  1. There are always a few exotic ones that aren’t worth the trouble, but for practical purposes it might as well be 100% coverage.

Startups

Imagine you’re a road engineer and you’re designing an access road for a new town. The town will soon be built in a previously uninhabited area. You’re managing the construction project, but unfortunately no one can tell you what the population of the town will be.

Taking your job seriously, you sit down to design the best road that you can build. You settle on constructing a seven lane highway with regular flyovers to minimize traffic. The road will be fully lit with a state-of-the-art LED lighting system. You add crash barriers and regularly spaced emergency telephones. After much consideration you decide to also include a rest area with parking and toilets. This involves designing a self-contained water and sewerage system, but it’s obviously worth it.

With three months to go until launch day, you discover problems with road drainage. After the panic subsides, the construction team agrees to work around the clock to refit a completely new system for surface water management. By a minor miracle, the work is completed on time.

Opening day finally arrives and the excitement is intense. Everyone agrees the finished product is an engineering marvel. The new town will have the best road in the world.

Unfortunately, it turns out that the town is a remote settlement with a population of 57. The road is mainly used by an old man and a donkey.

—————

The next year, you are again given a road construction project for another new town. Having learned your lesson, you build a modest single lane road. It’s well constructed but nothing special.

Opening day comes again, and it’s revealed that this time the “town” is in fact a major city with a population of 14 million. There are 50-mile tailbacks for six years before a larger road can be built. Your face appears on wanted posters throughout the nation, and you flee the country in disgrace.

—————

Twitter, I forgive you the Fail Whale. And I hope to always walk the middle *ahem* road.

Epiphenomenalism for Computer Scientists

It’s hard to work on robotics or machine learning and not occasionally think about consciousness. However, it’s quite easy not to think about it properly! I recently concluded that everything I used to believe on this subject is wrong. So I wanted to write a quick post explaining why.

For a long time, I subscribed to a view on consciousness called “epiphenomenalism”. It just seemed obvious, even necessary. I suspect a lot of computer scientists may share this view. However, I recently had a chance to think a bit more carefully about it, and came upon problems which I now see as fatal. Below I explain briefly what epiphenomenalism is, why it is so appealing to computer scientists, and what convinced me it cannot be right. Everything here is old news in philosophy, but might be interesting for someone coming to the issue from a computer scientist’s perspective.

Continue reading ‘Epiphenomenalism for Computer Scientists’

Building a DIY Street View Car

A little blast from the past here. Several years ago I built something very like a Google Street View car to gather data for my PhD thesis. At the time I wrote up a blog post about the experience, as a guide for anyone else who might want to build such a thing. But I never quite finished it. Upgrading WordPress today, I came across this old post sitting in my drafts folder from years ago, and decided to rescue it. So here it is. The making of a DIY Street View car.

Continue reading ‘Building a DIY Street View Car’

Will the robots take our jobs?

This post is about robots and the economy, but takes some detours first. Bear with me.

Robert Gordon and the End of Growth

There has been a very interesting discussion going on recently, prompted by an article by economist Robert Gordon of Northwestern University. Gordon’s article (“Is US economic growth over?”) makes the case that long-term US economic growth on the scale of the last century was due to one-time events and has run its course, with future growth prospects being much lower. He attributes the growth of the past few centuries to three distinct industrial revolutions. The first, from 1750 to 1830, was due to steam power and railroads. The second, from 1870 to 1900, was due to electrification, internal combustion engines, running water and petroleum. The third, beginning around 1960, was due to the computer and the internet. Gordon argues that the second industrial revolution was by far the most important, and that computers and the internet have had far smaller impacts on GDP. Combined with demographic headwinds, he sees much lower rates of growth in the next century.

Martin Wolf summarizes the pessimist’s case succinctly:

Unlimited growth is a heroic assumption. For most of history, next to no measurable growth in output per person occurred. What growth did occur came from rising population. Then, in the middle of the 18th century, something began to stir. Output per head in the world’s most productive economies – the UK until around 1900 and the US, thereafter – began to accelerate. Growth in productivity reached a peak in the two and a half decades after World War II. Thereafter growth decelerated again, despite an upward blip between 1996 and 2004. In 2011 – according to the Conference Board’s database – US output per hour was a third lower than it would have been if the 1950-72 trend had continued (see charts). Prof Gordon goes further. He argues that productivity growth might continue to decelerate over the next century, reaching negligible levels.

Robots to the rescue?

What interests me most are the responses that Gordon’s article has received. His position is very interesting, but likely wrong in one massive aspect.

Continue reading ‘Will the robots take our jobs?’

Highlights of Robotics: Science and Systems 2012

I spent last week at RSS 2012 in Sydney. Here are a few of the papers that caught my attention. This year I went to more talks on manipulation, but I still find myself picking a SLAM paper as my favourite :)

Robust Estimators for SLAM

For me, the most interesting work at the conference was a pair of related papers, one from Ed Olson and another from Niko Sünderhauf.

Figure 1 from Olson and Agarwal 2012

Large Scale Deep Learning at Google

[This blog has been dormant a long time, since I post most of this kind of content on Google+ these days. I’ll still cross-post the occasional longer piece here.]

There’s an important paper at ICML this week, showing results from a Google X project which scaled up deep learning to 16,000 cores. Just by throwing more computation at the problem, things moved substantially beyond the prior state of the art.

Building High-level Features Using Large Scale Unsupervised Learning
Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng

Learned feature detectors


I think this is a really important paper, so let me give some background. Since I started at Google, I’ve seen five projects internally that blew me away with their obvious potential to truly change the world. A few of those are now public: (1) Self-driving cars, (2) Project Glass, (3) Knowledge Graph. Number four is this paper. It might grab fewer headlines than the first three, but in the long term it is by far the most important.

What this paper demonstrates (or at least convinced me) is that raw computation is now the key limiting factor for machine learning. That is huge. For the last twenty years or more, it was not really the case. The field was dominated by SVMs and Boosting. Progress didn’t really have much to do with Moore’s Law. If machines got a million times faster, it wasn’t really clear that we had any good way to use the extra computation. There certainly wasn’t a viable path to animal-level perceptual abilities. Now I would like to stick my neck out and say that I think that position has changed. I think we now have a research program that has a meaningful chance of arriving at learning abilities comparable to biological systems.

That doesn’t mean that if someone gifted us a datacenter from 2050 we could solve machine learning immediately. There is a lot of algorithmic progress still to be made [1]. Unlike SVMs, the training of these systems still owes a lot to black magic. There are saturation issues that I think nobody has really figured out yet, to name one of a hundred problems [2]. But the way seems navigable. I’ve been optimistic about this research ever since I saw Geoff Hinton’s talk on RBMs back in 2007, but it was a cautious optimism back then. Now that Google has shown you can scale the methods up by orders of magnitude and get corresponding performance improvements, my level of confidence has gone up several notches.

Returning to the present, here are a few cool aspects of the current paper:

1) Without supervision, the model learns complex, generalizable features (see the human face and cat face detectors below). To say that again, there is no labelled training data. Nobody told the model to detect faces. The face feature simply emerges naturally as a compact way for the network to reconstruct its inputs. We’ve seen that before for low-level features like edges and edge junctions, but to see it for high-level concepts is quite a result.

2) “Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation.”

This is important too. It’s been known for a while that most current approaches used in computer vision don’t really learn any meaningful invariance to transformations which are not explicitly hand-designed into the features. For example, see this paper from the DiCarlo lab: Comparing State-of-the-Art Visual Features on Invariant Object Recognition Tasks

3) “Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.”

It works! Well – it still fails roughly 84% of the time (on a very, very hard test set), but it’s big progress. These techniques apply to everything from speech recognition to language modeling. Exciting times.
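To put the quoted numbers in perspective, the short calculation below works backwards from them. It assumes “70% relative improvement” means the new accuracy equals the old accuracy times 1.7; the function name is just for illustration.

```python
def implied_baseline(new_accuracy: float, relative_improvement: float) -> float:
    """Baseline accuracy implied by new = baseline * (1 + relative_improvement)."""
    return new_accuracy / (1.0 + relative_improvement)


baseline = implied_baseline(15.8, 0.70)
print(f"implied previous accuracy: {baseline:.1f}%")
print(f"error rate of the new model: {100 - 15.8:.1f}%")
```

Under that reading, the previous best was a little over 9%, so the absolute gain is about six and a half percentage points on a 20,000-way classification task.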

——————————-
Notes:

[1]: I saw a talk by Geoff Hinton just yesterday which contained a big advance which he called “dropout”. No paper available yet, but check his website in the next few days. Lots happening right now.

[2]: Or the embarrassing fact that models that achieve close to record performance on MNIST totally fail on 1 – MNIST (i.e. just invert the colours and the model fails to learn). Another example is the structural parameters (how many layers, how wide), which are still picked more or less arbitrarily. The brain is not an amorphous porridge; in some places the structure is important and the details will take years for us to figure out.
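For anyone unsure what “1 – MNIST” means in note [2], it is just the per-pixel complement of each image. A minimal sketch, assuming pixel intensities are already scaled to the range 0 to 1 (the type alias and function name are only for illustration):

```python
from typing import List

Image = List[List[float]]  # pixel intensities scaled to [0, 1]


def invert(image: Image) -> Image:
    """Return the "1 - MNIST" version of an image: every pixel p becomes 1 - p."""
    return [[1.0 - p for p in row] for row in image]
```

The digit shapes are untouched; only the roles of ink and background are swapped.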

The Universal Robotic Gripper

I just saw a video of a device that consists of nothing more than a rubber balloon, some coffee grounds and a pump. I’m pretty sure it’s going to change robotics forever. Have a look:

[Embedded YouTube video]

It’s a wonderful design. It’s cheap to make. You don’t need to position it precisely. You need only minimal knowledge of the object you’re picking up. Robotic grasping has always been too hard to be really practical in the wild. Now a whole class of objects just got relatively easy.

Clearly, the design has its limitations. It’s not going to allow for turning the pages of a book, making a cheese sandwich, tying a daisy chain, etc. But for relatively straightforward manipulation of rigid objects, it’s a beautiful solution. This one little idea could help start a whole industry.

The research was a collaboration between Chicago, Cornell and iRobot, with funding from DARPA. It made the cover of PNAS this month. The research page is here.

Fun with Robots

It’s no secret that I’m a huge fan of Willow Garage. So as they get ready to ship their first PR2 robots, here’s a gratuitous video of the pre-release testing:

[Embedded YouTube video]

This second video is a nice overview of what Willow Garage and their open source robotics program are all about:

[Embedded YouTube video]

Google Goggles Goes Live

To my surprise, Google Goggles actually launched last night, not 12 hours after I posted about it yesterday. I’ve just spent a while playing around with it on my Android handset. Search times are, as expected, much longer than one second, more in the anticipated 5-10 second range. Good to see that even Google can’t break the laws of physics. The app shows a pretty-but-pointless image analysis animation to make the wait seem shorter, almost exactly like my tongue-in-cheek suggestion from yesterday.

The engine covers all the easy verticals (books, DVDs, logos, landmarks, products, text, etc.). The recognition quality is very good, though the landing pages are often a bit useless. It will take a bit of living with it to see how much use it is as a tool rather than a tech demo.

The major worry is that it may end up being too broad-but-shallow. For example, they do wine recognition, but the landing pages are generic. Perhaps visual wine recognition would be better built into Snoot or some other dedicated iPhone wine app. Or Google could take the route Bing recently took with recipes, and build rich landing pages for each vertical. Because of the nature of current visual search technology, Goggles is essentially a number of different vertical searches glued together, so this is more feasible than it would be for web search.

Certainly an interesting week for visual search!