Meet Matterport’s newest employee: Thomas Bayes

Mikhail Kourinny
Matterport Engineering Techblog
5 min read · Mar 17, 2017

Here at Matterport, we create virtual tours of real places for the web and for virtual reality. Our virtual tours (3D Spaces) are used across many different markets — residential real estate, architecture and engineering, travel and hospitality, and commercial real estate.

While Matterport has only been around for five years, our content library is HUUUGE — over 400,000 3D Spaces that have been viewed over 140 million times. But while our library is big, we’ve only scratched the surface of what’s possible with it.

One interesting problem we’re tackling is automatic room classification. Of those 400,000 Spaces, most are residential real estate: 3D Spaces of homes that people want to sell. If we can immediately classify a room as a bedroom, bathroom, kitchen, or something else, that’s more than just plain cool; it’s helpful to whoever’s selling or buying that house.

That’s where computer vision, deep learning, and Thomas Bayes enter the picture. Literally!

He’s teaching you probability while you sleep.

Getting Deeper

Currently we have a classifier algorithm that can determine if a picture is a bedroom, kitchen, or other room.

For example, if you input this picture into the classifier:

the classifier will output something like this:

So cooking. Much counter space.

A trivial task for a human, but definitely non-trivial for a computer. We can’t go into detail on the classifier (you’ll have to wait for a future blog post… maybe), but we can say it involves deep learning with a softmax output layer.

Classification is relatively simple for a single, well-chosen, well-aimed picture. But that’s not always what we get with Matterport.

When someone captures a space, they’re using our Matterport camera mounted on a tripod. The camera rotates 360 degrees, stopping six times along the way, and at each stop it takes a 2D picture of what it sees.

Not all of those six pictures are well-aimed and easy to classify. For example, this picture is maybe a kitchen, maybe a wall.

Is it a kitchen? Is it a wall?

Even worse, you’ll sometimes get a picture that’s hard even for a human to classify.

You’re a human and you don’t even know what room we’re in.

This picture alone doesn’t tell us much. But we know that a typical house has more bedrooms than any other kind of room, so for a picture like this our classifier “prefers” the bedroom label based on prior knowledge alone.

Obviously, this can be wrong, and the five other views of the room might disagree. But if we aren’t careful, this bias can cause us to classify the room as a bedroom even when another image shows a nice view of a pool table (billiards). Just taking the arithmetic or geometric mean over the view labels will magnify the prior bias.

Bayes to the Rescue

So the real question is: what’s the best way to combine these observations into a final prediction? That’s where Thomas Bayes helps.

For each picture D_i and room class r, our softmax classifier gives us the probability that the room belongs to that class:
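In symbols, that per-picture output is a conditional probability over the room classes, and because it comes from a softmax layer it sums to one across r:

$$ p(r \mid D_i), \qquad \sum_{r} p(r \mid D_i) = 1 $$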

Our task is to find a combined probability across several different pictures:
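Written out for n pictures D_1 through D_n, the quantity we are after is the posterior given all of them at once:

$$ p(r \mid D_1, \ldots, D_n) $$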

Let’s do some math. According to Bayes:
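Applied to all n pictures together, Bayes’ rule reads:

$$ p(r \mid D_1, \ldots, D_n) = \frac{p(D_1, \ldots, D_n \mid r)\, p(r)}{p(D_1, \ldots, D_n)} $$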

Assuming the pictures D_1, ..., D_n are conditionally independent given r (a Big Assumption), we can rewrite this further:
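Under that assumption the joint likelihood factors into a product over the individual pictures:

$$ p(r \mid D_1, \ldots, D_n) = \frac{p(r) \prod_{i=1}^{n} p(D_i \mid r)}{p(D_1, \ldots, D_n)} $$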

and invoke Bayes’ rule again:
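Applying Bayes’ rule to each likelihood factor turns it back into a classifier output, and collecting the n resulting copies of p(r) against the one already in the numerator leaves p(r) to the power n − 1 in the denominator:

$$ p(D_i \mid r) = \frac{p(r \mid D_i)\, p(D_i)}{p(r)} \;\Rightarrow\; p(r \mid D_1, \ldots, D_n) = \frac{\prod_{i=1}^{n} p(D_i)}{p(D_1, \ldots, D_n)} \cdot \frac{\prod_{i=1}^{n} p(r \mid D_i)}{p(r)^{\,n-1}} $$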

Observing that the probabilities over all room classes sum to one:
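Summed over every room class r, the combined posterior must equal one. The factor built only from image terms (the product of the p(D_i) divided by p(D_1, ..., D_n)) does not depend on r, so normalization pins it down:

$$ \sum_{r} p(r \mid D_1, \ldots, D_n) = 1 \;\Rightarrow\; \frac{p(D_1, \ldots, D_n)}{\prod_{i=1}^{n} p(D_i)} = \sum_{r'} \frac{\prod_{i=1}^{n} p(r' \mid D_i)}{p(r')^{\,n-1}} $$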

we can compute, for each r:
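Concretely, each class gets an unnormalized score, labeled q(r) here purely for convenience, that needs nothing but the classifier outputs and the prior:

$$ q(r) = \frac{\prod_{i=1}^{n} p(r \mid D_i)}{p(r)^{\,n-1}} $$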

and then normalize the results so they sum to 1:
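Dividing each class’s score by the sum of the scores over all classes r′ gives the combined probability:

$$ p(r \mid D_1, \ldots, D_n) = \frac{\prod_{i=1}^{n} p(r \mid D_i) \,/\, p(r)^{\,n-1}}{\sum_{r'} \prod_{i=1}^{n} p(r' \mid D_i) \,/\, p(r')^{\,n-1}} $$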

The only thing our classifier does not give us is the set of prior probabilities p(r). Luckily, we have other ways to determine those.
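A minimal Python sketch of the last two equations, under the same conditional-independence assumption. The function name combine_views and the dict-of-probabilities inputs are hypothetical choices for illustration, not Matterport’s classifier API. It works in log space so that multiplying many small probabilities does not underflow, and the priors need not sum to one, because scaling every prior by the same constant multiplies every score by the same factor, which cancels in the normalization.

```python
import math


def combine_views(view_probs, priors):
    """Hypothetical helper: fuse per-picture p(r | D_i) into p(r | D_1, ..., D_n).

    view_probs: list of dicts, one per picture, mapping room class -> p(r | D_i)
    priors:     dict mapping room class -> prior p(r)
    Assumes the pictures are conditionally independent given the room class.
    """
    n = len(view_probs)
    log_q = {}
    for r, prior in priors.items():
        probs = [view.get(r, 0.0) for view in view_probs]
        if any(p == 0.0 for p in probs):
            continue  # a single zero factor makes the whole product zero
        # log q(r) = sum_i log p(r | D_i) - (n - 1) * log p(r)
        log_q[r] = sum(math.log(p) for p in probs) - (n - 1) * math.log(prior)
    if not log_q:
        raise ValueError("no room class has nonzero probability in every picture")
    # Normalize to 1, shifting by the largest log score (log-sum-exp) to avoid overflow.
    m = max(log_q.values())
    total = sum(math.exp(s - m) for s in log_q.values())
    return {r: (math.exp(log_q[r] - m) / total if r in log_q else 0.0) for r in priors}


# Usage: the ten-picture bedroom vs. pool room numbers from the next section.
views = [{"bedroom": 0.9, "pool room": 0.1}] * 10
priors = {"bedroom": 0.2, "pool room": 0.01}
result = combine_views(views, priors)
print(round(result["pool room"], 3))  # → 0.993
```

With a single picture (n = 1) the prior term vanishes and the function simply returns that picture’s classifier output, which is a quick sanity check on the formula.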

So what does it all mean?

Let’s assume we have 10 pictures of a room. For each picture, our classifier says that it’s a bedroom with 90% probability, but a pool room (billiards) with 10% probability. Yet we know from our priors that usually 20% of rooms are bedrooms and only 1% are pool rooms.

Let’s put all those numbers into our last two equations.
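Working the arithmetic with n = 10, and writing q(r) (our label) for each class’s unnormalized score, the product of its ten per-picture probabilities divided by its prior raised to the ninth power:

$$ q(\text{bedroom}) = \frac{0.9^{10}}{0.2^{9}} \approx \frac{0.3487}{5.12 \times 10^{-7}} \approx 6.81 \times 10^{5}, \qquad q(\text{pool room}) = \frac{0.1^{10}}{0.01^{9}} = \frac{10^{-10}}{10^{-18}} = 10^{8} $$

$$ p(\text{pool room} \mid D_1, \ldots, D_{10}) = \frac{10^{8}}{10^{8} + 6.81 \times 10^{5}} \approx 0.993 $$

The intuition: the priors alone favor bedroom 20 to 1, but each picture favors it only 9 to 1, so every picture is actually mild evidence for a pool room (a likelihood ratio of 9/20 = 0.45 per picture). Ten such pictures multiply to about 0.00034, which outweighs the single 20-to-1 prior.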

So our formula says… “With 99.3% probability, this is a pool room.”

Who are we to disagree with Bayes? :)

Misha Kourinny is a senior engineer on the Matterport vision team. Matterport is a hardware and software company offering 3D capture, processing, and hosting solutions for real-world applications like real estate, construction, hospitality, and news & entertainment. Our content can be experienced in any desktop or mobile web browser, and on virtual reality platforms like Samsung Gear VR and Google Cardboard.

We’re hiring!
