The proverb “a picture is worth a thousand words” speaks not only to the expressive nature of pictures but also to the multitude of messages they can convey. We usually remember where a picture of ours was taken, and when we do not, we work it out from nearby landmarks such as buildings, monuments and other easily recognisable features. Without these clues, it would be virtually impossible to know where the picture was taken.


While humans can draw on memory and correlate several clues to work out the location of a picture, the task is not so easy for machines. That is now set to change: Tobias Weyand, a computer vision specialist at Google, and his teammates have developed a deep learning system that works out the location of a photo using only its pixels.

Weyand's approach begins by dividing the world into about 26,000 grid cells of unequal size, based on the number of photos taken in each location. As he explains, big cities, where many photos are taken, get a finer-grained grid than remote regions. Next, the team built a database of geolocated images from the Web and used the location data to determine the grid cell in which each photo was taken.
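To make the idea of an uneven grid concrete, here is a minimal sketch in Python. It is not the team's actual partitioning scheme, which the article does not describe. It assumes a plain latitude/longitude quadtree, and the names `subdivide`, `contains`, `max_photos` and `max_depth` are hypothetical. A cell is split into four quadrants whenever it holds more photos than a chosen limit, so photo-dense areas end up with small cells and empty regions stay coarse.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (latitude, longitude) in degrees
Cell = Tuple[float, float, float, float]  # (south, west, north, east)


def contains(cell: Cell, point: Point) -> bool:
    """True if the point lies in the cell (south and west edges inclusive)."""
    south, west, north, east = cell
    lat, lon = point
    return south <= lat < north and west <= lon < east


def subdivide(cell: Cell, photos: List[Point],
              max_photos: int, max_depth: int = 12) -> List[Cell]:
    """Split a cell into quadrants until each holds at most max_photos.

    Assumption: a simple lat/lon quadtree stands in for whatever
    partitioning the real system uses. max_depth stops the recursion
    when many photos share almost the same position.
    """
    if len(photos) <= max_photos or max_depth == 0:
        return [cell]
    south, west, north, east = cell
    mid_lat = (south + north) / 2
    mid_lon = (west + east) / 2
    quadrants = [
        (south, west, mid_lat, mid_lon),
        (south, mid_lon, mid_lat, east),
        (mid_lat, west, north, mid_lon),
        (mid_lat, mid_lon, north, east),
    ]
    cells: List[Cell] = []
    for quadrant in quadrants:
        inside = [p for p in photos if contains(quadrant, p)]
        cells.extend(subdivide(quadrant, inside, max_photos, max_depth - 1))
    return cells


# Made-up data: 20 photos clustered in one area plus one remote photo.
WORLD: Cell = (-90.0, -180.0, 90.0, 180.0)
photos = [(48.85 + 0.5 * i, 2.35 + 0.5 * i) for i in range(20)]
photos.append((-75.0, 100.0))
cells = subdivide(WORLD, photos, max_photos=5)
```

In this toy run the cluster of photos is covered by several small cells, while the lone remote photo sits in a cell spanning a quarter of the globe.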

The database is large, at 126 million images, and it keeps growing. Using these images, Weyand trained a powerful neural network to work out the grid cell of a photo from the image alone, without any human intervention, and then validated the result. The team calls the network PlaNet, and because it is based on deep learning, its accuracy is expected to improve as it is trained on more data.

To test its accuracy, PlaNet was fed 2.3 million geotagged images from Flickr to see how accurately it could determine their positions. It localized 3.6 percent of the images at street-level accuracy and 10.1 percent at city-level accuracy. The figure rises to 28.4 percent at the country level and 48 percent at the continent level.
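Percentages like these are typically computed by measuring the distance between the predicted and the true position of each photo and counting how many fall within a given radius. The sketch below shows one way to do that in Python with the haversine great-circle formula. The function names and the radii chosen for "street", "city", "country" and "continent" are assumptions for illustration; the article does not state the cut-offs that were used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def accuracy_at(errors_km, thresholds_km):
    """Fraction of photos whose error is within each named radius."""
    total = len(errors_km)
    return {name: sum(e <= radius for e in errors_km) / total
            for name, radius in thresholds_km.items()}


# Illustrative radii only (assumed, not taken from the article).
THRESHOLDS_KM = {"street": 1, "city": 25, "country": 750, "continent": 2500}

# Made-up localization errors for four photos, in km.
sample_errors = [0.5, 10.0, 100.0, 5000.0]
scores = accuracy_at(sample_errors, THRESHOLDS_KM)
```

Because each radius includes all smaller ones, the fractions can only grow from street level to continent level, which matches the pattern in the reported figures.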

PlaNet also did better than humans in a head-to-head comparison. “In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km,” Weyand said, adding: “This small-scale experiment shows that PlaNet reaches superhuman performance at the task of geolocating Street View scenes.”
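A comparison of this kind boils down to two numbers: how many rounds each side won, and the median error on each side. As a rough sketch, assuming each round yields one error in kilometres for the model and one for the human, and that the smaller error wins the round, it could be tallied as follows. The function name and the sample figures are made up for illustration.

```python
from statistics import median


def compare_rounds(model_errors_km, human_errors_km):
    """Count rounds won by the model and report both median errors."""
    if len(model_errors_km) != len(human_errors_km):
        raise ValueError("both sides need one error per round")
    # Assumption: the side with the smaller error wins the round.
    wins = sum(m < h for m, h in zip(model_errors_km, human_errors_km))
    return {
        "rounds": len(model_errors_km),
        "model_wins": wins,
        "model_median_km": median(model_errors_km),
        "human_median_km": median(human_errors_km),
    }


# Made-up errors for three rounds, in km.
summary = compare_rounds([100.0, 2000.0, 500.0], [300.0, 1500.0, 4000.0])
```

The median is a sensible summary here because a single wildly wrong guess, say on the wrong continent, would distort an average far more.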

For pictures taken in remote places or indoors, where there are few visual clues, the system uses other images from the same album to help arrive at the location. Weyand also notes that the model takes up only 377 MB, which is small enough to fit on a smartphone.
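The article does not say how the album images are combined. One simple way to picture it, offered here purely as an assumption and not as the team's method, is to average the per-photo guesses across the album so that a confident outdoor shot can outweigh an ambiguous indoor one. In the Python sketch below, each photo's guess is a dictionary mapping a hypothetical cell name to a probability.

```python
from typing import Dict, List


def average_album(predictions: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-photo probabilities over grid cells for a whole album.

    Assumption: plain averaging is a stand-in for whatever the real
    system does with album context.
    """
    if not predictions:
        raise ValueError("album must contain at least one prediction")
    totals: Dict[str, float] = {}
    for dist in predictions:
        for cell, prob in dist.items():
            totals[cell] = totals.get(cell, 0.0) + prob
    return {cell: total / len(predictions) for cell, total in totals.items()}


def most_likely_cell(dist: Dict[str, float]) -> str:
    """Return the cell with the highest probability."""
    return max(dist, key=dist.get)


# Made-up album: two outdoor shots and one ambiguous indoor shot.
album = [
    {"paris": 0.9, "rome": 0.1},  # outdoor, landmark visible
    {"paris": 0.4, "rome": 0.6},  # indoor, few clues
    {"paris": 0.7, "rome": 0.3},  # outdoor street scene
]
combined = average_album(album)
best = most_likely_cell(combined)
```

On its own the indoor photo would be placed in the wrong cell, but the averaged album prediction follows the two clearer shots.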

Senior Author

Mahit Huilgol is a Mechanical Engineering graduate and a technology and automobile aficionado. He ditched the corporate boardroom wars in favor of the technology battleground. He is also a foodie at heart and loves both edible chips and the non-edible silicon kind.

