Facebook presented Rosetta at the end of 2018 to better understand the millions of images uploaded to the social network. The system can extract text from these images and classify them autonomously, but such advances, which Google also uses, can be put to worrying purposes.
One example is finding and collecting the license plates of police cars. Several experts have found that all of this information can be extracted and leaked, which poses a potential threat to the privacy of individuals and organizations.
Google, Facebook and the risks of massive data extraction
Rosetta was conceived as a system to "understand text in images and videos with machine learning." This tool was aimed at "improving experiences like more relevant photo search or adding text to screen readers that make Facebook more accessible to people with visual disabilities."
The proposal, Facebook engineers said in their description of Rosetta, would also help identify inappropriate or harmful content "and keep our community safe." The system can extract text, daily and in real time, from more than one billion public images on Facebook and Instagram, and even from video frames.
What does something like this achieve? Metadata describing each image is attached to it, which makes the image easy to label and classify. The danger is that this classification ends up enabling potentially harmful uses.
This is the point made, for example, by the cybersecurity experts at Quantika14, who say that this data can be extracted in bulk without Facebook being able to prevent it. The system identifies how many people are in a photo and how many are smiling, whether they are standing, whether they are in a bike lane, or even whether they are spraying water with a hose.
All this information is stored in the image's parameters and can be extracted once Facebook has classified the image. The authors of the report also showed that the metadata for a photo of a car normally includes its license plate.
Another example is the one I show in the video. How do you obtain the license plates of cars belonging to the @policia or to a company? 1. I access their page (https://t.co/MKYcHgHBzT) 2. I download the links to their images 3. I download the text analysis pic.twitter.com/kJceDE6YGq
– Gorgue de Triana (@JorgeWebsec) February 9, 2022
As one of the team members explained, this makes it possible to obtain, for example, the license plates of police cars or of a company's vehicles. It is enough to access the target's Facebook page, download the links to its images, and then download the text analysis associated with those images.
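The last step of that process can be illustrated with a minimal sketch. It assumes the generated description is exposed as the image's alternative text (the `alt` attribute of each `<img>` tag) and that any text detected in the photo appears inside single quotes. Neither detail is confirmed by the report, and the sample markup, file names, and wording below are invented for illustration. The sketch works only on a local HTML string and makes no network requests.

```python
from html.parser import HTMLParser
import re


class AltTextCollector(HTMLParser):
    """Collects (src, alt) pairs for every <img> tag with non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        if alt:  # skip images with a missing or empty description
            self.images.append((attributes.get("src", ""), alt))


def quoted_text(alt):
    """Return the fragments inside single quotes (assumed to be detected text)."""
    return re.findall(r"'([^']*)'", alt)


# Hypothetical saved markup: names and wording are assumptions, not real data.
SAMPLE_HTML = """
<div>
  <img src="photo1.jpg" alt="May be an image of car and text that says 'EXAMPLE 123'">
  <img src="photo2.jpg" alt="">
  <img src="photo3.jpg">
</div>
"""

collector = AltTextCollector()
collector.feed(SAMPLE_HTML)

# One entry per described image: source, full description, quoted fragments.
results = [(src, alt, quoted_text(alt)) for src, alt in collector.images]
for src, alt, texts in results:
    print(src, "->", texts)
```

The point of the sketch is that no image analysis is needed on the reader's side: if the platform has already written the detected text into the description, reading it back is a matter of parsing markup.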
Quantika14's discovery is not new: the competitive analysis company Molfar had already found this option and shown how, in certain cases, it allows public figures to be tracked.
The search for National Police car license plates is also possible directly in the Google image search engine.
Recognition and identification of license plates in photos has also been used for years by Google, which offers its Cloud Vision API for these purposes, and the risks of applying this type of analysis to the data are evident.
At Xataka we have contacted representatives of Google and Facebook to clarify how these systems work and how far they reach. Meta says that these descriptions do not reveal sensitive information, and adds that anyone who wishes to change the alternative text of an image they upload can do so.
Update (11/2/2022, 10:45): added comments from Facebook spokespersons.