As part of its investment in augmented reality, Facebook has collaborated with Ray-Ban on its own line of AR glasses. For now, the devices can only record and share imagery, but the company expects future versions to do far more.
A new study from Facebook’s AI team sheds light on the company’s longer-term ambitions. It envisions AI systems that constantly analyse people’s lives through first-person video, capturing everything they see, do, and hear in order to help them with everyday tasks. The skills Facebook’s researchers want these systems to develop include “episodic memory” (answering questions like “where did I leave my keys?”) and “audio-visual diarization” (remembering who said what, and when).
No AI system today can reliably perform these tasks, and Facebook emphasises that this is a research effort rather than a commercial product. Even so, it is clear that the company sees capabilities like these as the future of AR computing.
Such lofty goals have enormous ramifications for personal privacy. Privacy experts are already worried that Facebook’s AR glasses let wearers covertly record members of the public. Future generations of the hardware could go further, not just recording video but analysing and transcribing it, turning wearers into walking surveillance machines.
With its first commercial AR glasses, the company intends to focus on recording and sharing video and images rather than analysing them.
First-person, or “egocentric”, video is the focus of Facebook’s new research initiative, Ego4D. The project has two main components: a dataset of egocentric video, and a set of benchmarks that Facebook hopes AI systems will eventually be able to tackle. Thirteen universities around the world worked with Facebook to compile the dataset, the largest of its kind ever assembled: 855 participants from nine countries contributed roughly 3,205 hours of footage.
The data was gathered by the partner research institutions rather than by Facebook itself. Participants, some of whom were paid, wore GoPro cameras and AR glasses to capture unscripted activity: everything from building things to baking to playing with pets and socialising with friends. The institutions sanitised all of the footage, blurring the faces of bystanders and removing personally identifiable information.
The other component of Ego4D is a series of benchmarks, or tasks, that Facebook is encouraging researchers around the world to solve using AI systems.
As described by Facebook, the five initial benchmarks are:
Episodic memory: What happened when (e.g., “Where did I leave my keys?”)?
Forecasting: What am I likely to do next (e.g., “Wait, you’ve already added salt to this recipe”)?
Hand and object manipulation: What am I doing (e.g., “Teach me how to play the drums”)?
Audio-visual diarization: Who said what when (e.g., “What was the main topic during class?”)?
Social interaction: Who is interacting with whom (e.g., “Help me better hear the person talking to me at this noisy restaurant”)?
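To give a sense of what the episodic-memory benchmark is asking for, here is a minimal, hypothetical sketch in Python. It assumes the hard perception work (detecting objects and the wearer’s location in each video frame) has already been done, and simply indexes those timestamped sightings so that a query like “where did I leave my keys?” reduces to a lookup; the Sighting structure and last_seen function are illustrative and not part of Ego4D.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Sighting:
    """One hypothetical detection of an object in an egocentric video stream."""
    obj: str          # object label, e.g. "keys"
    location: str     # where the wearer was, e.g. "kitchen counter"
    timestamp: float  # seconds since the recording started

def last_seen(log: List[Sighting], obj: str) -> Optional[Sighting]:
    """Answer an episodic-memory query ("where did I leave my keys?")
    by returning the most recent sighting of the object, if any."""
    matches = [s for s in log if s.obj == obj]
    return max(matches, key=lambda s: s.timestamp) if matches else None

# A toy log of detections from a day of first-person video.
log = [
    Sighting("keys", "hallway table", 120.0),
    Sighting("phone", "sofa", 340.5),
    Sighting("keys", "kitchen counter", 1980.2),
]

hit = last_seen(log, "keys")
if hit:
    print(f"Your keys were last seen on the {hit.location}.")
```

The real benchmark, of course, expects a system to answer such queries directly from raw video, which is precisely the part no current AI can do well.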
Today’s AI systems would struggle with any of these tasks, but creating datasets and benchmarks is a tried-and-true way of spurring progress in the field.
Indeed, ImageNet, a dataset of labelled images and the annual competition built around it, is widely credited with helping to kick-start the recent AI boom. Researchers used ImageNet’s pictures to train AI systems to recognise a wide range of objects, and the winning entry in the 2012 competition, which used a then-novel deep-learning technique to blow past its competitors, launched the current era of research.
Facebook hopes Ego4D will do the same for augmented reality. Systems trained on the dataset could one day power a wide range of applications, from wearable cameras to home-assistant robots that use first-person cameras to navigate their surroundings.
Kristen Grauman, the Facebook research scientist leading the project, believes it has the potential to do exactly that. In her view, the field must shift from analysing piles of photos and videos taken by humans to interpreting the flowing, continual first-person visual stream that AR systems and robots will have to understand in the context of ongoing activity.
Even if Facebook’s goals prove achievable, many will be alarmed that it is this company pursuing them. Facebook has a dismal track record on privacy, with data leaks and FTC fines to show for it, and it has demonstrated time and again that it prioritises growth and engagement over its users’ wellbeing. Against that backdrop, it is worrying that the Ego4D project does not treat privacy safeguards as a central concern: the “audio-visual diarization” task, which involves transcribing what people say, makes no mention of individuals who do not want to be recorded.