AI that sees with sound, learns to walk and predicts seismic physics

Research in machine learning and AI, now a core technology in practically every industry and company, is far too voluminous for anyone to read it all. The purpose of this Perceptron column is to round up some of the latest findings and papers, particularly but not limited to the field of artificial intelligence, and explain why they matter.

This month, Meta engineers detailed two of the latest innovations from the depths of the company’s research labs: an AI system that compresses audio files and an algorithm that can speed up protein-folding AI performance by 60x. Elsewhere, MIT scientists revealed that they are using spatial acoustic information to help machines better visualize their environment by simulating how a listener will hear a sound from any point in a room.

Meta’s compression work isn’t exactly uncharted territory. Last year, Google announced Lyra, a neural audio codec trained to compress speech at low bitrates. But Meta claims its system is the first to work on CD-quality stereo audio, making it useful for commercial applications such as voice calls.

Architectural drawing of the Meta AI audio compression model. Image credits: Meta

Using AI, Meta’s compression system, called Encodec, can compress and decompress audio in real time on a single CPU core at bitrates of roughly 1.5 kbps to 12 kbps. Compared to MP3, Encodec can achieve roughly a 10x compression rate at 64 kbps with no perceptible loss of quality.
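For a sense of what those numbers mean, a back-of-the-envelope comparison against uncompressed CD-quality audio (44.1 kHz, 16-bit, stereo) is easy to sketch:

```python
# Rough compression ratios for a codec operating at the bitrates above,
# measured against raw CD-quality stereo audio (44.1 kHz, 16-bit, 2 channels).
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps uncompressed

for codec_kbps in (1.5, 6.0, 12.0, 64.0):
    ratio = CD_BITRATE_KBPS / codec_kbps
    print(f"{codec_kbps:5.1f} kbps -> {ratio:6.1f}x smaller than raw audio")
```

At 6 kbps, for instance, the stream is about a tenth the size of a 64 kbps MP3, which is the kind of gap the 10x figure refers to.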

Encodec’s researchers say that human evaluators preferred the quality of audio processed by Encodec to audio processed by Lyra, suggesting that Encodec could eventually be used to deliver better-quality audio in situations where bandwidth is limited or comes at a premium.

As for Meta’s protein-folding work, it has less immediate commercial potential. But it could lay the groundwork for important scientific research in the field of biology.

Protein structures predicted by the Meta system. Image credits: Meta


Meta says its AI system, ESMFold, predicted the structures of some 600 million proteins from bacteria, viruses and other microbes that have yet to be characterized. That’s more than triple the 220 million structures that Alphabet-backed DeepMind managed to predict earlier this year, which covered nearly every protein from known organisms in DNA databases.

Meta’s system is not as accurate as DeepMind’s. Of the roughly 600 million structures it generated, only about a third were of “high quality.” But it is 60 times faster at structure prediction, which lets it scale prediction to much larger protein databases.

Not to dwell too long on Meta, but the company’s AI division also this month detailed a system designed for mathematical reasoning. The company’s researchers say their “neural problem solver” learned from a dataset of successful mathematical proofs to generalize to new, different kinds of problems.

Meta is not the first to build such a system. OpenAI announced its own in February: a neural theorem prover built to work in the Lean proof assistant. Separately, DeepMind has experimented with systems that can solve challenging mathematical problems in the study of symmetries and knots. But Meta claims its neural problem solver solved five times as many International Math Olympiad problems as any previous AI system, and that it outperformed other systems on widely used math benchmarks.

Meta notes that math-solving AI could benefit fields such as software verification, cryptography and even aerospace.

As for the MIT work, the researchers built a machine learning model that can capture how sounds in a room propagate through space. By modeling the acoustics, the system can learn a room’s geometry from sound recordings, which can then be used to build a visual rendering of the room.

The researchers say the technology could be used for virtual and augmented reality software or for robots that need to navigate complex environments. In the future, they plan to improve the system so that it can generalize to new and larger scenes, such as entire buildings or even entire cities.
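At the core of this kind of system is a learned function from emitter and listener positions to an acoustic response. A minimal sketch of that idea, with an untrained toy network and made-up dimensions (none of this is the MIT team’s code), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "neural acoustic field": maps (emitter position, listener position)
# to a short impulse response. The weights here are random placeholders;
# in a real system they would be trained on audio recorded in the room.
W1 = rng.normal(size=(4, 32))   # input layer: 2 x 2-D positions -> 32 hidden
W2 = rng.normal(size=(32, 16))  # hidden -> 16-sample impulse response

def impulse_response(emitter_xy, listener_xy):
    x = np.concatenate([emitter_xy, listener_xy])  # 4-D input vector
    h = np.tanh(x @ W1)                            # hidden activations
    return h @ W2                                  # predicted impulse response

ir = impulse_response(np.array([0.0, 1.0]), np.array([3.0, 2.0]))
print(ir.shape)  # (16,)
```

Once such a function is trained, querying it at many listener positions is what lets the system reason about the room’s geometry.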


Over at Berkeley’s robotics department, two separate teams are accelerating the rate at which a four-legged robot can learn to walk and perform other tricks. One team combined best-of-breed work from several other advances in reinforcement learning to let a robot go from a blank slate to robust walking on uncertain terrain in just 20 minutes of real time.

“Perhaps surprisingly, we find that with several careful design decisions in terms of task setup and algorithm implementation, a quadruped robot can learn to walk from scratch with deep RL in less than 20 minutes across a variety of environments and surface types. Crucially, it does not require new algorithmic components or other unexpected innovations,” the researchers write.

Instead, they chose and combined some state-of-the-art approaches and got remarkable results. You can read the paper here.
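As a much-simplified illustration of the underlying recipe (model-free RL learning a task from scratch through trial, error and value updates), here is tabular Q-learning on a toy corridor task; it has nothing to do with the quadruped hardware or the paper’s actual algorithms:

```python
import random

random.seed(0)

# Toy stand-in for "learning from scratch": tabular Q-learning on a
# 1-D corridor (states 0..5, reward only at the goal state 5).
N_STATES, GOAL = 6, 5
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # per state: [left, right]
ALPHA, GAMMA = 0.5, 0.9

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(200):                  # 200 short episodes of experience
    state, done = 0, False
    while not done:
        action = random.randrange(2)  # explore at random (off-policy)
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q(s,a) toward reward + discounted best next value
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
        state = nxt

# The greedy policy read off the learned values should head right everywhere.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)
```

The Berkeley result is essentially that, with careful engineering, the deep-RL version of this loop converges on real hardware in minutes rather than days.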

A demonstration of a robotic dog from EECS Professor Pieter Abbeel’s lab in Berkeley, California in 2022. (Photo courtesy of Philipp Wu/Berkeley Engineering)

Another locomotion learning project, from (TechCrunch friend) Pieter Abbeel’s lab, could be described as “training by imagination.” They equip the robot with the ability to predict how its actions will play out, and though it starts out fairly helpless, it quickly gains knowledge about the world and how it works. That leads to better predictions, which lead to better knowledge, and so on in a feedback loop until the robot is walking in under an hour. It learns just as quickly to recover from being jostled or otherwise “perturbed,” as the lingo has it. Their work is documented here.
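The general “imagination” recipe, fit a predictive model of the world from a little real experience and then evaluate behavior inside that model instead of on the real system, can be sketched in a toy one-dimensional setting (the dynamics and every name here are invented for illustration; the real work learns a far richer latent world model):

```python
# Toy world-model loop: the real system moves as x' = x + TRUE_GAIN * a,
# but TRUE_GAIN is unknown to the agent.
TRUE_GAIN = 0.8

# 1) Collect a handful of real transitions (state, action, next_state).
data = [(x, a, x + TRUE_GAIN * a) for x in (0.0, 1.0, 2.0) for a in (-1.0, 0.5, 1.0)]

# 2) Fit the model x' = x + g * a by least squares on the observed deltas.
num = sum((x2 - x) * a for x, a, x2 in data)
den = sum(a * a for _, a, _ in data)
g_hat = num / den

# 3) "Imagine" outcomes with the learned model to pick the action that
#    lands closest to a target state -- no further real-world trials needed.
def imagined_next(x, a):
    return x + g_hat * a

target, x0 = 2.0, 0.0
best_a = min((k / 10 for k in range(-30, 31)),
             key=lambda a: abs(imagined_next(x0, a) - target))
print(round(g_hat, 3), best_a)
```

The feedback loop in the article is this cycle repeated: better data gives a better model, which gives better actions, which generate more informative data.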

Work with a potentially more immediate application came earlier this month from Los Alamos National Laboratory, where researchers developed a machine learning technique to predict the friction that occurs during earthquakes, offering a possible route to forecasting them. Using a language model, the team says it was able to analyze the statistical features of seismic signals emitted by a fault in the laboratory’s earthquake apparatus to predict the timing of the next quake.
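The general shape of that pipeline, statistics computed over windows of a continuous signal and regressed onto time-until-the-next-event, can be illustrated on synthetic data (this is not the Los Alamos model; the signal, the feature and the regression here are all invented):

```python
import random

random.seed(1)

# Synthetic "lab quake" signal: noise whose amplitude grows as the next
# event approaches, with an event every CYCLE steps.
CYCLE = 100

def signal(t):
    phase = t % CYCLE
    return random.gauss(0.0, 0.2 + 2.0 * phase / CYCLE)

samples = [signal(t) for t in range(10 * CYCLE)]

# Feature: variance over 20-step windows; label: steps until the next event.
X, y = [], []
for start in range(0, len(samples) - 20, 20):
    win = samples[start:start + 20]
    mu = sum(win) / len(win)
    X.append(sum((v - mu) ** 2 for v in win) / len(win))  # window variance
    y.append(CYCLE - (start % CYCLE))                     # time to failure

# One-feature least-squares fit: time_to_failure ~ b0 + b1 * variance.
mx, my = sum(X) / len(X), sum(y) / len(y)
b1 = sum((a - mx) * (b - my) for a, b in zip(X, y)) / sum((a - mx) ** 2 for a in X)
b0 = my - b1 * mx
print(f"time_to_failure ~ {b0:.1f} + {b1:.1f} * window_variance")
```

Because the signal gets louder as failure nears, the fitted slope is negative: high window variance means the next event is close, which is the statistical relationship the Los Alamos team exploits with far more sophisticated models.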


“The model is not limited by the physics, but it predicts the physics, the actual behavior of the system,” said Chris Johnson, one of the researchers on the project. “Now we make future predictions based on past data, which is not descriptive of the current state of the system.”

Image credits: Dreamstime

The researchers say it’s difficult to apply the technique to the real world because it’s unclear whether there is enough data to train a predictive system. However, they are optimistic about applications that could include predicting damage to bridges and other structures.

Last week, MIT researchers warned that neural networks used to simulate actual neural networks should be scrutinized for training bias.

Neural networks are, of course, loosely modeled on the way our brains process and signal information, by reinforcing certain connections and combinations of nodes. But that doesn’t mean the synthetic and the real work the same way. In fact, the MIT team found that neural network-based simulations of grid cells (neurons that support spatial navigation) produced similar activity only when their creators carefully constrained them to do so. Left to govern themselves, as actual cells do, they did not produce the desired behavior.

This is not to say that deep learning models are useless in this area; far from it, they are very valuable. But as Professor Ila Fiete said in a school news post, “they can be a powerful tool, but one has to be very careful in interpreting them and determining whether they actually make de novo predictions or even shed light on what the brain is optimizing.”
