MIT researchers develop an efficient way to train more reliable AI agents
by Adam Zewe | MIT News
Boston MA (SPX) Nov 25, 2024

Fields ranging from robotics to medicine to political science are attempting to train AI systems to make meaningful decisions of all kinds. For example, using an AI system to intelligently control traffic in a congested city could help motorists reach their destinations faster, while improving safety or sustainability.

Unfortunately, teaching an AI system to make good decisions is no easy task.

Reinforcement learning models, which underlie these AI decision-making systems, still often fail when faced with even small variations in the tasks they are trained to perform. In the case of traffic, a model might struggle to control a set of intersections with different speed limits, numbers of lanes, or traffic patterns.

To boost the reliability of reinforcement learning models for complex tasks with variability, MIT researchers have introduced a more efficient algorithm for training them.

The algorithm strategically selects the best tasks for training an AI agent so that it performs well across an entire collection of related tasks. In the case of traffic signal control, each task could be one intersection in a task space that includes all intersections in the city.

By focusing on a smaller number of intersections that contribute the most to the algorithm's overall effectiveness, this method maximizes performance while keeping the training cost low.

The researchers found that their technique was between five and 50 times more efficient than standard approaches on an array of simulated tasks. This gain in efficiency helps the algorithm learn a better solution faster, ultimately improving the performance of the AI agent.

"We were able to see incredible performance improvements, with a very simple algorithm, by thinking outside the box. An algorithm that is not very complicated stands a better chance of being adopted by the community because it is easier to implement and easier for others to understand," says senior author Cathy Wu, the Thomas D. and Virginia W. Cabot Career Development Associate Professor in Civil and Environmental Engineering (CEE) and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).

She is joined on the paper by lead author Jung-Hoon Cho, a CEE graduate student; Vindula Jayawardana, a graduate student in the Department of Electrical Engineering and Computer Science (EECS); and Sirui Li, an IDSS graduate student. The research will be presented at the Conference on Neural Information Processing Systems.

Finding a middle ground
To train an algorithm to control traffic lights at many intersections in a city, an engineer would typically choose between two main approaches. She can train one algorithm for each intersection independently, using only that intersection's data, or train a larger algorithm using data from all intersections and then apply it to each one.

But each approach comes with its share of downsides. Training a separate algorithm for each task (such as a given intersection) is a time-consuming process that requires an enormous amount of data and computation, while training one algorithm for all tasks often leads to subpar performance.

Wu and her collaborators sought a sweet spot between these two approaches.

For their method, they choose a subset of tasks and train one algorithm for each task independently. Importantly, they strategically select the individual tasks that are most likely to improve the algorithm's overall performance on all tasks.

They leverage a common trick from the reinforcement learning field called zero-shot transfer learning, in which an already trained model is applied to a new task without being further trained. With transfer learning, the model often performs remarkably well on the new, neighboring task.

"We know it would be ideal to train on all the tasks, but we wondered if we could get away with training on a subset of those tasks, apply the result to all the tasks, and still see a performance increase," Wu says.

To identify which tasks they should select to maximize expected performance, the researchers developed an algorithm called Model-Based Transfer Learning (MBTL).

The MBTL algorithm has two pieces. First, it models how well each algorithm would perform if it were trained independently on one task. Second, it models how much each algorithm's performance would degrade if it were transferred to each other task, a concept known as generalization performance.

Explicitly modeling generalization performance allows MBTL to estimate the value of training on a new task.
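As a rough illustration of these two modeling pieces, the sketch below makes several simplifying assumptions that are not stated in the article: each task is described by a single number (for example, a speed limit), the training performance for each task is already estimated, and performance degrades linearly with the distance between the source and target tasks. The linear form, the `slope` parameter, and all function names are hypothetical.

```python
def estimated_performance_table(tasks, train_perf, slope):
    """Build table[source][target]: the estimated performance of a model
    trained on `source` and applied, without retraining, to `target`.

    Assumption (illustrative only): the generalization gap grows
    linearly with the distance between the two task parameters.
    """
    table = {}
    for source in tasks:
        table[source] = {}
        for target in tasks:
            gap = slope * abs(source - target)  # assumed linear degradation
            table[source][target] = train_perf[source] - gap
    return table


def value_of_training_on(table, source):
    """Average estimated performance over all tasks if only `source` is trained."""
    row = table[source]
    return sum(row.values()) / len(row)


# Hypothetical example: five tasks with equal estimated training performance.
tasks = [0, 1, 2, 3, 4]
train_perf = {t: 1.0 for t in tasks}
table = estimated_performance_table(tasks, train_perf, slope=0.1)
print(value_of_training_on(table, 2))  # a central task
print(value_of_training_on(table, 0))  # a task at the edge of the range
```

Under these assumptions, a task near the middle of the range scores higher than one at the edge, because its trained model has less distance to cover when transferred to the other tasks.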

MBTL does this sequentially, first choosing the task that leads to the highest performance gain, then selecting additional tasks that provide the biggest subsequent marginal improvements to overall performance.
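The sequential selection can be sketched as a greedy loop. This is a minimal illustration, not the researchers' implementation: it assumes one numeric parameter per task, known training performance estimates, a linear performance drop with task distance, and that each task is served by the best already-trained model. The names `transfer_perf`, `overall_perf`, `greedy_select`, `slope`, and `budget` are hypothetical.

```python
def transfer_perf(train_perf, source, target, slope):
    """Estimated performance on `target` of a model trained on `source`
    (assumed linear degradation with task distance)."""
    return train_perf[source] - slope * abs(source - target)


def overall_perf(tasks, train_perf, sources, slope):
    """Mean estimated performance over all tasks, where each task uses
    the best model among those trained on `sources`."""
    total = 0.0
    for target in tasks:
        total += max(transfer_perf(train_perf, s, target, slope) for s in sources)
    return total / len(tasks)


def greedy_select(tasks, train_perf, slope, budget):
    """Pick up to `budget` training tasks, each time adding the task
    that most improves the estimated overall performance."""
    selected = []
    current = float("-inf")
    for _ in range(budget):
        candidates = [t for t in tasks if t not in selected]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda c: overall_perf(tasks, train_perf, selected + [c], slope),
        )
        best_score = overall_perf(tasks, train_perf, selected + [best], slope)
        if best_score <= current:
            break  # no remaining task adds a marginal improvement
        selected.append(best)
        current = best_score
    return selected, current


# Hypothetical example: 11 evenly spaced tasks, training budget of two.
tasks = list(range(11))
train_perf = {t: 1.0 for t in tasks}
chosen, score = greedy_select(tasks, train_perf, slope=0.1, budget=2)
print(chosen, round(score, 3))
```

Re-scoring every candidate at each step keeps the sketch short; a real system would also have to estimate `train_perf` and the degradation model from data rather than take them as given.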

Because MBTL focuses only on the most promising tasks, it can dramatically improve the efficiency of the training process.

Reducing training costs
When the researchers tested this technique on simulated tasks, including controlling traffic signals, managing real-time speed advisories, and executing several classic control tasks, it was five to 50 times more efficient than other methods.

This means they could arrive at the same solution by training on far less data. For instance, with a 50x efficiency boost, the MBTL algorithm could train on just two tasks and achieve the same performance as a standard method that uses data from 100 tasks.
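The arithmetic behind that example can be written out as a minimal sketch. It assumes efficiency is measured as the ratio of the number of tasks a baseline trains on to the number the selective method trains on to reach the same performance, which is how the example reads; the function name is hypothetical.

```python
def efficiency_boost(baseline_tasks: int, selected_tasks: int) -> float:
    """Ratio of training tasks used by a baseline to those used by a
    selective method that reaches the same performance."""
    if selected_tasks <= 0:
        raise ValueError("selected_tasks must be positive")
    return baseline_tasks / selected_tasks


print(efficiency_boost(100, 2))  # prints 50.0
```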

"From the perspective of the two main approaches, that means data from the other 98 tasks was not necessary or that training on all 100 tasks is confusing to the algorithm, so the performance ends up worse than ours," Wu says.

With MBTL, adding even a small amount of additional training time could lead to much better performance.

In the future, the researchers plan to design MBTL algorithms that can extend to more complex problems, such as high-dimensional task spaces. They are also interested in applying their approach to real-world problems, especially in next-generation mobility systems.

The research is funded, in part, by a National Science Foundation CAREER Award, the Kwanjeong Educational Foundation PhD Scholarship Program, and an Amazon Robotics PhD Fellowship.

Research Report: "Model-Based Transfer Learning for Contextual Reinforcement Learning"
