  Robot Technology News  




ROBO SPACE
Artificial-intelligence system surfs web to improve its performance
by Staff Writers
Boston MA (SPX) Nov 14, 2016



Of the vast wealth of information unlocked by the Internet, most is plain text. The data necessary to answer myriad questions - about, say, the correlations between the industrial use of certain chemicals and incidents of disease, or between patterns of news coverage and voter-poll results - may all be online. But extracting it from plain text and organizing it for quantitative analysis may be prohibitively time-consuming.

Information extraction - or automatically classifying data items stored as plain text - is thus a major topic of artificial-intelligence research. Last week, at the Association for Computational Linguistics' Conference on Empirical Methods in Natural Language Processing, researchers from MIT's Computer Science and Artificial Intelligence Laboratory won a best-paper award for a new approach to information extraction that turns conventional machine learning on its head.

Most machine-learning systems work by combing through training examples and looking for patterns that correspond to classifications provided by human annotators. For instance, humans might label parts of speech in a set of texts, and the machine-learning system will try to identify patterns that resolve ambiguities - say, when "her" is a direct object and when it's an adjective.

Typically, computer scientists will try to feed their machine-learning systems as much training data as possible. That generally increases the chances that a system will be able to handle difficult problems.

In their new paper, by contrast, the MIT researchers train their system on scanty data - because in the scenario they're investigating, that's usually all that's available. But they then turn the problem of limited information into an easy one to solve.

"In information extraction, traditionally, in natural-language processing, you are given an article and you need to do whatever it takes to extract correctly from this article," says Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and senior author on the new paper. "That's very different from what you or I would do. When you're reading an article that you can't understand, you're going to go on the web and find one that you can understand."

Confidence boost
Essentially, the researchers' new system does the same thing. A machine-learning system will generally assign each of its classifications a confidence score, which is a measure of the statistical likelihood that the classification is correct, given the patterns discerned in the training data. With the researchers' new system, if the confidence score is too low, the system automatically generates a web search query designed to pull up texts likely to contain the data it's trying to extract.

It then attempts to extract the relevant data from one of the new texts and reconciles the results with those of its initial extraction. If the confidence score remains too low, it moves on to the next text pulled up by the search string, and so on.
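
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' implementation: the `extract`, `search`, and `make_query` callables, the fixed `threshold`, and the "keep the more confident result" reconciliation rule are all assumptions made for the example. In the actual system, query generation and the fusion strategy are learned.

```python
from typing import Callable, Iterable, Tuple

# A hypothetical extractor returns a (value, confidence) pair.
Extraction = Tuple[str, float]


def extract_with_fallback(
    article: str,
    extract: Callable[[str], Extraction],
    search: Callable[[str], Iterable[str]],
    make_query: Callable[[str], str],
    threshold: float = 0.8,  # assumed cutoff, not a value from the paper
) -> Extraction:
    """Extract from the article, consulting other texts while confidence is low."""
    best = extract(article)
    if best[1] >= threshold:
        return best
    # Confidence is too low: query the web and try the returned texts in order.
    for text in search(make_query(article)):
        candidate = extract(text)
        # Assumed reconciliation rule: keep whichever result is more confident.
        if candidate[1] > best[1]:
            best = candidate
        if best[1] >= threshold:
            break
    return best
```

Note that the base extractor is the same function on every pass. Only the text it is applied to changes, which matches the point made in the quote that follows.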

"The base extractor isn't changing," says Adam Yala, a graduate student in the MIT Department of Electrical Engineering and Computer Science (EECS) and one of the coauthors on the new paper. "You're going to find articles that are easier for that extractor to understand. So you have something that's a very weak extractor, and you just find data that fits it automatically from the web." Joining Yala and Barzilay on the paper is first author Karthik Narasimhan, also a graduate student in EECS.

Remarkably, every decision the system makes is the result of machine learning. The system learns how to generate search queries, gauge the likelihood that a new text is relevant to its extraction task, and determine the best strategy for fusing the results of multiple attempts at extraction.

Just the facts
In experiments, the researchers applied their system to two extraction tasks. One was the collection of data on mass shootings in the U.S., which is an essential resource for any epidemiological study of the effects of gun-control measures. The other was the collection of similar data on instances of food contamination. The system was trained separately for each task.

In the first case - the database of mass shootings - the system was asked to extract the name of the shooter, the location of the shooting, the number of people wounded, and the number of people killed. In the food-contamination case, it extracted food type, type of contaminant, and location. In each case, the system was trained on about 300 documents.
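
As a rough picture of what each task's output looks like, the fields listed above could be held in simple records. The class and attribute names below are hypothetical, chosen only to mirror the fields named in the article. The `missing_fields` helper is likewise an illustrative addition showing which slots an extractor has not yet filled.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ShootingRecord:
    # One slot per data item named in the article; None means "not yet extracted".
    shooter_name: Optional[str] = None
    location: Optional[str] = None
    num_wounded: Optional[int] = None
    num_killed: Optional[int] = None


@dataclass
class FoodContaminationRecord:
    food_type: Optional[str] = None
    contaminant: Optional[str] = None
    location: Optional[str] = None


def missing_fields(record) -> List[str]:
    """Return the names of the slots that are still empty."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]
```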

From those documents, it learned clusters of search terms that tended to be associated with the data items it was trying to extract. For instance, the names of mass shooters were correlated with terms like "police," "identified," "arrested," and "charged." During training, for each article the system was asked to analyze, it pulled up, on average, another nine or 10 news articles from the web.
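
To make the idea of field-specific search terms concrete, here is a small sketch of how a query might be assembled. The assumption that a query is the article title followed by the terms for the target field is ours, as are the names `FIELD_TERMS` and `build_query`. Only the four shooter-name terms come from the article, and in the real system the term clusters are learned rather than hard-coded.

```python
from typing import Dict, List

# Hypothetical term clusters keyed by target field.
# Only the shooter-name terms are taken from the article.
FIELD_TERMS: Dict[str, List[str]] = {
    "shooter_name": ["police", "identified", "arrested", "charged"],
}


def build_query(title: str, field: str) -> str:
    """Join an article title with the search terms associated with a field."""
    terms = FIELD_TERMS.get(field, [])
    return " ".join([title, *terms])
```

For a field with no known terms, the sketch falls back to searching on the title alone.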

The researchers compared their system's performance to that of several extractors trained using more conventional machine-learning techniques. For every data item extracted in both tasks, the new system outperformed its predecessors, usually by about 10 percent.




Related Links
Massachusetts Institute of Technology












The content herein, unless otherwise known to be public domain, is Copyright 1995-2017 - Space Media Network. All websites are published in Australia and are solely subject to Australian law and governed by Fair Use principles for news reporting and research purposes. AFP, UPI and IANS news wire stories are copyright Agence France-Presse, United Press International and Indo-Asia News Service. ESA news reports are copyright European Space Agency. All NASA sourced material is public domain. Additional copyrights may apply in whole or part to other bona fide parties. All articles labeled "by Staff Writers" include reports supplied to Space Media Network by industry news wires, PR agencies, corporate press officers and the like. Such articles are individually curated and edited by Space Media Network staff on the basis of the report's information value to our industry and professional readership. Advertising does not imply endorsement, agreement or approval of any opinions, statements or information provided by Space Media Network on any Web page published or hosted by Space Media Network.