NEWS

SCIENTIFIC PUBLISHING

Geoscientists aim to magnify specialized Web searching

National Science Foundation project, GeoLink, could become search hub for the geoscience community

By Jia You

When oceanographer Peter Wiebe sat down recently to write a paper on his January Red Sea expedition, he wanted to examine all data sets on plankton in the region. Other researchers have been sampling the organisms for years, but he didn’t know where to find those data sets. “These data centers are kind of black holes,” says Wiebe, who works at the Woods Hole Oceanographic Institution in Massachusetts. “It’s very hard to figure out what’s in there and to get it out.”

That could soon change. Wiebe is working with computer scientists to lay the groundwork for a smarter academic search engine that would help geoscientists find the exact data sets and publications they want in the blink of an eye, instead of spending hours scrolling through pages of irrelevant results.

The group kicked off their project, called GeoLink, last month at the American Geophysical Union (AGU) meeting in San Francisco, California. The research effort is part of EarthCube, an initiative funded by the National Science Foundation (NSF) to upgrade cyberinfrastructure for the geosciences. Over the next 2 years, Wiebe and colleagues will build computer programs that can extract information from AGU conference abstracts, NSF awards, and geoscience data repositories and then digitally connect these resources in ways that make them more accessible to scientists. A pilot project that concluded this year has already developed some of the underlying design.

If GeoLink garners sufficient interest, the researchers could turn it into a one-stop search hub for the geoscience community, says computer scientist Tom Narock of Marymount University in Arlington, Virginia, another principal investigator on the project.

Projects like GeoLink are part of a growing effort to make literature reviews more efficient by leveraging the increasing ability of computers to process texts, a much-needed service as millions of new papers come out every year. A similar initiative from the Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington, is developing an intelligent academic search engine for computer science. Called Semantic Scholar, it is expected to be fully released by the end of 2015. Eventually, Semantic Scholar’s coverage will include other subjects, says AI2 Chief Executive Officer Oren Etzioni.

Existing academic search engines boast extensive coverage of scientific literature. (Google Scholar alone indexes about 160 million documents by some calculations.) Their reliance on keyword searches, however, often means users get more junk than treasure. Search engines also don’t typically return raw data sets.

In contrast, GeoLink and Semantic Scholar attempt to build fine-grained, niche search engines tailored to specific subject areas, by tapping into deeper semantic processing that helps computers establish scientifically meaningful connections between publications. If Wiebe types in “plankton in the Red Sea,” for example, the search engine would understand it not only as a string of characters that shows up in papers, but would also know the researchers who investigated the topic, the cruises they took, the instruments they used, and their data sets and papers. Google has applied similar techniques to improve its main search engine, but projects like GeoLink benefit from input from scientists with extensive knowledge in the subject area, who identify meaningful links that computer scientists then translate into code. Biomedical researchers have been forerunners in building fine-grained, subject-specific ontologies. PubMed, for example, uses the Medical Subject Headings ontology to streamline its query system.

The potential of these projects goes beyond helping scientists find the right papers quickly, says computer scientist C. Lee Giles of Pennsylvania State University, University Park. By extracting information on methods and results from papers and pooling such data together, search engines like Semantic Scholar could automate the process of literature review and comparison.

For example, Etzioni says, it would take a talented computer science graduate student weeks of extensive reading on dependency parsing (a task in natural language processing) to gain an overview of techniques used for it in the last 5 years, the data sets produced, and the accuracy rates. In contrast, Semantic Scholar could potentially compile the techniques and results within seconds. “We are imagining techniques that go way beyond just paper recommendation, to the point where we are really generating novel insights,” Etzioni says.

Such instant overviews would especially benefit junior scientists and interdisciplinary scientists who enter a new field of study, says computer scientist Christina Lioma of the University of Copenhagen. It would also enable scientists to identify emerging trends in a field and adjust their directions accordingly, Giles says.

Realizing the technology’s potential, however, partially depends on having accessible, text-minable literature for computers to read, an issue that has pitted scholars against some publishers. For now, the GeoLink project will mine only publicly available abstracts of studies. (Semantic Scholar will cover 4 million open-access computer science papers.) Computer scientists still have a lot of work to do to improve the accuracy of text processing, Giles says. Nonetheless, he believes that the semantic search approach “is the Web of the future.” ■

A study of Red Sea plankton helped motivate smarter search engines.
PHOTO: CARLETON RAY/SCIENCE SOURCE

“We are imagining techniques that go way beyond just paper recommendation.”
Oren Etzioni, Allen Institute for Artificial Intelligence

SCIENCE sciencemag.org

2 JANUARY 2015 • VOL 347 ISSUE 6217

Published by AAAS

