Advanced Computing, Mathematics and Data Division
PNNL Scientists Tackle Data-scaling Challenges in New Paper
Group to present as part of DISCS-2013 in Denver
As part of the ongoing Resource Discovery for Extreme Scale Collaboration (RDESC) project, funded by the DOE's Advanced Scientific Computing Research (ASCR) program, a team of PNNL scientists from the Global Security Technology & Policy, Data Intensive Scientific Computing, High Performance Computing, and Scientific Data Management groups collaborated on a paper, "Toward a Data Scalable Solution for Facilitating Discovery of Scientific Data Resources." The paper will be presented at the 2013 International Workshop on Data-Intensive Scalable Computing Systems (DISCS-2013).
The software system architecture for semantic graph databases.
The paper begins by describing the challenges of collecting, managing, and processing the vast quantities of data that today's computing systems make possible, challenges that affect science domains ranging from genomics to climate science. The RDESC project itself aims to curate diverse metadata about soil, atmospheric, and climate data sets. The large quantity and heterogeneity of these metadata motivate a specific data-scaling challenge: answering questions posed by atmospheric scientists. The metadata are available in different binary or syntactic formats, so the authors converted these "native" forms into Resource Description Framework (RDF) triples and an ontology expressed in the Web Ontology Language (OWL), and they formulated the scientists' questions as queries in SPARQL, a query language supported by most RDF databases. After describing the data challenges, the paper presents SGEM (Semantic Graph Engine, Multithreaded), a scalable semantic database platform developed by PNNL for processing SPARQL queries on large RDF data sets. SGEM is a multilayer stack that includes a parallel runtime system, a data structure library, and a SPARQL-to-C++ compiler.
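To make the RDF and SPARQL steps more concrete, here is a minimal, single-threaded sketch. It is not SGEM and does not reflect how SGEM's compiler or parallel runtime works. It simply flattens a few metadata records into (subject, predicate, object) triples, then evaluates a SPARQL-style basic graph pattern, a list of triple patterns whose `?`-prefixed variables must bind consistently, by joining the patterns one after another. The records, the `ex:` prefix, and the helper names `to_triples`, `match`, and `evaluate_bgp` are all hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical "native" metadata records; field names and values are illustrative only.
NATIVE_RECORDS = [
    {"id": "ds1", "measures": "soil_moisture", "site": "siteA"},
    {"id": "ds2", "measures": "air_temperature", "site": "siteA"},
    {"id": "ds3", "measures": "air_temperature", "site": "siteB"},
]


def to_triples(records: Iterable[dict]) -> List[Triple]:
    """Flatten each record into RDF-style (subject, predicate, object) triples."""
    triples: List[Triple] = []
    for rec in records:
        subject = f"ex:{rec['id']}"  # hypothetical "ex:" namespace prefix
        for key, value in rec.items():
            if key != "id":
                triples.append((subject, f"ex:{key}", f"ex:{value}"))
    return triples


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern (variables start with '?') with a concrete triple.

    Returns the extended binding, or None if the triple is inconsistent with it.
    """
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining its triple patterns in order."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    # Roughly equivalent SPARQL:
    #   SELECT ?ds WHERE { ?ds ex:measures ex:air_temperature . ?ds ex:site ex:siteA . }
    query = [("?ds", "ex:measures", "ex:air_temperature"), ("?ds", "ex:site", "ex:siteA")]
    results = evaluate_bgp(query, to_triples(NATIVE_RECORDS))
    print([r["?ds"] for r in results])  # ['ex:ds2']
```

The nested loops scan every triple for every partial binding, which is fine for three records but is exactly the kind of cost that becomes a data-scaling problem on large, heterogeneous metadata collections. Production engines instead rely on indexes, join ordering, and, in SGEM's case per the article, compiled C++ running on a parallel runtime.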
The authors are Alan Chappell, Sutanay Choudhury, John Feo, David Haglin, Alessandro Morari, Sumit Purohit, Karen Schuchardt, Antonino Tumeo, and Jesse Weaver, all of PNNL, and Oreste Villa of NVIDIA. They will present their work at DISCS-2013 during Session 4: Data Analytics and Tools.
Held in conjunction with SC13 in Denver, DISCS-2013 brings together researchers to exchange ideas and discuss approaches to the challenges facing data-intensive and high-performance parallel computing at extreme scale.