Collaborative Problem Solving at Pacific Northwest National Laboratory

Data Management for Problem Solving Environments



Scientific problem solving environments (PSEs) seek to integrate the activities necessary to accomplish high-level domain tasks.  They may include support for managing scientific workflow, tracking data pedigrees, transforming and filtering data, automating feature extraction, and managing annotated records.  Pulling these and other capabilities together into systems that support scientific problem solving requires a flexible and dynamic data management architecture.  The data architecture serves as the underlying glue that ties together long-running research processes and widely distributed collaborators.

In the late 1980s and throughout the 1990s, Relational Database Management Systems (RDBMS) and Object Oriented Database Management Systems (OODBMS) emerged as the leading technologies for traditional data management.  These architectures are well suited to organizations that are centrally located and that control their client platforms and systems.  In this paradigm, systems that use databases are constructed to provide business support for well-defined problems.  The data is located on one centralized server or several replicated servers and is managed by one organization.  The data itself is composed primarily of simple data types, with some large objects (LOBs).

While a small PSE can conceivably function using this architecture, the complex PSEs we envision would have difficulty coping in this environment.  Scientific PSEs are shared by many organizations, with no single organization controlling the data.  Data can be stored in multiple databases, file systems, or both.  The data would overwhelm a traditional data server architecture and its administrators.  The traditional architecture also requires agreement on, and enforcement of, ontologies and schemas that are then mapped into the underlying technology.  This has several undesirable impacts:

  • As the scope of a PSE increases, the number of parties that must agree upon an ontology becomes untenably large.
  • As components are incorporated into a PSE, negotiation is required between each component's developer and the PSE framework designers and data administrators.
  • As best practices evolve or PSEs are extended to support users with different goals, the schema and data structures must be changed simultaneously and existing data migrated.
  • As PSE usage expands, federated access to multiple data stores becomes necessary.  With traditional approaches, this is difficult due to incompatible access mechanisms and schemas that can be neither integrated nor discovered.


Our research is focused on lightweight data management solutions with dynamic, open, and discoverable schemas that mitigate these problems.  We have successfully applied such a strategy to an existing problem solving environment, Ecce.  Further information about the new Ecce data architecture can be found in "Open Data Management Solutions for Problem Solving Environments: Application of Distributed Authoring and Versioning to the Extensible Computational Chemistry Environment".  We also have two additional projects that will focus on open data architectures for scientific problem solving: Scientific Annotation Middleware and Collaboratory for Multiscale Chemistry.
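
To make the idea of a dynamic, open, and discoverable schema more concrete, the following is a minimal sketch in Python. It is loosely modeled on the general notion of attaching namespaced name/value properties to resources, as distributed authoring and versioning protocols do; it is not the Ecce data model. The `Resource` class, its methods, and the namespace strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# A property is identified by (namespace, name). Both namespaces below are
# hypothetical; each component would choose its own without central agreement.
PropertyKey = Tuple[str, str]
CHEM_NS = "urn:example:chemistry"
NOTE_NS = "urn:example:annotation"


@dataclass
class Resource:
    """A stored item (file, calculation, note) carrying open-ended metadata."""
    path: str
    content: bytes = b""
    # No fixed schema: any component may add properties in its own namespace.
    properties: Dict[PropertyKey, str] = field(default_factory=dict)

    def set_property(self, namespace: str, name: str, value: str) -> None:
        """Attach or overwrite one property; no schema change is needed."""
        self.properties[(namespace, name)] = value

    def property_names(self) -> List[PropertyKey]:
        """Discover which properties exist without knowing them in advance."""
        return sorted(self.properties)

    def get_properties(self, wanted: Iterable[PropertyKey]) -> Dict[PropertyKey, str]:
        """Return the requested properties that exist, ignoring unknown ones."""
        return {key: self.properties[key] for key in wanted if key in self.properties}


# Two independent components annotate the same resource.
calc = Resource("/projects/water/calc1")
calc.set_property(CHEM_NS, "basis-set", "6-31G*")
calc.set_property(NOTE_NS, "reviewer", "jdoe")

# A third component discovers what metadata is available, then reads only
# the properties it understands.
available = calc.property_names()
chemistry = calc.get_properties([(CHEM_NS, "basis-set"), (CHEM_NS, "missing")])
```

In this sketch, adding a new kind of metadata requires neither a schema migration nor negotiation with other components, and a client can list the available property names before deciding which ones to read. A production system would also need persistence, access control, and remote access, none of which are shown here.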
 
 

Pacific Northwest National Laboratory is operated by Battelle for the U.S. Department of Energy.

For information about Collaborative Problem Solving Environments at PNNL, please contact Deborah Gracio at (509) 375-6362 or debbie.gracio@pnl.gov.

Reviewed: August 18, 2000