Data Management for Problem Solving Environments
Scientific problem solving environments seek to integrate the activities
necessary to accomplish high level domain tasks. They may include
support for managing scientific workflow, tracking data pedigrees, transforming
and filtering data, automating feature extraction, and annotated records
management. Pulling these and other capabilities together to provide
systems that support scientific problem solving requires a flexible and
dynamic data management architecture. The data architecture
serves as the underlying glue that ties together long-running research processes
and widely distributed collaborators.
In the late 1980s and throughout the 1990s, Relational Database Management Systems (RDBMS) and Object Oriented Database Management Systems (OODBMS) emerged as the leading technologies for traditional data management. These architectures are well suited to organizations that are centrally located and that control their client platforms and systems. In this paradigm, systems that use databases are constructed to provide business support for well-defined problems. The data is located on one centralized server or on several replicated servers, and is managed by one organization. The data itself is composed primarily of simple data types with some large objects (LOBs).
While a small PSE can conceivably function using this architecture, the complex PSEs we envision would have difficulty coping in this environment. Scientific PSEs are shared by many organizations, with no single organization controlling the data. Data can be stored in multiple databases, file systems, or both. The volume and diversity of the data would overwhelm a traditional data server architecture and its administrators. The traditional architecture also requires agreement on, and enforcement of, ontologies and schemas that are then mapped into the underlying technology. This has several undesirable consequences:
- As the scope of a PSE increases, the number of parties that must agree upon an ontology becomes untenably large.
- As components are incorporated into a PSE, negotiation is required among the component developers, the PSE framework designers, and the data administrators.
- As best practices evolve or PSEs are extended to support users with different goals, the schema and data structures must be changed simultaneously and existing data migrated.
- As PSE usage expands, federated access to multiple data stores becomes necessary. With traditional approaches this is difficult, due to incompatible access mechanisms and schemas that can be neither integrated nor discovered.
Our research focuses on lightweight data management solutions
with dynamic, open and discoverable schemas that mitigate these problems.
We have successfully applied such a strategy to an existing problem solving
environment, Ecce.
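To make "dynamic, open and discoverable schemas" concrete, here is a minimal Python sketch. It is not Ecce's implementation. It assumes a WebDAV-style model in which each resource carries arbitrary properties named by a (namespace, local name) pair, so any component can attach new metadata without negotiating a shared schema, and the schema in use can be discovered at run time. The class OpenMetadataStore, the function federated_find, and the example.org namespaces are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical model: a property name is a (namespace, local name) pair,
# loosely in the spirit of WebDAV dead properties.
PropName = Tuple[str, str]


class OpenMetadataStore:
    """Toy in-memory store (hypothetical): each resource has an open-ended set of properties.

    No schema is declared up front; the "schema" is simply the set of
    property names that have been attached to resources so far.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._props: Dict[str, Dict[PropName, str]] = defaultdict(dict)

    def set_property(self, path: str, prop: PropName, value: str) -> None:
        # Any component may introduce a new property; no migration is needed.
        self._props[path][prop] = value

    def property_names(self, path: str) -> Set[PropName]:
        # Per-resource discovery, loosely analogous to a PROPFIND <propname/> request.
        return set(self._props.get(path, {}))

    def discovered_schema(self) -> Set[PropName]:
        # Union of every property name currently in use in this store.
        names: Set[PropName] = set()
        for props in self._props.values():
            names.update(props)
        return names

    def find(self, prop: PropName, value: str) -> List[str]:
        # Paths of resources whose property `prop` equals `value`.
        return sorted(p for p, props in self._props.items() if props.get(prop) == value)


def federated_find(stores: Iterable[OpenMetadataStore], prop: PropName, value: str) -> List[str]:
    """Query only those stores whose discovered schema includes `prop`."""
    hits: List[str] = []
    for store in stores:
        if prop in store.discovered_schema():
            hits.extend(f"{store.name}:{path}" for path in store.find(prop, value))
    return hits


# Usage with two independently managed stores (all names are hypothetical).
CHEM = "http://example.org/chem"
PROV = "http://example.org/provenance"

lab_a = OpenMetadataStore("labA")
lab_a.set_property("/runs/h2o", (CHEM, "theory"), "DFT")
lab_a.set_property("/runs/h2o", (PROV, "derivedFrom"), "/inputs/h2o.xyz")  # pedigree link

lab_b = OpenMetadataStore("labB")
lab_b.set_property("/calc/ch4", (CHEM, "theory"), "DFT")
lab_b.set_property("/calc/nh3", (CHEM, "theory"), "MP2")

print(federated_find([lab_a, lab_b], (CHEM, "theory"), "DFT"))
# prints ['labA:/runs/h2o', 'labB:/calc/ch4']
```

The design choice being illustrated is that schema knowledge travels with the data: labA adds a provenance property that labB never uses, yet a federated query can still decide which stores to consult by discovering their property names. A real deployment would presumably keep such properties on a DAV server and discover them with PROPFIND rather than in memory.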
Further information about the new Ecce data architecture can be found in
"Open Data Management Solutions for Problem
Solving Environments: Application of Distributed Authoring and Versioning
to the Extensible Computational Chemistry Environment". We also
have two additional projects that will focus on open data architectures
for scientific problem solving: the Scientific Annotation
Middleware and the Collaboratory for Multiscale
Chemistry.

