Collaborative Problem Solving Environment for Climate
Computational scientists at Pacific Northwest National Laboratory (PNNL) are designing a collaborative problem solving environment (CPSE) to support regional climate modeling and assessment of climate impacts. Whereas most climate computational science research and development projects focus on scientific codes, file systems, data archives, and networked computers, our analysis and design efforts aim at enabling technologies that are directly meaningful and relevant to climate researchers at the level of their practice and their science. We seek to characterize the nature of scientific problem solving and look for innovative ways to improve it. Moreover, we aim to look beyond current systems and technical limitations to derive a design that expresses the regional climate or impact assessment modeler's own perspective on research activities, processes, and resources. The product of our analysis and design work is a conceptual regional climate and impact assessment CPSE prototype that specifies a complete simulation and modeling user environment and a suite of high-level problem-solving tools. Although this prototype is defined in terms of climate modeling and impact assessment, many of its problem-solving features apply equally well to other science domains.
The prototype brings together a number of CPSE features to create a comprehensive environment: visual workflow design, workflow execution, calculation tables, temporal and spatial data browsers, data pedigree management, lab notebooks, free-form annotation, and computer-mediated collaboration. These features are fully described in Conceptualizing a Collaborative Problem Solving Environment for Regional Climate Modeling and Assessment of Climate Impacts.
The prototype can be viewed via the web. IE4 and Flash are required.
Testbed Climate Workbench
To assess some of the technical issues surrounding full-scale development of the system described in the conceptual prototype, we developed a Testbed Climate Workbench that investigates design and implementation issues for key features. Our efforts focused on exploring user interaction issues with a workflow-centric interface and on the implications for the design and implementation of the software infrastructure. Although we designed a Web-services architecture capable of taking advantage of both commercial and Grid infrastructure tools, we did not implement that architecture; instead, we implemented just enough functionality to get user feedback on the problem-solving aspects of our interaction model.
Figure 1 below shows the primary Testbed Climate Workbench interface, a desktop holding one or more workflow representations. The workflow diagram displays a set of applications and the control flow between them. Shown are the Mesoscale Model (MM5) from the National Center for Atmospheric Research (NCAR) and its preprocessing programs, as typically used by climate change researchers at PNNL.
Figure 1. The Testbed Climate Workbench.
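A workflow diagram of this kind is essentially a directed graph in which each application depends on the applications whose output it consumes. The minimal sketch below, assuming a simplified MM5 chain of TERRAIN, REGRID, INTERPF, and MM5 (an illustrative ordering, not necessarily the Testbed's exact configuration), derives a valid run order from declared dependencies using Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each application maps to the set of applications
# whose output it consumes (its predecessors). Simplified MM5 chain.
WORKFLOW = {
    "TERRAIN": set(),
    "REGRID": {"TERRAIN"},
    "INTERPF": {"REGRID"},
    "MM5": {"INTERPF"},
}

def execution_order(workflow):
    """Return one valid run order for the applications in the workflow.

    Raises graphlib.CycleError if the control flow contains a cycle.
    """
    return list(TopologicalSorter(workflow).static_order())

if __name__ == "__main__":
    print(" -> ".join(execution_order(WORKFLOW)))
```

Branching off to a new computation would, in this representation, simply add a node whose predecessor is the application where the branch starts.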
Although no support for building workflows was developed, the system is designed so that applications can be plugged into the architecture from a workflow descriptor file that could be created graphically or edited manually. Standard operations can be performed on any application (Figure 2), including opening an input options editor, as shown for Terrain in Figure 4. Color codes and icons show the execution state and success/failure status of each application (Figure 3). Researchers can branch off from any point in the process and define new computations.
Figure 2. Options to edit and run the regrid application.
Figure 3. Workflow showing execution states.
Figure 4. Terrain input options editor.
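The source does not specify the descriptor format. As a minimal sketch, assuming a hypothetical JSON descriptor that lists each application and its predecessors, the engine below runs applications in order and records a per-application state, the kind of information the color codes and icons display. A failed application causes everything downstream of it to be marked as skipped rather than run on missing inputs.

```python
import json
from enum import Enum

class State(Enum):
    """Execution states a workflow display might color-code (hypothetical)."""
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"

# Hypothetical descriptor format; not the Testbed's actual file layout.
DESCRIPTOR = """
{
  "applications": [
    {"name": "TERRAIN", "after": []},
    {"name": "REGRID",  "after": ["TERRAIN"]},
    {"name": "INTERPF", "after": ["REGRID"]},
    {"name": "MM5",     "after": ["INTERPF"]}
  ]
}
"""

def run_workflow(descriptor_text, run_app):
    """Run the listed applications in order; return a dict of name -> State.

    run_app(name) is a caller-supplied function returning True on success.
    Assumes the descriptor lists each application after its predecessors.
    """
    apps = json.loads(descriptor_text)["applications"]
    states = {app["name"]: State.PENDING for app in apps}
    for app in apps:
        if any(states[dep] is not State.SUCCEEDED for dep in app["after"]):
            states[app["name"]] = State.SKIPPED  # an upstream step did not succeed
            continue
        ok = run_app(app["name"])
        states[app["name"]] = State.SUCCEEDED if ok else State.FAILED
    return states

if __name__ == "__main__":
    # Simulate REGRID failing: INTERPF and MM5 are then skipped.
    result = run_workflow(DESCRIPTOR, lambda name: name != "REGRID")
    for name, state in result.items():
        print(f"{name:8s} {state.value}")
```

Keeping the descriptor separate from the engine is what lets new applications be plugged in without code changes; only the file listing them changes.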
As the researcher changes parameters and executes models, metadata about the process is automatically saved on a data server. The data server is accessed using the Web Distributed Authoring and Versioning (WebDAV) protocol and is implemented with the Apache server and the mod_dav module. From this metadata record, the actual execution process can be recreated and presented to the user, allowing branching from any point in the process. This flexibility supports the dynamic nature of research processes. Metadata currently captured includes:
- All input parameters
- State of the job
- Completion status
- Job log file
- Run information (machine, etc.)
- Input and output files
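One simple way to hold such a record is a structured document stored at a per-run URL on the WebDAV server. The sketch below assumes a hypothetical JSON layout, field names, and server path (none of which are described in the source) and builds, but does not send, the HTTP PUT request a WebDAV client would use to store the record.

```python
import json
import urllib.request
from dataclasses import dataclass, field, asdict

@dataclass
class RunMetadata:
    """One execution record; field names are hypothetical, mirroring the list above."""
    application: str
    input_parameters: dict
    job_state: str
    completion_status: str
    log_file: str
    machine: str
    input_files: list = field(default_factory=list)
    output_files: list = field(default_factory=list)

def build_put_request(server_url, run_id, record):
    """Build (without sending) a PUT request that stores `record` as JSON.

    server_url and the /runs/<id>.json layout are assumptions for illustration.
    """
    body = json.dumps(asdict(record), indent=2).encode("utf-8")
    url = f"{server_url.rstrip('/')}/runs/{run_id}.json"
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    rec = RunMetadata(
        application="TERRAIN",
        input_parameters={"resolution_km": 45},  # illustrative value only
        job_state="finished",
        completion_status="success",
        log_file="terrain.log",
        machine="cluster-a",
    )
    req = build_put_request("http://dataserver.example/dav", "0001", rec)
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```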
For simplicity, the history list (Figure 5) is currently presented as a plain list. Graphical representations (Figure 6), such as those presented in the conceptual prototype, are still needed.
Figure 5. History List. Scientists can select any event on the history list and reconstruct the workflow process that it represents.
Figure 6. A graphical history list.
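Reconstructing the workflow state from a history event amounts to replaying the recorded changes up to that event. The minimal sketch below assumes each history event records a single parameter change for one application (a hypothetical simplification of the metadata listed earlier) and shows how selecting an earlier event yields the parameter state at that point, from which a new branch can start without altering the original history.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """Hypothetical history entry: one parameter change for one application."""
    event_id: int
    application: str
    parameter: str
    value: object

def state_at(history, event_id):
    """Replay events up to and including event_id; return {app: {param: value}}."""
    state = {}
    for event in history:
        state.setdefault(event.application, {})[event.parameter] = event.value
        if event.event_id == event_id:
            return state
    raise KeyError(f"no event with id {event_id}")

def branch(history, event_id, new_events):
    """Return a new history that diverges after event_id; the original is untouched."""
    prefix = []
    for event in history:
        prefix.append(event)
        if event.event_id == event_id:
            return prefix + list(new_events)
    raise KeyError(f"no event with id {event_id}")

if __name__ == "__main__":
    history = [
        Event(1, "TERRAIN", "resolution_km", 45),
        Event(2, "REGRID", "interval_h", 12),
        Event(3, "TERRAIN", "resolution_km", 15),
    ]
    print(state_at(history, 2))
    alt = branch(history, 2, [Event(4, "REGRID", "interval_h", 6)])
    print(state_at(alt, 4))
```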
Based on user feedback, the workflow-centric model presented here is a promising way for scientists to interact with their computing environment. However, many issues require further investigation. For example, we envision a researcher having different workflows for different projects, or multiple workflows for a single project. There are many open questions about how a researcher interacts with multiple workflows and how to manage and present workflows that change over time. In addition, full specification of a workflow requires looping constructs and conditional branching. There has been significant investigation of these constructs from a schema standpoint in the Web Services Flow Language (WSFL). However, many issues remain unresolved, both with the schema itself and with how a researcher would interact with a visual representation containing very detailed control constructs. Also of interest is how a scripting language might relate to such representations.
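To illustrate what looping and conditional branching add to a workflow representation, the sketch below defines a tiny tree of hypothetical node types (Step, Sequence, Loop, Branch) and an interpreter that walks it. It does not reproduce WSFL's constructs; it is one possible shape, and the same structure could equally be written as an ordinary script, which is the relationship the paragraph above raises.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Context = dict  # shared variables visible to every node

@dataclass
class Step:
    """Run one application; `action` may update the shared context."""
    name: str
    action: Callable[[Context], None]

@dataclass
class Sequence:
    children: List["Node"]

@dataclass
class Loop:
    """Repeat `body` while `condition(ctx)` holds (a while-loop construct)."""
    condition: Callable[[Context], bool]
    body: "Node"

@dataclass
class Branch:
    """Run `then` if `condition(ctx)` holds, otherwise run `otherwise`."""
    condition: Callable[[Context], bool]
    then: "Node"
    otherwise: "Node"

Node = Union[Step, Sequence, Loop, Branch]

def execute(node, ctx, trace):
    """Walk the node tree, appending the name of each step that runs to trace."""
    if isinstance(node, Step):
        node.action(ctx)
        trace.append(node.name)
    elif isinstance(node, Sequence):
        for child in node.children:
            execute(child, ctx, trace)
    elif isinstance(node, Loop):
        while node.condition(ctx):
            execute(node.body, ctx, trace)
    elif isinstance(node, Branch):
        execute(node.then if node.condition(ctx) else node.otherwise, ctx, trace)
    else:
        raise TypeError(f"unknown node type: {type(node).__name__}")

# Hypothetical example: run a model in 24-hour segments up to 72 hours,
# then choose a follow-up step based on how far the run got.
WORKFLOW = Sequence([
    Step("TERRAIN", lambda c: None),
    Loop(lambda c: c["hour"] < 72,
         Step("MM5 segment", lambda c: c.update(hour=c["hour"] + 24))),
    Branch(lambda c: c["hour"] >= 72,
           Step("postprocess", lambda c: None),
           Step("report failure", lambda c: None)),
])

if __name__ == "__main__":
    trace = []
    execute(WORKFLOW, {"hour": 0}, trace)
    print(trace)
```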
More work is required on representations of the historical record of the research process.
An interesting issue is maintaining the integrity of input options that are specified independently for each application but may depend on other applications in the workflow. For example, in our MM5 Workbench, several of the applications require temporal information: the start and stop days of the simulation and the hour interval. Validity rules must be enforced across the execution chain, or applications will fail or produce flawed results. It is unclear how to support this while maintaining a generic architecture for defining workflows.
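One way to express such cross-application rules is a validator that runs over the whole chain before anything is submitted. The sketch below assumes each application's options carry the same three temporal fields, and it applies an assumed rule set chosen for illustration (the field names, values, and rules are hypothetical, not the Workbench's actual checks): each period must be a positive whole number of intervals, each downstream period must lie within its upstream period, and adjacent steps must use the same interval.

```python
from datetime import datetime, timedelta

# Hypothetical temporal options for each step in a chain, upstream first.
CHAIN = [
    ("REGRID",  {"start": "2000-01-01 00", "stop": "2000-01-04 00", "interval_h": 12}),
    ("INTERPF", {"start": "2000-01-01 00", "stop": "2000-01-04 00", "interval_h": 12}),
    ("MM5",     {"start": "2000-01-01 00", "stop": "2000-01-05 00", "interval_h": 12}),
]

def _parse(ts):
    """Parse a 'YYYY-MM-DD HH' timestamp."""
    return datetime.strptime(ts, "%Y-%m-%d %H")

def temporal_problems(chain):
    """Check the assumed rules along the chain; return a list of problems (empty if valid)."""
    problems = []
    for name, opts in chain:
        start, stop = _parse(opts["start"]), _parse(opts["stop"])
        step = timedelta(hours=opts["interval_h"])
        # timedelta % timedelta is a timedelta; a zero remainder is falsy.
        if stop <= start or (stop - start) % step:
            problems.append(f"{name}: period is not a positive whole number of intervals")
    for (up, u), (down, d) in zip(chain, chain[1:]):
        if _parse(d["start"]) < _parse(u["start"]) or _parse(d["stop"]) > _parse(u["stop"]):
            problems.append(f"{down}: period extends beyond {up}'s period")
        if d["interval_h"] != u["interval_h"]:
            problems.append(f"{down}: interval differs from {up}")
    return problems

if __name__ == "__main__":
    for problem in temporal_problems(CHAIN):
        print(problem)
```

Here the MM5 stop time runs past the period prepared upstream, so the check reports it before any job is submitted. The rules themselves are domain knowledge, which is exactly what a generic workflow architecture has no natural place for.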
There are also issues to investigate when delegating resource allocation to the workflow engine. In such a scenario, a user specifies what to run and a Grid-aware workflow engine finds suitable resources. Grid superschedulers may also perform the task of allocating resources, but in some cases only an application that knows more about the dependencies among the processes involved can make suitable decisions. For example, given two sequential processes, how does a scheduler account for the tradeoff between queue delays and the cost of moving data from one machine to another?

