Scientific Data Management Group Tools
Below is a description of the key tools that the scientific data management group has developed or significantly contributed to. The group uses these tools flexibly, both on their own and in combination with other tools, to build integrated infrastructures customized to the scientific user's needs.
Collaborative Analytical Toolbox (CAT) is a client-server analytic framework, built on Eclipse, for building and organizing a knowledge repository. CAT is the result of over 5 man-years of software development at Pacific Northwest National Laboratory and has been applied to a number of research domains. The environment provides a familiar, customizable interface that enables users to:
- Customize the organization of their information.
- View/browse their information in any number of ways and via any number of arbitrary hierarchies.
- Browse or search for information from a variety of sources using many different search tools.
- Pull data back into their project space.
- Integrate with existing tools to analyze their data.
- Collaborate with other users by sharing data, templates, annotations, etc.
Interactive Software Development Environment (ISDE) is a framework developed by the Atmospheric Radiation Measurement (ARM) program that includes tools, C libraries, and graphical user interfaces designed to flexibly compose transformation and analysis tasks on atmospheric science data. ARM is using ISDE to develop two kinds of applications: instrument ingests, which convert raw instrument data into netCDF files that meet ARM's data standards, and Value Added Product (VAP) applications, which apply scientific algorithms to existing ARM data products to produce new netCDF data products, for example with added scientific value or through integration with external data products. For more information please refer to the ISDE home page.
The Experimental Data Management System (EDMS) is a web-based portal for the common storage, organization, and management of high-throughput biological data. EDMS has been developed as a part of internal research initiatives in Systems Biology and Data Intensive Computing at the Pacific Northwest National Laboratory (PNNL). The goal of the EDMS is to facilitate integrated data management across multiple programs at the Laboratory and secure data sharing with collaborators in the William R. Wiley Environmental Molecular Sciences Laboratory (EMSL) user community. The database also provides a mechanism by which PNNL scientists can publicly disseminate published data for access by the greater scientific community.
Kepler is an open source workflow engine that has been customized to support the needs of scientific workflows. The DOE SciDAC Scientific Data Management (SDM) Center was one of the founding projects that came together to co-develop Kepler. As lead for the SDM Center scientific process automation thrust area and an inaugural member of the Kepler Consortium, we have made significant contributions to the development of Kepler and its deployment supporting large-scale simulation workflows. See the Kepler organization home page for more information about this project.
SWAWT is a workflow management system designed to streamline team-oriented software development activities. Combining widely used tools (ant, make, Subversion, CVS, RPM, XML, etc.), SWAWT creates an open environment that bridges software development phases with project management tasks. The design and implementation of SWAWT are based on roles, conventions, and procedures that work with any software life cycle process (Waterfall, XP, etc.). This practical approach integrates, automates, and in some cases eliminates many activities associated with development, testing, configuration management, packaging, and delivery of software. SWAWT is in use on several projects throughout PNNL, and is also in use at Brookhaven and Oak Ridge.
VESPA: A visual analytics platform for exploring proteogenomics data
VESPA targets the integration of peptide-centric proteomics data with other high-throughput qualitative and quantitative data, such as data from ChIP-seq analyses. At its core, VESPA integrates bottom-up proteomics data with genome-level information by mapping peptides to their respective genome locations. The visualization allows the user to observe the location and sequence of peptides that do not match current annotations, and offers valuable filtering criteria such as the removal of ambiguous peptides. Read more about VESPA and download a copy.
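The idea of mapping peptides to genome locations can be illustrated with a minimal sketch. This is not VESPA's implementation; it assumes the standard genetic code, searches only the three forward reading frames of a single sequence, and uses hypothetical names (`translate`, `map_peptide`, `is_ambiguous`) chosen for illustration. A peptide that maps to more than one location is treated here as ambiguous.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(peptide, genome):
    """Return (frame, start, end) hits on the forward strand.

    Coordinates are 0-based and half-open, in nucleotides.
    """
    hits = []
    for frame in range(3):
        protein = translate(genome[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            hits.append((frame, start, start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return hits


def is_ambiguous(peptide, genome):
    """A peptide is ambiguous here if it maps to more than one location."""
    return len(map_peptide(peptide, genome)) > 1


if __name__ == "__main__":
    print(map_peptide("MKW", "CCATGAAATGGTAA"))
    print(is_ambiguous("MKW", "ATGAAATGGATGAAATGG"))
```

A fuller treatment would also search the reverse-complement strand and handle peptides that span splice junctions, which this sketch ignores.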
PBCdlComm: This open-source software is designed for data acquisition from PakBus-protocol based Campbell data loggers. It implements a significant subset of the PakBus protocol, a proprietary family of protocols created by Campbell Scientific for communication between connected devices. This work was funded by the US Department of Energy's Atmospheric Radiation Measurement (ARM) program. As an alternative to Campbell Scientific's LoggerNet software, it is well suited to deployment on rugged field PCs that have limited processing resources and do not require a full-blown GUI module. The software is designed so that it can be extended to implement a new data storage mechanism or a customized post-processing module, and/or to integrate with web services for sophisticated monitoring or reporting. So far, PbCdlComm has been tested with CR1000 and CR3000 dataloggers.