# Large-scale Data Analytics

Large-scale data analytics focuses on developing new approaches and technologies that can analyze, and build predictive models from, massive and distributed data sources. PNNL researchers are applying existing resources and developing new ones to integrate computational statistical languages with distributed computing systems, perform statistical tasks in near-real time, and visualize the results.

Within this context, machine learning algorithms are used to build predictive models from static and streaming data, which are often heterogeneous and only partially observed. Our approach to solving critical national-level challenges employs robust training and validation methodologies, followed by operational demonstration. The data analytics team broadly addresses the challenge of signature discovery, testing, and validation through close integration and collaboration with domain and computer scientists, delivering high-value solutions in near-real time.

**Methods:**

- Bayesian and neural networks
- Linear and non-linear support vector machines
- Ensemble methods (e.g., random forests)
- Nearest neighbor algorithms
- Qualitative and quantitative regression
- Data fusion and integration
- Bayesian model averaging
- Deep learning
- Incremental learning