Trusted and Responsible Artificial Intelligence

Current artificial intelligence (AI) evaluation processes can mislead scientists into developing models that are biased, non-operational, and opaque. Understanding how a machine learning model's performance will change during real-world operation, identifying as many modes of model failure as possible, and explaining why the model behaves as it does are particularly important for mission-critical systems that support national security missions.

Jeff London | Pacific Northwest National Laboratory

Interactive Analytics Across Dimensions

Pacific Northwest National Laboratory has developed a tool suite of interactive analytics that can be rapidly integrated into analyst workflows to empirically analyze and qualitatively understand AI model performance jointly across four dimensions: accountability, transparency, fairness, and robustness (a sketch of such checks follows the list below).

  • Accountable systems must demonstrate reliable behavior when applied in key operational circumstances.
  • Transparent systems enable interactive explanation of model behavior and expose points of failure through their data inputs and model predictions.
  • Fair systems must behave equitably across representative subsets, such as subpopulations of users affected by model outputs.
  • Robust systems must be resilient to variations in data inputs.
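
To make these dimensions concrete, the following minimal sketch shows one illustrative way to probe two of them for a black-box classifier: per-subgroup accuracy as a fairness check and accuracy under small input perturbations as a robustness check. The predict callable, subgroup labels, and noise scale are assumptions for illustration, not components of the PNNL tool suite.

    import numpy as np

    def subgroup_accuracy(predict, X, y, groups):
        """Fairness probe: accuracy computed separately for each subgroup."""
        return {g: float(np.mean(predict(X[groups == g]) == y[groups == g]))
                for g in np.unique(groups)}

    def perturbation_accuracy(predict, X, y, noise_scale=0.05, seed=0):
        """Robustness probe: accuracy after small Gaussian noise is added to inputs."""
        rng = np.random.default_rng(seed)
        X_noisy = X + rng.normal(scale=noise_scale, size=X.shape)
        return float(np.mean(predict(X_noisy) == y))

A large accuracy gap between subgroups, or a sharp drop under perturbation, flags an equity or resilience problem before the model reaches operation.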

Unlike solutions designed to support model developers during training, our interactive analytics are applied by model users at both the test-and-evaluation and the deployment-and-integration stages of the machine learning model development life cycle.

Our analytics focus on auditing model behavior for comparison and selection, confidence analysis, benchmarking and robustness analysis, and interactive investigations of transparency and fairness. Rather than inspecting model parameters, the analyst interacts with model inputs, model outputs, and operational data to investigate model behavior and generate a meta-report shaped by those interactive investigations.
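
As one example of confidence analysis that touches only model inputs and outputs, the sketch below estimates a simple expected calibration error: it bins predictions by the model's confidence and compares each bin's average confidence with its observed accuracy. The predict_proba interface returning class probabilities is an assumption for illustration.

    import numpy as np

    def expected_calibration_error(predict_proba, X, y, n_bins=10):
        """Bin predictions by confidence and compare mean confidence
        with empirical accuracy in each bin (lower is better)."""
        probs = predict_proba(X)              # shape (n_samples, n_classes)
        conf = probs.max(axis=1)              # model confidence per example
        correct = probs.argmax(axis=1) == y   # whether the top class was right
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (conf > lo) & (conf <= hi)
            if mask.any():
                ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
        return float(ece)

A well-calibrated model that claims 80 percent confidence should be right about 80 percent of the time; a large calibration error tells the analyst that the model's confidence cannot be taken at face value.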

Operational Impact: Informed Trust and Understanding 

Our capabilities allow analysts to move beyond summary metrics such as the F1 score and answer important questions about model performance, including the following (a model-comparison sketch follows the list):

  • Will the model work on new data?
  • How often is the model right? For which examples?
  • Is the model making the same mistakes as humans?
  • What are model limitations?
  • How does one choose between multiple models and why?
  • Why should one trust the model?
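
As a hedged illustration of how the model-selection question can be answered empirically, the sketch below compares two candidate models with a paired bootstrap: it resamples the same evaluation examples for both models and counts how often one outperforms the other. The prediction arrays and the accuracy metric are illustrative assumptions.

    import numpy as np

    def paired_bootstrap_compare(pred_a, pred_b, y, n_boot=1000, seed=0):
        """Estimate how often model A beats model B on resampled test sets."""
        rng = np.random.default_rng(seed)
        correct_a = (pred_a == y).astype(float)
        correct_b = (pred_b == y).astype(float)
        n, wins = len(y), 0
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)  # resample the same rows for both models
            if correct_a[idx].mean() > correct_b[idx].mean():
                wins += 1
        return wins / n_boot

If model A wins in, say, 95 percent of resamples, the analyst has a defensible, data-grounded reason to prefer it, rather than a choice based on a single summary number.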

Our approach enables informed model selection, benchmarking, and comparison for automated machine-learning-driven tools, and it helps analysts identify and understand resilient, unbiased AI models used for situational awareness or insight discovery. Further, our solution:

  • Supports a diverse set of machine learning paradigms, including classification, regression, clustering, and reinforcement learning in supervised, semi-supervised, and unsupervised settings.
  • Evaluates model performance on both labeled and unlabeled data, dynamically changing data streams, and manipulated operational data (see the drift-monitoring sketch below).
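
For the unlabeled, dynamically changing data streams in the final bullet, one common proxy (an assumption here, not necessarily the method used in the tool suite) is to monitor the model's own output distribution for drift: compare prediction confidences from a trusted reference window against a live window using a population stability index.

    import numpy as np

    def population_stability_index(reference, live, n_bins=10):
        """Drift proxy for unlabeled streams; values above roughly 0.2
        are conventionally treated as significant drift."""
        edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))
        edges[0], edges[-1] = -np.inf, np.inf   # catch out-of-range live values
        ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
        live_frac = np.histogram(live, bins=edges)[0] / len(live)
        eps = 1e-6                              # avoid division by or log of zero
        return float(np.sum((live_frac - ref_frac)
                            * np.log((live_frac + eps) / (ref_frac + eps))))

A rising index signals that operational data have moved away from the conditions under which the model was validated, prompting re-evaluation before trust erodes.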