Pak Chung Wong
Pacific Northwest National Laboratory
P.O. Box 999, K7-28
Richland, WA 99352
(509) 372-4764
pak.wong@pnl.gov
Pak Chung Wong is a chief scientist and project manager at the Pacific Northwest National Laboratory in Richland, Washington, where he performs research and development on information technology and scientific computation. His research interests include visual analytics, visualization, social computing, bioinformatics, human-computer interaction, privacy and security, and computational science. He received a PhD in computer science from the University of New Hampshire.
o Conference and Symposium Chair – IEEE InfoVis 2006, IEEE VAST 2006, SPIE VDA 2006, SPIE VDA 2005
o Program Chair – IEEE InfoVis 2005, IEEE Vis 2003, IEEE InfoVis 2002, IEEE InfoVis 2001
o Paper Chair – IEEE VAST 2006, IEEE InfoVis 2002, IEEE InfoVis 2001
o Case Studies Chair – IEEE Vis 2002, IEEE Vis 2001
o Publicity Chair – IEEE Vis 2000, IEEE Vis 99
o Symposium Liaison – IEEE Vis 2006, IEEE Vis 2005, IEEE Vis 2002, IEEE Vis 2001, IEEE Vis 2000
o Program Committee – APVIS 2006, CMV 2006, HPC 2006, APVIS 2005, CMV 2005, HPC 2005, I-KNOW'05, IEEE Vis 2005, IEEE InfoVis 2005, IEEE Vis 2004, IEEE InfoVis 2004, IEEE Vis 2003, IEEE InfoVis 2003, SPIE VDA 2003, CMV 2003, KIV03, IEEE InfoVis 2002, SPIE VDA 2002, IEEE InfoVis 2001, SPIE VDA 2001
o Guest Editor, Information Visualization, Vol. 6, No. 1, Palgrave Macmillan, Mar 2007.
o Co-Editor, Proceedings IEEE Symposium on Visual Analytics Science and Technology (VAST) 2006, IEEE CS Press, Oct 2006.
o Guest Co-Editor, IEEE Computer Graphics and Applications, Vol. 24, No. 5, IEEE CS Press, Sep/Oct 2004.
o Guest Editor, Information Visualization, Vol. 2, No. 1, Palgrave Macmillan, Mar 2003.
o Co-Editor, Proceedings IEEE Symposium on Information Visualization 2002, IEEE CS Press, Oct 2002.
o Co-Editor, Proceedings IEEE Symposium on Information Visualization 2001, IEEE CS Press, Oct 2001.
o Guest Editor, IEEE Computer Graphics and Applications, Vol. 19, No. 5, IEEE CS Press, Sep/Oct 1999.
o Pak Chung Wong, Harlan Foote, Patrick Mackey, Ken Perrine, and George Chin Jr., “Generating Graphs for Visual Analytics through Interactive Sketching,” IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 6, Nov-Dec 2006.
o Pak Chung Wong, Harlan Foote, George Chin Jr., Patrick Mackey, and Ken Perrine, “Graph Signatures for Visual Analytics,” IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 6, Nov-Dec 2006.
o Pak Chung Wong, Stuart J. Rose, George Chin Jr., Deborah A. Frincke, Richard May, Christian Posse, Antonio Sanfilippo, and Jim Thomas, “Walking the Path—A New Journey to Explore and Discover through Visual Analytics,” Information Visualization, Vol. 5, No. 4, Dec 2006.
o Pak Chung Wong and Jim Thomas, “Visual Analytics,” IEEE Computer Graphics and Applications, Vol. 24, No. 5, Sep 2004.
o Pak Chung Wong, Kwong Kwok Wong, Harlan Foote, and Jim Thomas, “Global Visualization and Alignments of Whole Bacterial Genomes,” IEEE Transactions on Visualization and Computer Graphics, Vol. 9, No. 3, Jul-Sep 2003.
o Pak Chung Wong, “Guest Editor’s Introduction: Special Issue on Selected InfoVis 2003 Papers,” Information Visualization, Vol. 2, No. 1-2, Mar 2003.
o Pak Chung Wong, Kwong Kwok Wong, and Harlan Foote, “Organic Data Memory Using the DNA Approach,” Communications of the ACM, Vol. 46, No. 1, Jan 2003.
o Pak Chung Wong, Harlan Foote, David L. Kao, L. Ruby Leung, and Jim Thomas, “Multivariate Visualization with Data Fusion,” Information Visualization, Vol. 1, No. 3-4, 2002.
o Jim Thomas, Paula Cowley, Olga Kuchar, Lucy Nowell, Judi Thomson, and Pak Chung Wong, “Discovering Knowledge through Visual Analysis,” Journal of Universal Computer Science, Vol. 7, No. 6, Jun 2001.
o Pak Chung Wong, Harlan Foote, Ruby Leung, Dan Adams, and Jim Thomas, “Data Signatures and Visualization of Scientific Data Sets,” IEEE Computer Graphics and Applications, Vol. 20, No. 2, Mar 2000.
o Pak Chung Wong, “Visual Data Mining,” IEEE Computer Graphics and Applications, Vol. 19, No. 5, Sep 1999.
o First place, InfoVis Contest, IEEE Symposium on Information Visualization 2004.
o Data Fusion
· IEEE Computer Graphics and Applications (Sep 2004)
o Text Mining
· R&D Magazine (Sep 2004)
o DNA-Based Data Memory (featured in over 40 news articles worldwide)
· Forbes Magazine (Oct 24, 2005)
· New Scientist (Jan 8, 2003)
· American Association for the Advancement of Science (Jan 8, 2003)
· Science Box (Jan 8, 2003)
· Yahoo! News (Jan 8, 2003)
· Cyberpunks (Jan 8, 2003)
· ACM Tech News (Jan 8, 2003)
· PC Zone (Jan 9, 2003, in Dutch)
· Yale Law School Memo (Jan 9, 2003)
· Heise News (Jan 9, 2003, in German)
· Heise News Online (Jan 9, 2003, in German)
· Stephan in Berlin (Jan 9, 2003, in German)
· Cnews (Jan 9, 2003, in Russian)
· Rostov (Jan 9, 2003, in Russian)
· Ural (Jan 9, 2003, in Russian)
· 2000 (Jan 9, 2003, in Ukrainian)
· Frankfurter Allgemeine (Jan 10, 2003, in German)
· Die Welt (Jan 10, 2003, in German)
· Yahoo! Schlagzeilen (Jan 10, 2003, in German)
· Focus (Jan 10, 2003, in German)
· Netzeitung (Jan 10, 2003, in German)
· ZDNet (Jan 10, 2003, in German)
· Index (Jan 10, 2003, in Hungarian)
· Bigmir (Jan 10, 2003, in Russian)
· Mari (Jan 10, 2003, in Russian)
· Open (Jan 10, 2003, in Russian)
· UA Prom (Jan 10, 2003, in Russian)
· SF era (Jan 11, 2003, in Romanian)
· CNI en Linea Tecnociencia (Jan 11, 2003, in Spanish)
· EE2US (Jan 11, 2003)
· Tomorrow (Jan 11, 2003, in German)
· Tomorrow Business (Jan 11, 2003, in German)
· FutureWork (Jan 12, 2003)
· Rheinbach-Szene (Jan 12, 2003, in German)
· Säkerhet & Sekretess (Jan 14, 2003, in Swedish)
· Human Sciences Research Council (Jan 16, 2003)
· BioTIK (Jan 16, 2003, in Danish)
· Nausea Manifesto (Jan 17, 2003)
· Népszabadság (Jan 17, 2003, in Hungarian)
· Rzeczpospolita (Jan 17, 2003, in Polish)
· NZOOM (Jan 19, 2003)
· Klawonn (Jan 20, 2003)
· Süddeutsche Zeitung (Jan 27, 2003, in German)
· Levsha (Jan 31, 2003, in Russian)
· Technology Research News, main article (Jan 31, 2003)
o Whole Genome Alignment
· Science News (April 2003)
· American Association for the Advancement of Science (2002)
· Breakthroughs (2002)
o Data Signature
· GlobalTechnoScan (2001)
· DOE Pulse (2001)
o Automatic Document Analysis (1999)
· American Association for the Advancement of Science (1999)
· Science Daily Magazine (1999)
o Link Analysis and Graph Exploration
Pak Chung Wong, Harlan Foote, George Chin Jr., Patrick Mackey, and Ken Perrine, “Graph Signatures for Visual Analytics,” IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 6, Nov-Dec 2006.
Abstract: We present a visual analytics technique to explore graphs using the concept of a data signature. A data signature, in our context, is a multidimensional vector that captures the local topology information surrounding each graph node. Signature vectors extracted from a graph are projected onto a low-dimensional scatterplot through the use of scaling. The resultant scatterplot, which reflects the similarities of the vectors, allows analysts to examine the graph structures and their corresponding real-life interpretations through repeated use of brushing and linking between the two visualizations. The interpretation of the graph structures is based on the outcomes of multiple participatory analysis sessions with intelligence analysts conducted by the authors at the Pacific Northwest National Laboratory. The paper first uses three public domain datasets with either well-known or obvious features to explain the rationale of our design and illustrate its results. More advanced examples are then used in a customized usability study to evaluate the effectiveness and efficiency of our approach. The study results reveal not only the limitations and weaknesses of the traditional approach based solely on graph visualization but also the advantages and strengths of our signature-guided approach presented in the paper.
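The idea of a per-node signature can be illustrated with a minimal sketch. It assumes one possible signature, the number of nodes reached at each hop distance from a node, which is an illustrative choice and not necessarily the definition used in the paper. The function name `node_signature` and the sample graph are hypothetical, and the projection onto a scatterplot is omitted; only the signature vectors and one pairwise distance are computed.

```python
import math
from collections import deque


def node_signature(adj, start, max_hops):
    """Count the nodes found at hop distance 1..max_hops from `start` (breadth-first search)."""
    dist = {start: 0}
    counts = [0] * max_hops
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the requested neighborhood
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                counts[dist[neighbor] - 1] += 1
                queue.append(neighbor)
    return counts


# Hypothetical example: a five-node path graph 0-1-2-3-4 as an adjacency list.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
signatures = {n: node_signature(path, n, max_hops=2) for n in path}

# Euclidean distance between the signatures of an end node and the middle node.
gap = math.dist(signatures[0], signatures[2])
```

In this sketch the two end nodes receive identical signatures while the middle node differs, so a similarity-preserving projection would place the end nodes together and the middle node apart.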
Pak Chung Wong, Harlan Foote, Patrick Mackey, Ken Perrine, and George Chin Jr., “Generating Graphs for Visual Analytics through Interactive Sketching,” IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 6, Nov-Dec 2006.
Abstract: We introduce an interactive graph generator, GreenSketch, designed to facilitate the creation of descriptive graphs required for different visual analytics tasks. The human-centric design approach of GreenSketch enables users to master the creation process without specific training or prior knowledge of graph model theory. The customized user interface encourages users to gain insight into the connection between the compact matrix representation and the topology of a graph layout when they sketch their graphs. Both the human-enforced and machine-generated randomnesses supported by GreenSketch provide the flexibility needed to address the uncertainty factor in many analytical tasks. This paper describes over two dozen examples that cover a wide variety of graph creations from a single line of nodes to a real-life small-world network that describes a snapshot of telephone connections. While the discussion focuses mainly on the design of GreenSketch, we include a case study that applies the technology in a visual analytics environment and a usability study that evaluates the strengths and weaknesses of our design approach.
Pak Chung Wong, Patrick Mackey, Ken Perrine, James Eagan, Harlan Foote, and Jim Thomas, “Dynamic Visualization of Graphs with Extended Labels,” Proceedings IEEE Symposium on Information Visualization 2005, October 2005.
Abstract: The paper describes a novel technique to visualize graphs with extended node and link labels. The lengths of these labels range from a short phrase to a full sentence to an entire paragraph and beyond. Our solution is different from all the existing approaches that almost always rely on intensive computational effort to optimize the label placement problem. Instead, we share the visualization resources with the graph and present the label information in static, interactive, and dynamic modes without the requirement for tackling the intractability issues. This allows us to reallocate the computational resources for dynamic presentation of real-time information. The paper includes a user study to evaluate the effectiveness and efficiency of the visualization technique.
Pak Chung Wong, Ken Perrine, Patrick Mackey, Harlan Foote, and Jim Thomas, “Visual Analytics and Storytelling through Video,” IEEE Symposium on Information Visualization 2005 Video Track, October 2005.
Abstract: This paper supplements a video clip submitted to the Video Track of IEEE Symposium on Information Visualization 2005. The original video submission applies a two-way storytelling approach to demonstrate the visual analytics capabilities of a new visualization technique. The paper presents our video production philosophy, describes the plot of the video, explains the rationale behind the plot, and finally, shares our production experiences with our readers.
o Bio-Molecular Systems
Pak Chung Wong, Kwong Kwok Wong, and Harlan Foote, “Organic Data Memory Using the DNA Approach,” Communications of the ACM, Vol. 46, No. 1, Jan 2003.
Introduction: A data preservation problem looms large behind today’s information superhighway. Ancient humans preserved their knowledge by engraving bones and rocks. About two millennia ago, people invented paper to publish their thoughts. Today we use magnetic media and silicon chips to store our data. But bones and rocks erode, paper disintegrates, and electronic memory ultimately degrades. All these storage media require constant attention to maintain their information content. All are easily destroyed by people and natural disasters, whether intentionally or accidentally. In light of the vast amount of information being generated every day, it’s time to consider a new medium.
Pak Chung Wong, Kwong Kwok Wong, Harlan Foote, and Jim Thomas, “Global Visualization and Alignments of Whole Bacterial Genomes,” IEEE Transactions on Visualization and Computer Graphics, Vol. 9, No. 3, Jul-Sep 2003.
Abstract: We present a novel visualization technique to align whole bacterial genomes with millions of nucleotides. Our basic design combines the descriptive power of pixel-based visualizations with the interpretative strength of digital image-processing filters. The innovative use of pixel enhancement techniques on pixel-based visualizations brings out the best of the recursive data patterns and further enhances the effectiveness of the visualization techniques. The result is a fast, versatile, and cost-effective analysis tool to reveal hidden structures that might lead to the discovery of functional identifications as well as phenotypic changes of whole bacterial genomes. Nine different whole bacterial genomes obtained from public genome banks are used to demonstrate our designs and prove their viability. Although the design of the new visualization technique is targeted at analyzing genomic sequences, we show with examples that it can be used to study other types of sequential datasets with a priori orders.
o Visualization and Data Mining
Pak Chung Wong, Harlan Foote, Wendy Cowley, and Jim Thomas, “Dynamic Visualization of Transient Data Streams,” Proceedings IEEE Information Visualization 2003, Oct 2003.
Abstract: We introduce two dynamic visualization techniques using multi-dimensional scaling to analyze transient data streams such as newswires and remote sensing imagery. While the time-sensitive nature of these data streams requires immediate attention in many applications, the unpredictable and unbounded characteristics of this information can potentially overwhelm many scaling algorithms that require a full re-computation for every update. We present an adaptive visualization technique based on data stratification to ingest stream information adaptively when influx rate exceeds processing rate. We also describe an incremental visualization technique based on data fusion to project new information directly onto a visualization subspace spanned by the singular vectors of the previously processed neighboring data. The ultimate goal is to leverage the value of legacy and new information and minimize re-processing of the entire dataset in full resolution. We demonstrate these dynamic visualization results using a newswire corpus and a remote sensing imagery sequence.
Pak Chung Wong, Wendy Cowley, Harlan Foote, Elizabeth Jurrus, and Jim Thomas, “Visualizing Sequential Patterns for Text Mining,” Proceedings IEEE Information Visualization 2000, Oct 2000.
Abstract: A sequential pattern in data mining is a finite series of elements such as A → B → C → D where A, B, C, and D are elements of the same domain. The mining of sequential patterns is designed to find patterns of discrete events that frequently happen in the same arrangement along a timeline. Like association and clustering, the mining of sequential patterns is among the most popular knowledge discovery techniques that apply statistical measures to extract useful information from large datasets. As our computers become more powerful, we are able to mine bigger datasets and obtain hundreds of thousands of sequential patterns in full detail. With this vast amount of data, we argue that neither data mining nor visualization by itself can manage the information and reflect the knowledge effectively. Subsequently, we apply visualization to augment data mining in a study of sequential patterns in large text corpora. The result shows that we can learn more and more quickly in an integrated visual data-mining environment.
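As a small illustration of what "frequently happen in the same arrangement" can mean, the sketch below measures the support of an ordered pattern, that is, the fraction of event sequences that contain the pattern's elements in order. It assumes gaps between elements are allowed; the function names and sample data are hypothetical and are not taken from the paper.

```python
def contains_in_order(sequence, pattern):
    """True if every element of `pattern` appears in `sequence` in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # `element in remaining` consumes the iterator up to the match,
    # so each later element must occur after the earlier ones.
    return all(element in remaining for element in pattern)


def pattern_support(sequences, pattern):
    """Fraction of sequences that contain `pattern` as an ordered subsequence."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if contains_in_order(seq, pattern))
    return hits / len(sequences)


# Hypothetical event sequences over one domain of elements.
events = [
    ["A", "B", "C", "D"],
    ["A", "X", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
]
abc_support = pattern_support(events, ["A", "B", "C"])
```

Only the first two sequences contain A, B, and C in that order, so the pattern's support here is one half.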
Pak Chung Wong, Paul Whitney, and Jim Thomas, “Visualizing Association Rules for Text Mining,” Proceedings IEEE Symposium on Information Visualization 99, Oct 1999.
Abstract: An association rule in data mining is an implication of the form X → Y where X is a set of antecedent items and Y is the consequent item. For years researchers have developed many tools to visualize association rules. However, few of these tools can handle more than dozens of rules, and none of them can effectively manage rules with multiple antecedents. Thus, it is extremely difficult to visualize and understand the association information of a large data set even when all the rules are available. This paper presents a novel visualization technique to tackle many of these problems. We apply the technology to a text mining study on large corpora. The results indicate that our design can easily handle hundreds of multiple antecedent association rules in a three-dimensional display with minimum human interaction, low occlusion percentage, and no screen swapping.
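A rule with several antecedent items and one consequent is usually scored by its support and confidence. The sketch below computes both using the standard textbook definitions; it is not drawn from the paper, and the function names and the tiny topic corpus are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent plus consequent) divided by support of the antecedent."""
    antecedent = frozenset(antecedent)
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | {consequent}) / base


# Hypothetical corpus: each document reduced to its set of topic words.
docs = [
    frozenset({"climate", "model", "rain"}),
    frozenset({"climate", "model", "wind"}),
    frozenset({"climate", "rain"}),
    frozenset({"genome", "model"}),
]

# Rule with two antecedent items: {climate, model} implies rain.
rule_support = support(docs, {"climate", "model", "rain"})
rule_confidence = confidence(docs, {"climate", "model"}, "rain")
```

Here the antecedent pair appears in two of four documents and the full rule in one, giving a support of 0.25 and a confidence of 0.5.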
Pak Chung Wong, “Visual Data Mining,” IEEE Computer Graphics and Applications, Vol. 19, No. 5, Sep 1999.
Introduction: Seeing is knowing, though merely seeing is not enough. When you understand what you see, seeing becomes believing. A while ago scientists discovered that seeing and understanding together enable humans to glean knowledge and even deeper insight from large amounts of data. The approach is to integrate the exploration abilities of the human mind with the enormous processing power of computers to form a powerful knowledge discovery environment that capitalizes on the best of both worlds. The technology is based upon visual and analytical processes developed in various disciplines including scientific visualization, data mining, statistics, and machine learning with custom extensions that deal with very large multidimensional multivariate datasets. The methodology is based upon functionality to characterize structures and display data, as well as human capabilities to perceive patterns, exceptions, trends, and relationships. The purpose of this article is to define our vision, present the state of the art, and discuss the future of a young discipline that we call Visual Data Mining.
Nancy E. Miller, Pak Chung Wong, Mary Brewster, and Harlan Foote, “TOPIC ISLANDS—A Wavelet-Based Text Visualization System,” Proceedings IEEE Visualization '98, Oct 1998.
Abstract: We present a novel approach to visualize and explore unstructured text. The underlying technology, called TOPIC-O-GRAPHY™, applies wavelet transforms to a custom digital signal constructed from words within a document. The resultant multiresolution wavelet energy is used to analyze the characteristics of the narrative flow in the frequency domain, such as theme changes, which is then related to the overall thematic content of the text document using statistical methods. The thematic characteristics of a document can be analyzed at varying degrees of detail, ranging from section-sized text partitions to partitions consisting of a few words. Using this technology, we are developing a visualization system prototype known as TOPIC ISLANDS™ to browse a document, generate fuzzy document outlines, summarize text by levels of detail and according to user interests, define meaningful subdocuments, query text content, and provide summaries of topic evolution.
o Scientific Databases and Climate Modeling
Pak Chung Wong, Harlan Foote, David Kao, Ruby Leung, and Jim Thomas, “Multivariate Visualization with Data Fusion,” Information Visualization, Vol. 1, No. 3/4, Dec 2002, Palgrave Macmillan.
Abstract: We discuss a fusion-based visualization method to analyze a multivariate climate dataset and its metadata. The primary difference between a conventional visualization and a fusion-based visualization is that the former draws on a single image whereas the latter draws on multiple see-through layers, which are then overlaid on each other to form the final visualization. We propose optimized colormaps to highlight subtle features that would not be shown with conventional colormaps. We present fusion techniques that integrate multiple single-purpose visualization techniques into the same viewing space. Our highly flexible fusion approach allows scientists to explore multiple parameters concurrently by mixing and matching images without frequently reconstructing new visualizations from the data for every possible combination. Although our primary visualization application is climate modeling, we show with examples that our fundamental design—fusing layers of data images for multivariate visualization—can be generalized for other information visualization applications.
Pak Chung Wong, Harlan Foote, Ruby Leung, Elizabeth Jurrus, Dan Adams, and Jim Thomas, “Vector Fields Simplification—A Case Study of Visualizing Climate Modeling and Simulation Data Sets,” Proceedings IEEE Visualization 2000, Oct 2000.
Abstract: In our study of regional climate modeling and simulation, we frequently encounter vector fields that are crowded with large numbers of critical points. A critical point in a flow is where the vector field vanishes. While these critical points accurately reflect the topology of the vector fields, in our study only a subset of them is worth further investigation. We present a filtering technique based on the vorticity of the vector fields to eliminate the less interesting and sometimes sporadic critical points in a multiresolution fashion. The neighboring regions of the preserved features, which are characterized by strong shear and circulation, are potential locations of weather instability. We apply our feature-filtering technique to a regional climate modeling data set covering East Asia in the summer of 1991.
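The filtering idea can be sketched on a small regular grid. The example below is a simplified, single-resolution illustration and not the multiresolution method of the paper. It assumes critical points fall exactly on grid nodes, estimates vorticity with central differences, and keeps only critical points whose vorticity magnitude meets a threshold. All function names, the grid helper, and the two test fields are hypothetical.

```python
import math


def vorticity(u, v, i, j, h=1.0):
    """Central-difference estimate of dv/dx - du/dy at interior node (row i, column j)."""
    dv_dx = (v[i][j + 1] - v[i][j - 1]) / (2 * h)
    du_dy = (u[i + 1][j] - u[i - 1][j]) / (2 * h)
    return dv_dx - du_dy


def strong_critical_points(u, v, min_vorticity, eps=1e-9):
    """Interior nodes where the field vanishes and |vorticity| >= min_vorticity."""
    kept = []
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            vanishes = math.hypot(u[i][j], v[i][j]) <= eps
            if vanishes and abs(vorticity(u, v, i, j)) >= min_vorticity:
                kept.append((i, j))
    return kept


def make_grid(size, fn):
    """Sample fn(x, y) on a size-by-size grid centered at the origin (rows are y, columns are x)."""
    c = size // 2
    return [[fn(j - c, i - c) for j in range(size)] for i in range(size)]


# Rigid rotation (u, v) = (-y, x): one critical point with vorticity 2.
rot_u = make_grid(5, lambda x, y: -y)
rot_v = make_grid(5, lambda x, y: x)

# Pure strain (u, v) = (x, -y): one critical point with zero vorticity.
strain_u = make_grid(5, lambda x, y: x)
strain_v = make_grid(5, lambda x, y: -y)

kept_rotation = strong_critical_points(rot_u, rot_v, min_vorticity=1.0)
kept_strain = strong_critical_points(strain_u, strain_v, min_vorticity=1.0)
```

With a threshold of 1.0, the rotation center at grid node (2, 2) is kept while the zero-vorticity critical point of the strain field is filtered out.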
Pak Chung Wong, Harlan Foote, Ruby Leung, Dan Adams, and Jim Thomas, “Data Signatures and Visualization of Scientific Data Sets,” IEEE Computer Graphics and Applications, Vol. 20, No. 2, Mar 2000.
Introduction: Today, as datasets used in computations grow in size and complexity, the technologies we have developed over the years to deal with scientific datasets have become less efficient and effective. Many frequently used operations such as eigenvector computation could quickly exhaust our desktop workstations once the data size reaches certain limits. The variety of high dimensional datasets we collect every day, on the other hand, does not help relieve the problem. Many conventional metric designs that build on quantitative or categorical datasets cannot be applied directly to heterogeneous datasets with multiple data types. While building new machines with more resources might conquer the data size problems, the complexity of today’s computations requires a new breed of projection techniques to support analysis of the data and verification of the results. We introduce the concept of a data signature, which captures the essence of a scientific dataset in a compact format, and use it to conduct analysis as if using the original. A time-dependent climate simulation dataset is used to demonstrate our approach and present the results.