International Conference on the Frontiers of Statistics: High Dimensional Data Analysis

Yunnan University, Kunming, China
August 13 and 14, 2007

Short Courses - August 11 and 12, 2007
High dimensional data visualization: the matrix visualization approach, half-day, by Chun-Houh Chen, Academia Sinica, Taiwan

High dimensional data visualization: the matrix visualization approach

Chun-Houh Chen
Associate Research Fellow
Institute of Statistical Science
Academia Sinica, Taiwan
Phone: +886 2 27835611 ext. 407
Fax: +886 2 27831523
Email: cchen@stat.sinica.edu.tw
URL: http://gap.stat.sinica.edu.tw/

Graphical exploration of quantitative and qualitative data is the initial yet essential step in modern statistical data analysis. Every conventional graphical tool has its limits: a Scatterplot Matrix (SM) is useful for visualizing only about twenty variables; a Box-Plot (BP) does not show interactions between variables; and a Parallel Coordinate Plot (PCP) requires extensive conditioning to extract overall information. Dimension reduction tools such as Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) also lose effectiveness for visually exploring the information structure embedded in very high dimensional data sets.

Matrix visualization (MV; Chen, 2002; Chen et al., 2004), on the other hand, can simultaneously explore the associations among up to thousands of subjects and variables, and their interactions, without first reducing dimension. MV permutes the rows and columns of the raw data matrix, together with the corresponding subject and variable proximity matrices, using suitable seriation (reordering) algorithms. The permuted raw data matrix and the two proximity matrices are then displayed as matrix maps through suitable color spectra, from which the subject clusters, variable groups, and interactions embedded in the data set can be visually extracted. For binary, ordinal, and nominal data, SM, BP, and PCP provide little visual information, whereas MV still gives comprehensive information about the individual profiles of subjects and variables, together with the interaction pattern of each subject cluster on every variable group.
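The permute-then-display pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GAP's implementation: Pearson correlation is assumed as the proximity measure for both subjects (rows) and variables (columns), a simple greedy nearest-neighbor chaining is swapped in for GAP's own seriation algorithm, and a character ramp stands in for a color spectrum. All function names (`pearson`, `proximity`, `greedy_seriation`, `matrix_visualize`, `render`) are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den > 0 else 0.0


def proximity(vectors: Matrix) -> Matrix:
    """Assumption: correlation is used as the proximity between every pair of vectors."""
    return [[pearson(u, v) for v in vectors] for u in vectors]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def greedy_seriation(prox: Matrix) -> List[int]:
    """Hypothetical stand-in seriation (not GAP's method).

    Start from the item with the smallest total correlation (a likely end of
    the chain), then repeatedly append the unvisited item most correlated with
    the last one placed. Ties go to the lowest index.
    """
    n = len(prox)
    start = min(range(n), key=lambda i: sum(prox[i]))
    order = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = order[-1]
        nxt = max(sorted(unvisited), key=lambda j: prox[last][j])
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def permute(m: Matrix, rows: List[int], cols: List[int]) -> Matrix:
    """Reorder the rows and columns of m."""
    return [[m[i][j] for j in cols] for i in rows]


def matrix_visualize(data: Matrix) -> Tuple[Matrix, Matrix, Matrix, List[int], List[int]]:
    """Return the permuted data matrix, both permuted proximity matrices, and the orders."""
    row_prox = proximity(data)             # subject-by-subject proximity
    col_prox = proximity(transpose(data))  # variable-by-variable proximity
    r = greedy_seriation(row_prox)
    c = greedy_seriation(col_prox)
    return (permute(data, r, c), permute(row_prox, r, r),
            permute(col_prox, c, c), r, c)


def render(m: Matrix, ramp: str = " .:-=+*#%@") -> str:
    """Text stand-in for a color spectrum: map each cell linearly from [min, max] onto the ramp."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    span = (hi - lo) or 1.0
    return "\n".join(
        "".join(ramp[int((v - lo) / span * (len(ramp) - 1))] for v in row)
        for row in m
    )


# Usage: two increasing and two decreasing subject profiles.
data = [
    [1, 2, 3, 4, 5, 6],
    [6, 5, 4, 3, 2, 1],
    [2, 3, 4, 5, 6, 7],
    [7, 6, 5, 4, 3, 2],
]
sorted_data, row_map, col_map, row_order, col_order = matrix_visualize(data)
print(row_order)  # [0, 2, 1, 3]
print(render(row_map))
```

Here the greedy ordering places the two increasing subjects (rows 0 and 2) next to each other, so the reordered row proximity map shows two blocks. Dedicated seriation methods aim at a global ordering criterion rather than one greedy choice at a time, so this chaining can give poor orders on noisier data.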

0. General framework of generalized association plots (GAP) for MV

In this lecture I will first briefly introduce the technical background of MV for continuous, binary, and nominal data using the Generalized Association Plots (GAP) developed by our information visualization laboratory. Real applications to scientific problems from biomedical experiments, psychometric studies, and social surveys will then be presented, followed by ongoing developments and potential future directions for MV research. Related information and software (currently for continuous and binary data only; we hope to release the nominal version of GAP during the Kunming meeting) can be obtained from http://gap.stat.sinica.edu.tw/. Potential participants are encouraged to download the Java version of GAP and its user manual before attending the training course.

Topics to be covered:

1. MV for continuous (SARS) data

2. MV for binary data

3. MV for categorical data

4. MV for cartographic data with geographic links

5. Covariate-adjusted
6. Multi-level
7. Missing value
8. Proximity Model
9. Nonlinear
10. Canonical
...