5.3 Comparison of Seriation Algorithms with Iris Data
In this section, Iris data (Fisher 1936) is used to compare the performance of the proposed seriations with several commonly used sorting algorithms. The target proximity matrix is the Euclidean distance matrix of the 150 iris flowers on four variables. Two conventional seriation algorithm sets,
(a).farthest and nearest insertion spanning tour,
(b). single, complete, and average linkage clustering trees, are compared with the proposed set,
(c).rank2 ellipse, rank1 tree, and rank1&2 double ellipse.
The conventional algorithms are discussed in Minnotte and West (1998). The three permuted distance maps are displayed in Figure 7. A clear near Robinson pattern can be identified in Figure 7h, using the proposed rank1&2 algorithms. Figure 7a with farthest insertion spanning algorithm on the other hand only approach the Robinson pattern in local regions. Performance of the average clustering tree in Figure 7f stands in between the spanning tours and the proposed algorithms.
Figure 7. Permuted Euclidean Distance Maps for Iris Data with Conventional and Proposed Seriation Algorithms.
For a numerical comparison, three antiRobinson loss functions
(Streng, 1978) are calculated for each permuted matrix, D=[], for the amount of deviation from a Robinson form with distancetype
poximity:
AR(i) counts only the number of antiRobinson events occur in the permuted matrix. AR(s) sums up all the absolute value of antiRobinson deviations. AR(w) is a weighted version of AR(s) penalized by the difference of column indices of two entries. The result is summarized in Table 1. Clearly the proposed algorithms outperform the conventional methods with a significan margin in the Robinsonian sense.
Seriation Algorithm \ Loss Fun. 
AR(i) 
AR(s) 
AR(w) 
MS 
(a) Farthest Insertion Spanning 
339,392 
2,391,228.7 
75,265,472.0 
530.5 
(b) Nearest Insertion Spanning 
344,117 
2,594,751.6 
86,392,680.8 
557.4 
(c) Single Linkage Clustering 
192,180 
603,196.8 
9,822,063.2 
673.8 
(d) Complete Linkage Clustering 
153,508 
405,983.3 
6,233,901.7 
568.8 
(e) Average Linkage Clustering 
148,950 
381,679.8 
4,576,913.2 
558.8 
(f) RankOneTree 
86,367 
166,953.6 
1,613,008.1 
625.5 
(g) RankTwoEllipse 
91,411 
186,148.0 
2,098,849.9 
883.1 
(h) Rank1&2 Double Ellipse 
83,217 
146,115.5 
1,602,892.1 
789.5 
Table 1. AntiRobinson Deviations for the Permuted Distance Matrices for Iris data.
The last column in Table 1 displays the amount of minimal span loss function for each permuted matrix,
. This is the object to be minimized in a minimal spanning algorithm such as the traveling salesman problem. The idea is to find a shortest path through all data points. The major concern is on the optimization for local structures only which is different from the search of global structure in the Robinson problem. If only the local pattern is of concern then the conventional methods permute the matrix better than the proposed algorithms, but the difference is not as significant as that in the Robinson setup.
Reference:

Fisher, R. A. (1936), "The Use of Multiple Measurements in Taxonomic Problems," Ann.
Eugen., 7, 179188.

Minnotte, M., and West, R. W. (1998), "The Data Image: A tool for exploring high dimensional data sets," 1998 Proceedings of the Section on Statistical Graphics, American Statistical Association.
 Streng, R. (1991) "Classification and Seriation by Iterative Reordering of a Data Matrix," In Classification, Data Analysis, and Knowledge Organization: Models and Methods with Applications, 121130, edited by Bock, H. H., and
