Validation Strategies for Large-Scale Clustering Applications

David Dubin
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
Champaign, IL 61820
dubin@alexia.lis.uiuc.edu

Daniel X. Pape
Community Architectures for Network Information Systems Laboratory
University of Illinois at Urbana-Champaign
Champaign, IL 61820
dpape@canis.uiuc.edu

Kohonen's Self-Organizing Map is a very popular clustering method, but few published studies describe the application of well-known cluster validation methods to the SOM. A few studies have compared the SOM to other clustering methods, but usually with assumptions that aren't consistent with the ways SOMs are used in large-scale applications. Furthermore, SOM proponents may have the mistaken impression that problems of scale rule out traditional validation methods. In scaling validation techniques for large SOM applications, the nature of the clusters and how they are apprehended by the SOM are at least as important as algorithm complexity and the availability of computing resources. We describe extensions to existing algorithms for generating artificial test clusters, aimed at supporting validation studies for large-scale SOMs. We propose ways in which other validation techniques (such as those based on the random graph hypothesis) may be employed in large-scale SOM applications.
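
To make the idea of validating against artificial test clusters concrete, the following is a minimal sketch and not the generation algorithm described in this paper. It assumes spherical Gaussian clusters whose centres are offset along the first axis, and it scores a recovered partition against the known labels with the plain Rand index. The function names make_test_clusters and rand_index, and the parameters spread and separation, are hypothetical choices made for illustration.

```python
import random


def make_test_clusters(n_clusters, points_per_cluster, n_dims,
                       spread=0.5, separation=10.0, seed=0):
    """Generate labelled points from well-separated Gaussian clusters.

    Assumption for illustration only: each cluster is a spherical
    Gaussian, and centres are shifted along axis 0 by `separation`.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for c in range(n_clusters):
        centre = [rng.uniform(0.0, 1.0) for _ in range(n_dims)]
        centre[0] += c * separation  # keep clusters apart on the first axis
        for _ in range(points_per_cluster):
            points.append([rng.gauss(m, spread) for m in centre])
            labels.append(c)
    return points, labels


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    points together, or both put them apart. This is O(n^2) in the
    number of points, so a large study would score a sample of pairs.
    """
    n = len(labels_a)
    agree = 0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            if same_a == same_b:
                agree += 1
            pairs += 1
    return agree / pairs


if __name__ == "__main__":
    points, truth = make_test_clusters(n_clusters=3, points_per_cluster=20, n_dims=5)
    # A clustering method under test (for example, a trained SOM with each
    # point assigned to its best-matching unit) would supply `recovered`.
    recovered = list(truth)
    print(len(points), rand_index(truth, recovered))
```

In a validation study the recovered labels would come from the method under test, and the score would be compared across data sets of varying dimensionality, cluster count, and noise. The quadratic pair loop is kept for clarity.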