A Very Brief Summary of The Self-Organizing Map

A Self-Organizing Map (SOM) is a neural network algorithm introduced by Prof. Teuvo Kohonen in 1982. It is used to categorize and interpret large, high-dimensional data sets. Briefly, it maps n-dimensional data points that are similar to each other onto nearby regions of a q-dimensional space, where q is usually much smaller than n and is most often 2.

The SOM algorithm relies on two main things: the input data set and the output map.

Input Data

Each data item is described by a vector of n elements. These elements are commonly called features, attributes, or properties of the data. They are usually numerical so that they can be used in computation, but non-numerical attributes can easily be encoded as numbers.
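As an illustration of encoding a non-numerical attribute, the sketch below uses one-hot encoding. This is one common choice, not a method named in this summary, and the data items, the COLOURS list, and the one_hot helper are all hypothetical.

```python
def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]

# Hypothetical data items: (weight in kg, number of legs, colour).
COLOURS = ["brown", "grey", "white"]
raw_items = [(4.0, 4, "grey"), (0.02, 2, "brown")]

# Numeric elements pass through unchanged; the colour expands into
# three 0/1 elements, so every item becomes a vector with n = 5.
vectors = [[weight, float(legs)] + one_hot(colour, COLOURS)
           for weight, legs, colour in raw_items]
```

In practice the numeric elements would often also be rescaled so that no single feature dominates the distance comparison, but that step is left out here.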

Output Map

The map is an array of nodes (also called neurons). This array is usually two-dimensional, but could have more dimensions; it is often laid out in a rectangular or hexagonal lattice. Each node has an associated reference vector of the same length as each input feature vector. It is to these reference vectors that the input vectors are compared.
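A minimal sketch of such a map, assuming a rectangular lattice stored as nested lists and random initial reference vectors. The random initialisation, the squared Euclidean distance, and the names make_map and squared_distance are assumptions of this example, not details given above.

```python
import random

def make_map(rows, cols, n, seed=0):
    """Build a rows x cols rectangular lattice of nodes.

    Each node is a reference vector of length n, filled with random
    values in [0, 1) (an assumed initialisation).
    """
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(n)] for _ in range(cols)]
            for _ in range(rows)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

som = make_map(rows=3, cols=4, n=5)
x = [0.2, 0.9, 0.0, 1.0, 0.0]  # a hypothetical input vector, n = 5

# Compare the input vector against the reference vector of every node.
distances = [[squared_distance(x, som[r][c]) for c in range(4)]
             for r in range(3)]
```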


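The simplest description of the SOM algorithm is this:

1. Select an input vector from the data set.
2. Compare it with the reference vector of every node in the map.
3. Choose the node whose reference vector is closest to the input vector as the winning node.
4. Move the reference vectors of the winning node and of the nodes in its neighborhood toward the input vector.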

The reason for Step 4 is to make the winning node and the nodes in the winning neighborhood respond more favorably to inputs similar to the input vector. It is in this way that topologically close regions of the output map gain an affinity for clusters of similar data vectors.
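The effect of this update can be seen in a small sketch. It assumes squared Euclidean distance, a square neighborhood on the lattice, and a fixed learning rate; the function names and the parameters rate and radius are illustrative only.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def find_winner(som, x):
    """Return (row, col) of the node whose reference vector is closest to x."""
    positions = [(r, c) for r in range(len(som)) for c in range(len(som[0]))]
    return min(positions, key=lambda rc: squared_distance(som[rc[0]][rc[1]], x))

def update(som, x, winner, rate=0.5, radius=1):
    """Move the winner and its lattice neighbours a fraction `rate` toward x."""
    wr, wc = winner
    for r in range(len(som)):
        for c in range(len(som[0])):
            # Square neighbourhood of the winner: an assumption of this sketch.
            if max(abs(r - wr), abs(c - wc)) <= radius:
                som[r][c] = [w + rate * (xi - w) for w, xi in zip(som[r][c], x)]

# A 1 x 3 map whose reference vectors have 2 elements each.
som = [[[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]]
x = [2.0, 2.0]

winner = find_winner(som, x)                     # (0, 1): the middle node
before = squared_distance(som[0][winner[1]], x)  # 2.0
update(som, x, winner, rate=0.5, radius=1)
after = squared_distance(som[0][winner[1]], x)   # 0.5
# som is now [[[1.0, 1.0], [1.5, 1.5], [3.0, 3.0]]]
```

The winner and both of its neighbors end up closer to the input vector, so the same region of the map will match similar inputs more closely the next time they are presented.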

Steps 1-4 are repeated for every input vector; one such pass over the data is called an epoch, or a time-step. Opinions differ on how many epochs the algorithm should run. The optimal number depends on the application, but anywhere between 1,000 and 100,000 epochs is normal.
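The repetition over epochs might be sketched as follows. The linearly shrinking learning rate and neighborhood radius are a common convention assumed here rather than something this summary specifies, and train and its parameters (rate0, radius0, seed) are hypothetical names.

```python
import random

def train(data, rows, cols, epochs=1000, rate0=0.5, radius0=None, seed=0):
    """Train a rows x cols map; return a dict {(row, col): reference_vector}."""
    rng = random.Random(seed)
    n = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2
    som = {(r, c): [rng.random() for _ in range(n)]
           for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Assumed schedule: rate and radius shrink linearly toward zero.
        frac = 1.0 - epoch / epochs
        rate, radius = rate0 * frac, radius0 * frac
        for x in data:  # one pass over all input vectors is one epoch
            winner = min(som, key=lambda p: sum((w - xi) ** 2
                                                for w, xi in zip(som[p], x)))
            for (r, c), ref in som.items():
                if max(abs(r - winner[0]), abs(c - winner[1])) <= radius:
                    # Update the reference vector in place.
                    ref[:] = [w + rate * (xi - w) for w, xi in zip(ref, x)]
    return som

# Hypothetical data: two small clusters of 2-element vectors.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]

# Far fewer epochs than the range quoted above, only to keep the example quick.
som = train(data, rows=1, cols=4, epochs=50)
```

Shrinking the radius lets the map order itself coarsely at first and then fine-tune individual nodes; other schedules, such as exponential decay, are equally plausible.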


Daniel X. Pape
Last modified: Sat Apr 4 18:30:40 CST 1998