In this project, which I did at TKM, the goal was to display patents on a topological map. This example shows a map built from a corpus about the prion protein. The challenge was to reduce documents from N dimensions to 2 dimensions without losing too much information.
We first decompose documents into words, each with a custom weight. Then we apply dimensionality reduction to this document representation so that it can be displayed on a 2D map. Finally, we display it as a topological map based on the density of points. We developed the rendering component on top of native WebGL.
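The first and last steps of this pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project's custom word weight is not described here, so plain TF-IDF stands in for it, and the function names `tfidf_vectors` and `density_grid` are hypothetical. The density grid assumes the 2D coordinates have already been scaled to the range [0, 1).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn raw texts into sparse {word: weight} dicts.

    Assumption: plain TF-IDF is used as a stand-in for the custom weight.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return vectors


def density_grid(points, size):
    """Count how many 2D points (coordinates in [0, 1)) fall into each cell."""
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


docs = ["prion protein folding", "prion disease", "protein structure"]
vectors = tfidf_vectors(docs)
grid = density_grid([(0.1, 0.1), (0.15, 0.2), (0.9, 0.9)], size=2)
```

Words that appear in fewer documents receive a higher weight, and the cell counts in `grid` are the kind of density values a renderer could turn into contour levels on the map.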
During this project, we tried several dimensionality reduction techniques (t-SNE, MDS, LLA, LDA) to compare their visualization output. We finally chose the Self-Organizing Map (SOM) algorithm because it does not need to handle all the data at once, but associates a weight with each document to display it on a 2D grid.
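To illustrate why a SOM can process documents one at a time, here is a minimal sketch of the textbook online training loop. It is not the project's implementation: the function names, the linear decay schedules, and the default parameters are all assumptions. In this formulation each grid node holds a weight vector, and a document is placed on the grid at its best-matching node.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Return the grid coordinate whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], vec)),
    )


def train_som(vectors, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM, updating the grid after every single document."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = max(rows, cols) / 2  # assumed initial neighbourhood radius
    # One weight vector per grid node, randomly initialised in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            vec = vectors[i]
            frac = step / total_steps
            lr = lr0 * (1 - frac)                # learning rate decays linearly
            sigma = sigma0 * (1 - frac) + 1e-3   # neighbourhood shrinks over time
            bmu = best_matching_unit(weights, vec)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))
                # Pull the node towards the document, scaled by grid proximity.
                for k in range(dim):
                    w[k] += lr * h * (vec[k] - w[k])
            step += 1
    return weights


data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, rows=3, cols=3)
positions = [best_matching_unit(som, vec) for vec in data]
```

Each entry of `positions` is a `(row, col)` cell, so the documents can be drawn directly on the 2D grid. Because every update touches only one document, the training loop never needs the full corpus in memory at once.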