Main || CV || Publications || Software || Visuals and Animations
I often work with large datasets and find it useful to visualize them. A good example of a large matrix is the correlation matrix between copy number measurements at 14,556 markers and the expression of 14,556 genes. I used two separate tools to visualize and study this matrix.
Google Maps based. The first visualization of the correlation matrix is based on the Google Maps engine. It is written in JavaScript and works on almost any platform.
Silverlight based. The second visualization of the correlation matrix is based on an existing Silverlight application, DH view SL, originally developed for viewing large, stitched panoramic images.
This series of 5 GIF animations illustrates the process of k-means clustering. It clearly shows how an unlucky choice of starting points can lead to a strongly suboptimal choice of clusters.
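The effect of an unlucky start can be reproduced on a tiny example. The sketch below is a plain Lloyd-style k-means loop in pure Python; the function names and the four-point dataset are illustrative assumptions, not the code behind the animations.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))


def lloyd(points, centers, max_iter=100):
    """Run Lloyd iterations from the given starting centers.

    Returns the final centers and the sum of squared distances
    from each point to its nearest center.
    """
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(dist2(p, c) for c in centers) for p in points)
    return centers, cost


# Four corners of a 4 x 1 rectangle (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, bad_cost = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])   # unlucky start
_, good_cost = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])  # lucky start
print(bad_cost, good_cost)
```

With the unlucky start the algorithm is already at a fixed point that splits the rectangle into its two long rows (cost 16), while the lucky start finds the two short columns (cost 1).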
This multipage PDF illustrates a more efficient version of k-means clustering called k-means++, which chooses the starting points by weighted random seeding. D. Arthur and S. Vassilvitskii. k-means++: The Advantages of Careful Seeding. SODA '07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
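A minimal sketch of the weighted seeding step, assuming the usual formulation in which the first center is drawn uniformly and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function name `kmeanspp_seeds` and the sample data are hypothetical.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k starting centers by distance-weighted random seeding.

    Assumes the data contain at least k distinct points; otherwise
    all weights become zero and random.choices raises ValueError.
    """
    rng = rng or random.Random()
    # First center: uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest center.
        weights = [min(dist2(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Two well-separated groups (hypothetical data).
data = [(0.0, 0.0)] * 5 + [(10.0, 10.0)] * 5
print(kmeanspp_seeds(data, 2, random.Random(1)))
```

Points that coincide with an already chosen center get weight zero, so in this example the second seed always lands in the other group. The seeds would then be passed to an ordinary k-means loop.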
This GIF animation illustrates an O(n) algorithm for constructing the greatest convex minorant of a given set of points (or of a piecewise linear function).
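One common way to get linear time is a single stack-based scan over points that are already sorted by x. The sketch below takes that approach; it assumes strictly increasing x coordinates and may differ in detail from the algorithm shown in the animation. The function names are illustrative.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def greatest_convex_minorant(points):
    """Vertices of the greatest convex minorant of the given points.

    Assumes `points` is a list of (x, y) pairs with strictly increasing x.
    Each point is pushed once and popped at most once, so the scan is O(n).
    """
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or above the chord
        # joining the vertex before it to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


pts = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 0)]
print(greatest_convex_minorant(pts))
```

For this input only the two endpoints survive, because every interior point lies above the segment from (0, 0) to (4, 0).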
This GIF animation illustrates the key idea behind the algorithm for constructing the unimodal distribution nearest to a given one, where "nearest" means minimizing the Kolmogorov–Smirnov distance. J. A. Hartigan and P. M. Hartigan. The Dip Test of Unimodality. The Annals of Statistics, Vol. 13, No. 1 (Mar. 1985), pp. 70-84.
This GIF animation illustrates the Hilbert curve construction for n = 7. For better performance, the animation shows only every 19th frame of the full 16,384-frame animation.
More about the Hilbert curve at wikipedia.org.
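The 16,384 frames correspond to the 4^7 cells that a Hilbert curve of order 7 visits on a 128 x 128 grid. A minimal sketch of the standard index-to-coordinate conversion follows, assuming that n denotes the order of the curve; the function name `hilbert_d2xy` is illustrative.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) grid coordinates.

    The curve covers a 2**order by 2**order grid, and d runs from 0
    to 4**order - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the sub-square so the pieces join end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


curve = [hilbert_d2xy(7, d) for d in range(4 ** 7)]
frames = curve[::19]  # keep every 19th point, as in the thinned animation
print(len(curve), curve[:4])
```

Consecutive points always differ by one grid step, which is what makes the curve drawable as a single unbroken path.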
This GIF animation illustrates how sample histograms become smoother and smoother as the sample size grows. For huge sample sizes, the histogram is indistinguishable from a density plot.
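The same effect can be checked numerically. The sketch below is an illustration under stated assumptions: samples are drawn from a standard normal distribution, the bins are fixed (16 bins on [-4, 4]), and the histogram is compared with the exact bin-averaged density. The function names and parameters are hypothetical and are not taken from the animation.

```python
import math
import random


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def histogram_error(n, bins=16, lo=-4.0, hi=4.0, seed=0):
    """Largest gap between histogram heights and the true bin-averaged density."""
    rng = random.Random(seed)
    width = (hi - lo) / bins
    counts = [0] * bins
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if lo <= x < hi:  # rare samples outside the range are dropped
            counts[int((x - lo) / width)] += 1
    worst = 0.0
    for i, c in enumerate(counts):
        a = lo + i * width
        expected = (normal_cdf(a + width) - normal_cdf(a)) / width
        worst = max(worst, abs(c / (n * width) - expected))
    return worst


for n in (100, 10_000, 1_000_000):
    print(n, histogram_error(n))
```

The printed error should shrink roughly like one over the square root of the sample size. Matching a smooth density curve, as in the animation, would also require narrowing the bins as the sample grows.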