Archive for the 'Manifold learning' Category

July 14th 2008

Dimensionality reduction: mapping the reduced space into the original space

Once the data set is reduced (see my first posts if you’re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:

  • you can interpolate the data in the original space based on an interpolation in the reduced space, or
  • you can create an approximation of the mapping with a multidimensional function (B-splines, …)

With the first solution, if you map one of the reduced points used for the training, you get the original point back. With the second solution, you get a point close to it. If your data set is noisy, you should use the second solution, not the first. And if you are trying to compress data (lossy compression), you cannot use the first one: you need the original points to compute new interpolated points, so you are not compressing your data set.

The solution I propose is based on approximation with a set of piecewise linear models (each model being a mapping from a subspace of the reduced space to the original space). At the boundaries between the models, I do not enforce continuity, contrary to hinging hyperplanes. Also contrary to Projection Pursuit Regression and hinging hyperplanes, my mapping is between the two spaces, and not from the reduced space to a single coordinate in the original space. This will enable projection on the manifold (which is another subject that will be discussed in another post).
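To make the idea of a piecewise linear mapping concrete, here is a minimal sketch in Python with NumPy. It is not the code from the scikit: the way regions are chosen (nearest of a few fixed centers), the function names and the toy data are all assumptions made for illustration. Each region gets its own affine model fitted by least squares, and nothing forces the models to agree at the boundaries.

```python
import numpy as np

def nearest_center(reduced, centers):
    """Index of the closest center for each reduced point."""
    dists = np.linalg.norm(reduced[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fit_piecewise_linear(reduced, original, centers):
    """Fit one affine model (reduced -> original) per region by least squares."""
    labels = nearest_center(reduced, centers)
    # The extra column of ones gives each model its affine offset.
    design = np.hstack([reduced, np.ones((len(reduced), 1))])
    models = []
    for k in range(len(centers)):
        mask = labels == k
        # Assumes each region holds enough points to determine its model.
        weights, *_ = np.linalg.lstsq(design[mask], original[mask], rcond=None)
        models.append(weights)
    return models

def map_to_original(reduced, centers, models):
    """Map reduced points with the model of the region they fall in."""
    labels = nearest_center(reduced, centers)
    design = np.hstack([reduced, np.ones((len(reduced), 1))])
    mapped = np.empty((len(reduced), models[0].shape[1]))
    for k, weights in enumerate(models):
        mask = labels == k
        mapped[mask] = design[mask] @ weights
    return mapped

# Toy data: a 1D reduced coordinate t and a V-shaped curve in 2D.
t = np.linspace(0.0, 1.0, 21).reshape(-1, 1)
original = np.hstack([t, np.abs(t - 0.5)])
centers = np.array([[0.25], [0.75]])  # hypothetical region centers

models = fit_piecewise_linear(t, original, centers)
mapped = map_to_original(t, centers, models)
print("largest mapping error:", np.abs(mapped - original).max())
```

Because the toy curve is itself made of two straight pieces, two models reproduce it almost exactly. On noisy data the models only approximate the points, which is the behaviour wanted from the second kind of mapping.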

Continue Reading »

No Comments yet »

June 27th 2008

Dimensionality reduction: the scikit is available!

My manifold learning code was for some time a Technology Preview in the learn scikit. Now I can say that it is available (BSD license) and there should not be any obvious bug left.

I’ve written a small tutorial. It is not a usual tutorial (there is a user tutorial, and then what developers should know to enhance the scikit), and some results of the techniques are shown on my blog. It provides the basic commands to start using the scikit yourself (reducing some data, projecting new points, …) as well as the exposed interface for enhancing the scikit.

If you have any questions, feel free to ask me; I will add the answers to the tutorial page so that everyone can benefit from them.

Feel free to contribute new techniques and additional tools as well, I cannot write them all! For instance, the scikit lacks robust neighbor selection to avoid short-cuts in the manifold…

Tutorial and the learn scikit mainpage.

3 Comments »

June 11th 2008

A Metric Multidimensional Scaling-Based Nonlinear Manifold Learning Approach for Unsupervised Data Reduction

At last, my article on manifold learning has been published and is accessible through doi.org (this was not the case last week, which is why I waited before publishing this post).
The journal is free, so you won’t have to pay to read it: Access to the EURASIP JASP article

I will publish additional figures here shortly. The scikit is almost complete as well; I’m finishing the online tutorial for those who are interested in using it and/or enhancing it.

2 Comments »

April 23rd 2008

Dimensionality reduction: comparison of different methods

I’ve already given some answers in one of my first posts on manifold learning. Here I will give more complete results on the quality of the dimensionality reduction performed by the best-known techniques.

First of all, my test is about respecting the geodesic distances in the reduced space. This is not possible for some manifolds, like a Gaussian 2D plot. I used the SCurve to create the test, as the speed along the curve is unitary and thus the distances in the coordinate space (the one I used to create the SCurve) are the same as the geodesic ones on the manifold. My test measures the matrix (Frobenius) norm of the difference between the original coordinates and the computed ones, up to an affine transform of the latter.
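The measure described above can be sketched as follows. This is only my reading of it, written with NumPy: the best affine transform of the computed coordinates is found by least squares, and the Frobenius norm of what remains is returned. The function name and the toy data are assumptions, not the code used to produce the results.

```python
import numpy as np

def affine_aligned_error(reference, embedding):
    """Frobenius norm of reference minus the best affine image of embedding."""
    # A column of ones lets the least-squares fit include a translation.
    design = np.hstack([embedding, np.ones((len(embedding), 1))])
    transform, *_ = np.linalg.lstsq(design, reference, rcond=None)
    return np.linalg.norm(reference - design @ transform, ord="fro")

rng = np.random.default_rng(0)
reference = rng.uniform(size=(50, 2))          # stand-in for the true coordinates

# An embedding that differs from the reference only by an affine transform.
linear_part = np.array([[2.0, 1.0], [0.0, 3.0]])
offset = np.array([5.0, -1.0])
affine_embedding = reference @ linear_part + offset

# An embedding with an additional nonlinear distortion.
distorted_embedding = affine_embedding ** 2

exact_error = affine_aligned_error(reference, affine_embedding)
distorted_error = affine_aligned_error(reference, distorted_embedding)
print("affine copy:", exact_error)
print("distorted copy:", distorted_error)
```

An embedding that is an affine copy of the reference gets an error that is zero up to rounding, so the measure only penalizes distortions that no affine transform can undo.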
Continue Reading »

2 Comments »

April 2nd 2008

Dimensionality reduction: explicit optimization of a cost function

Analytical solutions to the dimensionality reduction problem are only possible for quadratic cost functions, as in Isomap, LLE, Laplacian Eigenmaps, … All these solutions are sensitive to outliers. The issue with the quadratic hypothesis is that it assumes there are no outliers, but on real manifolds, the noise is always there.

Some cost functions have been proposed, also known as stress functions, as they measure the difference between the estimated geodesic distance and the computed Euclidean distance in the “feature” space. Every metric MDS cost function can be used as a stress function; here are some of them.
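As an illustration of what such a stress function looks like, here is a small NumPy sketch of two classical ones: the raw metric MDS stress and the Sammon stress. They are given as well-known examples and are not necessarily the ones discussed in the full post. The function names are mine, and the matrix called `geodesic` is assumed to hold the estimated geodesic distances.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix between the rows of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def raw_stress(geodesic, embedding):
    """Sum over pairs of (geodesic distance - embedded distance)^2."""
    upper = np.triu_indices(len(embedding), k=1)   # each pair counted once
    embedded = pairwise_distances(embedding)[upper]
    return ((geodesic[upper] - embedded) ** 2).sum()

def sammon_stress(geodesic, embedding):
    """Sammon stress: squared errors weighted by 1 / geodesic distance."""
    upper = np.triu_indices(len(embedding), k=1)
    target = geodesic[upper]
    embedded = pairwise_distances(embedding)[upper]
    return ((target - embedded) ** 2 / target).sum() / target.sum()

# Toy example: four points whose "geodesic" distances are those of a line.
t = np.arange(4.0)
geodesic = np.abs(t[:, None] - t[None, :])
flat = t.reshape(-1, 1)                            # a perfect 1D embedding

print(raw_stress(geodesic, flat))
print(raw_stress(geodesic, 2.0 * flat))
print(sammon_stress(geodesic, 2.0 * flat))
```

The Sammon weighting makes errors on short distances count more, so it favours local structure. Neither function has an analytical minimum in general, which is why an explicit optimization is needed.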

Continue Reading »

No Comments yet »

March 3rd 2008

Some news about the manifold learning scikit

I got the word today that my paper was accepted, so I can now focus on delivering the code.

I’m in the process of refactoring it so that it depends less on some of our libraries here. In two weeks, there is a nipy sprint in Paris that I will attend, and machine learning is one of the topics we will discuss, so this may indicate where and how I’ll contribute the code. I will keep on showing some results next week.

2 Comments »

February 18th 2008

Dimensionality reduction: Locally Linear Embedding

One of the most cited algorithms in nonlinear manifold learning, along with Isomap, is LLE. Contrary to Isomap, LLE tries to retain the local data structure of the sampled manifold. Whereas Isomap preserves absolute distances, LLE preserves local relative distances (it preserves barycenter weights).
This means that LLE is not suitable for every dimensionality reduction task. For visualization purposes, it can lead to very different solutions if the manifold is noisy.
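To show what these barycenter weights are, here is a minimal NumPy sketch of the local step of LLE: each point is written as a weighted combination of its neighbors, with weights that sum to one. The function name and the small regularization term are assumptions of this sketch, not the scikit's interface.

```python
import numpy as np

def barycenter_weights(point, neighbors, reg=1e-3):
    """Weights summing to 1 that best rebuild point from its neighbors."""
    shifted = neighbors - point            # neighbors relative to the point
    gram = shifted @ shifted.T             # local Gram matrix, k x k
    # Regularization (an assumption of this sketch) keeps the system solvable
    # when there are more neighbors than dimensions.
    gram = gram + reg * np.trace(gram) * np.eye(len(neighbors))
    weights = np.linalg.solve(gram, np.ones(len(neighbors)))
    return weights / weights.sum()

point = np.array([0.25, 0.25])
neighbors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = barycenter_weights(point, neighbors)

# Rotate, scale and translate the whole neighborhood.
angle = 0.7
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
shift = np.array([5.0, -2.0])
moved_point = 3.0 * point @ rotation.T + shift
moved_neighbors = 3.0 * neighbors @ rotation.T + shift
moved_weights = barycenter_weights(moved_point, moved_neighbors)

print(weights)
print(moved_weights)
```

The two sets of weights agree up to rounding: they do not change when the neighborhood is rotated, scaled or translated. LLE then looks for low-dimensional coordinates that are rebuilt by the same weights, which is why it keeps local relative structure rather than absolute distances.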
Continue Reading »

2 Comments »

February 4th 2008

Dimensionality reduction: Principal Components Analysis

Before going into more details about nonlinear manifold learning, I’ll present the linear description that is used in most applications.
PCA, for Principal Components Analysis, is another name for the Karhunen-Loeve transform. It aims at describing the data with a single linear model. The reduced space is the space spanned by the linear model; it is possible to project a new point onto the manifold and thus test whether the point belongs to the manifold.
The problem with PCA is that it cannot tackle nonlinear manifolds, such as the SwissRoll that was presented in my last post.
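A short NumPy sketch may help here. It fits the linear model with an SVD, reduces points, maps them back, and uses the distance between a point and its projection as a simple membership test. The function names are mine, and this plain projection is not the robust projection with an a priori term that I implemented.

```python
import numpy as np

def fit_pca(data, n_components):
    """Return the mean and the first principal directions (as rows)."""
    mean = data.mean(axis=0)
    _, _, directions = np.linalg.svd(data - mean, full_matrices=False)
    return mean, directions[:n_components]

def to_reduced(points, mean, components):
    """Coordinates of the points in the reduced space."""
    return (points - mean) @ components.T

def to_original(coords, mean, components):
    """Map reduced coordinates back onto the linear model."""
    return coords @ components + mean

def distance_to_model(points, mean, components):
    """Distance between each point and its projection on the linear model."""
    projected = to_original(to_reduced(points, mean, components), mean, components)
    return np.linalg.norm(points - projected, axis=1)

# Toy data: points lying on a straight line in 3D.
direction = np.array([1.0, 2.0, 2.0]) / 3.0
origin = np.array([1.0, -1.0, 0.5])
t = np.linspace(-1.0, 1.0, 11).reshape(-1, 1)
data = origin + t * direction

mean, components = fit_pca(data, n_components=1)

# A point pushed away from the line, perpendicular to it.
outside = origin + np.array([2.0, -1.0, 0.0])
print(distance_to_model(data, mean, components).max())
print(distance_to_model(outside[None, :], mean, components)[0])
```

Points of the data set sit at (numerically) zero distance from the model, while the point pushed away from the line does not. A threshold on this distance is the simplest way of testing whether a point belongs to the linear manifold.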
Continue Reading »

No Comments yet »

January 25th 2008

Dimensionality reduction: Isomap

Isomap is one of the “oldest” tools for dimensionality reduction. It aims at reproducing geodesic distances on the manifold (geodesic distances are a property of Riemannian manifolds) in a Euclidean space.
To compute the approximated geodesic distances, a graph is created, with an edge linking each pair of close points (K-neighbors or Parzen windows can be used to choose the closest points) and the edge weight being the Euclidean distance between them. Then, a square matrix is computed with the length of the shortest path between every two points, using the Dijkstra or Floyd-Warshall algorithm. This follows from some distance and Riemannian manifold properties. The number of points is generally chosen based on the estimated distance on the manifold.
Finally, a classical MDS procedure is performed to get a set of coordinates.
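The three steps above can be sketched in a few lines of NumPy. This is a didactic version under simple assumptions (K-neighbors graph, Floyd-Warshall, a connected graph), with names of my own choosing; it is not the implementation from the scikit and it will not scale to large data sets.

```python
import numpy as np

def isomap(points, n_neighbors, n_components):
    """Return (embedding, geodesic distance matrix) for a connected data set."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

    # Step 1: K-neighbors graph weighted by Euclidean distances.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]  # skip the point itself
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
    graph = np.minimum(graph, graph.T)                        # make it symmetric

    # Step 2: shortest paths (Floyd-Warshall) approximate geodesic distances.
    for k in range(n):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])

    # Step 3: classical MDS on the geodesic distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)                   # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale, graph

# Toy data: 30 points on a half circle of radius 1 (arc length pi).
angles = np.linspace(0.0, np.pi, 30)
arc = np.column_stack([np.cos(angles), np.sin(angles)])

coords, geodesic = isomap(arc, n_neighbors=2, n_components=1)
print("Euclidean distance between the ends:", np.linalg.norm(arc[0] - arc[-1]))
print("graph distance between the ends:", geodesic[0, -1])
print("length of the 1D embedding:", coords[:, 0].max() - coords[:, 0].min())
```

On the half circle, the straight-line distance between the two ends is 2, while the graph distance is close to the arc length pi. The 1D embedding unrolls the arc to roughly that length.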
Continue Reading »

2 Comments »

January 15th 2008

More on manifold learning

I hope to present some results here in February, but for now I’ll list what I’ve implemented so far:

  • Isomap
  • LLE
  • Laplacian Eigenmaps
  • Hessian Eigenmaps
  • Diffusion Maps (in fact a variation of Laplacian Eigenmaps)
  • Curvilinear Component Analysis (the reduction part)
  • NonLinear Mapping (Sammon)
  • My own technique (reduction, regression and projection)
  • PCA (usual reduction, but robust projection with an a priori term)

The results I will show here are mainly reduction comparisons between the techniques, knowing that each technique has a specific field of application: LLE is not made to respect the geodesic distances, whereas Isomap, NLM and my technique are.

3 Comments »

Next »