April 2nd 2008

## Dimensionality reduction: explicit optimization of a cost function

Analytical solutions to the dimensionality reduction problem are only possible for quadratic cost functions, like Isomap, LLE, Laplacian Eigenmaps, … All these solutions are sensitive to outliers. The issue with the quadratic hypothesis is that there is no outilers, but on real manifolds, the noise is always there.

Some cost functions have been proposed, also known as stress functions as they measure the difference between the estimated geodesic distance and the computed Euclidien distance in the “feature” space. Every metric MDS can be used as stress functions, here are some of them.

The oldest function is Sammon’s NonLinear Mapping. Originally based on Euclidien distances, I implemented it with the approximated geodesic distances described in the Isomap ticket. The goal of this function is to add a weight (the inverse of the geodesic distance), leading to less weight for the greatest distances, but also an important weight for small distances.

Here is the cost function for the distances (**y** are the coordinates in the original space and **x** in the feature/reduced space) :

Optimizing this function with a conjugate-gradient descent from a random start can give this result :

Another function that is present and cited in the litterature is Desmartines’ one from the Curvilinear Component Analysis :

The F() function is 1 when the argument is small (less than an arbitrary value lambda), and else 0.

As a consequence, this function is not convex, not even continuous. The algorithm proposed in the associated paper is not great, I never managed to make it work on a SwissRoll, even with few points. So here are the step I use to optimize the function :

- Start from a random position
- Optimize 10 points (lambda infinite)
- Optimize one more point :
- Move only this point with lambda infinite (gradient descent)
- Move every point with a decreasing lambda (I start with lambda = greatest geodesic distance in the data set and linearly decrease it until lambda is the limit of 5% of he smallest geodesic distances)

Each time, the new point is moved according to every already placed point, then when every point is moving, only the local stresses are used. But the optimization can still go wrong and some points that should be close can end far one from another, because their associated stress is zero.

Here is the result for this optimization :

The cost function I use is a robust one, not “recursive” as Desmartines qualifies it (the weight is not a function of the estimated distance as it is the case from the CCA cost function) :

The first term is the robust term, derivable when the (geodesic estimated and Euclidien computed) distances are equals, the second term allows for a fast convergence when the distances are not correctly estimated (useful at the beginning of the optimization, less afterwards) and the last term gives a small weight for small distances, as they can be polluted by noise for noisy manifolds. Gamma should only be a small value, Tau is set to be equal to 80% of the geodesic distances and sigma to 5%. This gives good results in every case.

Here is the result with this cost function :

Its optimization is not easy as it can give folded reduced space as an answer. I proposed two algorithms to solve the issue :

The first one :

- Optimize every point
- Add some noise to the computed coordinates depending on the global cost and the iteration (at the end, the added noise must be very small)
- Optimize with a simple gradient descent and a Fibonacci line search

The second one :

- Optimize 10 points with a gradient descent from a random start
- Optimize with one new point
- Move only this point with a gradient descent
- Move every points

The second algorithm is slower than the first, but it works every time.

Stay tuned for the results…

Tags: CCA, Curvilinear Component Analysis, Dimensionality reduction, Manifold learning, metric MDS, NLM, NonLinear Mapping, robust cost function, SammonNo Comments yet »