This entry is part 2 of 2 in the series Deep adventures

A few weeks ago, on Stack Overflow, a user asked for an accuracy measure on the embedded space of an autoencoder. The question was about Keras, but I thought it would be a nice exercise for TensorFlow as well.

The idea is to add a few layers on top of the embedded space to create a classifier, and to measure its accuracy while we optimize the autoencoder.

We will train the autoencoder and the classifier alternately: while one is updated, the other is frozen, and we can then follow classification accuracy and reconstruction loss side by side in TensorBoard.
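The alternating scheme can be illustrated without TensorFlow. The following is a minimal NumPy sketch, not the post's TensorFlow code: it assumes a linear autoencoder, a softmax classifier on the code, plain gradient descent with hand-written gradients, and made-up names (`train_alternating`, `W_enc`, `W_dec`, `W_clf`). "Freezing" a part simply means its weights are left untouched during the other phase; in TensorFlow 1.x the same effect is usually obtained by passing only the relevant variables through the `var_list` argument of `Optimizer.minimize`.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_alternating(X, y, n_code=3, n_classes=2, lr=0.05, n_rounds=200, seed=0):
    """Hypothetical sketch: alternate autoencoder and classifier updates, logging both metrics."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = 0.1 * rng.normal(size=(d, n_code))       # encoder: X -> code
    W_dec = 0.1 * rng.normal(size=(n_code, d))       # decoder: code -> X
    W_clf = np.zeros((n_code, n_classes))            # classifier on the code
    b_clf = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                         # one-hot labels
    history = []
    for _ in range(n_rounds):
        # Phase 1: autoencoder step; the classifier is frozen (W_clf, b_clf untouched)
        Z = X @ W_enc
        R = Z @ W_dec - X                            # reconstruction residual
        grad_dec = Z.T @ R / n
        grad_enc = X.T @ (R @ W_dec.T) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
        # Phase 2: classifier step on the code; the encoder is frozen (W_enc untouched)
        Z = X @ W_enc
        P = softmax(Z @ W_clf + b_clf)
        W_clf -= lr * Z.T @ (P - Y) / n
        b_clf -= lr * (P - Y).mean(axis=0)
        # Record both quantities, as one would log them to TensorBoard
        recon_loss = 0.5 * np.mean(np.sum(R ** 2, axis=1))
        accuracy = np.mean(P.argmax(axis=1) == y)
        history.append((recon_loss, accuracy))
    return history

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = (X[:, 0] > 0).astype(int)    # toy labels that depend on the input
history = train_alternating(X, y)
```

Each entry of `history` is a (reconstruction loss, accuracy) pair, i.e. the two curves one would plot next to each other.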

This entry is part 1 of 2 in the series Deep adventures

A few years ago, Packt Publishing contacted me to be a technical reviewer for the first edition of Building Machine Learning Systems with Python, and I was impressed by the writing of Luis Pedro Coelho and Willi Richert. For the second edition, I was again a technical reviewer.

Writing is not easy, especially when it’s not your mother tongue, and the scientific book market is plagued with books that are not that great, with low technical content or bad English (the same can be said of novels; the worst I ever read is probably the Hunger Games series…). Even when I don’t like a book, I know that the authors did their best: I have written a book myself, and I can say that its flow was not great. Writing a book always deserves the deepest respect.

It’s been a while since I last blogged about manifold learning. I don’t think I’ll add much in terms of algorithms to the scikit, but now that a clear API is being defined (http://sourceforge.net/apps/trac/scikit-learn/wiki/ApiDiscussion), it’s time for the manifold module to comply with it. Documentation will also be improved, and some dependencies will be removed.
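The exact API discussed on that wiki page is not reproduced here. As a rough illustration of the fit/transform convention scikit-learn settled on (hyperparameters stored unchanged in `__init__`, `fit` returning `self`, learned attributes ending with an underscore), here is a hypothetical toy reducer. It uses PCA as a stand-in for a manifold technique and plain NumPy so that it runs without scikit-learn; in scikit-learn itself, such a class would normally subclass `BaseEstimator` and `TransformerMixin` from `sklearn.base`.

```python
import numpy as np

class LinearReducer:
    """Hypothetical toy reducer (PCA) following the fit/transform estimator convention."""

    def __init__(self, n_components=2):
        self.n_components = n_components          # hyperparameters are stored as given

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)               # learned attributes end with "_"
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self                               # returning self allows chaining

    def transform(self, X):
        # Projects new points with the parameters learned in fit
        return (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
```

Keeping hyperparameters separate from learned state is what lets generic tools clone and re-fit an estimator without knowing anything about its algorithm.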

I’ve started a branch on github.com, and I will add some examples to the scikit as well. I may explain them here, but I won’t rewrite what is already published. A future post will explain the changes, and I hope that interested readers will understand the modifications and apply them to my former posts; I just don’t have the time to change everything…

It has been a while since my last post on manifold learning, and I still have a few things to talk about (unfortunately, this will be the last post of the dimensionality reduction series on my blog, as my current job is no longer about this). Once the multidimensional regression is available, it can be used to project new samples onto the modeled manifold and to classify data.
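The full post is not reproduced here, but the idea of projecting onto a modeled manifold can be sketched: given a mapping `g` from the reduced space to the original space (the result of the regression), a new sample `x` is projected by finding the reduced coordinates whose image is closest to `x`. This hypothetical NumPy sketch does it by brute force over candidate coordinates, and classifies by picking the class whose manifold lies closest to the sample. Both the search strategy and that classification rule are assumptions, not necessarily the method of the post.

```python
import numpy as np

def project(g, x, candidates):
    """Return the candidate reduced coordinates y minimizing ||g(y) - x||, and that distance."""
    images = np.array([g(y) for y in candidates])   # map every candidate to the original space
    dists = np.linalg.norm(images - x, axis=1)
    best = int(np.argmin(dists))
    return candidates[best], dists[best]

def classify(mappings, x, candidates):
    """Assign x to the class whose manifold (one mapping per class) is closest to it."""
    errors = [project(g, x, candidates)[1] for g in mappings]
    return int(np.argmin(errors))

def circle(radius):
    """Toy mapping from a 1D reduced coordinate (an angle) to a circle in 2D."""
    return lambda y: np.array([radius * np.cos(y[0]), radius * np.sin(y[0])])

candidates = np.linspace(-np.pi, np.pi, 2001)[:, None]   # grid over the reduced space
y_proj, dist = project(circle(1.0), np.array([2.0, 0.0]), candidates)
label = classify([circle(1.0), circle(3.0)], np.array([2.8, 0.0]), candidates)
```

A brute-force search is only practical for a low-dimensional reduced space; a local optimizer started from the best candidate would be the natural refinement.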

Once the data set is reduced (see my first posts if you’re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:

  • you can interpolate the data in the original space based on an interpolation in the reduced space, or
  • you can create an approximation of the mapping with a multidimensional function (B-splines, …)

With the first solution, if you map one of the reduced points used for training, you get the original point back. With the second solution, you get a nearby point. If your data set is noisy, you should use the second solution, not the first. And if you are trying to compress data (lossy compression), you cannot use the first one: it needs the original points to compute new interpolated points, so the data set is not compressed at all.
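The difference between the two options is easy to see with a one-dimensional reduced space. In this hypothetical NumPy sketch, piecewise-linear interpolation with `np.interp` plays the role of the first solution, and a least-squares cubic polynomial (`np.polyfit`) stands in for the B-spline approximation of the second; the toy data and function names are assumptions. The interpolator returns the training points exactly but has to keep all of them, whereas the approximator stores only a few coefficients per original coordinate and returns nearby points.

```python
import numpy as np

def fit_interpolator(t, X):
    """Piecewise-linear interpolation of each original coordinate along the 1D reduced coordinate t."""
    order = np.argsort(t)              # np.interp needs increasing sample positions
    ts, Xs = t[order], X[order]        # every training point has to be kept
    return lambda t_new: np.column_stack(
        [np.interp(t_new, ts, Xs[:, j]) for j in range(Xs.shape[1])])

def fit_approximator(t, X, degree=3):
    """Least-squares polynomial per original coordinate: only degree + 1 coefficients each."""
    coefs = [np.polyfit(t, X[:, j], degree) for j in range(X.shape[1])]
    return lambda t_new: np.column_stack([np.polyval(c, t_new) for c in coefs])

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, 30)                          # reduced coordinates
X = np.column_stack([np.cos(3 * t), np.sin(3 * t)])    # a curve in the original space...
X += 0.05 * rng.normal(size=X.shape)                   # ...with some noise

interp, approx = fit_interpolator(t, X), fit_approximator(t, X)
err_interp = np.abs(interp(t) - X).max()   # training points are reproduced
err_approx = np.abs(approx(t) - X).max()   # training points are only approximated
```

With noisy data, reproducing the training points exactly means reproducing the noise as well, which is why the approximation is preferable in that case.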

The solution I propose is based on an approximation with a set of piecewise linear models (each model being a mapping from a subspace of the reduced space to the original space). Unlike hinging hyperplanes, I do not enforce continuity at the boundaries between the models. And unlike Projection Pursuit Regression and hinging hyperplanes, my mapping goes from one space to the other as a whole, not from the reduced space to a single coordinate of the original space. This will enable projection onto the manifold (a subject for another post).
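How the reduced space is partitioned and how the models are fitted is not described in this excerpt. The following hypothetical NumPy sketch only illustrates the general shape of such a model: the reduced space is split into Voronoi cells around user-chosen centers (an assumption), an independent affine map from reduced to original coordinates is fitted by least squares in each cell, and nothing ties neighboring maps together, so the mapping may jump at cell boundaries.

```python
import numpy as np

class PiecewiseLinearMapping:
    """Hypothetical sketch: one affine map (reduced -> original space) per Voronoi cell."""

    def __init__(self, centers):
        self.centers = np.asarray(centers, dtype=float)

    def _assign(self, Y):
        # Index of the nearest center for each reduced point
        d2 = ((Y[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.argmin(d2, axis=1)

    def fit(self, Y, X):
        labels = self._assign(Y)
        self.maps_ = []
        for k in range(len(self.centers)):
            mask = labels == k
            if mask.sum() < Y.shape[1] + 1:
                raise ValueError(f"cell {k} has too few points for an affine fit")
            A = np.column_stack([Y[mask], np.ones(mask.sum())])   # [y, 1] for an affine map
            coef, *_ = np.linalg.lstsq(A, X[mask], rcond=None)    # independent fit: no continuity
            self.maps_.append(coef)
        return self

    def predict(self, Y):
        labels = self._assign(Y)
        A = np.column_stack([Y, np.ones(len(Y))])
        out = np.empty((len(Y), self.maps_[0].shape[1]))
        for k, coef in enumerate(self.maps_):
            mask = labels == k
            out[mask] = A[mask] @ coef
        return out

t = np.linspace(0.0, 1.0, 101)[:, None]                   # 1D reduced coordinates
X = np.column_stack([t[:, 0], np.abs(t[:, 0] - 0.5)])     # original data with a kink at 0.5
model = PiecewiseLinearMapping(centers=[[0.25], [0.75]]).fit(t, X)
```

Here each cell happens to contain exactly affine data, so two local maps fit it perfectly where a single global affine map cannot.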

My manifold learning code was a Technology Preview in the scikit learn for some time. Now I can say that it is available (BSD license) and there should not be any obvious bug left.

I’ve written a small tutorial. It is not a usual tutorial (there is a user tutorial, followed by what developers should know to enhance the code), and some results of the techniques are shown on my blog. It provides the basic commands to start using the scikit yourself (reducing some data, projecting new points, …) as well as the exposed interface for enhancing the scikit.

If you have any questions, feel free to ask me; I will add the answers to the tutorial page so that everyone can benefit from them.

Feel free to contribute new techniques and additional tools as well, I cannot write them all! For instance, the scikit lacks a robust neighbor selection to avoid short-cuts in the manifold…

Tutorial and the learn scikit mainpage.

At last, my article on manifold learning has been published and is accessible through doi.org (it was not last week, which is why I waited before publishing this post).
The journal is free, so you won’t have to pay to read it: Access to the EURASIP JASP article

I will publish additional figures here shortly. The scikit is almost complete as well; I’m finishing the online tutorial for those who are interested in using and/or enhancing it.

I’ve already given some answers in one of my first posts on manifold learning. Here I will give more complete results on the quality of the dimensionality reduction performed by the best-known techniques.

First of all, my test is about how well the geodesic distances are preserved in the reduced space. This is not possible for some manifolds, such as a 2D Gaussian plot. I used the S-curve for the test: the curve has unit speed, so distances in the coordinate space (the one I used to generate the S-curve) are the same as the geodesic distances on the manifold. The test measures the matrix (Frobenius) norm of the difference between the original coordinates and the computed ones, up to an affine transform of the latter.
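A sketch of this quality measure, under a few assumptions: the S-curve is generated with the parameterization used by scikit-learn's `make_s_curve` (its angle parameter is an arc length, so the curve has unit speed), and "up to an affine transform" is read as fitting the best affine map from the computed embedding to the true coordinates by least squares and taking the Frobenius norm of what remains. The function names are hypothetical, and the article's exact normalization may differ.

```python
import numpy as np

def s_curve(n_samples, rng):
    """Points on a unit-speed S-curve, with their true (geodesic) 2D coordinates."""
    t = 3 * np.pi * (rng.random(n_samples) - 0.5)   # arc length along the S
    h = 2 * rng.random(n_samples)                    # position along the straight direction
    X = np.column_stack([np.sin(t), h, np.sign(t) * (np.cos(t) - 1)])
    return X, np.column_stack([t, h])

def affine_frobenius_error(embedding, coords):
    """Frobenius norm of coords minus the best affine image of the embedding (least squares)."""
    A = np.column_stack([embedding, np.ones(len(embedding))])   # [Y, 1] for an affine map
    coef, *_ = np.linalg.lstsq(A, coords, rcond=None)
    return np.linalg.norm(A @ coef - coords, "fro")

rng = np.random.default_rng(0)
X, coords = s_curve(500, rng)
# An embedding that is an affine copy of the true coordinates should score (almost) zero
perfect = coords @ np.array([[2.0, 0.5], [-1.0, 3.0]]) + np.array([4.0, -2.0])
err_perfect = affine_frobenius_error(perfect, coords)
# Simply dropping the third ambient coordinate does not unroll the S, so it scores badly
err_naive = affine_frobenius_error(X[:, :2], coords)
```

Fitting the affine map before measuring makes the score insensitive to the arbitrary rotation, scaling and offset that every reduction technique is free to produce.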

Some of the widely used methods are based on a similarity graph built from the local structure. For instance, LLE uses relative distances, which are related to similarities. Using similarities allows the use of sparse techniques: since most pairs of points are not similar, the similarity matrix is sparse. This also means that many manifolds can be reduced with these techniques but not with Isomap or the other geodesic-based techniques.
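To make the sparsity argument concrete, here is a hypothetical NumPy sketch of a similarity graph: each point is linked to its `n_neighbors` nearest neighbors, edges are weighted with a heat kernel (an assumed choice; LLE itself uses reconstruction weights instead), and only the nonzero entries are stored as (row, column, value) triplets, the layout `scipy.sparse.coo_matrix` accepts. With n points and k neighbors the graph holds at most 2 * n * k entries instead of n², so the fraction of nonzeros shrinks as n grows. For brevity, the neighbor search computes all pairwise distances; a tree-based search would avoid that quadratic cost.

```python
import numpy as np

def knn_heat_kernel_graph(X, n_neighbors=5, sigma=1.0):
    """Symmetric k-NN similarity graph returned as COO triplets (rows, cols, values)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)   # squared distances
    np.fill_diagonal(d2, np.inf)                                      # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    edges = {(i, int(j)) for i in range(len(X)) for j in nbrs[i]}
    edges |= {(j, i) for (i, j) in edges}                             # symmetrize the graph
    rows, cols = (np.array(a) for a in zip(*sorted(edges)))
    values = np.exp(-d2[rows, cols] / (2 * sigma ** 2))               # heat-kernel similarity
    return rows, cols, values

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
rows, cols, values = knn_heat_kernel_graph(X)
density = len(values) / len(X) ** 2    # fraction of the n x n matrix that is nonzero
```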

It is worth mentioning that I only implemented Laplacian Eigenmaps with a sparse matrix, due to the lack of a generalized eigensolver for sparse matrices, but I hope one will be available soon.
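For reference, Laplacian Eigenmaps solves the generalized problem L y = λ D y, where W is the similarity matrix, D the diagonal degree matrix and L = D - W. One common workaround when only a standard symmetric eigensolver is available (not necessarily what the scikit does) is the substitution y = D^(-1/2) v, which turns it into the ordinary problem (I - D^(-1/2) W D^(-1/2)) v = λ v. The hypothetical sketch below uses a dense `np.linalg.eigh` for simplicity; a sparse symmetric solver such as `scipy.sparse.linalg.eigsh` could be applied to the same matrix.

```python
import numpy as np

def laplacian_eigenmap(W, n_components=2):
    """Laplacian Eigenmaps via the symmetric normalized Laplacian (assumes no isolated nodes)."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                        # node degrees, the diagonal of D
    d_isqrt = 1.0 / np.sqrt(d)
    # L y = lambda D y  <=>  D^-1/2 L D^-1/2 v = lambda v, with y = D^-1/2 v
    L_sym = np.eye(len(d)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues come in ascending order
    Y = d_isqrt[:, None] * vecs              # back to generalized eigenvectors
    return Y[:, 1 : n_components + 1]        # skip the constant (eigenvalue 0) solution

W = np.diag(np.ones(9), 1) + np.diag(np.ones(9), -1)   # path graph on 10 nodes
embedding = laplacian_eigenmap(W, n_components=1)
```

On a path graph, the first nontrivial eigenvector orders the nodes along the path, which is the one-dimensional "unrolling" one would expect.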
