Archive for the 'Python' Category

July 14th 2008

Dimensionality reduction: mapping the reduced space into the original space

Once the data set is reduced (see my first posts if you’re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:

  • you can interpolate the data in the original space based on an interpolation in the reduced space, or
  • you create an approximation of the mapping with a multidimensional function (B-splines, …)

When using the first solution, if you map one of the reduced point used for the training, you get the original point. With the second solution, you get a close point. If the data set you have is noisy you should use the second solution, not the first. And if you are trying to compress data (lossly compression), you can not use the first one, as you need the original points to get new interpolated points, so you are not compressing your data set.

The solution I propose is based on approximation with a set of piecewise linear models (each model being a mapping between a subspace of the reduced space to the original space). At the boundaries between the models, I do not assert continuity, contrary to hinging hyperplanes. Contrary to Projection Pursuit Regression and hinging hyperplane, my mapping is between the two spaces, and not from the reduced space to one coordinate in the original space. This will enable projection on the manifold (which is another subject that will be discussed in another post).

Continue Reading »

No Comments yet »

June 27th 2008

Dimensionality reduction: the scikit is available !

My manifold learning code was for some time a Technology Preview in the scikit learn. Now I can say that it is available (BSD license) and there should not be any obvious bug left..

I’ve written a small tutorial. It is not an usual tutorial (there is a user tutorial and then what developers should know to enhance it), and some results of the techniques are exposed in my blog. It provides the basic commands to start using the scikit yourself (reducing some data, projecting new points, …) as well as the expoed interface to enhance the scikit.

If you have any question, feel free to ask me, I will add the answers to the tutorial page so that everyone can benefit from it.

Be free to contribute new techniques and additional tools as well, I cannot write them all ! For instance, the scikit lacks some robust neighbors selection to avoid short-cuts in the manifold…

Tutorial and the learn scikit mainpage.

3 Comments »

June 11th 2008

A Metric Multidimensional Scaling-Based Nonlinear Manifold Learning Approach for Unsupervised Data Reduction

At last, my article on manifold learning has been published and is accessible with doi.org (it was not the case last week, that’s why I waited before publishing this post).
The journal is free, so you won’t have to pay to read it : Access to the EURASIP JASP article

I will publish additional figures here in a short time. The scikit is almost completed as well, I’m finishing the online tutorial for those who are interested in using it and/or enhancing it.

2 Comments »

May 6th 2008

Book review: From P2P to Web Services and Grids: Peers In A Client/Server World

I was looking for an introductory book on peer-to-peer (P2P) application and their application to grid computation. Web services was a bonus, as it is something I don’t usually play with.
Continue Reading »

No Comments yet »

April 30th 2008

My favorite design pattern in Python

I’ve noticed some days ago that I mainly used one design pattern in my scientific (but not only) code, the registry. How does it work? A registry is a list/dictionary/… of objects, applications add a new entry if it is needed, and then a user can tap into the registry to find the most adequate object for one’s purpose.

Continue Reading »

No Comments yet »

April 28th 2008

Book review: Twisted: Networking Programming Essentials

This book is different from the two last books I read. Indeed, it tackles a specific Python library, Twisted, and how to use it.
Continue Reading »

1 Comment »

April 23rd 2008

Dimensionality reduction: comparison of different methods

I’ve already given some answers in one of my first tickets on manifold learning. Here I will give some more complete results on the quality of the dimensionality reduction performed by the most well known techniques.

First of all, my test is about respecting the geodesic distances in the reduced space. This is not possible for some manifolds like a Gaussian 2D plot. I used the SCurve to create the test, as the speed on the curve is unitary and thus the distances in the coordinate space (the one I used to create the SCurve) are the same as the geodesic ones on the manifold. My test measures the matrix (Froebenius) norm between the original coordinates and the computed one up to an affine transform of the latter.
Continue Reading »

2 Comments »

April 21st 2008

Book review: Tools and environments for parallel and distributed computing

After Advanced Computer Architecture and Parallel Processing, I’m going to review another book from the same serie. As the title hints it, the goal of this book is to introduce the tools that may be used in parallel, grid and distributed computing. This is the layer above the architecture the last book presented.
Continue Reading »

No Comments yet »

April 11th 2008

Book review: Advanced Computer Architecture and Parallel Processing

This is my first review. I read this book some time ago but I still want to write about it because the topic is very interesting.
Continue Reading »

2 Comments »

April 8th 2008

Parallel computing in large-scale applications

In March 2008 issue, IEEE Computers published a case study on large-scale parallel scientific code development. I’d like to comment this article, a very good one in my mind.

Five research centers were analyzed, or more precisely their development tool and process. Each center did a research in a peculiar domain, but they seem share some Computational Fluid Dynamics basis.

Continue Reading »

No Comments yet »

Next »