<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>Matthieu Brucher&#039;s blog &#187; Manifold learning</title> <atom:link href="http://matt.eifelle.com/category/python/manifold-learning/feed/" rel="self" type="application/rss+xml" /><link>http://matt.eifelle.com</link> <description></description> <lastBuildDate>Tue, 27 Jul 2010 07:04:23 +0000</lastBuildDate> <generator>http://wordpress.org/?v=2.9.1</generator> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Dimensionality reduction: Refactoring the manifold module</title><link>http://matt.eifelle.com/2010/07/13/dimensionality-reduction-refactoring-the-manifold-module/</link> <comments>http://matt.eifelle.com/2010/07/13/dimensionality-reduction-refactoring-the-manifold-module/#comments</comments> <pubDate>Tue, 13 Jul 2010 07:05:40 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[scikit]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=1290</guid> <description><![CDATA[It&#8217;s been a while since I last blogged about manifold learning. I don&#8217;t think I&#8217;ll add much in terms of algorithms to the scikit, but now that a clear API is being defined (http://sourceforge.net/apps/trac/scikit-learn/wiki/ApiDiscussion), it&#8217;s time for the manifold module to comply with it. Also, documentation will be enhanced and some dependencies will be removed.
I&#8217;ve [...]]]></description> <content:encoded><![CDATA[<p>It&#8217;s been a while since I last blogged about manifold learning. I don&#8217;t think I&#8217;ll add much in terms of algorithms to the scikit, but now that a clear API is being defined (<a
href="http://sourceforge.net/apps/trac/scikit-learn/wiki/ApiDiscussion">http://sourceforge.net/apps/trac/scikit-learn/wiki/ApiDiscussion</a>), it&#8217;s time for the manifold module to comply with it. Also, documentation will be enhanced and some dependencies will be removed.</p><p>I&#8217;ve started a branch available on <a
href="http://github.com/mbrucher/scikit-learn">github.com</a>, and I will add some examples to the scikit as well. I may explain them here, but I won&#8217;t rewrite what is already published. A future post will explain the changes, and I hope that interested people will understand the modifications and apply them to my former posts. It&#8217;s just that I don&#8217;t have much time to change everything&#8230;</p>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2010/07/13/dimensionality-reduction-refactoring-the-manifold-module/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Dimensionality reduction: projection and classification</title><link>http://matt.eifelle.com/2009/01/08/dimensionality-reduction-projection-and-classification/</link> <comments>http://matt.eifelle.com/2009/01/08/dimensionality-reduction-projection-and-classification/#comments</comments> <pubDate>Thu, 08 Jan 2009 08:09:55 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Classification]]></category> <category><![CDATA[Dimensionality reduction]]></category> <category><![CDATA[Multidimensional regression]]></category> <category><![CDATA[projection]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=244</guid> <description><![CDATA[It has been a while since my last post on manifold learning, and I still have some things to talk about (unfortunately, it will be the final post of the dimensionality reduction series on my blog, as my current job is not about this anymore). After the multidimensional regression, it is possible to use it [...]]]></description> <content:encoded><![CDATA[<p>It has been a while since my last post on manifold learning, and I still have some things to talk about (unfortunately, it will be the final post of the dimensionality reduction series on my blog, as my current job is not about this anymore). After the <a
href="http://matt.eifelle.com/2008/07/14/dimensionality-reduction-mapping-the-reduced-space-into-the-original-space/">multidimensional regression</a>, it is possible to use it to project new samples onto the modeled manifold, and to classify data.<br
/> <span
id="more-244"></span></p><h4>Projection</h4><p>Finding the projection of a point on the manifold can be done by searching the reduced space for the point that minimizes some criterion. As the dimension of this space is small, this can be done efficiently.</p><p>Another way is to use the multidimensional regression. It consists of several linear models, so one can project the new sample on each model and choose the best one. This can lead to somewhat different projections than the search in the reduced space. Indeed, a projection on one model can in fact be reconstructed by another model. We did not do this, because we&#8217;ve added another way of ensuring that we are choosing a good projection. The reason is that to correctly project on each model, you have to track which model you are currently on, which means optimizing a function that is not smooth, which is more difficult and thus slower.</p><p>So we use two cost functions. The first is simply the <em>Maximum Likelihood</em>; for a Gaussian random variable, it is an orthogonal projection. But as figure 1 shows, the projection is sometimes not on the manifold. So to enhance this result, we use a <em>Maximum A Posteriori</em> function, which adds a regularization term. It is simply a Gaussian mixture in the reduced space, with one Gaussian for each training point labelled to the considered model. This way, the projection will be attracted to the subspace where the model is used to reconstruct points.</p><p><center><div
id="attachment_269" class="wp-caption aligncenter" style="width: 160px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/ml-map.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/ml-map-150x150.png" alt="Different kinds of projections" title="Different kinds of projections" width="150" height="150" class="size-thumbnail wp-image-269" /></a><p
class="wp-caption-text">Figure 1: Different kinds of projections</p></div></center></p><p>I&#8217;ve projected new samples of the SwissRoll and the SCurve with both methods. The SwissRoll shows that projecting points on models without a regularization feature or without tracking the correct model during the projection does not give valid results in the reduced space. In the original space, the points are correctly reconstructed. With the SCurve, the results are better.</p><p><center></p><table><tr><td><div
id="attachment_274" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/swissrollcoords2cf08ml-plmr089nonoise00ml_projectionsamplesproj.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/swissrollcoords2cf08ml-plmr089nonoise00ml_projectionsamplesproj-300x225.png" alt="SwissRoll: Projected coordinates with ML" title="SwissRoll: Projected coordinates with ML" width="300" height="225" class="size-medium wp-image-274" /></a><p
class="wp-caption-text">SwissRoll: Projected coordinates with ML</p></div></td><td><div
id="attachment_276" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/swissrollcoords2cf08ml-plmr089nonoise00ml_projectionsamplesprojected.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/swissrollcoords2cf08ml-plmr089nonoise00ml_projectionsamplesprojected-300x225.png" alt="SwissRoll: Projection with ML" title="SwissRoll: Projection with ML" width="300" height="225" class="size-medium wp-image-276" /></a><p
class="wp-caption-text">SwissRoll: Projection with ML</p></div></td></tr><tr><td><div
id="attachment_280" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/swissrollcoords2cf08ml-plmr089nonoise00map_projectionsamplesproj.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/swissrollcoords2cf08ml-plmr089nonoise00map_projectionsamplesproj-300x225.png" alt="SwissRoll: Projected samples with MAP" title="SwissRoll: Projected samples with MAP" width="300" height="225" class="size-medium wp-image-280" /></a><p
class="wp-caption-text">SwissRoll: Projected samples with MAP</p></div></td><td><div
id="attachment_283" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/swissrollcoords2cf08ml-plmr089nonoise00map_projectionsamplesprojected.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/swissrollcoords2cf08ml-plmr089nonoise00map_projectionsamplesprojected-300x225.png" alt="SwissRoll: Projection with MAP" title="SwissRoll: Projection with MAP" width="300" height="225" class="size-medium wp-image-283" /></a><p
class="wp-caption-text">SwissRoll: Projection with MAP</p></div></td></tr><tr><td><div
id="attachment_281" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/scurvecoords2cf08ml-plmr089nonoise00ml_projectionsamplesproj.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/scurvecoords2cf08ml-plmr089nonoise00ml_projectionsamplesproj-300x225.png" alt="SCurve: Projected samples with ML" title="SCurve: Projected samples with ML" width="300" height="225" class="size-medium wp-image-281" /></a><p
class="wp-caption-text">SCurve: Projected samples with ML</p></div></td><td><div
id="attachment_284" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/scurvecoords2cf08ml-plmr089nonoise00ml_projectionsamplesprojected.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/scurvecoords2cf08ml-plmr089nonoise00ml_projectionsamplesprojected-300x225.png" alt="SCurve: Projection with ML" title="SCurve: Projection with ML" width="300" height="225" class="size-medium wp-image-284" /></a><p
class="wp-caption-text">SCurve: Projection with ML</p></div></td></tr><tr><td><div
id="attachment_282" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/scurvecoords2cf08ml-plmr089nonoise00map_projectionsamplesproj.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/scurvecoords2cf08ml-plmr089nonoise00map_projectionsamplesproj-300x225.png" alt="SCurve: Projected samples with MAP" title="SCurve: Projected samples with MAP" width="300" height="225" class="size-medium wp-image-282" /></a><p
class="wp-caption-text">SCurve: Projected samples with MAP</p></div></td><td><div
id="attachment_285" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/scurvecoords2cf08ml-plmr089nonoise00map_projectionsamplesprojected.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/scurvecoords2cf08ml-plmr089nonoise00map_projectionsamplesprojected-300x225.png" alt="SCurve: Projection with MAP" title="SCurve: Projection with MAP" width="300" height="225" class="size-medium wp-image-285" /></a><p
class="wp-caption-text">SCurve: Projection with MAP</p></div></td></tr></table><p></center></p><p>As another benchmark, I&#8217;ve taken the 20 datasets from the COIL-20 database. I&#8217;ve added an occlusion to each image, and I&#8217;ve projected the images with and without occlusion on the corresponding manifold (I&#8217;ve used a Laplacian random variable to correctly describe the noise made by the occlusion). I&#8217;ve compared the results to a robust projection on the first 2 or 15 eigenvectors of the datasets. First, without occlusions, the projection on 15 eigenvectors manages to reproduce the images, as does my method. Of course, 2 eigenvectors alone are not enough to describe the dataset correctly (although I&#8217;m also using only 2 coordinates to describe them). With 40% occlusion, my method is on a par with the 15 eigenvectors, and even better if we consider only the reconstruction error. Indeed, my method yields better images than the 15 eigenvectors.</p><p><center></p><table><tr><td
colspan=2><center><div
id="attachment_291" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toy01stripe4.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toy01stripe4-300x66.png" alt="COIL-20 dataset 1: 40% occlusion" title="COIL-20 dataset 1: 40% occlusion" width="300" height="66" class="size-medium wp-image-291" /></a><p
class="wp-caption-text">COIL-20 dataset 1: 40% occlusion</p></div></center></td></tr><tr><td
colspan=2><center><div
id="attachment_292" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toy01coords2pca08pca069nonoise00ml_projectionsamplesstripe4.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toy01coords2pca08pca069nonoise00ml_projectionsamplesstripe4-300x66.png" alt="Projection on 2 eigenvectors" title="Projection on 2 eigenvectors" width="300" height="66" class="size-medium wp-image-292" /></a><p
class="wp-caption-text">Projection on 2 eigenvectors</p></div></center></td></tr><tr><td
colspan=2><center><div
id="attachment_293" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toy01coords15pca08pca069nonoise00ml_projectionsamplesstripe4.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toy01coords15pca08pca069nonoise00ml_projectionsamplesstripe4-300x66.png" alt="Projection on 15 eigenvectors" title="Projection on 15 eigenvectors" width="300" height="66" class="size-medium wp-image-293" /></a><p
class="wp-caption-text">Projection on 15 eigenvectors</p></div></center></td></tr><tr><td
colspan=2><center><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toy01coords2cf08ml-plmr069nonoise00map_projectionsamplesstripe4.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toy01coords2cf08ml-plmr069nonoise00map_projectionsamplesstripe4-300x66.png" alt="Projection on &quot;my&quot; manifold" title="Projection on &quot;my&quot; manifold" width="300" height="66" class="size-medium wp-image-294" /></a></center></td></tr><tr><td><div
id="attachment_287" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toys00hits.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toys00hits-300x213.png" alt="Exact pose estimation (0% occlusion)" title="Exact pose estimation (0% occlusion)" width="300" height="213" class="size-medium wp-image-287" /></a><p
class="wp-caption-text">Exact pose estimation (0% occlusion)</p></div></td><td><div
id="attachment_289" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toys00near.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toys00near-300x213.png" alt="Approximate pose estimation (0% occlusion)" title="Approximate pose estimation (0% occlusion)" width="300" height="213" class="size-medium wp-image-289" /></a><p
class="wp-caption-text">Approximate pose estimation (0% occlusion)</p></div></td></tr><tr><td><div
id="attachment_288" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toys40hits.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toys40hits-300x213.png" alt="Exact pose estimation (40% occlusion)" title="Exact pose estimation (40% occlusion)" width="300" height="213" class="size-medium wp-image-288" /></a><p
class="wp-caption-text">Exact pose estimation (40% occlusion)</p></div></td><td><div
id="attachment_290" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toys40near.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toys40near-300x213.png" alt="Approximate pose estimation (40% occlusion)" title="Approximate pose estimation (40% occlusion)" width="300" height="213" class="size-medium wp-image-290" /></a><p
class="wp-caption-text">Approximate pose estimation (40% occlusion)</p></div></td></tr></table><p></center></p><h4>Classification</h4><p>Now, if we use the 20 datasets as 20 manifolds and if we project all images on all manifolds and select the best projection for each image, we have a new way of doing classification. This leads to the following graphic representation of the confusion matrix (hotter colors indicate higher percentage).</p><p><center></p><table><tr><td><div
id="attachment_297" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toysamples00.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toysamples00-300x213.png" alt="Confusion matrix for the whole COIL-20 database (0% occlusion)" title="Confusion matrix for the whole COIL-20 database (0% occlusion)" width="300" height="213" class="size-medium wp-image-297" /></a><p
class="wp-caption-text">Confusion matrix for the whole COIL-20 database (0% occlusion)</p></div></td><td><div
id="attachment_298" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/toysamples10.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/toysamples10-300x213.png" alt="Confusion matrix for the whole COIL-20 database (10% occlusion)" title="Confusion matrix for the whole COIL-20 database (10% occlusion)" width="300" height="213" class="size-medium wp-image-298" /></a><p
class="wp-caption-text">Confusion matrix for the whole COIL-20 database (10% occlusion)</p></div></td></tr></table><p></center></p><p>Without occlusion, the confusion matrix is pretty good, but with even 10% occlusion, some test samples are misclassified as another class. This is because those classes are darker than the actual class of the test sample, and with an occlusion, these classes fit the occlusions better.</p><p>Now, for the last dataset, I&#8217;ve taken samples from the <a
href="http://www.outex.oulu.fi/">Outex database</a>. Each texture image is cut into 16 samples, then one half of the samples is used for training, the other half as test samples. Then, the samples are transformed into cooccurrence matrices.</p><p><center></p><table><tr><td><div
id="attachment_299" class="wp-caption aligncenter" style="width: 95px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/outex00.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/outex00.png" alt="One texture sample" title="One texture sample" width="85" height="85" class="size-full wp-image-299" /></a><p
class="wp-caption-text">One texture sample</p></div></td><td><div
id="attachment_300" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/outex00cooc.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/outex00cooc-300x300.png" alt="Multichannel cooccurrence matrices" title="Multichannel cooccurrence matrices" width="300" height="300" class="size-medium wp-image-300" /></a><p
class="wp-caption-text">Multichannel cooccurrence matrices</p></div></td></tr></table><p></center></p><p>There are 72 textures, thus 72 classes. Here are the resulting confusion matrices for the training and test samples.</p><p><center></p><table><tr><td><div
id="attachment_301" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/outex.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/outex-300x213.png" alt="Confusion matrix for the training samples" title="Confusion matrix for the training samples" width="300" height="213" class="size-medium wp-image-301" /></a><p
class="wp-caption-text">Confusion matrix for the training samples</p></div></td><td><div
id="attachment_302" class="wp-caption aligncenter" style="width: 310px"><a
href="http://matt.eifelle.com/wp-content/uploads/2009/01/outexsamples.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2009/01/outexsamples-300x213.png" alt="Confusion matrix for the test samples" title="Confusion matrix for the test samples" width="300" height="213" class="size-medium wp-image-302" /></a><p
class="wp-caption-text">Confusion matrix for the test samples</p></div></td></tr></table><p></center></p><p>In this case, the results are better than those in the literature (Generalization of the cooccurrence matrix for colour images: Application to colour texture classification, <u>Image Analysis &#038; Stereology</u>, 2004).</p><h4>The end</h4><p>This is the last post I&#8217;ll be doing on manifold learning. It is very long because I wanted to write up my last results (some can&#8217;t be found in the literature), and I didn&#8217;t feel like writing two or three posts. I&#8217;m not researching manifold learning anymore, so the series needed a clear ending.</p><p>I hope you enjoyed the different posts in this category. There is still much to do in the field, but one cannot do everything&#8230;</p><form
action="https://www.paypal.com/cgi-bin/webscr" method="post"> <input
type="hidden" name="cmd" value="_xclick" /> <input
type="hidden" name="business" value="matthieu.brucher@gmail.com" /><input
type="hidden" name="item_name" value="Buy Me a Coffee!" /><input
type="hidden" name="currency_code" value="USD" /><span
style="font-size:10.0pt"><strong> Buy Me a Coffee!</strong></span><br
/><br
/><select
id="amount" name="amount" class=""><option
value="3">Capuccino - 3$</option><option
value="6">Frappuccino - 6$</option><option
value="10">Hot Chocolate - 10$</option><option
value="20">Expensive Coffee - 20$</option><option
value="50">Alien Coffee - 50$</option></select><br
/><br
/><strong>Other Amount:</strong><br
/><br
/><input
type="text" name="amount" size="10" title="Other donate" value="" /><br
/><br
/><strong> Your Email Address :</strong><input
type="hidden" name="on0" value="Reference" /><br
/><br
/><input
type="text" name="os0" maxlength="60" /> <br
/><br
/> <input
type="hidden" name="no_shipping" value="2" /> <input
type="hidden" name="no_note" value="1" /> <input
type="hidden" name="mrb" value="3FWGC6LFTMTUG" /> <input
type="hidden" name="bn" value="IC_Sample" /> <input
type="hidden" name="return" value="http://matt.eifelle.com" /><input
type="image" src="https://www.paypal.com/en_US/i/btn/x-click-but11.gif" name="submit" alt="Make payments with payPal - it's fast, free and secure!" /></form>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2009/01/08/dimensionality-reduction-projection-and-classification/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Dimensionality reduction: videos in regression algorithms</title><link>http://matt.eifelle.com/2008/09/09/dimensionality-reduction-videos-in-regression-algorithms/</link> <comments>http://matt.eifelle.com/2008/09/09/dimensionality-reduction-videos-in-regression-algorithms/#comments</comments> <pubDate>Tue, 09 Sep 2008 08:04:23 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Dimensionality reduction]]></category> <category><![CDATA[Multidimensional regression]]></category> <category><![CDATA[Videos]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=109</guid> <description><![CDATA[Two months ago, my last post was on regression. I&#8217;d like to start this new year with some videos on how my algorithms behave. The first video shows the manifold being regressed with a color mapping. Each color symbolizes a model in the piecewise linear function. Here is the evolution of the mapping with the first algorithm. [...]]]></description> <content:encoded><![CDATA[<p>Two months ago, my last post was on regression. I&#8217;d like to start this new year with some videos on how <a
href="http://matt.eifelle.com/2008/07/14/dimensionality-reduction-mapping-the-reduced-space-into-the-original-space/">my algorithms</a> behave.<br
/> <span
id="more-109"></span><br
/> The first video shows the manifold being regressed with a color mapping. Each color symbolizes a model in the piecewise linear function.</p><p><object
width="425" height="344"><param
name="movie" value="http://www.youtube.com/v/T0c_wCGhwq8&#038;hl=en&#038;fs=1"></param><param
name="allowFullScreen" value="true"></param><embed
src="http://www.youtube.com/v/T0c_wCGhwq8&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowfullscreen="true" width="425" height="344"></embed></object></p><p>Here is the evolution of the mapping with the first algorithm. Only a part of the manifold is regressed each time, and there is no way of tuning the result.</p><p><object
width="425" height="344"><param
name="movie" value="http://www.youtube.com/v/iQdfZsPnNko&#038;hl=en&#038;fs=1"></param><param
name="allowFullScreen" value="true"></param><embed
src="http://www.youtube.com/v/iQdfZsPnNko&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowfullscreen="true" width="425" height="344"></embed></object></p><p>The following video was made with the second algorithm. Each time, the whole manifold is regressed and the model enhanced. I didn&#8217;t include the whole optimization, because near the end no change could be seen in such a video.<br
/> After the first iteration, the original model is split into two models, and this happens again after the second iteration.</p><p><object
width="425" height="344"><param
name="movie" value="http://www.youtube.com/v/YSKn9cTXBgA&#038;hl=en&#038;fs=1"></param><param
name="allowFullScreen" value="true"></param><embed
src="http://www.youtube.com/v/YSKn9cTXBgA&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowfullscreen="true" width="425" height="344"></embed></object></p><p>I&#8217;ll make other videos on the other parts of my work.</p>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2008/09/09/dimensionality-reduction-videos-in-regression-algorithms/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Dimensionality reduction: mapping the reduced space into the original space</title><link>http://matt.eifelle.com/2008/07/14/dimensionality-reduction-mapping-the-reduced-space-into-the-original-space/</link> <comments>http://matt.eifelle.com/2008/07/14/dimensionality-reduction-mapping-the-reduced-space-into-the-original-space/#comments</comments> <pubDate>Mon, 14 Jul 2008 07:32:35 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Dimensionality reduction]]></category> <category><![CDATA[Multidimensional regression]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=78</guid> <description><![CDATA[Once the data set is reduced (see my first posts if you&#8217;re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space: you can interpolate the data in the original space based on an interpolation in the reduced space, or
you create an approximation of the mapping with a multidimensional [...]]]></description> <content:encoded><![CDATA[<p>Once the data set is reduced (see my first posts if you&#8217;re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:</p><ul><li>you can interpolate the data in the original space based on an interpolation in the reduced space, or</li><li>you create an approximation of the mapping with a multidimensional function (B-splines, &#8230;)</li></ul><p>When using the first solution, if you map one of the reduced points used for training, you get the original point. With the second solution, you get a close point. If the data set you have is noisy, you should use the second solution, not the first. And if you are trying to compress data (lossy compression), you cannot use the first one, as you need the original points to get new interpolated points, so you are not compressing your data set.</p><p>The solution I propose is based on approximation with a set of piecewise linear models (each model being a mapping from a subspace of the reduced space to the original space). At the boundaries between the models, I do not enforce continuity, contrary to hinging hyperplanes. Contrary to Projection Pursuit Regression and hinging hyperplanes, my mapping is between the two spaces, and not from the reduced space to one coordinate in the original space. This will enable projection on the manifold (which is another subject that will be discussed in another post).</p><p><span
id="more-78"></span></p><h4>Problem statement</h4><p>In the literature, <a
href="http://books.nips.cc/papers/files/nips14/AA05.pdf">several</a> <a
href="http://www.merl.com/papers/docs/TR2003-13.pdf">papers</a> were published in order to create a piecewise linear function. Their main advantage is that they can compute the reduced space when computing the mapping. Their issue is that the number of models is fixed at the beginning of the optimization. What I propose now is an adaptive number of models, depending on the manifold.</p><p>Each point has a set of neighbors; the k nearest ones are used here.</p><h4>Basic approach</h4><p>In fact, the basic algorithm is straightforward: put a new model where you can, optimize all models, label the points you can label, loop and stop at some point.</p><p>The first algorithm I used is the following:</p><ol><li>Start with no model.</li><li>Find a point whose neighborhood is not labeled; if you can&#8217;t, stop.</li><li>Create a new model there and label the points accordingly.</li><li>Estimate all models with regard to the labels.</li><li>Label each point to the nearest model; if the point is too far (that is, further than a factor times the mean error), do not label it.</li><li>If one model does not have enough points (twice the dimension of the reduced space), the model is deleted.</li><li>Go back to step 2.</li></ol><p>This algorithm is very simple, but it cannot model a manifold precisely, and you cannot tune it. It can, however, give a good first impression of the manifold.</p><p>Here are some steps of the algorithm for a SwissRoll (the left figure indicates the label in the reduced space and the right figure is the approximation of the manifold in the original space):</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_start.png"><img
class="alignnone size-medium wp-image-79" title="Labels when starting the first iteration" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_start-300x213.png" alt="Labels when starting the first iteration" width="300" height="213" /></a><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_start_regressed.png"><img
class="alignnone size-medium wp-image-80" title="The first model and the mapped points in the original space" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_start_regressed-300x213.png" alt="The first model and the mapped points in the original space" width="300" height="213" /></a></p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_updatebv_bis.png"><img
class="alignnone size-medium wp-image-81" title="Updated labels after iteration 1" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_updatebv_bis-300x213.png" alt="" width="300" height="213" /></a><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_updatebv_bis_regressed.png"><img
class="alignnone size-medium wp-image-82" title="Updated mapped points in the original space after iteration 1" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_updatebv_bis_regressed-300x213.png" alt="" width="300" height="213" /></a></p><p>At the end of the iteration, the manifold is roughly approximated:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it99_updatebv_bis.png"><img
class="alignnone size-medium wp-image-83" title="Labels at the end of the training" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it99_updatebv_bis-300x213.png" alt="" width="300" height="213" /></a><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it99_updatebv_bis_regressed.png"><img
class="alignnone size-medium wp-image-84" title="Mapping in the original space at the end of the last iteration" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it99_updatebv_bis_regressed-300x213.png" alt="" width="300" height="213" /></a></p><h4>Maximize a likelihood</h4><p>As I&#8217;ve said, the first method has no way to reach a given precision. So instead of adding models wherever there is room, the second algorithm adds models where the current function is the least likely. Here are the steps for this algorithm:</p><ol><li>Start with one model (every point is labeled to this model).</li><li>Compute the likelihood of each point.</li><li>Get the <em>n</em> points for which the neighborhood is the least likely.</li><li>Get one point from this set.</li><li>Create a new model and label the point and its neighborhood to it.</li><li>Update all planes (the linear models).</li><li>Update all labels (label a point to the most likely model).</li><li>If a model does not have enough points (the threshold is twice the dimension of the reduced space), the model is deleted.</li><li>Go to step 6 until the labels are stable.</li><li>Go to step 2 if a criterion is not met (usually, I choose the Akaike information criterion).</li></ol><p>I later added a step that checks that the points assigned to a model are connected (that is, that their subgraph is connected). If it is not, the model is split into several parts, one for each connected component.</p><p>I didn&#8217;t state it before, but the error can be modeled by different random variables. I&#8217;ve chosen an isotropic Gaussian variable in every training.</p><p>There is an immediate improvement of the algorithm, as can be seen in the following figures:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_start.png"><img
class="alignnone size-medium wp-image-85" title="Adding a new plane where the current model is not likely enough" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_start-300x213.png" alt="" width="300" height="213" /></a><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_start_regressed.png"><img
class="alignnone size-medium wp-image-86" title="A new model is introduced where the current model is least likely" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_start_regressed-300x213.png" alt="" width="300" height="213" /></a></p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_updatebv_bis.png"><img
class="alignnone size-medium wp-image-87" title="Stabilization of the three models after the introduction of a third model due to the connectivity constraints" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_updatebv_bis-300x213.png" alt="" width="300" height="213" /></a><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_updatebv_bis_regressed.png"><img
class="alignnone size-medium wp-image-88" title="The three models after their optimization" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_updatebv_bis_regressed-300x213.png" alt="" width="300" height="213" /></a></p><p>Here, after the introduction of the second model, a third was introduced. Indeed, the second model took the middle part of the reduced space, splitting the graph of the points of the first model into two components, so a new model was added.</p><p>At the end of the global optimization, the result is the following one:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it99_updatebv_bis.png"><img
class="alignnone size-medium wp-image-89" title="Far more models are now used" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it99_updatebv_bis-300x213.png" alt="" width="300" height="213" /></a><a
href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it99_updatebv_bis_regressed.png"><img
class="alignnone size-medium wp-image-90" title="The manifold model after the optimization" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it99_updatebv_bis_regressed-300x213.png" alt="" width="300" height="213" /></a></p><p>The differences between the two algorithms are obvious in the quality of the reconstruction. The first one had a 20% reconstruction error, against 3% for the second one. Although the second algorithm is more complicated, it is better at finding linear models that minimize the reconstruction error.</p><h4>Coming next</h4><p>After this training, a complete manifold model is available, with the precision one needs. I&#8217;ll present how it can be used in the following posts. Stay tuned!</p><form
action="https://www.paypal.com/cgi-bin/webscr" method="post"></form>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2008/07/14/dimensionality-reduction-mapping-the-reduced-space-into-the-original-space/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Dimensionality reduction: the scikit is available !</title><link>http://matt.eifelle.com/2008/06/27/dimensionality-reduction-the-scikit-is-available/</link> <comments>http://matt.eifelle.com/2008/06/27/dimensionality-reduction-the-scikit-is-available/#comments</comments> <pubDate>Fri, 27 Jun 2008 07:50:15 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[BSD license]]></category> <category><![CDATA[code]]></category> <category><![CDATA[Machine learning]]></category> <category><![CDATA[scikit]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=60</guid> <description><![CDATA[My manifold learning code was for some time a Technology Preview in the scikit learn. Now I can say that it is available (BSD license) and there should not be any obvious bug left.
I&#8217;ve written a small tutorial. It is not a usual tutorial (there is a user tutorial and then what developers should know [...]]]></description> <content:encoded><![CDATA[<p>My manifold learning code was for some time a Technology Preview in the scikit learn. Now I can say that it is available (BSD license) and there should not be any obvious bug left.</p><p>I&#8217;ve written <a
href="http://scipy.org/scipy/scikits/wiki/MachineLearning/ManifoldLearning">a small tutorial</a>. It is not a usual tutorial (there is a user tutorial and then what developers should know to enhance it), and some results of the techniques are shown on my blog. It provides the basic commands to start using the scikit yourself (reducing some data, projecting new points, &#8230;) as well as the exposed interface to enhance the scikit.</p><p>If you have any questions, feel free to ask me; I will add the answers to the tutorial page so that everyone can benefit from them.</p><p>Feel free to contribute new techniques and additional tools as well, I cannot write them all! For instance, the scikit lacks robust neighbor selection to avoid short-cuts in the manifold&#8230;</p><p><a
href="http://scipy.org/scipy/scikits/wiki/MachineLearning">Tutorial</a> and <a
href="http://scipy.org/scipy/scikits/wiki/MachineLearning">the <em>learn</em> scikit mainpage</a>.</p><form
action="https://www.paypal.com/cgi-bin/webscr" method="post"></form>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2008/06/27/dimensionality-reduction-the-scikit-is-available/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>A Metric Multidimensional Scaling-Based Nonlinear Manifold Learning Approach for Unsupervised Data Reduction</title><link>http://matt.eifelle.com/2008/06/11/a-metric-multidimensional-scaling-based-nonlinear-manifold-learning-approach-for-unsupervised-data-reduction/</link> <comments>http://matt.eifelle.com/2008/06/11/a-metric-multidimensional-scaling-based-nonlinear-manifold-learning-approach-for-unsupervised-data-reduction/#comments</comments> <pubDate>Wed, 11 Jun 2008 07:31:43 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[article]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=76</guid> <description><![CDATA[At last, my article on manifold learning has been published and is accessible with doi.org (it was not the case last week, that&#8217;s why I waited before publishing this post).
The journal is free, so you won&#8217;t have to pay to read it : Access to the EURASIP JASP article
I will publish additional figures here in [...]]]></description> <content:encoded><![CDATA[<p>At last, my article on manifold learning has been published and is accessible with doi.org (it was not the case last week, that&#8217;s why I waited before publishing this post).<br
/> The journal is free, so you won&#8217;t have to pay to read it : <a
href="http://dx.doi.org/10.1155/2008/862015">Access to the EURASIP JASP article</a></p><p>I will publish additional figures here in a short time. The scikit is almost completed as well, I&#8217;m finishing the online tutorial for those who are interested in using it and/or enhancing it.</p>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2008/06/11/a-metric-multidimensional-scaling-based-nonlinear-manifold-learning-approach-for-unsupervised-data-reduction/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Dimensionality reduction: comparison of different methods</title><link>http://matt.eifelle.com/2008/04/23/dimensionality-reduction-comparison-of-different-methods/</link> <comments>http://matt.eifelle.com/2008/04/23/dimensionality-reduction-comparison-of-different-methods/#comments</comments> <pubDate>Wed, 23 Apr 2008 07:11:43 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[CCA]]></category> <category><![CDATA[Curvilinear Component Analysis]]></category> <category><![CDATA[Diffusion maps]]></category> <category><![CDATA[Dimensionality reduction]]></category> <category><![CDATA[Isomap]]></category> <category><![CDATA[Laplacian Eigenmaps]]></category> <category><![CDATA[LLE]]></category> <category><![CDATA[MDS]]></category> <category><![CDATA[metric MDS]]></category> <category><![CDATA[NLM]]></category> <category><![CDATA[NonLinear Mapping]]></category> <category><![CDATA[PCA]]></category> <category><![CDATA[robust]]></category> <category><![CDATA[robust cost function]]></category> <category><![CDATA[Sammon]]></category><guid
isPermaLink="false">http://matt.eifelle.com/?p=48</guid> <description><![CDATA[I&#8217;ve already given some answers in one of my first tickets on manifold learning. Here I will give some more complete results on the quality of the dimensionality reduction performed by the most well known techniques.
First of all, my test is about respecting the geodesic distances in the reduced space. This is not possible for [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;ve already given some answers in <a
href="http://matt.eifelle.com/2008/01/25/dimensionality-reduction-isomap/">one of my first tickets on manifold learning</a>. Here I will give some more complete results on the quality of the dimensionality reduction performed by the best-known techniques.</p><p>First of all, my test is about respecting the geodesic distances in the reduced space. This is not possible for some manifolds like a Gaussian 2D plot. I used the SCurve to create the test, as the speed on the curve is unitary and thus the distances in the coordinate space (the one I used to create the SCurve) are the same as the geodesic ones on the manifold. My test measures the matrix (Frobenius) norm of the difference between the original coordinates and the computed ones, up to an affine transform of the latter.<br
/> <span
id="more-48"></span></p><p>I tested several noise levels:</p><ul><li>no noise</li><li>5% of Gaussian noise</li><li>2% of Laplacian noise (only 2% because there are many outliers in the Laplacian law)</li><li>Impulsive noise on 12.5% of the elements of the distance matrix, with 200% Laplacian noise (quantified with respect to the variance of the noise-free d1-d2, where d2 was estimated with my <a
href="http://matt.eifelle.com/2008/04/02/dimensionality-reduction-explicit-optimization-of-a-cost-function/">robust cost function</a>)</li></ul><p>Here are the results:</p><table
border=1 width=100%><tr><th>Method</th><th>No noise</th><th>Gaussian noise 5%</th><th>Laplacian noise 2%</th><th>Impulsive noise</th></tr><tr
align="center"><td>PCA</td><td>43.6</td><td>43.6</td><td>44.4</td><td>na</td></tr><tr
align="center"><td>Isomap</td><td>3.01</td><td>8.55</td><td>7.01</td><td>3.80</td></tr><tr
align="center"><td>Sp (robust cost function)</td><td>2.29</td><td>2.94</td><td>6.46</td><td>2.93</td></tr><tr
align="center"><td>Ssam (Sammon&#8217;s NonLinear Mapping)</td><td>2.61</td><td>2.60</td><td>6.10</td><td>3.22</td></tr><tr
align="center"><td>Scca (Curvilinear Component Analysis)</td><td>3.01</td><td>6.22</td><td>4.70</td><td>3.09</td></tr><tr
align="center"><td>Laplacian Eigenmaps</td><td>21.13</td><td>23.51</td><td>23.47</td><td>na</td></tr><tr
align="center"><td>Diffusion Maps</td><td>67.50</td><td>67.76</td><td>67.54</td><td>na</td></tr><tr
align="center"><td>Hessian Eigenmaps</td><td>3.05</td><td>18.57</td><td>20.51</td><td>na</td></tr><tr
align="center"><td>LLE</td><td>40.1</td><td>90.2</td><td>69.2</td><td>na</td></tr></table><p>The geodesic-based algorithms obviously, and logically, perform better than all the other algorithms. In my case, this is what I want, as I want to estimate a mapping function between the reduced space and the original space. This estimation and the effect of the reduction algorithm on it will be the subjects of future tickets.</p><form
action="https://www.paypal.com/cgi-bin/webscr" method="post"></form>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2008/04/23/dimensionality-reduction-comparison-of-different-methods/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Dimensionality reduction: similarities graph and its use</title><link>http://matt.eifelle.com/2008/04/04/dimensionality-reduction-similarities-graph-and-its-use/</link> <comments>http://matt.eifelle.com/2008/04/04/dimensionality-reduction-similarities-graph-and-its-use/#comments</comments> <pubDate>Fri, 04 Apr 2008 07:58:25 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[Diffusion maps]]></category> <category><![CDATA[Dimensionality reduction]]></category> <category><![CDATA[Hessian Eigenmaps]]></category> <category><![CDATA[Laplacian Eigenmaps]]></category> <category><![CDATA[LLE]]></category><guid
isPermaLink="false">http://matt.eifelle.com/2008/04/04/dimensionality-reduction-similarities-graph-and-its-use/</guid> <description><![CDATA[Some of the widely used methods are based on a similarity graph made with the local structure. For instance LLE uses the relative distances, which is related to similarities. Using similarities allows the use of sparse techniques. Indeed, a lot of points are not similar, so the similarity matrix is sparse. This also means [...]]]></description> <content:encoded><![CDATA[<p>Some of the widely used methods are based on a similarity graph made with the local structure. For instance <a
href="http://matt.eifelle.com/2008/02/18/dimensionality-reduction-locally-linear-embedding/">LLE</a> uses the relative distances, which is related to similarities. Using similarities allows the use of sparse techniques. Indeed, a lot of points are not similar, so the similarity matrix is sparse. This also means that a lot of manifolds can be reduced with these techniques, but not with Isomap or the other geodesic-based techniques.</p><p>It is worth mentioning that I only implemented Laplacian Eigenmaps with a sparse matrix, due to the lack of a generalized eigensolver for sparse matrices, but it should be available shortly, I hope.</p><p><span
id="more-53"></span></p><p>The <a
href="http://citeseer.ist.psu.edu/632472.html">Laplacian Eigenmaps</a> are the best-known technique using the similarity graph (save for LLE, which is nothing more than a special case of the Laplacian Eigenmaps). The similarities are computed between neighbors (neighbors meaning samples that are near one another in terms of distance, or samples that are adjacent, like pixels in an image), generally with a Gaussian kernel. The trick here is to choose the correct width of the kernel. Then, the similarity matrix is weighted (each row and column must sum to one, this is the Laplacian of the graph) and eigenvectors are extracted from it. The first eigenvalue is one, and its eigenvector must not be used.</p><p>Here is what I get:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/04/swissrollcoords2lem08nonoise00test.png" title="Laplacian Eigenmaps compression of the Swissroll"><img
src="http://matt.eifelle.com/wp-content/uploads/2008/04/swissrollcoords2lem08nonoise00test.thumbnail.png" alt="Laplacian Eigenmaps compression of the Swissroll" /></a></p><p>One may wonder why the reduction is so poor, but I&#8217;m not <a
href="http://dx.doi.org/10.1109/TPAMI.2007.70735">the only one</a> to get this result. I tried every width for the kernel to no avail. The literature says that Laplacian Eigenmaps tend to cluster points, which is easily explained by the algorithm. The eigenproblem extracts the main eigenvectors so that the weighted similarity matrix is preserved (in a quadratic sense). This means that even if points should be close, if they are not close enough, they have a similarity of 0, so the eigenproblem will separate them.</p><p><a
href="http://dx.doi.org/10.1016/j.acha.2006.04.006"> Diffusion maps</a> are another similarity graph technique. Although there is a Markovian/probabilistic interpretation, diffusion maps are basically Laplacian Eigenmaps with similarities computed between every pair of points. This means that they have the same drawbacks as Laplacian Eigenmaps, except for the clustering. The width of the kernel is still difficult to estimate.</p><p>Here is the result:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/04/swissrollcoords2dm08nonoise00test.png" title="Diffusion map compression of the Swissroll"><img
src="http://matt.eifelle.com/wp-content/uploads/2008/04/swissrollcoords2dm08nonoise00test.thumbnail.png" alt="Diffusion map compression of the Swissroll" /></a></p><p>Because every similarity is used, diffusion maps cannot reduce the SwissRoll correctly. In this precise case, the kernel width was obviously too big, but a smaller width gives a result similar to the Laplacian Eigenmaps, which is not correct either.</p><p>The other technique I will present is <a
href="http://www.pnas.org/cgi/doi/10.1073/pnas.1031596100">Hessian Eigenmaps</a>. Instead of estimating the Laplacian of the similarity graph, it tries to estimate the Hessian. This gives a very good result for the SwissRoll:</p><p><a
href="http://matt.eifelle.com/wp-content/uploads/2008/04/swissrollcoords2hm08nonoise00test.png" title="Hessian Eigenmaps compression of the Swissroll"><img
src="http://matt.eifelle.com/wp-content/uploads/2008/04/swissrollcoords2hm08nonoise00test.thumbnail.png" alt="Hessian Eigenmaps compression of the Swissroll" /></a></p><p>Unfortunately, the technique is not robust to noise, as I will show you in the result ticket. Apart from that, the technique is robust to holes in the manifold (non-uniformly sampled manifolds, for instance), whereas sensitivity to holes is one of the biggest drawbacks of techniques based on geodesic distances.</p><p>Stay tuned.</p><form
action="https://www.paypal.com/cgi-bin/webscr" method="post"></form>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2008/04/04/dimensionality-reduction-similarities-graph-and-its-use/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Dimensionality reduction: explicit optimization of a cost function</title><link>http://matt.eifelle.com/2008/04/02/dimensionality-reduction-explicit-optimization-of-a-cost-function/</link> <comments>http://matt.eifelle.com/2008/04/02/dimensionality-reduction-explicit-optimization-of-a-cost-function/#comments</comments> <pubDate>Wed, 02 Apr 2008 07:15:08 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[CCA]]></category> <category><![CDATA[Curvilinear Component Analysis]]></category> <category><![CDATA[Dimensionality reduction]]></category> <category><![CDATA[metric MDS]]></category> <category><![CDATA[NLM]]></category> <category><![CDATA[NonLinear Mapping]]></category> <category><![CDATA[robust cost function]]></category> <category><![CDATA[Sammon]]></category><guid
isPermaLink="false">http://matt.eifelle.com/2008/04/02/dimensionality-reduction-explicit-optimization-of-a-cost-function/</guid> <description><![CDATA[Analytical solutions to the dimensionality reduction problem are only possible for quadratic cost functions, like those of Isomap, LLE, Laplacian Eigenmaps, &#8230; All these solutions are sensitive to outliers. The quadratic hypothesis assumes that there are no outliers, but on real manifolds, noise is always present.
Some cost functions have been proposed, also known [...]]]></description> <content:encoded><![CDATA[<p>Analytical solutions to the dimensionality reduction problem are only possible for quadratic cost functions, like those of Isomap, LLE, Laplacian Eigenmaps, &#8230; All these solutions are sensitive to outliers. The quadratic hypothesis assumes that there are no outliers, but on real manifolds, noise is always present.</p><p>Some cost functions have been proposed, also known as stress functions as they measure the difference between the estimated geodesic distance and the computed Euclidean distance in the &#8220;feature&#8221; space. Any metric MDS cost function can be used as a stress function; here are some of them.</p><p><span
id="more-39"></span></p><p>The oldest function is Sammon&#8217;s NonLinear Mapping. It was originally based on Euclidean distances; I implemented it with the approximated geodesic distances described in the <a
href="http://matt.eifelle.com/2008/01/25/dimensionality-reduction-isomap/">Isomap ticket</a>. The goal of this function is to add a weight (the inverse of the geodesic distance), leading to less weight for the greatest distances, but also a large weight for small distances.</p><p>Here is the cost function for the distances (<strong>y</strong> are the coordinates in the original space and <strong>x</strong> in the feature/reduced space):</p><p><img
src="http://matt.eifelle.com/wp-content/uploads/2008/03/ssam.png" alt="Sammon’s cost function" /></p><p>Optimizing this function with a conjugate-gradient descent from a random start can give this result:</p><p><a
title="Geodesic NonLinear Mapping compression of the Swissroll" href="http://matt.eifelle.com/wp-content/uploads/2008/03/swissrollcoords2gnlm08nonoise00test.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2008/03/swissrollcoords2gnlm08nonoise00test.thumbnail.png" alt="Geodesic NonLinear Mapping compression of the Swissroll" /></a></p><p>Another function that is often cited in the literature is Demartines&#8217; cost function from Curvilinear Component Analysis:</p><p><img
src="http://matt.eifelle.com/wp-content/uploads/2008/03/scca.png" alt="CCA cost function" /></p><p>The F() function is 1 when the argument is small (less than an arbitrary value lambda), and 0 otherwise.</p><p>As a consequence, this function is not convex, not even continuous. The algorithm proposed in the associated paper is not great: I never managed to make it work on a SwissRoll, even with few points. So here are the steps I use to optimize the function:</p><ul><li>Start from a random position</li><li>Optimize 10 points (lambda infinite)</li><li>Optimize one more point:<ul><li>Move only this point with lambda infinite (gradient descent)</li><li>Move every point with a decreasing lambda (I start with lambda = greatest geodesic distance in the data set and linearly decrease it until lambda reaches the limit of the smallest 5% of the geodesic distances)</li></ul></li></ul><p>Each time, the new point is moved according to every already placed point; then, when every point is moving, only the local stresses are used. But the optimization can still go wrong, and some points that should be close can end up far from one another, because their associated stress is zero.</p><p>Here is the result for this optimization:</p><p><a
title="CCA compression of the Swissroll" href="http://matt.eifelle.com/wp-content/uploads/2008/04/swissrollcoords2cca08nonoise00test.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2008/04/swissrollcoords2cca08nonoise00test.thumbnail.png" alt="CCA compression of the Swissroll" /></a></p><p>The cost function I use is a robust one, and not &#8220;recursive&#8221;, as Demartines calls it (the weight is not a function of the estimated distance, as is the case for the CCA cost function):</p><p><img
src="http://matt.eifelle.com/wp-content/uploads/2008/03/sp.png" alt="Robust cost function" /></p><p>The first term is the robust term, differentiable when the two distances (the estimated geodesic one and the computed Euclidean one) are equal. The second term allows for fast convergence when the distances are not correctly estimated (useful at the beginning of the optimization, less so afterwards). The last term gives a small weight to small distances, as they can be polluted by noise on noisy manifolds. Gamma should only be a small value, tau is set to 80% of the geodesic distances and sigma to 5%. This gives good results in every case.</p><p>Here is the result with this cost function:</p><p><a
title="Robust compression of the SwissRoll" href="http://matt.eifelle.com/wp-content/uploads/2008/03/swissrollcoords2cf08nonoise00test.png"><img
src="http://matt.eifelle.com/wp-content/uploads/2008/03/swissrollcoords2cf08nonoise00test.thumbnail.png" alt="Robust compression of the SwissRoll" /></a></p><p>Its optimization is not easy, as it can return a folded reduced space. I propose two algorithms to solve the issue:</p><p>The first one:</p><ul><li>Optimize every point<ul><li>Add some noise to the computed coordinates depending on the global cost and the iteration (at the end, the added noise must be very small)</li><li>Optimize with a simple gradient descent and a Fibonacci line search</li></ul></li></ul><p>The second one:</p><ul><li>Optimize 10 points with a gradient descent from a random start</li><li>Optimize with one new point<ul><li>Move only this point with a gradient descent</li><li>Move every point</li></ul></li></ul><p>The second algorithm is slower than the first, but it works every time.</p><p>Stay tuned for the results&#8230;</p>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2008/04/02/dimensionality-reduction-explicit-optimization-of-a-cost-function/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Some news about the manifold learning scikit</title><link>http://matt.eifelle.com/2008/03/03/some-news-about-the-manifold-learning-scikit/</link> <comments>http://matt.eifelle.com/2008/03/03/some-news-about-the-manifold-learning-scikit/#comments</comments> <pubDate>Mon, 03 Mar 2008 16:24:34 +0000</pubDate> <dc:creator>Matt</dc:creator> <category><![CDATA[Manifold learning]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[scikit]]></category><guid
isPermaLink="false">http://matt_temp.eifelle.com/item/16</guid> <description><![CDATA[I got the word today that my paper was accepted, so I can now focus on delivering the code.
I&#8217;m in the process of refactoring it so that it depends less on some of our libraries here. In two weeks, there is a nipy sprint in Paris I will attend, and machine learning is one of [...]]]></description> <content:encoded><![CDATA[<p>I got the word today that my paper was accepted, so I can now focus on delivering the code.</p><p>I&#8217;m in the process of refactoring it so that it depends less on some of our libraries here. In two weeks, there is a nipy sprint in Paris that I will attend, and machine learning is one of the topics we will discuss, so this may indicate where and how I&#8217;ll contribute the code. I will keep showing some results next week.</p>]]></content:encoded> <wfw:commentRss>http://matt.eifelle.com/2008/03/03/some-news-about-the-manifold-learning-scikit/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> </channel> </rss>