<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Matt's blog</title>
	<atom:link href="http://matt.eifelle.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://matt.eifelle.com</link>
	<description></description>
	<pubDate>Tue, 22 Jul 2008 18:56:36 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
	<language>en</language>
			<item>
		<title>Dimensionality reduction: mapping the reduced space into the original space</title>
		<link>http://matt.eifelle.com/2008/07/14/dimensionality-reduction-mapping-the-reduced-space-into-the-original-space/</link>
		<comments>http://matt.eifelle.com/2008/07/14/dimensionality-reduction-mapping-the-reduced-space-into-the-original-space/#comments</comments>
		<pubDate>Mon, 14 Jul 2008 07:32:35 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[Manifold learning]]></category>

		<category><![CDATA[Dimensionality reduction]]></category>

		<category><![CDATA[Multidimensional regression]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=78</guid>
		<description><![CDATA[Once the data set is reduced (see my first posts if you&#8217;re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:

you can interpolate the data in the original space based on an interpolation in the reduced space, or
you create an approximation of the mapping with a multidimensional [...]]]></description>
			<content:encoded><![CDATA[<p>Once the data set is reduced (see my first posts if you&#8217;re jumping on the bandwagon), there are several ways of mapping this reduced space to the original space:</p>
<ul>
<li>you can interpolate the data in the original space based on an interpolation in the reduced space, or</li>
<li>you create an approximation of the mapping with a multidimensional function (B-splines, &#8230;)</li>
</ul>
<p>With the first solution, mapping one of the reduced points used for training returns the original point exactly. With the second solution, you get a nearby point. If your data set is noisy, you should use the second solution, not the first. And if you are trying to compress data (lossy compression), you cannot use the first one: you need the original points to compute new interpolated points, so you are not compressing your data set.</p>
<p>The solution I propose is based on an approximation with a set of piecewise linear models (each model being a mapping from a subset of the reduced space to the original space). At the boundaries between the models, I do not enforce continuity, unlike hinging hyperplanes. Unlike Projection Pursuit Regression and hinging hyperplanes, my mapping goes from the reduced space to the whole original space, and not from the reduced space to a single coordinate of the original space. This will enable projection on the manifold (another subject, which will be discussed in another post).</p>
<p><span id="more-78"></span></p>
<h4>Problem statement</h4>
<p>In the literature, <a href="http://books.nips.cc/papers/files/nips14/AA05.pdf">several</a> <a href="http://www.merl.com/papers/docs/TR2003-13.pdf">papers</a> describe how to build a piecewise linear function. Their main advantage is that they can compute the reduced space while computing the mapping. Their drawback is that the number of models is fixed at the beginning of the optimization. What I propose here is an adaptive number of models, depending on the manifold.</p>
<p>Each point has a set of neighbors; here, the k nearest ones are used.</p>
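<p>As a minimal sketch, assuming a brute-force search with Euclidean distances, the neighborhoods could be computed like this (the function name <code>k_nearest_neighbors</code> and the sample points are hypothetical, chosen only for illustration):</p>

```python
import math

def k_nearest_neighbors(points, k):
    """Return, for each point, the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute force: Euclidean distance from point i to every other point
        distances = [
            (math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))), j)
            for j, q in enumerate(points)
            if j != i
        ]
        distances.sort()
        neighbors.append([j for _, j in distances[:k]])
    return neighbors

# Hypothetical points in a 2-D reduced space
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
neighbors = k_nearest_neighbors(points, k=2)
print(neighbors[0])
```

<p>A brute-force search is quadratic in the number of points; for large data sets a spatial index would be a more reasonable choice.</p>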
<h4>Basic approach</h4>
<p>In fact, the basic algorithm is straightforward: put a new model where you can, optimize all models, label the points you can label, loop and stop at some point.</p>
<p>The first algorithm I used is the following:</p>
<ol>
<li>Start with no model.</li>
<li>Find a point whose neighborhood is not labeled; if there is none, stop.</li>
<li>Create a new model there and label the points accordingly.</li>
<li>Estimate all models with respect to the labels.</li>
<li>Assign each point to the nearest model; if the point is too far from it (that is, farther than a given factor times the mean error), leave it unlabeled.</li>
<li>If a model does not have enough points (at least twice the dimension of the reduced space are required), delete it.</li>
<li>Go back to step 2.</li>
</ol>
<p>This algorithm is very simple, but it cannot model a manifold precisely, and you cannot tune it. It can still give a good first impression of the manifold.</p>
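<p>The estimation and labeling steps can be sketched as follows. This is only an illustration under simplifying assumptions: the reduced space is 1-D, each model is an affine map fitted by ordinary least squares, "nearest model" is taken to mean the model with the smallest reconstruction error, and the mean error is computed over all points. The names <code>fit_affine</code>, <code>error</code> and <code>relabel</code> and the data are hypothetical.</p>

```python
import math

def fit_affine(ts, xs):
    """Least-squares affine map t -> x from a 1-D reduced space to the original space."""
    n = len(ts)
    dim = len(xs[0])
    t_mean = sum(ts) / n
    x_mean = [sum(x[d] for x in xs) / n for d in range(dim)]
    t_var = sum((t - t_mean) ** 2 for t in ts)
    slope = [
        sum((t - t_mean) * (x[d] - x_mean[d]) for t, x in zip(ts, xs)) / t_var
        for d in range(dim)
    ]
    intercept = [x_mean[d] - slope[d] * t_mean for d in range(dim)]
    return slope, intercept

def error(model, t, x):
    """Euclidean distance between x and the prediction of the model at t."""
    slope, intercept = model
    return math.sqrt(sum((a * t + b - xi) ** 2 for a, b, xi in zip(slope, intercept, x)))

def relabel(models, ts, xs, factor):
    """Assign each point to its best model, or None if it is too far from it."""
    best = [
        min(range(len(models)), key=lambda m: error(models[m], t, x))
        for t, x in zip(ts, xs)
    ]
    errors = [error(models[m], t, x) for m, t, x in zip(best, ts, xs)]
    mean_error = sum(errors) / len(errors)
    return [m if e <= factor * mean_error else None for m, e in zip(best, errors)]

# Hypothetical data: two straight segments plus one outlier (last point)
ts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 2.5]
xs = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (4.0, 2.0), (5.0, 3.0), (2.5, 10.0)]
labels = [0, 0, 0, 1, 1, 1, None]

# Estimate each model from the points currently assigned to it
models = []
for m in (0, 1):
    members = [i for i, label in enumerate(labels) if label == m]
    models.append(fit_affine([ts[i] for i in members], [xs[i] for i in members]))

new_labels = relabel(models, ts, xs, factor=3.0)
print(new_labels)
```

<p>In this toy example the outlier stays unlabeled because its error is far above three times the mean error, while every other point keeps its model.</p>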
<p>Here are some steps of the algorithm for a SwissRoll (the left figure shows the labels in the reduced space and the right figure is the approximation of the manifold in the original space):</p>
<p><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_start.png"><img class="alignnone size-medium wp-image-79" title="Labels when starting the first iteration" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_start-300x213.png" alt="Labels when starting the first iteration" width="300" height="213" /></a><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_start_regressed.png"><img class="alignnone size-medium wp-image-80" title="The first model and the mapped points in the original space" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_start_regressed-300x213.png" alt="The first model and the mapped points in the original space" width="300" height="213" /></a></p>
<p><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_updatebv_bis.png"><img class="alignnone size-medium wp-image-81" title="Updated labels after iteration 1" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_updatebv_bis-300x213.png" alt="" width="300" height="213" /></a><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_updatebv_bis_regressed.png"><img class="alignnone size-medium wp-image-82" title="Updated mapped points in the original space after iteration 1" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it01_updatebv_bis_regressed-300x213.png" alt="" width="300" height="213" /></a></p>
<p>At the end of the iterations, the manifold is roughly approximated:</p>
<p><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it99_updatebv_bis.png"><img class="alignnone size-medium wp-image-83" title="Labels at the end of the training" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it99_updatebv_bis-300x213.png" alt="" width="300" height="213" /></a><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it99_updatebv_bis_regressed.png"><img class="alignnone size-medium wp-image-84" title="Mapping in the original space at the end of the last iteration" src="http://matt.eifelle.com/wp-content/uploads/2008/07/plmr_it99_updatebv_bis_regressed-300x213.png" alt="" width="300" height="213" /></a></p>
<h4>Maximize a likelihood</h4>
<p>As I&#8217;ve said, the first method has no way of reaching a given precision. So instead of adding models wherever there is room, the second algorithm adds models where the current function is the least likely. Here are the steps of this algorithm:</p>
<ol>
<li>Start with one model (every point is labeled to this model).</li>
<li>Compute the likelihood of each point.</li>
<li>Get the <em>n</em> points for which the neighborhood is the least likely.</li>
<li>Get one point from this set.</li>
<li>Create a new model and label the point and its neighborhood to it.</li>
<li>Update all models.</li>
<li>Update all labels (label a point to the most likely model).</li>
<li>If a model does not have enough points (at least twice the dimension of the reduced space are required), delete it.</li>
<li>Go to step 6 until the labels are stable.</li>
<li>Go to step 2 if a criterion is not met (usually, I choose the Akaike information criterion).</li>
</ol>
<p>I later added a step that checks that the points assigned to a model are connected (that is, the corresponding subgraph is connected). If it is not, the model is split into several parts, one for each connected component.</p>
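<p>A connectivity check of this kind can be sketched with a breadth-first search over the neighbor graph. The sketch assumes a symmetric adjacency list (if i lists j as a neighbor, j lists i); the function name <code>split_into_components</code> and the toy graph are hypothetical.</p>

```python
from collections import deque

def split_into_components(members, neighbors):
    """Split the points of one model into connected components of the neighbor graph."""
    members = set(members)
    seen = set()
    components = []
    for start in sorted(members):
        if start in seen:
            continue
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            i = queue.popleft()
            component.append(i)
            for j in neighbors[i]:
                # Only follow edges that stay inside the model
                if j in members and j not in seen:
                    seen.add(j)
                    queue.append(j)
        components.append(sorted(component))
    return components

# Hypothetical chain of five points; point 2 now belongs to another model
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
parts = split_into_components([0, 1, 3, 4], neighbors)
print(parts)
```

<p>Here the model that lost point 2 falls apart into two components, so it would be replaced by two models.</p>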
<p>I didn&#8217;t state it before, but the error can be modeled by different random variables. I used an isotropic Gaussian variable in every training run.</p>
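<p>For reference, the log-likelihood of one point under an isotropic Gaussian error, and the Akaike information criterion (AIC = 2k - 2 ln L), can be sketched as below. The variance <code>sigma2</code>, the way parameters are counted and the function names are assumptions made for the example.</p>

```python
import math

def gaussian_log_likelihood(residual, sigma2):
    """Log-density of an isotropic Gaussian error with variance sigma2 per coordinate."""
    dim = len(residual)
    squared_norm = sum(r * r for r in residual)
    return -0.5 * dim * math.log(2.0 * math.pi * sigma2) - squared_norm / (2.0 * sigma2)

def aic(log_likelihoods, n_parameters):
    """Akaike information criterion, 2k - 2 ln(L); lower is better."""
    return 2.0 * n_parameters - 2.0 * sum(log_likelihoods)

# Hypothetical residuals (original point minus its reconstruction)
near = gaussian_log_likelihood([0.1, 0.0, -0.1], sigma2=0.25)
far = gaussian_log_likelihood([1.0, 2.0, -1.0], sigma2=0.25)
print(near, far)
```

<p>A point with a large residual gets a much lower log-likelihood, which is what flags its neighborhood as a candidate for a new model.</p>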
<p>There is an immediate improvement of the algorithm, as can be seen in the following figures:</p>
<p><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_start.png"><img class="alignnone size-medium wp-image-85" title="Adding a new plane where the current model is not likely enough" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_start-300x213.png" alt="" width="300" height="213" /></a><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_start_regressed.png"><img class="alignnone size-medium wp-image-86" title="A new model is introduced where the current model is least likely" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_start_regressed-300x213.png" alt="" width="300" height="213" /></a></p>
<p><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_updatebv_bis.png"><img class="alignnone size-medium wp-image-87" title="Stabilization of the three models after the introduction of a third model due to the connectivity constraints" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_updatebv_bis-300x213.png" alt="" width="300" height="213" /></a><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_updatebv_bis_regressed.png"><img class="alignnone size-medium wp-image-88" title="The three models after their optimization" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it01_updatebv_bis_regressed-300x213.png" alt="" width="300" height="213" /></a></p>
<p>Here, a third model was introduced right after the second one. The second model took the middle part of the reduced space, which split the graph of the points of the first model into two components, so a new model was added.</p>
<p>At the end of the global optimization, the result is the following:</p>
<p><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it99_updatebv_bis.png"><img class="alignnone size-medium wp-image-89" title="Far more models are now used" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it99_updatebv_bis-300x213.png" alt="" width="300" height="213" /></a><a href="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it99_updatebv_bis_regressed.png"><img class="alignnone size-medium wp-image-90" title="The manifold model after the optimization" src="http://matt.eifelle.com/wp-content/uploads/2008/07/mlplmr_it99_updatebv_bis_regressed-300x213.png" alt="" width="300" height="213" /></a></p>
<p>The difference between the two algorithms is obvious in the quality of the reconstruction: the first one had a 20% reconstruction error, against 3% for the second one. Although the second algorithm is more complicated, it is better at finding linear models that minimize the reconstruction error.</p>
<h4>Coming next</h4>
<p>After this training, a complete manifold model is available, with the precision one needs. I&#8217;ll present how it can be used in the following posts. Stay tuned!</p>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/07/14/dimensionality-reduction-mapping-the-reduced-space-into-the-original-space/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Book review: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism</title>
		<link>http://matt.eifelle.com/2008/07/09/book-review-intel-threading-building-blocks-outfitting-c-for-multi-core-processor-parallelism/</link>
		<comments>http://matt.eifelle.com/2008/07/09/book-review-intel-threading-building-blocks-outfitting-c-for-multi-core-processor-parallelism/#comments</comments>
		<pubDate>Wed, 09 Jul 2008 07:27:24 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[Book review]]></category>

		<category><![CDATA[C++]]></category>

		<category><![CDATA[O'Reilly]]></category>

		<category><![CDATA[Parallel computing]]></category>

		<category><![CDATA[Threading Building Blocks]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=77</guid>
		<description><![CDATA[After some general books on grid computation, I needed to change the subject of my readings a little bit. As Intel Threading Building Blocks always intrigued me, I chose the associated book.

Content and opinions
After a small introduction chapter (installation, the TBB philosophy of tasks instead of threads), the second chapter exposes what parallelism is. In [...]]]></description>
			<content:encoded><![CDATA[<p>After some general books on grid computation, I needed to change the subject of my reading a little. As Intel Threading Building Blocks had always intrigued me, I chose the associated book.<br />
<span id="more-77"></span></p>
<h4>Content and opinions</h4>
<p>After a short introductory chapter (installation, the TBB philosophy of tasks instead of threads), the second chapter explains what parallelism is. In every C++ book (that I know of), the programmer is taught to think one instruction after another. The trouble with parallel thinking is that you are executing several things at once that can interact, and this is not something the human mind handles well. So every useful concept is presented: locks, scaling, &#8230;</p>
<p>The next chapter is perhaps the most useful one, as it presents the basic parallel loops that you will use almost everywhere if you have data parallelism. The concept of <strong>range</strong> is introduced, and it&#8217;s the basic tool for TBB. The problem is split thanks to these ranges, and when the ranges are small enough, a task is created for each of them. The drawbacks of this chapter are the code quality (I hate it when code mixes C and C++ while the library is 100% C++; that is, don&#8217;t use the <strong>climits</strong> header and a macro when the <strong>limits</strong> header exists!) and the lack of explanation of the expected speedup of the algorithms. For instance, <strong>parallel_reduce</strong> is different from the parallel reduction proposed as a sample by CUDA. Then, <strong>parallel_scan</strong> (it computes <strong>y[n] = f[n](y[n-1])</strong>) is not clear either. The speedup is achieved by computing the same values several times, but in one case the result is not saved to memory. So what you gain is that, for the last part of the computation, only the cost of computing the results for the first parts matters. So if your function is complicated, your speedup is zero. A short explanation of this would have been great.</p>
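<p>To make the two-pass idea concrete, here is a sequential Python sketch of a blocked prefix scan. It is not TBB&#8217;s API and says nothing about how TBB actually schedules the work; it assumes an associative operator, and the name <code>blocked_scan</code> is hypothetical. The first pass only reduces each block (nothing is stored per element), and the second pass recomputes each block from its starting offset and stores the results.</p>

```python
import operator
from itertools import accumulate

def blocked_scan(values, block_size, op=operator.add):
    """Inclusive scan done in two passes over independent blocks."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]

    # Pass 1 (each block could run in parallel): keep only the block total
    totals = []
    for block in blocks:
        total = block[0]
        for v in block[1:]:
            total = op(total, v)
        totals.append(total)

    # Short sequential step: running combination of the block totals
    offsets = list(accumulate(totals, op))

    # Pass 2 (each block could run in parallel): redo the work, storing results
    result = []
    for b, block in enumerate(blocks):
        running = None if b == 0 else offsets[b - 1]
        for v in block:
            running = v if running is None else op(running, v)
            result.append(running)
    return result

scanned = blocked_scan([1, 2, 3, 4, 5, 6, 7], block_size=3)
print(scanned)
```

<p>The operator is applied roughly twice per element, so with two workers there is little to gain; the approach only pays off when there are enough cores to offset the duplicated work.</p>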
<p>Fortunately, this chapter was the only one with such drawbacks. Unfortunately, these drawbacks occur in the most important part of the book. After these basic blocks, task parallelism is introduced. Unlike data parallelism, for which you split your dataset into chunks that can be computed on a lot of cores, task parallelism only splits the work into a fixed number of predefined tasks. This means that the work cannot scale as much as with data parallelism, but the tasks themselves can be data parallel. This is done with a <strong>pipeline</strong>. Other algorithms for other kinds of loops are presented and complete the toolbox for basic/usual processing.</p>
<p>Some algorithms need to update some variables in containers (like a sort algorithm), and as the STL containers are not thread-safe, specific containers must be used. TBB provides a queue, a vector and a hash map for those purposes. The operations on these containers are not the same as the STL operations. New thread-safe methods are available, with some examples.</p>
<p>The following chapter is about memory allocation. TBB provides its own functions so that there cannot be cache conflicts between two threads. Basic examples show how to use them in overloaded <strong>new</strong> and <strong>delete</strong> operators.</p>
<p>Then, the book goes one step further inside TBB with mutual exclusion. The underlying OS <strong>mutex</strong> is wrapped, but additional specific mutexes are available. Their specifications are well explained. A small chapter is dedicated to timing in a safe way. It is really useful for task timing (before using Intel tools for thread profiling).</p>
<p>The last important chapter is about the task scheduler. This is the basis of the whole library, and using it directly can generally be avoided, but if you don&#8217;t have a choice, TBB allows its direct use. The different choices Intel made when designing it are clearly stated, although some paragraphs are not that easy to understand. This is because the task scheduler is very modular and a lot of things can be done. This chapter talks about every possibility, but don&#8217;t forget to state precisely what you need so that your application is as simple as possible.</p>
<p>The next-to-last chapter echoes the second one. It sums up the different things you have to remember when writing parallel algorithms with TBB. Although some points are pragmatic and very basic, they are all sound.</p>
<p>Finally, the last chapter presents several examples of increasing difficulty (following the order of the book, actually). Additional examples are available with the library, but these ones are explained and dissected.</p>
<h4>Conclusion</h4>
<p>I had trouble at first with this book, because of the mix of C and C++ in the second chapter and then because I could not understand how <strong>parallel_scan</strong> could even be sped up. But as I went on reading, I enjoyed it. Sometimes I would have liked additional examples inside each chapter, instead of a listing of the interface of a class and an explanation of each function. In fact, this book is a reference manual for TBB, and a good one. My troubles with the code were only at the beginning. By the end, everything is OK.</p>
<p>So if you want a more usable C++ threading library, go get TBB (open source) and this book.</p>
<div class="subcolumns">
<div style="border: 1px solid #000; padding: 5px; margin-bottom: 15px; background: url(http://matt.eifelle.com/wp-content/plugins/amazonsimpleadmin/img/amazon_US_small.gif) right bottom no-repeat #ffffff;">
<div style="width: 57px; float: left; margin-right: 5px;">
		<a href="http://www.amazon.com/exec/obidos/ASIN/0596514808/masbl03-20" target="_blank"><img src="http://ecx.images-amazon.com/images/I/519Nbs-uceL._SL75_.jpg" width="57" height="75" border="0" /></a>
	</div>
<div>
<p><a href="http://www.amazon.com/exec/obidos/ASIN/0596514808/masbl03-20" target="_blank">Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism</a> (Paperback)<br />
		<span style="font-size: 0.8em;">by <strong>James Reinders</strong></span><br />
		ISBN: 0596514808</p>
	</div>
<div style="clear: both;"></div>
</div>
</div>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/07/09/book-review-intel-threading-building-blocks-outfitting-c-for-multi-core-processor-parallelism/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Dimensionality reduction: the scikit is available !</title>
		<link>http://matt.eifelle.com/2008/06/27/dimensionality-reduction-the-scikit-is-available/</link>
		<comments>http://matt.eifelle.com/2008/06/27/dimensionality-reduction-the-scikit-is-available/#comments</comments>
		<pubDate>Fri, 27 Jun 2008 07:50:15 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[Manifold learning]]></category>

		<category><![CDATA[Python]]></category>

		<category><![CDATA[BSD license]]></category>

		<category><![CDATA[code]]></category>

		<category><![CDATA[Machine learning]]></category>

		<category><![CDATA[scikit]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=60</guid>
		<description><![CDATA[My manifold learning code was for some time a Technology Preview in the scikit learn. Now I can say that it is available (BSD license) and there should not be any obvious bug left..
I&#8217;ve written a small tutorial. It is not an usual tutorial (there is a user tutorial and then what developers should know [...]]]></description>
			<content:encoded><![CDATA[<p>My manifold learning code was for some time a Technology Preview in the <em>learn</em> scikit. Now I can say that it is available (BSD license) and there should not be any obvious bugs left.</p>
<p>I&#8217;ve written <a href="http://scipy.org/scipy/scikits/wiki/MachineLearning/ManifoldLearning">a small tutorial</a>. It is not a usual tutorial (there is a user tutorial, followed by what developers should know to enhance the scikit), and some results of the techniques are shown on my blog. It provides the basic commands to start using the scikit yourself (reducing some data, projecting new points, &#8230;) as well as the exposed interface for enhancing the scikit.</p>
<p>If you have any questions, feel free to ask me; I will add the answers to the tutorial page so that everyone can benefit from them.</p>
<p>Feel free to contribute new techniques and additional tools as well, I cannot write them all! For instance, the scikit lacks robust neighbor selection to avoid shortcuts in the manifold&#8230;</p>
<p><a href="http://scipy.org/scipy/scikits/wiki/MachineLearning">Tutorial</a> and <a href="http://scipy.org/scipy/scikits/wiki/MachineLearning">the <em>learn</em> scikit mainpage</a>.</p>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/06/27/dimensionality-reduction-the-scikit-is-available/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Book review: Peer-to-Peer: Harnessing the Power of Disruptive Technologies</title>
		<link>http://matt.eifelle.com/2008/06/17/book-review-peer-to-peer-harnessing-the-power-of-disruptive-technologies/</link>
		<comments>http://matt.eifelle.com/2008/06/17/book-review-peer-to-peer-harnessing-the-power-of-disruptive-technologies/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 08:10:56 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[O'Reilly]]></category>

		<category><![CDATA[Book review]]></category>

		<category><![CDATA[P2P]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=75</guid>
		<description><![CDATA[Peer-to-peer. These words are unleashing in France a fight between the legislators and the developers. And this old - I say old because it was written in 2001, and 7 years is old for a book on this topic - book presented me the issues debated in journals, blogs, &#8230; in a new way.

Content and [...]]]></description>
			<content:encoded><![CDATA[<p>Peer-to-peer. In France, these words are unleashing a fight between legislators and developers. And this old book (I say old because it was written in 2001, and 7 years is old for a book on this topic) showed me the issues debated in journals, blogs, &#8230; in a new way.<br />
<span id="more-75"></span></p>
<h4>Content and opinions</h4>
<p>The book is split into three parts. The first defines peer-to-peer, the second presents the then-usual applications of peer-to-peer and the third discusses the different issues that loom large with this technology (well, it&#8217;s more a group of technologies).</p>
<p>So the first part is about what peer-to-peer really is. It is really interesting to follow the history of the Internet and see that it is so closely related to peer-to-peer. Then, the reasons for the emergence of these technologies are presented, through ICQ and Napster. Perhaps not the best peer-to-peer applications (in the technological sense), but they were among the first, so&#8230; The authors do not underestimate the legal/economic war that was and is still waged, but they ask the real questions about this war. Then, Tim O&#8217;Reilly proposes a new taxonomy of peer-to-peer, and how to make these technologies popular with everyone, the (music/film) industries included. The fourth chapter was not that interesting (very short), but showed some additional reasons for the success of everything that is peer-to-peer.</p>
<p>The second part is about projects. The first is SETI@Home, a distributed computer. The chapter shows how it started and became the project we know, with the different problems it had to tackle. The next application is Jabber, a conversation tool. Based on XML, it can talk to other clients (MSN, ICQ, &#8230;), and it is geared not only toward conversations between people but also toward conversations between applications, which is somewhat different. Mixmaster is next. I didn&#8217;t know that some people used <em>remailers</em> to ensure anonymity (mails are sent through several mail servers). Besides, it seems that only a few remailers are left. Then Gnutella and Freenet are presented. Those two applications are perhaps the best-known ones, as their goal is file sharing. The underlying technologies and approaches are very different, and thus it is interesting to read those two chapters. Red Rover, Publius and Free Haven are dedicated to avoiding the censorship of some files. This can be very interesting for people located in countries where the Internet is censored. I have to say that those applications were the least interesting for me, but they may interest others.</p>
<p>The third part is dedicated to some theoretical thoughts. Or at least that was what the title said. The first chapter, Metadata, says that files should have metadata. Yes. That is kind of obvious. Then, there are some interesting thoughts about application performance. The next four chapters are geared toward trust, anonymity, reputation and accountability. All in all, they talk about the same topics, and it gets boring in the end. Finally, the last chapter presents how all those applications may talk to each other.</p>
<h4>Conclusion</h4>
<p>I thought that the book would teach me a lot about peer-to-peer technology. In fact, I learned about the history of the Internet and some performance issues, but mostly about file sharing. If the editor wanted to show that we can do much more than file sharing with peer-to-peer, it didn&#8217;t show in the actual book. Yes, accountability is important, as is security, but speaking about how to remain anonymous when there is an illegal-peer-to-peer witch hunt (and it was already the case when the book was written) is not going to help the reputation of the technologies.<br />
I would like to have an updated version of the book covering the current peer-to-peer applications: Napster is dead, Gnutella is not widely used, &#8230; Finally, I would have liked to see more applications of peer-to-peer technologies for industry (distributed computing, web services, &#8230;). SETI@Home is a start, but only a start: it cannot be used by firms for their own computations.</p>
<div class="subcolumns">
<div style="border: 1px solid #000; padding: 5px; margin-bottom: 15px; background: url(http://matt.eifelle.com/wp-content/plugins/amazonsimpleadmin/img/amazon_US_small.gif) right bottom no-repeat #ffffff;">
<div style="width: 50px; float: left; margin-right: 5px;">
		<a href="http://www.amazon.com/exec/obidos/ASIN/059600110X/masbl03-20" target="_blank"><img src="http://ecx.images-amazon.com/images/I/51u0lB3NlJL._SL75_.jpg" width="50" height="75" border="0" /></a>
	</div>
<div>
<p><a href="http://www.amazon.com/exec/obidos/ASIN/059600110X/masbl03-20" target="_blank">Peer-to-Peer : Harnessing the Power of Disruptive Technologies</a> (Hardcover)<br />
		<span style="font-size: 0.8em;">by <strong>Andy Oram</strong></span><br />
		ISBN: 059600110X</p>
	</div>
<div style="clear: both;"></div>
</div>
</div>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/06/17/book-review-peer-to-peer-harnessing-the-power-of-disruptive-technologies/feed/</wfw:commentRss>
		</item>
		<item>
		<title>A Metric Multidimensional Scaling-Based Nonlinear Manifold Learning Approach for Unsupervised Data Reduction</title>
		<link>http://matt.eifelle.com/2008/06/11/a-metric-multidimensional-scaling-based-nonlinear-manifold-learning-approach-for-unsupervised-data-reduction/</link>
		<comments>http://matt.eifelle.com/2008/06/11/a-metric-multidimensional-scaling-based-nonlinear-manifold-learning-approach-for-unsupervised-data-reduction/#comments</comments>
		<pubDate>Wed, 11 Jun 2008 07:31:43 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[Manifold learning]]></category>

		<category><![CDATA[Python]]></category>

		<category><![CDATA[article]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=76</guid>
		<description><![CDATA[At last, my article on manifold learning has been published and is accessible with doi.org (it was not the case last week, that&#8217;s why I waited before publishing this post).
The journal is free, so you won&#8217;t have to pay to read it : Access to the EURASIP JASP article
I will publish additional figures here in [...]]]></description>
			<content:encoded><![CDATA[<p>At last, my article on manifold learning has been published and is accessible through doi.org (that was not the case last week, which is why I waited before publishing this post).<br />
The journal is free, so you won&#8217;t have to pay to read it: <a href="http://dx.doi.org/10.1155/2008/862015">Access to the EURASIP JASP article</a></p>
<p>I will publish additional figures here shortly. The scikit is almost complete as well; I&#8217;m finishing the online tutorial for those who are interested in using it and/or enhancing it.</p>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/06/11/a-metric-multidimensional-scaling-based-nonlinear-manifold-learning-approach-for-unsupervised-data-reduction/feed/</wfw:commentRss>
		</item>
		<item>
		<title>To translate or not to translate?</title>
		<link>http://matt.eifelle.com/2008/06/03/to-translate-or-not-to-translate/</link>
		<comments>http://matt.eifelle.com/2008/06/03/to-translate-or-not-to-translate/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 07:30:32 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[General]]></category>

		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=74</guid>
		<description><![CDATA[Today, I&#8217;m publishing a tutorial on two C++ profilers on my French website. The question I&#8217;m asking myself and you is: should I translate it ?
If some of you are interested in my French tutorials, I may translate them from time to time, depending on their content (I don&#8217;t want to translate an article on [...]]]></description>
			<content:encoded><![CDATA[<p>Today, I&#8217;m publishing a tutorial on two C++ profilers on my French <a href="http://matthieu-brucher.developpez.com/tutoriels/cpp/profil-valgrind-visual-studio/">website</a>. The question I&#8217;m asking myself and you is: should I translate it?</p>
<p>If some of you are interested in my French tutorials, I may translate them from time to time, depending on their content (I don&#8217;t want to translate an article on Boost for instance, as its documentation already provides everything). But I&#8217;ll do that only if people tell me &#8220;Go on&#8221;. So I&#8217;m all ears&#8230;</p>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/06/03/to-translate-or-not-to-translate/feed/</wfw:commentRss>
		</item>
		<item>
		<title>A new future</title>
		<link>http://matt.eifelle.com/2008/05/27/a-new-future/</link>
		<comments>http://matt.eifelle.com/2008/05/27/a-new-future/#comments</comments>
		<pubDate>Tue, 27 May 2008 08:07:09 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[General]]></category>

		<category><![CDATA[job]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=73</guid>
		<description><![CDATA[Since the beginning of this year, I have been trying to figure out what to do in the future. I&#8217;m still doing my PhD, but what could I do after that?
Before
My current job is to find a model for datasets.
A lot of datasets can be explained by a small number of parameters. For instance identity [...]]]></description>
			<content:encoded><![CDATA[<p>Since the beginning of this year, I have been trying to figure out what to do in the future. I&#8217;m still doing my PhD, but what could I do after that?</p>
<h4>Before</h4>
<p>My current job is to find a model for datasets.</p>
<p>A lot of datasets can be explained by a small number of parameters. For instance, identity photos of a single person can be explained by 3 translations and 3 rotations. So my algorithms do just that: find the parameters (or something that is close enough) and create a mapping between the parameters and the original space.</p>
<p>During this research, I learnt what scientific computing is. I did not explore everything in this field, but I covered the basics. That&#8217;s where I found out about Python, but also C++ (which is the first language I really used). My thirst for information led me to read a lot of books on several subjects (architectural design, process, but also parallel computing and its different flavors). This led me to search for the job that would interest me the most.</p>
<h4>After</h4>
<p>So starting from September I&#8217;ll move to <a href="http://www.pau.fr/">Pau</a>, a town in the South of France. This is where the biggest research center of Total S.A. is located. I will work on oil exploration.</p>
<p>Although the theory behind this is well known (acoustic wave propagation and inverse problems), this does not mean that research in this field is over. For instance, the computing power needed to solve these problems is enormous, so their implementation must be well thought out. And even if you manage to find a solution to your problem, you are not done. Total&#8217;s goal is not to see whether acoustic waves propagate fast in some places and slowly in others. Its goal is to find oil and gas. So once one has an acoustic model, one must work out with the geologists whether there is a chance of finding oil or gas. And that&#8217;s also a big, interesting challenge.</p>
<h4>Meanwhile</h4>
<p>For those who were interested in <a href="http://matt.eifelle.com/category/python/manifold-learning/">manifold learning</a>, don&#8217;t worry, I&#8217;m not done presenting my research. I will go on with some new posts about the mapping between the two spaces and how it can be used to test new samples. The scikit is now almost available. I still have to finish the tutorial and test that everything is OK.</p>
<p>I hope I will be able to continue with other subjects on this blog; there is no reason I cannot. Although what I&#8217;ll be doing at Total is secret, there are a lot of fields I&#8217;d like to talk about.</p>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/05/27/a-new-future/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Book review: From P2P to Web Services and Grids: Peers In A Client/Server World</title>
		<link>http://matt.eifelle.com/2008/05/06/book-review-from-p2p-to-web-services-and-grids-peers-in-a-clientserver-world/</link>
		<comments>http://matt.eifelle.com/2008/05/06/book-review-from-p2p-to-web-services-and-grids-peers-in-a-clientserver-world/#comments</comments>
		<pubDate>Tue, 06 May 2008 07:29:41 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[Book review]]></category>

		<category><![CDATA[Distributed Computing]]></category>

		<category><![CDATA[Springer]]></category>

		<category><![CDATA[globus]]></category>

		<category><![CDATA[Grid computing]]></category>

		<category><![CDATA[JXTA]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=59</guid>
		<description><![CDATA[I was looking for an introductory book on peer-to-peer (P2P) applications and their use in grid computing. Web services were a bonus, as they are something I don&#8217;t usually play with.

Content and opinions
The book is split into four parts. The first chapter is an introduction to distributed systems, with the definitions and some examples of [...]]]></description>
			<content:encoded><![CDATA[<p>I was looking for an introductory book on peer-to-peer (P2P) applications and their use in grid computing. Web services were a bonus, as they are something I don&#8217;t usually play with.<br />
<span id="more-59"></span></p>
<h4>Content and opinions</h4>
<p>The book is split into four parts. The first chapter is an introduction to distributed systems, with definitions and some examples of what is and what is not a centralized (as opposed to decentralized) application or framework. For readers new to P2P, the examples and the problems are well presented.</p>
<p>The first part goes into the details of a distributed environment. The P2P solution is presented first, with its specific aspects, whether social (P2P does not have a good reputation, to say the least) or related to routing (accessing peers behind firewalls and routers). As web services are one of the main subjects of the book, they are presented next. The last chapter tackles grid computing through its evolution, the current definition of a grid and the Globus Toolkit 2 architecture. Those chapters are really interesting because they lay the groundwork for the remainder of the book.</p>
<p>The second part is about several P2P technologies that can be used, as well as some specific issues. <strong>Jini</strong> and <strong>Gnutella</strong> are the first ones to be presented. They were not developed to answer the same questions: Jini is about <em>Remote Objects</em> and Gnutella is about <em>file sharing</em>. These technologies introduce the issues of scalability and security; the first deals with using more nodes in the grid, the second with protecting the grid. Finally, <strong>Freenet</strong> and <strong>JXTA</strong> are presented. The first is dedicated to file storage on a distributed data grid, the second is a generic P2P framework. The chapters on the different technologies do not bring more information than what can be found in tutorials on the net, but they explain them clearly. Scalability and security are aspects that are sometimes forgotten in the design of a distributed system, so their presence in this part of the book is a useful reminder.</p>
<p>Part three tackles Jini, JXTA and web service deployment. The first two chapters have some code samples that can be used; for web services, there are only some XML fragments. Several formats are presented in this chapter with their advantages and drawbacks.</p>
<p>Finally, the fourth part presents web services applied to grid systems, which gives grid services. The Globus Toolkit 3 can be used for those grid services, and the future version 4 is introduced as well. This part is the shortest, maybe because these special services are not widely used, and a lot still has to be explored to reach a clear software design (which may be used by the Globus Toolkit 4, according to the book).</p>
<h4>Conclusion</h4>
<p>The book&#8217;s good writing style makes it easier to read, as some pages can be difficult to understand. The final goal is to present grid services, along with the underlying frameworks and tools, namely grid (and P2P) systems and web services. The beginner is taken from the basics to advanced concepts, which can be applied to concrete grids.</p>
<p>If grid computing, how it can be done, and web services are of interest to you, I suggest you read this book.</p>
<div class="subcolumns">
<div style="border: 1px solid #000; padding: 5px; margin-bottom: 15px; background: url(http://matt.eifelle.com/wp-content/plugins/amazonsimpleadmin/img/amazon_US_small.gif) right bottom no-repeat #ffffff;">
<div style="width: 50px; float: left; margin-right: 5px;">
		<a href="http://www.amazon.com/exec/obidos/ASIN/1852338695/masbl03-20" target="_blank"><img src="http://ecx.images-amazon.com/images/I/51RFngQ32xL._SL75_.jpg" width="50" height="75" border="0" /></a>
	</div>
<div>
<p><a href="http://www.amazon.com/exec/obidos/ASIN/1852338695/masbl03-20" target="_blank">From P2P to Web Services and Grids: Peers in a Client/Server World</a> (Paperback)<br />
		<span style="font-size: 0.8em;">by <strong>Ian J. Taylor, Andrew Harrison</strong></span><br />
		ISBN: 1852338695</p>
<p><strong>Price:</strong> <span style="color: #990000; font-weight: bold;">USD 55.60</span><br />
		<strong>22 used &#038; new</strong> available from <span style="color: #990000; font-weight: bold;">USD 31.50</span></p>
<p>		<img src="http://matt.eifelle.com/wp-content/plugins/amazonsimpleadmin/img/stars-4.5.gif" class="asa_rating_stars" /> | 4.5 | 4
	</div>
<div style="clear: both;"></div>
</div>
</div>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/05/06/book-review-from-p2p-to-web-services-and-grids-peers-in-a-clientserver-world/feed/</wfw:commentRss>
		</item>
		<item>
		<title>My favorite design pattern in Python</title>
		<link>http://matt.eifelle.com/2008/04/30/my-favorite-design-pattern-in-python/</link>
		<comments>http://matt.eifelle.com/2008/04/30/my-favorite-design-pattern-in-python/#comments</comments>
		<pubDate>Wed, 30 Apr 2008 07:16:56 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[Python]]></category>

		<category><![CDATA[Design pattern]]></category>

		<category><![CDATA[Registry]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=67</guid>
		<description><![CDATA[I noticed a few days ago that I mainly use one design pattern in my scientific (but not only) code, the registry. How does it work? A registry is a list/dictionary/&#8230; of objects; applications add a new entry when needed, and then a user can tap into the registry to find the most adequate [...]]]></description>
			<content:encoded><![CDATA[<p>I noticed a few days ago that I mainly use one design pattern in my scientific (but not only) code, the <strong>registry</strong>. How does it work? A registry is a list/dictionary/&#8230; of objects; applications add a new entry when needed, and then a user can tap into the registry to find the most adequate object for their purpose.</p>
<p><span id="more-67"></span></p>
<p>In fact, the registry is one of the best replacements for the <em>switch</em> statement. Indeed, it is far more modular, as new cases can be introduced and deleted, and it is more readable as well.</p>
<p>I have used a registry in several pieces of code:</p>
<ul>
<li>when I&#8217;m playing with manifold learning, I have a registry of the dimensionality reduction tools and I can choose the one that fits my needs (given as a command-line argument);</li>
<li>I refactored scikits.openopt so that a registry contains the different solver wrappers, and now people can add their own wrapper to the dictionary, the key being the name of the solver and the value being a string with the package hierarchy to use;</li>
<li>I&#8217;m trying to develop a JXTA implementation in Python, and the registry shows up everywhere, or almost. For instance, as in the original Java implementation, the advertisements are stored in a dictionary, and new kinds of advertisements can be registered efficiently. The parsing of each advertisement is also done with a class registry.</li>
</ul>
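<p>As a rough illustration of the second case above, here is a minimal sketch of a registry whose values are strings naming where the object lives. It is not the scikits.openopt code: the names <em>SOLVERS</em>, <em>register</em> and <em>resolve</em> are made up for this example, and standard-library functions stand in for real solver wrappers.</p>

```python
import importlib

# Hypothetical registry: keys are short names, values are dotted paths
# ("package.module.attribute") that are only imported when requested.
# Standard-library functions stand in for real solver wrappers here.
SOLVERS = {
    "sqrt": "math.sqrt",
    "mean": "statistics.mean",
}


def register(name, dotted_path):
    """Add (or override) an entry; third-party code can call this too."""
    SOLVERS[name] = dotted_path


def resolve(name):
    """Import the module lazily and return the registered object."""
    try:
        dotted_path = SOLVERS[name]
    except KeyError:
        raise ValueError("unknown solver %r, choose from %s" % (name, sorted(SOLVERS)))
    module_name, _, attribute = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# A user-supplied entry, added without touching the lookup code
register("median", "statistics.median")

print(resolve("sqrt")(16.0))
print(resolve("median")([1, 3, 2]))
```

<p>Adding a new case is a single call to <em>register</em>, where a switch-like chain of tests would have to be edited in place. Storing strings rather than objects also means a module is only imported when its entry is actually requested.</p>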
<p>Python, with its dynamic and duck typing, is naturally inclined to use registries, IMHO. My code is mostly scientific, but the ease of use of a registry is very helpful in my everyday work.</p>
<p>Here is a sample of automatic registration in pyP2P:</p>

<div id="wp_codebox_msgheader"><span class="right"><a href="javascript:;" onclick="toggle_collapse('p674');">[<span id="p674_symbol">-</span>]</a><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://matt.eifelle.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=67&amp;download=__init__.py">__init__.py</a></span><div class="codebox_clear"></div></div><div id="wp_codebox"><table width="100%" ><tr id="p674"><td class="code" id="p67code4"><pre class="python"><span style="color: #ff7700;font-weight:bold;">from</span> advertisement_core <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #66cc66;">*</span>
<span style="color: #ff7700;font-weight:bold;">from</span> peer_advertisement <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #66cc66;">*</span></pre></td></tr></table></div>


<div id="wp_codebox_msgheader"><span class="right"><a href="javascript:;" onclick="toggle_collapse('p675');">[<span id="p675_symbol">-</span>]</a><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://matt.eifelle.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=67&amp;download=advertisement_core.py">advertisement_core.py</a></span><div class="codebox_clear"></div></div><div id="wp_codebox"><table width="100%" ><tr id="p675"><td class="code" id="p67code5"><pre class="python"><span style="color: #ff7700;font-weight:bold;">class</span> Advertisement<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">pass</span>
&nbsp;
registry = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span></pre></td></tr></table></div>


<div id="wp_codebox_msgheader"><span class="right"><a href="javascript:;" onclick="toggle_collapse('p676');">[<span id="p676_symbol">-</span>]</a><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left2">Download <a href="http://matt.eifelle.com/wp-content/plugins/wp-codebox/wp-codebox.php?p=67&amp;download=peer_advertisement.py">peer_advertisement.py</a></span><div class="codebox_clear"></div></div><div id="wp_codebox"><table width="100%" ><tr id="p676"><td class="code" id="p67code6"><pre class="python"><span style="color: #ff7700;font-weight:bold;">import</span> advertisement_core
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> PeerAdvertisement<span style="color: black;">&#40;</span>advertisement_core.<span style="color: black;">Advertisement</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">pass</span>
&nbsp;
advertisement_core.<span style="color: black;">registry</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;PA&quot;</span><span style="color: black;">&#93;</span> = PeerAdvertisement</pre></td></tr></table></div>

<p>This way, when the module is imported, the registry is also automatically populated.</p>
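<p>The same idea can be written with a class decorator, so that each registration sits next to the class it registers. This is only a sketch under assumptions, not the pyP2P code: <em>register</em>, <em>build</em>, the <em>payload</em> argument and the <em>&quot;PGA&quot;</em> key are hypothetical names, and everything is kept in one file so that it runs as is.</p>

```python
# Hypothetical names throughout; this is not the pyP2P API.
registry = {}


def register(tag):
    """Class decorator that stores the class in the registry under `tag`."""
    def decorator(cls):
        registry[tag] = cls
        return cls
    return decorator


class Advertisement(object):
    def __init__(self, payload):
        self.payload = payload


@register("PA")
class PeerAdvertisement(Advertisement):
    pass


@register("PGA")
class PeerGroupAdvertisement(Advertisement):
    pass


def build(tag, payload):
    """Look up the class for `tag` and instantiate it; no if/elif chain."""
    return registry[tag](payload)


adv = build("PA", "peer-1")
print(type(adv).__name__)
```

<p>As with the module-level assignment, the registry is filled as a side effect of defining (or importing) the classes, and the code that builds objects never needs to know which kinds exist.</p>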

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/04/30/my-favorite-design-pattern-in-python/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Book review: Twisted: Networking Programming Essentials</title>
		<link>http://matt.eifelle.com/2008/04/28/book-review-twisted-networking-programming-essentials/</link>
		<comments>http://matt.eifelle.com/2008/04/28/book-review-twisted-networking-programming-essentials/#comments</comments>
		<pubDate>Mon, 28 Apr 2008 07:11:00 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
		
		<category><![CDATA[Book review]]></category>

		<category><![CDATA[O'Reilly]]></category>

		<category><![CDATA[Python]]></category>

		<category><![CDATA[Network]]></category>

		<guid isPermaLink="false">http://matt.eifelle.com/?p=66</guid>
		<description><![CDATA[This book is different from the last two books I read. Indeed, it tackles a specific Python library, Twisted, and how to use it.

Content and opinions
Twisted is a network library aiming at simplifying the development of network applications. It is based on an event loop for all processing (unfortunately, no word in the book about [...]]]></description>
			<content:encoded><![CDATA[<p>This book is different from the last two books I read. Indeed, it tackles a specific Python library, <strong>Twisted</strong>, and how to use it.<br />
<span id="more-66"></span></p>
<h4>Content and opinions</h4>
<p>Twisted is a network library aiming at simplifying the development of network applications. It is based on an event loop for all processing (unfortunately, no word in the book about managing several event loops, as is the case with GUI-based applications).</p>
<p>After an introduction to event-driven programming with simple clients and servers, the reader is introduced to basic web clients and servers. Twisted offers a lot of bridges to create web services with XMLRPC or SOAP. The explanations and the code are pretty clear, and it is easy to build one&#8217;s own small distributed application with these blocks.</p>
<p>When authentication is introduced, it is hard to understand at first. Zope interfaces are used, but I didn&#8217;t find an explanation of what they are or of what the function <em>implements()</em> is and does. One can figure it out from the context, but a complete introduction to these techniques should be given at this point. Once authentication is understood, other services are presented, like mail clients and servers (how to send a mail, process the information in a mail to send an answer), as well as NNTP.</p>
<p>SSH is only introduced towards the end of the book, and its explanation is not simple because it is mixed with shells. Finally, network applications are often services or daemons; the last chapter covers how to create them.</p>
<h4>Conclusion</h4>
<p>This book is good: the many explanations and code samples (some mistakes can be found here and there) help in understanding how to use the library. Some parts of the book are outdated, so I hope that a new edition will be published soon, and some software tools should be explained in more detail. Not every aspect of Twisted is covered in the book, it&#8217;s only <em>Network Programming Essentials</em>, but once the basics are known, the rest can be learnt from the documentation.</p>
<div class="subcolumns">
<div style="border: 1px solid #000; padding: 5px; margin-bottom: 15px; background: url(http://matt.eifelle.com/wp-content/plugins/amazonsimpleadmin/img/amazon_US_small.gif) right bottom no-repeat #ffffff;">
<div style="width: 57px; float: left; margin-right: 5px;">
		<a href="http://www.amazon.com/exec/obidos/ASIN/0596100329/masbl03-20" target="_blank"><img src="http://ecx.images-amazon.com/images/I/51PY-oaLAsL._SL75_.jpg" width="57" height="75" border="0" /></a>
	</div>
<div>
<p><a href="http://www.amazon.com/exec/obidos/ASIN/0596100329/masbl03-20" target="_blank">Twisted Network Programming Essentials</a> (Paperback)<br />
		<span style="font-size: 0.8em;">by <strong>Abe Fettig</strong></span><br />
		ISBN: 0596100329</p>
<p><strong>Price:</strong> <span style="color: #990000; font-weight: bold;">USD 19.77</span><br />
		<strong>46 used &#038; new</strong> available from <span style="color: #990000; font-weight: bold;">USD 14.44</span></p>
<p>		<img src="http://matt.eifelle.com/wp-content/plugins/amazonsimpleadmin/img/stars-3.5.gif" class="asa_rating_stars" /> | 3.5 | 9
	</div>
<div style="clear: both;"></div>
</div>
</div>

]]></content:encoded>
			<wfw:commentRss>http://matt.eifelle.com/2008/04/28/book-review-twisted-networking-programming-essentials/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
