<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Debroglie&#039;s repository</title>
	<atom:link href="http://blog.debroglie.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.debroglie.net</link>
	<description>RPMs for chemistry on CentOS and Fedora</description>
	<lastBuildDate>Tue, 24 Apr 2012 15:07:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Using GeoGebra for optics</title>
		<link>http://blog.debroglie.net/2012/04/24/using-geogebra-for-optics/</link>
		<comments>http://blog.debroglie.net/2012/04/24/using-geogebra-for-optics/#comments</comments>
		<pubDate>Tue, 24 Apr 2012 12:13:07 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Pascal's diary]]></category>
		<category><![CDATA[optics]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=698</guid>
		<description><![CDATA[A new subject for this post. Since taking up my new position at the CRM2 in Nancy, I have been working with optics. We set up an experiment on interference and optical diffraction for demonstration purposes. I am looking for free, open-source software to sketch such setups. I found nothing dedicated to optics, so I must use something else. I &#8230; </p><p><a class="more-link block-button" href="http://blog.debroglie.net/2012/04/24/using-geogebra-for-optics/">Continue reading &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>A new subject for this post. Since taking up my new position at the <a href="http://www.crystallography.fr">CRM2</a> in Nancy, I have been working with optics.</p>
<p>We set up an experiment on interference and optical diffraction for demonstration purposes. I am looking for free, open-source software to sketch such setups. I found nothing dedicated to optics, so I must use something else. I have seen two options so far: drawing software like Inkscape or geometry software like <a href="http://www.geogebra.org/cms/">GeoGebra</a>.</p>
<p>I tried GeoGebra for the diffraction by an optical disc. The main disadvantage is that there is no direct way to impose a proportionality relation, such as the one in the refraction law; you must use an equivalent geometric construction instead.</p>
<p>For example, the refraction law is: \(n_1 \sin(\theta_1) = n_2 \sin(\theta_2)\)</p>
<p>To draw it, you need to use circles to get access to the sine of the angle, which complicates the drawing a bit.<br />
<a href="http://blog.debroglie.net/wp-content/uploads/2012/04/refraction_construction1.png"><img src="http://blog.debroglie.net/wp-content/uploads/2012/04/refraction_construction1.png" alt="" title="refraction_construction" width="1005" height="624" class="aligncenter size-full wp-image-708" /></a></p>
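<p>As a numerical cross-check of such a construction, the refraction law can also be evaluated directly. The snippet below is only a minimal sketch: the function name <code>refraction_angle</code> and the refractive indices 1.0 (air) and 1.5 (a generic glass) are illustrative assumptions, not values taken from the drawing.</p>

```python
import math


def refraction_angle(n1, n2, theta1_deg):
    """Return the refracted angle in degrees from n1*sin(theta1) = n2*sin(theta2).

    Returns None when no refracted ray exists (total internal reflection).
    """
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


# Illustrative values (assumptions): air to glass at 30 degrees incidence.
print(refraction_angle(1.0, 1.5, 30.0))
# Glass to air at 60 degrees incidence: no refracted ray.
print(refraction_angle(1.5, 1.0, 60.0))
```

<p>With these assumed indices, 30 degrees of incidence gives a refracted angle of about 19.47 degrees, and the second call returns <code>None</code> because the ray is totally reflected.</p>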
<p>For teaching purposes, GeoGebra can be really nice because its drawings are interactive. You can introduce variables and allow points to be moved. It is also possible to export the drawing as a Java applet to publish it on the Internet. I have done this for optical diffraction through a polycarbonate medium (i.e. a CD or DVD). Click on the picture below to open the applet.</p>
<p><a href="http://pascal.parois.net/public/cd-diffract/cd_diffract.html"><img src="http://blog.debroglie.net/wp-content/uploads/2012/04/cd-diffract.png" alt="" title="cd-diffract" width="844" height="674" class="aligncenter size-full wp-image-700" /></a></p>
<p>For an optical setup, it is difficult to say whether GeoGebra is efficient. The problem described above should not come up much because, most of the time, only reflections and transmissions are used. But the laser beam has a certain width, and I don&#8217;t know if there is an easy way to handle focusing elements.</p>
<p>In conclusion, I would say that GeoGebra or similar software is worth a look for optics. It may not be usable in every situation, but it can be of some use.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2012/04/24/using-geogebra-for-optics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>High performance structure factor calculations</title>
		<link>http://blog.debroglie.net/2012/03/09/high-perfomance-structure-factors-calculations/</link>
		<comments>http://blog.debroglie.net/2012/03/09/high-perfomance-structure-factors-calculations/#comments</comments>
		<pubDate>Fri, 09 Mar 2012 13:13:47 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Pascal's diary]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=687</guid>
		<description><![CDATA[As part of a study on the standard deviation of the electron density using Monte Carlo simulations, I needed a fast way to calculate structure factors. I achieved this goal a while ago, but I kept improving things each time I had a new idea to try. I have already talked about it a &#8230; </p><p><a class="more-link block-button" href="http://blog.debroglie.net/2012/03/09/high-perfomance-structure-factors-calculations/">Continue reading &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>As part of a study on the standard deviation of the electron density using Monte Carlo simulations, I needed a fast way to calculate structure factors. I achieved this goal a while ago, but I kept improving things each time I had a new idea to try.</p>
<p>I have already talked about it a few times:</p>
<ul>
<li><a href="http://blog.debroglie.net/2010/10/03/valgrind-and-kcachegrind-spot-bottlenecks/">How to spot bottlenecks</a></li>
<li><a href="http://blog.debroglie.net/2011/07/26/code-optimisation/">Code optimisation</a></li>
<li><a href="http://blog.debroglie.net/2011/10/25/cpu-starvation/">CPU starvation</a></li>
<li><a href="http://blog.debroglie.net/2011/10/28/loop-tiling/">Loop tiling</a></li>
<li><a href="http://blog.debroglie.net/2012/01/20/autovectorizatio/">Auto vectorization</a></li>
</ul>
<p>My algorithm is no longer memory bound and the CPU is no longer starving. Since I now calculate reflections in batches, memory access must be more efficient. This is a nice side effect of the modifications I have made. I checked it by running the program at two different CPU frequencies and with different numbers of cores. It all scales nicely. I tried up to 4 threads, the limit of my current CPU.</p>
<p>To date, I reach 60-75M reflections*atoms per second per core. On a 200-atom structure with 100k reflections, the result comes in less than 600 ms.</p>
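<p>As a quick sanity check of these figures, the expected single-core run time follows from dividing the number of reflection-atom pairs by the throughput. The helper name <code>estimated_time</code> is an assumption made for this sketch; the numbers are the ones quoted above.</p>

```python
def estimated_time(n_atoms, n_reflections, rate):
    """Seconds needed for n_atoms * n_reflections pairs at `rate` pairs per second."""
    return n_atoms * n_reflections / rate


# 200 atoms and 100k reflections at both ends of the quoted 60-75M range.
for rate in (60e6, 75e6):
    t = estimated_time(200, 100_000, rate)
    print(f"{rate / 1e6:.0f}M reflections*atoms/s: {t * 1000:.0f} ms")
```

<p>This gives roughly 333 ms and 267 ms, consistent with the "less than 600 ms" quoted above.</p>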
<p>The valgrind profile is shown below.<br />
<a href="http://blog.debroglie.net/wp-content/uploads/2012/03/libFc1.png"><img src="http://blog.debroglie.net/wp-content/uploads/2012/03/libFc1-1024x752.png" alt="" title="libFc" width="620" height="455" class="aligncenter size-large wp-image-692" /></a></p>
<p>Compared to cctbx, it&#8217;s almost 4 times faster. However, cctbx is much more versatile, which is probably why it is slower. This efficiency is only possible with optimised trigonometric and <a href="http://netlib.org/blas/">BLAS</a> functions; without them, the calculations are a bit slower than with cctbx. BLAS is only efficient on big vectors or matrices, so I had to increase the number of reflections processed per batch from 128 to 1024.</p>
<p>The interesting thing about this batch process with BLAS functions is that it should be compatible with a GPU implementation, although Fortran bindings are not widely available yet and BLAS functions might not be as efficient as on the CPU. The number of reflections processed simultaneously would probably need to be increased. In the end, it should be beneficial only for several hundred thousand reflections, a range found only in protein crystallography.</p>
<p>It turns out that this can be useful to others, so I am in the process of releasing just this part as a library to let more people play with it. This high level of performance relies on the AMD Core Math Library (ACML), both for the vectorized trigonometric function (vrsa_sincos &#8211; 64-bit only!) and for the BLAS functions (mainly SAXPY). It should be possible to use MKL as well.</p>
<p>Currently the interface is very crude and uses several arrays. Object-oriented data structures have not been implemented: the data would no longer be contiguous, which would affect the efficiency. It is difficult to say whether the result would be very different.</p>
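<p>The batching idea can be sketched with NumPy, which typically hands matrix products to a BLAS library. This is not the libFc code: it uses the simplest structure factor sum, \(F(\mathbf{h}) = \sum_j f_j \exp(2\pi i\, \mathbf{h}\cdot\mathbf{x}_j)\), with constant scattering factors, no symmetry and no displacement parameters. The names <code>structure_factors</code>, <code>hkl</code>, <code>xyz</code> and <code>f</code> are assumptions made for the illustration.</p>

```python
import numpy as np


def structure_factors(hkl, xyz, f, batch=1024):
    """Return F(h) = sum_j f_j * exp(2*pi*i * h.x_j) for every row of hkl.

    hkl: (n_refl, 3) Miller indices, xyz: (n_atoms, 3) fractional
    coordinates, f: (n_atoms,) constant scattering factors (a simplification).
    """
    hkl = np.asarray(hkl, dtype=np.float64)
    xyz = np.asarray(xyz, dtype=np.float64)
    f = np.asarray(f, dtype=np.float64)
    result = np.empty(hkl.shape[0], dtype=np.complex128)
    for start in range(0, hkl.shape[0], batch):
        block = hkl[start:start + batch]
        # One matrix product gives the phases of a whole batch of reflections.
        phase = 2.0 * np.pi * (block @ xyz.T)
        # Summing over atoms is a matrix-vector product for each component.
        result[start:start + batch] = np.cos(phase) @ f + 1j * (np.sin(phase) @ f)
    return result


# Two equal atoms at (0,0,0) and (1/2,1/2,1/2): reflections with odd h+k+l vanish.
atoms = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
F = structure_factors([[1, 0, 0], [1, 1, 0], [2, 0, 0]], atoms, [1.0, 1.0])
print(np.round(np.abs(F), 6))
```

<p>For this toy body-centred arrangement the magnitudes are 0, 2 and 2. Larger batches give the library bigger matrices to work on, at the price of a larger temporary <code>phase</code> array.</p>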
<p>The project is here:<br />
<a href="https://redmine.debroglie.net/projects/debroglie/repository/show/libFc">libFc repository</a></p>
<p>The code produces both single-precision and double-precision routines, accessible via generic interfaces. Conditional compilation makes it possible to select the vector math library and to switch to BLAS functions.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2012/03/09/high-perfomance-structure-factors-calculations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Autovectorization</title>
		<link>http://blog.debroglie.net/2012/01/20/autovectorizatio/</link>
		<comments>http://blog.debroglie.net/2012/01/20/autovectorizatio/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 23:55:33 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Pascal's diary]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=669</guid>
		<description><![CDATA[Vectorization is a nice concept in optimisation. It allows simultaneous calculations using special units in the processor, but it works only in special cases. I tried to apply it to structure factor calculations. Fortran does not allow you to write vector code as you would in C. You can only rely on autovectorization by the &#8230; </p><p><a class="more-link block-button" href="http://blog.debroglie.net/2012/01/20/autovectorizatio/">Continue reading &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>Vectorization is a nice concept in optimisation. It allows simultaneous calculations using special units in the processor, but it works only in special cases. I tried to apply it to structure factor calculations. Fortran does not allow you to write vector code as you would in C. You can only rely on autovectorization by the compiler, by exposing known patterns.</p>
<p>Currently, each thread calculates structure factors one by one. I modified the function to process them in batches, allowing the compiler to do a better job.</p>
<ol>
<li>Current version. 3 threads, structure factors calculated one by one.</li>
</ol>
<p><code><div id="wpshdo_1" class="wp-synhighlighter-outer"><div id="wpshdi_1" class="wp-synhighlighter-inner" style="display: block;"><pre class="text" style="font-family:monospace;">[pascal@vinci ediff]$ perf stat -v -d  ../../trunk-r607/edensgrid --nopeaks  xu5015_shelxl
 Calculating structure factors
100.00 % done, remaining:   0:00min, elapsed:   0:01min, rate:   13.3 us/loop
101507 reflections processed in   1.4 s (201 atoms)
 Performance counter stats for '../../trunk-r607/edensgrid --nopeaks xu5015_shelxl':
&nbsp;
       7210,750407 task-clock                #    1,664 CPUs utilized
               926 context-switches          #    0,000 M/sec
                18 CPU-migrations            #    0,000 M/sec
             7 404 page-faults               #    0,001 M/sec
    20 294 956 191 cycles                    #    2,815 GHz                     [24,85%]
      stalled-cycles-frontend
      stalled-cycles-backend
    27 473 509 707 instructions              #    1,35  insns per cycle         [37,42%]
     4 759 056 611 branches                  #  659,995 M/sec                   [37,49%]
        96 886 241 branch-misses             #    2,04% of all branches         [37,76%]
     8 209 247 003 L1-dcache-loads           # 1138,473 M/sec                   [25,22%]
       166 461 565 L1-dcache-load-misses     #    2,03% of all L1-dcache hits   [25,24%]
        88 992 738 LLC-loads                 #   12,342 M/sec                   [25,09%]
           710 262 LLC-load-misses           #    0,80% of all LL-cache hits    [24,87%]
&nbsp;
       4,333275750 seconds time elapsed</pre></div></div><br />
</code></p>
<ol start="2">
<li>New version. 3 threads, structure factors calculated in batches of 32.</li>
</ol>
<p><code><div id="wpshdo_2" class="wp-synhighlighter-outer"><div id="wpshdi_2" class="wp-synhighlighter-inner" style="display: block;"><pre class="text" style="font-family:monospace;">[pascal@vinci ediff]$ perf stat -v -d  ../../trunk/edensgrid --nopeaks  xu5015_shelxl
 Calculating structure factors
100.00 % done, remaining:   0:00min, elapsed:   0:01min, rate:   11.1 us/loop
101507 reflections processed in   1.1 s (201 atoms)
        3.6175E+07 reflections*atoms*s^-1
&nbsp;
 Performance counter stats for '../../trunk/edensgrid --nopeaks xu5015_shelxl':
&nbsp;
       6510,431679 task-clock                #    1,584 CPUs utilized
               904 context-switches          #    0,000 M/sec
                20 CPU-migrations            #    0,000 M/sec
             6 750 page-faults               #    0,001 M/sec
    18 147 024 815 cycles                    #    2,787 GHz                     [25,20%]
      stalled-cycles-frontend
      stalled-cycles-backend
    26 969 494 651 instructions              #    1,49  insns per cycle         [37,78%]
     4 751 540 537 branches                  #  729,835 M/sec                   [37,76%]
        76 868 617 branch-misses             #    1,62% of all branches         [37,75%]
     7 161 160 309 L1-dcache-loads           # 1099,952 M/sec                   [25,11%]
        36 037 316 L1-dcache-load-misses     #    0,50% of all L1-dcache hits   [25,07%]
        16 708 632 LLC-loads                 #    2,566 M/sec                   [24,88%]
           568 664 LLC-load-misses           #    3,40% of all LL-cache hits    [24,99%]
&nbsp;
       4,110610561 seconds time elapsed</pre></div></div><br />
</code></p>
<p>The result is an increase of around 15-25% in speed. L1 data cache misses dropped a lot (from 2.03% to 0.50% of loads) and more instructions are executed per cycle (1.49 instead of 1.35). The pressure is pushed back on the L2 cache. Not all portions have been vectorized as I would have expected; the output of the -ftree-vectorizer-verbose flag should give more insight.</p>
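<p>The derived figures printed by perf can be recomputed from the raw counters, which is handy when comparing runs by hand. The sketch below uses the counter values from the two listings above; the dictionary layout and the helper name <code>derived</code> are assumptions made for the illustration.</p>

```python
# Raw counters copied from the two perf listings above.
runs = {
    "one by one": {
        "cycles": 20_294_956_191,
        "instructions": 27_473_509_707,
        "l1_loads": 8_209_247_003,
        "l1_misses": 166_461_565,
    },
    "batches of 32": {
        "cycles": 18_147_024_815,
        "instructions": 26_969_494_651,
        "l1_loads": 7_161_160_309,
        "l1_misses": 36_037_316,
    },
}


def derived(counters):
    """Return (instructions per cycle, L1 data cache miss rate in percent)."""
    ipc = counters["instructions"] / counters["cycles"]
    miss_rate = 100.0 * counters["l1_misses"] / counters["l1_loads"]
    return ipc, miss_rate


for name, counters in runs.items():
    ipc, miss_rate = derived(counters)
    print(f"{name}: {ipc:.2f} insns per cycle, {miss_rate:.2f}% L1 load misses")
```

<p>This reproduces the 1.35 and 1.49 instructions per cycle and the 2.03% and 0.50% miss rates reported by perf.</p>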
<p>Best results have been obtained with batches of 16 or 32 structure factors.</p>
<p>The compiler analysis confirmed that more parts have been vectorized. However, the same code compiled without vectorization is still 10% faster than the previous version, meaning that vectorization is not the only thing going on.</p>
<p>Before:<br />
<code><div id="wpshdo_3" class="wp-synhighlighter-outer"><div id="wpshdi_3" class="wp-synhighlighter-inner" style="display: block;"><pre class="text" style="font-family:monospace;">modules/functions.f90:922: note: LOOP VECTORIZED.
modules/functions.f90:922: note: LOOP VECTORIZED.
modules/functions.f90:905: note: vectorized 2 loops in function.</pre></div></div><br />
</code></p>
<p>After:<br />
<code><div id="wpshdo_4" class="wp-synhighlighter-outer"><div id="wpshdi_4" class="wp-synhighlighter-inner" style="display: block;"><pre class="text" style="font-family:monospace;">modules/functions.f90:852: note: LOOP VECTORIZED.
modules/functions.f90:842: note: LOOP VECTORIZED.
modules/functions.f90:834: note: LOOP VECTORIZED.
modules/functions.f90:839: note: LOOP VECTORIZED.
modules/functions.f90:825: note: LOOP VECTORIZED.
modules/functions.f90:827: note: LOOP VECTORIZED.
modules/functions.f90:782: note: vectorized 6 loops in function.
&nbsp;
edensgrid.F90:226: note: LOOP VECTORIZED.
edensgrid.F90:226: note: LOOP VECTORIZED.
edensgrid.F90:207: note: vectorized 2 loops in function.</pre></div></div><br />
</code></p>
<p>I am now able to process nearly 50 million reflections*atoms per second. On a structure with about 200 atoms, that&#8217;s more than 100k reflections per second. The results I obtained are very close to those obtained by Vincent Favre-Nicolin (<a href="http://arxiv.org/pdf/1010.2641">pdf</a>), but I am using a much more complicated formula suitable for small-molecule crystallography.</p>
<p>I don&#8217;t know if anyone would need such a fast calculation but they can always get in touch with me.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2012/01/20/autovectorizatio/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Platon nightly bug</title>
		<link>http://blog.debroglie.net/2012/01/04/platon-nightly-bug/</link>
		<comments>http://blog.debroglie.net/2012/01/04/platon-nightly-bug/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 10:40:29 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Centos]]></category>
		<category><![CDATA[Fedora]]></category>
		<category><![CDATA[platon]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=666</guid>
		<description><![CDATA[A bug affected the last December release of platon-nightly, which was dated 2012 instead of 2011. I forced the update for the January release by increasing the epoch tag, so that it will be updated as usual.]]></description>
			<content:encoded><![CDATA[<p>A bug affected the last December release of platon-nightly, which was dated 2012 instead of 2011. I forced the update for the January release by increasing the epoch tag, so that it will be updated as usual.</p>
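<p>The effect of the epoch can be illustrated with a simplified model in which a package is reduced to an (epoch, version) pair and the epoch is compared first. This is only a sketch: real RPM version comparison is more involved, and the date-like version numbers below are hypothetical, not the actual platon-nightly versions.</p>

```python
def is_newer(candidate, installed):
    """Compare (epoch, version) pairs: the epoch wins, the version breaks ties."""
    return candidate > installed


december = (0, 20121215)       # hypothetical build wrongly stamped 2012
january = (0, 20120104)        # hypothetical next build with the correct date
january_fixed = (1, 20120104)  # the same build with the epoch raised

print(is_newer(january, december))        # False: the update would be skipped
print(is_newer(january_fixed, december))  # True: the raised epoch forces it
```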
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2012/01/04/platon-nightly-bug/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Social integration &#8211; comments</title>
		<link>http://blog.debroglie.net/2011/12/13/social-integration-comments/</link>
		<comments>http://blog.debroglie.net/2011/12/13/social-integration-comments/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 09:48:48 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Debroglie]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=662</guid>
		<description><![CDATA[I changed a few things in the settings of the blog. You now have to be logged in to comment. No worries: you don&#8217;t need to create an account on Debroglie; you can just log in via Facebook, Twitter, Google or LinkedIn. You will see an icon on the login page.]]></description>
			<content:encoded><![CDATA[<p>I changed a few things in the settings of the blog. You now have to be logged in to comment. No worries: you don&#8217;t need to create an account on Debroglie; you can just log in via Facebook, Twitter, Google or LinkedIn. You will see an icon on the login page.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2011/12/13/social-integration-comments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Plotting libraries</title>
		<link>http://blog.debroglie.net/2011/12/10/plotting-libraries/</link>
		<comments>http://blog.debroglie.net/2011/12/10/plotting-libraries/#comments</comments>
		<pubDate>Sat, 10 Dec 2011 12:29:47 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Pascal's diary]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=640</guid>
		<description><![CDATA[A short review and some impressions of a few graphical libraries: <a href="http://www.gnuplot.info/">gnuplot</a>, <a href="http://matplotlib.sourceforge.net/">matplotlib</a>, <a href="http://www.astro.caltech.edu/~tjp/pgplot/">pgplot</a>, <a href="http://plplot.sourceforge.net/">plplot</a>, <a href="http://www.dislin.de/">dislin</a> and <a href="http://www.r-project.org/">R</a>.]]></description>
			<content:encoded><![CDATA[<p>I edited this post on the 14th of December 2011 and added a few things to the matplotlib and plplot sections.</p>
<p>There are numerous options for plotting diagrams. Depending on the functionality you need and the language you are developing in, the choice might be difficult.</p>
<p>I have found six possible libraries: <a href="http://www.gnuplot.info/">gnuplot</a>, <a href="http://matplotlib.sourceforge.net/">matplotlib</a>, <a href="http://www.astro.caltech.edu/~tjp/pgplot/">pgplot</a>, <a href="http://plplot.sourceforge.net/">plplot</a>, <a href="http://www.dislin.de/">dislin</a> and <a href="http://www.r-project.org/">R</a>. Each of them has different requirements. Matplotlib is dedicated to Python, while plplot has numerous bindings. Their capabilities also differ: gnuplot does not seem to superimpose graphs very easily. Licensing is also important: dislin and pgplot have the most restrictive licenses, being free to use only for non-commercial applications. With each library, except R and dislin, I tried to draw a Fourier map (as an image) with a contour plot superimposed.</p>
<p>My data are stored in a 2D array. It seems that this causes trouble for some libraries.</p>
<h3>gnuplot</h3>
<p>I could not find out how to superimpose a contour plot; I only got an image. Also, I don&#8217;t know if I can apply a transformation to the coordinates in order to map the array indices to real coordinates.</p>
<p><a href="http://blog.debroglie.net/wp-content/uploads/2011/12/test.png"><img class="aligncenter size-full wp-image-641" title="test" src="http://blog.debroglie.net/wp-content/uploads/2011/12/test.png" alt="" width="420" height="320" /></a></p>
<h3>matplotlib</h3>
<p>Matplotlib can only be used in Python. The combination with Python makes it very easy to try a few things quickly. The documentation is well written, although not very consistent in its conventions.</p>
<p><a href="http://blog.debroglie.net/wp-content/uploads/2011/12/test-matplotlib1.png"><img class="aligncenter size-full wp-image-659" title="test-matplotlib" src="http://blog.debroglie.net/wp-content/uploads/2011/12/test-matplotlib1.png" alt="" width="800" height="600" /></a></p>
<div id="wpshdo_5" class="wp-synhighlighter-outer"><div id="wpshdi_5" class="wp-synhighlighter-inner" style="display: block;"><pre class="python" style="font-family:monospace;"><span class="co1">#!/bin/python</span>
&nbsp;
<span class="kw1">import</span> numpy
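<span class="kw1">import</span> matplotlib.<span class="me1">pyplot</span> <span class="kw1">as</span> plt
<span class="co1"># Needed for matplotlib.colors.LinearSegmentedColormap below</span>
<span class="kw1">import</span> matplotlib.<span class="me1">colors</span>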
<span class="kw1">from</span> pylab <span class="kw1">import</span> <span class="sy0">*</span>
&nbsp;
data=numpy.<span class="me1">loadtxt</span><span class="br0">&#40;</span><span class="st0">&quot;data.matrix&quot;</span>, dtype=float32<span class="br0">&#41;</span>
cdict = <span class="br0">&#123;</span>
<span class="st0">'red'</span>  :  <span class="br0">&#40;</span><span class="br0">&#40;</span>0., 1., 1.<span class="br0">&#41;</span>, <span class="br0">&#40;</span><span class="nu0">0.5</span>, <span class="nu0">1.0</span>, <span class="nu0">1.0</span><span class="br0">&#41;</span>, <span class="br0">&#40;</span>1., <span class="nu0">0.4</span>, <span class="nu0">0.4</span><span class="br0">&#41;</span><span class="br0">&#41;</span>,
<span class="st0">'green'</span>:  <span class="br0">&#40;</span><span class="br0">&#40;</span>0., <span class="nu0">0.4</span>, <span class="nu0">0.4</span><span class="br0">&#41;</span>, <span class="br0">&#40;</span><span class="nu0">0.5</span>, <span class="nu0">1.0</span>, <span class="nu0">1.0</span><span class="br0">&#41;</span>, <span class="br0">&#40;</span>1., 1., 1.<span class="br0">&#41;</span><span class="br0">&#41;</span>,
<span class="st0">'blue'</span> :  <span class="br0">&#40;</span><span class="br0">&#40;</span>0., <span class="nu0">0.4</span>, <span class="nu0">0.4</span><span class="br0">&#41;</span>, <span class="br0">&#40;</span><span class="nu0">0.5</span>, <span class="nu0">1.0</span>, <span class="nu0">1.0</span><span class="br0">&#41;</span>, <span class="br0">&#40;</span>1., <span class="nu0">0.4</span>, <span class="nu0">0.4</span><span class="br0">&#41;</span><span class="br0">&#41;</span>
<span class="br0">&#125;</span>
my_cmap = matplotlib.<span class="me1">colors</span>.<span class="me1">LinearSegmentedColormap</span><span class="br0">&#40;</span><span class="st0">'my_colormap'</span>, cdict, <span class="nu0">1024</span><span class="br0">&#41;</span>
&nbsp;
imshow<span class="br0">&#40;</span>data<span class="br0">&#91;</span><span class="nu0">0</span>:<span class="nu0">577</span>,:<span class="br0">&#93;</span>, cmap=my_cmap<span class="br0">&#41;</span>
contour<span class="br0">&#40;</span>data<span class="br0">&#91;</span><span class="nu0">0</span>:<span class="nu0">577</span>,:<span class="br0">&#93;</span>,  colors=<span class="st0">'#00aa00'</span>, levels=<span class="br0">&#40;</span><span class="nu0">0.1</span>,<span class="nu0">0.2</span>,<span class="nu0">0.3</span>,<span class="nu0">0.4</span>,<span class="nu0">0.5</span><span class="br0">&#41;</span><span class="br0">&#41;</span>
contour<span class="br0">&#40;</span>data<span class="br0">&#91;</span><span class="nu0">0</span>:<span class="nu0">577</span>,:<span class="br0">&#93;</span>,  colors=<span class="st0">'blue'</span>, levels=<span class="br0">&#91;</span><span class="nu0">0.0</span><span class="br0">&#93;</span><span class="br0">&#41;</span>
contour<span class="br0">&#40;</span>data<span class="br0">&#91;</span><span class="nu0">0</span>:<span class="nu0">577</span>,:<span class="br0">&#93;</span>,  colors=<span class="st0">'#aa0000'</span>, linestyles=<span class="st0">'dashed'</span>, levels=<span class="br0">&#40;</span>-<span class="nu0">0.1</span>,-<span class="nu0">0.2</span>,-<span class="nu0">0.3</span>,-<span class="nu0">0.4</span>,-<span class="nu0">0.5</span><span class="br0">&#41;</span><span class="br0">&#41;</span>
show<span class="br0">&#40;</span><span class="br0">&#41;</span></pre></div></div>
<h3>pgplot</h3>
<p>PGPLOT is used with the low-level languages Fortran and C. That is an advantage or a disadvantage depending on what you want. My program is written in Fortran, so it suits me, although it makes the programming more difficult. The two main disadvantages are the license, which I am not sure is compatible with the GPL, and the lack of updates: the last version was released in 2001. I managed to get exactly what I wanted with this library, and the documentation is really good. The only problem I have is with the layout: the resolution is limited and hard-coded, and the placement of objects is troublesome.</p>
<p><a href="http://blog.debroglie.net/wp-content/uploads/2011/12/m104i.png"><img class="aligncenter size-full wp-image-645" title="m104i" src="http://blog.debroglie.net/wp-content/uploads/2011/12/m104i.png" alt="" width="850" height="680" /></a></p>
<h3>plplot</h3>
<p>On paper, this one is my favourite. It&#8217;s LGPL, there are bindings for all kinds of languages including Fortran and Python, it is actively developed with several commits every month, and there is a large choice of output formats. I have two major problems: the documentation is really not sufficient and there are bugs (<a title="plplot bug" href="http://sourceforge.net/tracker/?func=detail&amp;aid=3450518&amp;group_id=2915&amp;atid=102915" target="_blank">contour plot bug</a>). However, with a few tricks I managed to get a working example.</p>
<p><a href="http://blog.debroglie.net/wp-content/uploads/2011/12/m104i4.png"><img src="http://blog.debroglie.net/wp-content/uploads/2011/12/m104i4.png" alt="" title="m104i-plplot" width="720" height="540" class="aligncenter size-full wp-image-664" /></a></p>
<p>There is also a <a href="http://sourceforge.net/tracker/?func=detail&#038;aid=3458200&#038;group_id=2915&#038;atid=102915">bug</a> in the linear gradient function in RGB. The workaround is to use HLS space.<br />
There is no function to draw a circle, so I used a polygon instead.</p>
<p>Although plplot looks promising, there are serious issues to fix.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2011/12/10/plotting-libraries/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Loop tiling</title>
		<link>http://blog.debroglie.net/2011/10/28/loop-tiling/</link>
		<comments>http://blog.debroglie.net/2011/10/28/loop-tiling/#comments</comments>
		<pubDate>Fri, 28 Oct 2011 08:57:21 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Pascal's diary]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=633</guid>
		<description><![CDATA[This post is the continuation of the previous one on: CPU starvation I needed to rewrite a part of the code and it was a good opportunity to change the algorithm of the variance calculation. I was using the naive implementation: the first pass is the calculation of the mean, the second pass the calculation &#8230; </p><p><a class="more-link block-button" href="http://blog.debroglie.net/2011/10/28/loop-tiling/">Continue reading &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>This post is the continuation of the previous one on: <a title="CPU starvation" href="http://blog.debroglie.net/2011/10/25/cpu-starvation/">CPU starvation</a></p>
<p>I needed to rewrite a part of the code and it was a good opportunity to change the algorithm of the variance calculation. I was using the naive implementation: the first pass calculates the mean, the second pass calculates the variance. Because I cannot store the data, they were computed twice. Yes, doing twice as many Fourier transforms is not efficient&#8230;</p>
<p>I found an <a href="http://en.wikipedia.org/wiki/Online_algorithm">online algorithm</a> that gives me both the mean and the variance in a single pass (<a href="http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance">online variance calculation</a>). The result is of course a big improvement, but the program was still suffering from heavy starvation:<br />
<div id="wpshdo_6" class="wp-synhighlighter-outer"><div id="wpshdt_6" class="wp-synhighlighter-expanded"><table border="0" width="100%"><tr><td align="left" width="80%"><a name="#codesyntax_6"></a><a id="wpshat_6" class="wp-synhighlighter-title" href="#codesyntax_6"  onClick="javascript:wpsh_toggleBlock(6)" title="Click to show/hide code block">Source code</a></td><td align="right"><a href="#codesyntax_6" onClick="javascript:wpsh_code(6)" title="Show code only"><img border="0" style="border: 0 none" src="http://blog.debroglie.net/wp-content/plugins/wp-synhighlight/themes/default/images/code.png" /></a>&nbsp;<a href="#codesyntax_6" onClick="javascript:wpsh_print(6)" title="Print code"><img border="0" style="border: 0 none" src="http://blog.debroglie.net/wp-content/plugins/wp-synhighlight/themes/default/images/printer.png" /></a>&nbsp;<a href="http://blog.debroglie.net/wp-content/plugins/wp-synhighlight/About.html" target="_blank" title="Show plugin information"><img border="0" style="border: 0 none" src="http://blog.debroglie.net/wp-content/plugins/wp-synhighlight/themes/default/images/info.gif" /></a>&nbsp;</td></tr></table></div><div id="wpshdi_6" class="wp-synhighlighter-inner" style="display: block;"><pre class="fortran" style="font-family:monospace;"><span class="kw1">if</span><span class="br0">&#40;</span>locali<span class="sy0">/=</span>1<span class="br0">&#41;</span> <span class="kw1">then</span>
    delta<span class="sy0">=</span>fftinout<span class="sy0">-</span>average<span class="sy0">/</span><span class="br0">&#40;</span>locali<span class="sy0">-</span>1<span class="br0">&#41;</span>
<span class="kw1">else</span>
    delta<span class="sy0">=</span>fftinout
<span class="kw1">end</span> <span class="kw1">if</span>
average<span class="sy0">=</span>average<span class="sy0">+</span>fftinout
sigmas <span class="sy0">=</span> sigmas <span class="sy0">+</span> delta<span class="sy0">*</span><span class="br0">&#40;</span>fftinout <span class="sy0">-</span> average<span class="sy0">/</span>locali<span class="br0">&#41;</span></pre></div></div></p>
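<p>For readers who prefer Python, here is a minimal sketch of the same one-pass update on a plain list of numbers. The function name <code>running_mean_variance</code> and the variable names <code>total</code>, <code>m2</code> and <code>count</code> are mine; they stand in for <code>average</code>, <code>sigmas</code> and <code>locali</code> in the Fortran above, and the final division by <code>count</code> (population variance) is an assumption about how sigmas is normalised later.</p>

```python
def running_mean_variance(samples):
    """One-pass mean and variance, mirroring the Fortran update above."""
    total = 0.0  # running sum, plays the role of `average`
    m2 = 0.0     # running sum of squared deviations, plays the role of `sigmas`
    count = 0    # number of samples seen so far, plays the role of `locali`
    for x in samples:
        count += 1
        # deviation from the mean of the previous samples (x itself for the first one)
        delta = x - total / (count - 1) if count != 1 else x
        total += x
        # multiplying by the deviation from the updated mean closes the recurrence
        m2 += delta * (x - total / count)
    if count == 0:
        raise ValueError("need at least one sample")
    mean = total / count
    variance = m2 / count  # assumed normalisation; use count - 1 for the sample variance
    return mean, variance
```

<p>Each sample is touched exactly once, which is why the second round of Fourier transforms is no longer needed.</p>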
<p>The problem here is that the data are reused but do not stay in cache. delta, fftinout, average and sigmas are big 3D arrays. When delta is computed, it does not fit into the cache, so fftinout is evicted to make room, even though it is used again one line below and has to be pulled back from RAM&#8230;</p>
<p>What I did was some manual tiling. Usually the compiler does it automatically, but in this case gfortran was not smart enough, even with -floop-block -floop-strip-mine. The result is that each stride stays in cache, which avoids moving data back and forth:<br />
<div id="wpshdo_7" class="wp-synhighlighter-outer"><div id="wpshdt_7" class="wp-synhighlighter-expanded"><table border="0" width="100%"><tr><td align="left" width="80%"><a name="#codesyntax_7"></a><a id="wpshat_7" class="wp-synhighlighter-title" href="#codesyntax_7"  onClick="javascript:wpsh_toggleBlock(7)" title="Click to show/hide code block">Source code</a></td><td align="right"><a href="#codesyntax_7" onClick="javascript:wpsh_code(7)" title="Show code only"><img border="0" style="border: 0 none" src="http://blog.debroglie.net/wp-content/plugins/wp-synhighlight/themes/default/images/code.png" /></a>&nbsp;<a href="#codesyntax_7" onClick="javascript:wpsh_print(7)" title="Print code"><img border="0" style="border: 0 none" src="http://blog.debroglie.net/wp-content/plugins/wp-synhighlight/themes/default/images/printer.png" /></a>&nbsp;<a href="http://blog.debroglie.net/wp-content/plugins/wp-synhighlight/About.html" target="_blank" title="Show plugin information"><img border="0" style="border: 0 none" src="http://blog.debroglie.net/wp-content/plugins/wp-synhighlight/themes/default/images/info.gif" /></a>&nbsp;</td></tr></table></div><div id="wpshdi_7" class="wp-synhighlighter-inner" style="display: block;"><pre class="fortran" style="font-family:monospace;"><span class="kw1">do</span> kk<span class="sy0">=</span>1,<span class="kw4">ubound</span><span class="br0">&#40;</span>fftinout,3<span class="br0">&#41;</span>
    <span class="kw1">do</span> jj<span class="sy0">=</span>1,<span class="kw4">ubound</span><span class="br0">&#40;</span>fftinout,2<span class="br0">&#41;</span>
        delta<span class="sy0">=</span>fftinout<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span><span class="sy0">-</span>tempfact<span class="sy0">*</span>average<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span>
        average<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span><span class="sy0">=</span>average<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span><span class="sy0">+</span>fftinout<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span>
        sigmas<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span> <span class="sy0">=</span> sigmas<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span> <span class="sy0">+</span>delta<span class="sy0">*&amp;</span>
        <span class="sy0">&amp;</span>   <span class="br0">&#40;</span>fftinout<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span> <span class="sy0">-</span> average<span class="br0">&#40;</span><span class="sy0">:</span>,jj,kk<span class="br0">&#41;</span><span class="sy0">/</span>locali<span class="br0">&#41;</span>
    <span class="kw1">end</span> <span class="kw1">do</span>
<span class="kw1">end</span> <span class="kw1">do</span></pre></div></div></p>
<p>The result is amazing: I got a 20% speedup just with this change. You just need to be smarter than the compiler. I also save a bit of memory, since delta is just a stride now.</p>
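<p>The difference in access order can be sketched in Python on flat lists. This is only an illustration of the ordering: Python lists will not show the cache effect, the function names and the <code>tile</code> parameter are hypothetical, and <code>scale</code> reflects my assumption that tempfact is 1/(locali-1), or 0 for the first sample.</p>

```python
def update_whole(fftinout, average, sigmas, locali):
    """Untiled form: build the full delta array, then sweep the arrays again."""
    scale = 1.0 / (locali - 1) if locali != 1 else 0.0  # assumed meaning of tempfact
    delta = [x - scale * a for x, a in zip(fftinout, average)]  # full-size scratch array
    for i, x in enumerate(fftinout):
        average[i] += x
    for i, x in enumerate(fftinout):
        sigmas[i] += delta[i] * (x - average[i] / locali)


def update_tiled(fftinout, average, sigmas, locali, tile=4):
    """Tiled form: finish all three steps on one tile before moving to the next."""
    scale = 1.0 / (locali - 1) if locali != 1 else 0.0  # assumed meaning of tempfact
    for start in range(0, len(fftinout), tile):
        block = range(start, min(start + tile, len(fftinout)))
        delta = [fftinout[i] - scale * average[i] for i in block]  # tile-sized scratch
        for i in block:
            average[i] += fftinout[i]
        for d, i in zip(delta, block):
            sigmas[i] += d * (fftinout[i] - average[i] / locali)
```

<p>Both functions perform the same arithmetic on every element; only the order in which memory is visited changes, along with the size of the scratch array.</p>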
<p>New result on my 2.7GHz, 4 cores computer with 4GB@800MHz memory:</p>
<table>
<tr>
<td>cores</td>
<td> real(s)</td>
<td> user(s)</td>
<td> loop(ms)</td>
</tr>
<tr>
<td>4</td>
<td> 64</td>
<td> 233</td>
<td> 19(76)</td>
</tr>
<tr>
<td>3 </td>
<td>72</td>
<td> 204</td>
<td> 21.7(65.1)</td>
</tr>
<tr>
<td>2 </td>
<td>93</td>
<td> 180</td>
<td> 28.7(57.4)</td>
</tr>
<tr>
<td>1</td>
<td>  171 </td>
<td> 171</td>
<td>  55</td>
</tr>
</table>
<p>The previous results were for half the work, as I was skipping the average. The current code, doing the full job, is now faster than the previous version doing half of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2011/10/28/loop-tiling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CPU starvation</title>
		<link>http://blog.debroglie.net/2011/10/25/cpu-starvation/</link>
		<comments>http://blog.debroglie.net/2011/10/25/cpu-starvation/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 21:35:00 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Pascal's diary]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=603</guid>
		<description><![CDATA[A few weeks ago, I discovered that one of my programs has some cpu starvation issues. A cpu is starved when it is waiting for data. The most common cases are due to high load on the disk, but it can happen that even the RAM is too slow. As I am just using a &#8230; </p><p><a class="more-link block-button" href="http://blog.debroglie.net/2011/10/25/cpu-starvation/">Continue reading &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, I discovered that one of my programs has some cpu starvation issues. A cpu is starved when it is waiting for data. The most common cases are due to high load on the disk, but it can happen that even the RAM is too slow. As I am just using a few MB on a 4GB system, the disk could not be the culprit.</p>
<p>My first hint was that the computing speed was not scaling with the number of cores used on the newest version of the program. An old version did not have this problem; I even tested it on 48 cores. But the new version is much more efficient. Here are the results of the program running on different numbers of cores.</p>
<p>Cores: number of cores used<br />
real: wall-clock time in seconds<br />
user: user cpu time in seconds, summed over all cores<br />
loop: average time to execute one cycle of a do loop in my program, in ms. In parentheses, the one-core equivalent loop time</p>
<table>
<tr>
<td>cores</td>
<td> real</td>
<td> user</td>
<td> loop</td>
</tr>
<tr>
<td>4</td>
<td> 70</td>
<td> 246</td>
<td> 22(88)</td>
</tr>
<tr>
<td>3 </td>
<td>77</td>
<td> 207</td>
<td> 24.3(72.9)</td>
</tr>
<tr>
<td>2 </td>
<td>90</td>
<td> 171</td>
<td> 28.3(56.6)</td>
</tr>
<tr>
<td>1</td>
<td>  146 </td>
<td> 146</td>
<td>  47.3</td>
</tr>
</table>
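<p>The scaling can be made explicit with a few lines of Python using the numbers from the table. The speedup and efficiency definitions are the usual ones, and the names <code>runs</code>, <code>serial_real</code> and <code>scaling</code> are mine.</p>

```python
# (cores, real seconds, user seconds, loop milliseconds), copied from the table above
runs = [(4, 70, 246, 22.0), (3, 77, 207, 24.3), (2, 90, 171, 28.3), (1, 146, 146, 47.3)]

serial_real = 146  # wall-clock time of the one-core run


def scaling(cores, real, loop_ms):
    speedup = serial_real / real     # how much faster than the one-core run
    efficiency = speedup / cores     # 1.0 would be perfect scaling
    one_core_loop = loop_ms * cores  # the value in parentheses in the table
    return speedup, efficiency, one_core_loop


for cores, real, user, loop_ms in runs:
    speedup, efficiency, one_core_loop = scaling(cores, real, loop_ms)
    print(f"{cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}, "
          f"one-core loop {one_core_loop:.1f} ms")
```

<p>On four cores the speedup is only about 2.1, an efficiency close to one half, and the one-core equivalent loop time nearly doubles from 47.3 ms to 88 ms.</p>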
<p>Of course these results alone are not enough: problems with parallel execution can have many sources. These results were obtained on my home computer, an Intel q9505 (2800MHz) cpu equipped with 667MHz DDR2 memory (dual channel).</p>
<p>I have a similar computer at work: an Intel q9400 (2666MHz) cpu equipped with 800MHz DDR2 memory (dual channel). The fact is that my code runs slower at home (i.e. with the faster cpu) than at work. This was a clear indication that my cpu was not running at full speed.</p>
<p>After searching a bit, I found two tools that would help me:</p>
<ul>
<li>perf, from perf-util (<a href="http://www.kernel.org/">http://www.kernel.org/</a>)</li>
<li>oprofile (<a href="http://oprofile.sourceforge.net/news/">http://oprofile.sourceforge.net/news/</a>)</li>
</ul>
<p>Here is the result of the command <code># perf top</code>:</p>
<p><a href="http://blog.debroglie.net/wp-content/uploads/2011/10/perftop.png"><img src="http://blog.debroglie.net/wp-content/uploads/2011/10/perftop-1024x601.png" alt="" title="perftop" width="590" height="346" class="aligncenter size-large wp-image-610" /></a></p>
<p>As already noted in previous posts (<a href="http://blog.debroglie.net/2011/07/26/code-optimisation/">http://blog.debroglie.net/2011/07/26/code-optimisation/</a>), complex exponentials are quite numerous and can be seen here as sincos. They represent 6.7% of the time. The clear_page_c function is more worrying, it is a kernel function related to the control of memory. A web search did not reveal any more information.</p>
<p>The command <code># perf stat -p 425 -v -d</code> gives more detailed information about the program execution (425 was the pid of the program).</p>
<p><a href="http://blog.debroglie.net/wp-content/uploads/2011/10/perfstat2.png"><img src="http://blog.debroglie.net/wp-content/uploads/2011/10/perfstat2-1024x437.png" alt="" title="perfstat" width="590" height="251" class="aligncenter size-large wp-image-614" /></a></p>
<p>It revealed that the LLC-load-misses rate is 9.89%. It means that about 10% of the loads that reach the last-level cache (the L2 on this cpu) miss, forcing the cpu to wait for the data to come from RAM. I also ran perf with the program running on different numbers of cores (openmp has an environment variable, OMP_NUM_THREADS, to set this). In the case above, the 4 cores of my cpu were used. The cache miss rate increased with the number of cores. The workload was exactly the same, but with different jobs running simultaneously the stress on the memory became more and more important. It would be interesting to run the same test on the same architecture but with a q9550 cpu, which has 12MB of L2 cache instead of 6MB for the q9505.</p>
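<p>Repeating the run for each number of cores can be scripted. The sketch below only sets OMP_NUM_THREADS and measures the wall-clock time; the helper name <code>time_with_threads</code> and the <code>./edensgrid</code> command line are placeholders, not the way the measurements above were actually taken.</p>

```python
import os
import subprocess
import time


def time_with_threads(command, thread_counts):
    """Run `command` once per thread count and return {threads: wall seconds}."""
    timings = {}
    for threads in thread_counts:
        # OMP_NUM_THREADS is read by the OpenMP runtime when the program starts
        env = dict(os.environ, OMP_NUM_THREADS=str(threads))
        start = time.perf_counter()
        subprocess.run(command, env=env, check=True)
        timings[threads] = time.perf_counter() - start
    return timings


# Hypothetical usage; "./edensgrid" stands in for the real binary and its arguments:
# print(time_with_threads(["./edensgrid"], [1, 2, 3, 4]))
```

<p>To collect the cache counters at the same time, the command could presumably be prefixed with <code>perf stat -d</code>.</p>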
<p>The next tool is oprofile. Similar to perf, it is based on recent kernel features for performance tracing. It comes with a nice gui where you can choose the events to monitor; I chose the &#8220;LLc_*&#8221; events. The gui is launched via <code># oprof_start gui</code>.</p>
<p><a href="http://blog.debroglie.net/wp-content/uploads/2011/10/oprofile1.png"><img src="http://blog.debroglie.net/wp-content/uploads/2011/10/oprofile1.png" alt="" title="oprofile" width="662" height="482" class="aligncenter size-full wp-image-617" /></a></p>
<p>The opreport command gives you the results, which are honestly quite difficult to interpret. But I still managed to see that two &#8220;functions&#8221; suffered cache misses: the main program edensgrid (I really should put the cpu intensive part in its own subroutine&#8230;) and libfftw3f.</p>
<p>I fed the result into kcachegrind using the command: <code>$ opreport -gdf  | op2calltree</code>. It creates a bunch of files, each of which can be loaded into kcachegrind. I loaded the one I was interested in: oprof.out.edensgrid. There is no need to look at the fftw3 one: this library is highly optimised and there is no room for improvement there. When the program is compiled with debugging symbols and the source code is available, kcachegrind links the results to the source code.</p>
<p><a href="http://blog.debroglie.net/wp-content/uploads/2011/10/kcachegrind-oprofile.png"><img src="http://blog.debroglie.net/wp-content/uploads/2011/10/kcachegrind-oprofile-1024x600.png" alt="" title="kcachegrind-oprofile" width="590" height="345" class="aligncenter size-large wp-image-620" /></a></p>
<p>The cache misses are located at <code>sigmas=sigmas+(fftinout-datadiff)**2</code> and, to a lesser extent, at the initialisation of the array, <code>fftinout=0.0_fftkind</code>, together with the inversion <code>fftinout=-fftinout</code>.</p>
<p>As for the solution, that&#8217;s another story&#8230; At the moment I have no idea. I can probably work out a solution for the fftinout assignment by tweaking the fillfftarray subroutine. I guess I can remove the zero initialisation and the inversion. The sigmas business would need a better understanding of the cpu internals to find a better way to do the calculation, if one exists.</p>
<p>A note about oprofile and perf vs valgrind:<br />
I think that valgrind can also give this kind of information but as everything is virtualised, it&#8217;s slow and I am not sure if openmp code goes well with it. oprofile and perf have very little overhead and just probe the program while it&#8217;s running at full speed without any artefacts.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2011/10/25/cpu-starvation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Platon-nightly</title>
		<link>http://blog.debroglie.net/2011/09/20/platon-nightly-2/</link>
		<comments>http://blog.debroglie.net/2011/09/20/platon-nightly-2/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 14:46:21 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Centos]]></category>
		<category><![CDATA[Fedora]]></category>
		<category><![CDATA[platon]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=594</guid>
		<description><![CDATA[The automatic build has been broken for a few days: a yum update broke a few things. It should be fixed by now.]]></description>
			<content:encoded><![CDATA[<p>The automatic build has been broken for a few days: a yum update broke a few things. It should be fixed by now.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2011/09/20/platon-nightly-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>tonto-chem update</title>
		<link>http://blog.debroglie.net/2011/09/20/tonto-chem-update/</link>
		<comments>http://blog.debroglie.net/2011/09/20/tonto-chem-update/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 14:44:25 +0000</pubDate>
		<dc:creator>Pascal</dc:creator>
				<category><![CDATA[Centos]]></category>
		<category><![CDATA[Fedora]]></category>
		<category><![CDATA[tonto-chem]]></category>

		<guid isPermaLink="false">http://blog.debroglie.net/?p=592</guid>
		<description><![CDATA[New rpms of tonto-chem for centos 6, fedora 13-15. Centos 5 is missing, as I am having some trouble with it. There are builds for the serial version and mpi versions using openmpi, mpich2 or mvapich2. All versions can be installed simultaneously and a bash script has been added to launch the software with the correct modules loaded.]]></description>
			<content:encoded><![CDATA[<p>New rpms of tonto-chem for centos 6, fedora 13-15. Centos 5 is missing, as I am having some trouble with it.</p>
<p>There are builds for the serial version and mpi versions using openmpi, mpich2 or mvapich2. All versions can be installed simultaneously and a bash script has been added to launch the software with the correct modules loaded.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.debroglie.net/2011/09/20/tonto-chem-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

