claudio canepa - prog (en): programming related blog, mostly python language, game libraries, cocos, free open software<br />
<br />
<b>Faster cocos sprites</b> (claudio canepa, 2010-11-28)<br />
<a href="http://progcc.blogspot.com/2010/11/mejorando-codigo.html" style="background-color: #eeeeee;">Spanish version</a><br />
<br />
In the last blog entry we saw that cocos 0.4.0 sprites were slower than the pyglet 1.1.4 ones. Let's see if the cocos code can be improved to close the performance gap.<br />
<br />
<b>Approaching the problem</b><br />
<br />
Python includes the cProfile module, which measures how much time is spent in each callable, so I added a few lines to the scripts to generate profile data.<br />
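<br />
A minimal sketch of what such lines can look like, written for current Python 3 rather than the Python 2.6 used in the tests. The function run_test and the file name sprites.prof are placeholders for illustration, not the names used in the actual scripts.<br />

```python
import cProfile
import os
import pstats
import tempfile


def run_test():
    # Placeholder workload standing in for the real sprite test loop.
    total = 0
    for i in range(100000):
        total += i * i
    return total


def profile_to_file(func, path):
    # Collect timing data for every callable reached from func and
    # save it in the binary format that pstats and RunSnakeRun read.
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(path)


if __name__ == "__main__":
    stats_path = os.path.join(tempfile.gettempdir(), "sprites.prof")
    profile_to_file(run_test, stats_path)
    # Print the ten entries with the largest cumulative time.
    pstats.Stats(stats_path).sort_stats("cumulative").print_stats(10)
```

The saved file can then be opened in a visualizer instead of reading the text report.<br />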
<br />
Running the scripts produces a lot of statistics; to explore these I used <a href="http://www.vrplumber.com/programming/runsnakerun/">RunSnakeRun</a>, a visualizer for cProfile stats. It represents the call tree as nested boxes, with box area proportional to time spent in the callable.<br />
<br />
Playing a bit with the visualization options, I got two images that seem interesting; combined and annotated, they look like this:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-lOWVgsnVgtsaoyv_WncI6vZnmVZJWtti1_1_vy9ou7VBKJ8MxdILTauLidfBrxZjxiuvhgTzYrgRX2hifJt-sRykuqfUlIeeCxLF3tkqd3PPeBJgzbxTF8GAPo_vxEHPOmLoJNc1pgY/s1600/combo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-lOWVgsnVgtsaoyv_WncI6vZnmVZJWtti1_1_vy9ou7VBKJ8MxdILTauLidfBrxZjxiuvhgTzYrgRX2hifJt-sRykuqfUlIeeCxLF3tkqd3PPeBJgzbxTF8GAPo_vxEHPOmLoJNc1pgY/s1600/combo.png" /></a></div><br />
<a name='more'></a><br />
What catches my attention is the area covered by lambdas and _set_position in cocosnode.py.<br />
<br />
<b>Where do the lambdas come from?</b><br />
<br />
CocosNode objects have a number of python properties, like x, y, position, rotation and scale. Using w as a generic property name, the lambdas appear because of this usage pattern:
<ul><li>define methods _get_w and _set_w</li>
<li>define the property by<br />
<script class="brush: js" type="syntaxhighlighter">
<![CDATA[
w = property(_get_w, lambda self,p:self._set_w(p))]]>
</script></li>
</ul>The reason behind that usage relates to inheritance:<br />
<ul><li>without the lambda, if a subclass redefines _set_w, the property will not use the new version unless it is explicitly rebound with w = property(...)</li>
<li>with the lambda, if a subclass redefines _set_w, the property will use the new definition</li>
</ul>So, could the overhead of the lambdas account for the slowness?<br />
Time to change the code and see what happens.<br />
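<br />
The inheritance argument above can be checked with a small standalone sketch. The classes Node and LoggingNode and the property names w_direct and w_lambda are made up for illustration; this is not the cocos code, and it is written for Python 3.<br />

```python
class Node(object):
    def __init__(self):
        self._w = 0

    def _get_w(self):
        return self._w

    def _set_w(self, value):
        self._w = value

    # Bound directly: the property keeps a reference to Node._set_w.
    w_direct = property(_get_w, _set_w)
    # Bound through a lambda: _set_w is looked up on the instance at call time.
    w_lambda = property(_get_w, lambda self, value: self._set_w(value))


class LoggingNode(Node):
    def __init__(self):
        super().__init__()
        self.changes = 0

    def _set_w(self, value):
        # Overridden setter: counts every assignment it sees.
        self.changes += 1
        self._w = value


node = LoggingNode()
node.w_direct = 1    # calls Node._set_w, the override is bypassed
node.w_lambda = 2    # calls LoggingNode._set_w
print(node.changes)  # prints 1
```

The price of the lambda version is one extra Python function call per assignment, which is the overhead in question.<br />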
<br />
<b>First try</b><br />
<br />
After branching the repo I began to eliminate one lambda or another; some changes gave slightly better performance, others gave bigger gains but the balls stopped moving.<br />
The amount of code is not large, but there are too many indirections to follow easily.<br />
<br />
To better understand what should be happening, I rewrote the cocos Sprite methods, eliminating the indirections and the code duplication.<br />
The test script works fine, and performance is on par with pyglet sprites.<br />
<br />
<b>Second try</b><br />
<br />
The first solution is fast but dirty: with the code consolidation we lose the separation of concerns between classes; any change in the base classes would have to be manually copied to the Sprite class.<br />
<br />
From the different stages of the previous experiment, it seems that the performance loss coming from lambda use is small, and that the problem may come from a double call to some not-so-light method.<br />
What is needed here is to know exactly what methods are called.<br />
<br />
<b>sys.settrace to the rescue</b><br />
<br />
Calling sys.settrace(my_spy_function) in a script instructs the python interpreter to call my_spy_function whenever certain events happen, like 'call', 'return' or 'line'; it also passes a frame object with info about the current execution state.<br />
<br />
By making my_spy_function filter events and synthesize some useful information, we can get a clean picture of the call tree, restricted to the calls of interest.<br />
<br />
This <a href="http://www.dalkescientific.com/writings/diary/archive/2005/04/20/tracing_python_code.html">article</a> helped a lot in writing my custom spy function.<br />
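<br />
A rough sketch of such a spy function, written for Python 3. It is not the function used for the investigation: the helper trace_calls and the Ball class, with its deliberately redundant _update call, are hypothetical stand-ins that only show how a filtered trace exposes a double call.<br />

```python
import sys


def trace_calls(func, names_of_interest):
    """Run func() and return (depth, name) for each call to a watched function."""
    calls = []
    depth = 0

    def local_spy(frame, event, arg):
        # Local trace function: receives the 'line', 'return' and
        # 'exception' events of a watched frame.
        nonlocal depth
        if event == "return":
            depth -= 1
        return local_spy

    def my_spy_function(frame, event, arg):
        # Global trace function: the interpreter calls it with a 'call'
        # event each time a new frame is entered.
        nonlocal depth
        name = frame.f_code.co_name
        if event != "call" or name not in names_of_interest:
            return None  # no local tracing for frames we do not care about
        calls.append((depth, name))
        depth += 1
        return local_spy

    previous = sys.gettrace()
    sys.settrace(my_spy_function)
    try:
        func()
    finally:
        sys.settrace(previous)
    return calls


# Hypothetical stand-in for a node class; not the cocos code.
class Ball:
    def _set_x(self, x):
        self._update()

    def _set_position(self, pos):
        self._set_x(pos[0])
        self._update()  # redundant: _set_x already called _update

    def _update(self):
        pass


watched = {"_set_position", "_set_x", "_update"}
for depth, name in trace_calls(lambda: Ball()._set_position((1, 2)), watched):
    print("    " * depth + name)
```

The indented output lists _update twice under a single _set_position, which is the kind of pattern to look for.<br />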
<br />
Spying with settrace confirms that a pair of suspicious lines produce a double call; when they are corrected the balls stop moving, but the trace tells which method is now missing a call. From there it is easy to fix.<br />
<br />
Measuring the performance gives:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk7crlrYsp4QrPbOJYsK7PeKBINwd9r6u20axBo3oms9cOhX6lUCWeDkKtzSc0C86IFFCDaZ5VOVzZrg5_M5ilGmM1jgEqvgtZD65C-8Qt6crFF_5xSDFxPyFai-0xMLDMAPaoYO4dDSo/s1600/spriterel.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk7crlrYsp4QrPbOJYsK7PeKBINwd9r6u20axBo3oms9cOhX6lUCWeDkKtzSc0C86IFFCDaZ5VOVzZrg5_M5ilGmM1jgEqvgtZD65C-8Qt6crFF_5xSDFxPyFai-0xMLDMAPaoYO4dDSo/s1600/spriterel.png" /></a></div><br />
The clean solution reaches 90% of pyglet performance, clearly better than the 60% of cocos 0.4.0.<br />
<br />
<b>sprite performance in cocos and pyglet</b> (claudio canepa, 2010-11-11)<br />
<a href="http://progcc.blogspot.com/2010/11/performance-de-sprites-con-cocos-y.html" style="background-color: #eeeeee;">Spanish version</a><br />
<br />
How many sprites can I show at an acceptable frame rate (frames per second, fps)?<br />
What can I do to get better performance?<br />
I will set up a test situation, measure and see what can be learned.<br />
<br />
<b>Test situation, in pseudocode:</b><br />
<ul><li>initial state: add n balls with nearly random position and velocity; the initial position is inside the screen rectangle</li>
<li>update(dt): at each frame, each ball updates position with the classical<br />
<script class="brush: py" type="syntaxhighlighter">
<![CDATA[
pos = pos + dt*vel
]]>
</script>if the ball touches the border, it bounces so as to not go off-screen.</li>
</ul><div style="text-align: center;"><b>screenshot</b></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga-nXA355h8TNTS2CKpV7-QKHNfByOz8D9mrXiEmk8jI2sYULwWOYYVTRPtdlqGnmH5dOucbd-5qPUUsh2ErXIv282rFUOXOvCh9w-84Yn_GVBFV2DsR9fgfqciWXEPvoGOQ6xbw0tpl8/s320/screenshot.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEga-nXA355h8TNTS2CKpV7-QKHNfByOz8D9mrXiEmk8jI2sYULwWOYYVTRPtdlqGnmH5dOucbd-5qPUUsh2ErXIv282rFUOXOvCh9w-84Yn_GVBFV2DsR9fgfqciWXEPvoGOQ6xbw0tpl8/s320/screenshot.jpg" /></a></div><ul></ul><br />
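The pseudocode above can be fleshed out as follows, in Python 3 and without any graphics library. The Ball class, the make_balls helper, the default radius and speed, and the exact bounce rule (clamp to the border and point the velocity back inside) are assumptions made for this sketch; the downloadable test code may handle these details differently.<br />

```python
import random


class Ball:
    def __init__(self, pos, vel, radius):
        self.pos = list(pos)
        self.vel = list(vel)
        self.radius = radius

    def update(self, dt, width, height):
        limits = (width, height)
        for axis in (0, 1):
            # classical step: pos = pos + dt*vel
            self.pos[axis] += dt * self.vel[axis]
            low = self.radius
            high = limits[axis] - self.radius
            if self.pos[axis] < low:
                # crossed the low border: clamp and head back inside
                self.pos[axis] = low
                self.vel[axis] = abs(self.vel[axis])
            elif self.pos[axis] > high:
                # crossed the high border: clamp and head back inside
                self.pos[axis] = high
                self.vel[axis] = -abs(self.vel[axis])


def make_balls(n, width, height, radius=16, max_speed=200.0, seed=None):
    # Random positions inside the screen rectangle, random velocities.
    rng = random.Random(seed)
    balls = []
    for _ in range(n):
        pos = (rng.uniform(radius, width - radius),
               rng.uniform(radius, height - radius))
        vel = (rng.uniform(-max_speed, max_speed),
               rng.uniform(-max_speed, max_speed))
        balls.append(Ball(pos, vel, radius))
    return balls
```

In the real test each ball would also own a sprite whose position is set from pos after every update.<br />
<br />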
<b>First implementation:</b><br />
<br />
The simplest possible: the scene holds a list of sprites, and to draw the scene the list is iterated, drawing each sprite with its own draw method.<br />
After coding and running it I got this data:
<script class="brush: py" type="syntaxhighlighter">
<![CDATA[
sprite_num = [ 1000, 500, 250, 100, 50]
# performance in frames per second, fps
cocos_no_batch = [ 2.54, 5.77, 12.10, 29.70, 56.70]
pyglet_no_batch = [ 2.95, 6.33, 13.21, 32.50, 62.10]
]]>
</script><br />
Using the interactive interpreter <a href="http://dreampie.sourceforge.net/">Dreampie</a> and <a href="http://matplotlib.sourceforge.net/">matplotlib</a> to plot the data:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6TMsxboHzc8cqciQDfugSisadW4HUMT1BAVw_f4iN-HPv43JInwoa4vVpEberXzTckssgfRNijk6VBWhYQd00dTDJX1rKRZRsuTWna9keNLux3OqxmUcqzI54gzRgk4xThbbCrNJYetw/s320/figure1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6TMsxboHzc8cqciQDfugSisadW4HUMT1BAVw_f4iN-HPv43JInwoa4vVpEberXzTckssgfRNijk6VBWhYQd00dTDJX1rKRZRsuTWna9keNLux3OqxmUcqzI54gzRgk4xThbbCrNJYetw/s320/figure1.png" /></a></div><br />
<b>Ouchhh! Do we have a problem?</b><br />
<br />
As a rough guide, visuals are smooth at 60 fps. Playability varies: fast-paced games, or those that need very precise coordination, may need 60 fps, while others can be acceptable at 20 fps.<br />
So, say 30 fps is our minimum acceptable fps, and 60 is our fancy desired fps.<br />
Looking at the test data, we should limit the quantity of sprites in our game to<br />
<ul><li>about 100 sprites to get the minimum acceptable fps</li>
<li>about 50 sprites to get the fancy desired fps</li>
</ul>That looks scarce, right?<br />
The tutorials and mailing lists for cocos and pyglet say that using batches gives better performance, so let's try that.<br />
<br />
<b>Second implementation:</b><br />
<br />
Here we essentially add the sprites to a batch object, and the scene is drawn by the batch's draw method.<br />
Coding and running the new implementation we get the data:<br />
<script class="brush: py" type="syntaxhighlighter">
<![CDATA[
sprite_num = [ 1000, 500, 250, 100, 50]
# performance frames per second, fps
pyglet_batched = [30.40, 57.00, 104.24, 203.00, 299.80]
cocos_batched = [17.66, 33.04, 63.00, 131.50, 207.70]
]]>
</script><br />
Plotting fps vs sprite quantity as before:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPXUGPPt6wv8bpLaz2wXTIvQYXEwyFR6hRdDa-EJv_TYe1tnghx271edXVPrEOSEeAfhvq_8TgWQkX6PJErIY4diH7MOpTeGDU_gTrc37sZ7QM11EL4anY0jEVMwIGqn2txl9xf7Kaz7w/s320/figure2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPXUGPPt6wv8bpLaz2wXTIvQYXEwyFR6hRdDa-EJv_TYe1tnghx271edXVPrEOSEeAfhvq_8TgWQkX6PJErIY4diH7MOpTeGDU_gTrc37sZ7QM11EL4anY0jEVMwIGqn2txl9xf7Kaz7w/s320/figure2.png" /></a></div>Much better:<br />
<ul><li>With pyglet, for minimum acceptable fps we can use up to 1000 sprites; for fanciness up to about 500 (57 fps measured, just short of 60).</li>
<li>With cocos, for minimum acceptable fps we can use up to 500 sprites; for fanciness up to 250.</li>
</ul><br />
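The gap between the two libraries can be put in numbers by dividing the measured fps values. The snippet below only reuses the batched data listed above; the variable names ratios and extra_ms are introduced here for illustration.<br />

```python
sprite_num = [1000, 500, 250, 100, 50]
# performance in frames per second, fps
pyglet_batched = [30.40, 57.00, 104.24, 203.00, 299.80]
cocos_batched = [17.66, 33.04, 63.00, 131.50, 207.70]

ratios = []
for n, cocos_fps, pyglet_fps in zip(sprite_num, cocos_batched, pyglet_batched):
    ratio = cocos_fps / pyglet_fps
    ratios.append(ratio)
    # Difference in time per frame, in milliseconds.
    extra_ms = 1000.0 / cocos_fps - 1000.0 / pyglet_fps
    print("%5d sprites: cocos/pyglet = %.2f, extra frame time = %.1f ms"
          % (n, ratio, extra_ms))
```

With these measurements the cocos/pyglet ratio goes from about 0.58 at 1000 sprites to about 0.69 at 50 sprites.<br />
<br />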
<b>On hold!</b><br />
<br />
cocos sprites are a subclass of pyglet sprites, so why do they perform poorly relative to pyglet?<br />
Being a cocos committer makes me itch to investigate the issue, and hopefully fix it.<br />
<br />
So I will put this blog entry on hold and follow up with one about the process of investigating and fixing the issue.<br />
<br />
There are more interesting things to test and tell about sprites and batches, and I will revisit them later.<br />
<br />
<b>Test conditions:</b><br />
<ul><li>cocos 0.4.0 release , pyglet 1.1.4 release</li>
<li>windows xp , python 2.6.5</li>
<li>amd athlon 64 x2 5200, integrated gpu ati 3200</li>
</ul><br />
Code and media used in this test can be downloaded <a href="http://sites.google.com/site/xcocos/files/sprite_performance_1.zip?attredirects=0&d=1">here</a>.