Got some things done.
Wrapped up the optimization of text rendering
I continued to formalize the rendering of text. The initial DX rendering via DrawPrimitiveUP() has been changed to a DrawIndexedPrimitive() call. Since the index buffer is the same for every line of text, this reduces the vertex writes from six to four per character.
The idea of the rendering is to fill a buffer with vertices computed from the font metrics and then draw the whole text in one draw call. Because the letters are spread out all over the font texture it is not possible to use anything but triangle lists.
A triangle list requires six vertices per quad (one quad per character), two of which are duplicates. With a constant index buffer holding six indices per character, sized for the longest text, the two duplicates no longer have to be written for each new text to draw; only the four unique vertices per character are updated.
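As a rough illustration of the constant index buffer, here is a minimal sketch of how the indices could be generated once up front. The function name buildQuadIndices, the corner order (top-left, top-right, bottom-left, bottom-right) and the use of 16-bit indices are my assumptions for the example, not taken from the engine code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the indices for `maxChars` quads drawn as one triangle list.
// Assumed vertex layout per character (hypothetical, pick your own):
//   base+0 top-left, base+1 top-right, base+2 bottom-left, base+3 bottom-right
// With 16-bit indices this supports at most 16384 characters (65536 / 4).
std::vector<std::uint16_t> buildQuadIndices(std::size_t maxChars)
{
    std::vector<std::uint16_t> indices;
    indices.reserve(maxChars * 6);
    for (std::size_t i = 0; i < maxChars; ++i) {
        const std::size_t base = i * 4;
        // First triangle: top-left, top-right, bottom-left.
        indices.push_back(static_cast<std::uint16_t>(base + 0));
        indices.push_back(static_cast<std::uint16_t>(base + 1));
        indices.push_back(static_cast<std::uint16_t>(base + 2));
        // Second triangle reuses vertices 1 and 2 instead of duplicating them.
        indices.push_back(static_cast<std::uint16_t>(base + 2));
        indices.push_back(static_cast<std::uint16_t>(base + 1));
        indices.push_back(static_cast<std::uint16_t>(base + 3));
    }
    return indices;
}
```

The result only depends on the maximum text length, so it can be uploaded to the index buffer once and reused for every text; per draw, only four vertices per character need to be written.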
With the in-game debug overlay turned on the text drawing still ate ~45% of the CPU time, most of it spent in the DX runtime. I guess this is not unexpected because with the profiler tree being drawn each frame we are talking over 12000 characters.
In the end I decided to remove the profiler output from the debug overlay and added the option to dump it to a report file instead. It had outgrown its usefulness anyway as the tree didn't fit on screen on my lil' netbook.
Better looking text
So with the core drawing of text wrapped up and tested, I have begun adding more variants of fonts. Right now there is a small-sized monospace system font for debug info and a medium interface font for user interaction.
The screens show the new font in action. I used GIMP to add a border and some tones. Very easy once I had the core image from BMFont.
The gameplay screen only uses the new font in the kill-count panel (and it is too big to fit, I know.)
Some improvements to Shiny Profiler
For some time I have been tweaking Shiny here and there to suit my needs, for example sorting the output by self time to make it easier to see where most of the time is spent.
My latest modification computes a span of the minimum and maximum time spent in a node, alongside the average that is the default output.
The original routine smooths the mean with a weighted running average (effectively an exponential moving average rather than a true geometric mean), so the mean saturates towards the current value over several iterations, at a rate set by a weight. My addition checks whether the current node timing falls outside the span of min and max and, if so, expands the span to include it. If it falls inside the span, the min and max instead saturate towards the current value, at a rate ten times slower than the saturation of the mean.
This way the min/max span slowly shrinks when too large and ends up as a good measurement of how the node time has varied over the past frames.
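To make the update rule concrete, here is a minimal sketch of how I read it. The struct SpanStat, its member names and the damping parameter are hypothetical and not Shiny's actual data structures; "ten times slower" is interpreted here as using one tenth of the mean's step weight.

```cpp
// Tracks a smoothed mean plus a slowly shrinking min/max span.
// All names here are illustrative assumptions, not Shiny's API.
struct SpanStat {
    float avg = 0.0f;
    float lo = 0.0f;   // lower end of the span
    float hi = 0.0f;   // upper end of the span
    bool primed = false;

    // `damping` in [0, 1): higher values smooth the mean more heavily.
    void update(float current, float damping)
    {
        if (!primed) {
            // First sample: everything starts at the current value.
            avg = lo = hi = current;
            primed = true;
            return;
        }
        // The mean saturates towards the current value.
        const float weight = 1.0f - damping;
        avg += (current - avg) * weight;

        // The span ends use a ten times smaller step (assumed reading).
        const float slow = weight / 10.0f;
        if (current < lo) lo = current;          // outside: expand at once
        else lo += (current - lo) * slow;        // inside: creep upwards
        if (current > hi) hi = current;          // outside: expand at once
        else hi += (current - hi) * slow;        // inside: creep downwards
    }
};
```

Because each end only ever moves towards the current value when the sample is inside the span, lo never overtakes hi, and a single spike widens the span immediately but takes many frames to fade out again.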
The report now looks like this:
flat profile                                          hits (min-max)       self time (min-max)                         total time (min-max)
                                                        0.0 (  0.0-  0.0)   19 ms ( 18 ms- 19 ms) 1665% (1628%-1671%)   20 ms ( 20 ms- 20 ms) 1765% (1723%-1805%)
cn::Application::doFrame                                1.0 (  1.0-  1.0)   78 us ( 44 us-465 us)    7% (   4%-  41%)    1 ms (  1 ms-  2 ms)  100% (  95%- 137%)
cn::SceneManager::updateByFrame                         1.0 (  1.0-  1.0)  523 ns (  0 ns-  2 us)    0% (   0%-   0%)  194 us (183 us-207 us)   17% (  16%-  18%)
cn::SceneManager::doDrawCalls                           1.0 (  1.0-  1.0)  177 ns ( 38 ns-469 ns)    0% (   0%-   0%)   47 us ( 46 us- 52 us)    4% (   4%-   5%)
cn::D3D9Device::addSprite                              51.0 ( 50.7- 52.3)   19 us ( 16 us- 21 us)    2% (   1%-   2%)   19 us ( 16 us- 21 us)    2% (   1%-   2%)
cn::D3D9Device::addText                                 4.0 (  4.0-  4.0)    2 us (  2 us-  3 us)    0% (   0%-   0%)    2 us (  2 us-  3 us)    0% (   0%-   0%)
cn::D3D9Device::commit                                  1.0 (  1.0-  1.0)   68 us ( 64 us- 75 us)    6% (   6%-   7%)  656 us (643 us-674 us)   58% (  57%-  60%)
cn::D3D9Device::drawSprite                             51.0 ( 50.7- 52.3)   10 us (  9 us- 12 us)    1% (   1%-   1%)  103 us ( 98 us-108 us)    9% (   9%-  10%)
cn::D3D9Device::getResource                            55.0 ( 54.7- 56.3)   16 us ( 14 us- 18 us)    1% (   1%-   2%)   16 us ( 14 us- 18 us)    1% (   1%-   2%)
cn::D3D9Device::drawSprite                             51.0 ( 50.7- 52.3)   76 us ( 74 us- 80 us)    7% (   7%-   7%)   78 us ( 75 us- 83 us)    7% (   7%-   7%)
cn::D3D9SpriteResource::Load                            4.0 (  4.0-  4.0)  138 ns ( 38 ns-523 ns)    0% (   0%-   0%)  138 ns ( 38 ns-523 ns)    0% (   0%-   0%)
cn::D3D9Device::setBlendMode                          124.8 (124.2-126.9)    6 us (  4 us-  8 us)    1% (   0%-   1%)    6 us (  4 us-  8 us)    1% (   0%-   1%)
cn::D3D9Device::drawText                                4.0 (  4.0-  4.0)  309 us (299 us-324 us)   27% (  26%-  29%)  352 us (340 us-369 us)   31% (  30%-  33%)
cn::BitmapFont::prepareVertices                         4.0 (  4.0-  4.0)   42 us ( 40 us- 44 us)    4% (   4%-   4%)   42 us ( 40 us- 44 us)    4% (   4%-   4%)
cn::EntityData::set                                     0.4 (  0.0-  1.5)  257 ns (  0 ns-994 ns)    0% (   0%-   0%)  257 ns (  0 ns-994 ns)    0% (   0%-   0%)
cn::EntityData::set                                    36.6 ( 36.4- 37.9)   20 us ( 19 us- 21 us)    2% (   2%-   2%)   20 us ( 19 us- 21 us)    2% (   2%-   2%)
cn::GameplayScene::updateByFrame                        1.0 (  1.0-  1.0)   30 us ( 22 us- 34 us)    3% (   2%-   3%)  193 us (179 us-217 us)   17% (  16%-  19%)
cn::ComponentManager<class cn::Gun>::updateByFrame      1.0 (  1.0-  1.0)    2 us (427 ns-  9 us)    0% (   0%-   1%)    2 us (427 ns- 11 us)    0% (   0%-   1%)
cn::ComponentManager<class cn::Obstacle>::updateByFra   1.0 (  1.0-  1.0)    7 us (  6 us- 10 us)    1% (   0%-   1%)   59 us ( 55 us- 86 us)    5% (   5%-   8%)
cn::ObstacleGroup::testCollisions                       5.0 (  5.0-  5.0)    5 us (  4 us-  8 us)    0% (   0%-   1%)   52 us ( 49 us- 79 us)    5% (   4%-   7%)
cn::CircleCollider::apply                              35.6 ( 35.0- 54.0)   34 us ( 33 us- 53 us)    3% (   3%-   5%)   48 us ( 44 us- 76 us)    4% (   4%-   7%)
cn::EntityData::get                                   107.5 (106.3-145.0)   20 us ( 18 us- 29 us)    2% (   2%-   3%)   20 us ( 18 us- 29 us)    2% (   2%-   3%)
cn::ComponentManager<class cn::Vehicle>::updateByFram   1.0 (  1.0-  1.0)   20 us ( 19 us- 21 us)    2% (   2%-   2%)   99 us ( 98 us-103 us)    9% (   9%-   9%)
cn::Vehicle::updateByFrame                             36.4 ( 36.2- 37.2)   39 us ( 36 us- 40 us)    3% (   3%-   4%)   80 us ( 75 us- 83 us)    7% (   7%-   7%)
cn::Seek::getVelocity                                   5.0 (  5.0-  5.0)  946 ns (427 ns-  1 us)    0% (   0%-   0%)  946 ns (427 ns-  1 us)    0% (   0%-   0%)
cn::Align::getOmega                                     1.0 (  1.0-  1.0)  373 ns ( 38 ns-427 ns)    0% (   0%-   0%)  373 ns ( 38 ns-427 ns)    0% (   0%-   0%)
cn::Facing::getOmega                                   13.0 ( 13.0- 13.0)    3 us (  3 us-  4 us)    0% (   0%-   0%)    3 us (  3 us-  4 us)    0% (   0%-   0%)
cn::Separation::getVelocity                             4.0 (  4.0-  4.0)    7 us (  7 us-  7 us)    1% (   1%-   1%)    7 us (  7 us-  7 us)    1% (   1%-   1%)
cn::Cohesion::getVelocity                               4.0 (  4.0-  4.0)    3 us (  3 us-  3 us)    0% (   0%-   0%)    3 us (  3 us-  3 us)    0% (   0%-   0%)
cn::ComponentManager<class cn::Emitter>::updateByFram   1.0 (  1.0-  1.0)    3 us (  2 us-  4 us)    0% (   0%-   0%)    3 us (  2 us-  4 us)    0% (   0%-   0%)
cn::GameplayScene::doDrawCalls                          1.0 (  1.0-  1.0)    4 us (  3 us-  5 us)    0% (   0%-   0%)   47 us ( 45 us- 52 us)    4% (   4%-   5%)
cn::D3D9Device::addLine                                 1.0 (  1.0-  1.0)    8 us (  8 us- 10 us)    1% (   1%-   1%)    8 us (  8 us- 10 us)    1% (   1%-   1%)
cn::ComponentManager<class cn::Sprite>::doDrawCalls     1.0 (  1.0-  1.0)   14 us ( 13 us- 16 us)    1% (   1%-   1%)   30 us ( 26 us- 33 us)    3% (   2%-   3%)
cn::ComponentManager<class cn::Emitter>::doDrawCalls    1.0 (  1.0-  1.0)  197 ns (  0 ns-322 ns)    0% (   0%-   0%)  751 ns (  0 ns-  1 us)    0% (   0%-   0%)
cn::ComponentManager<class cn::HudElement>::doDrawCal   1.0 (  1.0-  1.0)    2 us (  1 us-  2 us)    0% (   0%-   0%)    3 us (  2 us-  4 us)    0% (   0%-   0%)
cn::D3D9Device::drawLine                               73.8 ( 73.5- 75.0)  129 us (128 us-132 us)   11% (  11%-  12%)  133 us (131 us-137 us)   12% (  12%-  12%)
cn::SceneManager::doDebugDrawCalls                      1.0 (  1.0-  1.0)   41 us ( 39 us- 45 us)    4% (   3%-   4%)  157 us (153 us-163 us)   14% (  14%-  14%)
cn::Vehicle::doDebugDrawCalls                          36.4 ( 36.2- 37.0)   54 us ( 53 us- 58 us)    5% (   5%-   5%)  115 us (112 us-121 us)   10% (  10%-  11%)
data_fetch                                             36.4 ( 36.2- 37.0)    6 us (  6 us-  7 us)    1% (   1%-   1%)   14 us ( 12 us- 16 us)    1% (   1%-   1%)
cn::EntityData::get                                    36.4 ( 36.2- 37.0)    7 us (  6 us-  8 us)    1% (   1%-   1%)    7 us (  6 us-  8 us)    1% (   1%-   1%)
cn::D3D9Device::addLine                                72.8 ( 72.5- 74.0)   47 us ( 45 us- 49 us)    4% (   4%-   4%)   47 us ( 45 us- 49 us)    4% (   4%-   4%)

call tree                                             hits (min-max)       self time (min-max)                         total time (min-max)
                                                        0.0 (  0.0-  0.0)   19 ms ( 18 ms- 19 ms) 1665% (1628%-1671%)   20 ms ( 20 ms- 20 ms) 1765% (1723%-1805%)
cn::Application::doFrame                                1.0 (  1.0-  1.0)   78 us ( 44 us-465 us)    7% (   4%-  41%)    1 ms (  1 ms-  2 ms)  100% (  95%- 137%)
cn::SceneManager::updateByFrame                         1.0 (  1.0-  1.0)  523 ns (  0 ns-  2 us)    0% (   0%-   0%)  194 us (183 us-207 us)   17% (  16%-  18%)
cn::GameplayScene::updateByFrame                        1.0 (  1.0-  1.0)   30 us ( 22 us- 34 us)    3% (   2%-   3%)  193 us (179 us-217 us)   17% (  16%-  19%)
cn::EntityData::set                                     0.2 (  0.0-  1.1)  160 ns (  0 ns-969 ns)    0% (   0%-   0%)  160 ns (  0 ns-969 ns)    0% (   0%-   0%)
cn::EntityData::set                                     0.1 (  0.0-  0.6)   64 ns (  0 ns-272 ns)    0% (   0%-   0%)   64 ns (  0 ns-272 ns)    0% (   0%-   0%)
cn::ComponentManager<class cn::Gun>::updateBy           1.0 (  1.0-  1.0)    2 us (427 ns-  9 us)    0% (   0%-   1%)    2 us (427 ns- 11 us)    0% (   0%-   1%)
cn::EntityData::set                                     0.2 (  0.0-  1.4)   97 ns (  0 ns-907 ns)    0% (   0%-   0%)   97 ns (  0 ns-907 ns)    0% (   0%-   0%)
cn::EntityData::set                                     0.1 (  0.0-  0.7)   51 ns (  0 ns-454 ns)    0% (   0%-   0%)   51 ns (  0 ns-454 ns)    0% (   0%-   0%)
cn::ComponentManager<class cn::Obstacle>::upd           1.0 (  1.0-  1.0)    7 us (  6 us- 10 us)    1% (   0%-   1%)   59 us ( 55 us- 86 us)    5% (   5%-   8%)
cn::ObstacleGroup::testCollisions                       5.0 (  5.0-  5.0)    5 us (  4 us-  8 us)    0% (   0%-   1%)   52 us ( 49 us- 79 us)    5% (   4%-   7%)
cn::CircleCollider::apply                              35.6 ( 35.0- 54.0)   34 us ( 33 us- 53 us)    3% (   3%-   5%)   48 us ( 44 us- 76 us)    4% (   4%-   7%)
cn::EntityData::get                                    71.1 ( 70.1-108.0)   14 us ( 12 us- 22 us)    1% (   1%-   2%)   14 us ( 12 us- 22 us)    1% (   1%-   2%)
cn::ComponentManager<class cn::Vehicle>::upda           1.0 (  1.0-  1.0)   20 us ( 19 us- 21 us)    2% (   2%-   2%)   99 us ( 98 us-103 us)    9% (   9%-   9%)
cn::Vehicle::updateByFrame                             36.4 ( 36.2- 37.2)   39 us ( 36 us- 40 us)    3% (   3%-   4%)   80 us ( 75 us- 83 us)    7% (   7%-   7%)
cn::Seek::getVelocity                                   5.0 (  5.0-  5.0)  946 ns (427 ns-  1 us)    0% (   0%-   0%)  946 ns (427 ns-  1 us)    0% (   0%-   0%)
cn::Align::getOmega                                     1.0 (  1.0-  1.0)  373 ns ( 38 ns-427 ns)    0% (   0%-   0%)  373 ns ( 38 ns-427 ns)    0% (   0%-   0%)
cn::EntityData::set                                    36.4 ( 36.2- 37.2)   20 us ( 19 us- 21 us)    2% (   2%-   2%)   20 us ( 19 us- 21 us)    2% (   2%-   2%)
cn::EntityData::get                                    36.4 ( 36.2- 37.2)    7 us (  6 us-  8 us)    1% (   0%-   1%)    7 us (  6 us-  8 us)    1% (   0%-   1%)
cn::Facing::getOmega                                   13.0 ( 13.0- 13.0)    3 us (  3 us-  4 us)    0% (   0%-   0%)    3 us (  3 us-  4 us)    0% (   0%-   0%)
cn::Separation::getVelocity                             4.0 (  4.0-  4.0)    7 us (  7 us-  7 us)    1% (   1%-   1%)    7 us (  7 us-  7 us)    1% (   1%-   1%)
cn::Cohesion::getVelocity                               4.0 (  4.0-  4.0)    3 us (  3 us-  3 us)    0% (   0%-   0%)    3 us (  3 us-  3 us)    0% (   0%-   0%)
cn::ComponentManager<class cn::Emitter>::upda           1.0 (  1.0-  1.0)    3 us (  2 us-  4 us)    0% (   0%-   0%)    3 us (  2 us-  4 us)    0% (   0%-   0%)
cn::SceneManager::doDrawCalls                           1.0 (  1.0-  1.0)  177 ns ( 38 ns-469 ns)    0% (   0%-   0%)   47 us ( 46 us- 52 us)    4% (   4%-   5%)
cn::D3D9Device::addSprite                               0.0 (  0.0-  0.0)    0 ns (  0 ns-  0 ns)    0% (   0%-   0%)    0 ns (  0 ns-  0 ns)    0% (   0%-   0%)
cn::D3D9Device::addText                                 0.0 (  0.0-  0.0)    0 ns (  0 ns-  0 ns)    0% (   0%-   0%)    0 ns (  0 ns-  0 ns)    0% (   0%-   0%)
cn::GameplayScene::doDrawCalls                          1.0 (  1.0-  1.0)    4 us (  3 us-  5 us)    0% (   0%-   0%)   47 us ( 45 us- 52 us)    4% (   4%-   5%)
cn::D3D9Device::addLine                                 1.0 (  1.0-  1.0)    8 us (  8 us- 10 us)    1% (   1%-   1%)    8 us (  8 us- 10 us)    1% (   1%-   1%)
cn::ComponentManager<class cn::Sprite>::doDra           1.0 (  1.0-  1.0)   14 us ( 13 us- 16 us)    1% (   1%-   1%)   30 us ( 26 us- 33 us)    3% (   2%-   3%)
cn::D3D9Device::addSprite                              44.4 ( 44.2- 45.0)   16 us ( 13 us- 17 us)    1% (   1%-   1%)   16 us ( 13 us- 17 us)    1% (   1%-   1%)
cn::ComponentManager<class cn::Emitter>::doDr           1.0 (  1.0-  1.0)  197 ns (  0 ns-322 ns)    0% (   0%-   0%)  751 ns (  0 ns-  1 us)    0% (   0%-   0%)
cn::D3D9Device::addSprite                               0.6 (  0.0-  1.4)  554 ns (  0 ns-  1 us)    0% (   0%-   0%)  554 ns (  0 ns-  1 us)    0% (   0%-   0%)
cn::ComponentManager<class cn::HudElement>::d           1.0 (  1.0-  1.0)    2 us (  1 us-  2 us)    0% (   0%-   0%)    3 us (  2 us-  4 us)    0% (   0%-   0%)
cn::D3D9Device::addSprite                               2.0 (  2.0-  2.0)    1 us (771 ns-  2 us)    0% (   0%-   0%)    1 us (771 ns-  2 us)    0% (   0%-   0%)
cn::D3D9Device::addSprite                               4.0 (  4.0-  4.0)    2 us (  1 us-  2 us)    0% (   0%-   0%)    2 us (  1 us-  2 us)    0% (   0%-   0%)
cn::D3D9Device::addText                                 1.0 (  1.0-  1.0)  335 ns ( 38 ns-427 ns)    0% (   0%-   0%)  335 ns ( 38 ns-427 ns)    0% (   0%-   0%)
cn::D3D9Device::commit                                  1.0 (  1.0-  1.0)   68 us ( 64 us- 75 us)    6% (   6%-   7%)  656 us (643 us-674 us)   58% (  57%-  60%)
cn::D3D9Device::drawSprite                             51.0 ( 50.7- 52.3)   10 us (  9 us- 12 us)    1% (   1%-   1%)  103 us ( 98 us-108 us)    9% (   9%-  10%)
cn::D3D9Device::getResource                            51.0 ( 50.7- 52.3)   15 us ( 12 us- 17 us)    1% (   1%-   2%)   15 us ( 12 us- 17 us)    1% (   1%-   2%)
cn::D3D9Device::drawSprite                             51.0 ( 50.7- 52