Efficient SSBO Data Streaming

0 comments, last by iradicator 4 years ago

I’m upgrading my OpenGL engine to work in larger batches.

For dynamic-object instanced rendering, I upload “packed” (32-byte) per-object transform data through an SSBO. I then run a shader to unpack the transform data and do the matrix multiplications, and finally I use instanced rendering to draw. (I know I can do better with indirect draw calls and more aggressive packing; these would be my next steps.)
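To make the 32-byte figure concrete, here is one way such a record could be laid out on the CPU side. This is only a sketch: the post does not say which fields are packed, so the struct name `PackedTransform`, its fields, and the helper `bytesForInstances` are all hypothetical.

```cpp
#include <cstddef>

// Hypothetical 32-byte per-object record. The field choice is an assumption;
// it was picked so the record is the size of two vec4 values (2 * 16 bytes).
struct PackedTransform {
    float position[3];   // 12 bytes: translation
    float uniformScale;  //  4 bytes: one scale factor for all axes
    float rotation[4];   // 16 bytes: unit quaternion (x, y, z, w)
};

// Fail the build if padding ever changes the size the shader expects.
static_assert(sizeof(PackedTransform) == 32, "record must stay 32 bytes");

// Bytes needed in the SSBO for `count` instances of this record.
constexpr std::size_t bytesForInstances(std::size_t count) {
    return count * sizeof(PackedTransform);
}
```

With this layout, 65,536 instances occupy exactly 2 MiB per frame, which is the amount the streaming path has to move.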

What I found through experimentation is that the bottleneck is CPU->GPU SSBO submission. I watched the talks by John McDonald and Cass Everitt, and I’m using a coherent, persistently mapped SSBO with triple buffering to send data down, which is supposed to be the fastest way to go.
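The bookkeeping behind triple buffering in a persistently mapped buffer can be sketched without any GL calls: one allocation holds three equally sized regions, and each frame writes into the next one. The names `StreamRing`, `regionIndex`, and `writeOffset` are made up for illustration and are not taken from the post or the talks; the GL calls appear only in comments as the places where these numbers would typically be used.

```cpp
#include <cstddef>

// Number of per-frame regions in the ring (triple buffering).
constexpr std::size_t kRegionCount = 3;

// Hypothetical helper that only computes sizes and offsets.
struct StreamRing {
    std::size_t regionSize;  // bytes written per frame

    // Total allocation size. In a GL renderer this would typically be the
    // size passed to glBufferStorage together with GL_MAP_WRITE_BIT |
    // GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT.
    constexpr std::size_t totalSize() const {
        return regionSize * kRegionCount;
    }

    // Region used by a given frame: 0, 1, 2, 0, 1, 2, ...
    constexpr std::size_t regionIndex(std::size_t frame) const {
        return frame % kRegionCount;
    }

    // Byte offset into the mapped pointer for this frame's writes. The same
    // offset would be handed to glBindBufferRange for the draw, after waiting
    // on the fence (glFenceSync / glClientWaitSync) that guards the region.
    constexpr std::size_t writeOffset(std::size_t frame) const {
        return regionIndex(frame) * regionSize;
    }
};
```

For example, with 2 MiB per frame, `StreamRing{2097152}.writeOffset(4)` lands in region 1 at byte 2,097,152, and the whole buffer is 6 MiB. The point of the ring is that the CPU never writes a region the GPU may still be reading, as long as the per-region fence is respected.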

Now, obviously I’ll divide the data into static and dynamic content and stream only the dynamic data each frame. But I’m trying to estimate how many objects I can render with an efficient implementation. Through experimentation I found that I can send down roughly 65k objects, rendered in 2 instanced draw calls, without dropping frames (targeting 60 fps). That means this naïve implementation is sending roughly 65,536 * 32 * 60 B/sec ≈ 126 MB/sec (exactly 120 MiB/sec) to the GPU.
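The throughput estimate is plain arithmetic, and spelling it out shows where the 120 figure comes from: it is exact in MiB (2^20 bytes) and about 126 in decimal MB. The function names below are illustrative only.

```cpp
#include <cstdint>

// Bytes streamed per second for a given object count, record size, and frame rate.
constexpr std::uint64_t bytesPerSecond(std::uint64_t objects,
                                       std::uint64_t bytesPerObject,
                                       std::uint64_t framesPerSecond) {
    return objects * bytesPerObject * framesPerSecond;
}

// Convert bytes to binary mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double toMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes).
constexpr double toMB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1.0e6;
}

// The numbers from the post: 65,536 objects * 32 bytes * 60 fps.
constexpr std::uint64_t kStreamRate = bytesPerSecond(65536, 32, 60);
```

`kStreamRate` is 125,829,120 bytes per second, i.e. 2 MiB per frame, or 120 MiB per second.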

These results seem too low. I was wondering if folks can share real-life numbers (and strategies) for a reasonable number of dynamic objects that can be submitted and rendered each frame when targeting 60 fps.

Another way to think about it: How big would you expect a particle system to be, when the simulation is driven by CPU and streamed to GPU for rendering?

Thanks!

2 draw calls with ~65k dynamic objects @ 60 fps
