Performance

Some numbers

There are essentially two ps2gl performance bottlenecks that the application controls: dma transfer and vu1 rendering. The Tips section below describes good ways to make dma transfer the bottleneck. Beyond that, performance comes down to the speed of the vu1 renderer in use. Currently there are a measly ten renderers, and one of them is selected at render time based on the current rendering state.

Much of the effort so far has gone into making it easy to add specialized renderers, so less time has been spent actually writing those renderers. Ideally, ps2gl users will write renderers for the conditions they care about and contribute them back to the project. Otherwise, once a little more work on the architecture is done, vu1 renderer development will resume here at SCEA.

Here are some numbers for the 30,000-triangle ViewPoint head model (again, with dma not the bottleneck). One renderer is clearly much faster than the others: the one specialized for no specular component and three or fewer directional lights.

NOTE: again, performance will get better soon

render context/state                 million vertices/second
---------------------------------    -----------------------
3 directional lights, no specular    14
4 directional lights, no specular    5
4 directional lights, specular       2.9
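Because these measurements were taken with dma not the bottleneck, they can be read as an upper bound on vertex throughput. A minimal C sketch that converts them to a per-frame vertex budget, assuming a 60 fps target (the frame rate is an assumption, not something measured here):

```c
#include <assert.h>
#include <stdio.h>

/* Vertices that fit in one frame at a given throughput.
 * mverts_per_sec: rate in millions of vertices per second.
 * fps: target frame rate (60 is assumed in print_budgets' caller). */
double vertices_per_frame(double mverts_per_sec, double fps)
{
    assert(fps > 0.0);
    return mverts_per_sec * 1.0e6 / fps;
}

/* Print the per-frame budget for each measured configuration. */
void print_budgets(double fps)
{
    const char *labels[] = {
        "3 directional lights, no specular",
        "4 directional lights, no specular",
        "4 directional lights, specular",
    };
    const double rates[] = { 14.0, 5.0, 2.9 };

    for (int i = 0; i < 3; ++i)
        printf("%-34s %8.0f vertices/frame\n", labels[i],
               vertices_per_frame(rates[i], fps));
}
```

At 60 fps the three rates work out to roughly 233,000, 83,000, and 48,000 vertices per frame, before any other per-frame work is accounted for.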

Tips

What YOU can do to make things go faster!

  • use display lists: they have been optimized at the expense of immediate mode. Their main drawback now is inefficient memory use when they cache glBegin/glEnd draw commands, which brings us to...

  • use glDrawArrays: memory is allocated almost efficiently (at least it's loosely related to the size of the input data), and the array data is mostly referenced rather than copied

  • when rendering a model, store the vertices, normals, tex coords, and colors each in their own contiguous block of memory rather than interleaving them.

    For example:

    < all vertices >
    < all normals >
    < all tex coords >

    NOT:

    < vertex0, normal0, texCoord0 >
    < vertex1, normal1, texCoord1 >
    ...

  • geometry that changes frequently is a problem. Calling glDrawArrays and creating display lists both take a fair amount of time, so we don't want to do either every frame. Furthermore, if only the vertex and normal values change (and not the topology), as with a skinned model, we shouldn't need to rebuild the display list, since the data is passed by reference. Ideally, we could create one display list containing glDrawArrays calls that point at our data, and then change the data behind the display list's back. But according to the documentation, glDrawArrays only mostly references the array data; some data does get copied.

    Fear not: all is not lost. The only time the display list copies any data is when it must transfer elements that do not start on a qword (16-byte) boundary. So if all your vertices, normals, tex coords, and colors are either 2 or 4 floats wide, everything should be aligned correctly and nothing will be copied. (Note that the "w" field of every vertex is implicitly forced to 1.0f, so whatever is written to that field in memory doesn't matter.) The one hitch in this plan is that glNormalPointer implicitly fixes the length of normals at 3 elements. For this reason ps2gl adds a new call, pglNormalPointer, that lets you specify the length of the normals, as glVertexPointer does.

    So to render geometry that's changing frequently, here's the plan:

    1. Allocate memory for the data starting on a qword boundary (malloc/new).
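    2. Store vertices as (x, y, z, unused), tex coords as (u, v), and normals as (x, y, z, unused), declaring the 4-float normals with pglNormalPointer.
    3. Create a display list that renders the data with glDrawArrays.
    4. From then on, the data can be modified in place and glCallList will still render it correctly.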

    Note that when glDrawElements is finally implemented, that will probably be the method of choice for rendering in situations like this.
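The tip about grouping arrays contiguously can be sketched in C as a conversion from an interleaved layout to separate arrays. The `interleaved_vertex` struct and `deinterleave` function are hypothetical; positions and normals are written as 4 floats (w set to 1.0f, normals padded with 0.0f) on the assumption that you also want the 2-or-4-float widths recommended for changing geometry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved record: the layout the tip says to avoid. */
typedef struct {
    float pos[3];
    float normal[3];
    float uv[2];
} interleaved_vertex;

/* Copy n interleaved records into three contiguous arrays:
 * all positions, then all normals, then all tex coords.
 * Caller allocates pos and normal with 4*n floats, uv with 2*n floats. */
void deinterleave(const interleaved_vertex *in, size_t n,
                  float *pos, float *normal, float *uv)
{
    assert(n == 0 || (in && pos && normal && uv));
    for (size_t i = 0; i < n; ++i) {
        for (int k = 0; k < 3; ++k) {
            pos[4 * i + k]    = in[i].pos[k];
            normal[4 * i + k] = in[i].normal[k];
        }
        pos[4 * i + 3]    = 1.0f;  /* w: ps2gl forces this to 1.0f anyway */
        normal[4 * i + 3] = 0.0f;  /* padding so normals are 4 floats wide */
        uv[2 * i]         = in[i].uv[0];
        uv[2 * i + 1]     = in[i].uv[1];
    }
}
```

The three resulting arrays would then be passed to glVertexPointer, pglNormalPointer (with a length of 4), and glTexCoordPointer.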
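The reasoning behind using 2- or 4-float elements comes down to byte arithmetic against 16-byte qwords. A minimal sketch, assuming the array itself starts on a qword boundary; the function names are hypothetical, and this models only the arithmetic, not ps2gl's actual copy logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QWORD_BYTES 16  /* one PS2 qword = 128 bits */

/* Byte offset of element `index` in a packed float array whose elements
 * have `components` floats, measured from the (qword-aligned) array start. */
static size_t elem_start(size_t index, int components)
{
    assert(components >= 1 && components <= 4);
    return index * (size_t)components * sizeof(float);
}

/* Does the element begin exactly on a qword boundary? */
bool starts_on_qword(size_t index, int components)
{
    return elem_start(index, components) % QWORD_BYTES == 0;
}

/* Does the element spill across a qword boundary? */
bool straddles_qword(size_t index, int components)
{
    size_t first = elem_start(index, components);
    size_t last  = first + (size_t)components * sizeof(float) - 1;
    return first / QWORD_BYTES != last / QWORD_BYTES;
}
```

With 4 floats every element fills exactly one qword, and with 2 floats two elements share a qword, so neither ever straddles a boundary. With 3 floats, as glNormalPointer's fixed normal length would produce, element 1 occupies bytes 12 to 23 and spills into the next qword (`straddles_qword(1, 3)` is true).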
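The four-step plan for changing geometry might look like the following C sketch. Only the allocation and in-place update are real code; the GL calls appear as comments because they need a ps2gl context. Assumptions: C11 `aligned_alloc` provides the qword alignment (an older PS2 toolchain may need `memalign` or manual alignment instead), `pglNormalPointer` is assumed to take the same (size, type, stride, pointer) arguments as `glVertexPointer`, and `GL_TRIANGLES` and the `dynamic_mesh` names are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define QWORD_BYTES 16  /* one PS2 qword = 128 bits */

/* Hypothetical container for geometry whose values change every frame. */
typedef struct {
    size_t count;      /* number of vertices */
    float *verts;      /* 4 floats each: x, y, z, w (w is forced to 1.0f) */
    float *normals;    /* 4 floats each: x, y, z, unused */
    float *texcoords;  /* 2 floats each: u, v */
} dynamic_mesh;

/* C11 aligned_alloc expects the size to be a multiple of the alignment. */
static size_t round_up_qword(size_t bytes)
{
    return (bytes + QWORD_BYTES - 1) / QWORD_BYTES * QWORD_BYTES;
}

static float *alloc_floats(size_t n)
{
    return aligned_alloc(QWORD_BYTES, round_up_qword(n * sizeof(float)));
}

void dynamic_mesh_free(dynamic_mesh *m)
{
    free(m->verts);
    free(m->normals);
    free(m->texcoords);
    m->verts = m->normals = m->texcoords = NULL;
}

/* Steps 1 and 2: qword-aligned storage with 4/4/2-float elements. */
int dynamic_mesh_init(dynamic_mesh *m, size_t count)
{
    m->count     = count;
    m->verts     = alloc_floats(count * 4);
    m->normals   = alloc_floats(count * 4);
    m->texcoords = alloc_floats(count * 2);
    if (!m->verts || !m->normals || !m->texcoords) {
        dynamic_mesh_free(m);
        return -1;
    }
    assert((uintptr_t)m->verts % QWORD_BYTES == 0);
    return 0;
}

/* Step 3, done once (requires a ps2gl context, so shown as comments):
 *   glEnableClientState(GL_VERTEX_ARRAY);
 *   glEnableClientState(GL_NORMAL_ARRAY);
 *   glEnableClientState(GL_TEXTURE_COORD_ARRAY);
 *   glVertexPointer(4, GL_FLOAT, 0, m->verts);
 *   pglNormalPointer(4, GL_FLOAT, 0, m->normals);  // assumed signature
 *   glTexCoordPointer(2, GL_FLOAT, 0, m->texcoords);
 *   GLuint list = glGenLists(1);
 *   glNewList(list, GL_COMPILE);
 *   glDrawArrays(GL_TRIANGLES, 0, (GLsizei)m->count);
 *   glEndList();
 */

/* Step 4, every frame: write new values through the same pointers (a toy
 * translation in y stands in for skinning), then call glCallList(list).
 * The buffers are never reallocated, so the list's references stay valid. */
void dynamic_mesh_translate_y(dynamic_mesh *m, float dy)
{
    for (size_t i = 0; i < m->count; ++i)
        m->verts[4 * i + 1] += dy;
}
```

The key design choice is that the buffers are allocated once and only their contents change; reallocating them would leave the display list pointing at freed memory.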