Performance
Some numbers
There are essentially two ps2gl performance bottlenecks over which the
application has control: dma transfer and vu1 rendering. See the tips section
below for good ways to keep the dma transfer from being your bottleneck. Once
it isn't, we're down to the speed of the vu1 renderer being used. Currently
there are a measly ten different renderers, one of which is selected based on
the rendering state at render time.
A lot of work has gone into making it very easy to add specialized renderers,
which means that at this point less time has been spent actually writing those
renderers. Ideally, users of ps2gl would write renderers for the set of
conditions they care about and contribute them back to the project. Otherwise,
after a little more work on the architecture, vu1 renderer development will be
once again underway here at SCEA.
Here are some numbers for the 30,000 tri ViewPoint head model (again, where dma
is not the bottleneck). It's easy to see that there is one fast renderer which
is specialized for no specular component and 3 or fewer directional lights.
NOTE: again, performance will get better soon
render context/state              | million vertices/second
----------------------------------+------------------------
3 directional lights, no specular | 14
4 directional lights, no specular | 5
4 directional lights, specular    | 2.9
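To get a feel for what these rates mean per frame, here is a minimal sketch
that converts a throughput figure into a per-frame vertex budget. The function
name is hypothetical and the 60 frames/second used in the note below is an
assumption, not something measured or stated above.

```c
#include <assert.h>

/* Hypothetical helper: converts a rate in million vertices/second into
   the number of vertices that fit in one frame at the given frame rate. */
double vertices_per_frame(double million_vertices_per_second,
                          double frames_per_second)
{
    assert(frames_per_second > 0.0);
    return million_vertices_per_second * 1.0e6 / frames_per_second;
}
```

Under that assumed 60 frames/second, the fast renderer's 14 million
vertices/second works out to roughly 233,000 vertices per frame, and the
slowest case (2.9) to roughly 48,000.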
Tips
What YOU can do to make things go faster!
- Use display lists. Display lists have been optimized at the expense of
  immediate mode. The main problem with them now is inefficient use of memory
  when used to cache glBegin/glEnd draw commands, which brings us to...
- Use DrawArrays. Memory is almost allocated efficiently (at least it's
  loosely related to the size of the input data) and there's no copying.
- When rendering a model, group each of [vertices, normals, tex coords,
  colors] contiguously in memory.
  For example:
    < all vertices >
    < all normals >
    < all tex coords >
  NOT:
    < vertex0, normal0, texCoord0 >
    < vertex1, normal1, texCoord1 >
    ...
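As a rough illustration of the grouped layout, the sketch below computes where
each attribute block starts when everything lives in one float buffer. The
struct and function names are hypothetical, and the per-element sizes are
parameters rather than anything ps2gl dictates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a grouped buffer. All offsets are in
   floats from the start of the buffer. */
typedef struct {
    size_t vertices;     /* start of < all vertices >   */
    size_t normals;      /* start of < all normals >    */
    size_t tex_coords;   /* start of < all tex coords > */
    size_t total_floats; /* size of the whole buffer    */
} GroupedLayout;

/* Lays the attribute blocks out one after another, not interleaved. */
GroupedLayout grouped_layout(size_t count, size_t vertex_floats,
                             size_t normal_floats, size_t tex_floats)
{
    GroupedLayout layout;
    assert(count > 0);
    layout.vertices = 0;
    layout.normals = layout.vertices + count * vertex_floats;
    layout.tex_coords = layout.normals + count * normal_floats;
    layout.total_floats = layout.tex_coords + count * tex_floats;
    return layout;
}
```

For 100 elements with 4-float vertices, 4-float normals and 2-float tex
coords, the normals start at float 400 and the tex coords at float 800.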
For geometry that changes frequently, we have a problem. The
DrawArrays call and the creation of display lists take a fair amount of
time, so we don't want to be doing either every frame. Furthermore, if only
the values of vertices and normals are changing (and not the topology), as
with a skinned model, we shouldn't need to rebuild the display list, since
the data is passed by reference. It would be nice if we could just create
one display list that contains calls to DrawArrays pointing at our data,
and then change the data behind the display list's back. But according to
the documentation, glDrawArrays only "mostly" references the array
data, i.e., some data does get copied.
Fear not, for all hope is not lost. The only time the display list will
copy any data is when it needs to transfer elements that start on a
non-qword-aligned boundary (a qword is 16 bytes). That means that if all
your vertices, normals, tex coords, and colors are either 2 or 4 floats,
everything should be aligned correctly and nothing will be copied. (It's
useful to note at this point that the "w" field of all vertices is
implicitly forced to 1.0f, so it doesn't matter what is actually written to
that field in memory.) The only hitch in this plan is that glNormalPointer
implicitly sets the length of normals to 3 elements. For this reason ps2gl
has a new call, 'pglNormalPointer', that allows you to specify the length of
the normals, as in glVertexPointer.
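A quick way to sanity-check your own arrays against these two conditions is
sketched below. Both helpers are hypothetical and are not part of ps2gl; they
only test the base address and the element size described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QWORD_BYTES 16u /* one qword is 128 bits */

/* Returns 1 if the pointer starts on a qword boundary, 0 otherwise. */
int is_qword_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % QWORD_BYTES) == 0;
}

/* Returns 1 if the element length (in floats) is one of the two sizes
   recommended above, 0 otherwise. */
int is_recommended_element_size(int floats_per_element)
{
    return floats_per_element == 2 || floats_per_element == 4;
}
```

Calling these from an assert before building the display list is a cheap way
to catch a 3-float normal array or a misaligned buffer during development.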
So to render geometry that's changing frequently, here's the plan:
- Allocate memory for the data starting on a qword boundary (malloc/new).
- Store vertices as (xyz?), tex coords as (uv), and normals as (xyz?), where
  "?" is a fourth float of padding.
- Create a display list and render with glDrawArrays, using pglNormalPointer
  to declare the 4-element normals.
- Now the data can be modified and glCallList will still render it correctly.
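
The plan above might look roughly like the following sketch. Several things
here are assumptions: the DynamicMesh struct and the mesh_* helpers are
hypothetical, HAVE_PS2GL is a made-up build flag, the header names are
guesses, and pglNormalPointer is assumed to take the same arguments as
glVertexPointer. It also uses C11 aligned_alloc to make the qword alignment
explicit, where the plan itself just says malloc/new.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef HAVE_PS2GL          /* hypothetical flag: define when building with ps2gl */
#include <GL/gl.h>
#include <GL/ps2gl.h>      /* assumed location of the pglNormalPointer prototype */
#endif

#define QWORD_BYTES 16u

typedef struct {
    size_t count;      /* number of vertices                 */
    float *vertices;   /* count * 4 floats: x y z ?          */
    float *normals;    /* count * 4 floats: x y z ?          */
    float *tex_coords; /* count * 2 floats: u v              */
} DynamicMesh;

/* Allocates num_floats floats starting on a qword boundary. The byte size
   is rounded up to a multiple of 16, as aligned_alloc expects. */
float *alloc_qword_aligned(size_t num_floats)
{
    size_t bytes = num_floats * sizeof(float);
    bytes = (bytes + QWORD_BYTES - 1) / QWORD_BYTES * QWORD_BYTES;
    return aligned_alloc(QWORD_BYTES, bytes);
}

void mesh_free(DynamicMesh *m)
{
    free(m->vertices);
    free(m->normals);
    free(m->tex_coords);
    m->vertices = m->normals = m->tex_coords = NULL;
    m->count = 0;
}

/* Returns 0 on success, -1 if any allocation failed. */
int mesh_init(DynamicMesh *m, size_t count)
{
    assert(count > 0);
    m->count = count;
    m->vertices = alloc_qword_aligned(count * 4);
    m->normals = alloc_qword_aligned(count * 4);
    m->tex_coords = alloc_qword_aligned(count * 2);
    if (!m->vertices || !m->normals || !m->tex_coords) {
        mesh_free(m);
        return -1;
    }
    return 0;
}

#ifdef HAVE_PS2GL
/* Builds the display list once; the arrays are referenced, not copied,
   provided the alignment and element-size conditions hold. */
GLuint mesh_build_list(const DynamicMesh *m)
{
    GLuint list = glGenLists(1);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(4, GL_FLOAT, 0, m->vertices);
    pglNormalPointer(4, GL_FLOAT, 0, m->normals); /* assumed signature */
    glTexCoordPointer(2, GL_FLOAT, 0, m->tex_coords);
    glNewList(list, GL_COMPILE);
    glDrawArrays(GL_TRIANGLES, 0, (GLsizei)m->count);
    glEndList();
    return list;
}
#endif
```

Each frame you would then write new values into the vertices and normals
arrays (skinning, for instance) and call glCallList with the list returned
by mesh_build_list, without rebuilding the list.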
Note that when glDrawElements is finally implemented, that will probably be
the method of choice for rendering in situations like this.