Multi CPU Results
Chart of a 4 CPU Alpha EV68 with 1 GHz running Tru64
Chart of a 4 CPU Opteron with 2 GHz running Linux
Chart of a 4 CPU Xeon with 3 GHz running Linux
Chart of an 8 CPU MIPS R12000 with 300 MHz running Irix
Chart of a 16 CPU Itanium II with 1.6 GHz running Linux
Conclusion
In all diagrams you can see a huge performance gain when using more than one thread. For the quad CPU systems, the minimal time is as low as 30% to 25% (roughly 1/3 to 1/4) of the time without threads (the point on the y-axis). The minimal time for the 8 CPU MIPS is about 15% (less than 1/6) of the first measured time, and the optimal value of the 16 CPU Itanium II is about 10% (1/10) of the initial value.
The widely differing results of the 5 runs on the Alpha and the Opteron are a sign that other computationally intensive tasks were running on these two systems at the same time, so you can't trust these two results. The 8 CPU MIPS shows almost ideal scaling behaviour in its chart (though this impression is partly because the diagram of the Itanium II has too large a scale on the y-axis).
So all in all: no system scales ideally, which would mean an optimal value of 1/4 for the quad CPU systems, 1/8 for the 8 CPU system and 1/16 for the 16 CPU system. But the performance gain when using more threads is HUGE.
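To make the comparison with ideal scaling concrete, the fractions quoted above can be converted into a speedup (single-threaded time divided by best time) and a parallel efficiency (speedup divided by the number of CPUs, where 1.0 would be ideal). The following is only a minimal sketch: the function name is made up for illustration, and the input fractions are the rough percentages read off the charts, so the results are approximate as well.

```python
def speedup_and_efficiency(time_fraction, cpus):
    """Return (speedup, efficiency) for a best time given as a
    fraction of the single-threaded time on a machine with `cpus` CPUs.

    speedup    = 1 / time_fraction
    efficiency = speedup / cpus   (1.0 means ideal scaling)
    """
    if not 0 < time_fraction <= 1:
        raise ValueError("time_fraction must be in (0, 1]")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    speedup = 1.0 / time_fraction
    return speedup, speedup / cpus


# Approximate fractions taken from the text above (rough chart readings,
# not exact measurements).
RESULTS = [
    ("quad CPU systems (30%)", 0.30, 4),
    ("quad CPU systems (25%)", 0.25, 4),
    ("8 CPU MIPS (15%)", 0.15, 8),
    ("16 CPU Itanium II (10%)", 0.10, 16),
]

if __name__ == "__main__":
    for name, fraction, cpus in RESULTS:
        speedup, efficiency = speedup_and_efficiency(fraction, cpus)
        print(f"{name}: speedup {speedup:.1f}x, "
              f"efficiency {efficiency:.0%} of ideal")
```

With these rounded inputs, the 8 CPU system reaches a bit more than 80% of the ideal speedup and the 16 CPU system a bit more than 60%. The 25% figure for the quad CPU systems comes out as exactly ideal only because the percentage is rounded.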