Original Link: https://www.anandtech.com/show/592



If you had asked us last November to recommend a video card with FSAA support, the one and only option would have been to wait for the Voodoo4/5 from 3dfx. It wasn't until just before the official launch of the GeForce2 GTS that we realized NVIDIA had included support for FSAA in their latest Detonator drivers. And in spite of ATI's feelings that FSAA isn't the way of the future, the Radeon supports FSAA out of the box. Now that 3dfx isn't the only game in town, the next question is which of the three major players has the best looking and best performing FSAA solution. To help you answer this question on your own (since image quality is a very subjective topic), we've put together a comparison guide that highlights the differences, if any, between the FSAA solutions provided by 3dfx, ATI and NVIDIA.

Hardware vs. Software

One of the most confusing things about the various methods of implementing FSAA in a video card is the debate over whether the feature is implemented in "hardware" as a feature of the chip or in "software," meaning that it is a function of the drivers alone and can be enabled on any card that the drivers support.

The main thing to understand here is that regardless of whether FSAA is supported in "hardware" through 3dfx's T-Buffer or in "software" through the NVIDIA Detonator drivers, it currently takes the same performance hit. If you're implementing a 2 sample FSAA algorithm, you're going to have effectively 1/2 the fill rate at your disposal since you're rendering twice as many pixels. This applies to all of the cards we're talking about today, the Voodoo4/5, the Radeon and the GeForce/GeForce2 MX/GeForce2 GTS.

Samples

The second thing to keep in mind is a very simple principle, but it is commonly misunderstood when talking about FSAA performance. As we just finished pointing out, regardless of whether you're talking about a Voodoo5, a Radeon or a GeForce2 GTS, if you make any one of those cards render twice as many pixels, it's going to effectively have 1/2 the fill rate.

3dfx's 2 sample FSAA offers the same theoretical performance hit as NVIDIA's 2 sample FSAA since in both cases we're decreasing the available fill rate by 50% by rendering twice as many pixels. While it is true that the Voodoo5 and the GeForce2 GTS perform differently when their respective 2 sample FSAA modes are enabled, that is not because one card is "faster" at FSAA than the other; it's simply because the two cards perform differently to begin with.
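To put numbers on this principle, here is a minimal Python sketch of the arithmetic. It assumes the simple model used throughout this article, where theoretical fill rate is divided by the number of samples. The function name `effective_fill_rate` is our own, and the peak figures are the ones quoted in this article's fill rate charts.

```python
def effective_fill_rate(base_mpixels_per_s, samples):
    """Theoretical fill rate left when every output pixel costs `samples` rendered pixels."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    return base_mpixels_per_s / samples


# Peak fill rates in MPixels/s, as quoted in this article's charts.
cards = {"Voodoo5 5500": 667, "GeForce2 GTS": 800}

for name, base in cards.items():
    for samples in (1, 2, 4):
        rate = effective_fill_rate(base, samples)
        print(f"{name}: {samples} sample(s) -> {rate:.1f} MPixels/s")
```

Both cards lose the same fraction of their fill rate at a given sample count; the absolute numbers differ only because the starting points differ.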

Now that we've gotten that out of the way, let's move onto the various forms of FSAA offered by the three manufacturers.



3dfx

3dfx's Voodoo5 5500 makes use of its T-Buffer to achieve its FSAA effects. The T-Buffer, in theory, is very similar to what is known as an Accumulation Buffer and it works by allowing the VSA-100 chips to render multiple frames at once and blend them together before outputting them to the screen.

3dfx's FSAA works by using the T-Buffer's ability to blend multiple frames together in a single frame, but shifted around slightly, thus helping to get rid of any jagged edges that are present in a given scene.

The Voodoo5 5500 offers two FSAA settings, a two sample and a four sample setting. By definition, 3dfx's two sample setting renders the current frame twice before blending and outputting the frame and their four sample setting renders the current frame four times, hence the four samples.
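The idea of blending several slightly shifted renderings can be sketched in a few lines of Python. This is only an illustration of the accumulation principle described above, not 3dfx's implementation: the scene (a single diagonal edge), the function names and the sub-pixel offsets are all assumptions of ours, since the actual sample pattern used by the VSA-100 isn't covered here.

```python
def inside(x, y):
    """Hypothetical scene: everything below a shallow diagonal edge is lit."""
    return y > 0.25 * x + 1.0


def render_frame(width, height, offset=(0.0, 0.0)):
    """Point-sample the scene once per pixel, shifted by a sub-pixel offset."""
    ox, oy = offset
    return [[1.0 if inside(x + 0.5 + ox, y + 0.5 + oy) else 0.0
             for x in range(width)]
            for y in range(height)]


def blend_frames(frames):
    """Average several renderings of the same frame into one output frame."""
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(frame[y][x] for frame in frames) / count
             for x in range(width)]
            for y in range(height)]


# Illustrative offsets only; chosen by us for this sketch.
FOUR_SAMPLE_OFFSETS = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]

no_fsaa = render_frame(8, 4)
four_sample = blend_frames([render_frame(8, 4, o) for o in FOUR_SAMPLE_OFFSETS])

for row in four_sample:
    print(" ".join(f"{value:.2f}" for value in row))
```

Without FSAA every pixel is either fully on or fully off, which is what produces the stair-step pattern. After blending, pixels along the edge take intermediate values, which is the smoothing effect you see in the screenshots.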

Because enabling FSAA makes the VSA-100 render either 2 or 4 times as many pixels (2 or 4 sample FSAA), the Voodoo5 5500's fill rate is cut to either one half or one fourth of what it normally would be. Theoretically, this puts the Voodoo5 5500 at either 333 MPixels/s or 166 MPixels/s, depending on the number of samples selected.

Fill Rate Comparison

| Card | FSAA Disabled | 2 Sample FSAA | 4 Sample FSAA |
| --- | --- | --- | --- |
| 3dfx Voodoo5 5500 | 667 MPixels/s, 667 MTexels/s | 333 MPixels/s, 333 MTexels/s | 166 MPixels/s, 166 MTexels/s |
| 3dfx Voodoo3 3500 | 183 MPixels/s, 366 MTexels/s | - | - |
| NVIDIA TNT2 Ultra | 300 MPixels/s, 300 MTexels/s | - | - |

 

As you can see from the above chart, enabling 2 sample FSAA brings the Voodoo5 5500's theoretical multi-textured fill rate down to the level of a Voodoo3 3500 (333 vs. 366 MTexels/s), though it remains almost twice as fast in single textured situations (333 vs. 183 MPixels/s). For a better performance comparison, a Voodoo5 5500 with 2 Sample FSAA enabled has an 11% higher fill rate than a TNT2 Ultra.

Enabling 4 sample FSAA brings the fill rate of the card down to the level of a Voodoo3 3000 in single textured games and down below the level of a Voodoo2 in multi-textured games, in theory of course.



To compare 3dfx's various FSAA levels we turn to Need for Speed: Porsche Unleashed. We took three screenshots all at 640 x 480 x 32, one with FSAA disabled, one with 2 Sample FSAA enabled and the final shot with 4 Sample FSAA enabled. We then took the middle 1/3 of each screen shot and produced the comparison below.

The things to look for are:

1) There is a faint power line in the distance, the line should be smooth and continuous.

2) The left edges of the screen shouldn't be jagged.

3) The side of the 911 Turbo should be smooth and not appear to be jagged.

The first thing you'll notice is that the power line seems to appear then disappear in both the first and second screenshots (FSAA off & 2 Sample FSAA enabled). While the effect isn't as bad with 2 Sample FSAA enabled, it's still noticeable, and it is even more so on maps that have a lighter background and larger powerlines.

The side of the Turbo appears somewhat smoother with 2 sample FSAA enabled, as do the bottoms of the pillars. Other than the powerline, 2 Sample FSAA appears to be just fine.

The top of the building below the rear view mirror is also very jagged without FSAA enabled, but appears smooth as silk with 4 Sample FSAA. At the same time, however, the card suffers from very poor performance with this mode enabled.



Level of Detail (LOD) Bias

We investigated the effects of the various LOD settings on the Voodoo5's FSAA image quality in our final review of the Voodoo5 5500 from a few weeks ago. For an in-depth description of how to enable the LOD slider and what it does, visit our Voodoo5 5500 Review.

Below we have the top half of two screenshots taken from the same location, once again at 640 x 480 x 32. The uppermost screenshot illustrates 3dfx's 4 Sample FSAA at its default setting, and the lower one is the same shot, this time with the LOD slider set to the -8 position.

As you can see, the 4 Sample FSAA with the LOD slider set to the -8 position really sharpens the image up when compared to the default LOD 0 setting.



ATI

The Radeon is probably the easiest card to test when it comes to FSAA image quality and performance since it only supports one setting, what they like to call "4X FSAA."

The Radeon's FSAA works using a method called "supersampling," which happens to be the same method NVIDIA uses in their FSAA implementation. This method of FSAA basically takes the current game resolution and multiplies the horizontal and vertical resolutions by a factor, renders the scene at the higher resolution, and then scales the scene back down to the game resolution.

In the Radeon's case, its "4X FSAA" is really a 4 Sample FSAA since it multiplies the horizontal and vertical resolutions by a factor of two, renders the scene and then scales it down to the game resolution before displaying it. For example, enabling FSAA at 640 x 480 would mean that the current frame is actually rendered at 1280 x 960 and then scaled back down to 640 x 480. This means that the Radeon has to render 4 times as many pixels, effectively making the Radeon's 4X FSAA a 4 Sample FSAA and thus deserving of a direct comparison to 3dfx's 4 Sample FSAA.
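Here is a minimal Python sketch of that render-then-scale-down process. It is a sketch under our own assumptions: the scene is a single diagonal edge, the function names are hypothetical, and the downscale is a plain box filter (an average of each 2 x 2 block), since the actual filter the Radeon uses isn't covered here. A tiny 4 x 4 "game resolution" is used so the output is easy to read; at 640 x 480 the same code would render 1280 x 960 internally.

```python
def render_scene(width, height):
    """Point-sample a hypothetical scene: 1.0 below a diagonal edge, 0.0 above it."""
    return [[1.0 if (y + 0.5) / height > 0.5 * (x + 0.5) / width else 0.0
             for x in range(width)]
            for y in range(height)]


def downscale(image, factor):
    """Average each factor x factor block of the high resolution image into one pixel."""
    out_h, out_w = len(image) // factor, len(image[0]) // factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor)
                 for dx in range(factor)) / factor ** 2
             for x in range(out_w)]
            for y in range(out_h)]


def supersample(width, height, factor=2):
    """Render at `factor` times the game resolution on each axis, then scale back down."""
    high_res = render_scene(width * factor, height * factor)
    return downscale(high_res, factor)


result = supersample(4, 4, factor=2)
for row in result:
    print(" ".join(f"{value:.2f}" for value in row))
```

Each output pixel is the average of four rendered pixels, which is why this counts as a 4 sample method and why it costs four times the fill rate.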

Unfortunately, because the Radeon only has one FSAA setting, that being a 4 Sample setting, enabling FSAA on the Radeon isn't a viable option in many cases since the performance hit is so great. The Radeon basically drops to 1/4 of its theoretical fill rate when its FSAA is enabled, which may be alright in a situation that isn't fill rate limited or where 60+ fps frame rates aren't absolutely necessary, but it certainly isn't acceptable for all games.

Fill Rate Comparison

| Card | FSAA Disabled | 2 Sample FSAA | 4 Sample FSAA |
| --- | --- | --- | --- |
| ATI Radeon 64DDR | 366 MPixels/s, 1.1 GTexels/s | - | 91.5 MPixels/s, 275 MTexels/s |
| 3dfx Voodoo3 3500 | 183 MPixels/s, 366 MTexels/s | - | - |
| NVIDIA TNT2 Ultra | 300 MPixels/s, 300 MTexels/s | - | - |

 

The Radeon's fill rate is an interesting topic of discussion since it has two rendering pipelines but can process three textures per pipeline. In single textured games, the Radeon's fill rate takes a dive, dropping to the level of a Voodoo2 when 4 sample FSAA is enabled. And since most current games only make use of two textures, the effective texel fill rate with FSAA disabled is 733 MTexels/s rather than 1.1 GTexels/s, because the latter theoretical figure assumes a three texture per pixel environment.
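The relationship between pipelines, textures per pipeline and FSAA samples can be worked through with a short Python sketch. The 183 MHz core clock is inferred from the 366 MPixels/s figure spread over two pipelines, and the function name `fill_rates` is our own; treat this as back-of-the-envelope arithmetic rather than a model of the hardware.

```python
def fill_rates(core_mhz, pipelines, textures_per_pipe, textures_used, fsaa_samples=1):
    """Return (MPixels/s, MTexels/s) for a simple pipelines-times-clock model."""
    pixels = core_mhz * pipelines / fsaa_samples
    # A game cannot use more texture units per pipeline than the chip has.
    texels = pixels * min(textures_used, textures_per_pipe)
    return pixels, texels


# Radeon: 2 pipelines, 3 textures per pipeline, clock inferred as 366 / 2 = 183 MHz.
print(fill_rates(183, 2, 3, textures_used=3))                  # peak figures
print(fill_rates(183, 2, 3, textures_used=2))                  # typical two texture game
print(fill_rates(183, 2, 3, textures_used=3, fsaa_samples=4))  # 4 sample FSAA
```

The two texture case works out to 732 MTexels/s here; the 733 figure quoted above comes from starting with the rounded 1.1 GTexels/s number instead.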

Once again, these are theoretical numbers illustrating raw fill rate; with the Radeon you also have to take into account the effects of its HyperZ on available fill rate as well.

Below we have a comparison of the Radeon without FSAA and with its 4 Sample setting enabled.

Just as with the Voodoo5, enabling the 4 Sample FSAA on the Radeon helps to 1) make the powerline continuous, 2) smooth out the 'jaggies' along the lower part of the bridge and 3) smooth out the 'jaggies' on the car itself.



NVIDIA

As we mentioned before, NVIDIA's FSAA method is identical to the Radeon's: both use "supersampling." The only difference is that NVIDIA has quite a few more settings, 3 under OpenGL and 8 under Direct3D.

We've already described the three OpenGL settings before in older GeForce2 GTS reviews, but just as a refresher, here are the three settings:

·        1.5x screen resolution (2.25 Samples)

·        2x screen resolution, with LODs (MIPMaps) at the native game resolution (4 Samples)

·        2x screen resolution, with MIPMaps at the 2x resolution (4 Samples)

So if you’re running a game at 640 x 480, the first FSAA option will render the scene at 960 x 720 (640 * 1.5 x 480 * 1.5) and then scale it back down to 640 x 480 for displaying.
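The resolution and sample count for any of these settings follow from the scale factor alone, as this small Python sketch shows. The function name `supersampled_resolution` is ours, and the sample count is simply the ratio of rendered pixels to displayed pixels.

```python
def supersampled_resolution(width, height, scale_x, scale_y=None):
    """Return (render_width, render_height, samples_per_pixel) for a supersampling setting."""
    if scale_y is None:
        scale_y = scale_x  # same factor on both axes unless stated otherwise
    render_w = round(width * scale_x)
    render_h = round(height * scale_y)
    samples = (render_w * render_h) / (width * height)
    return render_w, render_h, samples


for scale in (1.5, 2):
    w, h, samples = supersampled_resolution(640, 480, scale)
    print(f"{scale}x: rendered at {w} x {h}, {samples} samples per pixel")
```

This is where the odd-looking 2.25 sample figure comes from: 1.5 x 1.5 = 2.25 times as many pixels to render.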

For instructions on how to enable these three settings using the Detonator 5.3x drivers check out our NVIDIA GeForce 2 GTS FSAA Update (Detonator 5.30 Drivers).

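Fill Rate Comparison

| Card | FSAA Disabled | 2 Sample FSAA* | 4 Sample FSAA |
| --- | --- | --- | --- |
| NVIDIA GeForce2 GTS | 800 MPixels/s, 1.6 GTexels/s | 356 MPixels/s, 711 MTexels/s | 200 MPixels/s, 400 MTexels/s |
| 3dfx Voodoo3 3500 | 183 MPixels/s, 366 MTexels/s | - | - |
| NVIDIA TNT2 Ultra | 300 MPixels/s, 300 MTexels/s | - | - |

*The GeForce2 GTS figures in this column correspond to the 1.5 x 1.5 setting, which takes 2.25 samples.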

 

The GeForce2 GTS definitely has the fill rate power to be fairly playable at the 4 Sample FSAA setting, but unfortunately the GeForce2 GTS is the victim of a lack of sufficient memory bandwidth, which is where these theoretical numbers don't always represent real world performance.

If you'll notice, there are two 2x2 settings (4 Samples), the only difference between the two being the LOD bias of the MIPMaps. The first 2x2 setting attempts to salvage some performance by using "biased" MIPMaps, basically keeping the LOD bias the same as if we were rendering at the lower resolution.



The second 2x2 setting allows the MIPMaps to get sharper as the number of samples increases, just as they normally would as you increase the screen resolution. This obviously produces the clearest image, as the former would appear to be more blurry. Basically, instead of providing a LOD bias slider as with 3dfx's drivers, NVIDIA makes the LOD adjustments for you with the various FSAA settings.
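The difference between the two 2x2 settings is easier to see with the textbook MIPMap selection rule, sketched below in Python. This is the generic formula (level = log2 of texels per pixel, plus a bias), not a description of NVIDIA's driver internals; the function name, the example texel density and the +1 bias value are our own assumptions for illustration.

```python
import math


def mip_level(texels_per_pixel, lod_bias=0.0, max_level=10):
    """Pick a MIPMap level: 0 is the sharpest texture, higher levels are blurrier."""
    level = math.log2(texels_per_pixel) + lod_bias
    return min(max(level, 0.0), max_level)


# Hypothetical surface where 4 texels map onto each pixel at the game resolution.
native = mip_level(4.0)

# Rendering at 2x resolution halves the texels per rendered pixel.
unbiased_2x2 = mip_level(2.0)             # sharper MIPMap level gets selected
biased_2x2 = mip_level(2.0, lod_bias=1.0)  # bias of log2(2) restores the native level

print(native, unbiased_2x2, biased_2x2)
```

With the bias applied, the 2x2 rendering samples the same, blurrier MIPMap level it would have used at the game resolution; without it, the sharper level is used, just as it would be if you had actually raised the screen resolution.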

The 2 x 2 settings are obviously better FSAA settings than the 1.5 x 1.5 setting, and the 2 x 2 Unbiased setting produces a much sharper scene than the 2 x 2 Biased setting, as you can see from the blurry floor in the image above. Before we get to exactly why the sky looks horrendous, let's take a look at the performance hit each of these settings results in:

The above chart illustrates that 640 x 480 is definitely playable with any of the three FSAA settings, and 800 x 600 isn't too bad either, but once you get higher you start becoming borderline in terms of achieving a playable frame rate, depending on what you deem acceptable.



S3TC = Nasty Looking Sky?

As you've probably noticed in previous screenshots from the GeForce2 GTS as well as the GeForce and the GeForce2 MX, the sky definitely looks pretty bad. But on all competing cards, including the Voodoo5 5500 and the ATI Radeon the sky looks perfectly fine.

Let's take a quick look at how bad the sky looks first:

Pretty bad, no? But watch what happens when we disable S3TC, the GeForce2's Texture Compression algorithm that has been enabled in all 5.xx drivers:

Much better now. To disable S3TC all you need to do is set 'r_ext_compress_textures' to 0 in the Quake III console, but this will hinder your performance considerably in situations where there are a lot of textures.

So what kind of performance drop do you see when you disable S3TC? On a 32MB card, the performance hit can be up to 50% depending on the situation (Quaver is the perfect benchmark for this); 64MB owners will be happy to know that they can disable S3TC without losing much performance because of all of the extra memory.

Is it worth it? It's not as noticeable as it would be if the walls or floors looked really bad, but for some gamers it is a big problem, so you can try experimenting with turning r_ext_compress_textures off to see if the game is still playable by your standards.

The problem that we had was that both 3dfx and ATI are using texture compression as well, and neither of their solutions has the degraded sky quality. Let's hope this is something NVIDIA can get around in future driver releases; we've been with S3TC for so long now (ever since the first 5.xx drivers were leaked), and having to play without S3TC in order to get decent looking sky textures would probably annoy more than a handful of gamers.



NVIDIA's Direct3D FSAA

NVIDIA is probably the most flexible in terms of FSAA options under Direct3D, but unfortunately, only a couple of the settings are actually practical for use in normal gameplay. Let's take a look at the settings offered under Direct3D (these settings correspond to the slider positions on the D3D FSAA slider in the drivers):

  • 1x horizontal resolution, 2x vertical resolution - 2 sample FSAA, unbiased mipmaps
  • 2x horizontal resolution, 2x vertical resolution - 4 sample FSAA, biased mipmaps
  • 2x horizontal resolution, 2x vertical resolution - 4 sample FSAA, unbiased mipmaps
  • 2x horizontal resolution, 2x vertical resolution - 4 sample FSAA, unbiased mipmaps, different filter
  • 3x horizontal resolution, 3x vertical resolution - 9 sample FSAA, biased mipmaps
  • 3x horizontal resolution, 3x vertical resolution - 9 sample FSAA, 9x unbiased mipmaps
  • 4x horizontal resolution, 4x vertical resolution - 16 sample FSAA, biased mipmaps*
  • 4x horizontal resolution, 4x vertical resolution - 16 sample FSAA, unbiased mipmaps*

*Note: These two settings require a 64MB card to run.

The first thing you have to realize is that unless you want to run your card at 1/9th or 1/16th of its current speed, the last four settings are completely useless. The GeForce2 GTS does not have the fill rate to allow for a 9 or a 16 sample FSAA algorithm to be implemented at a reasonable frame rate in most games.
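To see how quickly the higher settings eat into the card, the following Python sketch walks the eight Direct3D slider positions and applies the same divide-by-samples model used earlier. The 800 MPixels/s base figure is the GeForce2 GTS number from our fill rate chart; the tuple layout and names are our own, and real world results will be lower still once memory bandwidth comes into play.

```python
GTS_FILL_RATE = 800.0  # MPixels/s, theoretical peak with FSAA disabled

# (horizontal scale, vertical scale, mipmap mode) for each slider position.
D3D_SETTINGS = [
    (1, 2, "unbiased"),
    (2, 2, "biased"),
    (2, 2, "unbiased"),
    (2, 2, "unbiased, different filter"),
    (3, 3, "biased"),
    (3, 3, "unbiased"),
    (4, 4, "biased"),
    (4, 4, "unbiased"),
]


def describe(h_scale, v_scale, base_fill_rate):
    """Return (samples per pixel, theoretical MPixels/s left) for one setting."""
    samples = h_scale * v_scale
    return samples, base_fill_rate / samples


for h, v, mipmaps in D3D_SETTINGS:
    samples, rate = describe(h, v, GTS_FILL_RATE)
    print(f"{h}x{v} ({mipmaps}): {samples} samples, {rate:.0f} MPixels/s")
```

At 9 samples the theoretical figure is already below 90 MPixels/s, and at 16 samples it is down to 50 MPixels/s.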

Secondly, NVIDIA's lowest FSAA setting under Direct3D, the 2 sample setting, does not look as good as 3dfx's 2 sample FSAA since you're only effectively supersampling in one direction. In spite of this, the 2 sample setting still yields a relatively similar performance hit since you're forcing the chip to render twice as many pixels.
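Why supersampling in only one direction falls short can be shown with a single pixel and two edges. The Python sketch below is an illustration under our own assumptions (an ordered grid of sample points and an edge crossing 30% of the way into the pixel); it is not taken from either vendor's implementation.

```python
def coverage(inside, grid_x, grid_y):
    """Fraction of an ordered grid_x by grid_y set of sample points that land inside the shape."""
    hits = 0
    for j in range(grid_y):
        for i in range(grid_x):
            sx = (i + 0.5) / grid_x  # sample position within the pixel, 0..1
            sy = (j + 0.5) / grid_y
            hits += 1 if inside(sx, sy) else 0
    return hits / (grid_x * grid_y)


def left_of_vertical_edge(x, y):
    """Shape bounded by a vertical edge crossing the pixel at x = 0.3."""
    return x < 0.3


def above_horizontal_edge(x, y):
    """Shape bounded by a horizontal edge crossing the pixel at y = 0.3."""
    return y < 0.3


for label, grid in (("no FSAA", (1, 1)), ("1 x 2", (1, 2)), ("2 x 2", (2, 2))):
    v = coverage(left_of_vertical_edge, *grid)
    h = coverage(above_horizontal_edge, *grid)
    print(f"{label}: vertical edge {v:.2f}, horizontal edge {h:.2f}")
```

With the 1 x 2 grid the horizontal edge gets an intermediate value, but the vertical edge is treated exactly as it is with FSAA off, because both samples share the same horizontal position. The 2 x 2 grid softens both.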

As we proved in our NVIDIA GeForce 2 GTS FSAA Update (Detonator 5.30 Drivers), the performance benefit you get from going with the lower quality 2x2 (4 sample FSAA) settings is not great enough to justify the slight drop in image quality, so if you're going to use a 4 sample FSAA setting the third 2x2 setting makes the most sense.

Below we have clips from our Need for Speed screenshots that help to illustrate the fine differences between the three 4 sample FSAA settings:

*Note: the last 2x2 setting uses a different filter

If you look closely you can see the differences in sharpness between the three settings, the latter being the most blurry because of its unbiased mipmaps whereas the first setting is using biased mipmaps at the higher resolution.



3dfx vs ATI vs NVIDIA

Now that we've looked at each manufacturer individually, it is time to compare the three together. We compared each solution based on the number of samples the specific setting took; for example, in this first comparison we have 3dfx's 2 Sample FSAA versus NVIDIA's 1 x 2 (2 Sample FSAA).

While NVIDIA's shot may be sharper than the one we took on the Voodoo5 because of LOD bias tweaking present in the setting itself, the Voodoo5's 2 sample setting does have smoother edges along the corners of the buildings, etc... The comparison is a very close one, so it's up to you to decide.



Next, moving to the 4 sample settings, we can include ATI's Radeon since it does feature a 4 sample setting. Let's have a look at the results:

In all three shots the faint powerline is visible and continuous. The Radeon and GeForce2 GTS offer very similar FSAA qualities, as they should since they are both using the same method and are taking the same number of samples. The Voodoo5 manages to come away with a few more smoothed edges on the left hand side of the screen, but overall all solutions are respectable.



In the above Quake III screenshot you can see that the Voodoo5 5500 does manage to produce a smoother image than the Radeon and the GeForce2 GTS. You can also see that both the 3dfx & ATI solutions are using texture compression without the horrible effects on the sky that are present in the GeForce2 GTS image.



Performance

Since image quality isn't the only factor, we must also take into account performance in the various FSAA settings:

As you can see, the Radeon is easily dominated by the GeForce 2 GTS in 16-bit color, due to its slow 16-bit performance even with FSAA off. The Voodoo5 5500 in 4X mode provides a much better image, though it does run a bit slower. In 32-bit color, the Radeon looks as good as the GeForce 2 GTS in its high quality 2x2 mode, but it runs much faster. This makes FSAA at 640x480x32 a viable option for gamers.



Final Words

From a performance standpoint, none of the current cards have the real world fill rate to push a 4 sample FSAA setting at 60+ fps, which is what we would ideally like. This makes the presence of 2 sample FSAA settings a very smart move on behalf of 3dfx and NVIDIA: we would definitely like to see a 2 sample setting added to the ATI Radeon's sole 4 sample setting.

3dfx definitely has the best 2 sample FSAA setting out of the bunch. Although we were unsuccessful with tweaking the LOD bias settings when enabling 2 sample FSAA (other artifacts would pop up), the setting still seems to offer the best overall result in terms of image quality and performance.

ATI's Radeon benefits greatly from its efficient memory management, allowing it to offer faster performance at their 4 sample FSAA setting than NVIDIA's GeForce2 GTS running in its 4 sample mode in 32-bit color. This is an example of yet another situation where theoretical fill rate becomes meaningless as the reality of memory bandwidth limitations sets in.

There you have it: the three major contenders and the pros/cons of their FSAA implementations. What you're seeing now is just the beginning of what these manufacturers can offer us. What we can really look forward to is the next generation 3dfx part as well as NVIDIA's NV20, both of which may be able to provide a much higher performance FSAA solution that can be used for more than just showing off in screenshots.
