Today’s article is a comprehensive review of the GeForce RTX 5090 and the underlying NVIDIA Blackwell architecture. Because the architecture is completely new, the review is divided into a theoretical analysis of the technical developments and a practical part that focuses on benchmarks, real applications and user experience. The first part covers both the performance increases and the new functions that this GPU generation brings with it. Particular attention is paid to DLSS4, which is explained in detail in the theory section and then analyzed in a separate section using suitable games. That section looks at the advances in image quality, the improvements in frame generation and the effects on system latency. But I don’t want to get ahead of myself or spoil anything.
As always, this was a feat of strength. I have completely revised the game selection and expanded it to a total of 11 games, each in five different settings. There is also a brand new test system and updated metrics. The GeForce RTX 5080, which will follow shortly, will be added in the same way. I would also like to point out in advance that, contrary to my usual practice, I have not disassembled the GeForce RTX 5090 FE this time, so the teardown is (still) missing. I still need the card in its original state for further tests, and teardown and reassembly are so complex that I cannot guarantee the card would run in the same state afterwards (the liquid metal can only be removed mechanically).
A special article will follow soon that deals intensively with DLSS4, its image quality and the latency aspects. Unfortunately, one of the cards was defective, so that work was delayed and the replacement only arrived yesterday. The scope of the content will definitely be worth the wait, even if it comes a little later. Even next week you will probably still have to search for coverage of this depth elsewhere. So stay tuned, it’s worth it.
Another follow-up to the review will focus on workstation performance and professional applications. I have split this part off into a separate article because driver problems and other technical details still needed to be clarified, and I postponed some tests to ensure a fair and comprehensive evaluation. If a revised driver does appear, I will not spare myself the retesting, but I will spare myself a superfluous export of extensive chart graphics. This also lets me include the upcoming GeForce RTX 5080 and avoid redundant content.
But that shouldn’t stop me from comparing the new card today with the GeForce RTX 4090 in particular and with the relevant Super cards of the Ada lineup in general. I’m leaving out Ampere due to time constraints, but since there are enough comparisons between Ada and Ampere on my site, that should be easy to get over. This time I’ll go into a bit more detail, including the theory, because it’s worth it. It also lets me explain up front why, in the conclusion, I could not escape a certain euphoria for one feature or another and, as always, still found something to complain about.
The GB202 GPU in detail
The GB202 GPU of the new NVIDIA Blackwell architecture is the heart of the GeForce RTX 5090. The chip integrates a total of 92.2 billion transistors on an area of 750 mm² and is manufactured in the optimized TSMC 4N process. In its full configuration, GB202 has 24,576 CUDA cores, 192 fourth-generation RT cores and 768 fifth-generation Tensor cores. The RTX 5090 uses a slightly cut-down configuration with 170 of the 192 SMs enabled, and the following figures refer to it: at a boost clock of 2407 MHz, the card offers a peak performance of 104.8 TFLOPS for FP32 computations, 209.5 TFLOPS for FP16 computations and 1676 TFLOPS for FP4 computations. The caches have grown as well. Compared to the RTX 4090 with 16.38 MB of L1 cache and 72 MB of L2 cache, the RTX 5090 offers 21.76 MB of L1 cache and 96 MB of L2 cache, which shortens data access times.
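The peak figures quoted above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming two FP32 operations (one fused multiply-add) per CUDA core per clock and 170 enabled SMs on the RTX 5090; the FP16 and FP4 factors are simply the ratios implied by the quoted numbers, and the function name is hypothetical.

```python
def peak_tflops(cuda_cores: int, boost_mhz: float, ops_per_clock: int = 2) -> float:
    """Peak throughput in TFLOPS: cores x operations per clock x clock frequency."""
    return cuda_cores * ops_per_clock * boost_mhz * 1e6 / 1e12


BOOST_MHZ = 2407              # boost clock quoted in the text
CORES_PER_SM = 128            # CUDA cores per SM, as stated in the text
RTX_5090_SMS = 170            # assumption: SMs enabled on the RTX 5090
FULL_GB202_SMS = 192          # full chip

fp32_5090 = peak_tflops(RTX_5090_SMS * CORES_PER_SM, BOOST_MHZ)
fp32_full = peak_tflops(FULL_GB202_SMS * CORES_PER_SM, BOOST_MHZ)

# Ratios implied by the quoted figures (209.5 / 104.8 and 1676 / 104.8).
fp16_5090 = 2 * fp32_5090
fp4_5090 = 16 * fp32_5090

print(f"RTX 5090 FP32: {fp32_5090:.1f} TFLOPS")
print(f"RTX 5090 FP16: {fp16_5090:.1f} TFLOPS")
print(f"RTX 5090 FP4:  {fp4_5090:.0f} TFLOPS")
print(f"Full GB202 FP32 at the same clock: {fp32_full:.1f} TFLOPS")
```

Under these assumptions the quoted 104.8 TFLOPS matches the 21,760 cores of the RTX 5090 rather than the 24,576 cores of the full chip, which would come out higher at the same clock.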
According to NVIDIA, another highlight is the energy efficiency of the GB202 GPU. With a maximum power consumption of 575 watts, the architecture relies on advanced power-saving mechanisms such as separate voltage rails for GPU cores and memory as well as accelerated clock frequency switching. These innovations minimize energy consumption during idle phases and maximize responsiveness under load. I will go into this in more detail in a separate section, including my own measurements.
Streaming multiprocessor (SM) architecture
The streaming multiprocessors (SMs) of the GB202 GPU form the core of its computing power. Each of the 192 SMs of the full chip comprises 128 CUDA cores, one fourth-generation RT core and four fifth-generation Tensor cores. While the Ada architecture already offered the same number of CUDA cores per SM, the Blackwell RT cores and Tensor cores have made significant advances in efficiency and functionality. The fifth-generation Tensor cores add support for the FP4 data format, which can dramatically reduce memory requirements while doubling throughput in AI inference applications. This is a clear improvement over the Ada architecture, whose fourth-generation Tensor cores did not support FP4. It allows neural networks to use resources more efficiently, especially the large models that are increasingly used in real-time applications such as image and speech processing.
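To get a feeling for how much the FP4 format reduces memory requirements, the weight storage of a model can be estimated from the number of bits per parameter. This is only a rough sketch: the 12-billion-parameter model is a hypothetical example, and the estimate ignores the scaling factors and other metadata that real low-precision formats store alongside the weights.

```python
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_memory_gib(num_params: int, fmt: str) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30


params = 12_000_000_000  # hypothetical model size, for illustration only

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: {weight_memory_gib(params, fmt):.2f} GiB")
```

Going from FP16 to FP4 quarters the weight storage, which decides whether a larger model fits into the 32 GB of the card at all.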
The full GB202 has 768 texture units. The 680 units enabled on the RTX 5090 yield a texel fill rate of 1636.76 gigatexels per second, a significant increase over the 1290.2 gigatexels per second of the RTX 4090. This speeds up the processing of complex textures and benefits neural texture compression. The total L1 cache has also grown compared to the Ada architecture, which further helps with memory-intensive tasks.
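The fill rates and L1 totals can be checked with a short calculation. This is a sketch under stated assumptions: one texel per texture unit per clock, 680 texture units at 2407 MHz for the RTX 5090, 512 texture units at 2520 MHz for the RTX 4090, and 128 KB of L1 cache per SM with 170 and 128 SMs respectively. None of these per-card values are stated in the text above.

```python
def texel_rate_gtexels(texture_units: int, clock_mhz: float) -> float:
    """Texel fill rate in gigatexels/s, assuming one texel per unit per clock."""
    return texture_units * clock_mhz / 1000


def total_l1_mb(sm_count: int, l1_kb_per_sm: int = 128) -> float:
    """Total L1 cache in MB (decimal), assuming 128 KB per SM."""
    return sm_count * l1_kb_per_sm / 1000


# Assumed per-card configurations (not taken from the text above).
rtx5090_rate = texel_rate_gtexels(680, 2407)
rtx4090_rate = texel_rate_gtexels(512, 2520)

print(f"RTX 5090: {rtx5090_rate:.2f} GTexel/s, L1 total {total_l1_mb(170):.2f} MB")
print(f"RTX 4090: {rtx4090_rate:.2f} GTexel/s, L1 total {total_l1_mb(128):.2f} MB")
```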
The fourth-generation RT cores in the GB202 GPU have also been significantly enhanced. They offer double the throughput for ray-triangle intersection tests compared to the previous generation and support new features such as Linear Swept Spheres (LSS) for more efficient rendering of thin, complex geometry such as hair or grass. While the Ada architecture had already brought significant advances in ray tracing, the new RT cores in Blackwell allow for even more realistic rendering through improved hardware support for Bounding Volume Hierarchies (BVH) and Opacity Micromaps. These innovations improve not only image quality but also performance in demanding ray tracing scenarios.
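For readers who want to see what a ray-triangle intersection test actually computes, here is a software sketch using the classic Möller-Trumbore algorithm. It is meant purely as an illustration of the operation that RT cores accelerate in fixed-function hardware; it says nothing about how NVIDIA implements it, and all names are chosen for this example.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_intersect(origin: Vec3, direction: Vec3,
                           v0: Vec3, v1: Vec3, v2: Vec3,
                           eps: float = 1e-9) -> Optional[float]:
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # only hits in front of the ray origin


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle)
miss = ray_triangle_intersect((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle)
print(hit, miss)
```

A path tracer runs this kind of test an enormous number of times per frame, which is why a doubling of the hardware intersection rate matters so much in practice.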
GDDR7 memory system
The GDDR7 memory system of the RTX 5090 sets new standards in memory technology. With a memory capacity of 32 GB and a bandwidth of 1.792 TB/s via a 512-bit interface, the Blackwell generation offers a significant improvement over the RTX 4090, which used GDDR6X with a bandwidth of 1.008 TB/s. GDDR7 uses PAM3 signaling instead of the PAM4 signaling of GDDR6X, which enables a better signal-to-noise ratio and higher energy efficiency.
The memory architecture of the RTX 5090 includes 96 MB of L2 cache, compared to the 72 MB of the RTX 4090, which significantly accelerates memory-intensive workloads such as ray tracing. In addition, the Blackwell architecture offers advanced CRC features that improve the reliability and stability of the memory system. These advances make the GDDR7 memory system an indispensable part of the GB202 GPU and contribute significantly to handling high-resolution graphics and AI workloads.
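The bandwidth figures follow directly from bus width and per-pin data rate. The sketch below assumes 28 Gbit/s per pin for the GDDR7 of the RTX 5090 and 21 Gbit/s per pin on a 384-bit bus for the GDDR6X of the RTX 4090; these values are not given in the text above. The second helper is a simplified illustration of why fewer signal levels help the signal-to-noise ratio: at the same voltage swing, the gap between adjacent levels is larger.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Memory bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def level_spacing_fraction(levels: int) -> float:
    """Gap between adjacent signal levels as a fraction of the full voltage swing."""
    return 1 / (levels - 1)


# Assumed per-pin data rates and bus widths (not taken from the text above).
rtx5090_bw = bandwidth_gb_s(512, 28)
rtx4090_bw = bandwidth_gb_s(384, 21)

print(f"RTX 5090: {rtx5090_bw:.0f} GB/s, RTX 4090: {rtx4090_bw:.0f} GB/s")
print(f"PAM3 level spacing: {level_spacing_fraction(3):.2f} of the swing")
print(f"PAM4 level spacing: {level_spacing_fraction(4):.2f} of the swing")
```

PAM3 transfers fewer bits per symbol than PAM4, so the higher bandwidth of GDDR7 comes from the higher symbol rate that the wider level spacing makes possible.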
After so much theory and text, I’ll briefly summarize it all in a clear table and compare it with Ampere and Ada:
- 1 - Details of the Blackwell GB202 GPU
- 2 - DLSS4 explained simply and in detail
- 3 - Neural shaders as real game changers?
- 4 - Path tracing: basics and improvements with benchmarks
- 5 - Test system and equipment
- 6 - Gaming: Full-HD 1920x1080 Pixels (Rasterization Only)
- 7 - Gaming: WQHD 2560x1440 Pixels (Rasterization Only)
- 8 - Gaming: Ultra-HD 3840x2160 Pixels (Rasterization Only)
- 9 - Gaming: WQHD 2560x1440 Pixels, Supersampling, RT & FG
- 10 - Gaming: Ultra-HD 3840x2160 Pixels, Supersampling, RT & FG
- 11 - DLSS4 and MFG: Cyberpunk 2077 in detail
- 12 - DLSS4 and MFG: Alan Wake 2 in detail
- 13 - PCIe 5 problems, power consumption in theory and practice
- 14 - Load peaks native vs. DLSS4, power supply recommendation
- 15 - Cooler, temperatures, thermography, noise levels
- 16 - Summary and conclusion