On September 20th, NVIDIA announced the GeForce RTX4090 at the GTC conference. We did not receive a Founders Edition card, but ASUS sent us its flagship model, the ROG Strix RTX4090 OC.
Chef Jensen was bragging about the DLSS 3 technology at GTC, so the RTX4090’s performance is what we set out to test.
Before we get to the test results, let’s go over the specs and features of the RTX4090, what DLSS 3 is, and how the card differs from the previous generation flagship, the RTX3090Ti.
The RTX40 series uses the new Ada Lovelace architecture and moves from Samsung’s 8nm process to TSMC’s 4nm process. Compared to GA102, the core area shrinks from 628mm² to 608mm², while integration increases and the transistor count skyrockets to 76.3 billion. However, the RTX4090, the first flagship graphics card in the 40 series, does not use the full AD102 chip. NVIDIA is holding something back with the RTX4090, which suggests that higher-end cards such as an RTX4090Ti will follow.
According to the architecture diagram, the full-spec AD102 has 12 GPCs, 72 TPCs and 144 SMs, with 18,432 CUDA cores, 144 3rd generation RT cores, 576 4th generation Tensor cores, 576 texture units, 192 ROP units, 18MB of L1 cache, 96MB of L2 cache and a 36MB register file.
The RTX4090, on the other hand, is cut down somewhat from the full AD102 core: one GPC and a further four SMs are disabled, which removes 2,048 stream processors and leaves 16,384 in total, along with slightly lower RT Core and Tensor Core counts.
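The cut-down figures can be checked with a little arithmetic. The sketch below is only an illustration: it assumes every SM carries an equal share of the chip-wide totals quoted above and that the 12 GPCs each hold the same number of SMs. The function and variable names are our own and do not come from NVIDIA.

```python
# Full AD102 totals as quoted in the text above.
FULL_AD102 = {
    "sm": 144,
    "cuda_cores": 18_432,
    "rt_cores": 144,
    "tensor_cores": 576,
    "texture_units": 576,
}

# Assumption: the 12 GPCs of the full chip each hold the same number of SMs.
SMS_PER_GPC = FULL_AD102["sm"] // 12


def per_sm(totals):
    """Divide each chip-wide total by the SM count to get per-SM resources."""
    sms = totals["sm"]
    return {name: count // sms for name, count in totals.items() if name != "sm"}


def cut_down(totals, disabled_gpcs, extra_disabled_sms):
    """Scale the chip-wide totals to a part with some GPCs and SMs disabled."""
    disabled = disabled_gpcs * SMS_PER_GPC + extra_disabled_sms
    active = totals["sm"] - disabled
    config = {name: count * active for name, count in per_sm(totals).items()}
    config["sm"] = active
    return config


# One GPC plus four further SMs disabled, as described for the RTX4090.
rtx4090 = cut_down(FULL_AD102, disabled_gpcs=1, extra_disabled_sms=4)
print(rtx4090)
print("stream processors removed:", FULL_AD102["cuda_cores"] - rtx4090["cuda_cores"])
```

Under these assumptions, one GPC plus four SMs comes to 16 disabled SMs, which accounts for the 2,048 missing stream processors.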
Even so, the RTX4090 beats the RTX3090Ti in every specification, especially in stream processor count, and its core frequency has also been raised significantly. Comparing stream processor counts alone, the RTX4090 has 1.52 times as many as the previous generation RTX3090Ti, so theoretical performance should improve by at least 50-60%. Furthermore, the RTX4090 keeps the same 450W power rating as the RTX3090Ti, which is largely thanks to TSMC’s 4nm process.
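As a quick check of the 1.52 figure, here is a minimal sketch. The RTX3090Ti core count of 10,752 is not stated in this article and is taken from NVIDIA’s published specification, so treat it as an assumption. The helper name is ours.

```python
RTX_4090_CORES = 16_384     # from the text above
RTX_3090_TI_CORES = 10_752  # assumption: NVIDIA's published spec, not from this article
BOARD_POWER_W = 450         # both cards share the same power rating


def core_ratio(new_cores, old_cores):
    """How many times more stream processors the new card has."""
    return new_cores / old_cores


ratio = core_ratio(RTX_4090_CORES, RTX_3090_TI_CORES)
print(f"stream processor ratio: {ratio:.2f}x")

# At an identical power rating, cores per watt scale by the same ratio.
print(f"cores per watt: {RTX_4090_CORES / BOARD_POWER_W:.1f} "
      f"vs {RTX_3090_TI_CORES / BOARD_POWER_W:.1f}")
```

This counts stream processors only. Any gain from the higher core frequency comes on top of this ratio.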
In addition to the usual upgrades, the RTX40 series adds support for AV1 encoding, which will undoubtedly benefit video creators with special needs. The RTX40 series cards also have a new optical flow unit within the core. With the new DLSS 3 enabled, it allows real-time frame generation for games on top of the original multi-frame super resolution, bringing a silky-smooth gaming experience. Because frame generation adds latency, NVIDIA pairs it with Reflex low latency technology, which synchronises the CPU and GPU and eliminates useless frames from the GPU render queue to ensure optimal responsiveness. The RTX40 series graphics cards have also been upgraded to fourth generation Tensor units and third generation RT units, which further enhance RT performance.
According to NVIDIA, the 3rd generation RT Core used in the Ada Lovelace architecture not only doubles ray-triangle intersection performance, but also more than doubles peak RT-TFLOP performance. It also features the new Opacity Micromap (OMM) engine and Displaced Micro-Mesh (DMM) engine, which reduce computational overhead for better ray-tracing performance. The 4th generation Tensor Core brings the FP8 Transformer Engine from the professional-grade H100 data centre GPU for even more AI power.
The Ada Lovelace architecture introduces Shader Execution Reordering (SER) on the SM unit for better scheduling and sequencing of shader work. Work can be grouped according to its load requirements, so the SMs are loaded more evenly and overhead is reduced.
SER will be made available to developers as an API. NVIDIA also claims that SER can deliver up to twice the RT Core performance and a better experience in ray-tracing games. According to NVIDIA, the new SM (Streaming Multiprocessor) can deliver up to twice the performance and power efficiency, which is a significant upgrade.
In addition to the 3rd gen RT Core, 4th gen Tensor Core and SM (Streaming Multiprocessor) upgrades, the Ada Lovelace architecture brings the new 8th gen NVENC dual hardware encoder, which integrates support for the AV1 video format, allowing for the encoding and decoding of AV1 in addition to the traditional H.264 format and effectively improving productivity for creative users and game streamers. According to NVIDIA, the 8th gen NVENC dual hardware encoder is 40% more efficient in AV1 than in H.264, which is a significant improvement.
In terms of performance, the full AD102 has 96MB of L2 cache, which provides a significant increase in data hit rate. Alongside the current improvements to effective graphics memory bandwidth, the larger L2 cache further raises the hit rate and brings a performance upgrade. According to NVIDIA, the large L2 cache brings higher gains in a range of GPU operations, especially in ray-tracing scenarios.
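To see why a higher L2 hit rate matters, here is a deliberately simple model. The hit rates and the 100GB workload are hypothetical numbers chosen only to show the trend. They are not measurements of any card, and the function name is ours.

```python
def dram_traffic_gb(requested_gb, l2_hit_rate):
    """Data that misses the L2 cache and must be fetched from graphics memory."""
    if not 0.0 <= l2_hit_rate <= 1.0:
        raise ValueError("l2_hit_rate must be between 0 and 1")
    return requested_gb * (1.0 - l2_hit_rate)


# Hypothetical hit rates for a smaller and a larger L2 cache (illustration only).
small_cache = dram_traffic_gb(100.0, l2_hit_rate=0.50)
large_cache = dram_traffic_gb(100.0, l2_hit_rate=0.75)

print(f"smaller L2: {small_cache:.0f} GB fetched from graphics memory")
print(f"larger L2:  {large_cache:.0f} GB fetched from graphics memory")
```

In this toy model, every request served from L2 is one that no longer uses memory bandwidth, so the same memory bus goes further.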
These upgrades to the Ada Lovelace architecture also make the new DLSS 3 technology exclusive to this generation and improve the NVIDIA Reflex experience.
DLSS 3 includes DLSS 2 (DLSS Super Resolution) and the new DLSS Frame Generation technology, which relies on the Ada Lovelace architecture’s dedicated optical flow unit and the Tensor Cores’ AI algorithm. In operation, the Optical Flow Accelerator (OFA) provides the data from which new frame information is generated, while the AI algorithm further optimises the frames, adding more frames to the game and improving the frame rate.
DLSS 3 generates 3/4 of the pixels of the first frame and the entire second frame directly, which is equivalent to completing the whole process with 1/8 of the computing power. By cycling through this process, the frame rate increases while resource overhead is saved, leading to a better gaming experience.
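The 1/8 figure follows from simple fractions, as the sketch below shows. It assumes a two-frame cycle in which Super Resolution conventionally renders 1/4 of the pixels of the first frame and Frame Generation produces the whole second frame. The function and parameter names are our own.

```python
from fractions import Fraction


def rendered_fraction(rendered_share_per_frame, frames_per_cycle, generated_frames):
    """Share of all displayed pixels that the GPU renders conventionally."""
    rendered_frames = frames_per_cycle - generated_frames
    rendered_pixels = rendered_frames * rendered_share_per_frame
    return rendered_pixels / frames_per_cycle


# Frame 1: 1/4 rendered, 3/4 generated. Frame 2: fully generated.
share = rendered_fraction(Fraction(1, 4), frames_per_cycle=2, generated_frames=1)

print("conventionally rendered:", share)
print("generated by DLSS 3:", 1 - share)
```

Using `Fraction` keeps the result exact instead of a rounded decimal.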
DLSS 3 also integrates NVIDIA Reflex to reduce the rendering and input latency that frame generation would otherwise add. According to NVIDIA, DLSS 3 combines Reflex with a range of other technologies to deliver a rendering and input experience that matches or exceeds the native experience.
Let’s take a closer look at the performance of the RTX4090 graphics card through professional testing software, with the following test platform configuration.
Test Bench:
3DMark Performance test
The 3DMark tests include Time Spy, Time Spy Extreme and Port Royal, which measure DX12 performance at 2K resolution, DX12 performance at 4K resolution and ray-tracing performance respectively.
The 3DMark benchmarks show that RTX4090 performance has improved across the board, with theoretical performance around 40% higher than the RTX3090Ti. This is only the theoretical difference under DX12, so next we test how the difference plays out in actual games.
Gaming test
As we know, the previous generation RTX3090Ti cannot handle Cyberpunk 2077 at 4K resolution without DLSS enabled. If the RTX4090 can run Cyberpunk 2077 at more than 60 fps at 4K, it means the RTX4090 fully opens the 4K era. For this flagship graphics card, we skipped 1080p and 2K in the game benchmark session and tested directly at 4K resolution. The games we chose are CS:GO, Cyberpunk 2077, DOTA2, Monster Hunter: World, PUBG, Red Dead Redemption 2 and World of Tanks, all at 4K with maximum graphics settings.
In gaming, the RTX4090 holds a large lead over the RTX3090Ti in most 4K games. The modest improvement in individual games is most likely because those games put little stress on the graphics card: GPU utilisation is low and the load falls on the CPU. At 4K resolution, the RTX4090 exceeds 60 fps and reaches close to 90 fps without DLSS, and with DLSS 3 on it can even reach more than 100 fps, while the RTX3090Ti cannot reach 50 fps and requires DLSS to be enabled.
DLSS 3.0 test
We have also run the recently updated DLSS 3 test in 3DMark, and the results show that the ASUS ROG Strix RTX4090 OC achieves a very noticeable performance improvement thanks to DLSS 3. We have also obtained test versions of several games that support DLSS 3 and tested some of them to experience the improvements that the technology can bring.
According to NVIDIA’s official introduction, more than 35 games and apps have already announced upcoming support for DLSS 3, including popular titles such as Cyberpunk 2077 and RacerX. For this DLSS 3 game test, we use Cyberpunk 2077, Marvel’s Spider-Man Remastered and A Plague Tale: Requiem. Please note that the RTX3090Ti runs DLSS v2.5.
As you can see through a range of figures, the ASUS ROG Strix RTX4090 OC we tested has very strong power and performs even better in a range of games.
Productivity performance benchmarks
AV1 codec
AV1 has become the new standard encoding for high-quality video content, and the reason is simple: AV1 can achieve the same picture quality as other mainstream codecs at a lower bitrate. Its more efficient compression lets creators save a lot of space on their hard drives.
EPIC PC currently uses DaVinci Resolve 18 for long-form video editing. This software supports NVIDIA’s hardware-accelerated AV1 codec. A 4K MP4 video of around 10GB in size can be exported in around 40 seconds.
H.265 hardware accelerated codec
For the same 10GB 4K footage, encoded in H.264 and exported to MP4, the ASUS ROG Strix RTX4090 OC took roughly 35 seconds, while the RTX3090Ti took 70 seconds, twice as long.
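The export times above translate into a speedup and a rough processing rate as follows. This is a sketch only: it treats the quoted 10GB as the amount of footage processed and uses decimal units (1GB = 1000MB), both of which are assumptions on our part. The helper names are ours.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new card finishes the same export."""
    return baseline_seconds / new_seconds


def throughput_mb_per_s(size_gb, seconds):
    """Average footage processed per second, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / seconds


FOOTAGE_GB = 10        # approximate size quoted in the text
RTX_4090_H264_S = 35   # export time on the ASUS ROG Strix RTX4090 OC
RTX_3090_TI_H264_S = 70

print(f"speedup: {speedup(RTX_3090_TI_H264_S, RTX_4090_H264_S):.1f}x")
print(f"RTX4090:   {throughput_mb_per_s(FOOTAGE_GB, RTX_4090_H264_S):.1f} MB/s")
print(f"RTX3090Ti: {throughput_mb_per_s(FOOTAGE_GB, RTX_3090_TI_H264_S):.1f} MB/s")
```

Since the quoted sizes and times are approximate, the rates are only rough indications.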
Industrial design benchmarks
For a test that leans further towards the needs of professional users, we used SPECviewperf 13 for a comparison test of the two cards.
The comparison tests also show that the ASUS ROG Strix RTX4090 OC improves on the RTX3090Ti across the board, holding an advantage in every benchmark aimed at industrial workloads.
Through hands-on testing in various aspects such as gaming, video creation and benchmarking with professional tools, we can see that the ASUS ROG Strix RTX4090 OC has the latest technological advantages and all-round performance improvements compared to the previous generation flagship model RTX3090Ti, and its strong performance also brings users excellent experiences.
Thermal and power consumption benchmarks
With a room temperature of 28±2°C and an open test bench, the ASUS ROG Strix RTX4090 OC stabilised at around 71°C after around 30 minutes of testing, while the total power consumption of the card was around 440W, which is a relatively good result.
Summary
As the flagship of the new generation, the ASUS ROG Strix RTX4090 OC shows very strong performance. With the AD102-300 GPU and its new Ada Lovelace architecture, it brings a full generational jump in gaming and productivity compared to the previous generation RTX3090Ti. New technologies such as DLSS 3 provide an immersive experience with clearer picture quality and smoother graphics, making it a must-have for enthusiast gamers.
For gamers, content creators and other users, the ASUS ROG Strix RTX4090 OC’s power and large 24GB of VRAM will enhance the experience and give users more room for creativity.
The ASUS ROG Strix RTX4090 OC also looks very cool, with a semi-retro rounded enclosure and an RGB glowing grille at the rear for a Cyberpunk-style industrial design, as well as decent cooling performance. If you’re thinking of upgrading your old graphics card, this is definitely a great choice.
EPIC PC RATE SCORE : 93/100