Will the next RTX 40 be 2x faster than the RTX 30?


This is the rumor of the day about the upcoming Nvidia cards. The new leak comes from kopite7kimi and concerns the architecture block diagram of the green team’s next generation. An image of the AD102 ‘Ada Lovelace’ GPU block diagram lets us project the performance of the upcoming RTX 40 series.


RTX 40: an impressive datasheet (if true)

To begin with, the Ada Lovelace AD102 GPU will feature up to 12 GPCs (Graphics Processing Clusters). This is an increase of about 70% over the GA102 (the largest chip in the current lineup), which only has 7 GPCs. Each GPC will be composed of 6 TPCs (Texture Processing Clusters) with 2 SMs each, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) will house four sub-cores, which is also the same as the GA102 GPU. The real change is the FP32 and INT32 core configuration. Each SM will have 128 FP32 units, but the combined FP32 + INT32 units will go up to 192. This is because the FP32 units no longer share a datapath with the INT32 units: the 128 FP32 cores are separate from the 64 INT32 cores.
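To make the arithmetic behind these figures easier to follow, here is a minimal Python sketch that derives the totals from the per-cluster numbers. It assumes the hierarchy is 12 GPCs, each with 6 TPCs of 2 SMs, and that each SM carries 128 FP32 and 64 INT32 units; the class and field names are purely illustrative, and all values are rumored, not confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Hypothetical helper: unit counts per level of the GPU hierarchy."""
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int
    fp32_per_sm: int
    int32_per_sm: int

    @property
    def sms(self) -> int:
        # Total SMs = GPCs x TPCs per GPC x SMs per TPC
        return self.gpcs * self.tpcs_per_gpc * self.sms_per_tpc

    @property
    def fp32_units(self) -> int:
        return self.sms * self.fp32_per_sm

    @property
    def int32_units(self) -> int:
        return self.sms * self.int32_per_sm


# Rumored AD102 figures from the leak (assumptions, not official specs).
ad102 = GpuLayout(gpcs=12, tpcs_per_gpc=6, sms_per_tpc=2,
                  fp32_per_sm=128, int32_per_sm=64)

GA102_GPCS = 7  # GPC count of the full GA102, as quoted in the article
gpc_increase_pct = (ad102.gpcs / GA102_GPCS - 1) * 100

print(ad102.sms)                  # 144 SMs
print(ad102.fp32_units)           # 18432 FP32 units
print(ad102.int32_units)          # 9216 INT32 units
print(round(gpc_increase_pct))    # 71 (% more GPCs than GA102)
```

If the final chip ships with some SMs disabled, as is common on consumer cards, the totals would scale down accordingly.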

One of the RTX 40 GPU block diagrams shared by kopite7kimi

Caching should be another area where NVIDIA has gone all out compared to existing Ampere GPUs. The Ada Lovelace GPUs will contain 192 KB of L1 cache per SM, a 50% increase over Ampere. With 144 SMs (12 GPCs × 6 TPCs × 2 SMs), that works out to a total of 27 MB of L1 cache on the top AD102 GPU. The L2 cache, shared across the whole GPU, will be increased to 96 MB, a figure regularly mentioned in several leaks. If the leaks are true, that is 16 times more than the Ampere GPU, which only hosts 6 MB of L2 cache. As for the ROPs, this architecture would have twice as many units per GPC, 32 to be precise, which would give a total of 384 ROPs for a possible RTX 4090 against 112 for the RTX 3090… On paper it’s monstrous.
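The cache and ROP totals can be checked the same way. The short Python sketch below assumes 144 SMs (12 GPCs × 6 TPCs × 2 SMs), 192 KB of L1 per SM, 32 ROPs per GPC, and the Ampere figures quoted in the article (6 MB of L2, 112 ROPs across 7 GPCs); the 128 KB Ampere L1 value is the one implied by the "50% increase". Variable names are illustrative only.

```python
KB_PER_MB = 1024

# Rumored AD102 figures (assumptions taken from the leak)
ADA_GPCS = 12
ADA_SMS = ADA_GPCS * 6 * 2          # 6 TPCs per GPC, 2 SMs per TPC
ADA_L1_KB_PER_SM = 192
ADA_L2_MB = 96
ADA_ROPS_PER_GPC = 32

# Ampere GA102 reference figures
AMPERE_GPCS = 7
AMPERE_L1_KB_PER_SM = 128           # implied by the quoted 50% increase
AMPERE_L2_MB = 6
AMPERE_ROPS = 112

l1_total_mb = ADA_SMS * ADA_L1_KB_PER_SM / KB_PER_MB
l1_gain_pct = (ADA_L1_KB_PER_SM / AMPERE_L1_KB_PER_SM - 1) * 100
l2_ratio = ADA_L2_MB / AMPERE_L2_MB
ada_rops = ADA_GPCS * ADA_ROPS_PER_GPC
ampere_rops_per_gpc = AMPERE_ROPS / AMPERE_GPCS

print(l1_total_mb)           # 27.0 MB of L1 in total
print(l1_gain_pct)           # 50.0 % more L1 per SM
print(l2_ratio)              # 16.0 times the L2 cache
print(ada_rops)              # 384 ROPs
print(ampere_rops_per_gpc)   # 16.0 ROPs per GPC on GA102
```

Note that the ROP count doubles per GPC (32 vs. 16), while the chip-wide total grows by more than that because the number of GPCs also goes up.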

GPU feature comparison. AD102 would be the top of the RTX 40 range

But after this orgy of technical data, what can we really expect to gain?

It is obviously still too early to have a clear idea, but if these figures are confirmed, the datasheet points to a huge step up from Ampere. To summarize:

  • About 1.7x the GPCs (12 vs. 7 on Ampere)
  • 50% more cores per SM (192 combined FP32 + INT32 units vs. 128 on Ampere)
  • 50% more L1 cache per SM (compared to Ampere)
  • 16x more L2 cache (compared to Ampere)
  • 2x the ROPs per GPC (32 vs. 16 on Ampere)
  • 4th generation tensor and 3rd generation RT cores

But what can we expect in terms of real performance? That is very hard to say because we are missing a key figure: the clock speed. If we speculate a little, we can project FP32 throughput of around 90 TFLOPS, more than double that of the current GA102. TFLOPS can hold surprises, however: they give an idea of raw performance, but they never predict the results in “everyday” use. The leaks suggest x2 to x2.2 compared to the RTX 30… There will obviously be a gain, and it looks substantial. But to say more than that, we’ll have to wait a bit longer.
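To see how much the missing clock speed matters, here is a rough Python sketch of the usual peak-throughput estimate: each FP32 core can issue one fused multiply-add (2 floating-point operations) per clock. The 18,432-core figure is an assumption derived from the rumored layout (144 SMs × 128 FP32 units), and the RTX 3090 numbers (10,496 cores, 1.695 GHz boost) are its public specifications, not part of the leak. The function names are illustrative.

```python
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as 2 FLOPs


def peak_tflops(fp32_cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    return FLOPS_PER_CORE_PER_CLOCK * fp32_cores * clock_ghz / 1000


def required_clock_ghz(target_tflops: float, fp32_cores: int) -> float:
    """Clock speed (GHz) needed to reach a given peak TFLOPS figure."""
    return target_tflops * 1000 / (FLOPS_PER_CORE_PER_CLOCK * fp32_cores)


AD102_FP32_CORES = 144 * 128   # assumed full chip: 18432 FP32 units
RTX_3090_CORES = 10496         # public spec
RTX_3090_BOOST_GHZ = 1.695     # public spec

rtx_3090_tflops = peak_tflops(RTX_3090_CORES, RTX_3090_BOOST_GHZ)
clock_for_90 = required_clock_ghz(90, AD102_FP32_CORES)

print(round(rtx_3090_tflops, 2))   # 35.58
print(round(clock_for_90, 2))      # 2.44
```

Under these assumptions, a full AD102 would need to run at roughly 2.4 GHz to hit 90 TFLOPS, which shows why the projection stays speculative until clocks are known. As with any peak figure, this says nothing about performance in actual games.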