Tag Archives: Shader

AMD RDNA 3 “Navi 3X” GPUs Feature Double The Cache Per Compute Unit & Shader Array

AMD has disclosed the cache sizes of its upcoming RDNA 3 “Navi 3X” GPUs in Linux patches.

AMD’s Next-Gen RDNA 3 GPUs For Navi 3X Lineup To Double The Cache Size For Compute Units & Shader Array

The patches, published on the FreeDesktop Linux repository by AMD’s Aaron Liu and spotted by Coelacanth-Dream and Kepler_L2, give us the first details on the cache sizes of upcoming RDNA 3 GPUs such as the recently leaked Navi 31, Navi 32, and Navi 33 chips.

Coming to the details, the AMD RDNA 3 (GFX11) GPU lineup will feature double the L0 vector cache for each Compute Unit (CU) and double the GL1 data cache (the RDNA L1 cache) for each Shader Array (SA). According to the new information, the Vector Register File per SIMD will increase from 128 KB on RDNA 2 to 192 KB, the L0 Vector/Texture cache will increase from 16 KB to 32 KB per CU, and the GL1 Data Cache per Shader Array will increase from 128 KB to 256 KB, while the L2 Data Cache will remain the same as on RDNA 2.

Cache sizes are also listed for AMD’s Navi 33 GPU and Phoenix APU, which will also feature an RDNA 3 graphics core but in a monolithic package. On these, the L0 Vector/Texture cache is increased from 16 KB to 32 KB, while the L1 Data Cache (Graphics) is increased from 128 KB to 256 KB. The register file size remains unchanged on Navi 33 GPUs and Phoenix APUs.

| Cache Info | Yellow Carp (Rembrandt) | RDNA 3 (GFX11, Navi 31/32) | Phoenix (GC 11.0.1, GFX1103) |
|---|---|---|---|
| Vector Register File (per SIMD) | 128 KiB | 192 KiB | 128 KiB |
| L0 Vector Data (per CU) | 16 KiB | 32 KiB | 32 KiB |
| L1 Scalar Instruction (per WGP) | 32 KiB | 32 KiB | 32 KiB |
| L1 Scalar Data (per WGP) | 16 KiB | 16 KiB | 16 KiB |
| GL1 Data (per SA) | 128 KiB | 256 KiB | 256 KiB |
| L2 Data | 2048 KiB (2 MiB) | 2048 KiB (2 MiB) | 2048 KiB (2 MiB) |
| L3 (MALL) | N/A | Yes | N/A |
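As a quick check of the “double” and 1.5x claims, the sketch below computes the per-level growth ratio between the RDNA 3 (Navi 31/32) column and the Yellow Carp column, which serves as the RDNA 2 baseline in the table. The dictionary keys are names chosen here purely for illustration.

```python
# Per-unit sizes in KiB, copied from the cache table above.
# Key names are illustrative labels, not AMD terminology.
rdna2 = {  # Yellow Carp (Rembrandt), RDNA 2 baseline
    "vgpr_per_simd": 128,
    "l0_vector_per_cu": 16,
    "gl1_per_sa": 128,
    "l2": 2048,
}
rdna3 = {  # Navi 31/32
    "vgpr_per_simd": 192,
    "l0_vector_per_cu": 32,
    "gl1_per_sa": 256,
    "l2": 2048,
}


def growth(old: dict, new: dict) -> dict:
    """Return the new/old size ratio for every level present in both tables."""
    return {k: new[k] / old[k] for k in old if k in new}


ratios = growth(rdna2, rdna3)
for name, r in ratios.items():
    print(f"{name}: {r:.2f}x")
```

The register file grows by 1.5x, L0 and GL1 double, and L2 stays flat, which matches the prose above.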

Coelacanth-Dream also states that all RDNA 3 “Navi 3X” GPUs support VOPD (dual-issue Wave32) instructions and WMMA (Wave Matrix Multiply-Accumulate), and that performance per WGP has been improved substantially. The larger GL1 cache is said to improve pixel processing performance and is among the many changes AMD is bringing to its RDNA 3 Navi 3X GPU family.

AMD has confirmed that its RDNA 3 GPUs will arrive later this year with a large performance uplift. David Wang, the company’s Senior Vice President of Engineering, Radeon Technologies Group, said that the next-gen GPUs for the Radeon RX 7000 series will offer an over 50% performance-per-watt uplift versus existing RDNA 2 GPUs. Key RDNA 3 features highlighted by AMD include:

  • 5nm Process Node
  • Advanced Chiplet Packaging
  • Rearchitected Compute Unit
  • Optimized Graphics Pipeline
  • Next-Gen AMD Infinity Cache
  • >50% Perf/Watt vs RDNA 2

AMD will be rearchitecting the compute units in RDNA 3 to deliver enhanced ray tracing capabilities. AMD hasn’t said what these capabilities are, but if we were to guess, they likely cover both performance and a set of advanced features in the RDNA 3 GPU core for Radeon RX 7000 graphics cards. The AMD Radeon RX 7000 graphics cards are set to launch later this year and should offer a big leap in gaming performance, so stay tuned for more info in the coming weeks.

AMD RDNA 3 Navi 3X GPU Configurations (Preliminary)

| GPU Name | Navi 21 | Navi 33 | Navi 32 | Navi 31 | Navi 3X |
|---|---|---|---|---|---|
| Codename | Sienna Cichlid | Hotpink Bonefish | Wheat Nas | Plum Bonito | TBD |
| GPU Process | 7nm | 6nm | 5nm/6nm | 5nm/6nm | 5nm/6nm |
| GPU Package | Monolithic | Monolithic | MCM (1 GCD + 4 MCD) | MCM (1 GCD + 6 MCD) | MCM (TBD) |
| GPU Die Size | 520mm2 | 203mm2 (Only GCD) | 200mm2 (Only GCD) / 425mm2 (with MCDs) | 308mm2 (Only GCD) / 533mm2 (with MCDs) | TBD |
| Shader Engines | 4 | 2 | 4 | 6 | 8 |
| GPU WGPs | 40 | 16 | 30 | 48 | 64 |
| SPs Per WGP | 128 | 256 | 256 | 256 | 256 |
| Compute Units (Per Die) | 80 | 32 | 60 | 96 | 128 (per GPU) / 256 (Total) |
| Cores (Per Die) | 5120 | 4096 | 7680 | 12288 | 8192 |
| Cores (Total) | 5120 | 4096 | 7680 | 12288 | 16,384 |
| Memory Bus | 256-bit | 128-bit | 256-bit | 384-bit | 384-bit x2? |
| Memory Type | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6 |
| Memory Capacity | Up To 16 GB | Up To 8 GB | Up To 16 GB | Up To 24 GB | Up To 32 GB |
| Memory Speed | 16-18 Gbps | TBD | TBD | 20 Gbps | TBD |
| Memory Bandwidth | 512-576 GB/s | TBD | TBD | 960 GB/s | TBD |
| Infinity Cache | 128 MB | 32 MB | 64 MB | 96/192 MB | TBD |
| Flagship SKU | Radeon RX 6900 XTX | Radeon RX 7600 XT? | Radeon RX 7800 XT? / Radeon RX 7700 XT? | Radeon RX 7900 XT? | Radeon Pro |
| TBP | 330W | ~150W | ~250W | ~350W | TBD |
| Launch | Q4 2020 | Q4 2022? | Q4 2022? | Q4 2022? | 2023? |
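Several rows in this table follow from others: each RDNA WGP contains two CUs, total shader cores are WGPs × SPs per WGP, and peak memory bandwidth is bus width (bits) × per-pin data rate (Gbps) ÷ 8. The sketch below recomputes those rows for the Navi 21, 33, 32 and 31 columns; the `GpuConfig` class and its field names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuConfig:
    name: str
    wgps: int                             # work group processors
    sps_per_wgp: int                      # stream processors per WGP
    bus_width_bits: Optional[int] = None  # None where the table lists TBD
    memory_gbps: Optional[float] = None   # per-pin data rate in Gbps

    @property
    def compute_units(self) -> int:
        # An RDNA WGP contains two compute units.
        return 2 * self.wgps

    @property
    def cores(self) -> int:
        return self.wgps * self.sps_per_wgp

    @property
    def bandwidth_gbs(self) -> Optional[float]:
        if self.bus_width_bits is None or self.memory_gbps is None:
            return None
        # pins x Gbit/s per pin = Gbit/s; divide by 8 bits per byte for GB/s
        return self.bus_width_bits * self.memory_gbps / 8


navi21 = GpuConfig("Navi 21", wgps=40, sps_per_wgp=128, bus_width_bits=256, memory_gbps=16.0)
navi33 = GpuConfig("Navi 33", wgps=16, sps_per_wgp=256)
navi32 = GpuConfig("Navi 32", wgps=30, sps_per_wgp=256)
navi31 = GpuConfig("Navi 31", wgps=48, sps_per_wgp=256, bus_width_bits=384, memory_gbps=20.0)

for gpu in (navi21, navi33, navi32, navi31):
    print(gpu.name, gpu.compute_units, gpu.cores, gpu.bandwidth_gbs)
```

Plugging in the 18 Gbps upper bound for Navi 21 gives 576 GB/s, the top of the range listed in the table.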



Read original article here

Valve Fixed Elden Ring Stuttering Just For The Steam Deck

Image: Valve | Kotaku

While I guess I’m one of the lucky ones, with my performance having been pretty good so far, there are a lot of people out there having problems trying to play Elden Ring on the PC. Those playing on Valve’s Steam Deck are not among them.

Stuttering has been a huge issue for PC players since the game’s launch, even after an update, and many suspected it came down to the way the game compiles shaders, or rather how poorly it does so (not that Elden Ring is alone here: google “compile shaders PC” and you’ll find a ton of games suffering performance hits and a case of the stutters).

This doesn’t happen on consoles because, with fixed hardware (as in, everyone’s console is exactly the same, without the infinite component variations present on PC), shaders can be compiled ahead of time instead of every time you fire a game up, as so often happens with a PC game. The Steam Deck, while being a PC, is also a piece of fixed hardware, so it can enjoy the same benefits, provided Valve is able to implement them.

Which in this case they have. Last month, Valve’s Pierre-Loup Griffais showed off the Steam Deck version’s improvements in a preview build of an optimisation fix that is now live for all users.

As Griffais tells Eurogamer,

On the Linux/Proton side, we have a pretty extensive shader pre-caching system with multiple levels of source-level and binary cache representations pre-seeded and shared across users. On the Deck, we take this to the next level, since we have a unique GPU/driver combination to target, and the majority of the shaders that you run locally are actually pre-built on servers in our infrastructure. When the game is trying to issue a shader compile through its graphics API of choice, those are usually skipped, as we find the pre-compiled cache entry on disk.
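Griffais describes caches that are pre-seeded for one specific GPU/driver combination. The sketch below is a toy model of that idea, not Valve’s code: compiled binaries are keyed by an assumed (GPU, driver version, source hash) tuple, so a cache built once for fixed hardware can be shipped to every identical device and lookups skip the compiler. The `ShaderCache` class, its key layout, the `fake_compile` function and the GPU/driver strings are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (GPU model, driver version, SHA-256 of shader source).
CacheKey = Tuple[str, str, str]


class ShaderCache:
    """Toy pre-seeded shader cache; an illustration, not Valve's implementation."""

    def __init__(self, gpu: str, driver: str, compile_fn: Callable[[bytes], bytes]):
        self.gpu = gpu
        self.driver = driver
        self.compile_fn = compile_fn
        self.entries: Dict[CacheKey, bytes] = {}
        self.compiles = 0  # number of times the real compiler was invoked

    def key(self, source: bytes) -> CacheKey:
        return (self.gpu, self.driver, hashlib.sha256(source).hexdigest())

    def preseed(self, entries: Dict[CacheKey, bytes]) -> None:
        # Import binaries built elsewhere (e.g. on a server) ahead of time.
        self.entries.update(entries)

    def get_binary(self, source: bytes) -> bytes:
        k = self.key(source)
        if k not in self.entries:
            # Cache miss: compile locally, the step that causes hitches.
            self.compiles += 1
            self.entries[k] = self.compile_fn(source)
        return self.entries[k]


def fake_compile(src: bytes) -> bytes:
    return b"BIN:" + src  # stand-in for a real shader compiler


# "Server" builds the cache once for the fixed GPU/driver pair.
server = ShaderCache("deck-gpu", "driver-1", fake_compile)
shaders = [b"vs_main", b"ps_main"]
for s in shaders:
    server.get_binary(s)

# Every identical device reuses it and never calls the compiler.
device = ShaderCache("deck-gpu", "driver-1", fake_compile)
device.preseed(server.entries)
for s in shaders:
    device.get_binary(s)
print(device.compiles)
```

Because the key includes the driver version, a device on a different driver misses every pre-seeded entry and compiles locally, which is why this approach works best on a single, fixed GPU/driver target.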

That said, shader compilation turned out not to be the main issue here, contrary to what was originally thought. Instead, Griffais says it was actually down to:

Shader pipeline-driven stutter isn’t the majority of the big hitches we’ve seen in that game. The recent example we’ve highlighted has more to do with the game creating many thousand resources such as command buffers at certain spots, which was making our memory manager go into overdrive trying to handle it. We cache such allocations more aggressively now, which seems to have helped a ton.
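Valve doesn’t say exactly how its memory manager now caches these allocations. One common technique with the same effect is an object pool (free list) that recycles released objects instead of requesting new ones from the allocator. The sketch below illustrates that generic technique with a hypothetical `CommandBuffer` stand-in; it is not Valve’s implementation.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class RecyclingPool(Generic[T]):
    """Keeps released objects on a free list so later requests reuse them
    instead of going back to the (expensive) underlying allocator."""

    def __init__(self, create: Callable[[], T]):
        self._create = create
        self._free: List[T] = []
        self.allocations = 0  # calls made to the underlying allocator

    def acquire(self) -> T:
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return self._create()

    def release(self, obj: T) -> None:
        self._free.append(obj)


class CommandBuffer:  # hypothetical stand-in for a graphics API object
    def __init__(self) -> None:
        self.commands: List[str] = []


# Simulate a game creating 5000 command buffers per frame for 10 frames.
pool = RecyclingPool(CommandBuffer)
for frame in range(10):
    bufs = [pool.acquire() for _ in range(5000)]
    for b in bufs:
        b.commands.clear()  # reset state before returning it to the pool
        pool.release(b)
print(pool.allocations)
```

Only the first frame reaches the allocator; every later frame is served from the free list, so 50,000 requests cost 5,000 real allocations.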

Who could have guessed that one of this handheld PC’s biggest surprise strengths would be that it was basically built like a console? Anyway, if you want to get really into the technical details behind this, Eurogamer and Digital Foundry put together a video explaining the nitty gritty here. And if you’re currently playing on PC and still having problems, a fan-made fix turned up overnight that might help.
