Ampere (microarchitecture)
GPU microarchitecture by Nvidia
Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020, and is named after French mathematician and physicist André-Marie Ampère.[1][2]
Nvidia announced the Ampere architecture GeForce 30 series consumer GPUs at a GeForce Special Event on September 1, 2020.[3][4] Nvidia announced the A100 80 GB GPU at SC20 on November 16, 2020.[5] Mobile RTX graphics cards and the RTX 3060 based on the Ampere architecture were revealed on January 12, 2021.[6]
At GPU Technology Conference (GTC) 2021, Nvidia listed "Ampere Next Next" (Blackwell) for a 2024 release. It announced Ampere's direct successor, Hopper, at GTC 2022.
Details
Architectural improvements of the Ampere architecture include the following:
CUDA Compute Capability 8.0 for A100 and 8.6 for the GeForce 30 series[7]
TSMC's 7 nm FinFET process for A100
Custom version of Samsung's 8 nm process (8N) for the GeForce 30 series[8]
Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration.[9] Each Tensor Core performs 256 FP16 FMA operations per clock on GA100, four times the per-core throughput of the previous Tensor Core generation (two times on GA10x); the Tensor Core count is reduced to four per SM.
Second-generation ray tracing cores; concurrent ray tracing, shading, and compute for the GeForce 30 series
High Bandwidth Memory 2 (HBM2) on A100 40 GB and A100 80 GB
GDDR6X memory for GeForce RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070 Ti
Double FP32 cores per SM on GA10x GPUs
NVLink 3.0 with a throughput of 50 Gbit/s per pair[9]
PCI Express 4.0; SR-IOV support is available only on A100
Multi-instance GPU (MIG) virtualization and GPU partitioning feature in A100 supporting up to seven instances
PureVideo feature set K hardware video decoding with AV1 hardware decoding[10] for the GeForce 30 series and feature set J for A100
5 NVDEC units on A100
New hardware-based 5-core JPEG decoder (NVJPG) supporting YUV420, YUV422, YUV444, YUV400 and RGBA; not to be confused with Nvidia NVJPEG, a GPU-accelerated library for JPEG encoding and decoding
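The Tensor Core figures in the list above can be cross-checked with a little arithmetic. The following is a minimal sketch, not an official formula: it assumes four Tensor Cores per SM (512 Tensor Cores across 128 SMs in the die table), 108 enabled SMs and a 1410 MHz boost clock for the A100 (both taken from the DGX accelerator comparison later in this article). The function name `tensor_tflops` is purely illustrative.

```python
def tensor_tflops(fma_per_clock, cores_per_sm, sm_count, clock_mhz):
    """Peak dense Tensor Core throughput in TFLOPS.

    One fused multiply-add (FMA) is counted as two floating-point operations.
    """
    flops_per_clock = fma_per_clock * 2 * cores_per_sm * sm_count
    return flops_per_clock * clock_mhz * 1e6 / 1e12


# Assumed A100 configuration: 256 FP16 FMA/clock per Tensor Core,
# 4 Tensor Cores per SM, 108 enabled SMs, 1410 MHz boost clock.
a100_fp16 = tensor_tflops(256, 4, 108, 1410)
print(f"A100 FP16 dense tensor: {a100_fp16:.0f} TFLOPS")
```

Under these assumptions the estimate comes out at roughly 312 TFLOPS, which agrees with the FP16 dense tensor figure listed for the A100 in the accelerator comparison.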
Chips
GA100[11]
GA102
GA103
GA104
GA106
GA107
GA10B
Comparison of Compute Capability: GP100 vs GV100 vs GA100[12]
GPU features | Nvidia Tesla P100 | Nvidia Tesla V100 | Nvidia A100
GPU codename | GP100 | GV100 | GA100
GPU architecture | Pascal | Volta | Ampere
Compute capability | 6.0 | 7.0 | 8.0
Threads / warp | 32 | 32 | 32
Max warps / SM | 64 | 64 | 64
Max threads / SM | 2048 | 2048 | 2048
Max thread blocks / SM | 32 | 32 | 32
Max 32-bit registers / SM | 65536 | 65536 | 65536
Max registers / block | 65536 | 65536 | 65536
Max registers / thread | 255 | 255 | 255
Max thread block size | 1024 | 1024 | 1024
FP32 cores / SM | 64 | 64 | 64
Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024
Shared Memory Size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB
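The per-SM limits in the table above bound how many thread blocks can be resident on one SM at a time. The sketch below is a deliberately simplified occupancy estimate using only the A100 column: it ignores shared memory usage and register allocation granularity, which real hardware and Nvidia's own tools take into account. The helper names `resident_blocks` and `occupancy` are hypothetical.

```python
# Per-SM limits for compute capability 8.0 (A100), copied from the table above.
THREADS_PER_WARP = 32
MAX_WARPS_PER_SM = 64
MAX_BLOCKS_PER_SM = 32
MAX_REGISTERS_PER_SM = 65536
MAX_REGISTERS_PER_THREAD = 255
MAX_THREADS_PER_BLOCK = 1024


def resident_blocks(threads_per_block, regs_per_thread):
    """Estimate how many blocks fit on one SM (simplified model)."""
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("block size outside 1..1024")
    if not 1 <= regs_per_thread <= MAX_REGISTERS_PER_THREAD:
        raise ValueError("registers per thread outside 1..255")
    warps_per_block = -(-threads_per_block // THREADS_PER_WARP)  # ceiling division
    regs_per_block = regs_per_thread * warps_per_block * THREADS_PER_WARP
    return min(
        MAX_BLOCKS_PER_SM,                       # block-count limit
        MAX_WARPS_PER_SM // warps_per_block,     # warp-slot limit
        MAX_REGISTERS_PER_SM // regs_per_block,  # register-file limit
    )


def occupancy(threads_per_block, regs_per_thread):
    """Fraction of the SM's 64 warp slots filled by resident blocks."""
    warps_per_block = -(-threads_per_block // THREADS_PER_WARP)
    blocks = resident_blocks(threads_per_block, regs_per_thread)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


print(occupancy(256, 32))  # warp slots and registers both allow 8 blocks
print(occupancy(256, 64))  # the register file allows only 4 blocks
```

In this model, 256-thread blocks reach full occupancy at 32 registers per thread, while doubling register use to 64 per thread halves it, because the 65536-register file becomes the limiting resource.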
Comparison of Precision Support Matrix[13][14]
Supported CUDA Core Precisions
GPU | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16
Nvidia Tesla P4 | No | Yes | Yes | No | No | Yes | No | No
Nvidia P100 | Yes | Yes | Yes | No | No | No | No | No
Nvidia Volta | Yes | Yes | Yes | No | No | Yes | No | No
Nvidia Turing | Yes | Yes | Yes | No | No | No | No | No
Nvidia A100 | Yes | Yes | Yes | No | No | Yes | No | Yes
Supported Tensor Core Precisions
GPU | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16
Nvidia Tesla P4 | No | No | No | No | No | No | No | No
Nvidia P100 | No | No | No | No | No | No | No | No
Nvidia Volta | Yes | No | No | No | No | No | No | No
Nvidia Turing | Yes | No | No | Yes | Yes | Yes | No | No
Nvidia A100 | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes
Legend:
FPnn: floating point with nn bits
INTn: integer with n bits
INT1: binary
TF32: TensorFloat32
BF16: bfloat16
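To make the reduced-precision formats in the legend concrete: TF32 keeps FP32's 8-bit exponent but only 10 of its 23 mantissa bits, and bfloat16 keeps the 8-bit exponent with 7 mantissa bits. The sketch below emulates that loss of precision on the CPU by zeroing the low mantissa bits of a 32-bit float. It uses plain truncation as a simplifying assumption (hardware conversion may round differently), and the function names are illustrative only.

```python
import struct


def truncate_mantissa(x, kept_bits):
    """Zero all but the top `kept_bits` of a float32's 23 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - kept_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF  # clear the low `drop` bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_tf32(x):
    """Emulate TF32 precision: 8-bit exponent, 10-bit mantissa."""
    return truncate_mantissa(x, 10)


def to_bf16(x):
    """Emulate bfloat16 precision: 8-bit exponent, 7-bit mantissa."""
    return truncate_mantissa(x, 7)


# 1 + 2**-10 survives TF32 but not bfloat16; 1 + 2**-11 survives neither.
print(to_tf32(1 + 2**-10), to_bf16(1 + 2**-10))
print(to_tf32(1 + 2**-11), to_bf16(1 + 2**-11))
```

Because both formats retain the full FP32 exponent, they cover the same numeric range as FP32 and differ only in how finely values are resolved.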
Comparison of Decode Performance
Concurrent streams | H.264 decode (1080p30) | H.265 (HEVC) decode (1080p30) | VP9 decode (1080p30)
V100 | 16 | 22 | 22
A100 | 75 | 157 | 108
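The generational gain in decode capacity can be read off the table as a simple ratio of concurrent 1080p30 streams. A small sketch, using only the figures listed above (the variable names are illustrative):

```python
# Concurrent 1080p30 decode streams as (V100, A100), from the table above.
streams = {
    "H.264": (16, 75),
    "H.265 (HEVC)": (22, 157),
    "VP9": (22, 108),
}

# Ratio of A100 to V100 stream counts for each codec.
speedup = {codec: a100 / v100 for codec, (v100, a100) in streams.items()}

for codec, ratio in speedup.items():
    print(f"{codec}: {ratio:.2f}x")
```

By this measure the A100 handles roughly 4.7 times as many H.264 streams, 7.1 times as many HEVC streams and 4.9 times as many VP9 streams as the V100.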
Ampere dies
Die | GA100[15] | GA102[16] | GA103[17] | GA104[18] | GA106[19] | GA107[20] | GA10B[21] | GA10F
Die size | 826 mm² | 628 mm² | 496 mm² | 392 mm² | 276 mm² | 200 mm² | 448 mm² | ?
Transistors | 54.2B | 28.3B | 22B | 17.4B | 12B | 8.7B | 21B | ?
Transistor density | 65.6 MTr/mm² | 45.1 MTr/mm² | 44.4 MTr/mm² | 44.4 MTr/mm² | 43.5 MTr/mm² | 43.5 MTr/mm² | 46.9 MTr/mm² | ?
Graphics processing clusters | 8 | 7 | 6 | 6 | 3 | 2 | 2 | 1
Streaming multiprocessors | 128 | 84 | 60 | 48 | 30 | 20 | 16 | 12
CUDA cores | 8192 | 10752 | 7680 | 6144 | 3840 | 2560 | 2048 | 1536
Texture mapping units | 512 | 336 | 240 | 192 | 120 | 80 | 64 | 48
Render output units | 192 | 112 | 96 | 96 | 48 | 32 | 32 | 16
Tensor cores | 512 | 336 | 240 | 192 | 120 | 80 | 64 | 48
RT cores | N/A | 84 | 60 | 48 | 30 | 20 | 8 | 12
L1 cache | 24 MB | 10.5 MB | 7.5 MB | 6 MB | 3 MB | 2.5 MB | 3 MB | 1.5 MB
L1 cache per SM | 192 KB | 128 KB | 128 KB | 128 KB | 128 KB | 128 KB | 192 KB | 128 KB
L2 cache | 40 MB | 6 MB | 4 MB | 4 MB | 3 MB | 2 MB | 4 MB | 1 MB
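Several rows of the die table are derived from others, which makes them easy to sanity-check. The sketch below recomputes transistor density from die size and transistor count, and the per-SM core counts from the totals. It uses only values from the table, omits GA10F because its die size and transistor count are not listed, and the names `DIES` and `density_mtr_per_mm2` are illustrative.

```python
# (die size in mm^2, transistors in billions, SMs, CUDA cores, Tensor Cores),
# copied from the die table above.
DIES = {
    "GA100": (826, 54.2, 128, 8192, 512),
    "GA102": (628, 28.3, 84, 10752, 336),
    "GA103": (496, 22.0, 60, 7680, 240),
    "GA104": (392, 17.4, 48, 6144, 192),
    "GA106": (276, 12.0, 30, 3840, 120),
    "GA107": (200, 8.7, 20, 2560, 80),
    "GA10B": (448, 21.0, 16, 2048, 64),
}


def density_mtr_per_mm2(area_mm2, transistors_billion):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_billion * 1000 / area_mm2


for name, (area, transistors, sms, cuda, tensor) in DIES.items():
    print(
        f"{name}: {density_mtr_per_mm2(area, transistors):.1f} MTr/mm2, "
        f"{cuda // sms} CUDA cores/SM, {tensor // sms} Tensor Cores/SM"
    )
```

The per-SM results show the split described in the architecture list: GA100 has 64 FP32 CUDA cores per SM while the GA10x dies have 128, and every die has four Tensor Cores per SM.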
A100 accelerator and DGX A100
The Ampere-based A100 accelerator was announced and released on May 14, 2020.[9] The A100 features 19.5 teraflops of FP32 performance, 6912 FP32/INT32 CUDA cores, 3456 FP64 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics memory bandwidth.[22] The A100 was initially available only in the third-generation DGX server, the DGX A100, which contains eight A100s.[9] The DGX A100 also includes 15 TB of PCIe Gen 4 NVMe storage,[22] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[9]
Comparison of accelerators used in DGX:[23][24][25]
Model | P100 | V100 16GB | V100 32GB | A100 40GB | A100 80GB | H100 | H200 | B100 | B200
Architecture | Pascal | Volta | Volta | Ampere | Ampere | Hopper | Hopper | Blackwell | Blackwell
Socket | SXM/SXM2 | SXM2 | SXM3 | SXM4 | SXM4 | SXM5 | SXM5 | SXM6 | SXM6
FP32 CUDA cores | 3584 | 5120 | 5120 | 6912 | 6912 | 16896 | 16896 | N/A | N/A
FP64 cores (excl. tensor) | 1792 | 2560 | 2560 | 3456 | 3456 | 4608 | 4608 | N/A | N/A
Mixed INT32/FP32 cores | N/A | N/A | N/A | 6912 | 6912 | 16896 | 16896 | N/A | N/A
INT32 cores | N/A | 5120 | 5120 | N/A | N/A | N/A | N/A | N/A | N/A
Boost clock | 1480 MHz | 1530 MHz | 1530 MHz | 1410 MHz | 1410 MHz | 1980 MHz | 1980 MHz | N/A | N/A
Memory clock | 1.4 Gbit/s HBM2 | 1.75 Gbit/s HBM2 | 1.75 Gbit/s HBM2 | 2.4 Gbit/s HBM2 | 3.2 Gbit/s HBM2e | 5.2 Gbit/s HBM3 | 6.3 Gbit/s HBM3e | 8 Gbit/s HBM3e | 8 Gbit/s HBM3e
Memory bus width | 4096-bit | 4096-bit | 4096-bit | 5120-bit | 5120-bit | 5120-bit | 6144-bit | 8192-bit | 8192-bit
Memory bandwidth | 720 GB/sec | 900 GB/sec | 900 GB/sec | 1.52 TB/sec | 1.52 TB/sec | 3.35 TB/sec | 4.8 TB/sec | 8 TB/sec | 8 TB/sec
VRAM | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 | 40 GB HBM2 | 80 GB HBM2e | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e | 192 GB HBM3e
Single precision (FP32) | 10.6 TFLOPS | 15.7 TFLOPS | 15.7 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS | 67 TFLOPS | 67 TFLOPS | N/A | N/A
Double precision (FP64) | 5.3 TFLOPS | 7.8 TFLOPS | 7.8 TFLOPS | 9.7 TFLOPS | 9.7 TFLOPS | 34 TFLOPS | 34 TFLOPS | N/A | N/A
INT8 (non-tensor) | N/A | 62 TOPS | 62 TOPS | N/A | N/A | N/A | N/A | N/A | N/A
INT8 dense tensor | N/A | N/A | N/A | 624 TOPS | 624 TOPS | 1.98 POPS | 1.98 POPS | 3.5 POPS | 4.5 POPS
INT32 | N/A | 15.7 TOPS | 15.7 TOPS | 19.5 TOPS | 19.5 TOPS | N/A | N/A | N/A | N/A
FP4 dense tensor | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 7 PFLOPS | 9 PFLOPS
FP16 | 21.2 TFLOPS | 31.4 TFLOPS | 31.4 TFLOPS | 78 TFLOPS | 78 TFLOPS | N/A | N/A | N/A | N/A
FP16 dense tensor | N/A | 125 TFLOPS | 125 TFLOPS | 312 TFLOPS | 312 TFLOPS | 990 TFLOPS | 990 TFLOPS | 1.98 PFLOPS | 2.25 PFLOPS
bfloat16 dense tensor | N/A | N/A | N/A | 312 TFLOPS | 312 TFLOPS | 990 TFLOPS | 990 TFLOPS | 1.98 PFLOPS | 2.25 PFLOPS
TensorFloat-32 (TF32) dense tensor | N/A | N/A | N/A | 156 TFLOPS | 156 TFLOPS | 495 TFLOPS | 495 TFLOPS | 989 TFLOPS | 1.2 PFLOPS
FP64 dense tensor | N/A | N/A | N/A | 19.5 TFLOPS | 19.5 TFLOPS | 67 TFLOPS | 67 TFLOPS | 30 TFLOPS | 40 TFLOPS
Interconnect (NVLink) | 160 GB/sec | 300 GB/sec | 300 GB/sec | 600 GB/sec | 600 GB/sec | 900 GB/sec | 900 GB/sec | 1.8 TB/sec | 1.8 TB/sec
GPU | GP100 | GV100 | GV100 | GA100 | GA100 | GH100 | GH100 | GB100 | GB100
L1 Cache | 1344 KB (24 KB × 56) | 10240 KB (128 KB × 80) | 10240 KB (128 KB × 80) | 20736 KB (192 KB × 108) | 20736 KB (192 KB × 108) | 25344 KB (192 KB × 132) | 25344 KB (192 KB × 132) | N/A | N/A
L2 Cache | 4096 KB | 6144 KB | 6144 KB | 40960 KB | 40960 KB | 51200 KB | 51200 KB | N/A | N/A
TDP | 300 W | 300 W | 350 W | 400 W | 400 W | 700 W | 1000 W | 700 W | 1000 W
Die size | 610 mm² | 815 mm² | 815 mm² | 826 mm² | 826 mm² | 814 mm² | 814 mm² | N/A | N/A
Transistor count | 15.3 B | 21.1 B | 21.1 B | 54.2 B | 54.2 B | 80 B | 80 B | 208 B | 208 B
Process | TSMC 16FF+ | TSMC 12FFN | TSMC 12FFN | TSMC N7 | TSMC N7 | TSMC 4N | TSMC 4N | TSMC 4NP | TSMC 4NP
Launched | Q2 2016 | Q3 2017 | | Q1 2020 | | Q3 2022 | Q3 2023 | Q4 2024 (expected) |
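Two of the headline figures in the accelerator comparison follow from other entries in the same table. Peak FP32 throughput is approximately the core count times two operations per clock (one FMA) times the boost clock, and memory bandwidth is approximately the bus width times the per-pin data rate divided by eight. The sketch below applies these rules of thumb to the models that list all required inputs; the function names are illustrative, and the results are estimates that may differ slightly from the rounded values Nvidia publishes.

```python
def peak_fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 throughput in TFLOPS, counting one FMA as two operations."""
    return cuda_cores * 2 * boost_mhz / 1e6


def memory_bandwidth_gbs(bus_width_bits, data_rate_gbit_s):
    """Memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbit_s / 8


# (FP32 CUDA cores, boost clock in MHz, bus width in bits, data rate in Gbit/s),
# copied from the accelerator comparison above.
ACCELERATORS = {
    "P100": (3584, 1480, 4096, 1.4),
    "V100": (5120, 1530, 4096, 1.75),
    "A100 40GB": (6912, 1410, 5120, 2.4),
    "H100": (16896, 1980, 5120, 5.2),
}

for name, (cores, clock, bus, rate) in ACCELERATORS.items():
    print(
        f"{name}: {peak_fp32_tflops(cores, clock):.1f} TFLOPS FP32, "
        f"{memory_bandwidth_gbs(bus, rate):.0f} GB/s"
    )
```

For the A100 40GB this gives about 19.5 TFLOPS and 1536 GB/s, close to the 19.5 TFLOPS and 1.52 TB/sec listed in the table.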
Products using Ampere
GeForce MX series
GeForce MX570 (mobile) (GA107)
GeForce 20 series
GeForce RTX 2050 (mobile) (GA107)
GeForce 30 series
GeForce RTX 3050 Laptop GPU (GA107)
GeForce RTX 3050 (GA106 or GA107)[26]
GeForce RTX 3050 Ti Laptop GPU (GA107)
GeForce RTX 3060 Laptop GPU (GA106)
GeForce RTX 3060 (GA106 or GA104)[27]
GeForce RTX 3060 Ti (GA104 or GA103)[28]
GeForce RTX 3070 Laptop GPU (GA104)
GeForce RTX 3070 (GA104)
GeForce RTX 3070 Ti Laptop GPU (GA104)
GeForce RTX 3070 Ti (GA104 or GA102)[29]
GeForce RTX 3080 Laptop GPU (GA104)
GeForce RTX 3080 (GA102)
GeForce RTX 3080 12 GB (GA102)
GeForce RTX 3080 Ti Laptop GPU (GA103)
GeForce RTX 3080 Ti (GA102)
GeForce RTX 3090 (GA102)
GeForce RTX 3090 Ti (GA102)
Nvidia Workstation GPUs (formerly Quadro)
RTX A1000 (mobile) (GA107)
RTX A2000 (mobile) (GA106)
RTX A2000 (GA106)
RTX A3000 (mobile) (GA104)
RTX A4000 (mobile) (GA104)
RTX A4000 (GA104)
RTX A5000 (mobile) (GA104)
RTX A5500 (mobile) (GA103)
RTX A4500 (GA102)
RTX A5000 (GA102)
RTX A5500 (GA102)
RTX A6000 (GA102)
A800 Active
Nvidia Data Center GPUs (formerly Tesla)
Nvidia A2 (GA107)
Nvidia A10 (GA102)
Nvidia A16 (4 × GA107)
Nvidia A30 (GA100)
Nvidia A40 (GA102)
Nvidia A100 (GA100)
Nvidia A100 80 GB (GA100)
Nvidia A100X
Nvidia A30X
Products using Ampere (per chip)
Type | GA10B | GA107 | GA106 | GA104 | GA103 | GA102 | GA100
GeForce MX series | — | GeForce MX570 (mobile) | — | — | — | — | —
GeForce 20 series | — | GeForce RTX 2050 (mobile) | — | — | — | — | —
GeForce 30 series | — | GeForce RTX 3050 Laptop, GeForce RTX 3050, GeForce RTX 3050 Ti Laptop | GeForce RTX 3050, GeForce RTX 3060 Laptop, GeForce RTX 3060 | GeForce RTX 3060, GeForce RTX 3060 Ti, GeForce RTX 3070 Laptop, GeForce RTX 3070, GeForce RTX 3070 Ti Laptop, GeForce RTX 3070 Ti, GeForce RTX 3080 Laptop | GeForce RTX 3060 Ti, GeForce RTX 3080 Ti Laptop | GeForce RTX 3070 Ti, GeForce RTX 3080, GeForce RTX 3080 Ti, GeForce RTX 3090, GeForce RTX 3090 Ti | —
Nvidia Workstation GPUs | — | RTX A1000 (mobile) | RTX A2000 (mobile), RTX A2000 | RTX A3000 (mobile), RTX A4000 (mobile), RTX A4000, RTX A5000 (mobile) | RTX A5500 (mobile) | RTX A4500, RTX A5000, RTX A5500, RTX A6000 | —
Nvidia Data Center GPUs | — | Nvidia A2, Nvidia A16 | — | — | — | Nvidia A10, Nvidia A40 | Nvidia A30, Nvidia A100
Tegra SoCs | AGX Orin, Orin NX, Orin Nano | — | — | — | — | — | —
References
1. "NVIDIA's New Ampere Data Center GPU in Full Production". NVIDIA News. May 14, 2020.
2. Krashinsky, Ronny; Giroux, Olivier; Jones, Stephen; Stam, Nick; Ramaswamy, Sridhar (May 14, 2020). "NVIDIA Ampere Architecture In-Depth". NVIDIA Developer Blog.
3. "NVIDIA Delivers Greatest-Ever Generational Leap with GeForce RTX 30 Series GPUs". Nvidia Newsroom. September 1, 2020. Retrieved April 9, 2023.
4. "NVIDIA GeForce Ultimate Countdown". Nvidia.
5. "NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World's Most Powerful GPU for AI Supercomputing". Nvidia Newsroom. November 16, 2020. Retrieved April 9, 2023.
6. "NVIDIA GeForce Beyond at CES 2023". NVIDIA.
7. "I.7. Compute Capability 8.x". Nvidia. Retrieved September 23, 2020.
8. Bosnjak, Dominik (September 1, 2020). "Samsung's old 8nm tech at the heart of NVIDIA's monstrous Ampere cards". SamMobile. Retrieved September 19, 2020.
9. Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
10. Delgado, Gerardo (September 1, 2020). "GeForce RTX 30 Series GPUs: Ushering In A New Era of Video Content With AV1 Decode". Nvidia. Retrieved April 9, 2023.
11. Morgan, Timothy Prickett (May 29, 2020). "Diving Deep Into The Nvidia Ampere GPU Architecture". The Next Platform. Retrieved March 24, 2022.
12. "NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration at Every Scale" (PDF). Nvidia. Retrieved September 18, 2020.
13. "NVIDIA Tensor Cores: Versatility for HPC & AI". NVIDIA.
14. "Abstract". docs.nvidia.com.
15. "NVIDIA A100 Tensor Core GPU Architecture" (PDF). NVIDIA Corporation. Retrieved April 29, 2024.
16. "NVIDIA GA102 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
17. "NVIDIA GA103 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
18. "NVIDIA GA104 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
19. "NVIDIA GA106 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
20. "NVIDIA GA107 GPU Specs". TechPowerUp. Retrieved April 29, 2024.
21. "NVIDIA AGX Orin Series Technical Brief v1.2" (PDF). NVIDIA Corporation. Retrieved April 29, 2024.
22. Warren, Tom; Vincent, James (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.
23. Smith, Ryan (March 22, 2022). "NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder". AnandTech.
24. Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
25. "NVIDIA Tesla V100 tested: near unbelievable GPU power". TweakTown. September 17, 2017.
26. Wallossek, Igor (February 13, 2022). "The two faces of the GeForce RTX 3050 8GB". Igor's Lab. Retrieved February 23, 2022.
27. Shilov, Anton (September 25, 2021). "Gainward and Galax List GeForce RTX 3060 Cards With GA104 GPU". Tom's Hardware. Retrieved September 23, 2022.
28. Tyson, Mark (February 23, 2022). "Zotac Debuts First RTX 3060 Ti Desktop Cards With GA103 GPU". Tom's Hardware. Retrieved September 23, 2022.
29. WhyCry (October 26, 2022). "ZOTAC launches GeForce RTX 3070 Ti with GA102-150 GPU". VideoCardz. Retrieved May 21, 2023.
30. "Nintendo Switch 2 teardown confirms Nvidia Tegra T239 chip, SK Hynix memory, more details". TechSpot. April 24, 2025. Retrieved May 31, 2025.