AI Training

Deploying AI in real-world applications requires training networks to convergence at a specified accuracy. Training to convergence is the most rigorous test of whether an AI system is ready to be deployed in the field and deliver meaningful results.

NVIDIA Performance on MLPerf 5.1 Training Benchmarks


NVIDIA Performance on MLPerf 5.1’s AI Benchmarks: Single Node, Closed Division

| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | Llama2-70B-Lora | 6 | 0.925 Eval loss | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0058 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 8.5 | 0.925 Eval loss | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 5.1-0008 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| | | 9 | 0.925 Eval loss | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0067 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 8.9 | 0.925 Eval loss | 8x B200 | 1xXE9680Lx8B200-SXM-180GB | 5.1-0030 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama 3.1 8B | 67.4 | 3.3 log perplexity | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0058 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 75.8 | 3.3 log perplexity | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 5.1-0008 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| | | 79.3 | 3.3 log perplexity | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0067 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 84.4 | 3.3 log perplexity | 8x B200 | SYS-422GS-NBRT-LCC | 5.1-0081 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | RetinaNet | 22.3 | 34.0% mAP | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0068 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 21.5 | 34.0% mAP | 8x B200 | AS-A126GS-TNBR | 5.1-0079 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| DGL | R-GAT | 5 | 72.0% classification | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0065 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 4.9 | 72.0% classification | 8x B200 | AS-A126GS-TNBR | 5.1-0079 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 2.2 | 0.80275 AUC | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0066 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (GB200) |
| | | 2.3 | 0.80275 AUC | 8x B200 | G894-AD1 | 5.1-0040 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (B200-SXM-180GB) |

NVIDIA Performance on MLPerf 5.1’s AI Benchmarks: Multi Node, Closed Division

| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | Llama 3.1 405B | 64.6 | 5.6 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 84.9 | 5.6 log perplexity | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 18.8 | 5.6 log perplexity | 2,560x GB200 | hsg (40x NVIDIA GB200 NVL72) | 5.1-0003 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 10 | 5.6 log perplexity | 5,120x GB200 | hsg (80x NVIDIA GB200 NVL72) | 5.1-0004 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 256.3 | 5.6 log perplexity | 256x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0087 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 147.2 | 5.6 log perplexity | 448x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0089 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama2-70B-Lora | 1.2 | 0.925 Eval loss | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 0.4 | 0.925 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 1.4 | 0.925 Eval loss | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0092 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 0.5 | 0.925 Eval loss | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 5.8 | 0.925 Eval loss | 16x B200 | Nebius B200 n2 (16x B200-SXM-180GB) | 5.1-0006 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 3.1 | 0.925 Eval loss | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 1.9 | 0.925 Eval loss | 128x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0085 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama 3.1 8B | 13 | 3.3 log perplexity | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 5.2 | 3.3 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 15 | 3.3 log perplexity | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0063 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 5.4 | 3.3 log perplexity | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 51.8 | 3.3 log perplexity | 16x B200 | Nebius B200 n2 (16x B200-SXM-180GB) | 5.1-0006 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 27.8 | 3.3 log perplexity | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 18.1 | 3.3 log perplexity | 64x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0090 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 10 | 3.3 log perplexity | 256x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0087 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | Flux1 | 146.3 | 0.586 Eval loss | 16x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0059 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 44.5 | 0.586 Eval loss | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 17.1 | 0.586 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 160.7 | 0.586 Eval loss | 16x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0069 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 49.7 | 0.586 Eval loss | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0063 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 17.9 | 0.586 Eval loss | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 12.5 | 0.586 Eval loss | 1,152x GB200 | hsg (18x NVIDIA GB200 NVL72) | 5.1-0002 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 173.4 | 0.586 Eval loss | 16x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0086 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 93.2 | 0.586 Eval loss | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 54.5 | 0.586 Eval loss | 72x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0091 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | RetinaNet | 3.8 | 34.0% mAP | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0064 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 1.4 | 34.0% mAP | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0072 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 12.3 | 34.0% mAP | 16x B200 | 2xXE9680Lx8B200-SXM-180GB | 5.1-0037 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200) |
| | | 10.1 | 34.0% mAP | 128x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0085 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200) |
| DGL | R-GAT | 1.1 | 72.0% classification | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0062 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 0.8 | 72.0% classification | 256x GB200 | Tyche (4x NVIDIA GB200 NVL72) | 5.1-0070 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 3.2 | 72.0% classification | 16x B200 | 2xXE9680Lx8B200-SXM-180GB | 5.1-0037 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (B200) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 0.7 | 0.80275 AUC | 64x GB200 | SRS-GB200-NVL72-M1 (16x ARS-121GL-NBO) | 5.0-0087 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (GB200) |

MLPerf™ v5.1 Training Closed: 5.0-0087, 5.1-0002, 5.1-0003, 5.1-0004, 5.1-0006, 5.1-0007, 5.1-0008, 5.1-0030, 5.1-0037, 5.1-0040, 5.1-0057, 5.1-0058, 5.1-0059, 5.1-0060, 5.1-0062, 5.1-0063, 5.1-0064, 5.1-0065, 5.1-0066, 5.1-0067, 5.1-0068, 5.1-0069, 5.1-0070, 5.1-0071, 5.1-0072, 5.1-0079, 5.1-0081, 5.1-0085, 5.1-0086, 5.1-0087, 5.1-0089, 5.1-0090, 5.1-0091, 5.1-0092 | MLPerf name and logo are trademarks. See https://mlcommons.org/ for more information.
MLPerf Training rules and guidelines are available from MLCommons.


LLM Training Performance on NVIDIA Data Center Products


GB300 Training Performance


| Framework | Model | Time to Train (days) | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | DeepSeek v3 | 2.4 | 4,691 tokens/sec/gpu | 256x GB300 | NVIDIA DGX GB300 | nemo:26.02.01 | 4096 | 1 | 2 | 1 | 32 | FP8 | 4096 | NVIDIA GB300 |
| | GPT-OSS 120B | 0.6 | 19,366 tokens/sec/gpu | 64x GB300 | NVIDIA DGX GB300 | nemo:26.02.01 | 4096 | 1 | 1 | 1 | 64 | BF16 | 1280 | NVIDIA GB300 |
| | Qwen3 235B a22B | 1.7 | 6,583 tokens/sec/gpu | 256x GB300 | NVIDIA DGX GB300 | nemo:26.02.01 | 4096 | 1 | 4 | 1 | 32 | FP8 | 8192 | NVIDIA GB300 |
| | Kimi K2 | 2.2 | 5,072 tokens/sec/gpu | 256x GB300 | NVIDIA DGX GB300 | nemo:26.02.01 | 4096 | 1 | 4 | 1 | 64 | FP8 | 4096 | NVIDIA GB300 |

TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
EP: Expert Parallelism
Time to Train is estimated time to train on 1T tokens with 1K GPUs
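
The Time to Train figures follow directly from the measured per-GPU throughput. A minimal sketch of that conversion (the function name is illustrative, not part of NVIDIA's tooling; it assumes throughput scales linearly to 1K GPUs, as the estimate above does):

```python
def estimated_days_to_train(tokens_per_sec_per_gpu: float,
                            total_tokens: float = 1e12,
                            num_gpus: int = 1000) -> float:
    """Estimate days to train on `total_tokens` with `num_gpus`,
    scaling linearly from the measured per-GPU token rate."""
    cluster_tokens_per_sec = tokens_per_sec_per_gpu * num_gpus
    seconds = total_tokens / cluster_tokens_per_sec
    return seconds / 86400  # 86,400 seconds per day

# DeepSeek v3 on GB300: 4,691 tokens/sec/gpu -> roughly 2.5 days
# for 1T tokens on 1K GPUs, consistent with the table's 2.4
print(estimated_days_to_train(4691))
```

The same arithmetic reproduces the other rows, e.g. GPT-OSS 120B at 19,366 tokens/sec/gpu comes out to about 0.6 days.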

B200 Training Performance


| Framework | Model | Time to Train (days) | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | DeepSeek v3 | 4.1 | 2,790 tokens/sec/gpu | 256x B200 | NVIDIA DGX B200 | nemo:26.02.01 | 4096 | 1 | 16 | 1 | 8 | FP8 | 4096 | NVIDIA B200 |
| | GPT-OSS 120B | 0.8 | 13,722 tokens/sec/gpu | 64x B200 | NVIDIA DGX B200 | nemo:26.02.01 | 4096 | 1 | 1 | 1 | 8 | BF16 | 1280 | NVIDIA B200 |
| | Qwen3 30B a3B | 0.4 | 26,695 tokens/sec/gpu | 8x B200 | NVIDIA DGX B200 | nemo:26.02.01 | 4096 | 1 | 1 | 1 | 8 | FP8 | 512 | NVIDIA B200 |

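
The TP, PP, and CP columns in these tables define how many GPUs hold one model replica; whatever remains of the GPU count is the data-parallel dimension. A small sanity-check sketch (hypothetical helper; it assumes data parallelism fills the remainder, and leaves out EP, which shards MoE expert weights across ranks rather than adding to the per-replica product):

```python
def data_parallel_size(num_gpus: int, tp: int, pp: int, cp: int) -> int:
    """GPUs per model replica = TP * PP * CP; the rest is data parallelism."""
    model_parallel = tp * pp * cp
    if num_gpus % model_parallel != 0:
        raise ValueError("TP * PP * CP must divide the GPU count")
    return num_gpus // model_parallel

# DeepSeek v3 on 256x B200 with TP=1, PP=16, CP=1 -> 16 data-parallel replicas
print(data_parallel_size(256, 1, 16, 1))
```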

H100 Training Performance


| Framework | Model | Time to Train (days) | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | GPT-OSS 120B | 1.9 | 5,984 tokens/sec/gpu | 64x H100 | NVIDIA DGX H100 | nemo:26.02.01 | 4096 | 1 | 4 | 1 | 8 | BF16 | 1280 | H100-SXM5-80GB |
| | Qwen3 30B a3B | 1.2 | 9,058 tokens/sec/gpu | 16x H100 | NVIDIA DGX H100 | nemo:26.02.01 | 4096 | 1 | 2 | 1 | 8 | FP8 | 1024 | H100-SXM5-80GB |
| | Qwen3 235B a22B | 7 | 1,611 tokens/sec/gpu | 256x H100 | NVIDIA DGX H100 | nemo:26.02.01 | 4096 | 2 | 8 | 1 | 32 | FP8 | 8192 | H100-SXM5-80GB |
| | Nemotron3 Nano | 0.8 | 14,890 tokens/sec/gpu | 16x H100 | NVIDIA DGX H100 | nemo:26.02.01 | 8192 | 1 | 1 | 1 | 8 | FP8 | 1024 | H100-SXM5-80GB |


AI Inference

Real-world inference demands high throughput and low latency with maximum efficiency across use cases. An industry-leading solution lets customers quickly deploy AI models into real-world production with the highest performance from data center to edge.


AI Pipeline

NVIDIA Riva is an application framework for multimodal conversational AI services that deliver real-time performance on GPUs.
