Hardware · Apr 6, 2026 · 7 min read

A100 vs H100: Should You Pay for Hopper or Stick with Ampere? (2026)

A100 vs H100 comparison across cost, performance, FP8 support, and real AI workloads. See when H100 justifies the premium and when A100 still makes sense in 2026.

By Mercatus Compute

The A100 is six years old in 2026. It’s still everywhere.

That fact alone tells you the comparison isn’t as one-sided as the spec sheet implies. H100 is faster, newer, and more capable on every measurable axis — but A100 80GB still wins specific workloads at meaningful cost savings, and a refurbished A100 in your colo can be the right answer for projects that don’t need what H100 offers.

This guide compares them on specs, real cost, and workload-by-workload winners — and tells you when each is genuinely the right choice.

For the broader three-way comparison including H200, see A100 vs H100 vs H200.

TL;DR

Pick H100 if: you’re training foundation models, doing FP8 mixed-precision work, running inference at high throughput, or building a multi-year fleet from scratch.

Pick A100 if: you’re fine-tuning open-source models with LoRA/QLoRA, running research workloads, doing inference on small/medium models, or operating under capital constraints.

The honest median answer: if you’re buying GPUs new in 2026, you buy H100s. The A100 case is mostly for teams that already have A100s, or for teams whose workloads genuinely don’t need Hopper-generation features.

Specs head-to-head

| Specification | A100 80GB SXM4 | H100 SXM5 | Delta |
| --- | --- | --- | --- |
| Architecture | Ampere | Hopper | One generation |
| Released | 2020 | 2022 | 2 years |
| Process node | TSMC 7nm | TSMC 4N | Smaller node |
| Memory | 80 GB HBM2e | 80 GB HBM3 | Same capacity, faster generation |
| Memory bandwidth | 2.0 TB/s | 3.35 TB/s | +67% |
| FP16 Tensor (dense) | 312 TFLOPS | 989 TFLOPS | +217% |
| BF16 Tensor (dense) | 312 TFLOPS | 989 TFLOPS | +217% |
| FP8 Tensor (dense) | Not supported | 1,979 TFLOPS | A100 lacks FP8 |
| TDP | 400 W | 700 W | +75% |
| NVLink bandwidth | 600 GB/s | 900 GB/s | +50% |
| OEM price (2026, new) | n/a (mostly retired from new sales) | $25,000 – $30,000 | n/a |
| Refurb / used (2026) | $8,000 – $15,000 | $18,000 – $22,000 | A100 ~50% cheaper |
| Cloud $/hr (typical) | $1.10 – $1.80 | $1.99 – $3.50 | H100 ~75% premium |

The headline gap is 3.2× compute at 1.75× power — H100 delivers materially more performance per watt. The deeper gap is FP8 support: A100 doesn’t have it.

Why FP8 matters more than the compute number suggests

The Transformer Engine on H100 is the under-discussed reason A100 has fallen behind for new training runs. Here’s what it does in practice:

Most modern transformer training runs use FP8 mixed precision. The Transformer Engine automatically casts tensors between higher precision and FP8 based on numerical sensitivity, achieving up to ~6× the throughput of FP16 training on A100-class hardware without measurable accuracy loss.
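To make that concrete, here is a minimal sketch of what enabling FP8 looks like with NVIDIA's Transformer Engine library for PyTorch. It assumes a Hopper-class GPU with Transformer Engine installed; the layer dimensions and recipe settings are illustrative placeholders, not tuned values.

```python
# Minimal sketch: FP8 mixed precision via NVIDIA Transformer Engine (PyTorch).
# Requires a Hopper-class (H100 or newer) GPU; A100 has no FP8 tensor cores,
# so this code path is simply unavailable there.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: FP8 scaling factors come from recent amax history.
fp8_recipe = recipe.DelayedScaling(margin=0, amax_history_len=16)

# te.Linear is a drop-in replacement for torch.nn.Linear with FP8 support.
# 4096x4096 is an illustrative size, not a specific model's layer.
layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

# Inside fp8_autocast, eligible GEMMs run in FP8; numerically sensitive
# operations and master weights stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()  # gradients flow as usual
```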

A100 doesn’t support FP8. It has to fall back to FP16 (or BF16) for the same workload. That has two consequences:

Throughput. A100 runs at FP16/BF16 throughput (312 TFLOPS), while H100 runs at FP8 throughput (1,979 TFLOPS dense). For FP8-amenable training, H100 is ~6× faster than A100, not the 3.2× that pure-compute comparisons suggest.

Memory pressure. FP16 weights take twice the memory of FP8 weights. A 70B model that fits comfortably in FP8 on H100 requires significantly more memory in FP16 on A100 — often forcing additional parallelism strategies that come with their own throughput costs.
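To put rough numbers on that, here is the back-of-envelope arithmetic for weight storage alone; optimizer state, activations, and KV cache all come on top of this, and the parameter count is the only input.

```python
# Back-of-envelope weight memory for a 70B-parameter model.
# Weights only; optimizer state, activations, and KV cache are extra.
params = 70e9

fp8_gb  = params * 1 / 1e9   # 1 byte per parameter in FP8
fp16_gb = params * 2 / 1e9   # 2 bytes per parameter in FP16/BF16

print(f"FP8 weights : {fp8_gb:.0f} GB")   # ~70 GB
print(f"FP16 weights: {fp16_gb:.0f} GB")  # ~140 GB, more than one 80 GB card
```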

For workloads that don’t use FP8 (most fine-tuning, smaller-scale training, inference of older models that weren’t optimized for FP8), the Transformer Engine doesn’t activate, and the gap narrows back to the 3.2× pure-compute number. That’s when A100 looks competitive.

When A100 is still the right answer in 2026

The A100 case in 2026 isn’t “it’s still good.” It’s: for these specific workloads, A100 delivers acceptable performance at materially lower cost.

Fine-tuning open-source models with LoRA / QLoRA

LoRA fine-tuning trains only a small fraction of the model's parameters. The frozen base model still runs forward and backward passes on the GPU; only the small adapter weights receive updates. This workload:

  • Doesn’t saturate H100 compute
  • Doesn’t need FP8 (LoRA training is typically BF16)
  • Runs in 80GB memory comfortably for 7B–30B base models
  • Finishes in similar wall-clock time on A100 and H100

For LoRA fine-tuning, an A100 80GB rented at $1.20/hr finishes the job in roughly the same wall-clock time as an H100 at $2.50/hr. The cost difference is real and the performance difference isn't.
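For readers who haven't set this up before, here is a minimal sketch of the kind of LoRA configuration this section describes, using Hugging Face's peft and transformers libraries. The model name, adapter rank, and target modules are placeholders rather than recommendations.

```python
# Minimal LoRA fine-tuning setup sketch (Hugging Face peft + transformers).
# Model name, rank, and target modules are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",        # placeholder; any 7B-30B causal LM
    torch_dtype=torch.bfloat16,          # BF16 throughout; FP8 never activates
    device_map="auto",
)

lora_cfg = LoraConfig(
    r=16,                                # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], # attention projections only
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()       # typically well under 1% of params
```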

Inference of small and medium models

Llama 3 8B, Mistral 7B, smaller fine-tunes — these models don’t saturate H100. A100 throughput at typical batch sizes is sufficient for production serving. Going to H100 buys you compute headroom you won’t use.

Research, exploration, dev work

Variable workloads, frequent experimentation, and sensitivity to cost. A100s from low-cost providers or the secondary market are the practical answer. Save the H100 budget for the workloads that need it.

Capital-constrained academic and startup work

A refurbished A100 80GB at $10,000 versus a new H100 at $28,000: for a startup or research group on a fixed budget, that gap can determine whether you own compute at all. The A100 is “good enough” for most use cases short of training a frontier model.

When you already have an A100 fleet

The strongest A100 case is operational: if you already operate A100s, the marginal cost of using them is low. Migrating workloads to H100 only makes sense for FP8-relevant training and high-throughput inference where the throughput delta justifies the cost.

When H100 is unambiguously better

The flip side: H100 is the right answer for these workloads, and the cost gap is justified.

Training foundation models from scratch

Any frontier-class training run in 2026 happens on H100 (or H200, or Blackwell-generation hardware). FP8 throughput, NVLink 4 bandwidth for tensor parallelism, and dense compute all compound at scale. A 70B+ training run that takes 30 days on H100 takes 90+ days on A100, and the cost arithmetic favors H100 once you account for that time.

High-throughput inference of large models

Llama 3 70B+ inference at production scale needs H100 throughput, especially with FP8 quantization. Falling back to A100 adds latency that shows up directly in user-facing metrics.
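As a rough illustration, this is what FP8-quantized serving of a large model can look like with a framework such as vLLM, assuming a vLLM build that exposes an FP8 quantization option on Hopper-class hardware. The model name, parallelism degree, and sampling settings are placeholders.

```python
# Sketch: FP8-quantized serving of a large model on H100-class GPUs with vLLM.
# Assumes a vLLM build with FP8 support; model and parallelism are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder large model
    quantization="fp8",        # FP8 weights/activations need Hopper or newer
    tensor_parallel_size=4,    # split the model across 4 GPUs over NVLink
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain NVLink in one paragraph."], params)
print(outputs[0].outputs[0].text)
```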

Multi-year fleet builds

If you’re acquiring GPUs for a 3+ year operational horizon starting in 2026, H100 is the default. A100s are 6 years old at the start; by year 3, they’ll be 9 years old and on the depreciation tail. H100 holds its value better and has a longer software-stack support window ahead of it.

Cost comparison — when does the H100 premium pay back?

The simple version: H100 is 75–100% more expensive than A100 in the cloud, but for FP8-amenable workloads it’s 3–6× faster. Throughput-per-dollar tilts heavily toward H100 for those workloads.

A worked example: a full fine-tune of a 13B model, run to convergence.

|  | A100 80GB | H100 SXM5 |
| --- | --- | --- |
| Cloud rate | $1.50/hr (specialty provider) | $2.80/hr |
| Wall-clock time | 96 hours | 32 hours (FP8 + raw compute) |
| Total compute cost | $144 | $90 |

H100 wins on absolute cost because the wall-clock advantage dominates the per-hour premium. This pattern holds for almost any FP8-amenable training workload.

For LoRA fine-tuning where FP8 doesn’t activate:

|  | A100 80GB | H100 SXM5 |
| --- | --- | --- |
| Cloud rate | $1.50/hr | $2.80/hr |
| Wall-clock time | 24 hours | 22 hours |
| Total compute cost | $36 | $62 |

A100 wins because the wall-clock is similar and the per-hour rate is half.

The pattern: H100’s cost premium pays back when its features (FP8, raw compute, NVLink bandwidth) actually activate for your workload. When they don’t, A100 wins on cost.
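The same logic as a few lines of arithmetic, using the illustrative rates and wall-clock times from the tables above rather than live quotes:

```python
# Cost comparison sketch using the illustrative numbers from the tables above.
def job_cost(rate_per_hr: float, hours: float) -> float:
    return rate_per_hr * hours

# FP8-amenable full fine-tune: H100's speedup outweighs its hourly premium.
print(job_cost(1.50, 96))  # A100: $144
print(job_cost(2.80, 32))  # H100: ~$90

# LoRA fine-tune: wall-clock is similar, so the cheaper hourly rate wins.
print(job_cost(1.50, 24))  # A100: $36
print(job_cost(2.80, 22))  # H100: ~$62

# Break-even speedup at these rates: 2.80 / 1.50
print(2.80 / 1.50)         # ~1.87
```

At these example rates, H100 only needs to be roughly 1.9× faster than A100 on your workload to break even on cost; anything beyond that is savings.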

The secondary market for A100s

One thing that’s changed in 2026: the A100 secondary market has matured. Refurbished A100 80GB cards now trade in a real market at $8,000–$15,000, with reasonable warranty terms (typically 12-month limited).

This changes the A100 case substantially. A new H100 at $28,000 vs a refurb A100 at $10,000 — for workloads that don’t need Hopper, the 2.8× capex differential dominates. For an academic group, a research startup, or a team building dev infrastructure, the A100 secondary market is a genuine option.

The risk: refurb cards have unknown remaining useful life. NVIDIA’s stated lifespan for A100 in production is 5–7 years; in 2026, refurbs are roughly halfway through that window. For 1–2 year horizons, refurb A100 makes sense. For 4+ year horizons, buying new H100 is safer.

For the underlying depreciation curve, see GPU Depreciation: How Fast Do H100s Lose Value?

What this means for buyers (and providers)

If you’re an API buyer rather than a GPU operator: providers running A100 fleets typically offer cheaper inference for smaller models, while H100-fleet providers handle large model serving better. Mercatus’s Token Index makes this visible — you can see per-model pricing across providers and infer which underlying hardware they’re running.

If you operate a fleet of A100s: there’s still a buyer base for them. Smaller models, fine-tuning, research workloads. Listing A100 capacity on Mercatus reaches that audience without sales overhead.

→ Become a Provider

For the broader market thesis on why all of this is opening up, see The Open AI Compute Economy.

Frequently Asked Questions

Is the A100 obsolete in 2026?

No. It’s a generation behind, but for fine-tuning, smaller-model inference, and research, A100 80GB is still capable hardware. The case for A100 in 2026 is value-driven: 50%+ cheaper than H100 for workloads that don’t activate Hopper-generation features.

Can I train a foundation model on A100s?

Technically yes; practically no. Foundation-model training runs without FP8 take ~3× longer on A100 than on H100, often shifting the timeline from 30 days to 90 days. The cost arithmetic favors H100 for any training run at frontier scale.

Should I buy refurbished A100s in 2026?

For 1–2 year horizons with budget constraints and workloads that don’t need FP8: yes, refurb A100s at $8K–$15K are a defensible option. For 4+ year horizons or production-critical work, buy new H100. The remaining lifespan on refurb cards is the risk.

How much faster is H100 than A100 for inference?

For small/medium models: ~2× faster, but most workloads don’t saturate H100 anyway. For large models (70B+) with FP8 quantization: 3–6× faster. For long-context inference: H200 outperforms both — see H100 vs H200 Cost: Is the Upgrade Worth It?

What about A100 40GB vs 80GB?

The 40GB variant is harder to recommend in 2026. Most modern open-source models (7B+) hit memory limits faster on 40GB. The 80GB variant is the version that’s still useful. If you’re acquiring A100s now, get the 80GB.

Where do I find current A100 and H100 pricing?

Mercatus GPU Index tracks live pricing across 22+ providers. The cross-provider spread is large for both cards (~3× for A100, ~2.5× for H100), so the price you see at a hyperscaler is rarely the best available.

Methodology

Specifications sourced from NVIDIA’s A100 and H100 datasheets. Cloud pricing from Mercatus GPU Index, May 2026. Refurb pricing reflects public quotes from secondary-market resellers and broker channels. Last verified: 2026-05-04. Methodology: https://docs.mercatus-ai.com/methodology