The Hidden Cost of JSON

I found json.Marshal at the top of my flame graph and thought the profiler was broken.

It wasn't. Not my business logic. Not my database queries. Not even the network I/O. JSON serialization — the thing I'd never thought twice about — was the single largest CPU consumer in my service.

I closed the profiler, convinced I'd misconfigured something. I ran it again. Same result. Then I built a proper benchmark harness, tested eight Go JSON libraries, looked at how Java and C++ handle the same problem, and discovered something I wish I'd known years ago: the default JSON parser, whether it ships with the language or is simply the one everyone reaches for, is never the fastest option. Not in Go. Not in Java. Not anywhere.

Why Nobody Notices

JSON is the universal data format. Every API speaks it. Every microservice produces it. The standard library's json.Marshal and json.Unmarshal are the default choice — they're built into the language, well-documented, and require zero configuration. They work with any struct you define. You write one line of code and move on.

That convenience is the trap.

The standard library optimizes for generality. It has to support every possible type in the language without code generation, without a build step, without configuration. To do that, it uses reflection. encoding/json parses your struct tags once per type and caches what it learns, but on every call it still walks your values through reflect.Value, field by field, dynamically dispatching to the right encoder for each one. Every. Single. Call.

This isn't negligence on the part of the standard library authors. It's the correct tradeoff for a library that needs to work with anything. But "correct" isn't the same as "fast." And in a hot path, the difference compounds into something real.
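To make that per-call work visible, here is a toy reflective encoder. It is a sketch, not encoding/json: it handles only flat structs of strings, ints and float64s, and the Bet type is hypothetical. Like encoding/json, it caches the per-type field list so tags are parsed once, but every call still reads each field through reflect and switches on its kind.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
	"sync"
)

// fieldPlan records what the encoder learned about one exported struct field.
type fieldPlan struct {
	index int    // position of the field in the struct
	name  string // JSON key from the `json` tag, or the Go field name
}

// planCache holds one []fieldPlan per reflect.Type, so tags are parsed once per type.
var planCache sync.Map

func plansFor(t reflect.Type) []fieldPlan {
	if cached, ok := planCache.Load(t); ok {
		return cached.([]fieldPlan)
	}
	var plans []fieldPlan
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue // like encoding/json, skip unexported fields
		}
		name := f.Name
		// Only the key name is honored; options such as omitempty and "-" are ignored.
		if tagName, _, _ := strings.Cut(f.Tag.Get("json"), ","); tagName != "" {
			name = tagName
		}
		plans = append(plans, fieldPlan{index: i, name: name})
	}
	planCache.Store(t, plans)
	return plans
}

// encode expects a flat struct value whose fields are strings, ints or float64s.
func encode(v any) string {
	rv := reflect.ValueOf(v)
	var b strings.Builder
	b.WriteByte('{')
	for i, p := range plansFor(rv.Type()) {
		if i > 0 {
			b.WriteByte(',')
		}
		// strconv.Quote matches JSON escaping for plain ASCII, not for every string.
		b.WriteString(strconv.Quote(p.name))
		b.WriteByte(':')
		fv := rv.Field(p.index)
		switch fv.Kind() { // runtime dispatch on every field, on every call
		case reflect.String:
			b.WriteString(strconv.Quote(fv.String()))
		case reflect.Int, reflect.Int64:
			b.WriteString(strconv.FormatInt(fv.Int(), 10))
		case reflect.Float64:
			b.WriteString(strconv.FormatFloat(fv.Float(), 'g', -1, 64))
		default:
			panic("toy encoder: unsupported kind " + fv.Kind().String())
		}
	}
	b.WriteByte('}')
	return b.String()
}

// Bet is a hypothetical payload type for this sketch.
type Bet struct {
	ID     int     `json:"id"`
	Market string  `json:"market"`
	Odds   float64 `json:"odds"`
}

func main() {
	fmt.Println(encode(Bet{ID: 7, Market: "1X2", Odds: 1.85})) // {"id":7,"market":"1X2","odds":1.85}
}
```

Even with the cache, each call pays for reflect.ValueOf, a Field lookup and a Kind switch per field. That per-value work is exactly what code generation removes.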

What Reflection Actually Costs You

Reflection has three costs:

CPU. The library has to resolve your struct's fields and tags and dispatch to type-specific encoders. encoding/json caches that per-type plan after the first call, but reading each field through reflect and calling each encoder indirectly still happens at runtime, on every call, even though the struct never changes.

Memory. Reflective serialization typically allocates more than code written for a specific type, because values pass through interface{} and reflect.Value and often escape to the heap as a result. These allocations add to heap pressure and increase the frequency and duration of garbage collection pauses.

Predictability. Reflection makes your serialization latency more variable. The same struct can take different amounts of time to marshal depending on cache state, GC state, and, on the JVM, the JIT compiler's decisions. In a system where tail latency matters, and in most production systems it does, that variability can hurt more than the average cost.
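Averages hide this. A minimal way to look at the spread instead is to time individual calls and read off percentiles. The sketch below assumes a hypothetical Quote type and times encoding/json's Marshal; time.Now overhead and timer resolution add noise at this scale, so treat the output as a rough picture rather than a benchmark.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// Quote is a hypothetical payload used only for this sketch.
type Quote struct {
	EventID int               `json:"event_id"`
	Market  string            `json:"market"`
	Prices  []float64         `json:"prices"`
	Tags    map[string]string `json:"tags"`
}

// marshalLatencies times n separate json.Marshal calls and returns the samples sorted.
func marshalLatencies(q Quote, n int) []time.Duration {
	samples := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		if _, err := json.Marshal(q); err != nil {
			panic(err)
		}
		samples = append(samples, time.Since(start))
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })
	return samples
}

// percentile returns the sample at index floor(p*(len-1)) of a non-empty sorted slice.
func percentile(sorted []time.Duration, p float64) time.Duration {
	return sorted[int(p*float64(len(sorted)-1))]
}

func main() {
	q := Quote{
		EventID: 42,
		Market:  "match_winner",
		Prices:  []float64{1.85, 3.40, 4.20},
		Tags:    map[string]string{"league": "EPL"},
	}
	s := marshalLatencies(q, 10_000)
	fmt.Printf("p50=%v p99=%v p99.9=%v max=%v\n",
		percentile(s, 0.50), percentile(s, 0.99), percentile(s, 0.999), s[len(s)-1])
}
```

If p99 sits far above p50 for the same input, that gap is the variability the average conceals.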

The irony is that compiled languages give you the tool to eliminate all three: the compiler already knows your type at compile time. Code generation just asks it to write the serialization code once, ahead of time, instead of rediscovering it on every call.

I Measured Eight JSON Libraries in Go

I built a benchmark harness and ran one payload (a nested sports-betting struct with 30+ types, ~3.5KB of JSON) through eight Go JSON libraries, 100,000 messages each, on an Apple M5 Pro. Here's what the numbers said for five of them (lower ns/op is better; speedups are relative to encoding/json):

Library	Marshal (ns/op)	vs stdlib	Unmarshal (ns/op)	vs stdlib
`encoding/json`	4,176	1.0×	6,417	1.0×
`goccy/go-json`	2,473	1.7×	2,904	2.2×
`sonic.Fastest`	1,131	3.7×	3,274	2.0×
`jsoniter`	1,145	3.6×	3,622	1.8×
`easyjson`	1,008	4.1×	2,143	3.0×

easyjson doesn't just win on speed. It also allocates roughly 36% less memory per operation: 2,197 bytes vs 3,453–3,470 for the other libraries. That matters because less allocation means less GC pressure, which means fewer and shorter GC cycles, which means better and more predictable tail latency.
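Per-operation numbers like ns/op, B/op and allocs/op are what Go's benchmarking machinery reports. The harness behind the table isn't shown, so the following is only a minimal sketch of collecting the same three metrics with testing.Benchmark, which can run outside go test. It uses a small hypothetical Market payload and encoding/json; comparing another library would mean changing the calls inside the loops.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// Market and Selection are a hypothetical, much smaller stand-in payload.
type Selection struct {
	ID    int     `json:"id"`
	Name  string  `json:"name"`
	Price float64 `json:"price"`
}

type Market struct {
	ID         int         `json:"id"`
	Name       string      `json:"name"`
	Selections []Selection `json:"selections"`
}

var sample = Market{
	ID:   1,
	Name: "Match Winner",
	Selections: []Selection{
		{ID: 1, Name: "Home", Price: 1.85},
		{ID: 2, Name: "Draw", Price: 3.40},
		{ID: 3, Name: "Away", Price: 4.20},
	},
}

func benchMarshal(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := json.Marshal(&sample); err != nil {
			b.Fatal(err)
		}
	}
}

func benchUnmarshal(b *testing.B) {
	data, err := json.Marshal(&sample)
	if err != nil {
		b.Fatal(err)
	}
	b.ReportAllocs()
	b.ResetTimer() // exclude the setup Marshal from time and allocation counts
	for i := 0; i < b.N; i++ {
		var m Market
		if err := json.Unmarshal(data, &m); err != nil {
			b.Fatal(err)
		}
	}
}

// runBenchmarks runs both benchmarks; each takes roughly a second by default.
func runBenchmarks() (marshal, unmarshal testing.BenchmarkResult) {
	return testing.Benchmark(benchMarshal), testing.Benchmark(benchUnmarshal)
}

func report(name string, r testing.BenchmarkResult) {
	fmt.Printf("%-10s %8d ns/op %6d B/op %4d allocs/op\n",
		name, r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
}

func main() {
	m, u := runBenchmarks()
	report("Marshal", m)
	report("Unmarshal", u)
}
```

For a real comparison, the usual route is `go test -bench=. -benchmem` with one benchmark function per library and your production payload.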

What's interesting is how each library achieves its speed:

encoding/json: Pure reflection. Inspects every field at runtime. No shortcuts. Correct and general — the right default for most code.
sonic: Uses a JIT compiler that generates optimized serialization code at runtime, built on AVX2 SIMD instructions. On amd64 servers it can be 5–6× the standard library. But here's the catch: on arm64, including Apple Silicon Macs and AWS Graviton instances, that AVX2 JIT isn't available. Sonic falls back to a reflection-like path and, in the run above, lands at 2.0–3.7× stdlib. Your architecture matters more than the benchmark chart suggests.
jsoniter: Iterator-based API that avoids reflection on the hot path. Excellent marshal performance, within about 1% of sonic. But unmarshal is weaker: 1.8× stdlib with 105 allocations per operation, nearly double everyone else's. If your workload is marshal-heavy, jsoniter is a strong no-codegen option, essentially tied with sonic on this machine.
easyjson: Code generation. At build time, it reads your struct definitions and emits optimized MarshalJSON/UnmarshalJSON methods that hardcode every field access. No reflection. No indirection. No runtime type inspection. The compiler inlines what it can, and the generated code is essentially a flat sequence of writes: a constant field-name prefix, then the field's value through a type-specific writer. The downside: you need to rerun the code generator whenever your structs change. In Go, that's one //go:generate directive or one line in your Makefile.
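To show what "hardcoded field access" means, here is a hand-written method in the spirit of generated code. It is not easyjson's actual output (easyjson writes through its own jwriter package); it uses only the standard library, and the Outcome type is hypothetical. Field names are baked in as constants and every field is read directly, so nothing is inspected at runtime.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Outcome is a hypothetical type; the methods below are hand-written in the
// style of generated code, not real easyjson output.
type Outcome struct {
	ID    int64   `json:"id"`
	Name  string  `json:"name"`
	Price float64 `json:"price"`
}

// AppendJSON writes o as JSON onto buf using constant keys and direct field reads.
func (o Outcome) AppendJSON(buf []byte) []byte {
	buf = append(buf, `{"id":`...)
	buf = strconv.AppendInt(buf, o.ID, 10)
	buf = append(buf, `,"name":`...)
	// strconv.AppendQuote matches JSON escaping for plain ASCII only; real
	// generators implement JSON's exact string escaping rules.
	buf = strconv.AppendQuote(buf, o.Name)
	buf = append(buf, `,"price":`...)
	// 'g' output is valid JSON for finite floats; NaN and Inf are not handled.
	buf = strconv.AppendFloat(buf, o.Price, 'g', -1, 64)
	return append(buf, '}')
}

// MarshalJSON makes encoding/json use the fast path automatically.
func (o Outcome) MarshalJSON() ([]byte, error) {
	return o.AppendJSON(make([]byte, 0, 64)), nil
}

func main() {
	out, err := json.Marshal(Outcome{ID: 1, Name: "Home", Price: 1.85})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"id":1,"name":"Home","price":1.85}
}
```

Because Outcome implements json.Marshaler, json.Marshal calls this method, although encoding/json still validates and compacts the returned bytes. easyjson's own easyjson.Marshal entry point calls the generated writer directly and skips that step.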

This Pattern Repeats in Every Language

I spent some time looking at what other languages do, and the pattern is identical.

Java is the most dramatic example. org.json is the simplest — pure reflection, no dependencies, easy to use. It's also the slowest. Jackson is the default for most Spring Boot applications, typically running 2–3× faster than org.json with default settings. But Jackson with Afterburner or Blackbird — modules that generate bytecode at startup instead of using reflection — can add another 30–50%. DSL-JSON goes further: it generates serialization code at compile time via annotation processing, eliminating reflection entirely. The performance gap between org.json and DSL-JSON on complex payloads can exceed 10×.

The Java ecosystem has the same hierarchy: reflection-based (org.json, Gson) → bytecode generation (Jackson+Afterburner) → compile-time codegen (DSL-JSON, Moshi with codegen). The more the compiler knows ahead of time, the less work happens at runtime.

C++ takes this to its logical conclusion. nlohmann/json — the most popular C++ JSON library — is beautifully designed, header-only, and uses modern C++ idioms. It's also 25× slower than simdjson, which uses SIMD vector instructions to parse JSON at over 3 GB/s on a single core. The difference isn't just SIMD — it's architectural. nlohmann/json builds an intermediate DOM tree in memory. simdjson walks the raw bytes on-demand, giving you values only when you ask for them. It never builds a tree it doesn't need.
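Go's standard library can't demonstrate SIMD parsing, but the DOM-versus-on-demand distinction can be sketched with encoding/json. The first function below builds a full generic tree and then reads one key; the second walks the top-level object with Decoder.Token and decodes only the requested value, skipping the rest into json.RawMessage. This is an analogy, not how simdjson works: the streaming version still scans every byte before the key and isn't guaranteed to be faster, but it never builds maps or []any values for parts you didn't ask for.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fieldDOM decodes the whole document into a generic tree, then reads one key.
func fieldDOM(doc, key string) (any, error) {
	var tree map[string]any
	if err := json.Unmarshal([]byte(doc), &tree); err != nil {
		return nil, err
	}
	return tree[key], nil
}

// fieldStreaming walks the top-level object token by token and decodes only
// the value stored under key. Nested objects are not searched.
func fieldStreaming(doc, key string) (any, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	if tok, err := dec.Token(); err != nil {
		return nil, err
	} else if tok != json.Delim('{') {
		return nil, fmt.Errorf("expected object, got %v", tok)
	}
	for dec.More() {
		tok, err := dec.Token() // in key position, Token returns a string
		if err != nil {
			return nil, err
		}
		if name, _ := tok.(string); name == key {
			var v any
			if err := dec.Decode(&v); err != nil {
				return nil, err
			}
			return v, nil
		}
		var skip json.RawMessage // consume the value without building a tree
		if err := dec.Decode(&skip); err != nil {
			return nil, err
		}
	}
	return nil, nil // key not present
}

func main() {
	doc := `{"id":7,"meta":{"venue":"Anfield","tags":[1,2,3]},"odds":1.85}`
	a, err := fieldDOM(doc, "odds")
	if err != nil {
		panic(err)
	}
	b, err := fieldStreaming(doc, "odds")
	if err != nil {
		panic(err)
	}
	fmt.Println(a, b)
}
```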

When the Standard Library Is the Right Choice

I'm not telling you to rip out encoding/json from every file in your codebase. The standard library is the right choice when:

JSON is not on your hot path. If your service spends 95% of its time in the database and 1% in JSON serialization, a 4× speedup in that 1% saves only 0.75% of total CPU time, which is not worth the dependency.
Your payloads are small and your throughput is low.
You're prototyping, or building internal tools where correctness and minimal dependencies matter more than throughput.
You have a small number of types and they don't change often.
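The first point in that list is Amdahl's law in disguise: speeding up a fraction f of your CPU time by a factor s speeds up the whole program by 1 / ((1 - f) + f/s). A small sketch of that arithmetic, with the 30% share as a purely hypothetical comparison point:

```go
package main

import "fmt"

// overallSpeedup applies Amdahl's law: a fraction f of total time sped up by
// factor s gives a whole-program speedup of 1 / ((1-f) + f/s).
func overallSpeedup(f, s float64) float64 {
	return 1 / ((1 - f) + f/s)
}

// cpuSaved is the share of total CPU time that disappears: f * (1 - 1/s).
func cpuSaved(f, s float64) float64 {
	return f * (1 - 1/s)
}

func main() {
	for _, f := range []float64{0.01, 0.30} {
		fmt.Printf("JSON share %.0f%%, 4x faster JSON: %.4fx overall, %.2f%% CPU saved\n",
			f*100, overallSpeedup(f, 4), cpuSaved(f, 4)*100)
	}
}
```

For the 1% case, a 4× faster library saves 0.01 × 0.75 = 0.75% of total CPU time, an overall speedup of about 1.0076×. The same library in a service that spends 30% of its time in JSON would save 22.5%.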

But if you've ever opened a CPU profile and seen json.Marshal near the top, or json.Unmarshal showing up in your allocation profile, the fix may be as small as swapping one import for a drop-in replacement like goccy/go-json. That's not a micro-optimization. That's free performance you're leaving on the table.

Five Rules for JSON Performance

The standard library is never the fastest option. In every compiled language with a reflective JSON parser, there exists a library that's 2–10× faster. The gap is structural, not incidental.
Speed comes from eliminating runtime reflection. Whether it's compile-time code generation (easyjson, DSL-JSON), JIT compilation (sonic), bytecode instrumentation (Jackson Afterburner), or SIMD-optimized parsing (simdjson), the playbook is the same: move decisions from runtime to build time.
Architecture matters. The library that wins on an x86 server might be ordinary on AWS Graviton or Apple Silicon. Sonic's AVX2 JIT is the most vivid example: 5–6× on amd64, 2.0–3.7× on arm64 in my run. Test on the hardware you deploy to.
Allocation pressure is as important as throughput. A library that's 3× faster but generates 2× the garbage can hurt your tail latency through GC pauses. easyjson wins not just on speed but on memory: roughly 36% fewer bytes allocated per operation.
Measure before you switch. The right library is the one that improves your actual bottleneck, not the one that wins a benchmark. Profile first. Then reach for the faster JSON library.

JSON is the most successful data format in the history of software. It's simple, human-readable, and universally supported. But its simplicity hides a cost — a cost paid in CPU cycles, memory allocations, and garbage collection pauses. The standard library ships with a parser that's correct, safe, and general. Those are good defaults. They're not fast defaults. And in a world where every API call touches json.Marshal and json.Unmarshal, the difference between "correct" and "fast" compounds into something you can measure — and something you can fix.

Of course, the other option is to stop using JSON on the hot path entirely. Binary formats like Protobuf, MessagePack, and Avro attack the problem at the protocol level: no text to tokenize, numbers stored in binary rather than as digits, smaller wire size, and, in schema-based formats like Protobuf and Avro, no field names repeated in every message. But that's a bigger change than swapping one import, and it comes with its own tradeoffs: tooling, debuggability, schema management. I'll write about that comparison next.

The next time you import encoding/json, ask yourself: is this on my hot path? If it is, you're paying a tax you might not know about.