Chasing Rabbits

A poorly updated blog about what I’m working on

Rewriting Rust Serialization, Part 2: Performance

As I said in the last post, Rust’s serialize library, and specifically serialize::json, is pretty slow. Back when I started this project a number of months ago, I wanted to benchmark it against some other languages. There are a bunch of JSON benchmarks out there, but the one I chose was Cloudflare’s Goser, written in Go, mainly because it uses a complex real-world log structure, and they did the hard work of implementing benchmarks for encoding/json, goprotobuf, gogoprotobuf, and go-capnproto. I also included Go’s ffjson and C++’s rapidjson, which both claim to be the fastest JSON libraries for their languages. Here are the results I got:

| language | library | format | serialization (MB/s) | deserialization (MB/s) |
|----------|---------|--------|----------------------|------------------------|
| C++ | rapidjson | JSON | 294 | 164 (DOM) / 192 (SAX) |
| Go | encoding/json | JSON | 71.47 | 25.09 |
| Go | ffjson | JSON | 156.67 | (not supported) |
| Go | goprotobuf | Protocol Buffers | 148.78 | 99.57 |
| Go | gogoprotobuf | Protocol Buffers | 519.48 | 319.40 |
| Go | go-capnproto | Cap’n Proto | 3419.54 | 665.35 |
| Rust | serialize::json | JSON | 40-ish | 10-ish |

Notes:

  • rapidjson supports both DOM-style and SAX-style deserializing. DOM-style means deserializing into a generic object and then from there into the final object; SAX-style means a callback approach, where a handler is called for each JSON token.
  • Go’s encoding/json uses reflection to serialize arbitrary values. ffjson uses code generation to get its serialization speed, but it doesn’t implement deserialization.
  • both goprotobuf and gogoprotobuf use code generation, but gogoprotobuf uses Protocol Buffer’s extension support to do cheaper serialization.
  • Cap’n Proto doesn’t really do serialization; it lays the data out exactly as it is in memory, so it has nearly zero serialization cost.
  • The Rust numbers are from a couple months ago and I couldn’t track down the exact numbers.
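The DOM/SAX distinction from the notes above can be sketched in a few lines of Rust. This is a toy illustration with hypothetical names, not rapidjson’s actual API: the DOM path builds a generic value tree first and then walks it, while the SAX path streams tokens straight into a handler with no intermediate tree.

```rust
// DOM-style: a generic value tree (toy subset: numbers and arrays).
#[derive(Debug)]
enum Value {
    Number(f64),
    Array(Vec<Value>),
}

// Walk an already-built tree to produce the final result.
fn sum_dom(v: &Value) -> f64 {
    match v {
        Value::Number(n) => *n,
        Value::Array(xs) => xs.iter().map(sum_dom).sum(),
    }
}

// SAX-style: the parser calls back into a handler per token.
trait Handler {
    fn start_array(&mut self) {}
    fn end_array(&mut self) {}
    fn number(&mut self, n: f64);
}

struct Sum(f64);
impl Handler for Sum {
    fn number(&mut self, n: f64) {
        self.0 += n;
    }
}

// A toy "parser" that emits the token stream for [1, 2, 3].
fn parse_sax<H: Handler>(h: &mut H) {
    h.start_array();
    for n in [1.0, 2.0, 3.0] {
        h.number(n);
    }
    h.end_array();
}

fn sax_sum() -> f64 {
    let mut s = Sum(0.0);
    parse_sax(&mut s);
    s.0
}

fn main() {
    let dom = Value::Array(vec![
        Value::Number(1.0),
        Value::Number(2.0),
        Value::Number(3.0),
    ]);
    assert_eq!(sum_dom(&dom), 6.0);
    assert_eq!(sax_sum(), 6.0);
}
```

The SAX path never allocates the intermediate `Value` tree, which is why rapidjson’s SAX numbers come out ahead of its DOM numbers.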

So. Yikes. Not only were we nowhere near rapidjson, we were being soundly beaten by Go’s reflection-based framework encoding/json. Even worse, our compile times were at least 10 times theirs. Not pretty at all.

But that was a couple months ago. Between then and now, Patrick Walton, Luqman Aden, myself, and probably lots of others found and fixed a number of bugs across serialize::json, std::io, generic function calls, and more. All this work more than doubled our performance:

| language | library | format | serialization (MB/s) | deserialization (MB/s) |
|----------|---------|--------|----------------------|------------------------|
| Rust | serialize::json | JSON | 117 | 25 |

We’re (kind of) beating Go! At least the builtin reflection-based solution. Better, but not great. I think our challenge is those dang closures. While LLVM can optimize simple closures, it seems to have a lot of trouble with all these recursive closure calls. Unboxed closures, once finished, might finally let us break through this performance bottleneck, but that’s not guaranteed.
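To make the closure problem concrete, here is a minimal sketch of the closure-driven encoder shape that serialize used. The names are hypothetical, not the real serialize API: the point is that every compound value is written through another closure call, so a nested structure becomes a stack of recursive closure invocations that LLVM struggles to inline through.

```rust
struct Encoder {
    out: String,
}

impl Encoder {
    fn emit_u32(&mut self, v: u32) {
        self.out.push_str(&v.to_string());
    }

    // Compound values take a closure that emits each element.
    fn emit_seq<F: FnMut(&mut Encoder, usize)>(&mut self, len: usize, mut f: F) {
        self.out.push('[');
        for i in 0..len {
            if i > 0 {
                self.out.push(',');
            }
            f(self, i);
        }
        self.out.push(']');
    }
}

fn encode(data: &[Vec<u32>]) -> String {
    let mut e = Encoder { out: String::new() };
    // One closure per nesting level: the outer closure calls emit_seq
    // again, which calls the inner closure, and so on down the structure.
    e.emit_seq(data.len(), |e, i| {
        e.emit_seq(data[i].len(), |e, j| e.emit_u32(data[i][j]));
    });
    e.out
}

fn main() {
    assert_eq!(encode(&[vec![1, 2], vec![3]]), "[[1,2],[3]]");
}
```

Each level of nesting in the data adds another layer of closure indirection at the call site, which is exactly the pattern that simple per-closure optimization doesn’t collapse.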

All in all, this performance, together with the representational problems from part 1, makes it pretty obvious that we have some fundamental issues here and need an alternative solution. In the next post I’ll start getting into the details of serde’s design.