I’m thrilled to announce Serde 0.7.0!
It’s been a long time coming, and has a number of long awaited new features,
breaking changes, and other notable changes. Serde 0.6.x is now deprecated,
and while I’ll try to keep serde_codegen and serde_macros working while projects switch over to 0.7, I’m going to shift to a more pull-based approach, so please file a bug ticket if a nightly release has broken you.
On to the list of major changes!
Serde
Removed the word Error from serde::de::Error variants.
Renamed visit_ methods to serialize_ and deserialize_.
Renamed serde::de::Error::syntax to serde::de::Error::custom.
Serde Codegen and Macros
Serde now ignores unknown fields by default when deserializing. The previous behavior, where Serde reported unknown fields as an error, can be opted back into with the container annotation #[serde(deny_unknown_fields)], as in:
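The original example isn’t shown here; a minimal sketch (the struct and field are made up for illustration):

    #[derive(Deserialize)]
    #[serde(deny_unknown_fields)]
    struct Config {
        name: String,
    }

With this annotation, an input containing an unrecognized key fails to deserialize instead of being silently accepted.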
Added the variant annotation #[serde(rename="...")] to rename variants, as in:
    enum Value {
        #[serde(rename="type")]
        Type,
    }
Added the rename annotation #[serde(rename(serialize="...", deserialize="..."))], which supports crazy schemas like AWS’s that expect serialized fields to start with a lowercase character and fields in the deserialized response to start with an uppercase character. For example:
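A sketch of how that might look; the struct and field names are hypothetical:

    #[derive(Serialize, Deserialize)]
    struct Request {
        #[serde(rename(serialize="bucketName", deserialize="BucketName"))]
        bucket_name: String,
    }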
Removed the unused format-specific rename support.
Added the field annotation #[serde(default="$path")], where $path is a
reference to some function that returns a default value for a field if it’s
not present when deserializing. For example:
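A minimal sketch (the function and field names are made up):

    fn default_port() -> u16 {
        8080
    }

    #[derive(Deserialize)]
    struct Server {
        host: String,
        #[serde(default="default_port")]
        port: u16,
    }

If the input has no port key, the field is filled in by calling default_port().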
Added the field annotation #[serde(skip_serializing_if="$path")], where $path is a path reference to some function that returns a bool; if it returns true, the field is skipped during serialization. For example:
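A sketch of typical usage, here with Option::is_none as the predicate (the struct is made up):

    #[derive(Serialize)]
    struct Profile {
        name: String,
        #[serde(skip_serializing_if="Option::is_none")]
        nickname: Option<String>,
    }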
Added the field annotations #[serde(serialize_with="$path")] and #[serde(deserialize_with="$path")], where $path is a path reference to some function that serializes or deserializes a field, as in:
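A sketch that round-trips a checksum as a hex string. The function signatures shown are the ones current serde releases expect, which differ slightly from 0.7’s, and all names are made up:

    use serde::{Deserialize, Deserializer, Serialize, Serializer};

    fn as_hex<S>(value: &u64, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.serialize_str(&format!("{:x}", value))
    }

    fn from_hex<'de, D>(deserializer: D) -> Result<u64, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s = String::deserialize(deserializer)?;
        u64::from_str_radix(&s, 16).map_err(serde::de::Error::custom)
    }

    #[derive(Serialize, Deserialize)]
    struct Record {
        #[serde(serialize_with="as_hex", deserialize_with="from_hex")]
        checksum: u64,
    }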
Added StreamDeserializer, which enables parsing a stream of JSON values, optionally separated by whitespace, into an iterator of the deserialized values. For example:
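A sketch of the idea using today’s serde_json, where the StreamDeserializer is obtained through Deserializer::into_iter rather than 0.7’s exact API:

    use serde_json::Value;

    fn main() {
        // Whitespace-separated JSON values in one input stream.
        let data = r#"{"a": 1} 42 "hello""#;

        let stream = serde_json::Deserializer::from_str(data).into_iter::<Value>();

        for value in stream {
            println!("{}", value.unwrap());
        }
    }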
Thanks
I’d like to thank everyone that’s helped out over the past few months. Please
forgive me if I accidentally left you off the list:
As I mentioned in the last
part,
Stateful has some challenges it needs to overcome in order to add new and
exciting control flow mechanisms to Rust. While we don’t get access to any of
the cool analysis passes inside the Rust compiler, Stateful is able to sneak
around their necessity in many cases since it really only needs to support a
subset of Rust. Here are some of the techniques it exploits, err, uses.
Variables
First off, let’s talk about variables. One of the primary things Stateful
needs to do is manage the process of state flowing through the machine. However,
consider a statement like this:
    let x = ...;
“Obviously it’s a variable, right?” Actually you can’t be sure. What if
someone did:
The next problem is typing. Sure, Rust is nice in that you can write a local variable like let x = ... and it’ll infer the type for you. All Rust asks is that the user explicitly specify the types of values that enter or leave the bounds of a function. Our problem is that one of the main tasks of Stateful is to lift variables into some State structure so that they’re available when the function is re-entered. So in effect, all variables inside Stateful must be typed. Consider the example from last week:
This State enumeration is what I’m talking about. It gets passed into and
out of the advance function. It needs to be some concrete type, which looks
something like this:
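The snippet itself didn’t survive here, but the gist is an enum with one variant per suspension point, each carrying the variables that are live at that point with their concrete types spelled out (the names below are made up):

    enum State {
        Start,
        // every variable lifted into the state needs a concrete type here
        AfterYield { i: usize },
        Done,
    }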
So how can we resolve this? Well first, we could wait for
RFC 105
or RFC 1305 to get implemented,
but that’s not happening any time soon. Until then, there is cheat number two:
Hide state variables in a boxed trait. This one is from Eduard Burtescu.
Instead of the nice, well-typed example from the last post, we actually generate some code that hides the types with an overabundance of generic types:
All for the cost of a boxed variable. It’s not ideal, but it does let us keep
experimenting. However, if we do want to avoid this allocation, we can just
require that all variables that survive across a yield point have their type
specified. So our previous example would be written as:
The type of iter, by the way, is impossible to write because there is
currently no way to specify the type of the closure. Instead, it needs to be
rewritten to use a free function:
The type of opt is Option<&'a mut usize>, and value is &'a mut usize. So we’ve got two outstanding mutable borrows, which is illegal. The real problem is that without the Resolve and Borrow Checker passes, Stateful cannot know in all cases whether a use of a variable is a copy or a move. So we now have cheat number three: use pseudo-macros to hint to Stateful whether a value is copyable or movable. This is the same technique we use to implement the pseudo-macro yield_!(...); we would add move_!(...) and copy_!(...) to inform Stateful when something has been, well, moved or copied. Our previous example would then be written as:
I’m also considering some default rules that can be overridden with these macros:
If a value is known to be copyable (it’s a primitive type, or it’s a &T
type), then it’s always copied. All other types are assumed to not be
copyable.
Non-copyable types are moved when passed into a function argument, unless
wrapped in a copy_!(...) hint.
Non-copyable type method calls are by reference, unless explicitly wrapped in
a move_!(...) hint.
Non-copyable types are moved in match statements, unless one of the match arms uses ref or ref mut.
Hopefully this will enable a vast majority of code to work without
copy_!(...) or move_!(...).
Conclusion
Those are our major cheats! I’m sure there will be plenty more in the future. In the meantime, I want to show off some actual working code! Check this puppy out!
Hello internet! It’s been too long. Not only are the
Rust Meetups back up and running, it’s
time for me to get back to blogging. For the past couple of months, I’ve been working on a new syntax extension that will allow people to create fun and exciting new control flow mechanisms in stable Rust. “For the love of all that is sigils, why?!” Well, because I can. Sometimes when you stare into the
madness, it stares back into you? Or something like that?
It’s called Stateful, which helpfully has no documentation. Such an innocent name, right? It’s very much an in-progress (and mostly broken) implementation of some of the ideas in this and future posts. So don’t go and think these code snippets are executable just yet :)
Anyway, let’s show off Stateful by showing how we can implement
Generators.
We’ve got an RFC ticket to
implement them, but wouldn’t it be nice to have them sooner? For those of you
unfamiliar with the concept, Generators are functions that can be returned from
multiple times, all while preserving state between those calls. Basically,
they’re just a simpler way to write
Iterators.
Say we wanted to iterate over the numbers 0, 1, and 2. Today, we would write
an Iterator with something like this:
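The snippet didn’t make it into this copy, but a minimal version of the iterator being described (the post calls it Iter3 below) looks like:

    struct Iter3 {
        state: usize,
    }

    impl Iterator for Iter3 {
        type Item = usize;

        fn next(&mut self) -> Option<usize> {
            if self.state < 3 {
                let item = self.state;
                self.state += 1;
                Some(item)
            } else {
                None
            }
        }
    }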
The struct preserves our state across these function calls. It’s a pretty
straightforward implementation, but it does have some amount of boilerplate
code. For large iterator implementations, this state management can get quite
complicated. Instead, let’s see how this same code could be expressed with
something like Stateful:
Here yield_!(i) is some magical control flow mechanism that not only returns the value Some(i), but also makes sure that the next iter.next() call jumps execution to just after the yield. At the end of the generator, we’d just return None. We could simplify this even more by unrolling that loop into:
The fun part is figuring out how to convert these generators into something
that’s roughly equivalent to Iter3. At its heart, Iter3 really is a
simple state machine, where we save the counter state in the structure before
we “yield” the value to the caller. Let’s look at what we would generate for
gen3_unrolled.
First, we need some boilerplate, that sets up the state of our generator. We
don’t yet have impl
trait, so we hide all
our stuff in a module:
We move the current state into advance, then have this loop-match state
machine. Then there are two new control flow constructs:
return_!($expr; $next_state) and our old friend goto!($next_state).
return_!() returns some value and also sets the position the generator should
resume at, and goto!() just sets the next state without leaving the function.
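The generated code itself is missing from this copy, so here is a hand-written sketch of roughly the shape Stateful produces for gen3_unrolled, with return_!/goto! already expanded into plain returns and state assignments. All names are illustrative, not Stateful’s real output:

    mod gen3_unrolled {
        use std::mem;

        pub struct Gen {
            state: State,
        }

        enum State {
            State0,
            State1,
            State2,
            Done,
        }

        pub fn new() -> Gen {
            Gen { state: State::State0 }
        }

        // Each `return_!(Some(n); NextState)` becomes "yield a value and
        // record where to resume"; a `goto!(NextState)` would just update
        // the state without returning.
        fn advance(state: State) -> (Option<usize>, State) {
            match state {
                State::State0 => (Some(0), State::State1),
                State::State1 => (Some(1), State::State2),
                State::State2 => (Some(2), State::Done),
                State::Done => (None, State::Done),
            }
        }

        impl Iterator for Gen {
            type Item = usize;

            fn next(&mut self) -> Option<usize> {
                let state = mem::replace(&mut self.state, State::Done);
                let (item, next_state) = advance(state);
                self.state = next_state;
                item
            }
        }
    }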
Relatively straightforward transformation, right? But that’s an easy case.
Things start to get a wee bit more complicated when we start thinking about how
we’d transform gen3, because it’s got both a while loop and a mutable
variable. Let’s see that in action. I’ll leave out the boilerplate code and
just focus on the advance function:
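Again, the real generated code is missing here, so this is only a sketch of the shape: the while loop is reified into states, and the counter i is lifted into the State enum so it survives across yields (names made up):

    enum State {
        Start,
        Loop { i: usize },
        Done,
    }

    fn advance(state: State) -> (Option<usize>, State) {
        let mut state = state;
        loop {
            match state {
                // `let mut i = 0;`
                State::Start => {
                    state = State::Loop { i: 0 };
                }
                // `while i < 3 { yield_!(i); i += 1; }`
                State::Loop { i } => {
                    if i < 3 {
                        return (Some(i), State::Loop { i: i + 1 });
                    } else {
                        state = State::Done;
                    }
                }
                // falling off the end returns `None` forever
                State::Done => {
                    return (None, State::Done);
                }
            }
        }
    }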
Now things are getting interesting! There are two critical things we can see
off the bat. First, we need to reify the loops and conditionals into the state
machine, because they affect the control flow. Second, we need to lift any
variables that are accessed across states into the State enum.
We can also start seeing the complications. The obvious one is mutable
variables. We need to somehow thread the information about i’s mutability
through each of the states. This naive implementation would trip over the
#[warn(unused_mut)] lint. And now you might start to get a sense of the
horror that lies beneath Stateful.
At this point, you might be thinking to yourself, “Self, if mutable variables
are going to be complicated, what about copies and moves?” You sound like a
pretty sensible person. Therein lies madness. You might want to stop thinking
too deeply on it. If you can’t, maybe you think “Wait.
What about Generics?” Yep. “Borrows?!” Now I’m getting a little worried.
“How do you even know what’s a variable!?!” Sorry.
Yeah so there are one or two things that might be a tad challenging.
So that’s Stateful. It’s an experiment to get some real world experience
with these control flow mechanisms that may someday feed into RFCs, and maybe,
just maybe, might get implemented in the compiler. There’s no reason we need
to support everything, which would require us to basically reimplement the
compiler. Instead, I believe there’s a subset of Rust that we can support in
order to start getting real experience now.
Generators are really just the start. There’s a whole host of other things that, if you just squint at ’em, are really just state machines in disguise. It’s quite possible that if we can pull off Stateful, we’ll also be able to implement things like
Coroutines,
Continuations, and
that hot new mechanism all the cool languages are implementing these days,
Async/Await.
But that’s all for later. The first step is to get this to work. In closing, I leave you with these wise words.
One of the coolest things about the Rust typesystem is that you can use it to
make unsafe bindings safe. Read all about it in the
Rustonomicon. However, it can be
really quite easy to slip in a bug where you’re not actually making the
guarantees you think you’re making. For example, here’s a real bug I made in
the ZeroMQ FFI bindings (which have been
edited for clarity):
My intention was to tie the lifetime of PollItem<'a> to the lifetime of the
Socket, but because I left out one measly 'a, Rust doesn’t tie the two
together, and instead is actually using the 'static lifetime. This then lets
you do something evil like:
    // leak the pointer!
    let poll_item = {
        let context = zmq::Context::new();
        let socket = context.socket(zmq::PAIR).unwrap();
        socket.as_poll_item(0)
    };

    // And use the now uninitialized pointer! Wee! Party like it's C/C++!
    poll(&[poll_item], 0).unwrap();
It’s just that easy. The fix is simple: just change the function to use &'a self and Rust will refuse to compile this snippet. Job well done!
Well, no, not really. Because what was particularly devious about this bug is
that it actually came back. Later on I accidentally reverted &'a self back to
&self because I secretly hate myself. The project and examples still compiled
and ran, but that uninitialized dereference was just waiting around to cause a
security vulnerability.
Oops.
Crap.
Making sure Rust actually rejects programs that it ought to be rejecting is fundamentally important when writing a library that uses Unsafe Rust.
That’s where compiletest comes in.
It’s a testing framework that’s been extracted from
rust-lang/rust
that lets you write these “shouldn’t-compile” tests. Here’s how to use it.
First add this to your Cargo.toml. We do a little feature dance because
currently compiletest only runs on nightly:
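The Cargo.toml and test-driver snippets are missing from this copy. With a reasonably recent compiletest_rs, the driver (typically tests/compile-tests.rs) looks roughly like this; the crate alias and the feature wiring around it are assumptions:

    extern crate compiletest_rs as compiletest;

    use std::path::PathBuf;

    fn run_mode(mode: &'static str) {
        let mut config = compiletest::Config::default();
        config.mode = mode.parse().expect("invalid mode");
        // Tests live in tests/<mode>, e.g. tests/compile-fail.
        config.src_base = PathBuf::from(format!("tests/{}", mode));

        compiletest::run_tests(&config);
    }

    #[test]
    fn compile_fail() {
        run_mode("compile-fail");
    }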
Finally, add the test! Here’s the one I wrote, tests/compile-fail/no-leaking-poll-items.rs:
    extern crate zmq;

    fn main() {
        let mut context = zmq::Context::new();
        let _poll_item = {
            let socket = context.socket(zmq::PAIR).unwrap();
            socket.as_poll_item(0) //~ ERROR error: `socket` does not live long enough
        };
    }
Now you can live in peace with the confidence that this bug won’t ever appear again:
In summary: use compiletest, and demand its use from the Unsafe Rust libraries you depend on! Otherwise you can never be sure whether unsafe and undefined behavior like this will sneak into your project.
Hello all you beautiful and talented people! I’m pleased to announce
serde 0.5.0. We’re bumping the major
(unstable) version number here because there have been a huge number of breaking changes in the API. This has been done to better support serialization formats like bincode, which relies on the serializee to hint to the Serializer how to parse the next bytes.
This will enable Servo to use
bincode for its IPC protocol.
Here are the major changes:
serde::json was factored out into its own separate crate
serde_json#114.
Added serialization and deserialization type hints.
Renamed many functions, changing visit_named_{map,seq} to visit_struct and visit_tuple_struct #114, #120.
Added hooks to allow serializers to serialize newtype tuple structs without a
wrapper type #121.
Rewrote the JSON parser so it does not consume the whole stream
#127.
Fixed serde_macros for generating fully generic code
#117.
Thank you to everyone that’s helped with this release:
Craig Brandenburg
Hugo Duncan
Jarred Nicholis
Oliver Schneider
Patrick Walton
Sebastian Thiel
Skylar Lipthay
Thomas Bahn
dswd
Benchmarks
It’s been a bit since we last did some
benchmarks,
so here are the latest numbers with these compilers:
rustc: 1.4.0-nightly (1181679c8 2015-08-07)
go: version go1.4.2 darwin/amd64
clang: Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
bincode’s serde support makes its first appearance; it starts out roughly 1/3 slower at serialization, but about the same speed at deserialization. I haven’t done much optimization, so there’s probably a lot of low-hanging fruit.
serde_json saw a good amount of
improvement, mainly from some compiler optimizations in the 1.4 nightly. The
deserializer is slightly slower due to the parser rewrite.
capnproto-rust’s unpacked format shows a surprisingly large serialization improvement, nearly 4x, from 4GB/s to 15GB/s. Good job dwrensha! Deserialization is about half as fast as before, though. Perhaps I have a bug in my code?
I’ve changed the Rust MessagePack implementation to rmp, which has a slightly faster serializer; deserialization was about the same.
I’ve also updated the numbers for Go and C++, but those numbers stayed roughly
the same.
Hello Internet! I’m pleased to announce
serde 0.4.0, which now supports many new
features with help from our growing serde community. The largest is that serde now supports syntax extensions in stable Rust by way of
syntex. syntex is a fork of Rust’s
parser library libsyntax that has been modified to enable code generation.
serde uses it along with a
Cargo build script to expand the
#[derive(Serialize, Deserialize)] decorator annotations. Here’s how to use
it.
First, let’s start with a simple serde 0.3.x project that’s forced to use
nightly because it uses serde_macros. The Cargo.toml is:
In order to use Stable Rust, we can use the new serde_codegen. Our strategy
is to split our input into two files. The first is the entry point Cargo will
use to compile the library, src/lib.rs. The second is a template that
contains the macros, src/lib.rs.in. It will be expanded into
$OUT_DIR/lib.rs, which is included in src/lib.rs. So src/lib.rs looks
like:
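The snippets are missing from this copy. The src/lib.rs shim is just an include of the expanded output:

    // src/lib.rs: everything interesting lives in src/lib.rs.in, which
    // build.rs expands into $OUT_DIR/lib.rs.
    include!(concat!(env!("OUT_DIR"), "/lib.rs"));

And a build.rs of that era looked roughly like the following; the exact serde_codegen/syntex registration calls changed between releases, so treat the function names as assumptions:

    extern crate serde_codegen;
    extern crate syntex;

    use std::env;
    use std::path::Path;

    fn main() {
        let out_dir = env::var_os("OUT_DIR").unwrap();

        let src = Path::new("src/lib.rs.in");
        let dst = Path::new(&out_dir).join("lib.rs");

        // Register serde's code generator with syntex and expand the template.
        let mut registry = syntex::Registry::new();
        serde_codegen::register(&mut registry);
        registry.expand("", src, &dst).unwrap();
    }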
While syntex is quite powerful, there are a few major downsides. Rust does not yet support the ability for a generated file to provide error location information from a template file. This means that tracking down errors requires manually looking at the generated code and trying to identify where the error is in the template. However, there is a workaround: it’s actually not that difficult to support both syntex and the Rust Nightly compiler plugins. To update our example, we’ll change the Cargo.toml to:
Then most development can happen using Nightly Rust and cargo build --no-default-features --features nightly for better error messages, while downstream consumers can use Stable Rust without worry.
Downside 2: Macros in Macros
Syntex can only expand macros inside macros it knows about, and it doesn’t know
about the builtin macros. This is because a lot of the stable macros are using
unstable features under the covers. So unfortunately if you’re using a library
like the quasiquoting library quasi,
you cannot write:
    let exprs = vec![quote_expr!(cx, 1 + 2)];
Instead you have to pull out the syntex macros into a separate variable:
    let expr = quote_expr!(cx, 1 + 1);
    let exprs = vec![expr];
Downside 3: Compile Times
Syntex can take a while to compile. It may be possible to optimize this, but
that may be difficult while keeping compatibility with libsyntax.
That’s v0.4.0. I hope you enjoy it! Please let me know if you run into any
problems.
Release Notes
Here are other things that came with this version:
Added field annotation to enable renaming fields for different backends
#69. For example:
I just pushed up serde 0.3.1 to
crates.io, which is now compatible with beta!
serde_macros 0.3.1, however, still requires nightly. But this means that if you implement all the traits using stable features, then any user of serde should work with Rust 1.0.
Here’s what’s also new in serde v0.3.1:
Renamed ValueDeserializer::deserializer to ValueDeserializer::into_deserializer.
Renamed the attribute that changes the name a field is serialized with from #[serde(alias="...")] to #[serde(rename="...")].
Added implementations for Box, Rc, and Arc.
Updated VariantVisitor to hint to the deserializer which variant kind it is expecting.
This allows serializers to serialize a unit variant as a string.
Added an Error::unknown_field_error error message.
Progress on the documentation, but there’s still plenty more to go.
Upstream of serde, I’ve been also doing some work on
aster and
quasi, which are my helper libraries to
simplify writing syntax extensions.
aster v0.2.0:
Added builders for qualified paths, slices, Vec, Box, Rc, and Arc.
Extended item builders to support use declarations with simple paths, globs, and lists.
Added a helper for building the #[automatically_derived] annotation.
quasi v0.1.9:
Backported support for quote_attr!() and quote_matchers!() from libsyntax.
I’m happy to announce that I’ve released
serde 0.3 on
crates.io today. For those unfamiliar with
serde, it’s a generic serialization framework, much like
rustc-serialize, but much more
powerful. Check out my serialization series
if you’re interested in serde’s original development.
There’s been a ton of work since 0.2. Here are the highlights:
Ported over from std::old_io to std::io. There is a bit of a performance hit
when serializing to &mut [u8], although it’s really not that bad. In my goser
benchmarks, it previously ran at 373 MB/s, but now it’s running at 260 MB/s.
However, this hasn’t impacted the Vec<u8> serialization performance, nor
deserialization performance.
Much better JSON deserialization errors. Now std::io::Error is properly
propagated, and error locations are reported when a Deserialize raises an error.
Merged serde::ser::Serializer and serde::ser::Visitor.
Renamed serde::ser::Serialize::visit to serde::ser::Serialize::serialize.
Replaced serde::ser::{Seq,Map}Visitor::size_hint with a len() method that
returns an optional length. This puts a little stronger emphasis on the fact that we either have an exact length or no length. Formats that need an exact length should make sure to verify that the length passed in matches the actual number of values serialized.
serde::json now deserializes missing values as a ().
Finished implementing #[derive(Serialize, Deserialize)] for all struct and
enum forms.
Ported serde_macros over to aster
and quasi, which simplifies code
generation.
Removed the unnecessary first argument from visit_{seq,map}_elt.
Rewrote enum deserializations to not require allocations. Oddly enough this
is a tad slower than the allocating form. I suspect this comes from the
function calls not getting inlined away.
Allowed enum serialization and deserialization to support more than one
variant.
Allowed Deserialize types to hint that they’re expecting a sequence or a map.
Allowed maps to be deserialized from a ().
Added a serde::bytes::{Bytes,ByteBuf}, which wrap &[u8]/Vec<u8> to allow
some formats to encode these values more efficiently than generic sequences.
Added serde::de::value, which contains some helper deserializers to
deserialize from a Rust type.
Added impls for most collection types in the standard library.
Thanks everyone that’s helped out with this release!
Well, it’s been a long time coming, but serde2 is finally in a mostly usable
position! If you recall from
part 3,
one of the problems with serde1 is that we’re paying a lot for tagging our
types, and it’s really hurting us on the deserialization side of things. So
there’s one other pattern that we can use that allows for lookahead that
doesn’t need tags: visitors. A year or so ago I rewrote our generic hashing
framework to use the visitor pattern to great success. serde2 came out of
experiments to see if I could do the same thing here. It turned out that it was
a really elegant approach.
Serialize
It all starts with a type that we want to serialize:
Things get more interesting when we get to compound structures like a sequence.
Here’s Visitor again. It needs to both be able to visit the overall structure
as well as each item:
We also have this SeqVisitor trait that the type to serialize provides. It
really just looks like an Iterator, but the type parameter has been moved to
the visit method so that it can return different types:
Finally, to implement this for a type like &[T] we create an
Iterator-to-SeqVisitor adaptor and pass it to the visitor, which then in
turn visits each item:
SeqIteratorVisitor is publicly exposed, so it should be easy to use it with custom data structures. Maps follow the same pattern (and also expose MapIteratorVisitor), but each item instead uses visit_map_elt(first, key, value). Tuples, struct tuples, and tuple enum variants are all really
just named sequences. Likewise, structs and struct enum variants are just named
maps.
Because struct implementations are so common, here’s an example how to do it:
It’s the responsibility of the serializer to create a visitor and then pass it
to the type. Oftentimes the serializer also implements Visitor, but it’s not
required. Here’s a snippet of the JSON serializer visitor:
Now serialization is the easy part. Deserialization is where it always gets
more tricky. We follow a similar pattern as serialization. A deserializee
creates a visitor which accepts any type (most resulting in an error), and
passes it to a deserializer. The deserializer then extracts its next value from its stream and passes it to the visitor, which then produces the actual
type.
It’s achingly close to the same pattern between a serializer and a serializee,
but as hard as I tried, I couldn’t unify the two. The error semantics are
different. In serialization, you want the serializer (which creates the
visitor) to define the error. In deserialization, you want the deserializer (which consumes the visitor) to define the error.
Let’s start with Error. As opposed to serialization, when we’re deserializing we can error either in the Deserializer, if there is a parse error, or in the Deserialize, if it receives an unexpected value. We do this with an Error trait, which allows a deserializee to generically create the few errors it needs:
Here is an example struct deserializer. Structs are deserialized as a map, but
since maps are unordered, we need a simple state machine to extract the values.
In order to get the keys, we just create an enum for the fields, and a custom
deserializer to convert a string into a field without an allocation:
It’s a little more complicated, but once again there is
#[derive_deserialize], which does all this work for you.
Deserializer
Deserializers then follow the same pattern as serializers. The one difference
is that we need to provide a special hook for Option<T> types so formats like
JSON can treat null types as options.
    pub trait Deserializer {
        type Error: Error;

        fn visit<V: Visitor>(&mut self, visitor: &mut V) -> Result<V::Value, Self::Error>;

        /// The `visit_option` method allows a `Deserialize` type to inform the
        /// `Deserializer` that it's expecting an optional value. This allows
        /// deserializers that encode an optional value as a nullable value to
        /// convert the null value into a `None`, and a regular value as
        /// `Some(value)`.
        #[inline]
        fn visit_option<V: Visitor>(&mut self, visitor: &mut V) -> Result<V::Value, Self::Error> {
            self.visit(visitor)
        }
    }
Performance
So how does it perform? Here are the serialization benchmarks, with yet another ordering. This time they are sorted by performance:
language | library | format | serialization (MB/s)
-------- | ------- | ------ | ---------------------
Rust | capnproto-rust | Cap’n Proto (unpacked) | 4226
Go | go-capnproto | Cap’n Proto | 3824.20
Rust | bincode | Binary | 1020
Rust | capnproto-rust | Cap’n Proto (packed) | 672
Go | gogoprotobuf | Protocol Buffers | 596.78
Rust | rust-msgpack | MessagePack | 397
Rust | serde2::json (&[u8]) | JSON | 373
Rust | rust-protobuf | Protocol Buffers | 357
C++ | rapidjson | JSON | 316
Rust | serde2::json (Custom) | JSON | 306
Rust | serde2::json (Vec) | JSON | 288
Rust | serde::json (Custom) | JSON | 244
Rust | serde::json (&[u8]) | JSON | 222
Go | goprotobuf | Protocol Buffers | 214.68
Rust | serde::json (Vec) | JSON | 149
Go | ffjson | JSON | 147.37
Rust | serialize::json | JSON | 183
Go | encoding/json | JSON | 80.49
I think it’s fair to say that on at least this benchmark we’ve hit our
performance numbers. Writing to a preallocated buffer with BufWriter is 18%
faster than rapidjson (although to be
fair they are allocating). Our Vec<u8> writer comes in 12% slower. What’s
interesting is this custom Writer. It turns out LLVM is still having trouble
lowering our generic Vec::push_all into a memcpy. This Writer variant
however is able to get us to rapidjson’s level:
    fn push_all_bytes(dst: &mut Vec<u8>, src: &[u8]) {
        let dst_len = dst.len();
        let src_len = src.len();

        dst.reserve(src_len);

        unsafe {
            // we would have failed if `reserve` overflowed.
            dst.set_len(dst_len + src_len);

            ::std::ptr::copy_nonoverlapping_memory(
                dst.as_mut_ptr().offset(dst_len as isize),
                src.as_ptr(),
                src_len);
        }
    }

    struct MyMemWriter1 {
        buf: Vec<u8>,
    }

    impl Writer for MyMemWriter1 {
        #[inline]
        fn write_all(&mut self, buf: &[u8]) -> IoResult<()> {
            push_all_bytes(&mut self.buf, buf);
            Ok(())
        }
    }
On deserialization we do much better than serde because we aren’t passing around all those tags, but we have a ways to go to catch up to rapidjson. Still, being just 37% slower than the fastest JSON deserializer makes me feel pretty proud.
language | library | format | deserialization (MB/s)
-------- | ------- | ------ | -----------------------
Rust | capnproto-rust | Cap’n Proto (unpacked) | 2123
Go | go-capnproto | Cap’n Proto (zero copy) | 1407.95
Go | go-capnproto | Cap’n Proto | 711.77
Rust | capnproto-rust | Cap’n Proto (packed) | 529
Go | gogoprotobuf | Protocol Buffers | 272.68
C++ | rapidjson | JSON (sax) | 189
C++ | rapidjson | JSON (dom) | 162
Rust | bincode | Binary | 142
Rust | rust-protobuf | Protocol Buffers | 141
Rust | rust-msgpack | MessagePack | 138
Rust | serde2::json | JSON | 122
Go | ffjson | JSON | 95.06
Go | goprotobuf | Protocol Buffers | 79.78
Rust | serde::json | JSON | 67
Rust | serialize::json | JSON | 25
Go | encoding/json | JSON | 22.79
Conclusion
What a long trip it’s been! I hope you enjoyed it. While there are still
a few things left to port over from serde1 to serde2 (like the JSON pretty
printer), and some things probably should be renamed, I’m happy with the design, so I think it’s in a place where people can start using it now. Please let me
know how it goes!
I use and love syntax extensions, and I’m planning on using them to simplify how one interacts with a system like serde. Unfortunately though, to write them you need to use Rust’s libsyntax, which is not going to be exposed in Rust 1.0 because we’re not ready to stabilize its API; doing so would really hamper future development of the compiler.
It would be so nice though, writing this for every type we want to serialize:
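Presumably the missing snippet was the usual derive annotation, something like the following, with Point standing in for whatever type you care about:

    #[derive(Serialize, Deserialize)]
    struct Point {
        x: i32,
        y: i32,
    }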
So I want to announce my plan on how to deal with this (and also publicly
announce that everyone can blame me if this turns out to hurt Rust’s future
development). I’ve started syntex, a
library that enables code generation using an unofficial tracking fork of
libsyntax. In order to deal with the fact that libsyntax might make
breaking changes between minor Rust releases, we will just release a minor or
major release of syntex, depending on if there were any breaking changes in
libsyntax. Ideally syntex will allow the community to experiment with
different code generation approaches to see what would be worth merging
upstream. Or even better, what hooks are needed in the compiler so it doesn’t
have to think about syntax extensions at all.
I’ve got the basic version working right now. Here’s a simple
hello_world_macros syntax extension (you can see the actual code
here).
First, the hello_world_macros/Cargo.toml:
Now to use it. This is a little more complicated because we have to do code
generation, but Cargo helps with that. Our strategy is to use a build.rs script to do code generation from a main.rss file, and then use the include!() macro to include the result in our dummy main.rs file. Here’s the Cargo.toml we need:
    // Include the real main
    include!(concat!(env!("OUT_DIR"), "/main.rs"));
And finally the main.rss:
    fn main() {
        let s = hello_world!();
        println!("{}", s);
    }
One limitation you can see above is that we unfortunately can’t compose our macros with the Rust macros: syntex currently has no awareness of the Rust macros, and since macros are parsed outside-in, we have to leave the tokens inside a macro like println!() untouched.
That’s syntex. There is a bunch more work left to be done in syntex to make it really usable. There’s also a lot of work in Rust and Cargo that
would help it be really effective:
We need a way to inform Rust that this block of code is actually coming from
a different file than the one it’s processing. This is roughly equivalent to
the #line macros in
C.
We could upstream the “ignore unknown macros” patch to minimize
changes to libsyntax.
It would be nice if we could allow #[path] to reference an environment
variable. This would be a little cleaner than using include!(...).
We need a way to extract macros from a crate.
On Cargo’s side:
It would be nice if Cargo could be told to use a generated file as the
main.rs/lib.rs/etc.
Cargo.toml could grow a plugin mechanism to remove the need to write a
build.rs script.
I’m sure there’s plenty more that needs to get done! So, please help out!