I’ve written a bunch of simple parsers for Rust, and it’s starting to get a little obnoxious. So I added a Rust backend to the Ragel State Machine Compiler. You can find my fork here. I’m waiting for Rust to stablize before I try to push it upstream.
Ragel is a rather neat way of writing simple parsers. In some ways it’s pretty similar to Lex, but Ragel also allows you execute arbitrary code at any point in the state machine. Furthermore, this arbitrary code can manipulate the state machine itself, so it can be used in many places you’d traditionally need a full parser, such as properly handling parentheses.
Here’s an example of a atoi
function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
|
While this is probably a bit more verbose than writing atoi
by hand, it does
make the grammar pretty explicit, which can help keep it accurate.
Unfortunately there are some pretty severe performance issues at the moment. Ragel supports two state machine styles, table-driven and goto-driven. My backend uses tables, but since Rust doesn’t yet support global constant vectors, I need to malloc the state machine table on every function call. This results in the ragel-based url parser being about 10 times slower than the equivalent table-based parser in OCaml. You can see the generated code here.
The goto
route could be promising to explore. We could simulate it using
mutually recursive function calls. OCaml does this. But again, since Rust
doesn’t support tailcalls (and may
ever), we could run into a stack
explosion. It may work well for small grammars though, and maybe LLVM could
optimize calls into tailcalls.
Unless I’m doing something glaringly wrong, it seems likely that we are going to need some compiler help before these performance issues get solved.