Stateful, Part 2: How Stateful Cheats at Analysis

As I mentioned in the last part, Stateful has some challenges it needs to overcome in order to add new and exciting control flow mechanisms to Rust. While we don’t get access to any of the cool analysis passes inside the Rust compiler, Stateful is able to sneak around their necessity in many cases since it really only needs to support a subset of Rust. Here are some of the techniques it exploits, err, uses.

Variables

First off, let’s talk about variables. One of the primary things Stateful needs to do is manage the process of state flowing through the machine. However, consider a statement like this:

let x = ...;

“Obviously it’s a variable, right?” Actually you can’t be sure. What if someone did:

enum Foo {
    x,
}
use Foo::*;

Well then the compiler would helpfully report:

foo.rs:7:9: 7:10 error: declaration of `x` shadows an enum variant or unit-like struct in scope [E0413]
foo.rs:7     let x = x;

But that warning only works for simple let statements. Consider what happens with matches. Consider:

match ... {
    x => { ... }
    y => { ... }
}

Is x or y a variable, or a variant? There’s no way to know unless you perform name resolution, otherwise known as the resolve pass in the compiler. Unfortunately though, there’s no way for Stateful to run that analysis. As Sméagol said, “There is another way. More secret, and dark way.”. This leads us to Cheat Number One: Stateful assumes that all lowercase identifiers are variables, and uppercase ones are enum variants. Sure, Rust supports lowercase variants, but there’s no reason why Stateful has to use them. It makes our lives much easier.

Types

The next problem is typing. Sure, Rust is nice and all that you can write a local variable like let x = ... and it’ll infer the type for you. All Rust asks for is that the user explicitly specify the type of a value that enters or leaves the bounds of a function. Our problem is that one of the main tasks of Stateful is to lift variables into some State structure so that their available when the function is re-entered. So in effect, all variables inside Stateful must be typed. Consider the example from last week:

fn advance(mut state: State) -> (Option<usize>, State) {
    loop {
        match state {
            State::Enter => {
                let mut i = 0;
                goto!(State::Loop { i: i });
            }
            State::Loop { mut i } => {
                if i < 3 {
                    goto!(State::Then { i: i });
                } else {
                    goto!(State::Else { i: i });
                }
            }
            State::Then { mut i } => {
                return_!(Some(i); State::AfterYield { i: i });
            }
            State::Else { mut i } => {
                goto!(State::AfterLoop { i: i });
            }
            State::AfterYield { mut i } => {
                i += 1;
                goto!(State::Loop { i: i });
            }
            State::AfterLoop { mut i } => {
                goto!(State::Exit);
            }
            State::Exit => {
                return_!(None; State::Exit);
            }
        }
    }
}

This State enumeration is what I’m talking about. It gets passed into and out of the advance function. It needs to be some concrete type, which looks something like this:

enum State {
    Enter,
    Loop { i: usize },
    Then { i: usize },
    Else { i: usize },
    AfterYield { i: usize },
    AfterLoop { i: usize },
    Exit,
}

The problem is that we want to write code like this:

#[generator]
fn gen3() -> Iterator<Item=usize> {
    let mut i = 0;
    while i < 3 {
        yield_!(i);
        i += 1;
    }
}

So how can we resolve this? Well first, we could wait for RFC 105 or RFC 1305 to get implemented, but that’s not happening any time soon. Until then, there is cheat number two: Hide state variables in a boxed trait. This one is from Eduard Burtescu. Instead of the nice well typed example from the last post, we actually generate some code that hides the types with an over abundance of generic types:

fn gen3() -> Iterator<Item=usize> {
    struct Wrapper<S, F> {
        state: S,
        next: F,
    }

    impl<S, T, F> Iterator for Wrapper<S, F>
        where S: Default,
              F: Fn(S) -> (Option<T>, S)
      {
          type Item = T;

          fn next(&mut self) -> Option<Self::Item> {
              let old_state = ::std::mem::replace(
                  &mut self.state,
                  S::default());

              let (value, next_state) = (self.next)(old_state);
              self.state = next_state;
              value
          }
      }

    enum State<T0> {
        Enter,
        Loop { i: T0 },
        Then { i: T0 },
        Else { i: T0 },
        AfterYield { i: T0 },
        AfterLoop { i: T0 },
        Exit,
    }

    impl<T0> Default for State<T0> {
        fn default() -> Self {
            {
                State::State1End
            }
        }
    }

    Box::new(Wrapper::new(State::Start, |mut state| {
        loop {
            match state {
                State::Enter => {
                    let mut i = 0;
                    goto!(State::Loop { i: i });
                }
                State::Loop { mut i } => {
                    if i < 3 {
                        goto!(State::Then { i: i });
                    } else {
                        goto!(State::Else { i: i });
                    }
                }
                State::Then { mut i } => {
                    return_!(Some(i); State::AfterYield { i: i });
                }
                State::Else { mut i } => {
                    goto!(State::AfterLoop { i: i });
                }
                State::AfterYield { mut i } => {
                    i += 1;
                    goto!(State::Loop { i: i });
                }
                State::AfterLoop { mut i } => {
                    goto!(State::Exit);
                }
                State::Exit => {
                    return_!(None; State::Exit);
                }
            }
        }
    }))
}

All for the cost of a boxed variable. It’s not ideal, but it does let us keep experimenting. However, if we do want to avoid this allocation, we can just require that all variables that survive across a yield point have their type specified. So our previous example would be written as:

#[generator]
fn gen3() -> Iterator<Item=usize> {
    let mut i: usize = 0;
    while i < 3 {
        yield_!(i);
        i += 1;
    }
}

It’s not so bad here, but it’d get obnoxious if we had a generator like:

#[generator]
fn complicated(items: &[usize]) -> Iterator<Item=usize> {
    let iter = items.iter().map(|item| item * 3);

    for item in iter {
        yield_!(item);
    }
}

The type of iter, by the way, is impossible to write because there is currently no way to specify the type of the closure. Instead, it needs to be rewritten to use a free function:

#[generator]
fn complicated(items: &[usize]) -> Iterator<Item=usize> {
    fn map<'a>(item: &'a usize) -> usize { *item * 3 }

    let iter: std::iter::Map<
            std::slice::Iter<'a, usize>,
            fn(&'a usize) -> usize
        >
        = items.iter().map(map);

    for item in iter {
        yield_!(item);
    }
}

If we want to support closures though, we need to use the Box<Iterator<...> trick.

References

This one’s a doozy. Here’s an example of the problem. Consider:

#[generator]
fn gen<'a>(opt: Option<&'a mut usize>) -> Iterator<Item=&'a mut usize> {
    match opt {
        Some(value) => { yield_!(value); }
        None => { }
    }
}

This would look something like this (which also demonstrates how match statements are represented):

...
Box::new(Wrapper::new(State::State0Start { opt: opt }, |mut state| {
    loop {
        match state {
            State::State0Start { opt: opt } => {
                match opt {
                    Some(value) => {
                        state = State::State2Arm {
                            opt: opt,
                            value: value,
                        };
                        continue;
                    }
                    None => {
                        state = State::State3Arm { opt: opt };
                        continue;
                    }
                }
            }
            State::State1End => {
                return (::std::option::Option::None, State::State1End);
            }
            State::State2Arm { opt: opt, value: value } => {
                return (::std::option::Option::Some(value),
                        State::State5AfterYield {
                    opt: opt,
                    value: value,
                });
            }
            State::State3Arm { opt: opt } => {
                {
                };
                state = State::State4MatchJoin { opt: opt };
                continue;
            }
            State::State4MatchJoin { opt: opt } => {
                ::std::mem::drop(opt);
                state = State::State1End;
                continue;
            }
            State::State5AfterYield {
                              opt: opt, value: value } => {
                ::std::mem::drop(value);
                state = State::State4MatchJoin { opt: opt };
                continue;
            }
        }
    }
}))

Zero in on this block:

...
Some(value) => {
    state = State::State2Arm {
        opt: opt,
        value: value,
    };
    continue;
}
...

The type of opt is Option<&'a mut usize>, and value is &'a mut usize. So we’ve got two outstanding mutable borrows, which is illegal. The real problem is that Stateful without Resolve and the Borrow Checker pass, it cannot know if a use of the variable is a copy or move in all cases. So we now have cheat number 3: Use pseudo-macros to hint to Stateful if a type is copyable or movable. This is the same technique we use to implement the pseudo-macro yield_!(...), where we would add move_!(...) and copy_!(...) to inform Stateful when something has been, well, moved or copied. Our previous example would then be written as:

#[generator]
fn gen<'a>(opt: Option<&'a mut usize>) -> Iterator<Item=&'a mut usize> {
    match move_!(opt) {
        Some(value) => { yield_!(value); }
        None => { }
    }
}

Which would then give Stateful enough information to generate something like this, which would then know that the match consumed the option:

Box::new(Wrapper::new(State::State0Start { opt: opt }, |mut state| {
    loop {
        match state {
            State::State0Start { opt: opt } => {
                match opt {
                    Some(value) => {
                        state = State::State2Arm {
                            value: value,
                        };
                        continue;
                    }
                    None => {
                        state = State::State3Arm;
                        continue;
                    }
                }
            }
            State::State1End => {
                return (::std::option::Option::None, State::State1End);
            }
            State::State2Arm { value: value } => {
                return (::std::option::Option::Some(value),
                State::State5AfterYield);
            }
            State::State3Arm => {
                state = State::State4MatchJoin;
                continue;
            }
            State::State4MatchJoin => {
                state = State::State1End;
                continue;
            }
            State::State5AfterYield => {
                state = State::State4MatchJoin;
                continue;
            }
        }
    }
}))

I’m also considering some default rules, that can be overridden with these macros:

If a value is known to be copyable (it’s a primitive type, or it’s a &T type), then it’s always copied. All other types are assumed to not be copyable.
Non-copyable types are moved when passed into a function argument, unless wrapped in a copy_!(...) hint.
Non-copyable type method calls are by reference, unless explicitly wrapped in a move_!(...) hint.
Non-copyable types are moved in match statement, unless one of the match arms uses ref or ref mut.

Hopefully this will enable a vast majority of code to work without copy_!(...) or move_!(...).

Conclusion

Those are our major cheats! I’m sure there will be plenty more in the future. In the meantime, I want to show off some some actual working code! Check this puppy out!

#![feature(plugin)]
#![plugin(stateful)]

#![allow(dead_code)]
#![allow(non_shorthand_field_patterns)]
#![allow(unused_mut)]
#![allow(unused_variables)]

#[generator]
fn gen<'a, T>(items: &'a [T]) -> &'a T {
    let mut iter = items.iter();
    loop {
        match iter.next() {
            Some(item) => {
                yield_!(item);
            }
            None => {
                break;
            }
        };
    };
}

fn main() {
    let items = &[1, 2, 3];
    for value in gen(items).take(20) {
        println!("{}", value);
    }
}

Produces:

1
2
3

Isn’t it beautiful? We got generics, mutable variables, loops, matches, breaks, and a whole host of ignored warnings!

Chasing Rabbits

A poorly updated blog about what I’m working on

Stateful, Part 2: How Stateful Cheats at Analysis

Variables

Types

References

Conclusion