Parsing
Parsing is implemented using the Pest crate. Pest takes a custom grammar and
generates a parser from it. The parser implements error display and yields the
result of parsing as an iterator called Pairs. These iterators are consumed
and turned into a Pattern<Parsed>, which is the internal representation of
the AST of a raw, parsed pattern.
For example, a pattern such as sk(-\w{english}){3}[0-9]{2} might look like this after parsing:
Every pattern parses as a group. In this case, the root group contains a single segment. Segments consist of items, which hold syntax elements along with their modifiers (repeat and optional). Literals are parsed and stored individually.
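To make this structure concrete, here is a minimal sketch that builds such a tree by hand for the example pattern. It uses a cut-down stand-in for the simplified AST types, not Passgen's real Pattern<Parsed>. The helper names `item` and `example_ast` are hypothetical, and it assumes that \w{english} refers to a wordlist named "english", that {3} maps to the range 3..=3, and that an item without modifiers repeats exactly once.

```rust
use std::ops::RangeInclusive;

// Cut-down stand-in for the simplified AST (not Passgen's real types).
#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Literal(char),
    // Simplified: a set is stored as its list of member characters.
    Set(Vec<char>),
    // Simplified: only the wordlist kind of special element is modelled.
    Wordlist(String),
    Group(Vec<Segment>),
}

#[derive(Debug, Clone, PartialEq)]
struct Segment {
    items: Vec<Item>,
}

#[derive(Debug, Clone, PartialEq)]
struct Item {
    element: Pattern,
    optional: bool,
    repeat: RangeInclusive<usize>,
}

// Hypothetical helper: wraps an element in a non-optional item.
fn item(element: Pattern, repeat: RangeInclusive<usize>) -> Item {
    Item { element, optional: false, repeat }
}

// Hand-built tree for sk(-\w{english}){3}[0-9]{2}.
fn example_ast() -> Pattern {
    // Inner group: (-\w{english})
    let inner = Pattern::Group(vec![Segment {
        items: vec![
            item(Pattern::Literal('-'), 1..=1),
            // Assumption: \w{english} names a wordlist called "english".
            item(Pattern::Wordlist("english".to_string()), 1..=1),
        ],
    }]);

    // Root group with a single segment; literals are stored individually.
    Pattern::Group(vec![Segment {
        items: vec![
            item(Pattern::Literal('s'), 1..=1),
            item(Pattern::Literal('k'), 1..=1),
            item(inner, 3..=3),
            item(Pattern::Set(('0'..='9').collect()), 2..=2),
        ],
    }])
}
```

The root group holds one segment with four items: two single-character literals, the repeated inner group, and the repeated digit set.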
Abstract Syntax Tree
The data structure used to represent the AST of a parsed pattern in Passgen
is called Pattern<T>, where T is a type argument that configures which
data structures are used for certain elements. We will gloss over these for
now and show a simplified version of the AST.
Pattern
    enum Pattern {
        Literal(Literal),
        Set(Set),
        Special(Special),
        Group(Group),
    }
The base data type for Passgen patterns is the Pattern type, which can be
any of the possible syntax elements:
- Literal: Raw characters that are emitted unchanged
- Set: Set of possible characters to choose from
- Group: Segments of syntax elements, out of which one is chosen at random.
- Special: Special elements, such as a random word from a wordlist or a predefined pattern.
Literal
    struct Literal {
        value: char,
    }
Literals are raw characters which are output unchanged. In the initial AST,
every Literal is just a single character. After optimization, consecutive
literals are grouped together and represented as a String.
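The grouping of consecutive literals can be sketched as a single pass over a sequence of elements. This is an illustration only, not Passgen's actual optimization pass: the types `Element` and `Optimized` and the function `merge_literals` are hypothetical, non-literal elements are reduced to a placeholder name, and modifiers are ignored.

```rust
// Hypothetical input element: a single-character literal or anything else.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Literal(char),
    Other(&'static str),
}

// Hypothetical output element: literals now carry a whole String.
#[derive(Debug, Clone, PartialEq)]
enum Optimized {
    Literal(String),
    Other(&'static str),
}

// Merges runs of adjacent single-character literals into one string literal.
fn merge_literals(elements: &[Element]) -> Vec<Optimized> {
    let mut out: Vec<Optimized> = Vec::new();
    for element in elements {
        match element {
            Element::Literal(c) => {
                // Extend the previous literal if there is one, else start a new run.
                if let Some(Optimized::Literal(s)) = out.last_mut() {
                    s.push(*c);
                } else {
                    out.push(Optimized::Literal(c.to_string()));
                }
            }
            // Any other element ends the current run of literals.
            Element::Other(name) => out.push(Optimized::Other(*name)),
        }
    }
    out
}
```

A real pass would also need to leave alone any literal that carries its own repeat or optional modifier, since merging it would change what the modifier applies to.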
Set
    use std::collections::BTreeMap;

    struct Set {
        characters: BTreeMap<char, usize>,
    }
Sets consist of characters along with their weights. When generating, a random
character is chosen, with the weights taken into account. The real data
structure used to store these is more efficient than the BTreeMap shown here,
but this version is sufficient for understanding.
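One common way to take weights into account is to walk the cumulative weights until a random number is covered. The sketch below shows that technique on the simplified BTreeMap representation. The methods `total_weight` and `pick` are hypothetical and not part of Passgen's API, and the caller is assumed to supply `roll` as a uniformly random number in 0..total_weight, so the sketch needs no random number crate.

```rust
use std::collections::BTreeMap;

struct Set {
    characters: BTreeMap<char, usize>,
}

impl Set {
    // Sum of all weights; a roll must be drawn from 0..total_weight().
    fn total_weight(&self) -> usize {
        self.characters.values().sum()
    }

    // Maps a roll to a character by subtracting weights in key order.
    // Returns None if the roll is out of range or the set is empty.
    fn pick(&self, roll: usize) -> Option<char> {
        let mut remaining = roll;
        for (&c, &weight) in &self.characters {
            if remaining < weight {
                return Some(c);
            }
            remaining -= weight;
        }
        None
    }
}
```

With weights a = 1 and b = 3, one of the four possible rolls selects a and the other three select b, so b is chosen three times as often. The linear scan is also a hint at why a more efficient structure is worthwhile for large sets.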
Special
    enum Special {
        Wordlist(String),
        Markov(String),
        Pattern(String),
    }
Special elements are one of wordlist, markov or pattern. Each refers to a wordlist, Markov chain or pattern preset by name.
Group
    struct Group {
        segments: Vec<Segment>,
    }
Groups consist of segments, out of which one is chosen at random at generation time. Groups can also just have one segment.
Segment
    struct Segment {
        items: Vec<Item>,
    }
Segments are sequences of Passgen syntax items.
Item
    use std::ops::RangeInclusive;

    struct Item {
        element: Pattern,
        optional: bool,
        repeat: RangeInclusive<usize>,
    }
Passgen syntax items are Passgen elements with additional modifiers: whether the element is optional, and how many times it is repeated.
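To show how groups, segments and items fit together at generation time, here is a minimal sketch of a recursive walk over a cut-down AST. It is not Passgen's generator. The `Lcg` type is a stand-in source of randomness so the example needs no external crate, sets are unweighted, and special elements are left out. It also assumes that an optional item is skipped half of the time and that the repeat count is drawn uniformly from the range, neither of which is stated by the description above.

```rust
use std::ops::RangeInclusive;

// Cut-down AST: sets are unweighted and special elements are omitted.
enum Pattern {
    Literal(char),
    Set(Vec<char>),
    Group(Vec<Segment>),
}

struct Segment {
    items: Vec<Item>,
}

struct Item {
    element: Pattern,
    optional: bool,
    repeat: RangeInclusive<usize>,
}

// Stand-in random source (a linear congruential generator), for illustration only.
struct Lcg(u64);

impl Lcg {
    // Returns a value in 0..bound; bound must be non-zero.
    fn below(&mut self, bound: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) as usize) % bound
    }
}

fn generate(pattern: &Pattern, rng: &mut Lcg, out: &mut String) {
    match pattern {
        // Literals are emitted unchanged.
        Pattern::Literal(c) => out.push(*c),
        // Sets emit one of their characters.
        Pattern::Set(chars) => {
            if !chars.is_empty() {
                out.push(chars[rng.below(chars.len())]);
            }
        }
        // Groups choose one segment, then emit its items in order.
        Pattern::Group(segments) => {
            if segments.is_empty() {
                return;
            }
            let segment = &segments[rng.below(segments.len())];
            for item in &segment.items {
                // Assumption: an optional item is skipped half of the time.
                if item.optional && rng.below(2) == 0 {
                    continue;
                }
                let (min, max) = (*item.repeat.start(), *item.repeat.end());
                // Assumption: the count is uniform over the inclusive range.
                let count = if max >= min {
                    min + rng.below(max - min + 1)
                } else {
                    0
                };
                for _ in 0..count {
                    generate(&item.element, rng, out);
                }
            }
        }
    }
}
```

Calling `generate` on a root group with a seeded `Lcg` appends one generated string to `out`. Because each item carries its own modifiers, the walk never needs to look at neighbouring items.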