Parsing

Parsing is implemented with the Pest crate. Pest generates a parser from a custom grammar; the generated parser provides error display and yields the parse result as an iterator called Pairs. These iterators are consumed and turned into a Pattern<Parsed>, the internal AST representation of a raw, parsed pattern.

[Figure: Pattern parsing workflow]

For example, a pattern such as sk(-\w{english}){3}[0-9]{2} might look like this after parsing:

[Figure: AST after parsing]

Every pattern parses as a group. In this case, the root group contains a single segment. Segments consist of items, which hold syntax elements along with their modifiers (repeat and optional). Literals are parsed and stored individually.

Abstract Syntax Tree

The data structure used to represent the AST of a parsed pattern in Passgen is called Pattern<T>, where T is a type parameter that configures which data structures are used for certain elements. We will gloss over these for now and show a simplified version of the AST.

Pattern

enum Pattern {
    Literal(Literal),
    Set(Set),
    Special(Special),
    Group(Group),
}

The base data type for Passgen patterns is the Pattern type, which can be any of the possible syntax elements:

  • Literal: Raw characters that are emitted unchanged
  • Set: Set of possible characters to choose from
  • Group: Segments of syntax elements, out of which one is chosen at random.
  • Special: Special elements, such as a random word from a wordlist or a predefined pattern.

Literal

struct Literal {
    value: char,
}

Literals are raw characters which are output unchanged. In the initial AST, every Literal is a single character. After optimization, consecutive literals are grouped together and represented as a String.
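The grouping of consecutive literals can be sketched as follows. This is only an illustration of the idea, not Passgen's actual optimization pass: the Element and Optimized types and the merge_literals function are hypothetical names invented for this example, and Other stands in for any non-literal element.

```rust
/// Hypothetical stand-in for a parsed element; only literals matter here.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Literal(char),
    /// Placeholder for any non-literal element (set, group, special).
    Other(&'static str),
}

/// Hypothetical optimized form, where a run of literals is one String.
#[derive(Debug, Clone, PartialEq)]
enum Optimized {
    Literal(String),
    Other(&'static str),
}

/// Merge each run of consecutive single-character literals into one string.
fn merge_literals(items: &[Element]) -> Vec<Optimized> {
    let mut out: Vec<Optimized> = Vec::new();
    for item in items {
        match item {
            Element::Literal(c) => {
                // Extend the previous literal run if there is one,
                // otherwise start a new run with this character.
                if let Some(Optimized::Literal(run)) = out.last_mut() {
                    run.push(*c);
                } else {
                    out.push(Optimized::Literal(c.to_string()));
                }
            }
            // Non-literal elements are copied over and end the current run.
            Element::Other(name) => out.push(Optimized::Other(*name)),
        }
    }
    out
}
```

With this sketch, the two literals s and k at the start of a pattern would collapse into the single string "sk", while a literal that follows a group would start a new run.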

Set

use std::collections::BTreeMap;

struct Set {
    // Maps each character to its weight.
    characters: BTreeMap<char, usize>,
}

Sets consist of characters along with their weights. When generating, a random character is chosen, with each character's weight taken into account. The real data structure used to store these is more efficient than the BTreeMap shown here, but the BTreeMap is sufficient for understanding how sets work.
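Weighted selection over the simplified BTreeMap form can be sketched like this. The total_weight and pick methods are hypothetical and not part of Passgen's API. To keep the example free of external crates, the random number is passed in as a roll; a real generator would draw it uniformly from 0..total_weight().

```rust
use std::collections::BTreeMap;

struct Set {
    // Maps each character to its weight.
    characters: BTreeMap<char, usize>,
}

impl Set {
    /// Sum of all weights in the set.
    fn total_weight(&self) -> usize {
        self.characters.values().sum()
    }

    /// Map a roll in `0..total_weight()` to a character.
    /// Returns `None` when the roll is out of range, which includes
    /// every roll against an empty set.
    fn pick(&self, mut roll: usize) -> Option<char> {
        // Walk the characters in key order, subtracting each weight
        // until the roll falls inside the current character's share.
        for (&ch, &weight) in &self.characters {
            if roll < weight {
                return Some(ch);
            }
            roll -= weight;
        }
        None
    }
}
```

A character with weight 3 occupies three of the possible rolls, so it is chosen three times as often as a character with weight 1.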

Special

enum Special {
    Wordlist(String),
    Markov(String),
    Pattern(String),
}

Special elements can be one of wordlist, markov or pattern. These refer to a wordlist, a Markov chain or a pattern preset by name.

Group

struct Group {
    segments: Vec<Segment>,
}

Groups consist of segments, out of which one is chosen at random at generation time. Groups can also just have one segment.

Segment

struct Segment {
    items: Vec<Item>,
}

Segments are sequences of Passgen syntax items.

Item

use std::ops::RangeInclusive;

struct Item {
    element: Pattern,
    optional: bool,
    repeat: RangeInclusive<usize>,
}

Passgen syntax items wrap a single syntax element together with its modifiers: whether the element is optional, and the inclusive range of times it may be repeated.
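To tie the simplified types together, the following sketch builds by hand a tree for the example pattern sk(-\w{english}){3}[0-9]{2}. It rests on several assumptions that the text above does not confirm: that \w{english} maps to Special::Wordlist("english"), that {3} is stored as the range 3..=3, that an item without modifiers has repeat 1..=1 and optional false, and that [0-9] gives each digit a weight of 1. The item, literal and example_ast helpers are hypothetical names used only here.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

#[derive(Debug, Clone, PartialEq)]
enum Pattern {
    Literal(Literal),
    Set(Set),
    Special(Special),
    Group(Group),
}

#[derive(Debug, Clone, PartialEq)]
struct Literal {
    value: char,
}

#[derive(Debug, Clone, PartialEq)]
struct Set {
    characters: BTreeMap<char, usize>,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Special {
    Wordlist(String),
    Markov(String),
    Pattern(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Group {
    segments: Vec<Segment>,
}

#[derive(Debug, Clone, PartialEq)]
struct Segment {
    items: Vec<Item>,
}

#[derive(Debug, Clone, PartialEq)]
struct Item {
    element: Pattern,
    optional: bool,
    repeat: RangeInclusive<usize>,
}

/// Hypothetical helper: a non-optional item repeated exactly `n` times.
fn item(element: Pattern, n: usize) -> Item {
    Item { element, optional: false, repeat: n..=n }
}

/// Hypothetical helper: a single-character literal element.
fn literal(value: char) -> Pattern {
    Pattern::Literal(Literal { value })
}

/// Hand-built tree for `sk(-\w{english}){3}[0-9]{2}` (assumed mapping).
fn example_ast() -> Pattern {
    // Inner group `(-\w{english})`: one segment with a literal and a wordlist.
    let inner = Group {
        segments: vec![Segment {
            items: vec![
                item(literal('-'), 1),
                item(Pattern::Special(Special::Wordlist("english".to_string())), 1),
            ],
        }],
    };
    // Set `[0-9]`: every digit with weight 1.
    let digits = Set {
        characters: ('0'..='9').map(|c| (c, 1)).collect(),
    };
    // Root group: a single segment holding the four top-level items.
    Pattern::Group(Group {
        segments: vec![Segment {
            items: vec![
                item(literal('s'), 1),
                item(literal('k'), 1),
                item(Pattern::Group(inner), 3),
                item(Pattern::Set(digits), 2),
            ],
        }],
    })
}
```

The root is a group with a single segment, matching the description of the parsed example, and the two leading literals are stored as separate items because literals are only merged later during optimization.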