Cover

Passgen generates random sequences from a dialect of regular expressions. It is suitable for generating secure passphrases that are memorable but have high-entropy. It is available for download as a command-line utility. For a quick overview of how it works, see usage.

Preface

Background

spg

Libary used by 1Password to generate secure passwords.

Research

Password Research

Apple passwords

https://raw.githubusercontent.com/PolyPasswordHasher/PolyPasswordHasher/refs/heads/master/academic-writeup/paper.pdf
https://stuartschechter.medium.com/before-you-use-a-password-manager-9f5949ccf168
https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2016/03/time-rethink-mandatory-password-changes
https://cups.cs.cmu.edu/passwords.html
https://patrickmn.com/security/storing-passwords-securely/
https://www.ise.io/casestudies/password-manager-hacking/
https://soatok.blog/2022/12/29/what-we-do-in-the-etc-shadow-cryptography-with-passwords/
https://techcommunity.microsoft.com/t5/microsoft-entra-blog/your-pa-word-doesn-t-matter/ba-p/731984
https://blog.codinghorror.com/password-rules-are-bullshit/
https://anarc.at/blog/2017-02-18-passwords-entropy/
https://mitxela.com/projects/shamirs_password_store
https://fusionauth.io/articles/security/math-of-password-hashing-algorithms-entropy
https://blog.cryptographyengineering.com/2018/10/19/lets-talk-about-pake/
https://seirdy.one/posts/2021/01/12/password-strength/
https://crypto.stanford.edu/PwdHash/
https://bitspook.in/blog/how-i-manage-my-passwords/
https://er.educause.edu/blogs/2017/10/time-for-password-expiration-to-die
https://brandur.org/fragments/password-hashing
https://lock.cmpxchg8b.com/passmgrs.html
https://www.troyhunt.com/86-of-passwords-are-terrible-and-other-statistics/
https://seirdy.one/posts/2021/01/12/password-strength/

Explainer of password strength

Features

Syntax

Instead of hard-coding different ways to generate random passphrases, Passgen uses a language to describe the shape of the passphrase it should generate. In the code, this is called a pattern.

The language it uses is similar to regular expressions, which is a language that is often used to validate or search text. However, Passgen uses that in reverse: instead of using the pattern to match some text, it generates some text which matches the pattern.

Inputs

As Passgen is designed to generate passphrases, it operates on text. To be compatible with different languages and keyboard layouts, it uses the Unicode encoding. This allows it to generate passphrases for nearly all languages which are alive today.

Any valid Unicode character can be used in the language that Passgen uses. In Unicode terminology, characters are called codepoints.
Any Unicode codepoint can also be input in an escaped form using only ASCII characters. For example, from Passgen's point of view, \u{ab} and x are the same character.
Characters may also be escaped. This is only useful for characters that Passgen uses for syntactic purposes.

Paper

Installation

You can install the Passgen command-line interface like this:

cargo install passgen-cli

Configuration

Compiling

Code Documentation

Parsing

Passgen uses a custom language to encode the pattern it uses to generate passphrases. It is fully AST-based, meaning that the result of parsing is an AST, and the AST is also consumed by the generator (as opposed to using a bytecode approach, whereby the parsed pattern AST is first compiled into bytecode, which is later executed).

Parsing overview

Processing of parsed patterns in Passgen is done in three steps:

Parsing: In this stage, the pattern is parsed and loaded into a raw AST.
Optimizing: An optimizer simplifies the AST, yielding an AST that is optimized for generation.
Preparing: During preparation, all external dependencies are resolved and injected into the AST, yielding a Pattern that can be used for generation.

The goal of this is to yield an AST that can be fed to the generator, which includes all the data needed to generate a passphrase from the parsed pattern.

Finally, the pattern is prepared for generation. This consumes the optimized AST and injects dependencies.

Parsing

Parsing is implemented using the Pest crate. This uses a custom grammar and generates from it a parser which implements error display, and yields an iterator called Pairs with the result of the parsing. These iterators are consumed and turned into a Pattern<Parsed>, which is the internal representation of an AST of a raw, parsed pattern.

Pattern parsing workflow

For example, a pattern such as sk(-\w{english}){3}[0-9]{2} might look like this after going through this parsing:

AST after parsing

Every pattern parses as a group. In this case, the root group contains a single segment. Segments consists of items, which hold syntax elements along with their modifiers (repeat and optional). Literals are parsed and stored individually.

Abstract Syntax Tree

The data structure used to represent the AST of a parsed pattern in Passgen is called Pattern<T>, where T is a type argument which configures what data structures are used for certain elements. We will gloss over these for now, and show a simplified version of the AST.

Pattern

#![allow(unused)]
fn main() {
enum Pattern {
    Literal(Literal),
    Set(Set),
    Special(Special)
    Group(Group),
}
}

The base data type for Passgen patterns is the Pattern type, which can be any of the possible syntax elements:

Literal: Raw characters that are emitted unchanged
Set: Set of possible characters to choose from
Group: Segments of syntax elements, out of which one is chosen at random.
Special: Special elements, such as a random word from a wordlist or a predefined pattern.

Literal

#![allow(unused)]
fn main() {
struct Literal {
    value: char,
}
}

Literals are raw characters which are output unchanged. For the initial AST, every Literal is just a single character, after optimization they are represented as String (consecutive literals are grouped together).

Set

#![allow(unused)]
fn main() {
struct Set {
    characters: BTreeMap<char, usize>,
}
}

Sets consists of characters along with their weight. When generating, a random character is chosen (with the weight being taken into account). The real data structure used to store these is more efficient than the BTreeMap shown here, but it is sufficient for understanding.

Special

#![allow(unused)]
fn main() {
struct Special {
    Wordlist(String),
    Markov(String),
    Pattern(String),
}
}

Special elements can be one of wordlist, markov or pattern. These refer to a wordlist, markov-chain or pattern preset by name.

Group

#![allow(unused)]
fn main() {
struct Group {
    segments: Vec<Segment>
}
}

Groups consist of segments, out of which one is chosen at random at generation time. Groups can also just have one segment.

Segment

#![allow(unused)]
fn main() {
struct Segment {
    items: Vec<Item>
}
}

Segments are sequences of Passgen syntax items.

Item

#![allow(unused)]
fn main() {
struct Item {
    element: Pattern,
    optional: bool,
    repeat: RangeInclusive<usize>,
}
}

Passgen syntax items are Passgen elements, with additional modifiers.

Optimizer

This AST representation is then sent through an optimization pass, which simplifies the pattern and yields a list of dependencies. Dependencies are generally named wordlists, markov-chains and pattern presets.

Pattern processing workflow

The optimizer does three things:

Simplifies the AST
Turns data structures that are optimized for parsing into data structures optimized for generation
Finds unbound references (references to wordlists or pattern presets)

Fundamentally, the optimizer works by traversing the AST.

Simplifications

Single-Segment Group

The optimizer will turn a single-segment group into its contents.

passgen("(abc)") => passgen("abc")

Single-Character Set

The optimizer will turn a single-character set into just the single character.

passgen("[aa]") => passgen("a")

Data structure transform

Unbound references

Preparing

Data Structures

Passgen uses data structures to optimize both the parsing of patterns into the AST, and the generation of passphrases from them.

The AST of a Passgen pattern is represented by the Pattern type, which comes in three variants:

Type	Description
`Pattern<Parsed>`	Optimized for parsing a Passgen pattern (insertion).
`Pattern<Optimized>`	Data structure optimized for generating a Passgen pattern.
`Pattern<Prepared>`	Data structure which has all external dependencies resolved and is ready for generation.

Literals

Literals are characters which are passed through unchanged when generating a passphrase.

Generally, they are represented by a Literal struct:

#![allow(unused)]
fn main() {
struct Literal {
    value: char,
}
}

After optimization, literals are represented by a String:

#![allow(unused)]
fn main() {
struct Literal {
    value: String,
}
}

The reason for this is that multiple literals might be collapsed into one, if they do not contain modifiers.

Sets

Sets in Passgen contain sets of character ranges (a single character being represented as a range which contains just one character), mapped to their weight.

Rust has a special-purpose data structure that is useful for representing these efficiently, which is the RangeMap. This data structure allows us

Groups

Wordlists

Fundamentally, word lists are just that: lists of words. In addition to storing the words, Passgen also accepts a frequency indicator.

There are two options for data structures to represent these. The tradeoffs that need to be made are in the memory usage of the loaded wordlist, and the time it takes to load it into memory.

When parsing wordlists, a BTree or a HashMap are convenient, as they typically allow for fast $O(n log n)$ insertion times.

BTreeMap
HashMap
Vec<()>
Trie

// insertion time graph

// memory usage graph

With these results, the best option

Markov-Chain

Markov-chains store the letter frequency distribution for each $n$-length prefix, and are able to use this to generate randomized pseudo-words with a similar letter distribution as the wordlist they were generated from.

Frequency table

A frequency table stores the frequency (occurences) of each letter. For example:

Character	Count
`a`	6
`b`	3
`c`	1
`d`	9
...	...

Generating a random character from such a table is implemented using Weighted Generation.

Prefix frequency table

Data Structures

For example, a word such as discombobulated would be split up into character tokens like this:

\0 \0 d i s c o m b o b u l a t e d \0

There are three considerations to make for the in-memory representation of a markov-chain:

Time to build (insertion time)
Memory usage
Time to generate it (access time).

In terms of data structures, implementing a Markov-chain requires two primary data structures: a data structure to represent character frequency (such as a map from character to occurences), and a data structure to map prefixes to this character frequency data. In Rust terms, such a table could be represented as such:

// Stores frequency per character
type Frequency = Map<char, usize>;

// Prefix of length two
type Prefix = [char; 2];

// Stores frequency for every prefix
type Markov = Map<Prefix, Frequency>;

There are different built-in data structures in Rust which can be used for this mapping. For this reason, both the frequency table and the prefix table were made into traits, and several different implementations were tested to evaluated for their memory usage and insertion speed.

The implementations that were tested are:

Rust's built-in HashMap
Rust's built-in BTreeMap
Rust's built-in Vec
The hashbrown crate HashMap

Table creation

The memory usage of the different options were very similar:

However, a larger difference can be seen in the insertion time.

Generation

Randomness

As a passphrase generator, Passgen depends heavily on having access to sources of randomness.

There are two kinds of sources of randomness that Passgen supports:

Non-deterministic: Sources such as the system secure random number generator always generates unpredictable output. This is the default for Passgen, if no other option is specified.
Deterministic: In some cases, it is desired to use deterministic sources of randomness. For example, to be able to generate passphrases from a master-passphrase. These sources of randomness are also used internally for the unit tests, to ensure that the output of Passgen is stable.

Independent streams

Since Passgen supports outputting multiple passphrases, every source of randomness needs to be able to generate n independent streams of randomness.

Randomness

For non-deterministic sources of randomness, it can just create multiple instances of the same source of randomness. For deterministic sources of randomness, care must be taken to ensure that:

Independence: Every stream of randomness returned is independent, meaning that it is non-overlapping with any other stream.
Determinism: Whenever you get the nth stream, regardless of which streams you have produced previously (or in which order), it will always yield the same sequence of random values.

Deterministic source of randomness are typically implemented using pseudorandom number generators. Implementing these properties can be done in one of three ways:

Built-in notion of streams: Some of these have a built-in notion of streams, such that for the same seed, n independent streams can be produced.
Ability to seek: Some have the ability to seek in the output stream. Seeking to a large offset (2^64) is sufficient to implement independence for all reasonable use-cases of Passgen, because Passgen is unable to generate a passphrase using this much of the stream.
Recursive seeding: It can also be implemented using recursion, using the pseudorandom number generator to generate seeds for itself.

Implementation

Passgen uses the rand crate behind the scenes. All sources of randomness must implement the RandomSource trait, which has methods to use it to create some value which implements rand::Rng:

#![allow(unused)]
fn main() {
pub trait RandomSource {
    /// Underlying type of the randomness source.
    type Rng: RngCore + 'static;

    /// Return a stream of randomness for the given index.
    fn get_rng(&self, index: usize) -> Self::Rng;
}
}

As you can see, the get_rng() function takes an index. When Passgen creates multiple passphrases, it will generate one random number generator per passphrase that it generates.

Part of the reason this interface was built this way is because it allows Passgen to use multi-threading when generating large amounts of passphrases. Another possible design would be to use a single randomness generator and to use it in sequence to generate passphrases, but this means that the generation of each passphrase is dependent on the previous one and forces Passgen to work serially.

Passgen currently supports three randomness sources:

System: Secure system random number generator.
Xoshiro256PlusPlus: Deterministic number generator, mainly used for testing
ChaCha20: Cryptographic deterministic number generator, used to implement the master-passphrase mode.

These sources are discussed in the next chapters.

System

The System source of randomness uses whatever secure randomness generator is available on the platform that Passgen is compiled. For UNIX-based systems, this can often be /dev/urandom, or a syscall such as getrandom() or getentropy().

System randomness source just uses the rand::thread_rng() method to return the current thread-local random number generator.

#![allow(unused)]
fn main() {
impl RandomSource for System {
    type Rng = rand::rngs::ThreadRng;

    fn get_rng(&self, _index: usize) -> Self::Rng {
        rand::thread_rng()
    }
}
}

You can find data on the source of randomness used in the getrandom, which is used to implement ThreadRng.

Xoshiro

The Xoshiro deteministic random number source uses the Xoshiro256PlusPlus pseudorandom number generator. It can be seeded with an u64. It mainly exists for testing purposes.

The implementation uses the jump() function of the PRNG to skip $ n * 2^{64} $ bytes of the output stream, giving it independent streams.

#![allow(unused)]
fn main() {
pub struct Xoshiro(u64);

impl RandomSource for Xoshiro {
    type Rng = Xoshiro256PlusPlus;

    fn get_rng(&self, index: usize) -> Self::Rng {
        let mut rng = Xoshiro256PlusPlus::seed_from_u64(self.0);
        for _ in 0..index {
            rng.jump();
        }
        rng
    }
}
}

ChaCha20

Master Passphrase

https://www.usenix.org/publications/loginonline/bcrypt-25-retrospective-password-security

Generation

The purpose of the generation module is to generate random passphrases from a prepared Pattern AST, and to track the entropy of the resulting passphrase as it is being generated.

Passgen Documentation