Cover
Passgen generates random sequences from a dialect of regular expressions. It is suitable for generating secure passphrases that are memorable but have high-entropy. It is available for download as a command-line utility. For a quick overview of how it works, see usage.
Preface
Background
Related Works
Libary used by 1Password to generate secure passwords.
Research
-
https://raw.githubusercontent.com/PolyPasswordHasher/PolyPasswordHasher/refs/heads/master/academic-writeup/paper.pdf
-
https://stuartschechter.medium.com/before-you-use-a-password-manager-9f5949ccf168
-
https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2016/03/time-rethink-mandatory-password-changes
-
https://cups.cs.cmu.edu/passwords.html
-
https://patrickmn.com/security/storing-passwords-securely/
-
https://www.ise.io/casestudies/password-manager-hacking/
-
https://soatok.blog/2022/12/29/what-we-do-in-the-etc-shadow-cryptography-with-passwords/
-
https://techcommunity.microsoft.com/t5/microsoft-entra-blog/your-pa-word-doesn-t-matter/ba-p/731984
-
https://blog.codinghorror.com/password-rules-are-bullshit/
-
https://anarc.at/blog/2017-02-18-passwords-entropy/
-
https://mitxela.com/projects/shamirs_password_store
-
https://fusionauth.io/articles/security/math-of-password-hashing-algorithms-entropy
-
https://blog.cryptographyengineering.com/2018/10/19/lets-talk-about-pake/
-
https://seirdy.one/posts/2021/01/12/password-strength/
-
https://crypto.stanford.edu/PwdHash/
-
https://bitspook.in/blog/how-i-manage-my-passwords/
-
https://er.educause.edu/blogs/2017/10/time-for-password-expiration-to-die
-
https://brandur.org/fragments/password-hashing
-
https://lock.cmpxchg8b.com/passmgrs.html
-
https://www.troyhunt.com/86-of-passwords-are-terrible-and-other-statistics/
-
https://seirdy.one/posts/2021/01/12/password-strength/
Explainer of password strength
Features
Syntax
Instead of hard-coding different ways to generate random passphrases, Passgen uses a language to describe the shape of the passphrase it should generate. In the code, this is called a pattern.
The language it uses is similar to regular expressions, which is a language that is often used to validate or search text. However, Passgen uses that in reverse: instead of using the pattern to match some text, it generates some text which matches the pattern.
Inputs
As Passgen is designed to generate passphrases, it operates on text. To be compatible with different languages and keyboard layouts, it uses the Unicode encoding. This allows it to generate passphrases for nearly all languages which are alive today.
- Any valid Unicode character can be used in the language that Passgen uses. In Unicode terminology, characters are called codepoints.
- Any Unicode codepoint can also be input in an escaped form using only ASCII
characters. For example, from Passgen's point of view,
\u{ab}
andx
are the same character. - Characters may also be escaped. This is only useful for characters that Passgen uses for syntactic purposes.
Paper
Installation
You can install the Passgen command-line interface like this:
cargo install passgen-cli
Configuration
Compiling
Code Documentation
Parsing
Passgen uses a custom language to encode the pattern it uses to generate passphrases. It is fully AST-based, meaning that the result of parsing is an AST, and the AST is also consumed by the generator (as opposed to using a bytecode approach, whereby the parsed pattern AST is first compiled into bytecode, which is later executed).
Processing of parsed patterns in Passgen is done in three steps:
- Parsing: In this stage, the pattern is parsed and loaded into a raw AST.
- Optimizing: An optimizer simplifies the AST, yielding an AST that is optimized for generation.
- Preparing: During preparation, all external dependencies are resolved and injected into the AST, yielding a Pattern that can be used for generation.
The goal of this is to yield an AST that can be fed to the generator, which includes all the data needed to generate a passphrase from the parsed pattern.
Finally, the pattern is prepared for generation. This consumes the optimized AST and injects dependencies.
Parsing
Parsing is implemented using the Pest crate. This uses a custom grammar
and generates from it a parser which implements error display, and yields an
iterator called Pairs with the result of the parsing. These iterators are consumed
and turned into a Pattern<Parsed>
, which is the internal representation of
an AST of a raw, parsed pattern.
For example, a pattern such as sk(-\w{english}){3}[0-9]{2}
might look like this
after going through this parsing:
Every pattern parses as a group. In this case, the root group contains a single segment. Segments consists of items, which hold syntax elements along with their modifiers (repeat and optional). Literals are parsed and stored individually.
Abstract Syntax Tree
The data structure used to represent the AST of a parsed pattern in Passgen
is called Pattern<T>
, where T
is a type argument which configures what
data structures are used for certain elements. We will gloss over these for
now, and show a simplified version of the AST.
Pattern
#![allow(unused)] fn main() { enum Pattern { Literal(Literal), Set(Set), Special(Special) Group(Group), } }
The base data type for Passgen patterns is the Pattern
type, which can be
any of the possible syntax elements:
- Literal: Raw characters that are emitted unchanged
- Set: Set of possible characters to choose from
- Group: Segments of syntax elements, out of which one is chosen at random.
- Special: Special elements, such as a random word from a wordlist or a predefined pattern.
Literal
#![allow(unused)] fn main() { struct Literal { value: char, } }
Literals are raw characters which are output unchanged. For the initial AST,
every Literal is just a single character, after optimization they are represented
as String
(consecutive literals are grouped together).
Set
#![allow(unused)] fn main() { struct Set { characters: BTreeMap<char, usize>, } }
Sets consists of characters along with their weight. When generating, a random
character is chosen (with the weight being taken into account). The real data
structure used to store these is more efficient than the BTreeMap
shown here,
but it is sufficient for understanding.
Special
#![allow(unused)] fn main() { struct Special { Wordlist(String), Markov(String), Pattern(String), } }
Special elements can be one of wordlist, markov or pattern. These refer to a wordlist, markov-chain or pattern preset by name.
Group
#![allow(unused)] fn main() { struct Group { segments: Vec<Segment> } }
Groups consist of segments, out of which one is chosen at random at generation time. Groups can also just have one segment.
Segment
#![allow(unused)] fn main() { struct Segment { items: Vec<Item> } }
Segments are sequences of Passgen syntax items.
Item
#![allow(unused)] fn main() { struct Item { element: Pattern, optional: bool, repeat: RangeInclusive<usize>, } }
Passgen syntax items are Passgen elements, with additional modifiers.
Optimizer
This AST representation is then sent through an optimization pass, which simplifies the pattern and yields a list of dependencies. Dependencies are generally named wordlists, markov-chains and pattern presets.
The optimizer does three things:
- Simplifies the AST
- Turns data structures that are optimized for parsing into data structures optimized for generation
- Finds unbound references (references to wordlists or pattern presets)
Fundamentally, the optimizer works by traversing the AST.
Simplifications
Single-Segment Group
The optimizer will turn a single-segment group into its contents.
passgen("(abc)") => passgen("abc")
Single-Character Set
The optimizer will turn a single-character set into just the single character.
passgen("[aa]") => passgen("a")
Data structure transform
Unbound references
Preparing
Data Structures
Passgen uses data structures to optimize both the parsing of patterns into the AST, and the generation of passphrases from them.
The AST of a Passgen
pattern is represented by the Pattern
type, which comes in three variants:
Type | Description |
---|---|
Pattern<Parsed> | Optimized for parsing a Passgen pattern (insertion). |
Pattern<Optimized> | Data structure optimized for generating a Passgen pattern. |
Pattern<Prepared> | Data structure which has all external dependencies resolved and is ready for generation. |
Literals
Literals are characters which are passed through unchanged when generating a passphrase.
Generally, they are represented by a Literal
struct:
#![allow(unused)] fn main() { struct Literal { value: char, } }
After optimization, literals are represented by a String
:
#![allow(unused)] fn main() { struct Literal { value: String, } }
The reason for this is that multiple literals might be collapsed into one, if they do not contain modifiers.
Sets
Sets in Passgen contain sets of character ranges (a single character being represented as a range which contains just one character), mapped to their weight.
Rust has a special-purpose data structure that is useful for representing these efficiently, which is the RangeMap. This data structure allows us
Groups
Wordlists
Fundamentally, word lists are just that: lists of words. In addition to storing the words, Passgen also accepts a frequency indicator.
There are two options for data structures to represent these. The tradeoffs that need to be made are in the memory usage of the loaded wordlist, and the time it takes to load it into memory.
When parsing wordlists, a BTree or a HashMap are convenient, as they typically allow for fast $O(n log n)$ insertion times.
- BTreeMap
- HashMap
- Vec<()>
- Trie
// insertion time graph
// memory usage graph
With these results, the best option
Markov-Chain
Markov-chains store the letter frequency distribution for each \(n\)-length prefix, and are able to use this to generate randomized pseudo-words with a similar letter distribution as the wordlist they were generated from.
Frequency table
A frequency table stores the frequency (occurences) of each letter. For example:
Character | Count |
---|---|
a | 6 |
b | 3 |
c | 1 |
d | 9 |
... | ... |
Generating a random character from such a table is implemented using Weighted Generation.
Prefix frequency table
Data Structures
For example, a word such as discombobulated would be split up into character tokens like this:
\0 \0 d i s c o m b o b u l a t e d \0
There are three considerations to make for the in-memory representation of a markov-chain:
- Time to build (insertion time)
- Memory usage
- Time to generate it (access time).
In terms of data structures, implementing a Markov-chain requires two primary data structures: a
data structure to represent character frequency (such as a map from character
to occurences
), and
a data structure to map prefixes to this character frequency data. In Rust terms, such a table
could be represented as such:
// Stores frequency per character
type Frequency = Map<char, usize>;
// Prefix of length two
type Prefix = [char; 2];
// Stores frequency for every prefix
type Markov = Map<Prefix, Frequency>;
There are different built-in data structures in Rust which can be used for this mapping. For this reason, both the frequency table and the prefix table were made into traits, and several different implementations were tested to evaluated for their memory usage and insertion speed.
The implementations that were tested are:
- Rust's built-in
HashMap
- Rust's built-in
BTreeMap
- Rust's built-in
Vec
- The
hashbrown
crateHashMap
Table creation
The memory usage of the different options were very similar:
However, a larger difference can be seen in the insertion time.
Generation
Randomness
As a passphrase generator, Passgen depends heavily on having access to sources of randomness.
There are two kinds of sources of randomness that Passgen supports:
- Non-deterministic: Sources such as the system secure random number generator always generates unpredictable output. This is the default for Passgen, if no other option is specified.
- Deterministic: In some cases, it is desired to use deterministic sources of randomness. For example, to be able to generate passphrases from a master-passphrase. These sources of randomness are also used internally for the unit tests, to ensure that the output of Passgen is stable.
Independent streams
Since Passgen supports outputting multiple passphrases, every source of randomness needs to be able to generate n independent streams of randomness.
For non-deterministic sources of randomness, it can just create multiple instances of the same source of randomness. For deterministic sources of randomness, care must be taken to ensure that:
- Independence: Every stream of randomness returned is independent, meaning that it is non-overlapping with any other stream.
- Determinism: Whenever you get the nth stream, regardless of which streams you have produced previously (or in which order), it will always yield the same sequence of random values.
Deterministic source of randomness are typically implemented using pseudorandom number generators. Implementing these properties can be done in one of three ways:
- Built-in notion of streams: Some of these have a built-in notion of streams, such that for the same seed, n independent streams can be produced.
- Ability to seek: Some have the ability to seek in the output stream. Seeking to a large offset (2^64) is sufficient to implement independence for all reasonable use-cases of Passgen, because Passgen is unable to generate a passphrase using this much of the stream.
- Recursive seeding: It can also be implemented using recursion, using the pseudorandom number generator to generate seeds for itself.
Implementation
Passgen uses the rand
crate behind the scenes. All sources of randomness must
implement the RandomSource
trait, which has methods to use it to create some value
which implements rand::Rng
:
#![allow(unused)] fn main() { pub trait RandomSource { /// Underlying type of the randomness source. type Rng: RngCore + 'static; /// Return a stream of randomness for the given index. fn get_rng(&self, index: usize) -> Self::Rng; } }
As you can see, the get_rng()
function takes an index. When Passgen creates
multiple passphrases, it will generate one random number generator per passphrase
that it generates.
Part of the reason this interface was built this way is because it allows Passgen to use multi-threading when generating large amounts of passphrases. Another possible design would be to use a single randomness generator and to use it in sequence to generate passphrases, but this means that the generation of each passphrase is dependent on the previous one and forces Passgen to work serially.
Passgen currently supports three randomness sources:
- System: Secure system random number generator.
- Xoshiro256PlusPlus: Deterministic number generator, mainly used for testing
- ChaCha20: Cryptographic deterministic number generator, used to implement the master-passphrase mode.
These sources are discussed in the next chapters.
System
The System source of randomness uses whatever secure randomness generator
is available on the platform that Passgen is compiled. For UNIX-based systems,
this can often be /dev/urandom
, or a syscall such as getrandom()
or
getentropy()
.
System randomness source just uses the rand::thread_rng()
method to return
the current thread-local random number generator.
#![allow(unused)] fn main() { impl RandomSource for System { type Rng = rand::rngs::ThreadRng; fn get_rng(&self, _index: usize) -> Self::Rng { rand::thread_rng() } } }
You can find data on the source of randomness used in the
getrandom, which is used to
implement ThreadRng
.
Xoshiro
The Xoshiro deteministic random number source uses the
Xoshiro256PlusPlus
pseudorandom number generator. It can be seeded with an u64
. It mainly
exists for testing purposes.
The implementation uses the jump()
function of the PRNG to skip \( n *
2^{64} \) bytes of the output stream, giving it independent streams.
#![allow(unused)] fn main() { pub struct Xoshiro(u64); impl RandomSource for Xoshiro { type Rng = Xoshiro256PlusPlus; fn get_rng(&self, index: usize) -> Self::Rng { let mut rng = Xoshiro256PlusPlus::seed_from_u64(self.0); for _ in 0..index { rng.jump(); } rng } } }
ChaCha20
Master Passphrase
- https://www.usenix.org/publications/loginonline/bcrypt-25-retrospective-password-security
Generation
The purpose of the generation module is to generate random passphrases from a prepared Pattern AST, and to track the entropy of the resulting passphrase as it is being generated.