21 changes: 21 additions & 0 deletions src/distributions/add_typos.jl
@@ -1,5 +1,26 @@
import StringDistances: DamerauLevenshtein, evaluate

"""
word_with_typos::String ~ AddTypos(word::String, max_typos=nothing)

Contributor

I am slightly concerned that this style of documentation implies that we support this syntax, when in fact I don't believe PClean's parser can handle type annotations on the LHS of a ~. We could just fix this in PClean (and probably should, eventually)—or we could find another way to communicate the return type?


Add a random number of random typos to the given string.

The distribution on the number of typos added to a word depends on the word
length. On average there is approximately 1 typo for every 45 characters in the
input word when max_typos is large or not provided.

The typos can be one of several types:

- insertion: insert a random lower-case letter at a random location

- deletion: delete a random character

- substitution: replace a random character with a random lower-case letter

- transpose: swap a random pair of two consecutive letters

NOTE: The log-density is approximate
"""
struct AddTypos <: PCleanDistribution end

has_discrete_proposal(::AddTypos) = false
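
For readers of this diff, the four typo types listed in the docstring can be illustrated with a minimal sketch. This is not the implementation in `add_typos.jl`: the helper names (`insert_typo`, `delete_typo`, `substitute_typo`, `transpose_typo`) and the `LETTERS` constant are hypothetical, and the sketch assumes each position and letter is chosen uniformly at random, which the docstring does not state.

```julia
using Random

# Hypothetical alphabet for inserted and substituted characters.
const LETTERS = 'a':'z'

# insertion: insert a random lower-case letter at a random location
function insert_typo(rng::AbstractRNG, word::String)
    chars = collect(word)
    insert!(chars, rand(rng, 1:(length(chars) + 1)), rand(rng, LETTERS))
    return String(chars)
end

# deletion: delete a random character (no-op on an empty word)
function delete_typo(rng::AbstractRNG, word::String)
    chars = collect(word)
    isempty(chars) && return word
    deleteat!(chars, rand(rng, 1:length(chars)))
    return String(chars)
end

# substitution: replace a random character with a random lower-case letter
function substitute_typo(rng::AbstractRNG, word::String)
    chars = collect(word)
    isempty(chars) && return word
    chars[rand(rng, 1:length(chars))] = rand(rng, LETTERS)
    return String(chars)
end

# transpose: swap a random pair of consecutive characters
# (no-op on words shorter than two characters)
function transpose_typo(rng::AbstractRNG, word::String)
    chars = collect(word)
    length(chars) < 2 && return word
    i = rand(rng, 1:(length(chars) - 1))
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return String(chars)
end
```

Working on a `Vector{Char}` rather than indexing the `String` directly keeps the sketch valid for non-ASCII input, where byte indices and character indices differ.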