
EnglishText.jl Documentation

Many applications display information to readers in prose format instead of tabular format. It is often important to generate human-readable, grammatically correct prose. However, taking care of grammatical special cases is tedious.

EnglishText.jl solves this problem by providing a variety of convenient utility functions. It uses established algorithms where available. The precise methods used are documented in the modules themselves.

EnglishText.jl uses a modular approach. Applications not requiring all the exports may use a submodule, such as EnglishText.ItemLists, instead of the entire package.

EnglishText.jl aims to

Note that this is not a natural language processing package and does not aim to include an English parser.

Indefinite Article Selection

indefinite(word)

Determine the correct indefinite article, from “a” or “an”, for the given noun.

julia> using EnglishText

julia> indefinite("hour")
"an"

julia> indefinite("hand")
"a"
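The choice between "a" and "an" depends on a word's first sound, not its first letter, so any implementation needs exceptions on top of a letter rule. The following is a hypothetical heuristic, naive_indefinite, and not EnglishText's algorithm. It applies the first-letter rule and then overrides it with two tiny exception lists that are assumptions and far from complete.

```julia
# Hypothetical sketch, not EnglishText's algorithm: a first-letter rule plus
# tiny, assumed exception lists keyed on word prefixes.
const VOWEL_SOUND_PREFIXES = ("hour", "honest", "hono", "heir")                # silent "h"
const CONSONANT_SOUND_PREFIXES = ("uni", "use", "usu", "eu", "one", "once")   # "y"/"w" sound

function naive_indefinite(word::AbstractString)
    w = lowercase(word)
    isempty(w) && throw(ArgumentError("word must be non-empty"))
    # Exceptions take priority over the first-letter rule.
    any(p -> startswith(w, p), VOWEL_SOUND_PREFIXES) && return "an"
    any(p -> startswith(w, p), CONSONANT_SOUND_PREFIXES) && return "a"
    return first(w) in ('a', 'e', 'i', 'o', 'u') ? "an" : "a"
end
```

With these lists, naive_indefinite("unit") returns "a", but "unimportant" also gets "a", which is wrong. Prefix lists like these must be much larger before they are reliable.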

Word Representation of Numbers

english(n::Integer)

Convert the integer $n$ to English words, provided that $0 \le n < 10^{66}$.

julia> using EnglishText

julia> english(16)
"sixteen"
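A common way to spell out integers is to split them into groups of three digits and follow each nonzero group with a scale word such as "thousand" or "million". Below is a hypothetical sketch of that technique, sketch_english, which is not the package's implementation. It covers a smaller range ($0 \le n < 10^{15}$, with short-scale names up to "trillion"). Hyphenating "forty-two" and omitting "and" after "hundred" are choices made by this sketch, not documented behavior of english.

```julia
# Hypothetical sketch of the "groups of three digits" technique; not EnglishText's code.
const SMALL_WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
                     "eight", "nine", "ten", "eleven", "twelve", "thirteen",
                     "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
                     "nineteen"]                       # SMALL_WORDS[k + 1] names k
const TENS_WORDS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
                    "seventy", "eighty", "ninety"]    # TENS_WORDS[t + 1] names 10t
# Assumed short-scale names for successive groups of three digits.
const SCALE_WORDS = ["", "thousand", "million", "billion", "trillion"]

# Spell out one group 1 <= g <= 999 (no "and" after "hundred", by assumption).
function spell_group(g::Integer)
    words = String[]
    h, r = divrem(g, 100)
    h > 0 && push!(words, SMALL_WORDS[h + 1], "hundred")
    if r >= 20
        t, o = divrem(r, 10)
        push!(words, o == 0 ? TENS_WORDS[t + 1] : TENS_WORDS[t + 1] * "-" * SMALL_WORDS[o + 1])
    elseif r > 0
        push!(words, SMALL_WORDS[r + 1])
    end
    return join(words, " ")
end

function sketch_english(n::Integer)
    0 <= n < big(1000)^length(SCALE_WORDS) ||
        throw(DomainError(n, "sketch only supports 0 <= n < 10^15"))
    n == 0 && return "zero"
    parts = String[]
    scale = 1
    while n > 0
        n, g = divrem(n, 1000)   # peel off the lowest three digits
        if g > 0
            s = spell_group(g)
            if !isempty(SCALE_WORDS[scale])
                s *= " " * SCALE_WORDS[scale]
            end
            pushfirst!(parts, s)   # higher-order groups go in front
        end
        scale += 1
    end
    return join(parts, " ")
end
```

Zero-valued groups are skipped, so 1_000_001 becomes "one million one" rather than "one million zero thousand one".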
unenglish(T <: Integer, data::AbstractString) → T

Convert the English number words in data to an integer of type T. This function guarantees that unenglish(Int, english(x)) == x, modulo any type differences. On other inputs, it is guaranteed neither to return a sensible result nor to throw an exception.

julia> using EnglishText

julia> unenglish(Int, "sixteen")
16
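Parsing reverses the grouping: accumulate the value of the current three-digit group, multiply it by 100 on "hundred", and add it to a running total scaled by a word such as "thousand". The sketch below, sketch_unenglish, is hypothetical and is not EnglishText's parser. Its word tables, its tolerance of "and", and its choice to throw an ArgumentError on unknown words are all assumptions; the documented unenglish makes no such promise.

```julia
# Hypothetical sketch, not EnglishText's parser.
const WORD_VALUES = Dict(
    "zero" => 0, "one" => 1, "two" => 2, "three" => 3, "four" => 4,
    "five" => 5, "six" => 6, "seven" => 7, "eight" => 8, "nine" => 9,
    "ten" => 10, "eleven" => 11, "twelve" => 12, "thirteen" => 13,
    "fourteen" => 14, "fifteen" => 15, "sixteen" => 16, "seventeen" => 17,
    "eighteen" => 18, "nineteen" => 19, "twenty" => 20, "thirty" => 30,
    "forty" => 40, "fifty" => 50, "sixty" => 60, "seventy" => 70,
    "eighty" => 80, "ninety" => 90)
const SCALE_VALUES = Dict("thousand" => 10^3, "million" => 10^6,
                          "billion" => 10^9, "trillion" => 10^12)

function sketch_unenglish(::Type{T}, data::AbstractString) where {T<:Integer}
    total = zero(T)     # completed groups, already multiplied by their scale
    current = zero(T)   # the group of up to three digits being read
    # Words may be separated by spaces or hyphens ("forty-two").
    for word in split(lowercase(data), r"[\s-]+"; keepempty=false)
        if haskey(WORD_VALUES, word)
            current += T(WORD_VALUES[word])
        elseif word == "hundred"
            current *= T(100)
        elseif haskey(SCALE_VALUES, word)
            total += current * T(SCALE_VALUES[word])
            current = zero(T)
        elseif word != "and"
            throw(ArgumentError("unrecognized number word: $word"))
        end
    end
    return total + current
end
```

For example, "two thousand three hundred" is read as 2 × 1000, then 3 × 100, giving 2300.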

Quantities and Pluralization

ItemQuantity(n::Integer, item::AbstractString)

Represents a quantity of n occurrences of item. Although this is not a collection, for ease of use it implements the standard collection methods length (the number of items) and isempty (whether there are no items).

julia> using EnglishText

julia> ItemQuantity(2, "apple")
2 apples

julia> ItemQuantity(1, "standard canine")
1 standard canine
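A minimal stand-in type shows how something that is not a collection can still support length and isempty by extending the corresponding Base functions. Quantity is a hypothetical name, not ItemQuantity itself. Its display appends "s" whenever the count is not one, which is a placeholder for real pluralization.

```julia
# Hypothetical stand-in for ItemQuantity; not EnglishText's definition.
struct Quantity
    n::Int
    item::String
end

# Collection-like methods, mirroring the behavior documented above.
Base.length(q::Quantity) = q.n
Base.isempty(q::Quantity) = q.n == 0

# Naive display: append "s" unless there is exactly one item.
# print and string fall back to this show method.
function Base.show(io::IO, q::Quantity)
    print(io, q.n, " ", q.n == 1 ? q.item : q.item * "s")
end
```

Because only "s" is appended, Quantity(4, "person") would display as "4 persons". Producing correct plurals requires a real pluralizer such as pluralize.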
isempty(quantity::ItemQuantity)

Return true if the given ItemQuantity represents no items.

julia> using EnglishText

julia> isempty(ItemQuantity(0, "orange"))
true

julia> isempty(ItemQuantity(4, "person"))
false
length(quantity::ItemQuantity)

Return the number of items represented by this quantity.

julia> using EnglishText

julia> length(ItemQuantity(7, "desk"))
7
pluralize(word; classical=true)

Pluralize a singular noun word (given in canonical capitalization) using heuristics and lists of exceptions. If word is not a singular noun, this function may give strange results.

If classical is true, the classical pluralization (i.e., the one inherited from Latin or Greek) is chosen instead of the anglicized one. For example, the classical plural of "vertex" is "vertices", but the anglicized plural is "vertexes". By default, the classical pluralization is used.

julia> using EnglishText

julia> pluralize("fox")
"foxes"

julia> pluralize("radius")
"radii"

julia> pluralize("radius", classical=false)
"radiuses"
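The approach of combining heuristics with exception lists can be sketched as follows. sketch_pluralize and its word lists are hypothetical and far smaller than anything usable. The suffix rules are common English rules: add "es" after s, x, z, ch, or sh; turn a consonant followed by "y" into "ies"; otherwise add "s". They are not necessarily the rules EnglishText applies.

```julia
# Hypothetical sketch; not EnglishText's word lists or rules.
const CLASSICAL_PLURALS = Dict("radius" => "radii", "vertex" => "vertices",
                               "cactus" => "cacti", "formula" => "formulae")
const IRREGULAR_PLURALS = Dict("child" => "children", "person" => "people",
                               "mouse" => "mice")

function sketch_pluralize(word::AbstractString; classical::Bool=true)
    # Exception lists are consulted before any suffix rule.
    haskey(IRREGULAR_PLURALS, word) && return IRREGULAR_PLURALS[word]
    classical && haskey(CLASSICAL_PLURALS, word) && return CLASSICAL_PLURALS[word]
    if occursin(r"(s|x|z|ch|sh)$", word)
        return word * "es"             # fox -> foxes, church -> churches
    elseif occursin(r"[^aeiou]y$", word)
        return chop(word) * "ies"      # city -> cities (but day -> days)
    else
        return word * "s"
    end
end
```

When classical=false, the Latin/Greek table is skipped. "radius" then falls through to the first suffix rule and becomes "radiuses", which matches the anglicized plural described above.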
singularize(word)

Convert a plural noun (given in canonical capitalization) to its singular form using heuristics and lists of exceptions. If the given word is not a plural noun, the result may be unpredictable.

julia> using EnglishText

julia> singularize("foxes")
"fox"

julia> singularize("data")
"datum"
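Singularization can be sketched the same way, with the suffix rules run in reverse. sketch_singularize is hypothetical and its exception table is an assumption. The rules are easy to fool (for example, "buses" becomes "buse"), so in practice the exception lists do much of the work.

```julia
# Hypothetical sketch; not EnglishText's word lists or rules.
const IRREGULAR_SINGULARS = Dict("data" => "datum", "radii" => "radius",
                                 "children" => "child", "people" => "person")

function sketch_singularize(word::AbstractString)
    haskey(IRREGULAR_SINGULARS, word) && return IRREGULAR_SINGULARS[word]
    if occursin(r"[^aeiou]ies$", word)
        return chop(word; tail=3) * "y"   # cities -> city
    elseif occursin(r"(ss|x|z|ch|sh)es$", word)
        return chop(word; tail=2)         # foxes -> fox, classes -> class
    elseif endswith(word, "s") && !endswith(word, "ss")
        return chop(word)                 # cats -> cat
    else
        return word                       # already singular, or unknown
    end
end
```

chop is used instead of byte-range indexing so that the sketch stays safe on words containing non-ASCII characters.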

Lists of Nouns and Adjectives

ItemList(objects, connective=Sum())

A list of items or adjectives that prints in standard English format. The first argument, objects, should be an iterable of strings or other objects, including EnglishText.ItemQuantity objects.

The second argument connective should be one of:

  • Sum(), which represents a list of nouns in a collection of things
  • Disjunction(), which represents a list of traits (typically adjectives or adverbs, but possibly also verbs or nouns) for which at least one should be satisfied
  • Conjunction(), which represents a list of traits that should all be satisfied

If omitted, connective is set to Sum().

julia> using EnglishText

julia> ItemList(["apples", "oranges"])
apples and oranges

julia> ItemList([ItemQuantity(2, "pencil"), ItemQuantity(1, "pen")])
2 pencils and 1 pen

julia> ItemList(["animal", "plant"], Disjunction())
animal or plant

julia> ItemList(["red", "blue", "white"], Conjunction())
red, blue, and white

julia> "Help us use and test this software."
"Help us use and test this software."
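The formatting rule in the examples can be sketched as a plain function. Two items are joined by the connective alone. Three or more are separated by commas, with a serial comma before the final connective. sketch_list is hypothetical. It takes the connective as a word, assuming "and" for Sum() and Conjunction() and "or" for Disjunction(), as in the examples above.

```julia
# Hypothetical helper, not EnglishText's ItemList; the connective is a plain word.
function sketch_list(items, connective::AbstractString="and")
    strs = [string(x) for x in items]   # works for any iterable of printable objects
    n = length(strs)
    n == 0 && return ""
    n == 1 && return strs[1]
    n == 2 && return strs[1] * " " * connective * " " * strs[2]
    # Three or more: comma-separated, with a serial comma before the connective.
    return join(strs[1:end-1], ", ") * ", " * connective * " " * strs[end]
end
```

The two-item case needs special handling. A single comma-joining rule would otherwise produce "apples, and oranges".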

Parsing Sentences

sentences(text::AbstractString)

Return an iterable over the Sentences contained within text. Sentences are identified naïvely; that is, every full stop, exclamation mark, or question mark is considered to delimit a sentence. This is of course prone to error, as some full stops are used for abbreviations and not for delimiting sentences.

julia> using EnglishText

julia> for s in sentences("Hi! Iterate over sentences. OK?")
           println(s)
       end
Hi!
Iterate over sentences.
OK?
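The naive splitting rule maps directly onto a regular expression. sketch_sentences is a hypothetical illustration, not the package's implementation. It returns a lazy generator of stripped substrings and treats a run of terminators such as "?!" as a single terminator. It also silently drops any trailing text that has no terminal punctuation.

```julia
# Hypothetical naive splitter: every '.', '!' or '?' (or a run of them) ends a sentence.
sketch_sentences(text::AbstractString) =
    (strip(m.match) for m in eachmatch(r"[^.!?]+[.!?]+", text))
```

Splitting "Dr. Smith left." with this rule yields "Dr." and "Smith left.", which is the abbreviation failure described above.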

Internals

An object representing a string of text but with additional semantic information. These objects convert to Strings through the string function, but also typically support other operations.

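The pattern described here is a wrapper type that prints as ordinary text while supporting extra operations. It can be sketched in a few lines. Defining Base.show is enough for string and print to work, and the extra operations are ordinary functions. SketchSemanticText, Emphasized, and as_markdown are hypothetical names, not EnglishText's internal types.

```julia
# Hypothetical types illustrating the pattern; not EnglishText's internals.
abstract type SketchSemanticText end

struct Emphasized <: SketchSemanticText
    text::String
end

# string(x) and print(io, x) fall back to this show method.
Base.show(io::IO, e::Emphasized) = print(io, e.text)

# An operation beyond plain strings: render with hypothetical Markdown emphasis.
as_markdown(e::Emphasized) = "*" * e.text * "*"
```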

Citations