EnglishText.jl Documentation
Many applications display information to readers in prose format instead of tabular format. It is often important to generate human-readable, grammatically correct prose. However, taking care of grammatical special cases is tedious.
EnglishText.jl solves this problem by providing a variety of convenient utility functions. It uses established algorithms where available. The precise methods used are documented in the modules themselves.
EnglishText.jl uses a modular approach. Applications not requiring all the exports may use a submodule, such as EnglishText.ItemLists, instead of the entire package.
EnglishText.jl aims to
- provide a convenient, universally useful approach to abstracting away grammatical special cases
- be self-documenting where possible, but well-documented nevertheless
- not have unnecessary performance bottlenecks
Note that this is not a natural language processing package and does not aim to include an English parser.
Indefinite Article Selection
EnglishText.Articulate.indefinite (Function)
indefinite(word)
Determine the correct indefinite article, from “a” or “an”, for the given noun.
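One simple way to make this choice is sketched below. This is not EnglishText's algorithm. It assumes a letter-based rule ("an" before a vowel letter) plus two small exception lists, one for a silent "h" and one for vowels pronounced with a consonant sound. The function naive_indefinite and both lists are hypothetical names used only for illustration.

```julia
# Hypothetical exception lists for illustration only; a real implementation needs many more.
const SILENT_H_PREFIXES = ("hour", "honest", "honor", "heir")   # vowel sound despite the "h"
const CONSONANT_SOUND_PREFIXES = ("uni", "use", "eu", "one")    # consonant sound despite the vowel

# Choose "a" or "an" from the first letter, after checking the exception prefixes.
function naive_indefinite(word::AbstractString)
    w = lowercase(word)
    any(p -> startswith(w, p), SILENT_H_PREFIXES) && return "an"
    any(p -> startswith(w, p), CONSONANT_SOUND_PREFIXES) && return "a"
    return first(w) in "aeiou" ? "an" : "a"
end
```

Letter-based rules like this fail on words such as "uninformed" (which takes "an") and on acronyms such as "FBI", which is why a dedicated, well-tested function is worth having.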
julia> using EnglishText
julia> indefinite("hour")
"an"
julia> indefinite("hand")
"a"
Word Representation of Numbers
EnglishText.Numeric.english (Function)
english(n::Integer)
Convert the integer $n$ to its English word representation, provided that $0 \le n < 10^{66}$.
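The general technique of spelling out an integer can be sketched as follows, restricted here to $0 \le n < 10^{12}$: split the number into groups of three digits, spell each group, and attach a scale word. The names spell_number and words_below_1000 are hypothetical, and the output style (hyphenated tens, no "and") is an assumption of this sketch that may differ from what english produces.

```julia
# Word tables for this sketch; scale names stop at "billion".
const ONES = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
              "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
              "seventeen", "eighteen", "nineteen"]
const TENS = ["ten", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
const SCALE_NAMES = ["", "thousand", "million", "billion"]

# Spell out 1 <= n <= 999, e.g. 342 -> "three hundred forty-two".
function words_below_1000(n::Integer)
    parts = String[]
    if n >= 100
        push!(parts, ONES[n ÷ 100] * " hundred")
        n %= 100
    end
    if n >= 20
        tens = TENS[n ÷ 10]
        push!(parts, n % 10 == 0 ? tens : tens * "-" * ONES[n % 10])
    elseif n > 0
        push!(parts, ONES[n])
    end
    return join(parts, " ")
end

function spell_number(n::Integer)
    0 <= n < 10^12 || throw(DomainError(n, "this sketch handles only 0 <= n < 10^12"))
    n == 0 && return "zero"
    groups = String[]
    scale = 1
    while n > 0
        g = n % 1000          # lowest remaining group of three digits
        if g > 0
            name = SCALE_NAMES[scale]
            pushfirst!(groups, isempty(name) ? words_below_1000(g) : words_below_1000(g) * " " * name)
        end
        n ÷= 1000
        scale += 1
    end
    return join(groups, " ")
end
```

Supporting the full range up to $10^{66}$ would mostly be a matter of extending the list of scale names.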
julia> using EnglishText
julia> english(16)
"sixteen"
EnglishText.Numeric.unenglish (Function)
unenglish(T <: Integer, data::AbstractString) → T
Convert data to the integral type T. This function guarantees that unenglish(Int, english(x)) == x, modulo any type differences. On other inputs, it is guaranteed neither to give a sensible result nor to throw an exception.
julia> using EnglishText
julia> unenglish(Int, "sixteen")
16
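Parsing can run the same grouping idea in reverse. The hypothetical parse_english below accumulates number words into a group below one thousand, multiplies that group on "hundred", and flushes it into the running total whenever a scale word appears. It assumes space- or hyphen-separated input, ignores "and", and only knows scales up to "billion". Unlike unenglish, whose behavior on other inputs is unspecified, this sketch deliberately throws on unknown words.

```julia
# Word tables for this sketch (hypothetical; scales stop at "billion").
const UNIT_WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
                    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
                    "seventeen", "eighteen", "nineteen"]
const TENS_WORDS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
const WORD_VALUES = Dict{String,Int}(
    "zero" => 0,
    (w => i for (i, w) in enumerate(UNIT_WORDS))...,
    (w => 10 * (i + 1) for (i, w) in enumerate(TENS_WORDS))...,
)
const SCALE_VALUES = Dict("thousand" => 10^3, "million" => 10^6, "billion" => 10^9)

function parse_english(::Type{T}, text::AbstractString) where {T<:Integer}
    total = zero(T)  # value of completed scale groups (thousands, millions, ...)
    group = zero(T)  # value of the current group, below one thousand
    for word in split(lowercase(text), r"[\s-]+"; keepempty=false)
        if haskey(WORD_VALUES, word)
            group += T(WORD_VALUES[word])
        elseif word == "hundred"
            group *= T(100)
        elseif haskey(SCALE_VALUES, word)
            total += group * T(SCALE_VALUES[word])
            group = zero(T)
        elseif word != "and"
            throw(ArgumentError("unrecognized number word: $word"))
        end
    end
    return total + group
end
```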
Quantities and Pluralization
ItemQuantity(n::Integer, item::AbstractString)
Represents a quantity of n occurrences of item. Although this is not a collection, for ease of use it implements two of the standard collection methods: length (for the number of items) and isempty (for whether there are no items).
julia> using EnglishText
julia> ItemQuantity(2, "apple")
2 apples
julia> ItemQuantity(1, "standard canine")
1 standard canine
Base.isempty (Method)
isempty(quantity::ItemQuantity)
Return true if the given ItemQuantity represents no items.
julia> using EnglishText
julia> isempty(ItemQuantity(0, "orange"))
true
julia> isempty(ItemQuantity(4, "person"))
false
Base.length (Method)
length(quantity::ItemQuantity)
Return the number of items represented by this quantity.
julia> using EnglishText
julia> length(ItemQuantity(7, "desk"))
7
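Because ItemQuantity is not a collection, supporting length and isempty only requires adding methods to those Base functions. The hypothetical Quantity type below sketches that pattern, together with a show method for the "2 apples" style of display. Its pluralization simply appends "s"; that is a simplification of this sketch, not how EnglishText forms plurals.

```julia
# Hypothetical stand-in for ItemQuantity, showing only the Base methods involved.
struct Quantity
    n::Int
    item::String
end

# Display as "<n> <item>"; appending "s" is a deliberate simplification.
Base.show(io::IO, q::Quantity) = print(io, q.n, " ", q.n == 1 ? q.item : q.item * "s")

# Collection-style queries, even though Quantity is not iterable.
Base.length(q::Quantity) = q.n
Base.isempty(q::Quantity) = q.n == 0
```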
EnglishText.Pluralize.pluralize (Function)
pluralize(word; classical=true)
Pluralize a singular noun word (given in canonical capitalization) using heuristics and lists of exceptions. If word is not a singular noun, this function may give strange results.
If classical is set to true, then the classical (i.e., inherited from Latin or Greek) pluralization is chosen instead of the anglicized pluralization. As an example, the classical plural of "vertex" is "vertices", but the anglicized plural is "vertexes". By default, the classical pluralization is used.
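The combination of exception lists and suffix heuristics can be sketched as follows. This is far simpler than Conway's (1998) algorithm and is not EnglishText's implementation: naive_pluralize and its two tiny tables are hypothetical, and the classical keyword merely picks between the two plurals stored for each classical noun.

```julia
# Hypothetical exception tables (a real pluralizer needs hundreds of entries).
const IRREGULAR_PLURALS = Dict("child" => "children", "person" => "people", "mouse" => "mice")
# Nouns with two accepted plurals: (anglicized, classical).
const CLASSICAL_PLURALS = Dict(
    "radius" => ("radiuses", "radii"),
    "vertex" => ("vertexes", "vertices"),
    "formula" => ("formulas", "formulae"),
)

function naive_pluralize(word::AbstractString; classical::Bool=true)
    # 1. Exception lists take priority over any suffix rule.
    haskey(IRREGULAR_PLURALS, word) && return IRREGULAR_PLURALS[word]
    if haskey(CLASSICAL_PLURALS, word)
        anglicized, classic = CLASSICAL_PLURALS[word]
        return classical ? classic : anglicized
    end
    # 2. Suffix heuristics for regular nouns.
    if any(s -> endswith(word, s), ("s", "x", "z", "ch", "sh"))
        return word * "es"                      # fox -> foxes
    elseif endswith(word, "y") && length(word) > 1 &&
           !(word[prevind(word, lastindex(word))] in "aeiou")
        return chop(word) * "ies"               # city -> cities
    end
    return word * "s"                           # apple -> apples
end
```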
julia> using EnglishText
julia> pluralize("fox")
"foxes"
julia> pluralize("radius")
"radii"
julia> pluralize("radius", classical=false)
"radiuses"
EnglishText.Pluralize.singularize (Function)
singularize(word)
Convert a plural noun (given in canonical capitalization) to its singular form using heuristics and lists of exceptions. If the given word is not a plural noun, the result may be unpredictable.
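Singularizing reverses these rules and is less reliable, because several plural endings are ambiguous. In the hypothetical naive_singularize below, a lookup table handles irregular and classical forms such as "data", and crude suffix rules handle the rest. A word like "caches" comes out wrong ("cach"), which is exactly the kind of case that real exception lists exist to catch.

```julia
# Hypothetical lookup table for plurals that suffix rules cannot undo.
const IRREGULAR_SINGULARS = Dict("children" => "child", "people" => "person", "mice" => "mouse",
                                 "data" => "datum", "radii" => "radius", "vertices" => "vertex")

function naive_singularize(word::AbstractString)
    haskey(IRREGULAR_SINGULARS, word) && return IRREGULAR_SINGULARS[word]
    if endswith(word, "ies") && length(word) > 3
        return chop(word; tail=3) * "y"                  # cities -> city
    elseif any(s -> endswith(word, s), ("sses", "xes", "zes", "ches", "shes"))
        return String(chop(word; tail=2))                # foxes -> fox
    elseif endswith(word, "s") && !endswith(word, "ss")
        return String(chop(word))                        # apples -> apple
    end
    return String(word)                                  # already singular, or unknown
end
```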
julia> using EnglishText
julia> singularize("foxes")
"fox"
julia> singularize("data")
"datum"
Lists of Nouns and Adjectives
EnglishText.ItemLists.ItemList (Type)
ItemList(objects, connective=Sum())
A list of items or adjectives that supports printing in standard English format. The first argument, objects, should be an iterator over some number of strings or other objects, including EnglishText.ItemQuantity objects.
The second argument, connective, should be one of:
- Sum(), which represents a list of nouns in a collection of things
- Disjunction(), which represents a list of traits (typically adjectives or adverbs, but possibly also verbs or nouns) of which at least one should be satisfied
- Conjunction(), which represents a list of traits that should all be satisfied
If omitted, connective defaults to Sum().
julia> using EnglishText
julia> ItemList(["apples", "oranges"])
apples and oranges
julia> ItemList([ItemQuantity(2, "pencil"), ItemQuantity(1, "pen")])
2 pencils and 1 pen
julia> ItemList(["animal", "plant"], Disjunction())
animal or plant
julia> ItemList(["red", "blue", "white"], Conjunction())
red, blue, and white
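The printing rule visible in these examples (two items joined by the connective word; three or more separated by commas, with a serial comma before the last) can be sketched with Base.join, whose optional third argument sets the final delimiter. The function english_list is hypothetical, and it takes the connective as a plain word ("and" or "or") rather than the Sum, Disjunction, and Conjunction types.

```julia
# Hypothetical helper, not part of EnglishText: join items in English list format.
function english_list(items, word::AbstractString="and")
    strs = [string(x) for x in items]
    # Two items: no comma. Three or more: serial comma before the final connective.
    length(strs) <= 2 && return join(strs, " $word ")
    return join(strs, ", ", ", $word ")
end
```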
julia> "Help us use and test this software."
"Help us use and test this software."
Parsing Sentences
EnglishText.Text.sentences (Function)
sentences(text::AbstractString)
Return an iterable over the Sentence objects contained within text. Sentences are identified naïvely; that is, every full stop, exclamation mark, or question mark is considered to delimit a sentence. This is, of course, prone to error, as some full stops mark abbreviations rather than the ends of sentences.
julia> using EnglishText
julia> for s in sentences("Hi! Iterate over sentences. OK?")
           println(s)
       end
Hi!
Iterate over sentences.
OK?
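The naïve rule amounts to a single regular expression: a run of characters that are not terminators, followed by one or more terminators. The sketch below uses the hypothetical name naive_sentences; it returns a vector of strings rather than Sentence objects, and it silently drops any trailing text that lacks a terminator.

```julia
# Hypothetical sketch: split on ".", "!" and "?" exactly as naively as described.
function naive_sentences(text::AbstractString)
    # Each match is non-terminator text followed by one or more terminators.
    return [strip(m.match) for m in eachmatch(r"[^.!?]+[.!?]+", text)]
end
```

As the documentation warns, abbreviations break this rule: naive_sentences("Dr. Smith arrived.") yields ["Dr.", "Smith arrived."].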
Internals
An object representing a string of text but with additional semantic information. These objects convert to Strings through the string function, but typically also support other operations.
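In Julia, string(x) and string interpolation fall back to print(io, x), so a type carrying extra semantic information becomes convertible to String by defining a single print method, while remaining free to support other operations. The TaggedSentence type and isquestion function below are hypothetical illustrations of that pattern, not EnglishText internals.

```julia
# Hypothetical semantic-text type: the text of a sentence plus its terminator.
struct TaggedSentence
    body::String
    terminator::Char
end

# string(x) and "$x" interpolation both go through print, so this one method
# is enough to make TaggedSentence convert to a String.
Base.print(io::IO, s::TaggedSentence) = print(io, s.body, s.terminator)

# An extra operation that a plain String would not offer directly.
isquestion(s::TaggedSentence) = s.terminator == '?'
```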
Citations
- Conway, D. M. (1998, August). An algorithmic approach to English pluralization. In Proceedings of the Second Annual Perl Conference.