regparser.grammar package

Submodules

regparser.grammar.amdpar module

regparser.grammar.amdpar.generate_verb(word_list, verb, active)[source]

Shorthand for making a tokens.Verb from a list of trigger words
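
For instance, a grammar recognizing a family of trigger words for an active-voice PUT might be built as below. This is a minimal sketch: the trigger words are hypothetical, and it assumes the returned value is a pyparsing element whose parse action emits the Verb token.

    from regparser.grammar import amdpar
    from regparser.grammar.tokens import Verb

    # Hypothetical trigger words; matching any of them should yield a
    # tokens.Verb marking an active-voice PUT (i.e. a revision).
    revised = amdpar.generate_verb(['revising', 'revised', 'correcting'],
                                   Verb.PUT, active=True)
    # Assuming a pyparsing element is returned, it can be parsed directly:
    result = revised.parseString('revising')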

regparser.grammar.amdpar.make_multiple(to_repeat)[source]

Shorthand for handling repeated tokens (‘and’, ‘,’, ‘through’)

regparser.grammar.amdpar.make_par_list(listify, force_text_field=False)[source]

Shorthand for turning a pyparsing match into a tokens.Paragraph

regparser.grammar.amdpar.tokenize_override_ps(match)[source]

Create tokens.Paragraph objects for the given override match

regparser.grammar.appendix module

regparser.grammar.appendix.decimalize(characters, name)[source]
regparser.grammar.appendix.parenthesize(characters, name)[source]

regparser.grammar.atomic module

Atomic components; these probably shouldn’t be used directly

regparser.grammar.delays module

class regparser.grammar.delays.Delayed[source]

Bases: object

Placeholder token

class regparser.grammar.delays.EffectiveDate[source]

Bases: object

Placeholder token

regparser.grammar.interpretation_headers module

regparser.grammar.terms module

regparser.grammar.tokens module

Set of Tokens to be used when parsing. @label is a list describing the depth of a paragraph/context. It follows: [Part, Subpart/Appendix/Interpretations, Section, p-level-1, p-level-2, p-level-3, p-level-4, p-level-5]
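
For example, paragraph (a)(1) of a hypothetical § 1005.2 might be described by a label like the following (the citation and the use of None for an absent subpart slot are illustrative assumptions):

    # [Part, Subpart/Appendix/Interp, Section, p-level-1, p-level-2, ...]
    label = ['1005', None, '2', 'a', '1']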

class regparser.grammar.tokens.AndToken[source]

Bases: regparser.grammar.tokens.Token

The word ‘and’ can help us determine whether a Context token should be a Paragraph token. Note that ‘and’ might also trigger the creation of a TokenList, which takes precedence

class regparser.grammar.tokens.Context(label, certain=False)[source]

Bases: regparser.grammar.tokens.Token

Represents a bit of context for the paragraphs. This gets compressed with the paragraph tokens to define the full scope of a paragraph. To complicate matters, sometimes what looks like a Context is actually the entity being modified (i.e. a paragraph). If we are certain that this is only context (e.g. “In Subpart A”), set ‘certain’

certain
label
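
A minimal sketch (label values are hypothetical):

    from regparser.grammar.tokens import Context

    # Pure context, e.g. introduced by "In ..." -- we are certain this
    # is not itself the entity being amended:
    certain_ctx = Context(['1005', None, '2'], certain=True)
    # A reference that might turn out to be the modified paragraph:
    maybe_ctx = Context([None, None, None, 'b'])
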
class regparser.grammar.tokens.Paragraph(label=NOTHING, field=None)[source]

Bases: regparser.grammar.tokens.Token

Represents an entity which is being modified by the amendment. Label is a way to locate this paragraph (though see the above note). We might be modifying a field of a paragraph (e.g. intro text only, or title only); if so, set the field parameter.

HEADING_FIELD = 'title'
KEYTERM_FIELD = 'heading'
TEXT_FIELD = 'text'
field
label
label_text()[source]

Converts self.label into a string

classmethod make(label=None, field=None, part=None, sub=None, section=None, paragraphs=None, paragraph=None, subpart=None, is_interp=None, appendix=None)[source]

label and field are the only “materialized” fields. Every other field becomes part of the label, offering a more legible API. This is particularly useful for writing tests
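
A sketch of the two construction styles (citation values are hypothetical, and the exact label layout make produces is assumed from the module docstring):

    from regparser.grammar.tokens import Paragraph

    # Directly via a label list...
    explicit = Paragraph(['1005', None, '2', 'a', '1'])
    # ...or via the more legible keyword form, here modifying only the
    # title of the paragraph:
    legible = Paragraph.make(part='1005', section='2',
                             paragraphs=['a', '1'],
                             field=Paragraph.HEADING_FIELD)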

class regparser.grammar.tokens.Token[source]

Bases: object

Base class for all tokens. Provides methods for pattern matching and copying this token

match(*types, **fields)[source]

Pattern match: self must be one of the provided types (if any were provided) and all of the provided fields must match. On a successful match, returns self
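
For example (the return value on a failed match is not documented; it is assumed to be falsy):

    from regparser.grammar.tokens import Context, Verb

    token = Verb(Verb.PUT, active=True)
    token.match(Verb, Context)        # type check alone: returns token
    token.match(Verb, verb=Verb.PUT)  # type + field check: returns token
    token.match(Context)              # no match: presumably falsy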

class regparser.grammar.tokens.TokenList(tokens)[source]

Bases: regparser.grammar.tokens.Token

Represents a sequence of other tokens, e.g. comma-separated or created via “through”

tokens
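
For instance, “paragraphs (a) and (c)” might compress into (paragraph labels hypothetical):

    from regparser.grammar.tokens import Paragraph, TokenList

    pair = TokenList([Paragraph.make(paragraph='a'),
                      Paragraph.make(paragraph='c')])
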
class regparser.grammar.tokens.Verb(verb, active, and_prefix=False)[source]

Bases: regparser.grammar.tokens.Token

Represents what action is taking place to the paragraphs

DELETE = 'DELETE'
DESIGNATE = 'DESIGNATE'
INSERT = 'INSERT'
KEEP = 'KEEP'
MOVE = 'MOVE'
POST = 'POST'
PUT = 'PUT'
RESERVE = 'RESERVE'
active
and_prefix
verb
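
For example, “revising paragraph (a)” and “paragraph (a) is revised” describe the same action in different voices; a sketch:

    from regparser.grammar.tokens import Verb

    active = Verb(Verb.PUT, active=True)    # "revising ..."
    passive = Verb(Verb.PUT, active=False)  # "... is revised"
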
regparser.grammar.tokens.uncertain_label(label_parts)[source]

Convert a list of strings/Nones to a ‘-’-separated string, with question marks replacing the Nones. We use this format to indicate uncertainty
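
A sketch of the expected behavior (the output shown is an assumption based on the docstring):

    from regparser.grammar.tokens import uncertain_label

    uncertain_label(['1005', None, '2', 'a'])  # -> '1005-?-2-a'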

regparser.grammar.unified module

Some common combinations

regparser.grammar.unified.appendix_section(match)[source]

Appendices may have parenthetical paragraphs in their section numbers.

regparser.grammar.unified.make_multiple(head, tail=None, wrap_tail=False)[source]

We have a recurring need to parse citations which have a string of terms, e.g. section 11(a), (b)(4), and (5). This function is a shorthand for setting these elements up
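
The sketch below is not the actual implementation; it only illustrates, in plain pyparsing, the head-plus-repeated-tail shape this shorthand sets up:

    import pyparsing

    head = pyparsing.Regex(r"\d+\([a-z]\)")               # "11(a)"
    tail = (pyparsing.Suppress(pyparsing.Optional(",") +
                               pyparsing.Optional("and")) +
            pyparsing.Regex(r"\(\w+\)(\(\w+\))?"))        # "(b)(4)", "(5)"
    citations = head + pyparsing.ZeroOrMore(tail)
    citations.parseString("11(a), (b)(4), and (5)")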

regparser.grammar.utils module

class regparser.grammar.utils.DocLiteral(literal, ascii_text)[source]

Bases: pyparsing.Literal

Setting an object’s name to a unicode string causes Sphinx to freak out. Instead, we replace it with the provided (ASCII) text.

regparser.grammar.utils.Marker(txt)[source]
class regparser.grammar.utils.Position(start, end)

Bases: tuple

end

Alias for field number 1

start

Alias for field number 0

class regparser.grammar.utils.QuickSearchable(expr, force_regex_str=None)[source]

Bases: pyparsing.ParseElementEnhance

Pyparsing’s scanString (i.e. searching for a grammar over a string) tests each index within its search string. While that offers maximum flexibility, it is rather slow for our needs. This enhanced grammar type wraps other grammars, deriving from them a first regular expression to use when `scanString`ing. This cuts search time considerably.

classmethod and_case(*first_classes)[source]

“And” grammars are relatively common; while we generally just want to look at their first terms, this decorator lets us describe special cases based on the class type of the first component of the clause

classmethod case(*match_classes)[source]

Add a “case” which will match grammars based on the provided class types. If there’s a match, we’ll execute the function

cases = [<function wordstart>, <function optional>, <function empty>, <function match_and>, <function match_or>, <function suppress>, <function has_re_string>, <function line_start>, <function literal>]
classmethod initial_regex(grammar)[source]

Given a Pyparsing grammar, derive a set of suitable initial regular expressions to aid our search. As grammars may Or together multiple sub-expressions, this always returns a set of possible regular expression strings. This is _not_ a complete conversion to regexes nor does it account for every Pyparsing element; add as needed
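
For example (the exact regex strings returned are an assumption):

    import pyparsing
    from regparser.grammar.utils import QuickSearchable

    # An Or of two literals should derive one candidate per branch:
    QuickSearchable.initial_regex(pyparsing.Literal("part")
                                  | pyparsing.Literal("section"))
    # -> a set of regex strings, e.g. {'part', 'section'}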

scanString(instring, maxMatches=None, overlap=False)[source]

Override scanString to attempt parsing only where there’s a regex search match (as opposed to every index). Does not implement the full scanString interface.
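
A minimal usage sketch (the wrapped grammar is illustrative):

    import pyparsing
    from regparser.grammar.utils import QuickSearchable

    grammar = QuickSearchable(pyparsing.Literal("Section") +
                              pyparsing.Word(pyparsing.nums))
    # Full parses are only attempted where the derived regex matches:
    for match, start, end in grammar.scanString("See Section 22 below"):
        print(match, start, end)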

regparser.grammar.utils.SuffixMarker(txt)[source]
regparser.grammar.utils.WordBoundaries(grammar)[source]
regparser.grammar.utils.empty(grammar)[source]
regparser.grammar.utils.has_re_string(grammar)[source]
regparser.grammar.utils.keep_pos(expr)[source]

Transform a pyparsing grammar by inserting an attribute, “pos”, on the match which describes position information
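
A sketch (the attribute access pattern is an assumption based on this docstring and parse_position below):

    import pyparsing
    from regparser.grammar.utils import keep_pos

    grammar = keep_pos(pyparsing.Literal("Section") +
                       pyparsing.Word(pyparsing.nums))
    result = grammar.parseString("Section 22")
    # result.pos is presumably a Position(start=0, end=10)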

regparser.grammar.utils.line_start(grammar)[source]
regparser.grammar.utils.literal(grammar)[source]
regparser.grammar.utils.match_and(grammar)[source]
regparser.grammar.utils.match_or(grammar)[source]
regparser.grammar.utils.optional(grammar)[source]
regparser.grammar.utils.parse_position(source, location, tokens)[source]

A pyparsing parse action which pulls out (and removes) the position information and replaces it with a Position object

regparser.grammar.utils.suppress(grammar)[source]
regparser.grammar.utils.wordstart(grammar)[source]

Optimization: WordStart is generally followed by a more specific identifier. Rather than searching for the start of a word alone, search for that identifier as well

Module contents