regparser package

Subpackages

Submodules

regparser.api_stub module

regparser.api_writer module

class regparser.api_writer.APIWriteContent(*path_parts)[source]

This writer writes the contents to the specified API

write(python_obj)[source]

Write the object (as json) to the API

class regparser.api_writer.AmendmentNodeEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)[source]

Bases: regparser.notice.encoder.AmendmentEncoder, regparser.tree.struct.NodeEncoder

class regparser.api_writer.Client(base)[source]

A Client for writing regulation(s) and meta data.

diff(label, old_version, new_version)[source]
layer(layer_name, doc_type, doc_id)[source]
notice(doc_number)[source]
preamble(doc_number)[source]
regulation(label, doc_number)[source]

class regparser.api_writer.FSWriteContent(*path_parts)[source]

This writer places the contents in the file system

write(python_obj)[source]

Write the object as json to disk
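
As a rough illustration of what "write the object as json to disk" can look like, here is a minimal Python 3 sketch. It does not use regparser: write_json_to_disk, its base_dir argument, and the example path are hypothetical names introduced for this example, and the real writer may choose paths and encoders differently.

```python
import json
import os
import tempfile


def write_json_to_disk(base_dir, path_parts, python_obj):
    """Hypothetical helper: serialize python_obj as JSON under base_dir."""
    path = os.path.join(base_dir, *path_parts)
    # Create intermediate directories so nested paths work.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(python_obj, f, sort_keys=True, indent=2)
    return path


# Usage: write a small document into a temporary directory and read it back.
demo_dir = tempfile.mkdtemp()
written = write_json_to_disk(
    demo_dir, ("regulation", "1005", "2011-1"), {"label": ["1005"]})
with open(written, encoding="utf-8") as f:
    round_trip = json.load(f)
```

The path parts are joined into a single file path, which mirrors the *path_parts constructor argument shown in the class signature.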

class regparser.api_writer.GitWriteContent(*path_parts)[source]

This writer places the content in a git repo on the file system

static folder_name(node)[source]

Directories are generally just the last element of a node’s label, but subparts and interpretations are a little special.

write(python_object)[source]
write_tree(root_path, node)[source]

Given a file system path and a node, write the node’s contents and recursively write its children to the provided location.
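
The recursion described above can be sketched as follows. This is a standalone Python 3 illustration, not the package's implementation: the Node namedtuple, its field names, and the index.md file name are assumptions, and the special handling of subparts and interpretations is left out.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for a tree node; the field names are assumptions.
Node = namedtuple("Node", ["label", "text", "children"])


def write_tree(root_path, node):
    """Write node.text into root_path, then recurse into each child."""
    os.makedirs(root_path, exist_ok=True)
    # "index.md" is an assumed file name, chosen only for this sketch.
    with open(os.path.join(root_path, "index.md"), "w", encoding="utf-8") as f:
        f.write(node.text)
    for child in node.children:
        # Simplification: the folder is the last element of the child's label.
        write_tree(os.path.join(root_path, child.label[-1]), child)


tree = Node(["1005"], "Part text", [
    Node(["1005", "1"], "Section text", [
        Node(["1005", "1", "a"], "Paragraph text", []),
    ]),
])
out_dir = tempfile.mkdtemp()
write_tree(out_dir, tree)
```

After the call, out_dir holds one directory per node, nested in the same shape as the tree.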

regparser.builder module

regparser.citations module

class regparser.citations.Label(schema=None, **kwargs)[source]

Bases: object

SCHEMA_FIELDS = set(['p2', 'p3', 'p1', 'p6', 'p7', 'p4', 'p5', 'cfr_title', 'p8', 'p9', 'comment', 'appendix', 'appendix_section', 'c3', 'c2', 'part', 'c1', 'section', 'c4'])
app_schema = ('part', 'appendix', 'p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9')
app_sect_schema = ('part', 'appendix', 'appendix_section', 'p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9')
comment_schema = ('comment', 'c1', 'c2', 'c3', 'c4')
copy(schema=None, **kwargs)[source]

Keep any relevant prefix when copying

default_schema = ('cfr_title', 'part', 'section', 'p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9')
static determine_schema(settings)[source]
classmethod from_node(node)[source]

Convert a struct.Node into a Label, using heuristics to determine which schema to follow. Node labels aren’t as expressive as Label objects.

labels_until(other)[source]

Given self as a starting point and other as an end point, yield a Label for each paragraph in between. For example, if self is something like 123.45(a)(2) and other is 123.45(a)(6), this should emit 123.45(a)(3), (4), and (5).
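
The 123.45(a)(2) to 123.45(a)(6) example can be reproduced with a small standalone sketch. It assumes labels are plain tuples whose last element is an integer paragraph marker; labels_between is a hypothetical name, and the real method works on Label objects and may handle other marker types.

```python
def labels_between(start, end):
    """Yield labels strictly between start and end.

    Both labels are tuples that share every element except the last,
    and the last element is an integer paragraph marker (an assumption).
    """
    if start[:-1] != end[:-1]:
        raise ValueError("labels must share a common prefix")
    prefix = start[:-1]
    # Exclude both end points: only the paragraphs in between are emitted.
    for marker in range(start[-1] + 1, end[-1]):
        yield prefix + (marker,)


between = list(labels_between(("123", "45", "a", 2), ("123", "45", "a", 6)))
```

Here between holds the labels for (a)(3), (a)(4), and (a)(5); neither end point is included.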

regtext_schema = ('cfr_title', 'part', 'section', 'p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9')
to_list(for_node=True)[source]

Convert a Label into a struct.Node style label list. Node labels don’t contain CFR titles.
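
One way to picture this conversion is to walk a schema in order and skip the CFR title. The sketch below is standalone: DEFAULT_SCHEMA copies the default_schema tuple listed for this class, while to_node_label and the plain dict of settings are hypothetical, and the real method may treat missing fields or other schemas differently.

```python
DEFAULT_SCHEMA = ("cfr_title", "part", "section",
                  "p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9")


def to_node_label(settings, schema=DEFAULT_SCHEMA):
    """Return the populated schema fields in order, without the CFR title."""
    return [settings[field] for field in schema
            if field != "cfr_title" and settings.get(field) is not None]


node_label = to_node_label(
    {"cfr_title": "12", "part": "1005", "section": "2", "p1": "a"})
```

With these settings, node_label contains the part, section, and first paragraph level, and the title "12" is dropped.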

class regparser.citations.ParagraphCitation(start, end, label, full_start=None, full_end=None, in_clause=False)[source]

Bases: object

regparser.citations.cfr_citations(text, include_fill=False)[source]

Find all citations which include CFR title and part

regparser.citations.internal_citations(text, initial_label=None, require_marker=False, title=None)[source]

List of all internal citations in the text. require_marker helps by requiring that the text be preceded by ‘comment’/‘paragraphs’/etc. title represents the CFR title (e.g. 11 for FEC, 12 for CFPB regs) and is used to correctly parse citations of the form 11 CFR 110.1 when 11 CFR 110 is the regulation being parsed.

regparser.citations.match_to_label(match, initial_label, comment=False)[source]

Return the citation and offsets for this match

regparser.citations.multiple_citations(matches, initial_label, comment=False, include_fill=False)[source]

Similar to single_citations, except that we have a compound citation, such as “paragraphs (b), (d), and (f)”. Yield a ParagraphCitation for each sub-citation. We refer to the first match as “head” and all following as “tail”.

regparser.citations.remove_citation_overlaps(text, possible_markers)[source]

Given a list of markers, remove any that overlap with citations
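
The overlap test behind this kind of filtering can be sketched on plain (start, end) spans. The code below is a standalone illustration under that assumption: drop_overlapping and the half-open span representation are hypothetical, and the real function takes the text and finds the citations itself.

```python
def overlaps(a, b):
    """True when half-open spans (start, end) share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def drop_overlapping(markers, citations):
    """Keep only the marker spans that touch no citation span."""
    return [m for m in markers
            if not any(overlaps(m, c) for c in citations)]


# The first marker sits inside the citation span and is removed.
remaining = drop_overlapping(markers=[(4, 7), (20, 23)], citations=[(0, 10)])
```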

regparser.citations.select_encompassing_citations(citations)[source]

The same citation might be found by multiple grammars; we take the most-encompassing of any overlaps
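
A minimal sketch of one way to pick the most-encompassing match, assuming each citation is reduced to a (start, end) span: visit the spans longest first and keep a span only if it does not overlap one already kept. select_encompassing is a hypothetical name, and this greedy strategy is an interpretation of the description rather than the package's code.

```python
def select_encompassing(spans):
    """Keep the longest span out of every group of overlapping spans."""
    # Longest first, so a wider match is always seen before a narrower one.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for start, end in ordered:
        clashes = any(start < k_end and k_start < end
                      for k_start, k_end in kept)
        if not clashes:
            kept.append((start, end))
    return sorted(kept)


chosen = select_encompassing([(0, 5), (0, 12), (20, 25), (22, 25)])
```

Sorting by length means the narrower duplicates found by other grammars are discarded in favor of the widest match.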

regparser.citations.single_citations(matches, initial_label, comment=False)[source]

For each pyparsing match, yield the corresponding ParagraphCitation

regparser.content module

We need to modify content from time to time, e.g. image overrides and XML macros. To leave room for future expansion, we provide a layer of indirection here.

TODO: Delete and replace with plugins.

class regparser.content.ImageOverrides[source]

Bases: object

static get(key, default=None)[source]

class regparser.content.Macros[source]

Bases: object

regparser.federalregister module

regparser.search module

regparser.search.find_offsets(text, search_fn)[source]

Find the start and end of an appendix, supplement, etc.

regparser.search.find_start(text, heading, index)[source]

Find the start of an appendix, supplement, etc.

regparser.search.segments(text, offsets_fn, exclude=None)[source]

Split a block of text into a list of its sub parts. Often this means calling the offsets function repeatedly until there is no more text to process.
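
The "call the offsets function repeatedly" loop can be sketched as below. This is a standalone illustration: split_segments, next_appendix, and the contract that the offsets function returns a (start, end) pair or None are assumptions made for the example, and the exclude argument is not modeled.

```python
import re


def split_segments(text, offsets_fn):
    """Collect (start, end) offsets until offsets_fn finds nothing more."""
    found = []
    position = 0
    while position < len(text):
        offsets = offsets_fn(text[position:])
        if offsets is None:
            break
        start, end = offsets
        if end <= start:
            break  # guard against an empty match looping forever
        # Offsets are relative to the remaining text; shift them back.
        found.append((position + start, position + end))
        position += end
    return found


def next_appendix(text):
    """Hypothetical offsets function: one 'Appendix X' heading to the next."""
    match = re.search(r"Appendix [A-Z]", text)
    if match is None:
        return None
    following = re.search(r"Appendix [A-Z]", text[match.end():])
    end = match.end() + following.start() if following else len(text)
    return match.start(), end


sample = "Intro. Appendix A first body. Appendix B second body."
pieces = [sample[s:e] for s, e in split_segments(sample, next_appendix)]
```

Each pass consumes the text up to the end of the segment just found, so the loop stops once the offsets function returns None or the text runs out.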

regparser.utils module

Module contents