regparser.notice package

Submodules

regparser.notice.address module

regparser.notice.build module

regparser.notice.build_appendix module

regparser.notice.build_interp module

regparser.notice.changes module

This module contains functions to help parse the changes in a notice. Changes are the exact details of how the paragraphs, sections, etc. in a regulation have changed.

class regparser.notice.changes.Change(label_id, content)

Bases: tuple

content

Alias for field number 1

label_id

Alias for field number 0
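The "Bases: tuple" line and the "Alias for field number N" entries are what collections.namedtuple generates, so Change appears to be a two-field named tuple. The equivalent definition below is only an illustration; the label and content values are made up.

```python
from collections import namedtuple

# Field 0 is label_id, field 1 is content.
Change = namedtuple('Change', ['label_id', 'content'])

change = Change('1005-2-a', {'action': 'PUT'})
print(change.label_id)  # same value as change[0]
print(change[1])        # same value as change.content
```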

class regparser.notice.changes.NoticeChanges[source]

Bases: object

Notice changes.

add_change(amdpar_xml, change)[source]

Track another change. A single label can have more than one change, so changes are accumulated per label. The same change is not added twice (as may occur if both the parent and child are marked as added).
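A minimal sketch of that bookkeeping, assuming each change is a plain dict carrying a 'label' key. ChangeTracker is a hypothetical stand-in for NoticeChanges and ignores the amdpar_xml argument of the real add_change.

```python
from collections import defaultdict


class ChangeTracker:
    """Hypothetical stand-in for NoticeChanges: maps label -> list of changes."""

    def __init__(self):
        self.changes = defaultdict(list)

    def add_change(self, change):
        # A label may carry several distinct changes (e.g. text and title),
        # but an identical change reported twice, say via both a parent and
        # its child, is stored only once.
        existing = self.changes[change['label']]
        if change not in existing:
            existing.append(change)


tracker = ChangeTracker()
tracker.add_change({'label': '1005-2-a', 'action': 'POST'})
tracker.add_change({'label': '1005-2-a', 'action': 'POST'})  # duplicate, skipped
tracker.add_change({'label': '1005-2-a', 'action': 'PUT', 'field': 'title'})
print(len(tracker.changes['1005-2-a']))
```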

regparser.notice.changes.bad_label(node)[source]

Look through a node label and return True if it’s a badly formed label. We can do this because we know what type of character should show up at what point in the label.
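As an illustration only, the sketch below assumes dash-joined label ids such as 1005-2-a-1-i-A and the usual CFR paragraph order (a), (1), (i), (A). bad_label_sketch is a hypothetical helper that takes a label string rather than a node and checks only those four levels.

```python
import re

# Assumed marker type for the first four CFR paragraph levels:
# (a) lowercase letter, (1) number, (i) lowercase roman, (A) uppercase letter.
LEVEL_PATTERNS = [
    re.compile(r'^[a-z]{1,2}$'),
    re.compile(r'^\d+$'),
    re.compile(r'^[ivxlc]+$'),
    re.compile(r'^[A-Z]{1,2}$'),
]


def bad_label_sketch(label_id):
    """Return True if some paragraph marker has the wrong character type."""
    paragraphs = label_id.split('-')[2:]  # skip CFR part and section
    for depth, marker in enumerate(paragraphs):
        if depth >= len(LEVEL_PATTERNS):
            break  # deeper levels are not checked in this sketch
        if not LEVEL_PATTERNS[depth].match(marker):
            return True
    return False


print(bad_label_sketch('1005-2-a-1-i-A'), bad_label_sketch('1005-2-a-i'))
```

Note that a marker like "i" is valid at both the first and third levels; only its position decides which pattern applies.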

regparser.notice.changes.create_add_amendment(amendment, subpart_label=None)[source]

An amendment comes in with a whole tree structure. We break apart the tree here (this is what flatten does) and convert the Node objects to JSON representations. This ensures that each amendment only acts on one node. In addition, this adjusts the change’s field when stars are present.

regparser.notice.changes.create_field_amendment(label, amendment)[source]

If an amendment is changing just a field (text, title) then we don’t need to package the rest of the paragraphs with it. Those get dealt with later, if appropriate.

regparser.notice.changes.create_reserve_amendment(amendment)[source]

Create a RESERVE related amendment.

regparser.notice.changes.create_subpart_amendment(subpart_node)[source]

Create an amendment that describes a subpart. In particular, when the list of added nodes gets flattened, each node specifies which subpart it’s part of.

regparser.notice.changes.find_candidate(root, label_last, amended_labels)[source]

Look through the tree for a node that has the same paragraph marker as the one we’re looking for (and also has no children). That might be a mis-parsed node. Because we’re parsing partial sections in the notices, it’s likely we might not be able to disambiguate between paragraph markers.

regparser.notice.changes.find_misparsed_node(section_node, label, change, amended_labels)[source]

Nodes can get misparsed in the sense that we don’t always know where they are in the tree or have their correct label. The first part corrects markerless labeled nodes by updating the node’s label if the source text has been changed to include the markerless paragraph (e.g., 123-44-p6 for paragraph 6). We know this because label here is parsed from that change. The second part uses label to find a candidate for a mis-parsed node and creates an appropriate change.

regparser.notice.changes.find_subpart(amdpar_tag)[source]

Look amongst an amdpar tag’s siblings to find a subpart.

regparser.notice.changes.fix_section_node(paragraphs, amdpar_xml)[source]

When notices are corrected, the XML for notices doesn’t follow the normal syntax. Namely, paragraphs aren’t inside section tags. We fix that here by finding the preceding section tag and appending paragraphs to it.

regparser.notice.changes.flatten_tree(node_list, node)[source]

Flatten a tree, removing all hierarchical information, making a list out of all the nodes.
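A sketch of such a flattening, using a hypothetical minimal Node class. Both the post-order traversal and the clearing of each node's children are assumptions about how "removing all hierarchical information" might be done.

```python
class Node:
    """Hypothetical minimal node: a label plus child nodes."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def flatten_tree(node_list, node):
    """Collect node and every descendant into node_list, detaching children
    so that no hierarchy remains (post-order traversal assumed)."""
    for child in node.children:
        flatten_tree(node_list, child)
    node.children = []
    node_list.append(node)


tree = Node('1005-2', [Node('1005-2-a', [Node('1005-2-a-1')]), Node('1005-2-b')])
flat = []
flatten_tree(flat, tree)
print([n.label for n in flat])
```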

regparser.notice.changes.format_node(node, amendment, parent_label=None)[source]

Format a node into a dict, and add in amendment information.

regparser.notice.changes.impossible_label(n, amended_labels)[source]

Return True if n is not in the same family as amended_labels.

regparser.notice.changes.match_labels_and_changes(amendments, section_node)[source]

Given the list of amendments, and the parsed section node, match the two so that we’re only changing what’s been flagged as changing. This helps eliminate paragraphs that are just stars for positioning, for example.

regparser.notice.changes.new_subpart_added(amendment)[source]

Return True if label indicates that a new subpart was added.

regparser.notice.changes.node_to_dict(node)[source]

Convert a node to a dictionary representation. We skip the children, turning them instead into a list of labels.
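A hypothetical sketch with a minimal Node whose label is a list of strings; the dictionary keys, including child_labels, are assumptions rather than the module's actual output format.

```python
class Node:
    """Hypothetical minimal node; label is a list such as ['1005', '2', 'a']."""

    def __init__(self, label, text='', title=None, children=None):
        self.label = label
        self.text = text
        self.title = title
        self.children = children or []


def node_to_dict(node):
    """Shallow dict for node: children are replaced by their label ids."""
    return {
        'label': node.label,
        'text': node.text,
        'title': node.title,
        'child_labels': ['-'.join(child.label) for child in node.children],
    }


section = Node(['1005', '2'], 'Intro text', children=[
    Node(['1005', '2', 'a'], 'Paragraph a'),
    Node(['1005', '2', 'b'], 'Paragraph b'),
])
print(node_to_dict(section)['child_labels'])
```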

regparser.notice.changes.resolve_candidates(amend_map, warn=True)[source]

Ensure candidate isn’t actually accounted for elsewhere, and fix its label.

regparser.notice.compiler module

Notices indicate how a regulation has changed since the last version. This module contains code to compile a regulation from a notice’s changes.

class regparser.notice.compiler.RegulationTree(previous_tree)[source]

Bases: object

This encapsulates a regulation tree, and methods to change that tree.

static add_child(children, node, order=None)[source]

Add a child to the children, and sort appropriately. This is used for non-root nodes.

add_node(node, parent_label=None)[source]

Add an entirely new node to the regulation tree. Accounts for placeholders and reserved nodes.

add_to_root(node)[source]

Add a child to the root of the tree.

contains(label)[source]

Is this label already in the tree? label can be a list or a string.

create_empty_node(node_label)[source]

In rare cases, we need to flesh out the tree by adding an empty node. Returns the created node.

create_new_subpart(subpart_label)[source]

Create a whole new subpart.

delete(label_id)[source]

Delete the node with label_id from the tree.

delete_from_parent(node)[source]

Delete node from its parent, effectively removing it from the tree.

find_node(label)[source]

get_parent(node)[source]

Get the parent of a node. Returns None if parent not found.

insert_in_order(node)[source]

Add a new node, but determine its position in its parent by looking at the siblings’ texts.

keep(labels)[source]

The ‘KEEP’ verb tells us that a node should not be removed (generally because it would be removed had we dropped the children of its parent). “Keeping” those nodes makes sure they do not disappear when editing their parent.

move(origin, destination)[source]

Move a node from one part in the tree to another.

move_to_subpart(label, subpart_label)[source]

Move an existing node to another subpart. If the new subpart doesn’t exist, create it.

replace_node_and_subtree(node)[source]

Replace an existing node in the tree with node.

replace_node_heading(label, change)[source]

A node’s heading is its keyterm. We handle this here, but not well, I think.

replace_node_text(label, change)[source]

Replace just a node’s text.

replace_node_title(label, change)[source]

Replace just a node’s title.

reserve(label_id, node)[source]

Reserve either an existing node (by replacing it) or reserve by adding a new node. When a node is reserved, it’s represented in the FR XML. We simply use that representation here instead of doing something else.

regparser.notice.compiler.compile_regulation(previous_tree, notice_changes)[source]

Given the last full regulation tree and the set of changes from the next final notice, construct the next full regulation tree.

regparser.notice.compiler.dict_to_node(node_dict)[source]

Convert a dictionary representation of a node into a Node object if it contains the minimum required fields. Otherwise, pass it through unchanged.

regparser.notice.compiler.get_parent_label(node)[source]

Given a node, get the label of its parent.

regparser.notice.compiler.is_interp_placeholder(node)[source]

Interpretations may have nodes that exist purely to enforce structure. Knowing if a node is such a placeholder makes it easier to know if a POST should really just modify the existing placeholder.

regparser.notice.compiler.is_reserved_node(node)[source]

Return True if the node is reserved.

regparser.notice.compiler.make_label_sortable(label, roman=False)[source]

Make labels sortable by converting them as appropriate. For example, “45Ai33b” becomes (45, “A”, “i”, 33, “b”). Also, appendices have labels that look like 30(a); we make those appropriately sortable.
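One way to get the documented output is to split on runs of digits, uppercase letters, and lowercase letters. This is a sketch, not necessarily the module's implementation; in particular, treating roman=True as "convert lowercase roman-numeral tokens to integers" is an assumption.

```python
import re

ROMAN_VALUES = {'i': 1, 'v': 5, 'x': 10, 'l': 50, 'c': 100}


def roman_to_int(numeral):
    """Convert a lowercase roman numeral such as 'xiv' to an int."""
    total = 0
    for current, following in zip(numeral, numeral[1:] + ' '):
        value = ROMAN_VALUES[current]
        # Subtractive notation: 'iv' is 5 - 1
        if ROMAN_VALUES.get(following, 0) > value:
            total -= value
        else:
            total += value
    return total


def make_label_sortable(label, roman=False):
    # Runs of digits, uppercase letters, and lowercase letters become
    # separate tokens; punctuation such as '(' and ')' is dropped.
    result = []
    for token in re.findall(r'\d+|[A-Z]+|[a-z]+', label):
        if token.isdigit():
            result.append(int(token))
        elif roman and re.fullmatch(r'[ivxlc]+', token):
            result.append(roman_to_int(token))
        else:
            result.append(token)
    return tuple(result)


print(make_label_sortable('45Ai33b'), make_label_sortable('30(a)'))
```

In Python 3, comparing tuples that hold an int and a str at the same position raises TypeError, so keys built this way should only be compared among labels whose tokens line up by type.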

regparser.notice.compiler.make_root_sortable(label, node_type)[source]

The root’s children are nodes of various types, and these need to be sorted correctly. This returns a tuple to help sort these first-level nodes.

regparser.notice.compiler.node_text_equality(left, right)[source]

Do these two nodes have the same text fields? Accounts for None values.

regparser.notice.compiler.one_change(reg, label, change)[source]

Notices are generally composed of many changes; this method handles a single change to the tree.
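To illustrate the idea, here is a hypothetical dispatcher over a flat dict that maps label ids to node dicts. POST, KEEP, and RESERVE appear elsewhere in this module's documentation; PUT and DELETE, the change keys, and the "[Reserved]" placeholder text are assumptions.

```python
def one_change_sketch(reg, label, change):
    """Apply one change to reg, a hypothetical dict of label -> node dict."""
    action = change['action']
    if action in ('POST', 'PUT'):
        if 'field' in change:
            # Field-only change: touch just that field of the existing node.
            reg[label][change['field']] = change['value']
        else:
            reg[label] = dict(change['node'])
    elif action == 'DELETE':
        reg.pop(label, None)
    elif action == 'RESERVE':
        reg[label] = {'text': '[Reserved]'}
    elif action == 'KEEP':
        pass  # nothing to do: the node simply has to survive
    else:
        raise ValueError('Unknown action: {}'.format(action))


reg = {'1005-2-a': {'text': 'Old'}}
one_change_sketch(reg, '1005-2-b', {'action': 'POST', 'node': {'text': 'New'}})
one_change_sketch(reg, '1005-2-a',
                  {'action': 'PUT', 'field': 'text', 'value': 'Updated'})
print(reg)
```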

regparser.notice.compiler.overwrite_marker(origin, new_label)[source]

The node passed in has a label, but we’re going to give it a new one (new_label). This is necessary during node moves.

regparser.notice.compiler.replace_first_sentence(text, replacement)[source]

Replace the first sentence in text with replacement. This makes some incredibly simplifying assumptions, so buyer beware.
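A sketch of what such a simplification can look like, treating the first '.', '!' or '?' as the end of the first sentence (an assumption, not the module's confirmed rule).

```python
import re


def replace_first_sentence(text, replacement):
    """Replace everything up to and including the first '.', '!' or '?'."""
    match = re.search(r'[.!?]', text)
    if match is None:
        # No sentence terminator: treat the whole text as one sentence.
        return replacement
    return replacement + text[match.end():]


print(replace_first_sentence('Old rule. Second sentence.', 'New rule.'))
```

Abbreviations such as "e.g." or numbers like "1.5" end the "sentence" early under this rule, which is exactly the kind of case the buyer-beware warning is about.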

regparser.notice.compiler.replace_node_field(reg, label, change)[source]

Call one of the field appropriate methods if we’re changing just a field on a node.

regparser.notice.compiler.sort_labels(labels)[source]

Order the labels so that higher-up elements are dealt with first.
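A hypothetical sketch that orders dash-joined label ids by depth, so that parents are processed before their children (the presumed reason for handling higher-up elements first), with ties broken alphabetically.

```python
def sort_labels_sketch(labels):
    """Shallower labels first; equal depths ordered alphabetically."""
    return sorted(labels, key=lambda label: (len(label.split('-')), label))


print(sort_labels_sketch(['1005-2-a-1', '1005-2', '1005-2-a']))
```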

regparser.notice.dates module

regparser.notice.dates.fetch_dates(xml)[source]

Pull out any dates (and their types) from the XML. Not all notices have all types of dates; some notices have multiple dates of the same type.

regparser.notice.dates.parse_date_sentence(sentence)[source]

Return the date type + date in this sentence (if one exists).
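A hypothetical sketch of such parsing. The keyword-to-type mapping, the "Month D, YYYY" date format, and the ISO output are all assumptions for illustration.

```python
import re
from datetime import datetime

# Hypothetical keyword -> date-type mapping.
DATE_TYPES = {'effective': 'effective', 'comments': 'comments'}
# Dates written like "March 1, 2021".
DATE_RE = re.compile(r'[A-Z][a-z]+ \d{1,2}, \d{4}')


def parse_date_sentence_sketch(sentence):
    """Return (date_type, 'YYYY-MM-DD'), or None if no typed date is found."""
    match = DATE_RE.search(sentence)
    if match is None:
        return None
    try:
        date = datetime.strptime(match.group(0), '%B %d, %Y').date()
    except ValueError:
        return None  # capitalised word that is not a month name
    lowered = sentence.lower()
    for keyword, date_type in DATE_TYPES.items():
        if keyword in lowered:
            return date_type, date.isoformat()
    return None


print(parse_date_sentence_sketch('This rule is effective March 1, 2021.'))
```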

regparser.notice.diff module

regparser.notice.encoder module

class regparser.notice.encoder.AmendmentEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, encoding='utf-8', default=None)[source]

Bases: json.encoder.JSONEncoder

Custom JSON encoder to handle Amendment objects.

default(obj)[source]
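A minimal sketch of the json.JSONEncoder.default pattern this class follows. The Amendment class here is a hypothetical stand-in, and the sketch targets Python 3 (the encoding parameter in the signature above comes from Python 2's json module).

```python
import json


class Amendment:
    """Hypothetical stand-in for regparser's amendment objects."""

    def __init__(self, action, label):
        self.action = action
        self.label = label


class AmendmentEncoderSketch(json.JSONEncoder):
    def default(self, obj):
        # json calls default() only for objects it cannot serialise itself.
        if isinstance(obj, Amendment):
            return {'action': obj.action, 'label': obj.label}
        # Defer to the base class, which raises TypeError.
        return super().default(obj)


encoded = json.dumps([Amendment('POST', '1005-2-a')], cls=AmendmentEncoderSketch)
print(encoded)
```

The stand-in is a plain class on purpose: json serialises tuples, including named tuples, as arrays without ever calling default().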

regparser.notice.fake module

regparser.notice.sxs module

regparser.notice.sxs.add_spaces_to_title(title)[source]

The Federal Register often seems to miss spaces in the titles of SxS sections. Make sure spaces get added where appropriate.

regparser.notice.sxs.build_section_by_section(sxs, fr_start_page, previous_label)[source]

Given a list of xml nodes in the section by section analysis, pull out hierarchical data into a structure. Previous label is carried along to merge analyses of the same section.

regparser.notice.sxs.find_page(xml, index_line, page_number)[source]

Find the FR page that includes the indexed line.

regparser.notice.sxs.find_section_by_section(xml_tree)[source]

Find the section-by-section analysis of this notice.

regparser.notice.sxs.is_backtrack(previous_label, next_label)[source]

If we’ve already processed a header with 22(c) in it, we can assume that any following headers with 1111.22 are not supposed to be an analysis of 1111.22.

regparser.notice.sxs.is_child_of(child_xml, header_xml, cfr_part, header_citations=None)[source]

Children are paragraphs and have a lower ‘source’ than the header. In addition, either the header has citations and the child does not, the citations for header and child are the same, or the citation in the child is incorrect.

regparser.notice.sxs.parse_into_labels(txt, part)[source]

Find which part + section + (paragraph) labels this text relates to; there could be multiple.

regparser.notice.sxs.remove_extract(xml_tree)[source]

Occasionally, the paragraphs/etc. useful to us are inside an EXTRACT tag. To normalize, move everything in an EXTRACT tag out.

regparser.notice.sxs.split_into_ttsr(sxs, cfr_part)[source]

Split the provided list of XML nodes into a node with a title, a sequence of text nodes, a sequence of nodes associated with the subsections of this header, and the remaining XML nodes.

regparser.notice.util module

regparser.notice.util.body_to_string(xml_node)[source]

Create a string from the text of this node and its children (without the outer tag).
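A sketch using the standard library's xml.etree.ElementTree (whether the module itself uses lxml instead is an assumption left open here). ElementTree's tostring includes each child's tail text, so text between and after children is kept.

```python
import xml.etree.ElementTree as ET


def body_to_string(xml_node):
    """Serialise the node's text and children, without the outer tag."""
    parts = [xml_node.text or '']
    # tostring() also emits each child's tail, i.e. the text after it.
    parts.extend(ET.tostring(child, encoding='unicode') for child in xml_node)
    return ''.join(parts)


node = ET.fromstring('<P>Hello <E T="03">world</E>!</P>')
print(body_to_string(node))
```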

regparser.notice.util.prepost_pend_spaces(el)[source]

FR’s XML doesn’t always add spaces around tags that clearly need them. Account for this by adding spaces around the el where needed.

regparser.notice.util.spaces_then_remove(el, tag_str)[source]

FR’s XML tends not to add spaces where needed, so removing tags can sometimes smash words together.

regparser.notice.util.swap_emphasis_tags(el)[source]

FR’s XML uses a different set of tags than the standard we’d like (XHTML). Swap them out as needed.
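A hypothetical sketch using xml.etree.ElementTree. The mapping of FR emphasis codes (the T attribute of E elements) to XHTML tags shown here is an assumption; consult the FR XML documentation for the actual codes.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from FR emphasis codes to XHTML tag names.
EMPHASIS_MAP = {'02': 'strong', '03': 'em'}


def swap_emphasis_tags(el):
    """Rename <E T="..."> elements under el to XHTML tags, in place."""
    # Materialise the list first so renaming cannot affect the traversal.
    for emph in list(el.iter('E')):
        new_tag = EMPHASIS_MAP.get(emph.get('T'))
        if new_tag is not None:  # unknown codes are left untouched
            emph.tag = new_tag
            del emph.attrib['T']


p = ET.fromstring('<P>A <E T="03">term</E> here</P>')
swap_emphasis_tags(p)
print(ET.tostring(p, encoding='unicode'))
```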

regparser.notice.xml module

Module contents