regparser.notice package¶
Submodules¶
regparser.notice.address module¶
regparser.notice.build module¶
regparser.notice.build_appendix module¶
regparser.notice.build_interp module¶
regparser.notice.changes module¶
This module contains functions to help parse the changes in a notice. Changes are the exact details of how the paragraphs, sections, etc. in a regulation have changed.
-
class
regparser.notice.changes.
Change
(label_id, content)¶ Bases:
tuple
-
content
¶ Alias for field number 1
-
label_id
¶ Alias for field number 0
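As a rough illustration of the shape of this class, the sketch below builds an equivalent named tuple with the standard library. The field order follows the aliases listed above; the label and content values are invented for the example and do not come from a real notice.

```python
from collections import namedtuple

# Illustrative stand-in for regparser.notice.changes.Change. Field 0 is
# label_id and field 1 is content, matching the documented aliases.
Change = namedtuple("Change", ["label_id", "content"])

# Hypothetical values, chosen only to show access by name and by index.
change = Change("1005-2-a", {"text": "example paragraph text"})

label_id, content = change      # unpacks like any tuple
same_label = change[0]          # index access is equivalent to change.label_id
```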
-
-
regparser.notice.changes.
bad_label
(node)[source]¶ Look through a node label, and return True if it’s a badly formed label. We can do this because we know what type of character should show up at what point in the label.
-
regparser.notice.changes.
create_add_amendment
(amendment, subpart_label=None)[source]¶ An amendment comes in with a whole tree structure. We break the tree apart here (this is what flatten does) and convert the Node objects to JSON representations. This ensures that each amendment acts on only one node. In addition, this adjusts the change’s field when stars are present.
-
regparser.notice.changes.
create_field_amendment
(label, amendment)[source]¶ If an amendment is changing just a field (text, title) then we don’t need to package the rest of the paragraphs with it. Those get dealt with later, if appropriate.
-
regparser.notice.changes.
create_reserve_amendment
(amendment)[source]¶ Create a RESERVE related amendment.
-
regparser.notice.changes.
create_subpart_amendment
(subpart_node)[source]¶ Create an amendment that describes a subpart. In particular when the list of nodes added gets flattened, each node specifies which subpart it’s part of.
-
regparser.notice.changes.
find_candidate
(root, label_last, amended_labels)[source]¶ Look through the tree for a node that has the same paragraph marker as the one we’re looking for (and also has no children). That might be a mis-parsed node. Because we’re parsing partial sections in the notices, it’s likely we might not be able to disambiguate between paragraph markers.
-
regparser.notice.changes.
find_misparsed_node
(section_node, label, change, amended_labels)[source]¶ Nodes can get misparsed in the sense that we don’t always know where they are in the tree or have their correct label. The first part corrects markerless labeled nodes by updating the node’s label if the source text has been changed to include the markerless paragraph (ex. 123-44-p6 for paragraph 6). We know this because label here is parsed from that change. The second part uses label to find a candidate for a mis-parsed node and creates an appropriate change.
-
regparser.notice.changes.
find_subpart
(amdpar_tag)[source]¶ Look amongst an amdpar tag’s siblings to find a subpart.
-
regparser.notice.changes.
fix_section_node
(paragraphs, amdpar_xml)[source]¶ When notices are corrected, their XML doesn’t follow the normal syntax. Namely, paragraphs aren’t inside section tags. We fix that here by finding the preceding section tag and appending the paragraphs to it.
-
regparser.notice.changes.
flatten_tree
(node_list, node)[source]¶ Flatten a tree, removing all hierarchical information, making a list out of all the nodes.
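A minimal sketch of this kind of flattening follows. The `Node` class and the `flatten_tree_sketch` name are hypothetical stand-ins, and the pre-order traversal is an assumption; the real function may visit nodes in a different order.

```python
class Node:
    """Hypothetical minimal node; the parser's real Node carries more fields."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def flatten_tree_sketch(node_list, node):
    """Append node and all of its descendants to node_list.

    Each node's children list is emptied, so no hierarchy survives.
    Nodes are visited parent first (an assumption of this sketch).
    """
    children, node.children = node.children, []
    node_list.append(node)
    for child in children:
        flatten_tree_sketch(node_list, child)


root = Node("1005", [Node("1005-2", [Node("1005-2-a")]), Node("1005-3")])
flat = []
flatten_tree_sketch(flat, root)
labels = [n.label for n in flat]
```

The accumulator is passed in rather than returned, mirroring the `(node_list, node)` signature documented above.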
-
regparser.notice.changes.
format_node
(node, amendment, parent_label=None)[source]¶ Format a node into a dict, and add in amendment information.
-
regparser.notice.changes.
impossible_label
(n, amended_labels)[source]¶ Return True if n is not in the same family as amended_labels.
-
regparser.notice.changes.
match_labels_and_changes
(amendments, section_node)[source]¶ Given the list of amendments, and the parsed section node, match the two so that we’re only changing what’s been flagged as changing. This helps eliminate paragraphs that are just stars for positioning, for example.
-
regparser.notice.changes.
new_subpart_added
(amendment)[source]¶ Return True if the label indicates that a new subpart was added.
regparser.notice.compiler module¶
Notices indicate how a regulation has changed since the last version. This module contains code to compile a regulation from a notice’s changes.
-
class
regparser.notice.compiler.
RegulationTree
(previous_tree)[source]¶ Bases:
object
This encapsulates a regulation tree, and methods to change that tree.
-
static
add_child
(children, node, order=None)[source]¶ Add a child to the children, and sort appropriately. This is used for non-root nodes.
-
add_node
(node, parent_label=None)[source]¶ Add an entirely new node to the regulation tree. Accounts for placeholders, reserved nodes,
-
create_empty_node
(node_label)[source]¶ In rare cases, we need to flesh out the tree by adding an empty node. Returns the created node.
-
delete_from_parent
(node)[source]¶ Delete node from its parent, effectively removing it from the tree.
-
insert_in_order
(node)[source]¶ Add a new node, but determine its position in its parent by looking at the siblings’ texts
-
keep
(labels)[source]¶ The ‘KEEP’ verb tells us that a node should not be removed (generally because it would have been, had we dropped the children of its parent). “Keeping” those nodes makes sure they do not disappear when their parent is edited.
-
move_to_subpart
(label, subpart_label)[source]¶ Move an existing node to another subpart. If the new subpart doesn’t exist, create it.
-
static
-
regparser.notice.compiler.
compile_regulation
(previous_tree, notice_changes)[source]¶ Given a last full regulation tree, and the set of changes from the next final notice, construct the next full regulation tree.
-
regparser.notice.compiler.
dict_to_node
(node_dict)[source]¶ Convert a dictionary representation of a node into a Node object if it contains the minimum required fields. Otherwise, pass it through unchanged.
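The pass-through behaviour can be sketched as below. The `Node` named tuple, the `REQUIRED_FIELDS` set, and the function name are all assumptions made for illustration; the actual module may require a different set of keys.

```python
from collections import namedtuple

# Hypothetical stand-in for the parser's Node type.
Node = namedtuple("Node", ["text", "children", "label", "title", "node_type"])

# Assumed minimum set of keys; not confirmed by the documentation above.
REQUIRED_FIELDS = {"text", "label", "node_type"}


def dict_to_node_sketch(node_dict):
    """Build a Node when the required keys are present; otherwise return the input as-is."""
    if not isinstance(node_dict, dict) or not REQUIRED_FIELDS <= set(node_dict):
        return node_dict
    return Node(
        text=node_dict["text"],
        children=node_dict.get("children", []),
        label=node_dict["label"],
        title=node_dict.get("title"),
        node_type=node_dict["node_type"],
    )
```

Returning the input untouched lets callers hand in partial dictionaries (for example, a change that only names a field) without a separate type check.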
-
regparser.notice.compiler.
get_parent_label
(node)[source]¶ Given a node, get the label of its parent.
-
regparser.notice.compiler.
is_interp_placeholder
(node)[source]¶ Interpretations may have nodes that exist purely to enforce structure. Knowing if a node is such a placeholder makes it easier to know if a POST should really just modify the existing placeholder.
-
regparser.notice.compiler.
make_label_sortable
(label, roman=False)[source]¶ Make labels sortable by converting them as appropriate. For example, “45Ai33b” becomes (45, “A”, “i”, 33, “b”). Also, appendices have labels that look like 30(a); we make those appropriately sortable.
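One way to get the conversion described above is to split the label into digit runs and single letters. The sketch below does that with a regular expression; the function name is hypothetical, and the `roman` flag and any appendix-specific handling are deliberately left out.

```python
import re


def make_label_sortable_sketch(label):
    """Split a label into integers (digit runs) and single letters.

    Punctuation such as parentheses is skipped, so "30(a)" yields two parts.
    """
    parts = re.findall(r"\d+|[A-Za-z]", label)
    return tuple(int(p) if p.isdigit() else p for p in parts)


key = make_label_sortable_sketch("45Ai33b")
ordered = sorted(["10a", "9b", "9a"], key=make_label_sortable_sketch)
```

Converting digit runs to integers is what makes “9b” sort before “10a”; a plain string comparison would order them the other way around.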
-
regparser.notice.compiler.
make_root_sortable
(label, node_type)[source]¶ The child nodes of the root are of various types, and these need to be sorted correctly. This returns a tuple to help sort these first-level nodes.
-
regparser.notice.compiler.
node_text_equality
(left, right)[source]¶ Do these two nodes have the same text fields? Accounts for Nones
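A minimal sketch of such a comparison is shown below. It assumes that the “text fields” are `text` and `title`, and that a missing node (None) never compares equal; both points are guesses, not confirmed behaviour.

```python
from types import SimpleNamespace


def node_text_equality_sketch(left, right):
    """Return True when both nodes exist and share the same text and title."""
    if left is None or right is None:
        return False
    return left.text == right.text and left.title == right.title


# SimpleNamespace objects stand in for real nodes here.
a = SimpleNamespace(text="Some text", title=None)
b = SimpleNamespace(text="Some text", title=None)
c = SimpleNamespace(text="Other text", title=None)
```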
-
regparser.notice.compiler.
one_change
(reg, label, change)[source]¶ Notices are generally composed of many changes; this method handles a single change to the tree.
-
regparser.notice.compiler.
overwrite_marker
(origin, new_label)[source]¶ The node passed in has a label, but we’re going to give it a new one (new_label). This is necessary during node moves.
-
regparser.notice.compiler.
replace_first_sentence
(text, replacement)[source]¶ Replace the first sentence in text with replacement. This makes some incredibly simplifying assumptions - so buyer beware.
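To make the “simplifying assumptions” concrete, the sketch below treats everything up to the first period as the first sentence. That rule, the function name, and the expectation that `replacement` carries its own closing punctuation are assumptions of this example, not a description of the real implementation.

```python
def replace_first_sentence_sketch(text, replacement):
    """Swap everything up to and including the first period for replacement.

    If text has no period, the whole text counts as the first sentence.
    """
    _first, period, rest = text.partition(".")
    if not period:
        return replacement
    return replacement + rest


updated = replace_first_sentence_sketch("Old rule. Second sentence.", "New rule.")
```

A rule this simple splits at the wrong place on abbreviations such as “U.S.C.”, which is the kind of case the warning above is about.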
regparser.notice.dates module¶
regparser.notice.diff module¶
regparser.notice.encoder module¶
regparser.notice.fake module¶
regparser.notice.sxs module¶
-
regparser.notice.sxs.
add_spaces_to_title
(title)[source]¶ Federal Register often seems to miss spaces in the title of SxS sections. Make sure spaces get added if appropriate
-
regparser.notice.sxs.
build_section_by_section
(sxs, fr_start_page, previous_label)[source]¶ Given a list of xml nodes in the section by section analysis, pull out hierarchical data into a structure. Previous label is carried along to merge analyses of the same section.
-
regparser.notice.sxs.
find_page
(xml, index_line, page_number)[source]¶ Find the FR page that includes the indexed line
-
regparser.notice.sxs.
find_section_by_section
(xml_tree)[source]¶ Find the section-by-section analysis of this notice
-
regparser.notice.sxs.
is_backtrack
(previous_label, next_label)[source]¶ If we’ve already processed a header with 22(c) in it, we can assume that any following headers with 1111.22 are not supposed to be an analysis of 1111.22.
-
regparser.notice.sxs.
is_child_of
(child_xml, header_xml, cfr_part, header_citations=None)[source]¶ Children are paragraphs, have lower ‘source’, the header has citations and the child does not, the citations for header and child are the same or the citation in a child is incorrect
-
regparser.notice.sxs.
parse_into_labels
(txt, part)[source]¶ Find what part+section+(paragraph) (could be multiple) this text is related to.
regparser.notice.util module¶
-
regparser.notice.util.
body_to_string
(xml_node)[source]¶ Create a string from the text of this node and its children (without the outer tag)
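The idea can be sketched with the standard library’s `xml.etree.ElementTree`, used here as a stand-in for whichever XML library the parser actually relies on. The function name and the sample markup are illustrative only.

```python
import xml.etree.ElementTree as ET


def body_to_string_sketch(xml_node):
    """Serialize a node's leading text and its children, omitting the outer tag."""
    pieces = [xml_node.text or ""]
    for child in xml_node:
        # tostring() also emits the child's tail text, so text that follows
        # each child element is preserved.
        pieces.append(ET.tostring(child, encoding="unicode"))
    return "".join(pieces)


node = ET.fromstring("<P>Hello <E>world</E> again</P>")
body = body_to_string_sketch(node)
```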
-
regparser.notice.util.
prepost_pend_spaces
(el)[source]¶ FR’s XML doesn’t always add spaces around tags that clearly need them. Account for this by adding spaces around the el where needed.
-
regparser.notice.util.
spaces_then_remove
(el, tag_str)[source]¶ FR’s XML tends to not add spaces where needed, which leads to the removal of tags sometimes smashing together words.
FR’s XML uses a different set of tags than the standard we’d like (XHTML). Swap out as needed.