Parsing New Rules
Regulations are published in full annually, and we rely on these annual editions to “synchronize” entire CFR parts. This works well for tracing the history of a regulation, provided it changes at most once per year. When multiple final rules affect a single CFR part in a single year, or when a new final rule has just been issued, we don’t have access to a canonical, complete regulation. To account for these situations, we have a parser for final rules, which attempts to figure out which sections, paragraphs, etc. are changing and applies those changes to the previous version of the regulation to derive a new version.
Unfortunately, the changes are not encoded in a machine-readable format, so the parser makes a best effort but tends to fall a bit short. In this document, we’ll discuss what to expect from the parser and how to resolve common difficulties.
Fetching the Rule
Running the pipeline command will generally pull down and attempt to parse
the relevant annual editions and final rules. It caches its results for a few
days, so if a rule has only recently hit the Federal Register, you may need to
run:
eregs clear
After running pipeline, you should see a version associated with the new
rule in your output. If not, verify that the final rule is present on the
Federal Register (our source of final
rules). Looking in the right-hand column, you should find metadata associated
with the final rule’s publication date, effective date, entry type (must be
“Rule”), and CFR references. If one of those fields is not present and you
believe this to be in error, file a ticket on federalregister.gov’s
support page.
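The metadata check described above can be sketched as a small helper. This is only an illustration: the field names (publication_date, effective_on, type, cfr_references) are assumptions standing in for the four items listed above, and missing_rule_metadata is a hypothetical function, not part of eregs.

```python
# Assumed field names for the four pieces of metadata described above.
REQUIRED_FIELDS = ("publication_date", "effective_on", "type", "cfr_references")


def missing_rule_metadata(document):
    """Return a list of problems found in a final rule's metadata dict."""
    # A field counts as missing when it is absent or empty.
    problems = [field for field in REQUIRED_FIELDS if not document.get(field)]
    # The entry type must be exactly "Rule" for the rule to be picked up.
    if document.get("type") and document["type"] != "Rule":
        problems.append("type (expected 'Rule')")
    return problems
```

An empty list suggests the metadata is complete; anything else points at the field to raise with federalregister.gov.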
It’s possible that running the pipeline causes an error. If you are
familiar with Python, try running eregs --debug pipeline with the same
parameters to get additional debugging output and to drop into a debugger at
the point of error. Please file an issue and we will see if we can recreate
the problem.
Viewing the Diff
Generally, eRegs will be able to create an appropriate version, but won’t have found all of the appropriate changes. To make the verification process a bit easier, send the output to an instance of eRegs’ UI. You can navigate to the “diff” view and compare the new rule to the previous version; the UI will highlight sections with changed text and tell you where it thinks changes have occurred. Open this view in conjunction with the text of the final rule and verify that the appropriate changes have been made.
We can also view more raw output representing the changes by investigating the
output associated with notices. Run pipeline and send the results to a
part of the file system, e.g.:
eregs pipeline 11 222 /tmp/eregs-output
and then inspect the /tmp/eregs-output/notice directory for a JSON file
corresponding to the new rule. This data structure will contain keys
associated with amendments (describing how the regulation is changing)
and changes (describing the content of those changes).
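As a rough aid for that inspection, the sketch below loads one notice file and reports how many entries sit under each of the two keys. The helper name summarize_notice is hypothetical, and nothing is assumed about the shape of the values beyond their being JSON containers.

```python
import json
from pathlib import Path


def summarize_notice(path):
    """Count the entries under the "amendments" and "changes" keys."""
    notice = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {}
    for key in ("amendments", "changes"):
        value = notice.get(key)
        # None signals that the parser produced no such key at all.
        summary[key] = len(value) if value is not None else None
    return summary
```

Call it with the path of the JSON file for the new rule inside /tmp/eregs-output/notice. A count of zero, or None, for either key is a hint that the amendment instructions were not understood.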
Editing the Rule
Odds are that the parser did not pick up all of the changes present in the final rule. We can tweak the text of the rule to align with the parser’s expectations.
File Location
For initial edits, it’ll make sense to modify the files directly within the
index. These edits will trigger a rebuild on successive pipeline runs, but
will be erased should the clear command ever be executed. To test out
minor edits, modify the appropriate file in .eregs_index/notice_xml.
Once you would like to make those changes more permanent, we recommend you fork and check out our shared notice XML repository, fr-notices. Copy the final rule’s XML (available via the “Dev” link in the Federal Register’s UI) into a directory matching the repository’s structure.
For example, final rule 2014-18842 is represented by this XML: https://www.federalregister.gov/articles/xml/201/418/842.xml. To modify that, we’d save that XML file into fr-notices/articles/xml/201/418/842.xml.
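The mapping from the Federal Register URL to the local file can be sketched as follows. It simply reuses the URL’s path beneath the repository root, as in the example above; local_notice_path is a hypothetical helper, and the default root "fr-notices" assumes the checkout lives in the current directory.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_notice_path(xml_url, repo_root="fr-notices"):
    """Mirror the URL's path (articles/xml/...) under the local checkout."""
    url_path = urlparse(xml_url).path.lstrip("/")
    return PurePosixPath(repo_root) / url_path
```

Deriving the location from the URL avoids having to reason about how a document number is split into directories.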
We recommend committing this file in its original form to make it easy for future developers to understand what’s changed. In any event, you’ll need to tell the parser to look for your new file. To do so, run:
eregs clear # remove the downloaded reference
echo 'LOCAL_XML_PATHS = ["path/to/fr-notices/"]' >> local_settings.py
Then re-run pipeline. This will alert the parser to the file’s presence. On
successive edits, you will only need to re-run the pipeline command.
When all is said and done, we request you make a pull request to the shared
fr-notices repository, which gets downloaded automatically by the parser.
Amendments
The complications around final rules arise largely from the amendment
instructions (indicated by the AMDPAR tags in the XML). Unfortunately, we
must attempt to parse these instructions; otherwise we will not know whether
paragraphs have been deleted, moved, etc. The AMDParsing logic attempts to
find appropriate verbs (“revise”, “correct”, “add”, “remove”, “reserve”,
“designate”, etc.) and the paragraphs associated with those actions. So, the
parser would understand an amendment like:
Section 1026.35 is amended by revising paragraph (b) introductory text,
adding new paragraph (b)(2), and removing paragraph (c).
In particular, it’d parse out as something like:
Context: 1026.35
Verb(PUT): amended, revising
Paragraph: 1026.35(b) introductory text
Verb(POST): adding
Paragraph: 1026.35(b)(2)
Verb(DELETE): removing
Paragraph: 1026.35(c)
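To make the verb/paragraph pairing concrete, here is a toy tokenizer for the example instruction. It is a sketch only, not the AMDParsing logic: the regular expressions, the three-verb table, and the parse_amendment name are all assumptions, and it skips “amended” and every verb form other than the three shown.

```python
import re

# Hypothetical verb table covering only the example; the real grammar is richer.
VERBS = {"revising": "PUT", "adding": "POST", "removing": "DELETE"}

SECTION_RE = re.compile(r"Section (\d+\.\d+)")
TOKEN_RE = re.compile(
    r"\b(revising|adding|removing)\b"
    r"|paragraph ((?:\(\w+\))+)( introductory text)?"
)


def parse_amendment(text):
    """Return (kind, value) tokens in the order they appear in the text."""
    section = SECTION_RE.search(text)
    context = section.group(1) if section else ""
    tokens = [("Context", context)]
    for match in TOKEN_RE.finditer(text):
        verb, markers, intro = match.groups()
        if verb:
            tokens.append((f"Verb({VERBS[verb]})", verb))
        else:
            # Paragraph markers are resolved against the section context.
            tokens.append(("Paragraph", f"{context}{markers}{intro or ''}"))
    return tokens
```

Each verb token is followed by the paragraph it acts on, mirroring the listing above. Instructions that target sentences or individual words have nothing for such a tokenizer to latch onto.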
We do not currently recognize concepts such as distinct sentences or specific words within a paragraph, so amendment instructions to “amend the fifth sentence” or “remove the last semicolon” cannot be understood. In these situations, it makes more sense to replace the text with something along the lines of “revise paragraph (b)” and include the entirety of the paragraph (rather than the single sentence, etc.).
We have also constructed two “artificial” amendment instructions to make this process easier.
[insert-in-order] acts as a verb, indicating that the paragraph should be inserted in textual order (rather than by looking at the paragraph marker). This is particularly useful for modifications to definitions (which often do not contain paragraph markers).
[label:111-22-c] acts as a very well defined paragraph. We can specifically target any paragraph this way for modification.
Certain paragraphs are best defined by a specific keyterm or definition associated with them (rather than a paragraph marker). In these scenarios, we have a special syntax:
[label:111-22-keyterm(Special Term Here)]
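The two label forms can be told apart mechanically. The sketch below splits a label marker into its dash-separated parts and an optional keyterm; it is illustrative only, with parse_label and both regular expressions being assumptions rather than the parser’s own code.

```python
import re

LABEL_RE = re.compile(r"\[label:([^\]]+)\]")
KEYTERM_RE = re.compile(r"^(.*)-keyterm\((.+)\)$")


def parse_label(marker):
    """Split a [label:...] marker into (label parts, keyterm or None)."""
    match = LABEL_RE.fullmatch(marker)
    if not match:
        raise ValueError(f"not a label marker: {marker!r}")
    body = match.group(1)
    keyterm_match = KEYTERM_RE.match(body)
    if keyterm_match:
        # Keyterm form: the leading parts locate the section, the term picks the paragraph.
        return keyterm_match.group(1).split("-"), keyterm_match.group(2)
    return body.split("-"), None
```

In this sketch the plain form carries the full paragraph label, while the keyterm form carries the enclosing label plus the term that identifies the paragraph.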