regparser.tree.depth package¶

Submodules¶

regparser.tree.depth.derive module¶

class regparser.tree.depth.derive.ParAssignment(typ, idx, depth)¶

Bases: tuple

depth¶: Alias for field number 2

idx¶: Alias for field number 1

typ¶: Alias for field number 0
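The field layout above is that of a named tuple. As a rough illustration, the sketch below rebuilds an equivalent tuple locally with `collections.namedtuple` instead of importing regparser; the sample values (`'lower'`, `0`, `1`) are made up for the example.

```python
from collections import namedtuple

# Local stand-in mirroring the documented fields; not imported from regparser.
ParAssignment = namedtuple('ParAssignment', ('typ', 'idx', 'depth'))

# Hypothetical values: a marker type label, the marker's position within
# that type's sequence, and the depth assigned to it.
assignment = ParAssignment(typ='lower', idx=0, depth=1)

typ, idx, depth = assignment                     # unpacks like any tuple
same_depth = assignment[2] == assignment.depth   # field number 2 is depth
```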

class regparser.tree.depth.derive.Solution(assignment, weight=1.0)[source]¶

Bases: object

A collection of assignments + a weight for how likely this solution is (after applying heuristics)

copy_with_penalty(penalty)[source]¶: Immutable copy while modifying weight

pretty_str()[source]¶
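A minimal sketch of the "immutable copy" idea behind copy_with_penalty. The class name `SketchSolution` is hypothetical, and the way the penalty changes the weight (multiplying by `1 - penalty`) is an assumption made for illustration, not something this page states.

```python
class SketchSolution:
    """Hypothetical stand-in for Solution: assignments plus a weight."""

    def __init__(self, assignment, weight=1.0):
        self.assignment = list(assignment)
        self.weight = weight

    def copy_with_penalty(self, penalty):
        # Assumption: a penalty in [0, 1] scales the weight down
        # multiplicatively. The original object is left untouched.
        return SketchSolution(self.assignment, self.weight * (1 - penalty))


original = SketchSolution([('lower', 0, 0)])
docked = original.copy_with_penalty(0.25)
```

Returning a new object rather than mutating in place lets several heuristics score the same base solution independently.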

regparser.tree.depth.derive.debug_idx(marker_list, constraints=None)[source]¶: Binary search through the markers to find the point at which derive_depths no longer works
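The binary search described above can be sketched as follows. The helper name `first_failing_length` and the `solves` predicate are hypothetical; in practice the predicate would wrap a call to derive_depths. The sketch assumes that the empty prefix solves and that once a prefix fails, every longer prefix fails too.

```python
def first_failing_length(markers, solves):
    """Return the smallest prefix length for which `solves` is False,
    or None if the full list solves."""
    if solves(markers):
        return None
    lo, hi = 0, len(markers)   # invariant: prefix lo solves, prefix hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if solves(markers[:mid]):
            lo = mid
        else:
            hi = mid
    return hi


# Toy predicate: a prefix "solves" as long as it does not contain 'x'.
failing = first_failing_length(list('abcxd'), lambda prefix: 'x' not in prefix)
```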

regparser.tree.depth.derive.derive_depths(original_markers, additional_constraints=None)[source]¶: Use constraint programming to derive the paragraph depths associated with a list of paragraph markers. Additional constraints (e.g. expected marker types) can also be added. Each such constraint is a function of two parameters: the constraint function (problem.addConstraint) and a list of all variables.
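To make the shape of an additional constraint concrete, here is a toy, brute-force version of the idea. `TinyProblem` is a hypothetical generate-and-test stand-in written for this example, not the solver the library uses, and it models only one depth variable per marker (with marker types assumed known up front).

```python
from itertools import product


class TinyProblem:
    """Generate-and-test stand-in for a constraint solver (hypothetical)."""

    def __init__(self):
        self.domains = {}        # variable name -> list of candidate values
        self.constraints = []    # (predicate, variable names) pairs

    def add_variable(self, name, domain):
        self.domains[name] = list(domain)

    def add_constraint(self, predicate, variables):
        self.constraints.append((predicate, list(variables)))

    def solutions(self):
        names = list(self.domains)
        for values in product(*(self.domains[n] for n in names)):
            candidate = dict(zip(names, values))
            if all(pred(*(candidate[v] for v in vs))
                   for pred, vs in self.constraints):
                yield candidate


def same_type_same_depth(types):
    """Build an additional constraint: a function of the constraint-adding
    callable and the list of all variables."""
    def rule(constrain, all_variables):
        for i, first in enumerate(all_variables):
            for j in range(i + 1, len(all_variables)):
                if types[i] == types[j]:
                    constrain(lambda a, b: a == b, [first, all_variables[j]])
    return rule


markers = ['a', '1', '2', 'b']
types = ['lower', 'int', 'int', 'lower']   # assumed known for this toy

problem = TinyProblem()
variables = ['depth%d' % i for i in range(len(markers))]
for name in variables:
    problem.add_variable(name, range(3))

# Built-in rules of the toy: start at the root, never skip a level.
problem.add_constraint(lambda d: d == 0, [variables[0]])
for prev, curr in zip(variables, variables[1:]):
    problem.add_constraint(lambda p, c: c <= p + 1, [prev, curr])

# The optional rule receives the constraint function and all variables.
same_type_same_depth(types)(problem.add_constraint, variables)

depth_lists = [[s[v] for v in variables] for s in problem.solutions()]
```

Two depth assignments survive here, which is the kind of ambiguity the heuristics module is meant to resolve.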

regparser.tree.depth.heuristics module¶

Set of heuristics for trimming down the set of solutions. Each heuristic works by penalizing a solution; it’s then up to the caller to grab the solution with the fewest penalties.

regparser.tree.depth.heuristics.prefer_diff_types_diff_levels(solutions, weight=1.0)[source]¶: Dock solutions which have different markers appearing at the same level. This does occur, but not often.

regparser.tree.depth.heuristics.prefer_multiple_children(solutions, weight=1.0)[source]¶: Dock solutions which have a paragraph with exactly one child. While this is possible, it’s unlikely.

regparser.tree.depth.heuristics.prefer_no_markerless_sandwich(solutions, weight=1.0)[source]¶: Prefer solutions which don’t use MARKERLESS to switch depth, like

    a
        MARKERLESS
    a

regparser.tree.depth.heuristics.prefer_shallow_depths(solutions, weight=0.1)[source]¶: Dock solutions which have a higher maximum depth
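The heuristics in this module share one pattern: take a list of solutions, return penalized copies, and let the caller pick. The sketch below shows that pattern with a toy depth-based penalty. Representing a solution as a `(depths, weight)` pair, the exact penalty formula, and treating the highest remaining weight as the least-penalized solution are all assumptions made for illustration.

```python
def dock_deep_solutions(solutions, weight=0.1):
    """Toy heuristic: dock each solution in proportion to its maximum depth.
    `solutions` is a list of (depths, weight) pairs; new pairs are returned
    and the input is not modified."""
    deepest = max(max(depths) for depths, _ in solutions)
    if deepest == 0:
        return list(solutions)
    return [(depths, w * (1 - weight * max(depths) / deepest))
            for depths, w in solutions]


candidates = [([0, 0, 0, 0], 1.0), ([0, 1, 1, 0], 1.0), ([0, 1, 2, 0], 1.0)]
scored = dock_deep_solutions(candidates)

# The caller chooses the least-penalized solution.
best_depths, best_weight = max(scored, key=lambda pair: pair[1])
```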

regparser.tree.depth.markers module¶

Namespace for collecting the various types of markers

regparser.tree.depth.markers.deemphasize(marker)[source]¶: Though the knowledge of emphasis is helpful for determining depth, it is _unhelpful_ in other scenarios, where we only care about the plain text. This function removes <E> tags

regparser.tree.depth.markers.emphasize(marker)[source]¶: The final depth levels for regulation text are emphasized, so we keep their <E> tags to distinguish them from previous levels. This function will wrap a marker in an <E> tag
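A small sketch of the wrap/strip pair described above. The function names are suffixed with `_sketch` to mark them as illustrations, and the `T="03"` attribute on the tag is an assumption; the stripping side removes any `<E ...>` opening tag and the closing `</E>`.

```python
import re


def emphasize_sketch(marker, attr='T="03"'):
    # Assumption: emphasis is an <E> element; the attribute is illustrative.
    return '<E %s>%s</E>' % (attr, marker)


def deemphasize_sketch(marker):
    # Remove opening <E ...> tags and closing </E> tags, keeping the text.
    return re.sub(r'</?E\b[^>]*>', '', marker)


wrapped = emphasize_sketch('i')
plain = deemphasize_sketch(wrapped)
```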

regparser.tree.depth.optional_rules module¶

Depth derivation has a mechanism for _optional_ rules. This module contains a collection of such rules. All functions should accept two parameters; the latter is a list of all variables in the system; the former is a function which can be used to constrain the variables. This allows us to define rules over subsets of the variables rather than all of them, should that make our constraints more useful

regparser.tree.depth.optional_rules.depth_type_inverses(constrain, all_variables)[source]¶: If paragraphs are at the same depth, they must share the same type. If paragraphs are the same type, they must share the same depth

regparser.tree.depth.optional_rules.limit_paragraph_types(*p_types)[source]¶: Constrain paragraphs to a limited set of paragraph types. This can reduce the search space if we know (for example) that the text comes from regulations and hence does not have capitalized roman numerals

regparser.tree.depth.optional_rules.limit_sequence_gap(size=0)[source]¶: We’ve loosened the rules around sequences of paragraphs so that paragraphs can be skipped. This allows arbitrary tightening of that rule, effectively allowing gaps of a limited size
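One way to picture a gap limit is as a pairwise check on consecutive markers. The sketch below is a simplified, hypothetical version: `gap_limiter` and the `(typ, idx, depth)` tuples are made up for the example, and the real rule may be wired into the solver differently.

```python
def gap_limiter(size=0):
    """Return a check that lets two consecutive markers of the same type and
    depth skip at most `size` positions in their sequence."""
    def check(prev, curr):
        prev_typ, prev_idx, prev_depth = prev
        typ, idx, depth = curr
        if typ != prev_typ or depth != prev_depth:
            return True    # the rule only judges same-type, same-depth pairs
        # Ordering is left to other rules; only the forward gap is limited.
        return idx - prev_idx <= size + 1
    return check


d = ('lower', 3, 0)   # "d" is position 3 among lowercase letters
g = ('lower', 6, 0)   # "g" skips "e" and "f"

strict = gap_limiter(0)(d, g)
lenient = gap_limiter(2)(d, g)
```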

regparser.tree.depth.optional_rules.star_new_level(constrain, all_variables)[source]¶: STARS should never have subparagraphs as it’d be impossible to determine where in the hierarchy these subparagraphs belong. @todo: This _probably_ should be a general rule, but there’s a test that this breaks in the interpretations. Revisit with CFPB regs

regparser.tree.depth.optional_rules.stars_occupy_space(constrain, all_variables)[source]¶: Star markers can’t be ignored in sequence, so 1, *, 2 doesn’t make sense for a single level, unless it’s an inline star. In the inline case, we can think of it as 1, intro-text-to-1, 2

regparser.tree.depth.pair_rules module¶

Rules relating to two paragraph markers in sequence. The rules are “positive” in the sense that each allows for a particular scenario (rather than denying all other scenarios). They combine in the eponymous function, where, if any of the rules return True, we pass. Otherwise, we fail.
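The "any positive rule passes" combination described above can be sketched like this. The three rules below are simplified stand-ins written for the example, not the module's actual rules, and `Marker` is a hypothetical tuple type.

```python
from collections import namedtuple

Marker = namedtuple('Marker', ('typ', 'idx', 'depth'))   # hypothetical


def continuing(prev, curr):
    # Same type and depth, later position in the sequence.
    return ((prev.typ, prev.depth) == (curr.typ, curr.depth)
            and curr.idx > prev.idx)


def new_sequence_deeper(prev, curr):
    # A new sequence (idx 0) may start exactly one level deeper.
    return curr.idx == 0 and curr.depth == prev.depth + 1


def shallower(prev, curr):
    # Moving back up the hierarchy is allowed.
    return curr.depth < prev.depth


RULES = (continuing, new_sequence_deeper, shallower)


def pair_allowed(prev, curr):
    """Pass if any single rule allows the pair; otherwise fail."""
    return any(rule(prev, curr) for rule in RULES)


forward = pair_allowed(Marker('lower', 3, 0), Marker('lower', 4, 0))
backward = pair_allowed(Marker('lower', 4, 0), Marker('lower', 3, 0))
```

Because each rule only grants permission, adding a new scenario means adding one function to `RULES` without touching the others.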

class regparser.tree.depth.pair_rules.MarkerAssignment[source]¶

Bases: regparser.tree.depth.pair_rules.MarkerAssignment

is_inline_stars()[source]¶: Inline stars (* * *) often behave quite differently from both STARS and other markers.

is_markerless()[source]¶: We will often check whether an assignment is MARKERLESS. This function makes that clearer

is_stars()[source]¶: We will often check whether an assignment is either STARS or inline stars (* * *). This function makes that clearer

regparser.tree.depth.pair_rules.continuing_seq(prev, curr)[source]¶: E.g. “d, e” is good, but “e, d” is not. We also want to allow some paragraphs to be skipped, e.g. “d, g”

regparser.tree.depth.pair_rules.decreasing_stars(prev, curr)[source]¶: Two stars in a row can exist if the second is shallower than the first

regparser.tree.depth.pair_rules.decrement_depth(prev, curr)[source]¶: Decrementing depth is okay unless we’re using inline stars

regparser.tree.depth.pair_rules.marker_star_level(prev, curr)[source]¶: Allow a marker to be followed by stars if those stars are deeper. If not inline, also allow the stars to be at the same depth

regparser.tree.depth.pair_rules.markerless_same_level(prev, curr)[source]¶: Markerless paragraphs can be followed by any type on the same level as long as that’s beginning a new sequence

regparser.tree.depth.pair_rules.new_sequence(prev, curr)[source]¶: Allow depth to be incremented if starting a new sequence

regparser.tree.depth.pair_rules.pair_rules(prev_typ, prev_idx, prev_depth, typ, idx, depth)[source]¶: Combine all of the above rules

regparser.tree.depth.pair_rules.paragraph_markerless(prev, curr)[source]¶: A non-markerless paragraph followed by a markerless paragraph can be one level deeper

regparser.tree.depth.pair_rules.same_level_stars(prev, curr)[source]¶: Two stars in a row can exist on the same level if the previous is inline

regparser.tree.depth.pair_rules.star_marker_level(prev, curr)[source]¶: Allow markers to be on the same level as a preceding star

regparser.tree.depth.rules module¶

Namespace for constraints on paragraph depth discovery.

For the purposes of this module, a “symmetry” refers to two perfectly valid solutions to a problem whose differences are irrelevant. For example, the distinction between two solutions that place a STARS following a at different depths may not matter if we’re planning to ignore the final STARS anyway. To “break” this symmetry, we explicitly reject one solution; this dramatically reduces the number of permutations we care about.

regparser.tree.depth.rules.ancestors(all_prev)[source]¶: Given an assignment of values, construct a list of the relevant parents, e.g. 1, i, a, ii, A gives us 1, ii, A
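The example above can be reproduced with a short sketch. The function name `open_ancestors`, the `(typ, idx, depth)` tuples, and the type labels are assumptions for illustration; the depths (0, 1, 2, 1, 2) are one assignment consistent with the example.

```python
def open_ancestors(assignments):
    """Given (typ, idx, depth) tuples in document order, return the chain of
    still-open parents, shallowest first."""
    chain = []
    for assignment in assignments:
        depth = assignment[2]
        del chain[depth:]          # close anything at this depth or deeper
        chain.append(assignment)
    return chain


# 1, i, a, ii, A with assumed depths 0, 1, 2, 1, 2
sequence = [
    ('int', 0, 0),     # 1
    ('roman', 0, 1),   # i
    ('lower', 0, 2),   # a
    ('roman', 1, 1),   # ii
    ('upper', 0, 2),   # A
]
parents = open_ancestors(sequence)
```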

regparser.tree.depth.rules.continue_previous_seq(typ, idx, depth, *all_prev)[source]¶: Constrain the current marker based on all markers leading up to it

regparser.tree.depth.rules.depth_type_order(order)[source]¶: Create a function which constrains paragraph depths to a particular type sequence. For example, we know a priori what regtext and interpretation markers’ order should be. Adding this constraint speeds up solution finding.
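A simplified sketch of a depth-to-type ordering check. `type_order_check` and the type labels are hypothetical, and the sketch ignores any special handling the real rule may give to STARS or MARKERLESS.

```python
def type_order_check(order):
    """Return a check that a marker type matches the type expected at its
    depth, where `order` lists the expected type for depth 0, 1, 2, ..."""
    order = list(order)    # copy so later changes by the caller have no effect

    def check(typ, depth):
        return depth < len(order) and typ == order[depth]
    return check


# Assumed ordering for the example: (a), (1), (i), (A)
check = type_order_check(['lower', 'int', 'roman', 'upper'])
fits = check('int', 1)
misplaced = check('int', 0)
too_deep = check('upper', 4)
```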

regparser.tree.depth.rules.marker_stars_markerless_symmetry(pprev_typ, pprev_idx, pprev_depth, prev_typ, prev_idx, prev_depth, typ, idx, depth)[source]¶

When we have the following symmetry, where the three alternatives differ only in the depths assigned to the same sequence of markers:

    a                    a                    a
    STARS         vs.    STARS         vs.    STARS
    MARKERLESS           MARKERLESS           MARKERLESS

Prefer the middle

regparser.tree.depth.rules.markerless_stars_symmetry(pprev_typ, pprev_idx, pprev_depth, prev_typ, prev_idx, prev_depth, typ, idx, depth)[source]¶

Given MARKERLESS, STARS, MARKERLESS, we want to break these symmetries:

    MARKERLESS           MARKERLESS
    STARS         vs.    STARS
    MARKERLESS           MARKERLESS

Here, we don’t really care about the distinction, so we’ll opt for the former.

regparser.tree.depth.rules.must_be(value)[source]¶: A constraint that the given variable must match the value.
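A constraint of this kind is naturally written as a closure factory. The sketch below uses the hypothetical name `must_be_sketch`; binding `value` when the factory is called avoids Python's late-binding behaviour, where lambdas created in a loop would all see the loop variable's final value.

```python
def must_be_sketch(value):
    """Return a one-argument check that passes only for `value`."""
    def inner(var):
        return var == value
    return inner


expected = ['lower', 'int']
checks = [must_be_sketch(v) for v in expected]   # each check keeps its own value
results = [check(v) for check, v in zip(checks, expected)]
```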

regparser.tree.depth.rules.same_parent_same_type(*all_vars)[source]¶: All markers in the same parent should have the same marker type. Exceptions are made for STARS, which can appear at any level, and for sequences which _begin_ with markerless paragraphs.

regparser.tree.depth.rules.star_sandwich_symmetry(pprev_typ, pprev_idx, pprev_depth, prev_typ, prev_idx, prev_depth, typ, idx, depth)[source]¶

Symmetry-breaking constraint that places the STARS tag at a specific depth, so that the resolution of

    c
    ? ? ? ? ? ?   <- Potential STARS depths
    5

can only be one of two alternatives, which differ in the depth given to STARS:

    c                c
    STARS     OR     STARS
    5                5

Stars also cannot be used to skip a level (similar to markerless sandwich, above)

regparser.tree.depth.rules.triplet_tests(*triplet_seq)[source]¶: Run propositions around a sequence of three markers. We combine them here so that they act as a single constraint

regparser.tree.depth.rules.type_match(marker)[source]¶: The type of the associated variable must match its marker. Lambda explanation as in the above rule.