Serialized Statutes
Serial Pattern
Bases: BasePattern
A `Rule` can be extracted from a `SerialPattern`. The word "serial" is employed because the documents representing rules are numbered consecutively.
Each serial pattern refers to a Statute Category, e.g. `RA`, `CA`, etc., matched with a Serial Identifier.
Since a `SerialPattern` inherits from `BasePattern`, it also includes the fields declared in the latter model, `matches` and `excludes`, bringing the total number of fields to 5, viz.:
Field | Description | Example |
---|---|---|
cat | Statute Category | StatuteSerialCategory.RepublicAct |
regex_bases | How do we pattern the category name? | ["r.a. no.", "Rep. Act. No."] |
regex_serials | What digits are allowed? | ["386", "11114"] |
matches | Usable in parametrized tests to determine whether the declared pattern matches the samples | ["Republic Act No. 7160", "R.A. 386 and 7160"] |
excludes | Usable in parametrized tests to determine that the full pattern will not match the samples | ["Republic Act No. 7160:", "RA 9337-"] |
Source code in statute_patterns/models.py
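The interplay of the five fields in the table above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `SketchSerialPattern`, its `regex` property, and `check_samples` are hypothetical names, not the models defined in `statute_patterns/models.py`, and the rule that a trailing colon or hyphen disqualifies a match is only inferred from the `excludes` examples.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class SketchSerialPattern:
    """Hypothetical stand-in for SerialPattern; not the library's model."""

    cat: str
    regex_bases: list[str]
    regex_serials: list[str]
    matches: list[str] = field(default_factory=list)
    excludes: list[str] = field(default_factory=list)

    @property
    def regex(self) -> re.Pattern:
        # Any base may be followed by any allowed serial.
        bases = "|".join(self.regex_bases)
        serials = "|".join(self.regex_serials)
        # Assumed rule: a trailing colon or hyphen disqualifies the match.
        return re.compile(
            rf"\b(?:{bases})\s*(?:{serials})\b(?![:\-])", re.IGNORECASE
        )

    def check_samples(self) -> bool:
        # Every sample in `matches` must hit; every sample in `excludes` must miss.
        hits = all(self.regex.search(s) for s in self.matches)
        misses = not any(self.regex.search(s) for s in self.excludes)
        return hits and misses


ra = SketchSerialPattern(
    cat="ra",
    regex_bases=[r"Republic\s+Act\s+No\.", r"R\.?A\.?"],
    regex_serials=[r"386", r"7160", r"9337"],
    matches=["Republic Act No. 7160", "R.A. 386"],
    excludes=["Republic Act No. 7160:", "RA 9337-"],
)
```

In a parametrized test, each entry of `matches` and `excludes` would become its own test case; `check_samples` collapses them into one boolean only to keep the sketch short.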
Attributes
lines: Iterator[str]
property
Each regex string produced matches the serial rule. Note that the line break needs to be retained so that the result is organized when printing `@regex`.
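One way to picture why the line break matters is the sketch below. It is an assumption about the general technique, not the library's actual implementation: the functions `lines` and `build_regex` are hypothetical, and compiling with `re.VERBOSE` (which ignores whitespace, including newlines, inside the pattern) is one possible way to keep the printed regex readable without changing what it matches.

```python
from __future__ import annotations

import re
from typing import Iterator


def lines(bases: list[str], serials: list[str]) -> Iterator[str]:
    """Yield one regex alternative per base, each ending in a line break."""
    serial_group = "|".join(serials)
    for base in bases:
        yield rf"(?:{base})\s*(?:{serial_group})" + "\n"


def build_regex(bases: list[str], serials: list[str]) -> str:
    # Joining on "|" leaves each alternative on its own line when printed.
    return "(?:\n" + "|".join(lines(bases, serials)) + ")"


regex = build_regex(
    bases=[r"Republic\s+Act\s+No\.", r"R\.A\.\s+No\."],
    serials=[r"\d{1,5}"],
)
print(regex)  # one alternative per line

# re.VERBOSE ignores the retained line breaks when matching.
pattern = re.compile(regex, re.VERBOSE | re.IGNORECASE)
```

Because verbose mode also ignores literal spaces, the bases in this sketch use `\s+` rather than plain spaces.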
Serial Pattern Collection
Bases: BaseCollection
Each category-based, serial-numbered legal title will have a regex string. For example, Republic Act is a category, and 386 is a serial number within this category, representing the Philippine Civil Code.
Source code in statute_patterns/models.py
Functions
extract_rules(text)
Each `m`, a Python `Match` object, represents a serial pattern category found together with a possibly ambiguous identifier. Running `m.group(0)` should therefore yield the entire text of the match, which consists of (a) the definitive category; and (b) the ambiguous identifier.
The identifier is ambiguous because it may be a compound one, e.g. 'Presidential Decree No. 1 and 2'. In this case, two matches should be produced, not just one.
This function splits the identifier by commas (`,`) and the word `and` to get the individual component identifiers.
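The splitting step can be sketched as follows. This is an illustration under stated assumptions, not the body of `extract_rules`: the names `SEPARATORS`, `SERIALS`, `PD_PATTERN`, and `extract_rules_sketch` are hypothetical, only a single hard-coded category is handled, and the yielded `(category, identifier)` tuples stand in for whatever `Rule` objects the real function produces.

```python
from __future__ import annotations

import re
from typing import Iterator

# Separators between component identifiers: a comma (optionally followed
# by "and"), or the word "and" surrounded by whitespace.
SEPARATORS = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")

# One or more digit runs joined by the separators above.
SERIALS = r"\d+(?:(?:\s*,\s*(?:and\s+)?|\s+and\s+)\d+)*"

# Hypothetical single-category pattern: definitive category + ambiguous identifier.
PD_PATTERN = re.compile(
    rf"(?P<cat>Presidential\s+Decree\s+No\.)\s*(?P<ids>{SERIALS})",
    re.IGNORECASE,
)


def extract_rules_sketch(text: str) -> Iterator[tuple[str, str]]:
    """Yield one (category, identifier) pair per component identifier."""
    for m in PD_PATTERN.finditer(text):
        # m.group(0) is the whole match; m.group("ids") is the ambiguous part.
        for identifier in SEPARATORS.split(m.group("ids")):
            if identifier:
                yield ("pd", identifier)


rules = list(extract_rules_sketch("See Presidential Decree No. 1 and 2."))
```

With the compound example from the text, `rules` holds two entries, one for identifier `1` and one for identifier `2`.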