API
CitableDocument
A Citation's extract_citations() function relies on a CitableDocument.
Creates three main reusable lists:
list | concept |
---|---|
@docketed_reports | list of DocketReportCitation found in the text, excluding exceptional statutory dockets |
@reports | list of Report found in the text (which may already be included in @docketed_reports) |
@undocketed_reports | = @reports - reports already paired in @docketed_reports |
Examples:
>>> text_statutes = "Bar Matter No. 803, Jan. 1, 2000; Bar Matter No. 411, Feb. 1, 2000"
>>> len(CitableDocument(text=text_statutes).docketed_reports) # no citations, since these are 'statutory dockets'
0
>>> text_cites = "374 Phil. 1, 10-11 (1999) 1111 SCRA 1111; G.R. No. 147033, April 30, 2003; G.R. No. 147033, April 30, 2003, 374 Phil. 1, 600; ABC v. XYZ, G.R. Nos. 138570, 138572, 138587, 138680, 138698, October 10, 2000, 342 SCRA 449; XXX, G.R. No. 31711, Sept. 30, 1971, 35 SCRA 190; Hello World, 1111 SCRA 1111; Y v. Z, 35 SCRA 190;"
>>> doc1 = CitableDocument(text=text_cites)
>>> len(doc1.docketed_reports)
4
>>> doc1.undocketed_reports
{'1111 SCRA 1111'}
>>> text = "<em>Gatchalian Promotions Talent Pool, Inc. v. Atty. Naldoza</em>, 374 Phil. 1, 10-11 (1999), citing: <em>In re Almacen</em>, 31 SCRA 562, 600 (1970).; People v. Umayam, G.R. No. 147033, April 30, 2003; <i>Bagong Alyansang Makabayan v. Zamora,</i> G.R. Nos. 138570, 138572, 138587, 138680, 138698, October 10, 2000, 342 SCRA 449; Villegas <em>v.</em> Subido, G.R. No. 31711, Sept. 30, 1971, 41 SCRA 190;"
>>> doc2 = CitableDocument(text=text)
>>> set(doc2.get_citations()) == {'GR No. 147033, Apr. 30, 2003', 'GR No. 138570, Oct. 10, 2000, 342 SCRA 449', 'GR No. 31711, Sep. 30, 1971, 41 SCRA 190', '374 Phil. 1', '31 SCRA 562'}
True
Source code in src/citation_utils/document.py
Functions
get_citations()
There are two main lists to evaluate:
- @docketed_reports - each includes a Docket (optionally attached to a Report)
- @reports - from the same text, just get Report objects.
Report objects without a docket can be singled out from @reports and combined with the docketed ones, so the result is a more succinct citation list that includes both constructs mentioned above but without duplicate reports.
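The merge described above can be sketched with plain data structures. This is only an illustration under stated assumptions, not the library's implementation: `DocketedReport`, `merge_citations`, and the string formats are hypothetical stand-ins for the real models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocketedReport:
    """Hypothetical stand-in: a docket string with an optional report string."""

    docket: str
    report: Optional[str] = None

    def __str__(self) -> str:
        return f"{self.docket}, {self.report}" if self.report else self.docket


def merge_citations(docketed: list[DocketedReport], reports: list[str]) -> list[str]:
    """Return docketed citations first, then reports that have no docket."""
    already_paired = {d.report for d in docketed if d.report}
    merged = [str(d) for d in docketed]
    for report in reports:
        # Skip reports attached to a docket and reports already listed.
        if report not in already_paired and report not in merged:
            merged.append(report)
    return merged


docketed = [
    DocketedReport("GR No. 147033, Apr. 30, 2003"),
    DocketedReport("GR No. 31711, Sep. 30, 1971", "41 SCRA 190"),
]
reports = ["374 Phil. 1", "41 SCRA 190", "31 SCRA 562", "374 Phil. 1"]
result = merge_citations(docketed, reports)
```

Here `41 SCRA 190` appears only once, as part of its docketed citation, and the repeated `374 Phil. 1` is listed a single time.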
Source code in src/citation_utils/document.py
get_docketed_reports(text, exclude_docket_rules=True)
classmethod
Extract from raw text all raw citations, which should include their Docket and Report component parts.
This may, however, include statutory rules, since some docket categories like AM and BM use this convention.
To exclude statutory rules, the exclude_docket_rules flag is set to True by default.
Examples:
>>> cite = next(CitableDocument.get_docketed_reports("Bagong Alyansang Makabayan v. Zamora, G.R. Nos. 138570, 138572, 138587, 138680, 138698, October 10, 2000, 342 SCRA 449"))
>>> cite.model_dump(exclude_none=True)
{'publisher': 'SCRA', 'volume': '342', 'page': '449', 'context': 'G.R. Nos. 138570, 138572, 138587, 138680, 138698', 'category': 'GR', 'ids': '138570, 138572, 138587, 138680, 138698', 'docket_date': datetime.date(2000, 10, 10)}
>>> statutory_text = "Bar Matter No. 803, Jan. 1, 2000"
>>> next(CitableDocument.get_docketed_reports(statutory_text)) # default
Traceback (most recent call last):
...
StopIteration
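The effect of the flag can be illustrated with a small generator. This is a sketch under assumptions: the `KNOWN_RULES` lookup, the simplified `DOCKET` regex, and the `find_dockets` name are all hypothetical, and the library may decide what counts as a statutory rule differently.

```python
import re
from collections.abc import Iterator

# Hypothetical lookup of docket numbers known to be rules, not decisions.
KNOWN_RULES = {("BM", "803"), ("BM", "411")}

# Simplified pattern covering only two docket styles.
DOCKET = re.compile(r"(?P<cat>G\.R\.|Bar Matter) No\. (?P<num>\d+)")
CATEGORY = {"G.R.": "GR", "Bar Matter": "BM"}


def find_dockets(
    text: str, exclude_docket_rules: bool = True
) -> Iterator[tuple[str, str]]:
    """Yield (category, number) pairs, skipping known rules by default."""
    for match in DOCKET.finditer(text):
        found = (CATEGORY[match["cat"]], match["num"])
        if exclude_docket_rules and found in KNOWN_RULES:
            continue
        yield found


sample = "Bar Matter No. 803, Jan. 1, 2000; G.R. No. 147033, April 30, 2003"
default_hits = list(find_dockets(sample))
all_hits = list(find_dockets(sample, exclude_docket_rules=False))
```

With the default flag only the G.R. docket is yielded; passing `exclude_docket_rules=False` also yields the Bar Matter entry.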
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text to look for | required |
Yields:
Type | Description |
---|---|
DocketReport | Iterator[DocketReport]: Any of custom |
Source code in src/citation_utils/document.py
get_undocketed_reports()
Steps:
- From a set of uniq_reports (see self.reports);
- Compare to reports found in @docketed_reports
- Limit reports to those without an accompanying docket
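The steps above amount to a set difference. The following is a minimal sketch, assuming reports are compared as plain strings and docketed citations are represented as dicts with an optional `report` key; both representations and the function name are hypothetical.

```python
from typing import Optional


def undocketed_reports(
    uniq_reports: set[str], docketed: list[dict[str, Optional[str]]]
) -> set[str]:
    """Keep only the reports that no docketed citation carries."""
    reports_with_docket = {d["report"] for d in docketed if d.get("report")}
    return uniq_reports - reports_with_docket


# Values taken from the doc1 example text in this page.
uniq = {"374 Phil. 1", "1111 SCRA 1111", "342 SCRA 449", "35 SCRA 190"}
docketed = [
    {"docket": "GR No. 147033", "report": None},
    {"docket": "GR No. 147033", "report": "374 Phil. 1"},
    {"docket": "GR No. 138570", "report": "342 SCRA 449"},
    {"docket": "GR No. 31711", "report": "35 SCRA 190"},
]
leftover = undocketed_reports(uniq, docketed)
```

The only report left over is `1111 SCRA 1111`, which matches the `undocketed_reports` value shown in the class examples.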
Source code in src/citation_utils/document.py
Docket Model
Bases: BaseModel
The Docket is the modern identifier of a Supreme Court decision. This data structure, however, is not the final form of the identifier, since that description belongs to the Citation and the CountedCitation.
The purpose of this intermediate structure is that a Docket is often paired with a Report, which is the traditional identifier based on volume and page numbers. The pairing, however, is not mandatory, so flexibility is needed to create structures with the following combinations of the eventual Citation object:
Citation | Docket | Report |
---|---|---|
has both docket and report | yes | yes |
only a docket | yes | no |
only a report | no | yes |
See docket_citation.DocketReportCitation to see structure of paired content.
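The three combinations in the table can be modeled with two optional parts. This sketch uses standard-library dataclasses rather than the pydantic models the library is built on; `DocketPart`, `ReportPart`, and `CitationSketch` are hypothetical names, while the field names follow the `model_dump()` output shown earlier.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocketPart:
    category: str
    ids: str
    docket_date: date


@dataclass
class ReportPart:
    volume: str
    publisher: str
    page: str


@dataclass
class CitationSketch:
    docket: Optional[DocketPart] = None
    report: Optional[ReportPart] = None

    def __post_init__(self) -> None:
        # A citation with neither part identifies nothing.
        if self.docket is None and self.report is None:
            raise ValueError("a citation needs a docket, a report, or both")

    @property
    def kind(self) -> str:
        if self.docket and self.report:
            return "has both docket and report"
        return "only a docket" if self.docket else "only a report"


gr = DocketPart("GR", "31711", date(1971, 9, 30))
scra = ReportPart("41", "SCRA", "190")
kinds = [
    CitationSketch(docket=gr, report=scra).kind,
    CitationSketch(docket=gr).kind,
    CitationSketch(report=scra).kind,
]
```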
A Docket is based on a category, a serial id, and a date. Since the serial id may required
Field | Type | Description |
---|---|---|
context | optional (str) | Full text matched by the regex pattern |
category | optional (DocketCategory) | Whether GR, AC, etc. |
ids | optional (str) | The serial number of the docket category |
docket_date | optional (date) | The date associated with the docket |
Sample Citation | Category | Serial | Date |
---|---|---|---|
G.R. Nos. 138570, October 10, 2000 | GR | 138570 | October 10, 2000 |
A.M. RTJ-12-2317 (Formerly OCA I.P.I. No. 10-3378-RTJ), Jan 1, 2000 | AM | RTJ-12-2317 | Jan 1, 2000 |
A.C. No. 10179 (Formerly CBD 11-2985), March 04, 2014 | AC | 10179 | Mar. 4, 2014 |
Source code in src/citation_utils/dockets/models/docket_model.py
Attributes
first_id: str
property
Get first bit from list of separated ids, when possible.
Returns:
Name | Type | Description |
---|---|---|
str | str | First id found |
serial_text: str
property
From raw ids, get the cleaned_ids, and of these cleaned_ids, extract the @first_id found to deal with compound ids, e.g. ids separated by 'and' and ','
Returns:
Name | Type | Description |
---|---|---|
str | str | Singular text identifier |
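Picking the first id out of a compound string can be sketched as follows. The function below is a hypothetical illustration, assuming ids are separated only by commas or the word "and"; the library's own splitting rules may cover more cases.

```python
import re


def first_id_sketch(ids: str) -> str:
    """Return the first non-empty id from ids separated by ',' or 'and'."""
    for part in re.split(r",|\band\b", ids):
        if part.strip():
            return part.strip()
    # Nothing to split on: fall back to the trimmed input.
    return ids.strip()


compound = first_id_sketch("138570, 138572, 138587, 138680, 138698")
joined = first_id_sketch("L-1234 and L-5678")
single = first_id_sketch("RTJ-12-2317")
```

The word boundaries around `and` keep the split from firing inside a longer word.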
Functions
check_serial_num(text)
classmethod
If a serial number exists, ensure it meets criteria prior to row creation.
clean_serial(text)
classmethod
Criteria:
- Must be lowercased
- Characters that can be included: a-z, 0-9, -
- Must only contain a single alpha-numeric reference
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Raw text to clean | required |
Returns:
Name | Type | Description |
---|---|---|
str | str \| None | Cleaned serial text fit for database input. |
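One way to satisfy the criteria above is to lowercase the text and keep the first run of allowed characters. This is a sketch of that idea only; `clean_serial_sketch` is a hypothetical name, and the choice to keep the first run (rather than reject text with several references) is an assumption.

```python
import re
from typing import Optional

# Runs made up of the allowed characters: a-z, 0-9 and the hyphen.
ALLOWED_RUN = re.compile(r"[a-z0-9-]+")


def clean_serial_sketch(text: str) -> Optional[str]:
    """Lowercase the text and return its first allowed run, if any."""
    for run in ALLOWED_RUN.findall(text.lower()):
        candidate = run.strip("-")  # drop stray leading or trailing hyphens
        if candidate:
            return candidate
    return None


am_serial = clean_serial_sketch("RTJ-12-2317")
ac_serial = clean_serial_sketch("10179 (Formerly CBD 11-2985)")
empty = clean_serial_sketch("   ")
```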
Source code in src/citation_utils/dockets/models/docket_model.py
Docket Category
Docket Category Model
Bases: StrEnum
Common docket references involving Philippine Supreme Court decisions.
Name | Value |
---|---|
GR | General Register |
AM | Administrative Matter |
AC | Administrative Case |
BM | Bar Matter |
PET | Presidential Electoral Tribunal |
OCA | Office of the Court Administrator |
JIB | Judicial Integrity Board |
UDK | Undocketed |
Complications
Legacy rules
These categories do not always represent decisions. For instance, there are AM and BM docket numbers that represent rules rather than decisions.
Redocketed numbers
From the Supreme Court Stylebook (p. 159, 2024):
11.3.1. Redocketed numbers
Some cases may have an undocketed (UDK) number and may be redocketed and assigned a General Register (G.R.) number upon payment of the required docket fees. Still other cases may have a docket number starting with OCA IPI or JIB and may be redocketed as Administrative Matters (A.M.), while Commission on Bar Discipline (CBD) cases may be redocketed as Administrative Cases (A.C.). These must still be reflected in all court resolutions, orders, and decisions. x x x
Source code in src/citation_utils/dockets/models/docket_category.py
Functions
__repr__()
Uses the name of the member (e.g. GR) instead of the Enum default <DocketCategory.GR: 'General Register'>. This makes it possible to use the following conventions:
Examples:
Returns:
Name | Type | Description |
---|---|---|
str | str | The value of the Enum name |
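A custom `__repr__` of this kind can be sketched on a small enum. Two assumptions are made here: the sketch mixes `str` with `Enum` instead of using `StrEnum` so that it runs on older Python versions, and the repr is taken to be the quoted member name, which is what the `'category': 'GR'` entry in the `model_dump()` example suggests. `DocketCategorySketch` is a hypothetical name with only three of the members.

```python
from enum import Enum


class DocketCategorySketch(str, Enum):
    GR = "General Register"
    AM = "Administrative Matter"
    AC = "Administrative Case"

    def __repr__(self) -> str:
        # Show the short member name rather than <Class.NAME: 'value'>.
        return repr(self.name)


member = DocketCategorySketch["GR"]
shown = repr(member)
in_dict = repr({"category": DocketCategorySketch.AC})
```

Because containers call `repr()` on their items, a dict holding a member displays the short name while the member still carries its full value.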
Source code in src/citation_utils/dockets/models/docket_category.py
Docket CitationConstructor
Although the different category docket models share a similar configuration, the regex strings involved are different for each, prompting the need for a preparatory constructor class:
Bases: BaseModel
Prefatorily, regex strings are defined so that a re.Pattern object can take advantage of the "group_name" assigned in the string.
These are the docket styles with regex strings predefined:
- General Register
- Administrative Matter
- Administrative Case
- Bar Matter
- Office of the Court Administrator
- Presidential Electoral Tribunal
- Judicial Integrity Board
- Undocketed Case
The CitationConstructor formalizes the assigned group names into their respective fields.
Relatedly, it takes advantage of the citation_date and the citation_report libraries in generating the main @pattern since the regex strings above are only concerned with the key num id formula part of the docket, e.g. GR No. 123 ... but not the accompanying date and report.
Source code in src/citation_utils/dockets/models/constructor.py
Attributes
key_num_pattern: re.Pattern
property
Unlike full @pattern, this regex compiled object is limited to just the key and number elements, e.g. "GR No. 123" or "BP Blg. 45"
pattern: re.Pattern
property
Construct the regex string and generate a full Pattern object from:
- docket_regex,
- docket_date defined in the citation-date library
- an optional REPORT_REGEX defined in the citation-report library
Returns:
Name | Type | Description |
---|---|---|
Pattern | Pattern | Combination of Docket and Report styles. |
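The difference between the key-and-number pattern and the full pattern can be sketched by composing three regex strings. The strings below are deliberately simplified assumptions for G.R. dockets only; the real docket regex and the patterns from the citation-date and citation-report libraries are more elaborate, and the group names used here are hypothetical.

```python
import re

# Simplified stand-ins for the three building blocks.
DOCKET_REGEX = r"(?P<gr_key>G\.R\.\s+Nos?\.)\s+(?P<gr_num>[\d\s,]*\d)"
DATE_REGEX = r"(?P<docket_date>[A-Z][a-z]+\.?\s+\d{1,2},\s+\d{4})"
REPORT_REGEX = r"(?P<volume>\d+)\s+(?P<publisher>SCRA|Phil\.)\s+(?P<page>\d+)"

# Key and number only, e.g. "G.R. No. 123".
key_num_pattern = re.compile(DOCKET_REGEX)

# Docket, then date, then an optional report.
pattern = re.compile(rf"{DOCKET_REGEX},\s+{DATE_REGEX}(?:,\s+{REPORT_REGEX})?")

with_report = pattern.search(
    "G.R. Nos. 138570, 138572, 138587, 138680, 138698, October 10, 2000, 342 SCRA 449"
)
without_report = pattern.search("People v. Umayam, G.R. No. 147033, April 30, 2003")
key_only = key_num_pattern.search("G.R. No. 147033, April 30, 2003")
```

When the report is absent, its named groups are simply `None`, so the same pattern serves both the paired and the docket-only case.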
Functions
detect(raw)
Logic: if the self.init_name match group exists, get the entire regex match based on self.group_name, then extract the subgroups, which will consist of Docket and Report parts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
raw | str | Text to evaluate | required |
Yields:
Type | Description |
---|---|
dict[str, Any] | Iterator[dict[str, Any]]: A dict that can fill up a Docket + Report pydantic BaseModel |
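The check on the initial match group matters when one regex covers several docket styles, since only the alternative that matched has a value. A minimal sketch, assuming a combined two-style pattern and hypothetical group names (`gr_init`, `ac_init`, and so on); the date and report parts are left out for brevity.

```python
import re
from collections.abc import Iterator
from typing import Any

# Two docket styles in one pattern; each has its own named groups.
COMBINED = re.compile(
    r"(?P<gr_init>G\.R\.\s+No\.\s+(?P<gr_num>\d+))"
    r"|(?P<ac_init>A\.C\.\s+No\.\s+(?P<ac_num>\d+))"
)


def detect_sketch(
    raw: str, init_name: str, num_name: str, category: str
) -> Iterator[dict[str, Any]]:
    """Yield one dict per match whose initial group is present."""
    for match in COMBINED.finditer(raw):
        if match.group(init_name):  # None when the other style matched
            yield {
                "context": match.group(init_name),
                "category": category,
                "ids": match.group(num_name),
            }


raw = "G.R. No. 147033, April 30, 2003; A.C. No. 10179, March 04, 2014"
ac_hits = list(detect_sketch(raw, "ac_init", "ac_num", "AC"))
gr_hits = list(detect_sketch(raw, "gr_init", "gr_num", "GR"))
```

Each yielded dict uses the field names of the Docket model (`context`, `category`, `ids`), so it could be passed on as keyword arguments to a model constructor.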