API
CitableDocument
A `Citation`'s `extract_citations()` function relies on a `CitableDocument`.
Creates three main reusable lists:
list | concept |
---|---|
`@docketed_reports` | list of `DocketReportCitation` found in the text, excluding exceptional statutory dockets |
`@reports` | list of `Report` found in the text (which may already be included in `@docketed_reports`) |
`@undocketed_reports` | = `@reports` - `@docketed_reports`, i.e. reports without an accompanying docket |
Examples:
>>> text_statutes = "Bar Matter No. 803, Jan. 1, 2000; Bar Matter No. 411, Feb. 1, 2000"
>>> len(CitableDocument(text=text_statutes).docketed_reports) # no citations, since these are 'statutory dockets'
0
>>> text_cites = "374 Phil. 1, 10-11 (1999) 1111 SCRA 1111; G.R. No. 147033, April 30, 2003; G.R. No. 147033, April 30, 2003, 374 Phil. 1, 600; ABC v. XYZ, G.R. Nos. 138570, 138572, 138587, 138680, 138698, October 10, 2000, 342 SCRA 449; XXX, G.R. No. 31711, Sept. 30, 1971, 35 SCRA 190; Hello World, 1111 SCRA 1111; Y v. Z, 35 SCRA 190;"
>>> doc1 = CitableDocument(text=text_cites)
>>> len(doc1.docketed_reports)
4
>>> doc1.undocketed_reports
{'1111 SCRA 1111'}
>>> text = "<em>Gatchalian Promotions Talent Pool, Inc. v. Atty. Naldoza</em>, 374 Phil. 1, 10-11 (1999), citing: <em>In re Almacen</em>, 31 SCRA 562, 600 (1970).; People v. Umayam, G.R. No. 147033, April 30, 2003; <i>Bagong Alyansang Makabayan v. Zamora,</i> G.R. Nos. 138570, 138572, 138587, 138680, 138698, October 10, 2000, 342 SCRA 449; Villegas <em>v.</em> Subido, G.R. No. 31711, Sept. 30, 1971, 41 SCRA 190;"
>>> doc2 = CitableDocument(text=text)
>>> set(doc2.get_citations()) == {'GR No. 147033, Apr. 30, 2003', 'GR No. 138570, Oct. 10, 2000, 342 SCRA 449', 'GR No. 31711, Sep. 30, 1971, 41 SCRA 190', '374 Phil. 1', '31 SCRA 562'}
True
Source code in citation_utils/document.py
Functions
get_citations()
There are two main lists to evaluate:
- `@docketed_reports` - each includes a `Docket` (optionally attached to a `Report`)
- `@reports` - from the same text, just get `Report` objects.
Can filter out `Report` objects not docketed and thus return a more succinct citation list which includes both constructs mentioned above but without duplicate reports.
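The merge described above can be sketched as follows. This is a minimal illustration and not the library's implementation: `merge_citations` is a hypothetical helper, and plain strings and tuples stand in for the `Docket` and `Report` objects.

```python
from typing import Optional


def merge_citations(
    docketed: list[tuple[str, Optional[str]]],
    reports: list[str],
) -> list[str]:
    """Combine docketed citations with reports that have no docket.

    Hypothetical helper: each docketed entry is (docket_text, report_text or None).
    """
    citations = []
    seen_reports = set()
    for docket, report in docketed:
        # A docketed citation may or may not carry a report.
        citations.append(f"{docket}, {report}" if report else docket)
        if report:
            seen_reports.add(report)
    # Append only the reports that are not already attached to a docket.
    for report in reports:
        if report not in seen_reports:
            citations.append(report)
            seen_reports.add(report)
    return citations


docketed = [
    ("GR No. 147033, Apr. 30, 2003", None),
    ("GR No. 31711, Sep. 30, 1971", "41 SCRA 190"),
]
reports = ["374 Phil. 1", "41 SCRA 190", "31 SCRA 562"]
result = merge_citations(docketed, reports)
```

Here `41 SCRA 190` appears only once in `result`, as part of its docketed citation.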
get_docketed_reports(text, exclude_docket_rules=True)
classmethod
Extract from raw `text` all raw citations, which should include their `Docket` and `Report` component parts.
This may, however, include statutory rules, since some docket categories like AM and BM use this convention.
To exclude statutory rules, the `exclude_docket_rules` flag is set to `True` by default.
Examples:
>>> cite = next(CitableDocument.get_docketed_reports("Bagong Alyansang Makabayan v. Zamora, G.R. Nos. 138570, 138572, 138587, 138680, 138698, October 10, 2000, 342 SCRA 449"))
>>> cite.model_dump(exclude_none=True)
{'publisher': 'SCRA', 'volume': '342', 'page': '449', 'volpubpage': '342 SCRA 449', 'context': 'G.R. Nos. 138570, 138572, 138587, 138680, 138698', 'category': 'GR', 'ids': '138570, 138572, 138587, 138680, 138698', 'docket_date': datetime.date(2000, 10, 10)}
>>> statutory_text = "Bar Matter No. 803, Jan. 1, 2000"
>>> next(CitableDocument.get_docketed_reports(statutory_text)) # default
Traceback (most recent call last):
...
StopIteration
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`text` | `str` | Text to look for | required |
`exclude_docket_rules` | `bool` | Whether to exclude statutory rules | `True` |
Yields:
Type | Description |
---|---|
`DocketReport` | Iterator[DocketReport]: Any of custom |
get_undocketed_reports()
Steps:
- From a set of `uniq_reports` (see `self.reports`);
- Compare to reports found in `@docketed_reports`;
- Limit reports to those without an accompanying docket.
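The steps above amount to a set difference. The sketch below assumes that each docketed entry exposes its report as a `volpubpage` string (the key shown in the `model_dump()` example earlier); the `undocketed` function and the sample data are hypothetical stand-ins, not the library's code.

```python
def undocketed(uniq_reports: set[str], docketed_reports: list[dict]) -> set[str]:
    """Return the reports that have no accompanying docket (illustrative only)."""
    # Collect the reports already attached to a docket; some dockets have none.
    with_docket = {
        entry["volpubpage"] for entry in docketed_reports if entry.get("volpubpage")
    }
    # Keep only the reports that never appear alongside a docket.
    return uniq_reports - with_docket


# Sample data loosely mirroring the `doc1` example above.
uniq_reports = {"374 Phil. 1", "1111 SCRA 1111", "342 SCRA 449", "35 SCRA 190"}
docketed_reports = [
    {"ids": "147033", "volpubpage": None},
    {"ids": "147033", "volpubpage": "374 Phil. 1"},
    {"ids": "138570, 138572", "volpubpage": "342 SCRA 449"},
    {"ids": "31711", "volpubpage": "35 SCRA 190"},
]
leftover = undocketed(uniq_reports, docketed_reports)
```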
Docket Model
Bases: BaseModel
The Docket is the modern identifier of a Supreme Court decision.
It is based on a `category`, a serial `id`, and a `date`.
Field | Type | Description |
---|---|---|
`context` | optional (str) | Full text matched by the regex pattern |
`category` | optional (DocketCategory) | See docket-category-model |
`ids` | optional (str) | The serial number of the docket category |
`docket_date` | optional (date) | The date associated with the docket |
Sample Citation | Category | Serial | Date |
---|---|---|---|
G.R. Nos. 138570, October 10, 2000 | GR | 138570 | October 10, 2000 |
A.M. RTJ-12-2317 (Formerly OCA I.P.I. No. 10-3378-RTJ), Jan 1, 2000 | AM | RTJ-12-2317 | Jan 1, 2000 |
A.C. No. 10179 (Formerly CBD 11-2985), March 04, 2014 | AC | 10179 | Mar. 4, 2014 |
The Docket is often paired with a Report, which is the traditional identifier based on volume and page numbers.
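To make the field table concrete, here is a small stand-in for the model. It uses a standard-library dataclass rather than the pydantic `BaseModel` the library is based on, and `DocketSketch` is a hypothetical name; only the four field names come from the table above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocketSketch:
    """Illustrative stand-in for the Docket model; all fields are optional."""

    context: Optional[str] = None  # full text matched by the regex pattern
    category: Optional[str] = None  # e.g. "GR", "AM", "AC"
    ids: Optional[str] = None  # serial number(s) as found in the text
    docket_date: Optional[date] = None  # date associated with the docket


docket = DocketSketch(
    context="G.R. Nos. 138570, 138572, 138587, 138680, 138698",
    category="GR",
    ids="138570, 138572, 138587, 138680, 138698",
    docket_date=date(2000, 10, 10),
)
```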
Source code in citation_utils/dockets/models/docket_model.py
Attributes
first_id: str
property
Get first bit from list of separated ids, when possible.
Returns:
Name | Type | Description |
---|---|---|
`str` | `str` | First id found |
serial_text: str
property
From raw `ids`, get the `cleaned_ids`, and of these `cleaned_ids`, extract the `@first_id` found to deal with compound ids, e.g. ids separated by 'and' and ','.
Returns:
Name | Type | Description |
---|---|---|
`str` | `str` | Singular text identifier |
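One way to picture the handling of compound ids is the sketch below. It is an assumption about the general approach, not the library's actual cleaning logic, and `first_id_sketch` is a hypothetical function name.

```python
import re


def first_id_sketch(ids: str) -> str:
    """Return the first id from a compound string split on ',' or the word 'and'."""
    # Split on commas or the standalone word "and", then drop empty pieces.
    parts = [part.strip() for part in re.split(r",|\band\b", ids)]
    cleaned = [part for part in parts if part]
    # Fall back to the stripped input when nothing usable remains.
    return cleaned[0] if cleaned else ids.strip()


single = first_id_sketch("RTJ-12-2317")
compound = first_id_sketch("138570, 138572, 138587, 138680, 138698")
joined = first_id_sketch("100 and 101")
```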
Docket Category
Docket Category Model
Bases: StrEnum
Common docket references involving Philippine Supreme Court decisions.
Name | Value |
---|---|
`GR` | General Register |
`AM` | Administrative Matter |
`AC` | Administrative Case |
`BM` | Bar Matter |
`PET` | Presidential Electoral Tribunal |
`OCA` | Office of the Court Administrator |
`JIB` | Judicial Integrity Board |
`UDK` | Undocketed |
Complication: These categories do not always represent decisions. For instance, there are `AM` and `BM` docket numbers that represent rules rather than decisions.
Source code in citation_utils/dockets/models/docket_category.py
Functions
__repr__()
Uses the name of the member `gr` instead of the Enum default `<DocketCategory.GR: 'General Register'>`. This makes it possible to use the following conventions:
Examples:
Returns:
Name | Type | Description |
---|---|---|
`str` | `str` | The value of the Enum name |
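The effect of overriding `__repr__` can be illustrated with a reduced enum. This is a sketch under assumptions: `DocketCategorySketch` is a hypothetical class that mixes `str` with `Enum` instead of using `StrEnum`, and its `__repr__` body is a guess at the behaviour described above, since the source does not show the actual implementation.

```python
from enum import Enum


class DocketCategorySketch(str, Enum):
    """Reduced, illustrative version of the docket category enum."""

    GR = "General Register"
    AM = "Administrative Matter"
    BM = "Bar Matter"

    def __repr__(self) -> str:
        # Show the member name rather than the default
        # "<DocketCategorySketch.GR: 'General Register'>" form.
        return repr(self.name)


shown = repr(DocketCategorySketch.GR)
looked_up = repr(DocketCategorySketch["AM"])
value = DocketCategorySketch.GR.value
```

With this override, the member still carries its full value (`General Register`), while its representation is reduced to the short name.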
Docket CitationConstructor
Although the different category docket models share a similar configuration, the regex strings involved are different for each, prompting the need for a preparatory constructor class:
Bases: BaseModel
Prefatorily, regex strings are defined so that a `re.Pattern` object can take advantage of the "group_name" assigned in the string.
These are the docket styles with regex strings predefined:
- General Register
- Administrative Matter
- Administrative Case
- Bar Matter
- Office of the Court Administrator
- Presidential Electoral Tribunal
- Judicial Integrity Board
- Undocketed Case
The CitationConstructor formalizes the assigned group names into their respective fields.
Relatedly, it takes advantage of the `citation_date` and the `citation_report` libraries in generating the main `@pattern`, since the regex strings above are only concerned with the `key` `num` `id` formula part of the docket, e.g. `GR` `No.` `123` ... but not the accompanying date and report.
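The composition described above can be sketched with plain `re`. Every regex fragment and group name below (`DOCKET_REGEX`, `DATE_REGEX`, `REPORT_REGEX`, `gr_key`, `gr_num`, `gr_ids`) is a simplified, hypothetical stand-in; the real fragments come from the library and from the `citation_date` and `citation_report` packages and are more permissive.

```python
import re

# Hypothetical key/num/id fragment for one docket style (General Register).
DOCKET_REGEX = (
    r"(?P<gr_key>G\.?\s?R\.?)\s+(?P<gr_num>Nos?\.)\s+(?P<gr_ids>\d+(?:,\s*\d+)*)"
)
# Hypothetical stand-ins for the date and report fragments.
DATE_REGEX = r"(?P<docket_date>[A-Z][a-z]+\.?\s+\d{1,2},\s+\d{4})"
REPORT_REGEX = r"(?P<volume>\d+)\s+(?P<publisher>SCRA|Phil\.)\s+(?P<page>\d+)"

# Full pattern: docket, then date, then an optional report.
pattern = re.compile(rf"{DOCKET_REGEX},\s+{DATE_REGEX}(?:,\s+{REPORT_REGEX})?")

with_report = pattern.search("G.R. No. 31711, Sept. 30, 1971, 41 SCRA 190")
without_report = pattern.search("People v. Umayam, G.R. No. 147033, April 30, 2003")
```

Named groups let later code read each part by name, e.g. `with_report.group("gr_ids")`, without depending on group positions.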
Source code in citation_utils/dockets/models/constructor.py
Attributes
key_num_pattern: re.Pattern
property
Unlike full @pattern, this regex compiled object is limited to just the key and number elements, e.g. "GR No. 123" or "BP Blg. 45"
pattern: re.Pattern
property
Construct the regex string and generate a full Pattern object from:
- `docket_regex`,
- `docket_date` defined in the citation-date library
- an optional `REPORT_REGEX` defined in the citation-report library
Returns:
Name | Type | Description |
---|---|---|
`Pattern` | `re.Pattern` | Combination of Docket and Report styles. |
Functions
detect(raw)
Logic: if the `self.init_name` match group exists, get the entire regex match based on `self.group_name`, then extract the subgroups, which will consist of `Docket` and `Report` parts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`raw` | `str` | Text to evaluate | required |
Yields:
Type | Description |
---|---|
`dict[str, Any]` | Iterator[dict[str, Any]]: A dict that can fill up a Docket + Report pydantic BaseModel |
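The detection logic can be approximated as a generator over `re.finditer`. This is a sketch under assumptions: `PATTERN`, the `gr_init` group name, and `detect_sketch` are hypothetical, and the yielded dates stay as strings, whereas the library's example output shows parsed `datetime.date` values.

```python
import re
from typing import Any, Iterator

# Hypothetical pattern: an "init" group wrapping the docket, then date and optional report.
PATTERN = re.compile(
    r"(?P<gr_init>G\.R\.\s+Nos?\.\s+(?P<ids>\d+(?:,\s*\d+)*))"
    r",\s+(?P<docket_date>[A-Z][a-z]+\.?\s+\d{1,2},\s+\d{4})"
    r"(?:,\s+(?P<volpubpage>\d+\s+(?:SCRA|Phil\.)\s+\d+))?"
)


def detect_sketch(raw: str) -> Iterator[dict[str, Any]]:
    """Yield one dict per docket found; the report part may be None."""
    for match in PATTERN.finditer(raw):
        # Proceed only if the group marking this docket style matched.
        if match.group("gr_init"):
            yield {
                "context": match.group("gr_init"),
                "category": "GR",
                "ids": match.group("ids"),
                "docket_date": match.group("docket_date"),
                "volpubpage": match.group("volpubpage"),
            }


text = (
    "People v. Umayam, G.R. No. 147033, April 30, 2003; "
    "Villegas v. Subido, G.R. No. 31711, Sept. 30, 1971, 41 SCRA 190"
)
found = list(detect_sketch(text))
```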