Citation Date
Concept
This is a regex date formula and decoder for dates in Philippine citations based on the following constraints:
- Limit allowed years: 1900 - 2299
- Use regular days: 1 - 31
- Allow both traditional and unorthodox expression of months: Jan., Dec., mar, july, Sept
- Capture different date formats:
  - UK format: day month, year
  - US format: month day, year
- Handle typographic issues, e.g. lacking space: Dec1,2000
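The constraints above can be sketched as a small pattern built from month, day, year, and separator pieces. This is a simplified illustration only: the names MONTH, DAY, YEAR, SEP, SIMPLE_DATE, and find_date are hypothetical and are not part of citation-date, whose actual regex is more elaborate.

```python
import re
from typing import Optional

# Hypothetical building blocks; a simplification, not the library's regex.
MONTH = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
    r"Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)
DAY = r"(?:3[01]|[12][0-9]|0?[1-9])"     # 1-31, optional leading zero
YEAR = r"(?:19[0-9]{2}|2[0-2][0-9]{2})"  # 1900-2299
SEP = r"[,.\s]*"                         # tolerate missing or extra separators

US_FORMAT = MONTH + SEP + DAY + SEP + YEAR + r"\b"  # month day, year
UK_FORMAT = DAY + SEP + MONTH + SEP + YEAR + r"\b"  # day month, year
SIMPLE_DATE = re.compile(f"(?:{US_FORMAT}|{UK_FORMAT})", re.I)


def find_date(text: str) -> Optional[str]:
    """Return the first date-like substring, or None if nothing qualifies."""
    match = SIMPLE_DATE.search(text)
    return match.group(0) if match else None


print(find_date("Dec1,2000"))      # Dec1,2000
print(find_date("(1 Dec. 2000)"))  # 1 Dec. 2000
print(find_date("Jan. 1, 2300"))   # None (year outside 1900-2299)
```

Because the separator class is optional, the same pattern accepts both well-spaced dates and run-together ones such as Dec1,2000.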
This package is a dependency of citation-report (where it is referenced in the Report) and of citation-docket (where it is referenced in the Docket); keeping it separate makes the regex strings easier to test. These two libraries are, in turn, dependencies of citation-utils. The citation-prefixed libraries are intended to parse long-form court decisions and documents that contain Philippine Supreme Court citations.
Report Regex
An example of a Report (referring to a reporter / publisher citation) containing a date is "1 SCRA 200 <date>". See the citation-report library for how the report_date group name of a matched regular expression can be extracted from a piece of text.
Examples:
>>> from citation_date import REPORT_DATE_REGEX, decode_date
>>> import re
>>> pattern = re.compile(REPORT_DATE_REGEX, re.I | re.X) # note flags
>>> text = "1 SCRA 200 (1Dec. 2000)" # this is what a report looks like
>>> sample_match = pattern.search(text)
>>> sample_match.group("report_date")
"(1Dec. 2000)"
>>> decode_date(sample_match.group("report_date")) # use the regex group name
"December 01, 2000"
Docket Regex
An example of a Docket number containing a date is "G.R. No. 12345, <date>". See the citation-docket library for how the docket_date group name of a matched regular expression can be extracted from a piece of text.
Examples:
>>> from citation_date import DOCKET_DATE_REGEX, decode_date
>>> import re
>>> pattern = re.compile(DOCKET_DATE_REGEX, re.I | re.X) # note flags
>>> text = "G.R. No. 12345, Dec,1, 2000" # this is what a docket looks like
>>> sample_match = pattern.search(text)
>>> sample_match.group("docket_date")
"Dec,1, 2000"
>>> decode_date(sample_match.group("docket_date")) # use the regex group name
"December 01, 2000"
Group Name: docket_date
The constructed regular expression includes a named group (see (?P<docket_date>...)). This means that DOCKET_DATE_REGEX can be combined with another regular expression and, when the docket date matches, that match is accessible through the group name.
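As a rough illustration of this kind of combination, the sketch below embeds a named date group inside a larger pattern. SIMPLE_DOCKET_DATE and the docket_num prefix are hypothetical stand-ins, far simpler than DOCKET_DATE_REGEX, and are not part of citation-date.

```python
import re

# Hypothetical stand-in for DOCKET_DATE_REGEX (US format only, much looser).
SIMPLE_DOCKET_DATE = r"""
    (?P<docket_date>
        (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*
        [,\.\s]*
        [0-3]?[0-9]
        [,\.\s]*
        (?:19|2[0-2])[0-9]{2}
        \b
    )
"""

# Hypothetical outer pattern: a docket number followed by the named date group.
DOCKET = rf"""
    (?P<docket_num>G\.R\.\s+No\.\s+\d+)
    [,\s]*
    {SIMPLE_DOCKET_DATE}
"""

pattern = re.compile(DOCKET, re.I | re.X)
match = pattern.search("G.R. No. 12345, Dec,1, 2000")
print(match.group("docket_num"))   # G.R. No. 12345
print(match.group("docket_date"))  # Dec,1, 2000
```

The outer pattern does not need to know how the date is matched; it only relies on the group name staying stable.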
from citation_date import DOCKET_DATE_REGEX
import pprint
pprint.pprint(DOCKET_DATE_REGEX)
(
"\n"
" (?P<docket_date>\n"
" \n"
"(\n"
" (\n"
" (?:\n"
" Jan(?:uary)?|\n"
" Feb(?:ruary)?|\n"
" Mar(?:ch)?|\n"
" Apr(?:il)?|\n"
" May|\n"
" Jun(?:e)?|\n"
" Jul(?:y)?|\n"
" Aug(?:ust)?|\n"
" Sep(?:tember)?|\n"
" Sept|\n"
" Oct(?:ober)?|\n"
" (Nov|Dec)(?:ember)?\n"
" )\n"
")\n"
"\n"
" [,\\.\\s]*\n"
" \n"
" (\n"
" ( \n"
" ([0]?[1-9])| # 01-09\n"
" ([1-2][0-9])| # 10-29\n"
" (3[01]) # 30-31\n"
" )\n"
" )\n"
"\n"
" [,\\.\\s]*\n"
" \n"
" (\n"
" 19[0-9][0-9]| # 1900 to 1999\n"
" 2[0-2][0-9][0-9] # 2000 to 2299\n"
" )\n"
" \\b # ends with the last digit of the year\n"
"\n"
")\n"
"|\n"
"(\n"
" \n"
" (\n"
" ( \n"
" ([0]?[1-9])| # 01-09\n"
" ([1-2][0-9])| # 10-29\n"
" (3[01]) # 30-31\n"
" )\n"
" )\n"
"\n"
" [,\\.\\s]*\n"
" (\n"
" (?:\n"
" Jan(?:uary)?|\n"
" Feb(?:ruary)?|\n"
" Mar(?:ch)?|\n"
" Apr(?:il)?|\n"
" May|\n"
" Jun(?:e)?|\n"
" Jul(?:y)?|\n"
" Aug(?:ust)?|\n"
" Sep(?:tember)?|\n"
" Sept|\n"
" Oct(?:ober)?|\n"
" (Nov|Dec)(?:ember)?\n"
" )\n"
")\n"
"\n"
" [,\\.\\s]*\n"
" \n"
" (\n"
" 19[0-9][0-9]| # 1900 to 1999\n"
" 2[0-2][0-9][0-9] # 2000 to 2299\n"
" )\n"
" \\b # ends with the last digit of the year\n"
"\n"
")\n"
"\n"
" )\n"
)
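The printed pattern contains indentation and # comments, which is why the examples compile it with re.X (verbose mode); re.I makes month names case-insensitive. The sketch below shows the effect of each flag on a small, hypothetical pattern written in the same style. VERBOSE_PATTERN is an illustration, not the library's regex.

```python
import re

# Hypothetical pattern in the commented, multi-line style shown above.
VERBOSE_PATTERN = r"""
    (?P<month>Dec(?:ember)?)   # month name, abbreviated or spelled out
    [,\.\s]*                   # optional commas, periods, whitespace
    (?P<year>
        19[0-9][0-9]|          # 1900 to 1999
        2[0-2][0-9][0-9]       # 2000 to 2299
    )
    \b                         # ends with the last digit of the year
"""
text = "1 SCRA 200 (DEC. 2000)"

# Both flags: layout whitespace and comments are ignored, case is ignored.
both = re.compile(VERBOSE_PATTERN, re.I | re.X).search(text)
# re.X only: the pattern is read correctly, but "DEC" does not equal "Dec".
verbose_only = re.compile(VERBOSE_PATTERN, re.X).search(text)
# No flags: newlines, spaces, and comments are treated as literal characters.
no_flags = re.compile(VERBOSE_PATTERN).search(text)

print(both.group("month"), both.group("year"))  # DEC 2000
print(verbose_only)                             # None
print(no_flags)                                 # None
```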
Docket Date Format
Utilizes a uniform docket format of %b. %d, %Y, e.g. Jan. 2, 1994, for dates to be usable downstream.
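A minimal sketch of how such a format string behaves, assuming it is used with Python's datetime module (the text does not say so explicitly). The variable name FORMAT is hypothetical. Note that %d zero-pads the day on output, while parsing accepts a single digit.

```python
from datetime import datetime

FORMAT = "%b. %d, %Y"  # the format string quoted in the text

# Parsing accepts an unpadded day such as "2".
parsed = datetime.strptime("Jan. 2, 1994", FORMAT).date()
print(parsed)  # 1994-01-02

# Formatting always zero-pads the day.
formatted = parsed.strftime(FORMAT)
print(formatted)  # Jan. 02, 1994
```

Month abbreviations produced by %b depend on the active locale; the output shown assumes the default English (C) locale.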
decode_date()
Given a piece of text, extract the date found using the specific constraints of Philippine citations.
Examples:
>>> text = "G.R. No. 12345, Dec,1, 2000"
>>> decode_date(text)
'December 01, 2000'
>>> text1 = "The date is (april29,2001)"
>>> decode_date(text1)
'April 29, 2001'
>>> decode_date(text1, is_output_date_object=True)
datetime.date(2001, 4, 29)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Presumably a date string | required |
is_output_date_object | bool | If True, the return is a date object | False |
Returns:
Type | Description |
---|---|
str \| date \| None | The decoded text as a date, if it exists. |
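For readers who want a feel for what a decoder of this kind involves, here is a self-contained sketch: find the date parts with a regex, map the month name to a number, and return either a formatted string or a date object. It is an assumption-laden approximation, not the library's implementation; sketch_decode_date, as_date, and the helper patterns are hypothetical names.

```python
import re
from datetime import date
from typing import Optional, Union

MONTH_KEYS = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]

# Hypothetical patterns; simpler than the library's regex.
_MONTH = (
    r"(?P<month>jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|"
    r"july?|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|"
    r"dec(?:ember)?)"
)
_DAY = r"(?P<day>3[01]|[12][0-9]|0?[1-9])"
_YEAR = r"(?P<year>19[0-9]{2}|2[0-2][0-9]{2})"
_SEP = r"[,.\s]*"
_US = re.compile(_MONTH + _SEP + _DAY + _SEP + _YEAR + r"\b", re.I)
_UK = re.compile(_DAY + _SEP + _MONTH + _SEP + _YEAR + r"\b", re.I)


def sketch_decode_date(
    text: str, as_date: bool = False
) -> Union[str, date, None]:
    """Return the first date found in text, or None if there is none."""
    match = _US.search(text) or _UK.search(text)
    if match is None:
        return None
    month = MONTH_KEYS.index(match.group("month")[:3].lower()) + 1
    try:
        found = date(int(match.group("year")), month, int(match.group("day")))
    except ValueError:  # a day that does not exist, e.g. Feb 31
        return None
    return found if as_date else found.strftime("%B %d, %Y")


print(sketch_decode_date("G.R. No. 12345, Dec,1, 2000"))  # December 01, 2000
print(sketch_decode_date("The date is (april29,2001)", as_date=True))  # 2001-04-29
```

Trying the US pattern before the UK pattern is an arbitrary choice made for this sketch; the source does not say which order the library uses.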