
Citation Date

Concept

This is a regex date pattern and decoder for dates in Philippine citations, built on the following constraints:

  1. Limit allowed years: 1900 - 2299
  2. Use regular days: 1 - 31
  3. Allow both traditional and unorthodox expression of months:
    • Jan.
    • Dec.
    • mar
    • july
    • Sept
  4. Capture different date formats:
    • UK format: day month, year
    • US format: month day, year
  5. Handle typographic issues, e.g. a missing space: Dec1,2000
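
The constraints above can be illustrated with a much-simplified pattern. This is only a sketch: the names `MONTH`, `DAY`, `YEAR`, `SEP`, and `SKETCH` are hypothetical stand-ins written for this example, not the library's actual `REPORT_DATE_REGEX` or `DOCKET_DATE_REGEX`.

```python
import re

# Hypothetical, simplified stand-ins (NOT the library's actual patterns).
MONTH = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
    r"Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)
DAY = r"(?:0?[1-9]|[12][0-9]|3[01])"      # constraint 2: days 1 to 31
YEAR = r"(?:19[0-9]{2}|2[0-2][0-9]{2})"   # constraint 1: years 1900 to 2299
SEP = r"[,.\s]*"                          # constraint 5: optional separators

US = rf"{MONTH}{SEP}{DAY}{SEP}{YEAR}\b"   # constraint 4: month day, year
UK = rf"{DAY}{SEP}{MONTH}{SEP}{YEAR}\b"   # constraint 4: day month, year
SKETCH = re.compile(rf"(?:{US}|{UK})", re.I)  # constraint 3: case-insensitive

samples = [
    "Dec1,2000",     # missing spaces
    "1 Dec. 2000",   # UK order
    "Sept 5, 1999",  # unorthodox abbreviation
    "july 4 2299",   # lowercase month, last allowed year
    "Jan. 2, 2300",  # year out of range
    "Dec 32, 2000",  # day out of range
]
results = {s: bool(SKETCH.fullmatch(s)) for s in samples}
for sample, ok in results.items():
    print(f"{sample!r}: {ok}")
```

The first four samples match and the last two do not, since 2300 and 32 fall outside the allowed year and day ranges.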

This library is a dependency, kept separate to make the regex strings easier to test, of citation-report (for its Report) and citation-docket (for its Docket). These two libraries are, in turn, dependencies of citation-utils. The citation- libraries are intended to parse long-form court decisions and documents that contain Philippine Supreme Court citations.

Report Regex

An example of a Report (referring to a reporter / publisher citation) containing a date is "1 SCRA 200 <date>". See the citation-report library for how the report_date group of a regex match can be extracted from a piece of text.

Examples:

Python Console Session
>>> from citation_date import REPORT_DATE_REGEX, decode_date
>>> import re
>>> pattern = re.compile(REPORT_DATE_REGEX, re.I | re.X)  # note flags
>>> text = "1 SCRA 200 (1Dec.  2000)" # this is what a report looks like
>>> sample_match = pattern.search(text)
>>> sample_match.group("report_date")
"(1Dec.  2000)"
>>> decode_date(sample_match.group("report_date")) # use the regex group name
'December 01, 2000'

Docket Regex

An example of a Docket number containing a date is "G.R. No. 12345, <date>". See the citation-docket library for how the docket_date group of a regex match can be extracted from a piece of text.

Examples:

Python Console Session
>>> from citation_date import DOCKET_DATE_REGEX, decode_date
>>> import re
>>> pattern = re.compile(DOCKET_DATE_REGEX, re.I | re.X)  # note flags
>>> text = "G.R. No. 12345, Dec,1,  2000" # this is what a docket looks like
>>> sample_match = pattern.search(text)
>>> sample_match.group("docket_date")
"Dec,1,  2000"
>>> decode_date(sample_match.group("docket_date")) # use the regex group name
"December 01, 2000"

Group Name: docket_date

The constructed regular expression includes a named group (see (?P<docket_date>...)). This means that DOCKET_DATE_REGEX can be combined with another regex, and when a match occurs for the docket date, that match will be accessible through the group name.
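
To illustrate how a named group survives being embedded in a larger pattern, here is a minimal sketch. `DATE_STANDIN` is a hypothetical, simplified substitute for DOCKET_DATE_REGEX (US format only), and the `docket_number` prefix pattern is likewise an assumption made for this example, not something taken from citation-docket.

```python
import re

# Hypothetical, simplified stand-in for DOCKET_DATE_REGEX (US format only).
DATE_STANDIN = r"""
    (?P<docket_date>
        (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*  # month
        [,.\s]*
        (?:0?[1-9]|[12][0-9]|3[01])  # day
        [,.\s]*
        (?:19[0-9]{2}|2[0-2][0-9]{2})  # year
        \b
    )
"""

# Hypothetical docket prefix that embeds the date pattern above.
DOCKET_SKETCH = rf"""
    G\.R\.\s+No\.\s+
    (?P<docket_number>\d+)
    [,\s]*
    {DATE_STANDIN}
"""

pattern = re.compile(DOCKET_SKETCH, re.I | re.X)  # same flags as the examples
match = pattern.search("G.R. No. 12345, Dec,1,  2000")
number = match.group("docket_number")
raw_date = match.group("docket_date")
print(number, "|", raw_date)
```

Both groups stay addressable by name in the combined pattern, so the date portion can be handed to a decoder without re-parsing the whole citation.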

Python
from citation_date import DOCKET_DATE_REGEX
import pprint

pprint.pprint(DOCKET_DATE_REGEX)
(
    "\n"
    "    (?P<docket_date>\n"
    "        \n"
    "(\n"
    "    (\n"
    "    (?:\n"
    "        Jan(?:uary)?|\n"
    "        Feb(?:ruary)?|\n"
    "        Mar(?:ch)?|\n"
    "        Apr(?:il)?|\n"
    "        May|\n"
    "        Jun(?:e)?|\n"
    "        Jul(?:y)?|\n"
    "        Aug(?:ust)?|\n"
    "        Sep(?:tember)?|\n"
    "        Sept|\n"
    "        Oct(?:ober)?|\n"
    "        (Nov|Dec)(?:ember)?\n"
    "    )\n"
    ")\n"
    "\n"
    "    [,\\.\\s]*\n"
    "    \n"
    "    (\n"
    "        ( \n"
    "            ([0]?[1-9])| # 01-09\n"
    "            ([1-2][0-9])| # 10-29\n"
    "            (3[01]) # 30-31\n"
    "        )\n"
    "    )\n"
    "\n"
    "    [,\\.\\s]*\n"
    "    \n"
    "    (\n"
    "        19[0-9][0-9]| # 1900 to 1999\n"
    "        2[0-2][0-9][0-9] # 2000 to 2299\n"
    "    )\n"
    "    \\b # ends with the last digit of the year\n"
    "\n"
    ")\n"
    "|\n"
    "(\n"
    "    \n"
    "    (\n"
    "        ( \n"
    "            ([0]?[1-9])| # 01-09\n"
    "            ([1-2][0-9])| # 10-29\n"
    "            (3[01]) # 30-31\n"
    "        )\n"
    "    )\n"
    "\n"
    "    [,\\.\\s]*\n"
    "    (\n"
    "    (?:\n"
    "        Jan(?:uary)?|\n"
    "        Feb(?:ruary)?|\n"
    "        Mar(?:ch)?|\n"
    "        Apr(?:il)?|\n"
    "        May|\n"
    "        Jun(?:e)?|\n"
    "        Jul(?:y)?|\n"
    "        Aug(?:ust)?|\n"
    "        Sep(?:tember)?|\n"
    "        Sept|\n"
    "        Oct(?:ober)?|\n"
    "        (Nov|Dec)(?:ember)?\n"
    "    )\n"
    ")\n"
    "\n"
    "    [,\\.\\s]*\n"
    "    \n"
    "    (\n"
    "        19[0-9][0-9]| # 1900 to 1999\n"
    "        2[0-2][0-9][0-9] # 2000 to 2299\n"
    "    )\n"
    "    \\b # ends with the last digit of the year\n"
    "\n"
    ")\n"
    "\n"
    "    )\n"
)

Docket Date Format

Uses a uniform docket format of %b. %d, %Y (e.g. Jan. 2, 1994) so that dates are usable downstream.
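
As a rough illustration of what this format string does with the standard library, the sketch below parses and re-formats a date. The variable name `DOCKET_FORMAT` is an assumption for this example; only the format string itself comes from the text above. Month abbreviations depend on the active locale, so an English or C locale is assumed.

```python
from datetime import datetime

DOCKET_FORMAT = "%b. %d, %Y"  # format string quoted above; the name is hypothetical

# strptime accepts a one-digit day for %d ...
parsed = datetime.strptime("Jan. 2, 1994", DOCKET_FORMAT).date()

# ... while strftime always zero-pads the day to two digits.
formatted = parsed.strftime(DOCKET_FORMAT)

print(parsed.isoformat())
print(formatted)
```

Note that parsing is lenient about the leading zero, but formatting with %d emits a zero-padded day.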

decode_date()

Given a piece of text, extract the date found using the specific constraints of Philippine citations.

Examples:

Python Console Session
>>> text =  "G.R. No. 12345, Dec,1,  2000"
>>> decode_date(text)
'December 01, 2000'
>>> text1 = "The date is (april29,2001)"
>>> decode_date(text1)
'April 29, 2001'
>>> decode_date(text1, is_output_date_object=True)
datetime.date(2001, 4, 29)

Parameters:

  • text (str, required): Presumably a date string.
  • is_output_date_object (bool, default False): If True, the return is a datetime.date object.

Returns:

  • str | date | None: The decoded text as a date, if it exists.

Source code in citation_date/decoder.py
Python
def decode_date(
    text: str, is_output_date_object: bool = False
) -> str | date | None:
    """Given a piece of text, extract the date found using the specific
    constraints of Philippine citations.

    Examples:
        >>> text =  "G.R. No. 12345, Dec,1,  2000"
        >>> decode_date(text)
        'December 01, 2000'
        >>> text1 = "The date is (april29,2001)"
        >>> decode_date(text1)
        'April 29, 2001'
        >>> decode_date(text1, is_output_date_object=True)
        datetime.date(2001, 4, 29)

    Args:
        text (str): Presumably a date string
        is_output_date_object (bool, optional): If True, the return is a
            `datetime.date` object. Defaults to False.

    Returns:
        str | date | None: The decoded text as a date, if it exists.
    """
    obj = DatedText(text)
    if is_output_date_object:
        return obj.as_date
    return obj.as_string
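
The function above delegates to DatedText, whose internals are not shown here. As a rough idea of what such a decoder might involve, the following self-contained sketch searches for a date, normalizes it, and returns either a string or a `datetime.date`. Every name in it (`sketch_decode`, `PATTERNS`, `MONTH_KEYS`) is hypothetical, and the patterns are simplified; this is not the library's implementation.

```python
import re
from datetime import date
from typing import Union

# Hypothetical, simplified building blocks (not the library's patterns).
MONTH_KEYS = ("jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec")
_MONTH = r"(?P<month>(?:" + "|".join(MONTH_KEYS) + r")[a-z]*)"
_DAY = r"(?P<day>0?[1-9]|[12][0-9]|3[01])"
_YEAR = r"(?P<year>19[0-9]{2}|2[0-2][0-9]{2})"
_SEP = r"[,.\s]*"

PATTERNS = [
    re.compile(_MONTH + _SEP + _DAY + _SEP + _YEAR + r"\b", re.I),  # US order
    re.compile(_DAY + _SEP + _MONTH + _SEP + _YEAR + r"\b", re.I),  # UK order
]


def sketch_decode(text: str, as_date_object: bool = False) -> Union[str, date, None]:
    """Return the first date found in text, or None if there is none."""
    for pattern in PATTERNS:
        m = pattern.search(text)
        if m is None:
            continue
        # The first three letters identify the month regardless of spelling.
        month = MONTH_KEYS.index(m.group("month")[:3].lower()) + 1
        try:
            found = date(int(m.group("year")), month, int(m.group("day")))
        except ValueError:  # impossible calendar date, e.g. Feb 31
            return None
        return found if as_date_object else found.strftime("%B %d, %Y")
    return None


print(sketch_decode("G.R. No. 12345, Dec,1,  2000"))
print(sketch_decode("The date is (april29,2001)", as_date_object=True))
print(sketch_decode("no date here"))
```

The sketch tries the US order before the UK order and stops at the first hit; whether the library resolves ambiguous text the same way is not something the source above confirms.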