Corpus Base Docs
flowchart TD
pax(corpus-pax)--github api--->sc
subgraph /corpus
1(justices)
2(decisions/sc)
3(decisions/legacy)
end
subgraph local
1--github api---sc
2--local copy of corpus---sc
3--local copy of corpus---sc
sc(corpus-base)--run setup_base--->db[(sqlite.db)]
end
Concept
In tandem with corpus-pax, corpus-base
creates sqlpyd tables related to decisions of the Philippine Supreme Court, thereby adding the following:
- Justices
- Decisions
- Citations
- Votelines
- Titletags
- Opinions
- Segments
Run
>>> from corpus_pax import setup_pax_base
>>> db_name = # assume target db to be created/recreated is in the present working directory
>>> setup_pax_base('test.db') # takes ~20 to 30 minutes to create/recreate in working dir
Caveats
Flow
- Unlike
corpus-pax
which operates over API calls,corpus-base
operates locally. - It implies parsing through a locally downloaded repository
corpus
to populate tables. - Opinions are limited. Save for 1 or 2 sample situations, the present
corpus
only includes the Ponencia.
Data
The path location of the downloaded corpus
repository is hard-coded since this package is intended to be run locally.
Instructions for downloading and updating the repository are discussed elsewhere.
Now toying with the idea of placing the entire corpus
in a bucket like AWS S3 or Cloudflare R2. So that all access can be cloud-based.
Prerequisites
Repositories
Different repositories involved:
repository | status | type | purpose |
---|---|---|---|
lawsql-articles | private | data source | used by corpus-pax; yaml-formatted member and org files |
corpus-entities | private | data source | used by corpus-pax; markdown-styled articles with frontmatter |
corpus | private | data source | used by corpus-base |
corpus-pax | public | sqlite i/o | functions to create pax-related tables |
corpus-base | public | sqlite i/o | functions to create sc-related tables |
.env
Create an .env file to create/populate the database. See sample .env
highlighting the following variables:
- Cloudflare
CF_ACCT
- Cloudflare
CF_TOKEN
- Github
GH_TOKEN
DB_FILE
(sqlite)
Note the workflow (main.yml) where the secrets are included for Github actions. Ensure these are set in the repository's <url-to-repo>/settings/secrets/actions
, making the proper replacements when the tokens for Cloudflare and Github expire.
Helper function to do things incrementally
Since there are thousands of cases, can limit the number of downloads via the test_only
function attribute.