Overview

Concept

Together, corpus-pax, corpus-base, and statute-trees convert the raw YAML-based corpus repository into its database variant, corpus-x. Once all of the required tables have been constructed, the raw data can be evaluated.

Mode

Order  Time                    Instruction
1      ~6sec (with test data)  Run corpus-pax; it is a prerequisite for corpus-base.
2      ~40min                  Run corpus-base; it is a prerequisite for corpus-x.
3      ~70min                  If the inclusion files have not yet been created, run the script to generate them.
4      ~30min                  With the inclusion files in place, populate the various tables under corpus-x.
5      ~40 to ~60min           Replicate the local output x.db to an AWS bucket with Litestream.
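To get a feel for the end-to-end cost, the per-step estimates in the table can be added up. The following is only a rough sketch: the durations are copied from the table, and the names `STEPS` and `total_runtime` are hypothetical, not part of any corpus-x package.

```python
from datetime import timedelta

# (low, high) estimates per step, copied from the table above.
# Step 1 assumes test data; step 5 is given as a range in the table.
STEPS = {
    "1 corpus-pax": (timedelta(seconds=6), timedelta(seconds=6)),
    "2 corpus-base": (timedelta(minutes=40), timedelta(minutes=40)),
    "3 inclusion files": (timedelta(minutes=70), timedelta(minutes=70)),
    "4 corpus-x tables": (timedelta(minutes=30), timedelta(minutes=30)),
    "5 litestream": (timedelta(minutes=40), timedelta(minutes=60)),
}


def total_runtime(steps):
    """Sum the low and high estimates across all steps."""
    low = sum((lo for lo, _ in steps.values()), timedelta())
    high = sum((hi for _, hi in steps.values()), timedelta())
    return low, high


low, high = total_runtime(STEPS)
print(f"Full run: between {low} and {high}")
```

Under these estimates a full run from scratch takes a little over three hours, most of it in steps 2 to 5.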

Build from corpus-base

Assuming step 3 above has already been completed as a separate process and pax_ and sc_ tables have already been added:

Python
>>> from corpus_x import setup_x
>>> from sqlpyd import Connection
>>> c = Connection(DatabasePath="x.db", WAL=True)
>>> # Adds to the database in the present working directory;
>>> # takes ~2300 seconds (~40 minutes).
>>> setup_x("x.db")
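Since the build assumes that the pax_ and sc_ tables are already present, it may be worth checking for them before starting a run of this length. The sketch below uses only the standard-library `sqlite3` module. The helper name `missing_prefixes` and the table names `pax_demo` and `sc_demo` are hypothetical placeholders, not names taken from corpus-pax or corpus-base.

```python
import sqlite3


def missing_prefixes(conn, prefixes=("pax_", "sc_")):
    """Return the prefixes for which no table exists in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    names = [row[0] for row in rows]
    return [p for p in prefixes if not any(n.startswith(p) for n in names)]


# Demonstration on a throwaway in-memory database (placeholder table names).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE pax_demo (id INTEGER PRIMARY KEY)")
before = missing_prefixes(demo)  # only a pax_ table exists so far
demo.execute("CREATE TABLE sc_demo (id INTEGER PRIMARY KEY)")
after = missing_prefixes(demo)  # both prefixes are now present
demo.close()
print(before, after)
```

Against the real file, the same helper could be called as `missing_prefixes(sqlite3.connect("x.db"))`; an empty list would suggest that steps 1 and 2 have been run.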

The resulting x.db file can then be replicated to AWS via Litestream, which should take up to another hour (step 5 in the table above).
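As a rough illustration of the replication step, the sketch below assembles a `litestream replicate` command line from Python. The bucket name `my-corpus-bucket`, the object key, and the helper `litestream_replicate_command` are assumptions made for this example. It also assumes that the Litestream binary is installed and that AWS credentials are already configured in the environment.

```python
import shlex
from typing import List


def litestream_replicate_command(db_path: str, bucket: str, key: str) -> List[str]:
    """Build the argument list for `litestream replicate DB_PATH REPLICA_URL`."""
    return ["litestream", "replicate", db_path, f"s3://{bucket}/{key}"]


# Hypothetical bucket and key; replace with the real destination.
cmd = litestream_replicate_command("x.db", "my-corpus-bucket", "x.db")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Note that `litestream replicate` keeps running and streaming changes until it is stopped, so it is usually started as a separate long-lived process.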

From Local Files to DB

See prior documentation for corpus-base tables.