Skip to content

Datasette Plugins

datasette-auth-token

Bash
├── app
   ├── metadata.yml # see datasette-auth-tokens referrencing `LAWSQL_BOT_TOKEN`
├── data
   ├── 1.db
   ├── 2.db

The restored database/s can be accessed in a localhost container through a datasette instance via an authorization bearer token LAWSQL_BOT_TOKEN, if the same is enabled in themetadata.yml file.

Configure metadata

Edit the metadata.yml to allow access to the database without a token by commenting out:

YAML
databases:
  # allow: --> commented out
  #  bot_id: "lex-bot" --> commented out

Access local .venv via datasette

Bash
# .env
datasette serve --immutable data/pdf.db data/x.db ...

Access running Docker container via docker run

Can create the docker image with:

Bash
docker build -t lawdata . # Will look for Dockerfile inside the . folder

This will start the build process. If successful, the docker image will be built and appear in the list of Docker Images found in VS Code's Docker extension.

Run the docker image locally with:

Bash
docker run --publish 8080:8080 --env-file .env lawdata

Restore via run.sh

The Dockerfile terminates with run.sh.

Since, on initialization, the sqlite database file doesn't exist yet, it will use litestream's restore command to copy the AWS variant to a local container.

Bash Session
No database found, restoring from replica if exists
2022/12/10 05:42:11.906021 s3: restoring snapshot xxx/00000000 to /db/x.db.tmp
2022/12/10 05:43:45.299375 s3: restoring wal files: generation=xxx index=[00000000,00000000]
2022/12/10 05:43:45.494176 s3: downloaded wal xxx/00000000 elapsed=191.820083ms
2022/12/10 05:43:45.566760 s3: applied wal xxx/00000000 elapsed=73.156208ms
2022/12/10 05:43:45.566865 s3: renaming database from temporary location
INFO:     Started server process [15]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)

Test access on a running container

Unauthorized:

Bash
curl -IX get localhost:8080/x.json

Produces a HTTP/1.1 403 (FORBIDDEN) http status code:

Bash
HTTP/1.1 403 Forbidden
date: x x x
server: uvicorn
content-type: text/html; charset=utf-8
Transfer-Encoding: chunked

Authorized:

With xxx as LAWSQL_BOT_TOKEN, this results in a list of tables from restored the x.db via datasette + litestream:

Bash
export token=<whatever-value-of-LAWSQL_BOT_TOKEN>
curl -H 'Authorization: Bearer ${token}' localhost:8080/x.json | jq

datasette-auth-github

Bash
├── app
   ├── metadata.yml # see datasette-auth-github referrencing `DATASETTE_GITHUB_AUTH_CLIENT_ID` and `DATASETTE_GITHUB_AUTH_CLIENT_SECRET`
├── data
   ├── 1.db
   ├── 2.db

See metadata.yaml:

YAML
databases:
  allow:
    gh_login:
    - justmars # allowed actor with datasette-auth-github

datasette-query-files

Bash
├── app
   ├── queries # see metadata.yml referencing app/queries in relation to datasette-query-files
      ├── x # for x-based .sql queries
      ├── pdf # for pdf-based .sql queries

Unlike a default Datasette instance, canned SQL queries will not be found in the metadata.yml.

The datasette-query-files plugin allows us to use a separate folder /queries/ where each pairing of .sql and .yml becomes its own canned API endpoint.

Native .sql files

This setup makes it easier to write .sql files in VSCode with extensions (dbt formatter dependent on vscode-dbt):

JSON
// settings.json
"files.associations": {
  "**/*.sql": "jinja-sql",
}

Relatively Complex SQL

The sqlite expressions are complex, making use of JSON1 and FTS5 extensions.

Start and End Rows, precursor to FTS5 snippet

Most of the queries utilize the following common table expression style:

SQL
SELECT
  ROW_NUMBER() over (ORDER BY cx.id) rn,
  cx.id row_idx,
  COUNT(*) over () max_count
FROM
  lex_tbl_codification_fts_units cx
  JOIN lex_tbl_codification_fts_units_fts
  ON cx.rowid = lex_tbl_codification_fts_units_fts.rowid
WHERE
  cx.codification_id = :code_id
  AND lex_tbl_codification_fts_units_fts match escape_fts(:q) -- escape_fts is a datasette-defined user function

This is an example of fetching the applicable rows for a given code_id with a matching full-text-search (fts) done on the lex_tbl_codification_fts_units table.

It creates rows with the following fields under a designated order:

  1. row numbers rn for each matching row
  2. paired unique id row_idx corresponding to the rn
  3. total number of rows max_count

Using this first CTE as the baseline, a second CTE will be used to filter the first CTE based on a start_row and an end_row

SQL
SELECT
  rn, row_idx, max_count
FROM
  rowids_match_q
WHERE
  rn BETWEEN CAST(:start AS INTEGER)
  AND CAST(:end AS INTEGER)

The reason for these preliminary CTEs is to limit the rows that sqlite's fts5 snippet function will be called to operate on; if the snippet function were called in the first CTE, then all of the matching rows would have a computed value vs. the ranged rows limited by the start and end parameters.

SQL
SELECT
  snippet(
    lex_tbl_codification_fts_units_fts,
    0,
    '<mark>',
    '</mark>',
    '...',
    15
  ) matched_text
FROM
  lex_tbl_codification_fts_units cx3
  JOIN lex_tbl_codification_fts_units_fts
  ON lex_tbl_codification_fts_units_fts.rowid = cx3.rowid
WHERE
  cx3.id = cx2.id -- cx2 is declared in the main SQL statement and will be based on the prefiltered rows
  AND lex_tbl_codification_fts_units_fts match escape_fts(:q)

The full SQL expression for this particular example can be found in /queries/x/code_mp_fts_id.sql.