# Overview
Combines:

- sqlite-utils data management
- Pydantic data validation

Data will (later) be deployed to a specific Datasette project: LawData, for use in [LawSQL](https://lawsql.com).
# Stack
In handling data that persists, we'll be using two layers of code:
| Layer | Dimension |
|---|---|
| App | sqlite-utils + Pydantic |
| Database | sqlite |
Though sqlite features are frequently evolving (see `json1`, `fts5`, etc.), it lacks a more robust validation mechanism. Pydantic is useful to: (a) clean and validate a model's fields prior to database insertion; and (b) reuse data extracted from the database. Since the database query syntax (SQL) differs from the app syntax (Python), sqlite-utils serves as a useful bridge, allowing us, via this package, to use pre-defined Pydantic field attributes as a means of creating dynamic SQL statements.
Put another way, this is an attempt to integrate the two tools so that the models declared in Pydantic can be consumed directly by sqlite-utils.
The opinionated use of default configurations is intended for a specific project. Later on, we may consider making this more general in scope.
# Connection
Connect to a database declared in an `.env` file through a `DB_FILE` variable.
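For example, a minimal `.env` file might contain just the one variable (the database filename below is illustrative only, not a path mandated by the package):

```
DB_FILE=test.db
```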
With the `.env` file created, the following `sqlpyd.Connection` object gets a typed `Table`:
```python
>>> from sqlpyd import Connection
>>> from sqlite_utils.db import Table
>>> conn = Connection()  # will use the .env file's DB_FILE value
>>> conn.db["test_table"].insert({"text": "hello-world"}, pk="id")  # conn contains a db object
>>> isinstance(conn.tbl("test_table"), Table)
True
```
There appears to be movement to make sqlite-utils more type-friendly; see the relevant issue.
# Fields
## Generic Pydantic Model
Let's assume a generic Pydantic `BaseModel` with non-primitive fields like `suffix` and `gender`:
```python
# see sqlpyd/name.py
from enum import Enum

from pydantic import BaseModel, Field


class Gender(str, Enum):
    male = "male"
    female = "female"
    other = "unspecified"


class Suffix(str, Enum):
    jr = "Jr."
    sr = "Sr."
    third = "III"
    fourth = "IV"
    fifth = "V"
    sixth = "VI"


class IndividualBio(BaseModel):
    first_name: str = Field(None, max_length=50)
    last_name: str = Field(None, max_length=50)
    suffix: Suffix | None = Field(None, max_length=4)
    nick_name: str = Field(None, max_length=50)
    gender: Gender | None = Field(Gender.other)

    class Config:
        use_enum_values = True
```
With the BaseModel, we can get the types directly using:
```python
>>> IndividualBio.__annotations__
{
    "first_name": str,
    "last_name": str,
    "suffix": sqlpyd.__main__.Suffix | None,  # non-primitive / optional
    "nick_name": str,
    "gender": sqlpyd.__main__.Gender | None,  # non-primitive / optional
}
```
Using the sqlite-utils convention of creating tables, this will throw an error:
```python
>>> from sqlpyd import Connection  # thin wrapper over sqlite-utils' Database()
>>> conn = Connection(DatabasePath="created.db")
>>> conn.db["test_tbl"].create(columns=IndividualBio.__annotations__)
...
KeyError: sqlpyd.__main__.Suffix | None
```
## Data Modelling & Input Validation
We could rewrite the needed columns and use sqlite-utils:
```python
conn.db["test_tbl"].create(
    columns={
        "first_name": str,
        "last_name": str,
        "suffix": str,
        "nick_name": str,
        "gender": str,
    }
)
# <Table test_tbl (first_name, last_name, suffix, nick_name, gender)>
```
But we can also modify the initial Pydantic model and co-inherit from `sqlpyd.TableConfig`, to wit:
```python
from pydantic import BaseModel, Field, root_validator, validator
from sqlpyd import TableConfig


class RegularName(BaseModel):
    # The name fields are separated out to add a clear pre-root validator;
    # note the additions to the field attributes: col, fts, index.
    full_name: str | None = Field(None, col=str, fts=True, index=True)
    first_name: str = Field(..., max_length=50, col=str, fts=True)
    last_name: str = Field(..., max_length=50, col=str, fts=True, index=True)
    suffix: Suffix | None = Field(None, max_length=4, col=str)

    class Config:
        use_enum_values = True

    @root_validator(pre=True)
    def set_full_name(cls, values):
        if not values.get("full_name"):
            first = values.get("first_name")
            last = values.get("last_name")
            if first and last:
                values["full_name"] = f"{first} {last}"
                if sfx := values.get("suffix"):
                    values["full_name"] += f", {sfx}"
        return values


class IndividualBio(RegularName, TableConfig):
    # Mandatory step: inherit from TableConfig (which inherits from BaseModel).
    __tablename__ = "person_tbl"  # optional: may declare a table name
    __indexes__ = [["first_name", "last_name"]]  # optional: may declare joined indexes

    nick_name: str | None = Field(None, max_length=50, col=str, fts=True)
    gender: Gender | None = Field(Gender.other, max_length=15, col=str)

    @validator("gender", pre=True)
    def lower_cased_gender(cls, v):
        return Gender(v.lower()) if v else None
```
With this setup, we can use the connection to create the table. Note that the primary key `id` is auto-generated in this scenario:
```python
>>> conn = Connection(DatabasePath="test.db", WAL=False)
>>> conn.create_table(IndividualBio)
<Table person_tbl (id, full_name, first_name, last_name, suffix, nick_name, gender)>
>>> person2 = {  # dict
    "first_name": "Jane",
    "last_name": "Doe",
    "suffix": None,
    "gender": "FEMALE",  # all caps
    "nick_name": "Jany",
}
>>> IndividualBio.__validators__  # note that we created a validator for 'gender'
{'gender': [<pydantic.class_validators.Validator object at 0x10c497510>]}
>>> IndividualBio.__pre_root_validators__  # we also have one to create a 'full_name'
[<function RegularName.set_full_name at 0x10c4b43a0>]
>>> tbl = conn.add_record(IndividualBio, person2)  # under the hood, the dict is instantiated as a Pydantic model; the resulting `tbl` value is an sqlite-utils Table
>>> list(tbl.rows) == [
    {
        "id": 1,  # auto-generated
        "full_name": "Jane Doe",  # added by the model's pre-root validator
        "first_name": "Jane",
        "last_name": "Doe",
        "suffix": None,
        "nick_name": "Jany",
        "gender": "female",  # cleaned by the model's validator prior to database entry
    }
]
True
```
# Attributes
`sqlite-utils` is a far more powerful solution than the limited subset of features provided here. Again, this abstraction is for the purpose of easily reusing the functionality in a specific project rather than for something more generalized.
## Columns In General
Using `col` in the Pydantic `Field` signals the need to add the field to an sqlite database table:
```python
conn = Connection(DatabasePath="test.db", WAL=False)
kls = IndividualBio
tbl = conn.db[kls.__tablename__]
cols = kls.extract_cols(kls.__fields__)  # possible fields to use
"""
{'first_name': str,
 'last_name': str,
 'suffix': str,
 'nick_name': str,
 'gender': str}
"""
tbl.create(cols)  # customize table name and column types
# <Table individual_bio_tbl (first_name, last_name, suffix, nick_name, gender)>
```
## Primary Key
To auto-generate, use the `TableConfig.config_tbl()` helper. It auto-creates the `id` field as an `int`-based primary key.
> Note: if an `id` is declared as a `str` in the Pydantic model, the `str` declaration takes precedence over the implicit `int` default.
```python
conn = Connection(DatabasePath="test_db.db")
kls = IndividualBio
tbl = conn.db[kls.__tablename__]
tbl_created = kls.config_tbl(tbl=tbl, cols=kls.__fields__)
# <Table individual_bio_tbl (id, first_name, last_name, suffix, nick_name, gender)>  # id now added
```
This results in the following sql schema:
```sql
CREATE TABLE [individual_bio_tbl] (
    [id] INTEGER PRIMARY KEY, -- added as integer since no field specified
    [first_name] TEXT NOT NULL, -- required via Pydantic's ...
    [last_name] TEXT NOT NULL, -- required via Pydantic's ...
    [suffix] TEXT,
    [nick_name] TEXT,
    [gender] TEXT
)
```
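As a sanity check, the constraints in this schema can be exercised directly with Python's built-in `sqlite3` module, independently of sqlpyd (the DDL below simply mirrors the generated schema):

```python
import sqlite3

cx = sqlite3.connect(":memory:")
cx.execute(
    """
    CREATE TABLE [individual_bio_tbl] (
        [id] INTEGER PRIMARY KEY,
        [first_name] TEXT NOT NULL,
        [last_name] TEXT NOT NULL,
        [suffix] TEXT,
        [nick_name] TEXT,
        [gender] TEXT
    )
    """
)

# The INTEGER PRIMARY KEY is auto-assigned when omitted from the insert.
cur = cx.execute(
    "INSERT INTO individual_bio_tbl (first_name, last_name) VALUES (?, ?)",
    ("Jane", "Doe"),
)
print(cur.lastrowid)  # 1

# NOT NULL columns reject missing values.
try:
    cx.execute("INSERT INTO individual_bio_tbl (first_name) VALUES (?)", ("John",))
except sqlite3.IntegrityError as e:
    print(e)  # NOT NULL constraint failed: individual_bio_tbl.last_name
```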
## Full-Text Search (fts) Fields
Since we indicated, in the above declaration of `Field`s, that some columns are to be used for `fts`, we enable sqlite-utils to auto-generate the tables required. This makes possible the prescribed approach of querying fts tables:
```python
# Using the same `tbl` variable described above, we can yield a query string, viz.
print(tbl.search_sql(columns=["first_name", "last_name"]))
```
produces:
```sql
with original as (
    select
        rowid,
        [first_name],
        [last_name]
    from [individual_bio_tbl]
)
select
    [original].[first_name],
    [original].[last_name]
from
    [original]
    join [individual_bio_tbl_fts] on [original].rowid = [individual_bio_tbl_fts].rowid
where
    [individual_bio_tbl_fts] match :query
order by
    [individual_bio_tbl_fts].rank
```
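The shape of that generated query can be tried out with the stdlib `sqlite3` module, assuming an FTS5-enabled SQLite build (the table setup below is a hand-rolled stand-in for what sqlite-utils auto-generates):

```python
import sqlite3

cx = sqlite3.connect(":memory:")
cx.executescript(
    """
    CREATE TABLE individual_bio_tbl (first_name TEXT, last_name TEXT);
    -- External-content FTS5 table, mirroring the *_fts naming convention.
    CREATE VIRTUAL TABLE individual_bio_tbl_fts USING fts5(
        first_name, last_name,
        content='individual_bio_tbl', content_rowid='rowid'
    );
    INSERT INTO individual_bio_tbl VALUES ('Jane', 'Doe'), ('John', 'Smith');
    -- Populate the index from the content table.
    INSERT INTO individual_bio_tbl_fts(individual_bio_tbl_fts) VALUES ('rebuild');
    """
)
rows = cx.execute(
    """
    WITH original AS (SELECT rowid, first_name, last_name FROM individual_bio_tbl)
    SELECT original.first_name, original.last_name
    FROM original
    JOIN individual_bio_tbl_fts ON original.rowid = individual_bio_tbl_fts.rowid
    WHERE individual_bio_tbl_fts MATCH :query
    ORDER BY individual_bio_tbl_fts.rank
    """,
    {"query": "jane"},
).fetchall()
print(rows)  # [('Jane', 'Doe')]
```

The default `unicode61` tokenizer matches case-insensitively, which is why `jane` finds `Jane`.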
## Foreign Keys
To add foreign keys, use the `fk` attribute on a ModelField, assigning it a 2-tuple, e.g.:
```python
class GroupedIndividuals(TableConfig):
    __tablename__ = "grouping_tbl"
    __indexes__ = [["member_id", "name"]]

    name: str = Field(..., max_length=50, col=str)
    member_id: int = Field(
        ..., col=int, fk=(IndividualBio.__tablename__, "id"), index=True
    )
```
Parts of the `fk` tuple:

- The first part of the `fk` tuple is the referenced table name X.
- The second part of the `fk` tuple is the id of X.
So in the above example, `member_id`, the Pydantic field, is constrained to the `id` column of the table `individual_bio_tbl`.
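Note that SQLite only enforces such constraints when foreign-key support is switched on for the connection. A minimal stdlib sketch, using hand-written DDL equivalent to the example above:

```python
import sqlite3

cx = sqlite3.connect(":memory:")
cx.execute("PRAGMA foreign_keys = ON")  # off by default in Python's sqlite3
cx.executescript(
    """
    CREATE TABLE individual_bio_tbl (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE grouping_tbl (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        member_id INTEGER NOT NULL REFERENCES individual_bio_tbl(id)
    );
    INSERT INTO individual_bio_tbl (id, first_name) VALUES (1, 'Jane');
    """
)
# A member_id pointing at an existing row is accepted.
cx.execute("INSERT INTO grouping_tbl (name, member_id) VALUES ('Team A', 1)")

# A dangling member_id is rejected.
try:
    cx.execute("INSERT INTO grouping_tbl (name, member_id) VALUES ('Team B', 99)")
except sqlite3.IntegrityError as e:
    print(e)  # FOREIGN KEY constraint failed
```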
## Indexes
Note that we can also add an index to each field with a boolean `True` on the ModelField attribute `index`. In case we want to use a combination of columns for the index, we can include this when subclassing `TableConfig`:
```python
class GroupedIndividuals(TableConfig):
    __tablename__ = "grouping_tbl"
    __indexes__ = [["member_id", "name"]]  # follow the sqlite-utils convention
```
When combined, the sql generated amounts to the following:
```sql
CREATE TABLE [grouping_tbl] (
    [id] INTEGER PRIMARY KEY,
    [name] TEXT NOT NULL,
    [member_id] INTEGER NOT NULL REFERENCES [individual_bio_tbl]([id])
);
CREATE UNIQUE INDEX [idx_grouping_tbl_member_id]
    ON [grouping_tbl] ([member_id]);
CREATE UNIQUE INDEX [idx_grouping_tbl_name_member_id]
    ON [grouping_tbl] ([name], [member_id]);
```
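The UNIQUE indexes shown above can likewise be verified with the stdlib `sqlite3` module: a second row reusing the same `member_id` violates `idx_grouping_tbl_member_id` even though its `name` differs.

```python
import sqlite3

cx = sqlite3.connect(":memory:")
cx.executescript(
    """
    CREATE TABLE grouping_tbl (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        member_id INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX idx_grouping_tbl_member_id
        ON grouping_tbl (member_id);
    CREATE UNIQUE INDEX idx_grouping_tbl_name_member_id
        ON grouping_tbl (name, member_id);
    """
)
cx.execute("INSERT INTO grouping_tbl (name, member_id) VALUES ('Team A', 1)")

try:
    # Same member_id, different name: still blocked by the single-column index.
    cx.execute("INSERT INTO grouping_tbl (name, member_id) VALUES ('Team B', 1)")
except sqlite3.IntegrityError as e:
    print(e)  # UNIQUE constraint failed: grouping_tbl.member_id
```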
# API
## Connection API
Bases: `BaseSettings`

Thin wrapper over `sqlite_utils.Database` (which itself wraps around `sqlite3.connect`) with some convenience methods.
Source code in sqlpyd/conn.py
### Functions
#### add_cleaned_records(kls, items)

Compare this with `add_records()`. If records have already been previously validated, there is no need for a deserialization step.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `kls` | `Any` | preconfigured | required |
| `items` | `Iterable[dict] \| Iterator[dict]` | Each item matches | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `Table` | `Table` | The sqlite-utils.db Table instance. |
Source code in sqlpyd/conn.py
#### add_record(kls, item)

With a TableConfig `kls` (that is representing, or should represent, an sqlite table), add a record to the same, which will be cleaned via Pydantic prior to being inserted by sqlite-utils.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `kls` | `Any` | preconfigured | required |
| `item` | `dict` | Raw data representing fields that match the | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `Table` | `Table` | The sqlite-utils.db Table instance. |
Source code in sqlpyd/conn.py
#### add_records(kls, items)

With a TableConfig `kls` (that is representing, or should represent, an sqlite table), add multiple records to the same, which will be cleaned via Pydantic prior to being inserted by sqlite-utils.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `kls` | `Any` | preconfigured | required |
| `items` | `Iterable[dict] \| Iterator[dict]` | Each item matches | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `Table` | `Table` | The sqlite-utils.db Table instance. |
Source code in sqlpyd/conn.py
#### create_table(kls)

Create a `TableConfig` table, if it doesn't exist yet, having the attributes declared in the `kls`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `kls` | `Any` | Assumes that | required |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | The |

Returns:

| Name | Type | Description |
|---|---|---|
| `Table` | `Table` | The sqlite-utils.db Table instance. |
Source code in sqlpyd/conn.py
#### table(kls)

Use the TableConfig `kls` to retrieve the table instance.
Source code in sqlpyd/conn.py
#### tbl(table_name)

Should use the string represented by `kls.__tablename__` to retrieve the table instance.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table_name` | `str` | The name of the sqlite table. | required |

Raises:

| Type | Description |
|---|---|
| `Exception` | No table found. |

Returns:

| Name | Type | Description |
|---|---|---|
| `Table` | `Table` | The sqlite-utils.db Table instance. |
Source code in sqlpyd/conn.py
## TableConfig API
Bases: `BaseModel`

Adds custom `pydantic.Field` attributes:
| Attribute | Value | Function |
|---|---|---|
| `col` | `str`, `int`, etc. | Accepts a primitive Python type understandable by sqlite-utils |
| `fk` | `tuple[str, str]` | First value: the name of the table in sqlite; second value: name of the column representing the foreign key in said table |
| `fts` | `bool` | If True, a full-text-search counterpart table is created, including said column |
| `index` | `bool` | If True, the column is created with an sqlite index |
| `required` | `bool` | If True, the column is deemed essential to instantiation |
This enables the construction of an `sqlite-utils`-designed table.

Note that `__indexes__`, as a `ClassVar`, refers to a list of Iterables that can be used as indexes to the table, based on the sqlite-utils convention. Defaults to None.
Source code in sqlpyd/tableconfig.py
### Functions
#### config_tbl(tbl) *classmethod*

Using pydantic fields, generate an sqlite db table via `sqlite-utils` conventions.

Each `pydantic.BaseModel` will have a `__fields__` attribute, which is a dictionary of `ModelField` values. Each of these fields assigned a `col` attribute will be extracted from the ModelField.

The extract will enable further processing of the field, such as introspecting the `fk`, `fts`, and `index` attributes.

For more complex indexes, the `idxs` attribute can be supplied following the `sqlite-utils` convention.
Returns:

| Name | Type | Description |
|---|---|---|
| `Table` | `Table` | An sqlite-utils Table object. |
Source code in sqlpyd/tableconfig.py
#### extract_cols(fields_data) *classmethod*

If a `col` attribute is declared in the ModelField, e.g. `name: str = Field(col=str)`, extract it.

Note this makes the `id` field an `int` type by default. But if an `id` field exists in the parent model and is set to a different type, e.g. `str`, this overrides the default `id` previously set as an `int`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `fields_data` | `dict[str, ModelField]` | Pydantic field | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | A mapping to be used later by sqlite-utils |
Source code in sqlpyd/tableconfig.py
#### extract_model_fields(fields_data) *classmethod*

Loop through the ModelField to extract included 2-tuples. The first part of the tuple is the name of the field; the second part is the ModelField.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `fields_data` | `dict[str, ModelField]` | description | required |

Yields:

| Type | Description |
|---|---|
| `Iterator[tuple[str, ModelField]]` | description |