dan1elt0m/unitycatalog-pydantic

Unity Catalog Pydantic

Disclaimer: This project is unofficial and not affiliated with or endorsed by the official Unity Catalog team.

Simplifies managing Unity Catalog tables using Pydantic models.

Installation

pip install unitycatalog-pydantic

Examples

Create Table

from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import UCModel

class MyTable(UCModel):
    col1: str
    col2: int
    col3: float

# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)

# Create the table
table_info = await MyTable.create(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)

Retrieve Table

table_info = await MyTable.get(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
)

Delete Table

await MyTable.delete(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
)
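The calls above use `await`, so they need to run inside a coroutine (or an asyncio-enabled REPL or notebook). The following is a minimal sketch of one way to chain create, get, and delete in a single script. `table_lifecycle` and `StubTable` are hypothetical names introduced here for illustration: `StubTable` only records calls so the sketch runs without a Unity Catalog server, and in real use you would pass a `UCModel` subclass and a configured `TablesApi` instead.

```python
import asyncio


class StubTable:
    """Hypothetical stand-in for a UCModel subclass; records calls, contacts no server."""

    calls: list = []

    @classmethod
    async def create(cls, **kwargs):
        cls.calls.append("create")
        return {"name": cls.__name__, **kwargs}

    @classmethod
    async def get(cls, **kwargs):
        cls.calls.append("get")
        return {"name": cls.__name__, **kwargs}

    @classmethod
    async def delete(cls, **kwargs):
        cls.calls.append("delete")


async def table_lifecycle(model, tables_api, catalog_name, schema_name, storage_location):
    """Create, retrieve, and delete a table with the same keyword arguments as above."""
    await model.create(
        tables_api=tables_api,
        catalog_name=catalog_name,
        schema_name=schema_name,
        storage_location=storage_location,
    )
    table_info = await model.get(
        tables_api=tables_api, catalog_name=catalog_name, schema_name=schema_name
    )
    await model.delete(
        tables_api=tables_api, catalog_name=catalog_name, schema_name=schema_name
    )
    return table_info


# asyncio.run starts an event loop and drives the coroutine to completion.
table_info = asyncio.run(
    table_lifecycle(StubTable, None, "my_catalog", "my_schema", "s3://my_bucket/my_path")
)
```

With the real library, the last call would take `MyTable` and `tables_api` in place of `StubTable` and `None`.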

Nested Models

from pydantic import BaseModel
from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import UCModel

class NestedModel(BaseModel):
    nested_col1: str
    nested_col2: int

class MyTable(UCModel):
    col1: str
    col2: NestedModel

# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)

# Create the table
table_info = await MyTable.create(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)

Using a BaseModel as the Root Model

from pydantic import BaseModel
from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import create_table

class NestedModel(BaseModel):
    nested_col1: str
    nested_col2: int

class MyTable(BaseModel):
    col1: str
    col2: NestedModel

# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)

# Create the table
table_info = await create_table(
    model=MyTable,
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)

Configuration

  • tables_api: The TablesApi client.
  • catalog_name: The catalog name.
  • schema_name: The schema name.
  • storage_location: The storage location.
  • table_type: The table type (default is TableType.EXTERNAL).
  • data_source_format: The data source format (default is DataSourceFormat.DELTA).
  • comment: A comment for the table. If not provided, the model's docstring is used.
  • properties: The properties of the table.
  • by_alias: Whether to use field aliases instead of field names as column names (default is True).
  • json_schema_mode: The JSON schema mode used to generate the schema (default is "validation").
  • alias: The table alias. If not provided, the class name is used.
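To make `by_alias` and `json_schema_mode` more concrete, here is a small sketch using only Pydantic v2. It assumes, without confirmation from this README, that the two options correspond to the `by_alias` and `mode` arguments of Pydantic's `model_json_schema`. The `first_column` alias is made up for the example, and the docstring lookup only illustrates where a default comment could come from.

```python
import inspect

from pydantic import BaseModel, Field


class MyTable(BaseModel):
    """Example table used to illustrate the options."""

    col1: str = Field(alias="first_column")  # hypothetical alias
    col2: int


# by_alias=True: properties are keyed by the alias where one is defined.
schema_by_alias = MyTable.model_json_schema(by_alias=True, mode="validation")
# by_alias=False: properties are keyed by the Python field name.
schema_by_name = MyTable.model_json_schema(by_alias=False, mode="validation")

columns_by_alias = list(schema_by_alias["properties"])
columns_by_name = list(schema_by_name["properties"])

# The class docstring, which the comment option falls back to when not set.
default_comment = inspect.getdoc(MyTable)

print(columns_by_alias)
print(columns_by_name)
print(default_comment)
```

Since `by_alias` defaults to True, a field that defines an alias would be expected to appear under that alias in the table unless you pass `by_alias=False`.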

Caveats

Tested on Parquet, Delta, and CSV data source formats. Other formats may not work as expected.

  • Parquet and Unity Catalog type integration is currently limited. For instance, there is no way to specify the integer type, because Parquet doesn't recognize SQL integer types. The same applies to other types such as DATE and TIMESTAMP. This is an integration issue, not a problem with the library itself.
  • You can't use nested models with the CSV data source format, because CSV doesn't support nested types. This is a limitation of the data source format, not of the library.
  • The latest version of DuckDB doesn't support reading some of the fields required by UC's ColumnInfo model, such as the precision fields. This is an integration issue, not a problem with the library itself.
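Because nested models are not usable with CSV, it can help to check a model before choosing that format. The sketch below is one possible pre-flight check built on Pydantic v2's `model_fields`. `nested_fields` and `is_model` are hypothetical helpers, not part of unitycatalog-pydantic, and they only detect fields annotated directly with a model (not `Optional[...]` or `list[...]` wrappers).

```python
from pydantic import BaseModel


class NestedModel(BaseModel):
    nested_col1: str
    nested_col2: int


class FlatTable(BaseModel):
    col1: str
    col2: int


class NestedTable(BaseModel):
    col1: str
    col2: NestedModel


def is_model(annotation) -> bool:
    """Return True if the annotation is itself a Pydantic model class."""
    try:
        return issubclass(annotation, BaseModel)
    except TypeError:  # annotation is not a class, e.g. Optional[int]
        return False


def nested_fields(model: type[BaseModel]) -> list[str]:
    """Return the names of fields annotated directly with a Pydantic model."""
    return [
        name
        for name, field in model.model_fields.items()
        if is_model(field.annotation)
    ]


flat_result = nested_fields(FlatTable)
nested_result = nested_fields(NestedTable)
print(flat_result, nested_result)
```

A non-empty result suggests the model should be stored as Delta or Parquet rather than CSV.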
