Disclaimer: This project is unofficial and not affiliated with or endorsed by the official Unity Catalog team.
Simplifies managing Unity Catalog tables using Pydantic models.
pip install unitycatalog-pydantic
from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import UCModel
class MyTable(UCModel):
    col1: str
    col2: int
    col3: float
# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)
# Create the table
table_info = await MyTable.create(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)
# Retrieve the table
table_info = await MyTable.get(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
)
# Delete the table
await MyTable.delete(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
)
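The calls above use `await` at top level, which only works inside a coroutine or an async-aware REPL. A minimal sketch of driving the same create/get/delete sequence from a regular script with `asyncio.run` follows. `StubTable` is a hypothetical stand-in for a `UCModel` subclass so that the example runs without a Unity Catalog server; its method bodies are assumptions and do not reflect how the library talks to the catalog.

```python
import asyncio


class StubTable:
    """Hypothetical stand-in for a UCModel subclass; it makes no network calls."""

    # In-memory "catalog" keyed by the three-part table name (assumption).
    _registry: dict = {}

    @classmethod
    def _full_name(cls, catalog_name: str, schema_name: str) -> str:
        return f"{catalog_name}.{schema_name}.{cls.__name__}"

    @classmethod
    async def create(cls, *, catalog_name: str, schema_name: str, storage_location: str) -> dict:
        full_name = cls._full_name(catalog_name, schema_name)
        info = {"full_name": full_name, "storage_location": storage_location}
        cls._registry[full_name] = info
        return info

    @classmethod
    async def get(cls, *, catalog_name: str, schema_name: str) -> dict:
        return cls._registry[cls._full_name(catalog_name, schema_name)]

    @classmethod
    async def delete(cls, *, catalog_name: str, schema_name: str) -> None:
        del cls._registry[cls._full_name(catalog_name, schema_name)]


async def main() -> dict:
    # Every awaited call lives inside one coroutine.
    await StubTable.create(
        catalog_name="my_catalog",
        schema_name="my_schema",
        storage_location="s3://my_bucket/my_path",
    )
    fetched = await StubTable.get(catalog_name="my_catalog", schema_name="my_schema")
    await StubTable.delete(catalog_name="my_catalog", schema_name="my_schema")
    return fetched


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
print(result)
```

With the real library, the body of `main()` would hold the `MyTable.create`, `MyTable.get`, and `MyTable.delete` calls shown above, with `tables_api` passed in as before.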
from pydantic import BaseModel
from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import UCModel
class NestedModel(BaseModel):
    nested_col1: str
    nested_col2: int

class MyTable(UCModel):
    col1: str
    col2: NestedModel
# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)
# Create the table
table_info = await MyTable.create(
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)
from pydantic import BaseModel
from unitycatalog.client import ApiClient, TablesApi
from unitycatalog_pydantic import create_table
class NestedModel(BaseModel):
    nested_col1: str
    nested_col2: int

class MyTable(BaseModel):
    col1: str
    col2: NestedModel
# Initialize the API client
catalog_client = ApiClient(...)
tables_api = TablesApi(catalog_client)
# Create the table
table_info = await create_table(
    model=MyTable,
    tables_api=tables_api,
    catalog_name="my_catalog",
    schema_name="my_schema",
    storage_location="s3://my_bucket/my_path",
)
- `tables_api`: The `TablesApi` client.
- `catalog_name`: The catalog name.
- `schema_name`: The schema name.
- `storage_location`: The storage location.
- `table_type`: The table type (default is `TableType.EXTERNAL`).
- `data_source_format`: The data source format (default is `DataSourceFormat.DELTA`).
- `comment`: A comment for the table. If not provided, the table docstring is used.
- `properties`: The properties of the table.
- `by_alias`: Whether to use the alias or name for the columns (default is `True`).
- `json_schema_mode`: The mode in which to generate the schema (default is `validation`).
- `alias`: The table alias. If not provided, the class name is used.
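The fallback rules for `comment` and `alias` can be illustrated with a small sketch. `resolve_table_metadata` is a hypothetical helper written for this example, not a function of the library, and a plain class stands in for a Pydantic model; it only shows the documented behaviour of falling back to the docstring and the class name.

```python
import inspect
from typing import Optional


def resolve_table_metadata(
    model: type,
    comment: Optional[str] = None,
    alias: Optional[str] = None,
) -> dict:
    """Hypothetical helper: apply the documented defaults for `comment` and `alias`."""
    # Fall back to the cleaned-up class docstring when no comment is given.
    doc = inspect.cleandoc(model.__doc__) if model.__doc__ else None
    return {
        "name": alias if alias is not None else model.__name__,
        "comment": comment if comment is not None else doc,
    }


class Orders:
    """Customer orders, one row per order."""


# No overrides: class name and docstring are used.
meta = resolve_table_metadata(Orders)

# Explicit values take precedence over both fallbacks.
overridden = resolve_table_metadata(Orders, comment="Manual comment", alias="orders_v2")

print(meta)
print(overridden)
```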
Tested on Parquet, Delta, and CSV data source formats. Other formats may not work as expected.
- Integration between Parquet and Unity Catalog types is currently limited. For instance, there is no way to specify the integer type, because Parquet doesn't recognize integer SQL types. The same goes for other types such as `DATE` and `TIMESTAMP`. This is an integration issue, not a problem with the library itself.
- Nested models cannot be used with the CSV data source format, because CSV doesn't support nested types. This is a limitation of the data source format, not of the library.
- The latest version of DuckDB doesn't support reading some of the required fields of UC's `ColumnInfo` model, e.g. the precision fields. This is an integration issue, not a problem with the library itself.
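Because CSV cannot hold nested types, it can help to check a model for nested fields before choosing that format. The sketch below is one possible way to do this with the standard library only. `Record` is a hypothetical marker class standing in for `pydantic.BaseModel`, and `nested_fields` is a helper invented for this example; neither is part of the library.

```python
from typing import Any, get_args, get_type_hints


class Record:
    """Hypothetical marker base class standing in for pydantic.BaseModel."""


def nested_fields(model: type, base: type = Record) -> list:
    """Return the names of fields whose annotation is, or contains, a subclass of `base`."""

    def contains_model(tp: Any) -> bool:
        args = get_args(tp)
        if args:
            # Generic such as list[Address] or Optional[Address]: inspect its parameters.
            return any(contains_model(arg) for arg in args)
        return isinstance(tp, type) and issubclass(tp, base)

    return [name for name, tp in get_type_hints(model).items() if contains_model(tp)]


class Address(Record):
    street: str
    city: str


class Customer(Record):
    name: str
    address: Address
    previous: list[Address]
    age: int


# Customer has nested fields, so it would not fit the CSV format; Address is flat.
for model in (Customer, Address):
    found = nested_fields(model)
    verdict = "not CSV-compatible" if found else "flat, CSV-compatible"
    print(model.__name__, found, verdict)
```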