feat: Insert overwrite incremental model (#67)
* Pulled out JDBC connection code and replaced it with our DB-API connection. Minor change to field values in profiles.yml to match internal nomenclature: host -> api_endpoint.
* Hand merged main.
* dbt debug shows we're making connections. Seeding is failing with type errors.
* Can connect to the database and dbt run, but dbt seed isn't currently working. I think that's actually an issue in main.
* add dbt debugging
* Little bit of cleanup. Mostly removed a couple of references to and uses of jaydebeapi, Java, and the JDBC.
* deprecate JDBC java type conversion hacks
* Update setup.py
* Updated a couple of variable names and the setup.py file.
* Updated account and engine to account_name and engine_name in connections.py.
* Updated method of getting credentials in connection.py.
* Updated .dbtspec test file to use only env variables. connections.py was still using hosts instead of api_endpoint.
* Bumped version number and dbt dependency.
* Minor edits to catalog.sql for cleanup.
* Changed env variable names in tests/integration/firebolt.dbtspec for ease of testing.
* Bumped version number.
* Bumped version number.
* Fixed incorrect version number.
* Bumped version number.
* Removed jaydebeapi from the installation requirements.
* Removed unnecessary init file and renamed tests directory.
* Added firebolt-sdk >= 0.3.2 as a required install.
* Removed get_status() from connections.py.
* Updated .dbtspec pytest file to get schema name from environment.
* Found and removed an extraneous mention of JDBC in an error msg.
* Removed EngineOfflineException from connections.py.
* DROP CASCADE now works and dropped table is recreated. Bonus side effect: dbt debug runs without errors or skipped steps.
* Changed SELECT schema_exists to SELECT 1. Since Firebolt doesn't currently support schemas, it's always true. Pytest basecase now passes.
* Bumped the version number.
* Updated changelog.
* Added dummy firebolt__snapshot_string_as_time macro for testing.
* Added firebolt__snapshot_string_as_time macro to adapters.sql. Moved some tests around in firebolt.dbtspec.
* More moving values around in firebolt.dbtspec file.
* Added incremental models from dbt/core/dbt/include/global_project/macros/materializations/models/incremental.
* Added line break in adapters.sql, created is_incremental.sql.
* Minor changes for output while troubleshooting failing pytest.
* Rewrote query in catalog.sql. Log statements added and removed, firebolt.dbtspec edited.
* Updating merge to add two macros: get_merge_sql and get_delete_insert_merge_sql.
* Edited firebolt.dbtspec file to better separate tests.
* Removed column_comment from catalog.sql.
* Removed some extraneous log statements.
* Little code cleanup in adapters.sql.
* Style tweaks
* Updated build plan on external tables so they default to drop table.
* Added exceptions for missing or misspelled fields in external tables.
* As name is a required field for columns in external tables, removed some logic to error out with missing column names.
* Updated changelog and version number.
* Small edit to error messages for missing fields on external tables.
* Changed default behavior on external tables: Firebolt external tables only drop and rebuild if explicitly told to using a variable on dbt run-operation stage_external_sources.
* Added allowable prefixes to PR title description.
* Updated firebolt.dbtspec file.
* Formatted integration_tests.yml to match that of code-check.yml. Added integration tests to pull_request.yml.
* Now, actually added integration-tests to pull_request.yml. Fixed directory path issue in integration-tests.yml.
* Some reformatting for legibility. Added docstrings.
* Reordered index table names, tiny edits in error msgs, renamed type field to index_type for clarity.
* Now passing secrets from pull_requests.yml to integration_tests.yml.
* Commenting out integration tests from PR workflow.
* Fixed misspelled variable FIREBOLT_USERNAME.
* Edited pull_request.yml to use this branch for testing. Will revert later.
* Bumped version number to 1.0.2.
* Reverted pull_request.yml so that future integration tests will run from main branch.
* Bumped version number and added a little text to the changelog.
* Cleaned up some missing index_type variable names. Bumped required python-sdk version to 0.5.2.
* Cleanup on switch to index_type. Edited types for key and join columns, and added upper() to ensure match in errors in FireboltIndexConfig.
* Moved todos out of comments and into dedicated Confluence doc. And yes, I know that's not actually relevant to the job at hand.
* Added logic to deal with having a list of keys in aggregate indexes, and for correctly naming the indexes if there is a list of keys.
* Edited many call statements to make dbt log output easier to read.
* Style cleanup.
* Cleanup of incremental.sql
* Updated changelog and bumped version number.
* Added line to pull_request_template.md.
* Removed extraneous log and print calls, as well as unused function get_columns().
* Fixed regression in github workflow.
* Regressed some code in adapters.sql.
* Regressed changes in catalog.sql that will instead live in passing-base-integration-test branch.
* Changelog version number hadn't been bumped.
* Optimized language in changelog.
* Removed extraneous changes not related to logic and moved them to a different branch.
* Removed extraneous/spurious merge.sql.
* Added merge.sql back in, although only get_delete_insert_merge_sql macro.
* removed snapshot_string_as_time macro from adapters.sql.
* Merge.sql had mis-formatted sql.
* Updated readme to include append-only in supported features section.
* Removed any chance of DELETE FROM being accidentally called during incremental updates.
* Removing extra calls to drop_relation_if_exists in incremental.
* Added relational to adapters.
* Trying to replace drop_if.
* Finally switched out create_table_as to create_view_as.
* Added code to delete all temp and backup relations at the end of an incremental run. Renamed some relation names to make code clearer.
* Changed create views back to create tables in incremental.
* Fixed an error with an undeclared variable, table, in relation.sql.
* Clean up, add some comments, rename variable for more clarity.
* Added a strategies sql module that contains all the code for various incremental strategies and a conditional branch to choose one.
* Fixed typo in call to get_insert_sql.
* Fixed error message.
* Moved drop_relation from adapters.sql to relation.sql.
* Removed mypy flags from setup.cfg.
* Undid last commit.
* Rewrote a lot of incremental.sql to remove backup relations and use a view for the intermediate table, which holds the new records. Minor edits otherwise, to variable names and comments.
* Significant changes in incremental.sql.
* Updated the changelog to reflect a breaking change from the last merge on the columns of aggregating and join indexes.
* Minor cleanup of comments in incremental.sql.
* Final(?) changes to incremental append-only. Skipped creation of intermediate table, allowing for removal of extra CTAS by skipping table rename at end. A little bit of code cleanup. Now properly registering all created and dropped tables with dbt's internal database. New records are added to a novel view, and that view is dropped after the final insertion.
* Fixed a problem with dropping a relation that didn't exist. Incremental pytest is now passing.
* Updated firebolt.dbtspec to include incremental.
* Found an extra closing curly bracket in incremental.sql.
* Added two jinja files and updated incremental.sql to allow for correct processing of errors on schema changes.
* Fixed some syntax errors.
* Updated all logic to check for errors on schema changes and to log warnings.
* Disallow any schema changes whatsoever.
* Started to add macros and pseudocode for doing insert-overwrite.
* Fixed a bug/typo in incremental/strategies.sql.
* Cleaned up and added some error messages. Fixed logic for retrieval of partitions to drop.
* Changed get_response() to return Success from False.
* Clarified some comments in incremental.sql and also renamed some variables for ease of comprehension.
* Clean up error message output.
* Removed all log and print calls and updated changelog.
* Updated readme to show that we now support insert_overwrite.
* Moved out of passing tests in firebolt.dbtspec.
* Updated firebolt.dbtspec to make comment clearer.
* Updated the changelog and version number.
* Bumped version number.
* Fixed queries on dropping partitions. The partitions were actually just a string, not an iterable, so trimmed first and last chars ('[' and ']'). Rest of string was correctly formatted.
* Rewrote drop partition code to correctly format SQL query string.
* Removed log() statements.
* Trying to rectify logic of DROP PARTITION SQL so that it works equally well whether partitions specified in config files or from SQL query.
* Updated drop_partitions_sql macro so that it works with either queried data or data sent in from a config file.
* Changed log message.
* Finished logic on getting to work, as well as .
* Fixed issue with values being truncated before partitions are dropped. Added a billion hyphens to jinja tags to make log output easier to read.
* Made a bunch of changes to get correct logic for parsing results from queries used in order to figure out which partitions to drop on an insert_overwrite incremental model. Also, some code cleanup and commenting.
* Having an issue with multiple columns in partitions. jinja is trying to do something like DROP PARTITION '['henry',, 'megan']'.
* Fixed bug with multiple columns in strategies.sql. Started trying to add in code to determine column types. Switching to new-integration-tests to complete that.
* Effed up and made changes that ought to be in firebolt__get_columns_in_relation in firebolt__list_relations_without_caching instead. Now removed them completely.
* Added functions to convert list of SDK Column types to FireboltColumn types. Next step is to change output type of get_columns_in_relation from List[Agate.table] to List[FireboltColumn].
* Added a bunch of type 'annotations' to docstrings for jinja macros.
* Successfully diffing FireboltColumns with column and dtype. Wrote a python fn to check for date types and append ::DATE.
* Found a couple of tiny issues, mostly in docstrings.
* Removed an extraneous line of code found by Sonar Cloud.
* Slightly optimized diff_column_data_types in column_helpers.sql.
* Walking back changes to return type of get_columns_in_relation. Is currently List[FireboltColumn], moving back to agate.Table.
* Finished with type changes. Also cleaned up some docstrings and comments.
* Was missing an opening curly brace on the macro get_columns_in_relation.
* Updated connection auth and required firebolt-sdk version.
* Updated impl.sdk_column_list_to_firebolt_column_list to convert types from Python to Firebolt internal types.
* Updated the changelog.
* Created new function create_type_string() in impl.py to correctly parse Firebolt array types. Updated changelog to acknowledge breaking change in auth required with firebolt-sdk >= .8.
* Changed floats to map to Firebolt DOUBLE types.
* Switched to using isinstance() rather than string conversion and find() to parse arrays.
* Changed table build success output to all caps to match dbt's status text when running.
* Was accidentally checking for extracts as columns when figuring out date types. Now not checking for anything that isn't an _actual_ column.
* Updated firebolt-sdk to deal with fossa's httpx complaints.
* Minor changes to deal with comments by reviewer on PR.
* Edited a function argument name.
* Changed warning on missing field to exception.

Co-authored-by: Eric Ford <[email protected]>
Co-authored-by: swanderz <[email protected]>
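
The entries above about create_type_string(), mapping floats to DOUBLE, and using isinstance() instead of string conversion and find() to parse arrays can be illustrated roughly as follows. This is a minimal sketch, not the code in impl.py: `ArrayOf`, `SCALAR_NAMES`, and `sketch_type_string` are hypothetical names, only the float -> DOUBLE mapping comes from the commit message, and the other mappings, the TEXT fallback, and the `ARRAY(...)` rendering are assumptions.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Union


@dataclass(frozen=True)
class ArrayOf:
    """Hypothetical stand-in for an SDK array type that wraps an element type."""
    subtype: Union[type, "ArrayOf"]


# Only float -> DOUBLE is stated in the commit message; the rest are assumed.
SCALAR_NAMES = {
    float: "DOUBLE",
    int: "INT",
    str: "TEXT",
    date: "DATE",
    datetime: "TIMESTAMP",
}


def sketch_type_string(type_code) -> str:
    """Render a (possibly nested) type as a Firebolt-style type string."""
    depth = 0
    # Unwrap nested arrays with isinstance() rather than parsing str(type_code).
    while isinstance(type_code, ArrayOf):
        depth += 1
        type_code = type_code.subtype
    base = SCALAR_NAMES.get(type_code, "TEXT")  # assumed fallback for unknown types
    return "ARRAY(" * depth + base + ")" * depth
```

Checking the type with isinstance() means nesting depth is counted structurally, so the result does not depend on how a type object happens to print.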