ActiveRecord::Dbt
generates dbt files from the information of the database connected via ActiveRecord.
Currently, it can generate yaml
files for sources
and models
files for staging
.
To install activerecord-dbt
, add this line to your application's Gemfile:
gem 'activerecord-dbt'
Since it is only used in the development environment, it is recommended to add it to the development group:
group :development do
gem 'activerecord-dbt'
end
Then run:
$ bundle
Alternatively, you can install it manually by running:
$ gem install activerecord-dbt
Create an initializer file for dbt:
$ bin/rails generate active_record:dbt:initializer
This will generate the config/initializers/dbt.rb
file.
Configuration | Description |
---|---|
config_directory_path | The path to the directory where files generated by bin/rails generate active_record:dbt:* are stored. The default is lib/dbt . |
export_directory_path | The path to the directory where configuration files are stored. The default is doc/dbt . |
dwh_platform | Specify the data warehouse platform to which dbt connects. The default is bigquery . |
data_sync_delayed | Indicates whether there is a data delay. If set to true , severity: warn is applied to the relationships test. The default is false . |
logger | The destination for log output. The default is Logger.new('./log/active_record_dbt.log') . |
used_dbt_package_names | An array of dbt package names to use. |
locale | I18n locale. The default is I18n.locale . |
List of platforms that can currently be set with dwh_platform
.
Data Warehouse Platform | Link |
---|---|
bigquery | BigQuery enterprise data warehouse - Google Cloud |
postgres | PostgreSQL: The world's most advanced open source database |
redshift | Cloud Data Warehouse - Amazon Redshift - AWS |
snowflake | The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI |
spark | Apache Spark™ - Unified Engine for large-scale data analytics |
List of packages that can currently be set with used_dbt_package_names
.
dbt Package Name | Link |
---|---|
dbt-labs/dbt-utils | dbt-labs/dbt-utils: Utility functions for dbt projects. |
datnguye/dbterd | datnguye/dbterd: Generate the ERD as a code from dbt artifacts |
Example:
Adjust the settings according to your environment.
# frozen_string_literal: true
if Rails.env.development?
require 'active_record/dbt'
ActiveRecord::Dbt.configure do |c|
c.config_directory_path = 'lib/dbt'
c.export_directory_path = 'doc/dbt'
c.data_sync_delayed = false
c.used_dbt_package_names = [
'dbt-labs/dbt_utils',
'datnguye/dbterd'
]
end
end
Create configuration files for dbt:
$ bin/rails generate active_record:dbt:config
This will create the following files.
File | Description |
---|---|
#{config_directory_path}/source_config.yml |
Used to generate #{export_directory_path}/models/sources/#{source_name}/src_#{source_name}.yml . |
#{config_directory_path}/staging_model.sql.tt |
Used to generate #{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql . |
In the #{config_directory_path}/source_config.yml
file, describe the properties you want to set for the source.
You can configure sources
, table_overrides
, defaults
, and table_descriptions
in this file.
The available properties for sources
and table_overrides
are detailed in Source properties | dbt Developer Hub.
Set the configurations for sources
.
Configuration | Description |
---|---|
project_name | Written to the beginning of the table or model description . |
table_names in exclude |
Specify which table names you do not want output in sources . |
Example:
sources:
name: dummy
meta:
project_name: dummy_project
generated_by: activerecord-dbt
exclude:
table_names:
- ar_internal_metadata
- schema_migrations
description: |-
Write a description of the 'dummy' source.
You can write multiple lines.
Set all properties for tables
except for name
and description
.
Example:
table_overrides:
users:
loaded_at_field: created_at
freshness:
warn_after:
count: 3
period: day
error_after:
count: 5
period: day
columns:
created_at:
data_tests:
- not_null:
where: 'id != 1'
Set the default value for the description
(logical_name
, description
) of tables
.
In the logical_name
and description
fields of table_descriptions
, you can use {{ table_name }}
to refer to the table name and {{ project_name }}
to refer to the project name.
In the description
of table_descriptions.columns
, you can use {{ table_name }}
to refer to the table name and {{ column_name }}
to refer to the column name.
Example:
defaults:
table_descriptions:
logical_name: Write the logical_name of the '{{ table_name }}' table in '{{ project_name }}'.
columns:
description: Write a description of the '{{ table_name }}.{{ column_name }}' column.
If nothing is set, it defaults to the following:
defaults:
table_descriptions:
logical_name: Write the logical_name of the '{{ table_name }}' table in '{{ project_name }}'.
columns:
description: Write a description of the '{{ table_name }}.{{ column_name }}' column.
Set the name
and description
for tables
.
Configuration | Description |
---|---|
logical_name | A title or one-line description to be output in the dbt description . |
description | A detailed description of logical_name to be output in the dbt description . |
Example:
table_descriptions:
ar_internal_metadata:
logical_name: Internal Metadata
description: |-
By default Rails will store information about your Rails environment and schema
in an internal table named `ar_internal_metadata`.
columns:
key: Key
value: Value
created_at: Created At
updated_at: Updated At
schema_migrations:
logical_name: Schema Migrations
description: |-
Rails keeps track of which migrations have been committed to the database and
stores them in a neighboring table in that same database called `schema_migrations`.
columns:
version: The version number of the migration.
Adjust the settings according to your environment.
sources:
name: dummy
meta:
project_name: dummy_project
generated_by: activerecord-dbt
exclude:
table_names:
- hoges
description: |-
Write a description of the 'dummy' source.
You can write multiple lines.
table_overrides:
users:
loaded_at_field: created_at
freshness:
warn_after:
count: 3
period: day
error_after:
count: 5
period: day
columns:
created_at:
data_tests:
- not_null:
where: 'id != 1'
defaults:
table_descriptions:
logical_name: Write the logical_name of the '{{ table_name }}' table in '{{ project_name }}'.
columns:
description: Write a description of the '{{ table_name }}.{{ column_name }}' column.
seed_descriptions:
enum:
description: "{{ project_name }} {{ table_logical_name }} {{ column_description }} enum"
table_descriptions:
ar_internal_metadata:
logical_name: Internal Metadata
description: |-
By default Rails will store information about your Rails environment and schema
in an internal table named `ar_internal_metadata`.
columns:
key: Key
value: Value
created_at: Created At
updated_at: Updated At
schema_migrations:
logical_name: Schema Migrations
description: |-
Rails keeps track of which migrations have been committed to the database and
stores them in a neighboring table in that same database called `schema_migrations`.
columns:
version: The version number of the migration.
Generate a source file for dbt:
$ bin/rails generate active_record:dbt:source
Generate #{export_directory_path}/models/sources/#{source_name}/src_#{source_name}.yml
.
Note
The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.
Warning
If you are using a version of dbt lower than v1.8, replace tests:
with data_tests:
in the generated file.
Add data tests to your DAG | dbt Developer Hub
Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests in v1.8, the key was renamed from
tests:
todata_tests:
.
---
version: 2
sources:
- name: dummy
meta:
project_name: dummy_project
generated_by: activerecord-dbt
exclude:
table_names:
- hoges
description: |-
Write a description of the 'dummy' source.
You can write multiple lines.
tables:
- name: ar_internal_metadata
description: |-
# dummy_project Internal Metadata
By default Rails will store information about your Rails environment and schema
in an internal table named `ar_internal_metadata`.
columns:
- name: key
description: Key
data_type: string
data_tests:
- unique
- not_null
- name: value
description: Value
data_type: string
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: companies
description: Write the logical_name of the 'companies' table in 'dummy_project'.
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: name
description: Write a description of the 'companies.name' column.
data_type: string
data_tests:
- not_null
- name: establishment_date
description: Write a description of the 'companies.establishment_date' column.
data_type: string
- name: average_age
description: Write a description of the 'companies.average_age' column.
data_type: float64
- name: published
description: Write a description of the 'companies.published' column.
data_type: bool
data_tests:
- not_null
- accepted_values:
values:
- true
- false
quote: false
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: posts
description: dummy_project Post
columns:
- name: id
description: ID
data_type: int64
data_tests:
- unique
- not_null
- name: user_id
description: User
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: many-to-one
- name: title
description: Title
data_type: string
- name: content
description: Content
data_type: string
- name: created_at
description: Post Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Post Updated At
data_type: datetime
data_tests:
- not_null
- name: status
description: Status
data_type: int64
data_tests:
- accepted_values:
values:
- 0
- 1
- 2
quote: false
- name: posts_tags
description: Write the logical_name of the 'posts_tags' table in 'dummy_project'.
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- post_id
- tag_id
columns:
- name: post_id
description: post_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'posts')
field: id
meta:
relationship_type: many-to-one
active_record_dbt_error:
class: NameError
message: |-
uninitialized constant PostsTag
Object.const_get(camel_cased_word)
^^^^^^^^^^
- name: tag_id
description: tag_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'tags')
field: id
meta:
relationship_type: many-to-one
active_record_dbt_error:
class: NameError
message: |-
uninitialized constant PostsTag
Object.const_get(camel_cased_word)
^^^^^^^^^^
- name: profiles
description: Write the logical_name of the 'profiles' table in 'dummy_project'.
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: user_id
description: user_id
data_type: int64
data_tests:
- unique
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: one-to-one
- name: first_name
description: Write a description of the 'profiles.first_name' column.
data_type: string
data_tests:
- not_null
- name: last_name
description: Write a description of the 'profiles.last_name' column.
data_type: string
data_tests:
- not_null
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: relationships
description: Write the logical_name of the 'relationships' table in 'dummy_project'.
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- follower_id
- followed_id
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: follower_id
description: follower_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: many-to-one
- name: followed_id
description: followed_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: many-to-one
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: schema_migrations
description: |-
# dummy_project Schema Migrations
Rails keeps track of which migrations have been committed to the database and
stores them in a neighboring table in that same database called `schema_migrations`.
columns:
- name: version
description: The version number of the migration.
data_type: string
data_tests:
- unique
- not_null
- name: tags
description: Write the logical_name of the 'tags' table in 'dummy_project'.
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: name
description: Write a description of the 'tags.name' column.
data_type: string
data_tests:
- unique
- not_null
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: user_tags
description: Write the logical_name of the 'user_tags' table in 'dummy_project'.
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- user_id
- tag_id
columns:
- name: id
description: id
data_type: int64
data_tests:
- unique
- not_null
- name: user_id
description: user_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: many-to-one
- name: tag_id
description: tag_id
data_type: int64
data_tests:
- not_null
- relationships:
to: source('dummy', 'tags')
field: id
meta:
relationship_type: many-to-one
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
- name: users
description: dummy_project User
loaded_at_field: created_at
freshness:
warn_after:
count: 3
period: day
error_after:
count: 5
period: day
columns:
- name: id
description: ID
data_type: int64
data_tests:
- unique
- not_null
- name: created_at
description: User Created At
data_type: datetime
data_tests:
- not_null:
where: id != 1
- name: updated_at
description: User Updated At
data_type: datetime
data_tests:
- not_null
- name: company_id
description: company_id
data_type: int64
data_tests:
- relationships:
to: source('dummy', 'companies')
field: id
meta:
relationship_type: many-to-one
In the #{config_directory_path}/staging_model.sql.tt
file, write the SQL template for the staging
model you want to create.
You can use sql.source_name
, sql.table_name
, sql.select_column_names
, sql.primary_key_eql_id?
, and sql.rename_primary_id
within this file.
Example:
with
source as (
select * from {{ source('<%= sql.source_name %>', '<%= sql.table_name %>') }}
),
renamed as (
select
<%- sql.select_column_names.each_with_index do |(column_type, columns), column_type_index| -%>
-- <%= column_type %>
<%- columns.each_with_index do |column, column_index| -%>
<%- is_rename_primary_id = sql.primary_key_eql_id? && sql.primary_key?(column.name) -%>
<%- is_last_column = column_type_index == sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>
<%= is_rename_primary_id ? "id as #{sql.rename_primary_id}" : column.name %><% unless is_last_column -%>,<%- end %>
<%- if column_type_index != sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>
<%- end -%>
<%- end -%>
<%- end -%>
from source
)
select * from renamed
Different Pattern:
#standardSQL
with source as (
select
<%- if sql.primary_key_eql_id? -%>
id as <%= sql.rename_primary_id %>
, * except(id)
<%- else -%>
*
<%- end -%>
from {{ source('<%= sql.source_name %>', '<%= sql.table_name %>') }}
)
, final as (
select
<%- sql.select_column_names.each_with_index do |(column_type, columns), column_type_index| -%>
-- <%= column_type %>
<%- columns.each_with_index do |column, column_index| -%>
<% unless column_type_index == 0 && column_index == 0 -%>, <%- end %><%= (sql.primary_key_eql_id? && sql.primary_key?(column.name) ? sql.rename_primary_id : column.name) %>
<%- if column_type_index != sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>
<%- end -%>
<%- end -%>
<%- end -%>
from source
)
select
*
from final
Generate staging model files for dbt:
$ bin/rails generate active_record:dbt:staging_model TABLE_NAME
Generate staging model files for dbt that reference the specified TABLE_NAME
.
File | Description |
---|---|
#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql |
Staging model file for dbt. |
#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.yml |
Staging model documentation file for dbt. |
Example:
$ bin/rails generate active_record:dbt:staging_model profiles
Generate #{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql
Example:
with
source as (
select * from {{ source('dummy', 'profiles') }}
),
renamed as (
select
-- ids
id as profile_id,
user_id,
-- strings
first_name,
last_name,
-- datetimes
created_at,
updated_at
from source
)
select * from renamed
Different Pattern:
#standardSQL
with source as (
select
id as profile_id
, * except(id)
from {{ source('dummy', 'profiles') }}
)
, final as (
select
-- ids
profile_id
, user_id
-- strings
, first_name
, last_name
-- datetimes
, created_at
, updated_at
from source
)
select
*
from final
Generate #{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.yml
Example:
Note
The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.
Warning
If you are using a version of dbt lower than v1.8, replace tests:
with data_tests:
in the generated file.
Add data tests to your DAG | dbt Developer Hub
Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests in v1.8, the key was renamed from
tests:
todata_tests:
.
---
version: 2
models:
- name: stg_dummy__profiles
description: Write the logical_name of the 'profiles' table in 'dummy_project'.
columns:
- name: profile_id
description: profile_id
data_type: int64
data_tests:
- unique
- not_null
- relationships:
to: source('dummy', 'profiles')
field: id
meta:
relationship_type: one-to-one
- name: user_id
description: user_id
data_type: int64
data_tests:
- unique
- not_null
- relationships:
to: source('dummy', 'users')
field: id
meta:
relationship_type: one-to-one
- name: first_name
description: Write a description of the 'profiles.first_name' column.
data_type: string
data_tests:
- not_null
- name: last_name
description: Write a description of the 'profiles.last_name' column.
data_type: string
data_tests:
- not_null
- name: created_at
description: Created At
data_type: datetime
data_tests:
- not_null
- name: updated_at
description: Updated At
data_type: datetime
data_tests:
- not_null
In the #{config_directory_path}/source_config.yml
file, describe the properties you want to set for the seed enum.
You can configure defaults
in this file.
Set the default value for the description
of the seeds
enum.
In the description
of seed_descriptions.enum
, you can refer to the project name with {{ project_name }}
, the table logical table name with {{ table_logical_name }}
, and the column description with {{ column_description }}
.
Example:
defaults:
seed_descriptions:
enum:
description: "{{ project_name }} {{ table_logical_name }} {{ column_description }} enum"
If nothing is set, it defaults to the following:
defaults:
seed_descriptions:
enum:
description: "{{ project_name }} {{ table_logical_name }} {{ column_description }} enum"
Generate seed enum files for dbt:
$ bin/rails generate active_record:dbt:enum TABLE_NAME ENUM_COLUMN_NAME
Generate seed enum files for dbt from the specified TABLE_NAME
and ENUM_COLUMN_NAME
.
File | Description |
---|---|
#{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.csv |
Seed enum file for dbt. |
#{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.yml |
Seed enum documentation file for dbt. |
Example:
$ bin/rails generate active_record:dbt:enum posts status
Generate #{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.csv
Example:
status_before_type_of_cast,status_key,status_en,status_ja
0,draft,Draft,下書き
1,published,Published,公開
2,deleted,Deleted,削除
Generate #{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.yml
Example:
Note
The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.
Warning
If you are using a version of dbt lower than v1.8, replace tests:
with data_tests:
in the generated file.
Add data tests to your DAG | dbt Developer Hub
Data tests were historically called "tests" in dbt as the only form of testing available. With the introduction of unit tests in v1.8, the key was renamed from
tests:
todata_tests:
.
---
version: 2
seeds:
- name: seed_dummy__post_enum_statuses
description: dummy_project Post Status enum
config:
column_types:
status_before_type_of_cast: int64
status_key: string
status_en: string
status_ja: string
columns:
- name: status_before_type_of_cast
description: Status(before_type_of_cast)
data_type: int64
data_tests:
- unique
- not_null
- name: status_key
description: Status(key)
data_type: string
data_tests:
- unique
- not_null
- name: status_en
description: Status(en)
data_type: string
data_tests:
- unique
- not_null
- name: status_ja
description: Status(ja)
data_type: string
data_tests:
- unique
- not_null
Contribution directions go here.
The gem is available as open source under the terms of the MIT License.