Most of the provisioner configurations are handled with the Typesafe Config library. You can find the default settings in the `reference.conf` of each module. Customize them and use the `config.file` system property or the other options provided by Typesafe Config according to your needs. The provided docker image expects the config file to be mounted at the path `/config/application.conf`.
| Configuration | Description | Default |
|---|---|---|
| `dataMeshProvisioner.networking.httpServer.interface` | Interface to bind the specific provisioner API layer | 0.0.0.0 |
| `dataMeshProvisioner.networking.httpServer.port` | Port to bind the specific provisioner API layer | 8093 |
Enable async handling on the Terraform Provisioner
| Configuration | Description | Default |
|---|---|---|
| `dataMeshProvisioner.async.provision.enabled` | Enables the async provision/unprovision tasks. When enabled, these operations return 202 with a token to be used for polling | false |
| `dataMeshProvisioner.async.type` | Defines the type of repository used to store the status of the asynchronous tasks. Allowed values: [`cache`] | cache |
| `dataMeshProvisioner.async.pool.size` | Size of the thread pool used to execute asynchronous tasks | 16 |
Logging is handled with Logback. Customize it and pass it using the `logback.configurationFile` system property. The provided docker image expects the logging config file to be mounted at the path `/config/logback.xml`.
If you are deploying this provisioner with Helm, there are two places for these config files:

- `values.yaml`: in the `values.yaml` specific to your environment, you can provide the configurations via the `configOverride` and `logbackOverride` keys. This is the preferred approach.
- `helm/files`: contains the default values; they are not supposed to be changed by the user.
Define your infrastructure using Terraform configuration files. These files typically have the extension `.tf` and are written in HashiCorp Configuration Language (HCL) or JSON format. In these files, you specify the desired state of your cloud resources, such as virtual machines, networks, databases, etc.
With the multi-module feature, it is possible to define multiple isolated configuration files that can be used to handle provisioning/unprovisioning of different resources:
```
datameshProvisioner {
  ...
  terraform {
    moduleId1 {
      repositoryPath: "path-for-moduleId1"
      descriptorToVariablesMapping: {}
    }
    moduleId2 {
      repositoryPath: "path-for-moduleId2"
      descriptorToVariablesMapping: {}
    }
  }
}
```
where `moduleId1`, `moduleId2` are the `useCaseTemplateId` of the components to manage, as declared in the DP Descriptor.
The configuration key `datameshProvisioner.terraform.<moduleId>.repositoryPath` must point to a folder where the complete terraform configuration is present.
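As an illustration, a minimal sketch of what such a folder's `main.tf` could look like; the provider, resource, and variable names here are purely hypothetical placeholders and not part of the provisioner itself:

```
# Hypothetical main.tf inside the folder referenced by repositoryPath.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# Variables like this one are filled in from the descriptor through the
# descriptorToVariablesMapping configuration described below.
variable "account_name" {
  type = string
}

resource "aws_s3_bucket" "storage" {
  bucket = var.account_name
}
```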
When creating terraform resources, a way to create terraform variables from the data product descriptor is needed. The `descriptorToVariablesMapping` configuration (defined in `datameshProvisioner.terraform.<moduleId>`) is meant for this purpose: it allows you to specify a list of mappings, where each one maps a terraform key to a data product descriptor value. The data product descriptor value is accessed via JsonPath, which allows full flexibility in traversing the descriptor.
Example
Given the following descriptor:

```
dataProduct:
  name: xys-12345
  somearray:
    - name: abc
      bar: xyz
```
and the following configuration:

```
descriptorToVariablesMapping = {
  account_name = "$.dataProduct.name"
  foo = "$.dataProduct.somearray[0].bar"
}
```
The following vars would be produced:

```
-var account_name='xys-12345' -var foo='xyz'
```
In the previous example, both mappings were addressing a YAML leaf of the descriptor. By pointing to a YAML node instead, the full YAML object is extracted as a terraform variable.
For example, given the following descriptor:

```
dataProduct:
  name: xys-12345
  specific:
    complex:
      foo: bar
      fuz: buz
    list:
      - buzz
      - lightyear
```
and the following configuration:

```
descriptorToVariablesMapping = {
  complex = "$.dataProduct.specific.complex"
  list = "$.dataProduct.specific.list"
}
```
The following vars would be produced:

```
-var complex='{"foo":"bar", "fuz":"buz"}' -var list='["buzz", "lightyear"]'
```
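On the Terraform side, the receiving module needs matching variable declarations. A minimal sketch, where the declared types are assumptions based on the values shown above:

```
# Hypothetical variable declarations matching the mapping keys above.
variable "complex" {
  type = map(string)   # receives {"foo":"bar", "fuz":"buz"}
}

variable "list" {
  type = list(string)  # receives ["buzz", "lightyear"]
}
```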
The incoming data product descriptor contains all the components, not only the one that is going to be provisioned. In the descriptor, the correct component to provision is specified in the `componentIdToProvision` field.
In order to address the fields of that specific component, it is possible to use the placeholder `{{componentIdToProvision}}`, which will be replaced with the correct id.
Example
This example shows how the `componentIdToProvision` replacement allows addressing the `specific.resourceGroup` field of the component that is going to be provisioned.
Given the following descriptor:

```
dataProduct:
  name: xys-12345
  components:
    - id: comp1
      specific:
        foo1: bar1
    - id: comp2
      specific:
        foo2: bar2
        resourceGroup: zoo
componentIdToProvision: comp2
```
and the following configuration:

```
descriptorToVariablesMapping = {
  resource_group_name = "$.dataProduct.components[?(@.id == '{{componentIdToProvision}}')].specific.resourceGroup"
}
```
The following var would be produced:

```
-var resource_group_name='zoo'
```
State management in Terraform is a critical aspect of its functionality, as it helps Terraform keep track of the current state of your infrastructure and enables it to make informed decisions about what changes need to be applied to achieve the desired infrastructure configuration. Terraform uses a state file to store information about the resources it manages, such as their current state, resource dependencies, and metadata.
Each configured module must handle its own state management, making sure to appropriately segregate DP components with a reasonable state key to avoid collisions, and to use a fault-tolerant and lockable state store (remote backends, such as Amazon S3, Azure Blob Storage, or HashiCorp Consul, are a good fit as they also provide better collaboration and security).
In order to dynamically set the state key, refer to the Backend configurations chapter.
It is important to notice that the backend configurations will be shared between the `main` and `acl` modules. Since those two modules must use separate state files, they cannot share the same state key. For this reason, we automatically append the ".acl" suffix to the rendered state key.
In order to make the backend configuration dynamic, the `backendConfigs` block allows you to set backend configurations.
This block requires two entries:

- `configs`: a map of key/value pairs, where the keys become the backend configuration keys, while the values are processed as JsonPath, exactly as in the `descriptorToVariablesMapping` block.
- `stateKey`: the string that identifies, in the `configs` map, the key corresponding to the state key. Since the state key can have different names depending on the provider (common names are "key" and "prefix"), it cannot be guessed; and since further processing is applied to it, the provisioner must know which key it is.
Given the following descriptor:

```
dataProduct:
  name: xys-12345
  components:
    - id: comp1
      specific:
        foo1: bar1
    - id: comp2
      specific:
        foo2: bar2
        resourceGroup: zoo
componentIdToProvision: comp2
```
and the following configuration snippet:

```
backendConfigs = {
  stateKey = "key"
  configs = {
    key = "$.dataProduct.name"
    foo = "$.dataProduct.components[?(@.id == '{{componentIdToProvision}}')].specific.resourceGroup"
  }
}
```
The following command would be produced for the main module:

```
terraform init [...] -backend-config="key=xys-12345" -backend-config="foo=zoo"
```
The following command would be produced for the acl module:

```
terraform init [...] -backend-config="key=xys-12345.acl" -backend-config="foo=zoo"
```
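For these `-backend-config` flags to take effect, the module itself is expected to declare a partial backend configuration, leaving the dynamic values out. A hedged sketch, assuming an Azure Blob Storage backend (the backend type and the fixed values are assumptions, not something the provisioner mandates):

```
# Partial backend configuration: the state key is intentionally omitted so it
# can be supplied at init time via -backend-config, as shown above.
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"       # assumption
    storage_account_name = "tfstatestorage"   # assumption
    container_name       = "tfstate"          # assumption
    # "key" is provided dynamically by the provisioner
  }
}
```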
The `updateAcl` endpoint is invoked to request access to the resources provisioned by the `provision` endpoint.
If we imagine the `provision` endpoint creating an S3 bucket, you will want to use `updateAcl` to grant some users the privileges to access the bucket.
Given that this SP is cloud and technology-agnostic, the definition of the resources that will grant access is left to the module developer.
Within the terraform module, an `acl` folder can contain terraform files. This submodule is applied upon the invocation of the `updateAcl` endpoint.
Going back to the bucket example, this submodule would contain the needed IAM resources, e.g. IAM policies, roles, etc.
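For instance, a minimal, hypothetical sketch of what `acl/main.tf` could contain for the bucket example; resource and variable names are assumptions, and `principals` and `bucket_name` are injected at runtime as described in the following paragraphs:

```
# Hypothetical acl/main.tf: grants the requested principals read access to the
# bucket created by the main module.
variable "principals" {}   # injected by updateAcl as a comma-separated string of ARNs
variable "bucket_name" {}  # assumed to be an output of the main module, injected as a variable

resource "aws_s3_bucket_policy" "read_access" {
  bucket = var.bucket_name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = split(",", var.principals) }
      Action    = ["s3:GetObject", "s3:ListBucket"]
      Resource = [
        "arn:aws:s3:::${var.bucket_name}",
        "arn:aws:s3:::${var.bucket_name}/*"
      ]
    }]
  })
}
```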
You can view a full example here.
When defining the acl resources, there will be variables whose values you will know only at runtime, i.e. upon invocation of the `updateAcl` endpoint.
For this reason, when terraform is run, some variables will be injected.
You need to put the definition of these variables in a `.tf` file.
Example
variable "principals" {}
variable "output1" {}
variable "output2" {}
variable "outputN" {}
The list of variables you need to expect follows.
Principals
The list of principals to use.
- key: `principals`
- value: `list,of,principals`
- type: string
Example
In the bucket example, principals can be ARNs of users and/or roles. The resulting terraform command would be:

```
terraform apply [...] -var principals=arn:aws:iam::12345:user/[email protected],arn:aws:iam::12345:role/queen
```
Terraform outputs
The outputs generated by the `provision` endpoint. If you specified some outputs in the main module, they will be injected as variables during the apply of the acl module.
- key: `your-output-name`
- value: `your-output-value`
- type: string
Example
If your `output.tf` contains the following:

```
output "storage_account_id" { value = azurerm_storage_account.st_account.id }
```

The resulting terraform command would be:

```
terraform apply [...] -var storage_account_id=foo.bar
```
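As a hypothetical illustration of how such an injected output might be consumed together with `principals` in the acl module (assuming Azure resources and that the mapped principals are Azure AD object ids):

```
# Hypothetical usage in acl/main.tf: the injected main-module output becomes
# the scope of a role assignment for each requested principal.
variable "storage_account_id" {}  # injected output of the main module
variable "principals" {}          # comma-separated list injected by updateAcl

resource "azurerm_role_assignment" "reader" {
  for_each             = toset(split(",", var.principals))
  scope                = var.storage_account_id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = each.value
}
```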
When the `updateAcl` endpoint is invoked, it requires a list of identities as a parameter. This is the list of users/groups/roles that need access.
These identities have a meaning in the Witboost context, but might not make sense in the context of your resources.
For this reason, the mapping is delegated to the `principalMappingPlugin`, which you can define with your own logic and simply plug in.
Example
The plugin can map the Witboost identity into an AWS ARN
`user:[email protected]` -> `arn:aws:iam::12345:user/[email protected]`
Here are some samples we provide.
The `provision` endpoint takes care of creating the requested resources.
In this phase, the Data Product Owner and the Development Group are extracted from the Data Product Descriptor and a mapping operation is executed, based on the configured plugin. If the mapping succeeds, the mapped principals are injected as a Terraform variable named `ownerPrincipals` (the entries are separated by `,`). For this reason, in every module you create, the following variable definition is needed:
variable "ownerPrincipals" {
type = string
description = "The identities that own the Data Product"
default = ""
}
If you do not require this information, you can simply not use it in your terraform code.
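If you do use it, a hedged sketch of how a module might consume it, handling the empty default as well:

```
# Hypothetical usage: turn the injected comma-separated string into a list of
# principals; compact() drops the empty entry produced when the default "" is used.
locals {
  owner_principals = compact(split(",", var.ownerPrincipals))
}
```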
Specific provisioners provide the possibility of returning results to Witboost within the `Info` object. These results can be either private or public.
The `Info` object is a field that can contain different public and private details. Only the details in the `publicInfo` object will be displayed to consumers in the Marketplace, while all the values put in the `privateInfo` will be stored in the deployed descriptor but will not be shown in the Marketplace UI.
There is no limit to how many values can be set in those fields, but only the ones compliant with the following specification will be rendered in the "Technical Information" card of the Marketplace module. Invalid data is ignored, and if there is no valid data to be shown, the "Technical Information" card will not be displayed.
By default, the terraform provisioners inject all non-sensitive outputs as private info.
If, on the other hand, you want to return public info, you need to create an output called `public_info`, which must honor the schema requested by Witboost. This output will not be present in the `private_info`.
In the following example, a `public_info` that contains two elements is returned.
output "public_info" {
value = {
saLink = {
type = "link"
label = "Endpoint link"
value = "Endpoint link"
href = "adls://foo.bar"
},
saName = {
type = "string"
label = "Storage Account Name"
value = "Foo"
}
}
}
You can refer to the Witboost documentation for a better understanding of the requested `public_info` schema.
The `unprovision` endpoint takes care of destroying the provisioned resources. The endpoint honours the `removeData` parameter: when it is `false`, the destroy operation is skipped for components of type `storage`.
The `ReverseProvisioning` functionality helps the user import existing resources.
The reverse provisioning logic itself:

- materializes an `import.tf` file in the Terraform context, containing a list of import blocks
- performs a `plan` and returns the output in a human-readable format. The status of the operation also depends on the `skipSafetyChecks` flag
- if the operation is successfully completed, outputs are returned to make sure the imports will be applied in the next deploy operation
Attention!
It is important to understand that the reverse provisioning only plans the import, allowing the user to iterate over the result in order to fine-tune the import. The actual import (i.e. the `terraform apply`) will only happen during the next deploy.
The variables used during this operation are created with the same logic as the mappings.
Attention! As of now, the reverse provisioning endpoint receives the catalog-info as input, and not the data product descriptor. For this reason, mappings that reference variables outside the scope of the component will fail to parse.
The following mapping will fail to parse because of the `environment` variable. The `resource_group` will be correctly extracted from the catalog-info.
```
descriptorToVariablesMapping = {
  resource_group = "$.dataProduct.components[?(@.id == '{{componentIdToProvision}}')].specific.resourceGroup"
  environment = "$.dataProduct.environment"
}
```
The reverse provisioning endpoint receives as input the catalog-info of the component and a `params` object, which contains the `skipSafetyChecks` flag and all the necessary information to build the `import.tf` file.
This is an example of the `import.tf` file:
```
import {
  to = azurerm_storage_account.st_account
  id = "i-abcd1234"
}

import {
  to = azurerm_storage_data_lake_gen2_filesystem.filesystem["default"]
  id = "i-abcdABC"
}

// ...
```
As the import in terraform is a sensitive operation, we must be very careful in deciding whether the resulting plan is a success or a failure.
When importing an existing resource, the variables specified by the terraform module are checked against the properties of the existing resources. If the variables don't match, terraform will propose either to update the properties in place or to re-create the whole resource. Re-creating a resource means destroying it first, which can lead to data loss. Of course, this highly depends on the resources and on the specific use case; for this reason, a safety valve is provided to control this behaviour.
By default, the `skipSafetyChecks` flag is set to `false`, ensuring that:

- if the plan proposes N (with N > 0) destroys, the operation is marked as failed
- if the plan proposes 0 imports, the operation is marked as failed
For consistency, this flag is also honoured in the validation logic.
A successful reverse provisioning operation returns:

- the import blocks, to be sure that the next test/deploy operation will embed the imports
- the `skipSafetyChecks` flag