323 clickhouse loader.py: support schema overrides in YAML (#967)

Open · wants to merge 13 commits into base: 2.6.0
2 changes: 1 addition & 1 deletion README.md
@@ -79,7 +79,7 @@ First two are good tutorials on MySQL and PostgreSQL respectively.

## Roadmap

[2024 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401)
[2025 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401)

## Help

4 changes: 2 additions & 2 deletions doc/architecture.md
@@ -5,9 +5,9 @@ using [Debezium](debezium) into a common log format and then applies those
transactions to tables in ClickHouse.

There are two modes of operation.
* Lightweight Sink Connector - Combines extract and apply operations
* **Lightweight Sink Connector** - Combines extract and apply operations
into a single process.
* Kafka Sink Connector - Separates extract and apply operations into separate
* **Kafka Sink Connector** - Separates extract and apply operations into separate
processes, using a Kafka-compatible event stream for transport between them.

Debezium offers change data capture on a number of database types. The
2 changes: 1 addition & 1 deletion doc/configuration.md
@@ -14,7 +14,7 @@
| clickhouse.server.password | ClickHouse password |
| clickhouse.server.port | ClickHouse port. For TLS, use the correct port (`8443` or `443`). |
| snapshot.mode | "initial" -> Data that already exists in the source database will be replicated. "schema_only" -> Replicate only data that is added/modified after the connector is started. <br/> MySQL: https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-property-snapshot-mode <br/> PostgreSQL: https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-snapshot-mode <br/> MongoDB: initial, never. https://debezium.io/documentation/reference/stable/connectors/mongodb.html |
| connector.class | MySQL -> "io.debezium.connector.mysql.MySqlConnector" <br/> PostgreSQL -> <br/> Mongo -> <br/> |
| connector.class | MySQL -> `io.debezium.connector.mysql.MySqlConnector` <br/> PostgreSQL -> `io.debezium.connector.postgresql.PostgresConnector` <br/> Mongo -> `io.debezium.connector.mongodb.MongoDbConnector` <br/> |
| offset.storage.file.filename | Offset storage file (stores the source database offsets; for MySQL: binlog file and position, GTID set). Make sure this file is durable and not stored in a temp directory. |
| database.history.file.filename | Database history file. Make sure this file is durable and not stored in a temp directory. |
| schema.history.internal.file.filename | Schema history file. Make sure this file is durable and not stored in a temp directory. |
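The settings above are plain key/value connector properties. A minimal sketch of assembling them in Java; the host, port, and file paths shown are illustrative placeholders, not defaults from this project:

```java
import java.util.Properties;

public class ConnectorConfigSketch {
    // Builds a small subset of the connector configuration described above.
    static Properties build() {
        Properties props = new Properties();
        // TLS port for ClickHouse (8443 or 443).
        props.setProperty("clickhouse.server.port", "8443");
        // "initial" replicates existing data; "schema_only" replicates only new changes.
        props.setProperty("snapshot.mode", "initial");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Keep offset and history files on durable storage, never in a temp directory.
        props.setProperty("offset.storage.file.filename", "/var/lib/sink-connector/offsets.dat");
        props.setProperty("schema.history.internal.file.filename", "/var/lib/sink-connector/schema_history.dat");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("snapshot.mode")); // prints "initial"
    }
}
```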
8 changes: 6 additions & 2 deletions doc/quickstart.md
@@ -30,13 +30,17 @@ sudo apt install clickhouse-client

Use Docker Compose to start containers.
Set the `CLICKHOUSE_SINK_CONNECTOR_LT_IMAGE` environment variable to the latest release from the Releases page,
or run `./getLatestTag.sh` which will set the environment variable
or run `./getLatestTag.sh` which will print the environment variable
that needs to be exported.

```
cd sink-connector-lightweight/docker
./getLatestTag.sh
```

Example:
```
export CLICKHOUSE_SINK_CONNECTOR_LT_IMAGE=altinity/clickhouse-sink-connector:2.5.0-lt
```
```
docker compose -f docker-compose-mysql.yml up --renew-anon-volumes
```
2 changes: 1 addition & 1 deletion sink-connector-lightweight/docker/getLatestRelease.sh
@@ -28,7 +28,7 @@ echo -e "\n"
echo "****************************************************************************************************"

# Display a message to the usage of the latest_version in color green
echo -e "\e[32m export CLICKHOUSE_SINK_CONNECTOR_LT_IMAGE=altinity/clickhouse-sink-connector:$latest_version-lt'\e[0m"
echo -e "\e[32m export CLICKHOUSE_SINK_CONNECTOR_LT_IMAGE=altinity/clickhouse-sink-connector:$latest_version-lt\e[0m"
echo "****************************************************************************************************"
echo -e "\n"

2 changes: 1 addition & 1 deletion sink-connector-lightweight/pom.xml
@@ -209,7 +209,7 @@
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>1.33</version>
<version>2.0</version>
</dependency>
<!-- VERSION COMPARE LIBRARY -->
<dependency>
@@ -5,7 +5,6 @@
import com.altinity.clickhouse.debezium.embedded.common.PropertiesHelper;
import com.altinity.clickhouse.debezium.embedded.config.ConfigLoader;
import com.altinity.clickhouse.debezium.embedded.config.ConfigurationService;
import com.altinity.clickhouse.debezium.embedded.ddl.parser.DDLParserService;
import com.altinity.clickhouse.debezium.embedded.parser.DebeziumRecordParserService;
import com.altinity.clickhouse.sink.connector.ClickHouseSinkConnectorConfig;
import com.altinity.clickhouse.sink.connector.ClickHouseSinkConnectorConfigVariables;
@@ -1,8 +1,8 @@
package com.altinity.clickhouse.debezium.embedded.cdc;

import com.altinity.clickhouse.debezium.embedded.common.PropertiesHelper;
import com.altinity.clickhouse.debezium.embedded.config.ColumnOverrideParser;
import com.altinity.clickhouse.debezium.embedded.config.SinkConnectorLightWeightConfig;
import com.altinity.clickhouse.debezium.embedded.ddl.parser.DDLParserService;
import com.altinity.clickhouse.debezium.embedded.ddl.parser.MySQLDDLParserService;
import com.altinity.clickhouse.debezium.embedded.parser.DebeziumRecordParserService;
import com.altinity.clickhouse.sink.connector.ClickHouseSinkConnectorConfig;
@@ -871,6 +871,7 @@ public void setup(Properties props, DebeziumRecordParserService debeziumRecordPa
Metrics.initialize(props.getProperty(ClickHouseSinkConnectorConfigVariables.ENABLE_METRICS.toString()),
props.getProperty(ClickHouseSinkConnectorConfigVariables.METRICS_ENDPOINT_PORT.toString()));


// Start debezium event loop if its requested from REST API.
if(!config.getBoolean(ClickHouseSinkConnectorConfigVariables.SKIP_REPLICA_START.toString()) || forceStart) {
this.setupProcessingThread(config);
@@ -0,0 +1,57 @@
package com.altinity.clickhouse.debezium.embedded.config;

import com.altinity.clickhouse.sink.connector.ClickHouseSinkConnectorConfigVariables;
import org.yaml.snakeyaml.Yaml;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.LinkedHashMap;
import com.clickhouse.data.ClickHouseDataType;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class ColumnOverrideParser {

    private static final Logger log = LogManager.getLogger(ColumnOverrideParser.class);

    public static Map<String, String> parseColumnOverrides(String yamlFile) throws FileNotFoundException {

        Yaml yaml = new Yaml();
        FileInputStream inputStream = new FileInputStream(yamlFile);

        Map<String, Object> data = yaml.load(inputStream);

        Object result = data.get(ClickHouseSinkConnectorConfigVariables.DEFAULT_COLUMN_DATATYPE_MAPPING.toString());

        // The override section must be a map of column name -> data type.
        if (!(result instanceof LinkedHashMap)) {
            return new HashMap<>();
        }

        // Iterate through the map and validate values against the ClickHouse data types.
        Map<String, String> columnOverrides = new HashMap<>();
        for (Map.Entry<?, ?> entry : ((LinkedHashMap<?, ?>) result).entrySet()) {
            String key = entry.getKey().toString();
            String value = entry.getValue().toString();

            try {
                // valueOf throws IllegalArgumentException for an unknown name;
                // it never returns null, so the failure must be caught here.
                ClickHouseDataType clickHouseDataType = ClickHouseDataType.valueOf(value);
                columnOverrides.put(key, clickHouseDataType.toString());
            } catch (IllegalArgumentException e) {
                log.error("Invalid ClickHouse data type passed by user in yaml file for column override: " + value);
            }
        }

        return columnOverrides;
    }
}
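The subtle point in this parser is that `Enum.valueOf` never returns `null` for an unknown name; it throws `IllegalArgumentException`, so a null check alone cannot catch a bad type name in the YAML. A dependency-free sketch of the validation pattern, using a stand-in enum in place of `com.clickhouse.data.ClickHouseDataType` (an assumption made to keep the example self-contained):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OverrideValidationSketch {
    // Stand-in for com.clickhouse.data.ClickHouseDataType.
    enum DataType { String, DateTime64, Decimal }

    // Keeps only the overrides whose value names a known data type.
    static Map<String, String> validate(Map<String, String> overrides) {
        Map<String, String> valid = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            try {
                valid.put(e.getKey(), DataType.valueOf(e.getValue()).toString());
            } catch (IllegalArgumentException ex) {
                // Unknown type name: valueOf throws, it does not return null.
                System.err.println("Invalid data type for column " + e.getKey() + ": " + e.getValue());
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("transaction_id", "String");
        in.put("bad_col", "NotAType");
        System.out.println(validate(in)); // only transaction_id survives
    }
}
```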
@@ -1,15 +1,20 @@
package com.altinity.clickhouse.debezium.embedded.config;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.yaml.snakeyaml.Yaml;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class ConfigLoader {

private static final Logger log = LogManager.getLogger(ConfigLoader.class);

public Properties load(String resourceFileName) {
InputStream fis = this.getClass()
.getClassLoader()
@@ -24,8 +29,27 @@ public Properties load(String resourceFileName) {
if(entry.getValue() instanceof Integer) {
props.setProperty(entry.getKey(), Integer.toString((Integer) entry.getValue()));
} else {
String value = (String) entry.getValue();
props.setProperty(entry.getKey(), value.replace("\"", ""));
Object entryValue = entry.getValue();
// Strings are stored directly, with surrounding quotes stripped.
if (entryValue instanceof String) {
props.setProperty(entry.getKey(), ((String) entryValue).replace("\"", ""));
} else if (entryValue instanceof LinkedHashMap) {
log.info("entryValue is not a String");
// Iterate through the nested map and add its entries to props,
// prefixing each key with the parent entry key.
for (Map.Entry<String, Object> mapEntry : ((LinkedHashMap<String, Object>) entryValue).entrySet()) {
String key = entry.getKey() + "." + mapEntry.getKey();
props.setProperty(key, mapEntry.getValue().toString());
}
}
}
}
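The one-level flattening in `ConfigLoader` can be exercised standalone. A JDK-only sketch that mirrors (but does not reuse) the change above, with placeholder config values:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class NestedConfigFlattener {
    // One level of nesting: a map value under key K becomes "K.<subkey>" properties.
    static Properties flatten(Map<String, Object> config) {
        Properties props = new Properties();
        for (Map.Entry<String, Object> entry : config.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                for (Map.Entry<?, ?> sub : ((Map<?, ?>) value).entrySet()) {
                    props.setProperty(entry.getKey() + "." + sub.getKey(),
                            String.valueOf(sub.getValue()));
                }
            } else {
                // Scalars keep the old behavior: stored as-is, quotes stripped.
                props.setProperty(entry.getKey(), String.valueOf(value).replace("\"", ""));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> cfg = new LinkedHashMap<>();
        cfg.put("metrics.enable", "false");
        Map<String, Object> overrides = new LinkedHashMap<>();
        overrides.put("transaction_id", "String");
        cfg.put("default_column_datatype_mapping", overrides);
        Properties props = flatten(cfg);
        System.out.println(props.getProperty("default_column_datatype_mapping.transaction_id")); // prints "String"
    }
}
```

This dotted-key scheme is what lets the test below find every override by scanning for the `default_column_datatype_mapping` prefix.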

@@ -16,7 +16,6 @@
import io.debezium.jdbc.JdbcValueConverters;
import io.debezium.jdbc.TemporalPrecisionMode;
import io.debezium.relational.Column;
import io.debezium.relational.RelationalDatabaseConnectorConfig;
import io.debezium.relational.ddl.DataType;
import io.debezium.service.DefaultServiceRegistry;
import io.debezium.service.spi.ServiceRegistry;
@@ -25,7 +24,6 @@
import java.sql.Types;
import java.time.ZoneId;
import java.util.Arrays;
import java.util.Map;

/**
*
@@ -0,0 +1,21 @@
package com.altinity.clickhouse.debezium.embedded.config;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Assertions;

import java.util.Map;
import java.io.FileNotFoundException;

public class ColumnOverrideParserTest {

@Test
public void testParseColumnOverrides() {
String yamlFile = "src/test/resources/config.yml";
try {
Map<String, String> result = ColumnOverrideParser.parseColumnOverrides(yamlFile);
Assertions.assertEquals(7, result.size());
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
@@ -4,6 +4,8 @@
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

import com.altinity.clickhouse.sink.connector.ClickHouseSinkConnectorConfigVariables;

import java.util.Properties;

public class ConfigLoaderTest {
@@ -16,4 +18,24 @@ public void testLoad() {

Assertions.assertNotNull(props);
}


@Test
@DisplayName("Unit test to validate loading of nested entries in config.yml")
public void testLoadNestedEntries() {
ConfigLoader loader = new ConfigLoader();
Properties props = loader.load("config.yml");

int defaultColumnDataTypeMappingCount = 0;
// iterate through the properties and check if the nested entries are loaded correctly
// the nested entries have the prefix ClickHouseSinkConnectorConfigVariables.DEFAULT_COLUMN_DATATYPE_MAPPING
for (Object key : props.keySet()) {
if (key.toString().startsWith(ClickHouseSinkConnectorConfigVariables.DEFAULT_COLUMN_DATATYPE_MAPPING.toString())) {
Assertions.assertNotNull(props.getProperty(key.toString()));
defaultColumnDataTypeMappingCount++;
}
}

Assertions.assertEquals(7, defaultColumnDataTypeMappingCount);
}
}
18 changes: 17 additions & 1 deletion sink-connector-lightweight/src/test/resources/config.yml
@@ -30,4 +30,20 @@ schema.history.internal.jdbc.schema.history.table.name: "altinity_sink_connector
enable.snapshot.ddl: "true"
auto.create.tables: "true"
metrics.enable: "false"
database.connectionTimeZone: "America/Chicago"
database.connectionTimeZone: "America/Chicago"
default_column_datatype_mapping:
# we are no longer turning Date/DateTime/Timestamp into a String
transaction_id: String
exchange_transaction_id: String
unique_transaction_id: String
account_ref: String
otm_identifier: String
tag_reserved_4: String
initiator: String
databases:
dbo:
tables:
tr_live:
partition_by: tr_date_id
primary_key: gmt_time
settings: allow_nullable_key=1
@@ -3,6 +3,7 @@

public enum ClickHouseSinkConnectorConfigVariables {

DEFAULT_COLUMN_DATATYPE_MAPPING("default_column_datatype_mapping"),
IGNORE_DELETE("ignore_delete"),
THREAD_POOL_SIZE("thread.pool.size"),
BUFFER_COUNT("buffer.count"),
@@ -20,7 +20,6 @@
import org.apache.kafka.connect.data.Struct;

import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.LinearRing;
import org.locationtech.jts.geom.Polygon;
import org.locationtech.jts.io.ParseException;
import org.locationtech.jts.io.WKBReader;