All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Upgrade Apiary extensions to 8.0.2 (was 7.3.9). (Glue Listener fix)
- Upgrade yum repos from EMR-5.36.2 (latest EMR 5 version)
- Upgrade HMS to 2.3.9 (was 2.3.7)
- Added `datanucleus.connectionPoolingType` to hive-site.xml, defaults: `BoneCP`
- Added `DATANUCLEUS_CONNECTION_POOLING_TYPE` to support changing the database connection pooling. Valid options are `BoneCP`, `DBCP`, `DBCP2`, `C3P0`, `HikariCP`.
- Added `DATANUCLEUS_CONNECTION_POOL_MAX_POOLSIZE` - Maximum pool size for the connection pool.
- Added `DATANUCLEUS_CONNECTION_POOL_MIN_POOLSIZE` - Minimum pool size for the connection pool.
- Added `DATANUCLEUS_CONNECTION_POOL_INITIAL_POOLSIZE` - Initial pool size for the connection pool (C3P0 only).
- Added `DATANUCLEUS_CONNECTION_POOL_MAX_IDLE` - Maximum idle connections for the connection pool.
- Added `DATANUCLEUS_CONNECTION_POOL_MIN_IDLE` - Minimum idle connections for the connection pool.
- Added `DATANUCLEUS_CONNECTION_POOL_MIN_ACTIVE` - Maximum active connections for the connection pool (DBCP/DBCP2 only).
- Added `DATANUCLEUS_CONNECTION_POOL_MAX_WAIT` - Maximum wait time for the connection pool (DBCP/DBCP2 only).
- Added `DATANUCLEUS_CONNECTION_POOL_VALIDATION_TIMEOUT` - Validation timeout for the connection pool (DBCP/DBCP2/HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_LEAK_DETECTION_THRESHOLD` - Leak detection threshold for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_LEAK_MAX_LIFETIME` - Maximum lifetime for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_AUTO_COMMIT` - Auto commit for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_IDLE_TIMEOUT` - Idle timeout for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_CONNECTION_WAIT_TIMEOUT` - Connection wait timeout for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_READ_ONLY` - Read only mode for the connection pool (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_NAME` - Connection pool name (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_CATALOG` - Connection pool catalog (HikariCP only).
- Added `DATANUCLEUS_CONNECTION_POOL_REGISTER_MBEANS` - Register MBeans for the connection pool (HikariCP only).
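
Because several of the pooling variables above apply only to specific pool implementations, it can help to check a deployment's environment before starting the container. The following is a minimal sketch, not the image's actual startup logic: the helper names `resolve_pooling_type` and `mismatched_options` are hypothetical, and the mapping covers only a subset of the pool-specific variables listed in this changelog.

```python
# Pool types listed in the entry for DATANUCLEUS_CONNECTION_POOLING_TYPE.
VALID_POOLING_TYPES = ("BoneCP", "DBCP", "DBCP2", "C3P0", "HikariCP")
# Default stated for datanucleus.connectionPoolingType.
DEFAULT_POOLING_TYPE = "BoneCP"

# Subset of pool-specific variables, taken from the "(... only)" notes above.
POOL_SPECIFIC = {
    "DATANUCLEUS_CONNECTION_POOL_INITIAL_POOLSIZE": {"C3P0"},
    "DATANUCLEUS_CONNECTION_POOL_MAX_WAIT": {"DBCP", "DBCP2"},
    "DATANUCLEUS_CONNECTION_POOL_VALIDATION_TIMEOUT": {"DBCP", "DBCP2", "HikariCP"},
    "DATANUCLEUS_CONNECTION_POOL_LEAK_DETECTION_THRESHOLD": {"HikariCP"},
    "DATANUCLEUS_CONNECTION_POOL_IDLE_TIMEOUT": {"HikariCP"},
}


def resolve_pooling_type(env):
    """Return the pooling type from an environment mapping, or the default.

    Hypothetical helper: raises ValueError for a value not listed above.
    """
    value = env.get("DATANUCLEUS_CONNECTION_POOLING_TYPE", DEFAULT_POOLING_TYPE)
    if value not in VALID_POOLING_TYPES:
        raise ValueError(f"Unsupported pooling type: {value!r}")
    return value


def mismatched_options(env):
    """List variables that are set but documented for a different pool type."""
    pool = resolve_pooling_type(env)
    return sorted(
        name
        for name, pools in POOL_SPECIFIC.items()
        if name in env and pool not in pools
    )
```

Passing `os.environ` as `env` would check the current process environment; the sketch only reports mismatches and makes no claim about how the metastore treats them.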
- Added `MYSQL_DRIVER_JAR` to add the driver connector JAR to the system classpath. By default it is now using `/usr/share/java/mysql-connector-java.jar`.
- Switch from mariadb driver to default mysql driver. (Override settings to keep using mariadb driver).
- Added `MYSQL_CONNECTION_DRIVER_NAME` to support using a different connection driver, defaults: `com.mysql.jdbc.Driver`.
- Added `MYSQL_TYPE` to support using a different type of MySQL, defaults: `mysql`.
- Added `mysql-connector-java` to support using the driver `com.mysql.jdbc.Driver`.
- Upgraded `APIARY_EXTENSIONS_VERSION` to `7.3.9` (was `7.3.8`).
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.9` (was `7.3.8`).
- Enables JMX (Java Management Extensions) on Hadoop clients, allowing for remote monitoring and management of JVM-related metrics
- CloudWatch metrics in favour of JMX Prometheus Exporter.
- Enable prometheus jmx agent when running on ECS by exporting `EXPORTER_OPTS`
- Added snapshot.yaml for pushing docker image from feature branch.
- Safeguard AWS account id call to prevent incorrect DB locations.
- Upgrade Maven version from `3.9.3` to `3.9.4` as the older version is no longer supported (https://dlcdn.apache.org/maven/maven-3/).
- issue-118 Added variable `ENABLE_HIVE_LOCK_HOUSE_KEEPER` to support hive lock house keeper. See more details here: apache/iceberg#2301
- Added variable `MAX_REQUEST_SIZE` to optionally increase the request size when sending records to Kafka.
- Upgraded `APIARY_EXTENSIONS_VERSION` to `7.3.8` (was `7.3.7`).
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.8` (was `7.3.7`).
- Added variable `KAFKA_COMPRESSION_TYPE` to optionally add compression type when sending Metastore events to Kafka through apiary-metastore-listener library.
- Upgraded `APIARY_EXTENSIONS_VERSION` to `7.3.7` (was `7.3.4`).
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.7` (was `7.3.6`).
- Added variable `LIMIT_PARTITION_REQUEST_NUMBER` to protect the cluster. It controls how many partitions can be scanned for each partitioned table. The default value "-1" means no limit. The limit on partitions does not affect metadata-only queries.
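
The semantics of the limit described above can be illustrated with a small sketch. This is not the metastore's implementation: the function name `partition_request_allowed` is hypothetical, and treating a request exactly equal to the limit as allowed is an assumption.

```python
def partition_request_allowed(requested_partitions, limit=-1):
    """Return True if a scan of `requested_partitions` is within the limit.

    A negative limit (the documented default is -1) means no limit.
    Assumption: a request equal to the limit is still allowed.
    """
    if limit < 0:
        return True
    return requested_partitions <= limit
```

For example, with a limit of 1000 a query that would scan 5000 partitions of one table is rejected, while with the default of -1 any number of partitions is accepted.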
- Upgraded github actions ubuntu runner to `22.04` (was `18.04`).
- Set `amazonlinux` version to `2` (was `latest`).
- Upgraded mvn version to `3.9.3` (was `3.6.3`).
- Variable `MYSQL_SECRET_USERNAME_KEY` for pulling aws credentials where the key is set to something other than `username`. Defaults to `username`.
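
As an illustration of how a configurable username key might be applied, here is a minimal sketch. It assumes the secret is stored as a JSON object and that the password lives under a `password` key; both are assumptions, and `credentials_from_secret` is a hypothetical helper rather than part of this image.

```python
import json


def credentials_from_secret(secret_string, env):
    """Extract (username, password) from a JSON secret string.

    The username key comes from MYSQL_SECRET_USERNAME_KEY and defaults to
    "username", as stated in the changelog entry. The "password" key and the
    JSON layout of the secret are assumptions made for this sketch.
    """
    username_key = env.get("MYSQL_SECRET_USERNAME_KEY", "username")
    secret = json.loads(secret_string)
    return secret[username_key], secret["password"]
```

With no override the default key `username` is read; setting `MYSQL_SECRET_USERNAME_KEY=user` would read the username from a `user` field instead.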
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.6` (was `7.3.5`). It fixes a bug in sortOrders when syncing up Iceberg tables.
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.5` (was `7.3.4`). It fixes a bug in parsing the table parameter `lastAccessTime` when syncing up Iceberg tables.
- Upgraded `APIARY_EXTENSIONS_VERSION` to `7.3.4` (was `6.0.1`).
- Upgraded `APIARY_GLUESYNC_LISTENER_VERSION` to `7.3.4` (was `7.3.0`).
- LDAP credentials can now be loaded directly using `LDAP_USERNAME` and `LDAP_PASSWORD`; this is useful for loading them from Vault.
- Upgrade `apiary-gluesync-listener` version to `7.3.0` (was `4.2.0`).
- Add ability to configure size of HMS MySQL connection pool, and configure stats computation on table/partition creation.
- Upgrade EMR repository to version `5.31.0` (was `5.30.2`) so that the `AWS SDK for Java` library is upgraded to `1.11.852`, which enables AWS web identity token file authentication using hadoop and public constructors.
- Enable authentication via `WebIdentityTokenCredentialsProvider`.
- Upgrade EMR repository to version `5.30.2` (was `5.24.0`) so that the `AWS SDK for Java` library is upgraded to `1.11.759`, which supports authentication using an IAM role via an OIDC web identity token file (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html).
- Modified log4j2 security script to reduce container startup time.
- Added script to find and remove vulnerable log4j2 classes in order to mitigate security issue CVE-2021-44228.
- Allow override of `hive.metastore.disallow.incompatible.col.type.changes=true` property.
- Remove Atlas MetaStore listener in favor of internal processes that subscribe to the Kafka HMS event listener and push changes to Ranger.
Note: This release is a BREAKING change that removes all support for the Apache Atlas HMS listener.
- Enabled ranger audit log summarization.
- Add `allow-grant.sh` to main container.
- Add `db-iam-user.sh` to main container.
- Removed `initContainer` in favor of a single image.
- Issue-165 Add init container dockerfile for supporting air-gapped environments.
- Create Hive database `apiary_system` on startup. Data for Ranger access logs goes to bucket `<prefix>-apiary-system` in Parquet format. This is pre-work to prepare for Ranger access-log Hive tables in a future version of Apiary.
- Enable caller to set min and max size of the Hive metastore thread pool. If not set, defaults to 200/1000 (Hive defaults).
- If S3 access logs are enabled in `apiary-data-lake`, create Hive database `s3_logs_hive` on startup. Raw logs go to bucket `<prefix>-s3-logs` and Hive Parquet data to bucket `<prefix>-s3-logs-hive`. This is pre-work to prepare for S3 access-log Hive tables in a future version of Apiary.
- Updated `apiary-metastore-listener` version to `6.0.1` (was `6.0.0`).
- If S3 Inventory is enabled in `apiary-data-lake`, create Hive `s3_inventory` database on startup.
- Add script `/s3_inventory_repair.sh` which can be used as the entrypoint of this Docker image to create and repair S3 inventory tables in the inventory database (if S3 inventory is enabled). The intent is to run the image this way on a scheduled basis in Kubernetes after AWS creates new inventory partition files in S3 each day.
- Updated `apiary-metastore-listener` and `kafka-metastore-listener` versions to `6.0.0` (was `5.0.2`).
- Enable Prometheus exporter when running on Kubernetes instead of sending metrics to CloudWatch.
- Added an optional Apiary metastore listener which can be used to send Hive metadata events to a Kafka topic.
- Updated `apiary-metastore-listener` version to `5.0.2` (was `4.2.0`).
- Set EKS hostname to ECS_TASK_ID required for enabling metastore metrics.
- Use HTTPS for the Maven Central repository, as it no longer supports insecure communication over plain HTTP.
- Fix Ranger Solr auditing by upgrading `apiary-extensions` version to `5.0.1` (was `5.0.0`)
- Atlas cluster name is set to Apiary `ATLAS_CLUSTER_NAME` env variable when using Atlas plugin. If not set, will default to `INSTANCE_NAME` var.
- Update Ranger version to `2.0.0` (was `1.1.0`).
- Update Ranger metastore plugin to `5.0.0` (was `4.2.0`).
- Support Ranger audit-only mode for read-only HMS endpoint when audit destination is SOLR.
- Add Atlas hive-bridge metastore listener, to send metadata events to Kafka.
- Set DefaultAWSCredentialsProviderChain as default hadoop-aws credential provider.
- Updated `emr-apps.repo` to `5.24.0` (was `5.15.0`).
- Updated `emr-platform.repo` to `1.17.0` (was `1.6.0`).
- Upgrade Hive to `2.3.4` (was `2.3.3`) in order to fix https://issues.apache.org/jira/browse/HIVE-18767 - see #59 (Hive version is controlled by the version of `emr-apps.repo`).
- If Ranger is configured on the metastore, the read-only instance of the metastore will be configured for audit-only by using `ApiaryRangerAuthAllAccessPolicyProvider` in apiary-metastore-ranger-plugin
- ReadOnlyAuth Pre Event Listener to manage Hive database whitelist in read-only metastores apiary-metastore-extensions.
- Support for `_` in `HIVE_DB_NAMES` variable. Fixes #5 (ExpediaGroup/apiary#5).
- Updated apiary-metastore-listener to 4.0.0 (was 1.1.0).
- Updated apiary-gluesync-listener to 4.0.0 (was 1.1.0).
- Updated apiary-ranger-plugin to 4.0.0 (was 1.1.0).
- Updated apiary-metastore-metrics to 4.0.0 (was 1.1.0).
- Updated apiary-metastore-auth to 4.0.0 (was 1.1.0).
- Auto configure Hive metastore heapsize when running on ECS.
- Replace EMRFS with hadoop-aws S3A libraries.
- Option to send metastore metrics to CloudWatch - see #4.
- Refactor Environment variable names.
- Migrate secrets from Hashicorp Vault to AWS SecretsManager.
- Update startup script to configure Log4j, to fix sending Hive Metastore logs to CloudWatch.
- Deploy RangerAuth Pre Event Listener from apiary-metastore-extensions.
- Deploy GlueSync Listener from apiary-metastore-extensions.
- Deploy SNS Listener from apiary-metastore-extensions.
- Additional check to support external MySQL database for Hive Metastore, required to implement #48.
- Fix to update cacerts for Java.
- Fix Hive Metastore logging.