Commit

Merge pull request #10 from CDOT-CV/jpo-deduplicator-removal
Jpo deduplicator removal
Michael7371 authored Jan 24, 2025
2 parents 2596542 + 551e09f commit 7183174
Showing 59 changed files with 11 additions and 4,143 deletions.
28 changes: 0 additions & 28 deletions .github/workflows/ci.yml

This file was deleted.

22 changes: 1 addition & 21 deletions .github/workflows/docker.yml
@@ -4,27 +4,7 @@ on:
   pull_request:
     types: [opened, synchronize, reopened]

-jobs:
-  jpo-deduplicator:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v3
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
-      - name: Build
-        uses: docker/build-push-action@v3
-        with:
-          context: jpo-deduplicator
-          build-args: |
-            MAVEN_GITHUB_TOKEN_NAME=${{ vars.MAVEN_GITHUB_TOKEN_NAME }}
-            MAVEN_GITHUB_TOKEN=${{ secrets.MAVEN_GITHUB_TOKEN }}
-            MAVEN_GITHUB_ORG=${{ github.repository_owner }}
-          secrets: |
-            MAVEN_GITHUB_TOKEN: ${{ secrets.MAVEN_GITHUB_TOKEN }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-
+jobs:
   jpo-jikkou:
     runs-on: ubuntu-latest
     steps:
34 changes: 1 addition & 33 deletions .github/workflows/dockerhub.yml
@@ -7,39 +7,7 @@ on:
       - "master"
       - "release/*"

-jobs:
-  dockerhub-jpo-deduplicator:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v3
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
-      - name: Login to DockerHub
-        uses: docker/login-action@v2
-        with:
-          username: ${{ secrets.DOCKERHUB_USERNAME }}
-          password: ${{ secrets.DOCKERHUB_TOKEN }}
-
-      - name: Replace Docker tag
-        id: set_tag
-        run: echo "TAG=$(echo ${GITHUB_REF##*/} | sed 's/\//-/g')" >> $GITHUB_ENV
-
-      - name: Build
-        uses: docker/build-push-action@v3
-        with:
-          context: jpo-deduplicator
-          push: true
-          tags: usdotjpoode/jpo-deduplicator:${{ env.TAG }}
-          build-args: |
-            MAVEN_GITHUB_TOKEN_NAME=${{ vars.MAVEN_GITHUB_TOKEN_NAME }}
-            MAVEN_GITHUB_TOKEN=${{ secrets.MAVEN_GITHUB_TOKEN }}
-            MAVEN_GITHUB_ORG=${{ github.repository_owner }}
-          secrets: |
-            MAVEN_GITHUB_TOKEN: ${{ secrets.MAVEN_GITHUB_TOKEN }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-
+jobs:
   dockerhub-jpo-jikkou:
     runs-on: ubuntu-latest
     steps:
68 changes: 0 additions & 68 deletions README.md
@@ -21,10 +21,6 @@ The JPO ITS utilities repository serves as a central location for deploying open
- [Configuration](#configuration)
- [Configure Kafka Connector Creation](#configure-kafka-connector-creation)
- [Quick Run](#quick-run-2)
- [5. jpo-deduplicator](#5-jpo-deduplicator)
- [Deduplication Config](#deduplication-config)
- [Generate a Github Token](#generate-a-github-token)
- [Quick Run](#quick-run-3)
- [Security Notice](#security-notice)


@@ -190,70 +186,6 @@ The following environment variables can be used to configure Kafka Connectors:
3. Click `OdeBsmJson`, and now you should see your message!
8. Feel free to test this with other topics or by producing to these topics using the [ODE](https://github.com/usdot-jpo-ode/jpo-ode)


<a name="deduplicator"></a>

## 5. jpo-deduplicator
The JPO-Deduplicator is a Kafka Java Spring Boot application designed to reduce the number of messages stored and processed in the ODE system. It reads messages from an input topic (such as topic.ProcessedMap) and outputs a subset of those messages on a related output topic (topic.DeduplicatedProcessedMap). Functionally, it removes duplicate messages from the input topic and passes on only unique messages. In addition, each topic passes on at least one message per hour even if that message is a duplicate, which helps ensure messages are still flowing through the system. The following topics currently support deduplication:

- topic.ProcessedMap -> topic.DeduplicatedProcessedMap
- topic.ProcessedMapWKT -> topic.DeduplicatedProcessedMapWKT
- topic.OdeMapJson -> topic.DeduplicatedOdeMapJson
- topic.OdeTimJson -> topic.DeduplicatedOdeTimJson
- topic.OdeRawEncodedTIMJson -> topic.DeduplicatedOdeRawEncodedTIMJson
- topic.OdeBsmJson -> topic.DeduplicatedOdeBsmJson
- topic.ProcessedSpat -> topic.DeduplicatedProcessedSpat
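
The forwarding rule described above (drop duplicates, but still pass at least one message per hour) can be sketched roughly as follows. This is an illustrative Python sketch only, not the deduplicator's actual Java implementation: the `Deduplicator` class, `should_forward` method, and the use of exact string equality to detect duplicates are all assumptions made for the example, and the real application may compare messages differently for each topic.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Tuple

# "At least 1 message per hour", per the description above.
FORWARD_INTERVAL_SECONDS = 3600


@dataclass
class Deduplicator:
    """Hypothetical helper: decides whether a message should be forwarded."""

    # key -> (content of the last forwarded message, time it was forwarded)
    _last_forwarded: Dict[Hashable, Tuple[str, float]] = field(default_factory=dict)

    def should_forward(self, key: Hashable, content: str, now: float) -> bool:
        previous = self._last_forwarded.get(key)
        if previous is not None:
            last_content, last_time = previous
            # Assumption: a duplicate is a message with identical content.
            is_duplicate = content == last_content
            within_interval = now - last_time < FORWARD_INTERVAL_SECONDS
            if is_duplicate and within_interval:
                return False  # duplicate seen within the last hour: drop it
        # New content, or an hour has passed: forward and remember it.
        self._last_forwarded[key] = (content, now)
        return True
```

In this sketch `key` stands in for whatever identifies a message source (for example an intersection or device ID), and `now` is passed in explicitly so the behavior is easy to test.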

### Deduplication Config

When running the jpo-deduplicator as a submodule in jpo-utils, the deduplicator automatically enables or disables each algorithm depending on whether the corresponding subcomponent is also active. For example, if the KAFKA_TOPIC_CREATE_GEOJSONCONVERTER environment variable is set to true, the deduplicator will start performing deduplication for ProcessedMap, ProcessedMapWKT, and ProcessedSpat data. If KAFKA_TOPIC_CREATE_GEOJSONCONVERTER is set to false, the deduplicator will disable deduplication for those same topics. To manually configure deduplication for a topic, the following environment variables can also be used. If no value is passed for a given environment variable, the corresponding deduplication algorithm will default to enabled.

| Environment Variable | Description |
|---|---|
| `ENABLE_PROCESSED_MAP_DEDUPLICATION` | `true` / `false` - Enable ProcessedMap message Deduplication |
| `ENABLE_PROCESSED_MAP_WKT_DEDUPLICATION` | `true` / `false` - Enable ProcessedMap WKT message Deduplication |
| `ENABLE_ODE_MAP_DEDUPLICATION` | `true` / `false` - Enable ODE MAP message Deduplication |
| `ENABLE_ODE_TIM_DEDUPLICATION` | `true` / `false` - Enable ODE TIM message Deduplication |
| `ENABLE_ODE_RAW_ENCODED_TIM_DEDUPLICATION` | `true` / `false` - Enable ODE Raw Encoded TIM Deduplication |
| `ENABLE_PROCESSED_SPAT_DEDUPLICATION` | `true` / `false` - Enable ProcessedSpat Deduplication |
| `ENABLE_ODE_BSM_DEDUPLICATION` | `true` / `false` - Enable ODE BSM Deduplication |
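
The "default to enabled" behavior can be illustrated with a small Python sketch. The `is_enabled` helper is hypothetical, and treating an empty value as unset and comparing case-insensitively are assumptions; the deduplicator itself is a Java application and may parse these variables differently.

```python
import os
from typing import Mapping


def is_enabled(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True unless the variable is explicitly set to something other than 'true'."""
    value = env.get(name)
    # Assumption: an unset or empty variable falls back to the default (enabled).
    if value is None or value.strip() == "":
        return True
    return value.strip().lower() == "true"
```

For example, `is_enabled("ENABLE_ODE_BSM_DEDUPLICATION", {})` returns `True`, while passing `{"ENABLE_ODE_BSM_DEDUPLICATION": "false"}` returns `False`.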

### Generate a Github Token

A GitHub token is required to pull artifacts from GitHub repositories. This is required to obtain the jpo-deduplicator jars and must be done before attempting to build this repository.

1. Log into GitHub.
2. Navigate to Settings -> Developer settings -> Personal access tokens.
3. Click "New personal access token (classic)".
    1. As of now, GitHub does not support `Fine-grained tokens` for obtaining packages.
4. Provide a name and expiration for the token.
5. Select the `read:packages` scope.
6. Click "Generate token" and copy the token.
7. Copy the token name and token value into your `.env` file.

For local development, the following steps are also required:

8. Create a copy of [settings.xml](jpo-deduplicator/jpo-deduplicator/settings.xml) and save it to `~/.m2/settings.xml`
9. Update the variables in your `~/.m2/settings.xml` with the token value and target jpo-ode organization.

### Quick Run
1. Create a copy of `sample.env` and rename it to `.env`.
2. Update the variable `MAVEN_GITHUB_TOKEN` to a GitHub token used for downloading jar file dependencies. For full instructions on how to generate a token please see here:
3. Set the `MONGO_ADMIN_DB_PASS` and `MONGO_READ_WRITE_PASS` environment variables to a secure password.
4. Set the `COMPOSE_PROFILES` variable to: `kafka,kafka_ui,kafka_setup, jpo-deduplicator`
5. Navigate back to the root directory and run the following command: `docker compose up -d`
6. Produce a sample message to one of the input topics using `kafka_ui`:
    1. Go to `localhost:8001`
    2. Click local -> Topics
    3. Select `topic.OdeMapJson`
    4. Select `Produce Message`
    5. Copy in sample JSON for a Map Message
    6. Click `Produce Message` multiple times
7. View the deduplicated message in `kafka_ui`:
    1. Go to `localhost:8001`
    2. Click local -> Topics
    3. Select `topic.DeduplicatedOdeMapJson`
    4. You should now see only one copy of the map message sent.

[Back to top](#toc)

## Security Notice
41 changes: 0 additions & 41 deletions docker-compose-deduplicator.yml

This file was deleted.

3 changes: 1 addition & 2 deletions docker-compose.yml
@@ -1,5 +1,4 @@
 include:
   - docker-compose-connect.yml
   - docker-compose-mongo.yml
-  - docker-compose-kafka.yml
-  - docker-compose-deduplicator.yml
+  - docker-compose-kafka.yml
3 changes: 2 additions & 1 deletion docs/Release_notes.md
@@ -40,4 +40,5 @@ USDOT PR 23: Adding missing ode topics
 USDOT PR 24: Index updates
 CDOT PR 6: Adding MEC Deposit Resources
 USDOT PR 25: Updating version for kafka ui to latest release
-CDOT PR 7: Tim compatibility and CI updates
+CDOT PR 7: Tim compatibility and CI updates
+CDOT PR 8: Jpo deduplicator removal
1 change: 0 additions & 1 deletion jpo-deduplicator/.dockerignore

This file was deleted.

48 changes: 0 additions & 48 deletions jpo-deduplicator/Dockerfile

This file was deleted.
