Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX #19386 & #19388: Fixing Data Insights index mapping #19423

Merged
merged 7 commits into from
Jan 23, 2025

Conversation

IceS2
Copy link
Contributor

@IceS2 IceS2 commented Jan 17, 2025

This PR aims to solve three things:

  1. First and foremost, it aims to make the Data Insights indexes more reliable, by using more static mapping, in order to rely less on the dynamic mapping.
  2. Meanwhile it also exposes a command for OpenMetadataOps to recreate and backfill Data Insights indexes.
  3. Finally, it aims to reduce the amount of data that we are indexing, by reducing the amount of attributes in the data insights documents.

Fix #19386
Fix #19388

Type of change:

  • Improvement

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

@IceS2 IceS2 changed the title [WIP] Fixing Data Insights index mapping FIX #19386 & #19388: Fixing Data Insights index mapping Jan 21, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically if we want to add new fields/metrics (or support for a new entity) we'll need to add those to this file, is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to support a new entity actually we have a list on the Java Application itself (It'd be a good idea to remove it and basically support based on the list)

But yes, in order to add any new fields, we'd need to add it there. That way we'd control the amount of data indexed and also we're fetching the mapping that we define in our not-data-insights indexes to reduce the amount of dynamic mapping

@@ -513,6 +517,104 @@ private int executeSearchReindexApp(
return result;
}

@Command(
Copy link
Contributor

@mohityadav766 mohityadav766 Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should allow the backfill configuration here to be supplied if needed ,
else only way would be to use UI to set it and then run here if it needs to be run with some different config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already there:

  @Option(
          names = {"--start-date"},
          description = "Start Date to backfill from.")
      String startDate,
  @Option(
          names = {"--end-date"},
          description = "End Date to backfill to.")
      String endDate) {

Copy link

@IceS2 IceS2 merged commit 901063b into main Jan 23, 2025
18 of 20 checks passed
@IceS2 IceS2 deleted the fix-data-insights-mapping branch January 23, 2025 09:02
Copy link
Contributor

Changes have been cherry-picked to the 1.6.3 branch.

github-actions bot pushed a commit that referenced this pull request Jan 23, 2025
* Fixing Data Insights index mapping

* Add OpenMetadataOperations cli endpoint to reindex data insights

* Improve IndexMapTemplate building

* Improve the code a bit

* Fix test

(cherry picked from commit 901063b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Ingestion safe to test Add this label to run secure Github workflows on PRs To release Will cherry-pick this PR into the release branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce the amount of columns we are indexing for Data Insights Fix Mappings for Data Insights
4 participants