[Obs AI Assistant] Add uuid to knowledge base entries to avoid overwriting accidentally #191043
Conversation
I've not looked through the code so maybe you took this into account, but we also have the documents that we pre-load into the knowledge base. Those should not have dynamically generated uuids, but predetermined IDs.
```diff
@@ -79,9 +79,10 @@ export type ConversationUpdateRequest = ConversationRequestBase & {

 export interface KnowledgeBaseEntry {
   '@timestamp': string;
-  id: string;
+  id: string; // unique ID
+  doc_id?: string; // human readable ID generated by the LLM and used by the LLM to lookup and update existing entries. TODO: rename `doc_id` to `lookup_id`
```
`id` is globally unique, `doc_id` is only unique per user. Multiple entries can be assigned the same `doc_id` if they are created for different users.
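To make that uniqueness model concrete, here is an illustrative pair of entries (all values are invented for this sketch, not taken from the PR):

```ts
// Illustrative only: two entries created for different users can share a
// doc_id, but each must have a globally unique id.
const userAEntry = {
  '@timestamp': '2024-08-22T10:00:00.000Z',
  id: '4f9c2c1a-0b1e-4c4e-9a63-6a1c2d3e4f5a', // globally unique in Elasticsearch
  doc_id: 'favourite_color', // unique per user only
};

const userBEntry = {
  '@timestamp': '2024-08-22T11:00:00.000Z',
  id: '7d2e5b3c-8f4a-4d6b-b2c1-9e8f7a6b5c4d', // different id...
  doc_id: 'favourite_color', // ...same doc_id, because it belongs to another user
};
```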
```diff
-      doc_id?: string;
+      id?: string;
```
`doc_id` can be used by the LLM to look up entries. I see no reason to expand that concept to instructions. Instructions can still have predetermined IDs; they do not have to be UUIDs. See the Lens docs for an example of this.
```diff
@@ -42,7 +42,7 @@ const chatCompleteBaseRt = t.type({
   ]),
   instructions: t.array(
     t.intersection([
-      t.partial({ doc_id: t.string }),
+      t.partial({ id: t.string }),
```
It's still possible to overwrite existing instructions by specifying the `id`.
```ts
keyword: {
  type: 'keyword',
  ignore_above: 256,
},
```
Adding a nested `keyword` field in order to be able to sort on it. Using a nested `keyword` field is recommended over `fielddata` as it is more performant (it should have been used for `doc_id` as well).
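For reference, this is the usual keyword multi-field pattern: a minimal sketch, assuming the parent field is named `title` (the field name is illustrative, not taken from the PR):

```ts
// Sketch of a text field with a nested `keyword` multi-field for sorting.
// Sorting then targets `title.keyword` instead of enabling fielddata on `title`.
const mappings = {
  properties: {
    title: {
      type: 'text', // analyzed, used for full-text search
      fields: {
        keyword: {
          type: 'keyword', // not analyzed, usable for sorting and aggregations
          ignore_above: 256, // values longer than 256 chars are skipped here
        },
      },
    },
  },
};
```

A sort clause would then look like `sort: [{ 'title.keyword': 'asc' }]`.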
```ts
this.dependencies.logger.debug(
  `Adding ${operations.length} operations to queue. Queue size now: ${this._queue.length})`
);
this._queue.push(...operations);
```
Afaict we had a bug here before: by calling `this._queue.push` conditionally we were not adding operations to the queue when `isModelReady=true`. This meant that anything imported after the model had been set up was being dropped 😱

In general I hope we can get rid of the queue, or separate the queuing logic from the knowledge base. Having the queue embedded makes it more complex to work with the KB than it needs to be.
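A minimal sketch of the bug pattern being described, assuming a simplified queue class (the class shape and names are illustrative, not the actual Kibana code):

```ts
type Operation = { doc: string };

class KnowledgeBaseQueue {
  private queue: Operation[] = [];
  private isModelReady = false;

  // Buggy version: only enqueues while the model is not ready, so anything
  // arriving after setup never reaches the queue and is silently dropped.
  addBuggy(operations: Operation[]) {
    if (!this.isModelReady) {
      this.queue.push(...operations);
    }
  }

  // Fixed version: always enqueue, then drain immediately if the model is ready.
  add(operations: Operation[]) {
    this.queue.push(...operations);
    if (this.isModelReady) {
      this.flush();
    }
  }

  markModelReady() {
    this.isModelReady = true;
    this.flush();
  }

  private flush() {
    while (this.queue.length > 0) {
      const op = this.queue.shift()!;
      // ...index `op` into the knowledge base here
      void op;
    }
  }
}
```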
@dgieselaar Perhaps see this comment #191043 (comment)

Good catch! Fixed in b3f7d3a
```
If the prompt is a statement that should be stored in the knowledge base:
- The document contains information that directly contradicts the user's prompt or previous statements, indicating that it may need to be updated or corrected.
- The document contains outdated user preferences or information that the user indicates they want corrected or replaced.
```
Note: I added this in order for the LLM to include knowledge base entries that contradict the prompt. An example is a knowledge base entry that says "The user's favourite color is red" while the prompt says "My favourite color is blue".

Before adding these lines the LLM would not deem such a document relevant; now it does. The reason we want to include contradictory entries is to let the LLM update/overwrite them. It can only do that if it knows their `doc_id`.

My only worry would be if this leads the LLM to include irrelevant documents in other scenarios.
Flaky Test Runner Stats: 🎉 All tests passed! (kibana-flaky-test-suite-runner#6840) ✅ x-pack/test/observability_ai_assistant_functional/enterprise/config.ts: 25/25 tests passed.
```ts
signal
) => {
  // The LLM should be able to update an existing entry by providing the same doc_id.
  // If no existing entry is found, we generate a uuid.
  const id = await client.getUuidFromDocId(docId);
```
The LLM will (blindly) suggest a `doc_id`, without any information about existing entries. With the `doc_id` we can retrieve the `_id`. It does work but I don't like it very much because the LLM does not consistently produce the same `doc_id`s even when it should.

A better approach might be to get rid of `doc_id` entirely. We already provide the LLM with relevant entries via recall. By improving the recall to also include contradicting entries (which I've done in this PR) the LLM should be able to get the `_id` for the existing entry and use that in order to update it.
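For illustration, the `doc_id` to `_id` lookup could be a term query scoped to the active user. A hypothetical sketch (the index name, field names, and the helper itself are assumptions, not the actual implementation):

```ts
import { Client } from '@elastic/elasticsearch';

async function getUuidFromDocId(
  esClient: Client,
  docId: string,
  userName: string
): Promise<string | undefined> {
  const response = await esClient.search({
    index: '.kibana-observability-ai-assistant-kb', // assumed index name
    size: 1,
    query: {
      bool: {
        filter: [
          { term: { doc_id: docId } }, // the lookup id suggested by the LLM
          { term: { 'user.name': userName } }, // scope to the active user
        ],
      },
    },
  });
  // Return the existing entry's _id if found; the caller generates a uuid otherwise.
  return response.hits.hits[0]?._id;
}
```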
💛 Build succeeded, but was flaky
💚 Build Succeeded
Closes #184069

The Problem

The LLM decides the identifier (both `_id` and `doc_id`) for knowledge base entries. The `_id` must be globally unique in Elasticsearch, but the LLM can easily pick the same id for different users, thereby overwriting one user's learning with another user's.

Solution

The LLM should not pick the `_id`. With this PR a UUID is generated for new entries. The LLM can supply a "lookup_id" (stored as `doc_id` for backwards compatibility) so that if the entry already exists for the currently active user, the LLM will overwrite it.

Another problem was that we conflated the lookup id (aka `doc_id`) with a human readable title. This meant that when users gave entries titles, they could accidentally overwrite other users' entries with the same title.

To solve this, entries now have a dedicated `title` field. For backwards compatibility we fall back to using `doc_id` as the title if no `title` is given.
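Putting the description together, the create/update path could look roughly like this: a hedged sketch using the `uuid` package, where the helper name and field set are illustrative rather than the PR's exact code:

```ts
import { v4 as uuidv4 } from 'uuid';

interface EntryInput {
  doc_id?: string; // "lookup_id" suggested by the LLM, stored as doc_id for backwards compatibility
  title?: string;
  text: string;
}

function toStoredEntry(input: EntryInput, existingId: string | undefined) {
  return {
    // Reuse the existing entry's _id when the per-user lookup matched;
    // otherwise mint a fresh UUID so different users can never collide on ids.
    id: existingId ?? uuidv4(),
    doc_id: input.doc_id,
    // Backwards compatibility: fall back to doc_id as the title when none is given.
    title: input.title ?? input.doc_id,
    text: input.text,
  };
}
```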