Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Obs AI Assistant] Add uuid to knowledge base entries to avoid overwriting accidentally #191043

Open
wants to merge 34 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
e627b85
[Obs AI Assistant] Add uuid to knowledge base entries to avoid overwr…
sorenlouv Aug 22, 2024
ede9593
Fix issues
sorenlouv Aug 22, 2024
4f5cb94
Fix user instruction test
sorenlouv Aug 26, 2024
bafa18b
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
neptunian Aug 27, 2024
14854d2
Fix test
sorenlouv Aug 27, 2024
f6b8303
Cleanup
sorenlouv Aug 27, 2024
f88d826
Improve types
sorenlouv Aug 27, 2024
f42f1ec
i18n
sorenlouv Aug 27, 2024
b718ae5
Remove unused imports
sorenlouv Aug 28, 2024
59a9362
Re-add `ShortIdTable`
sorenlouv Aug 28, 2024
51cb688
Change `recall` to not return entries as nested object
sorenlouv Aug 28, 2024
51495d2
Rename `groupId` back to `docId` to reduce diff
sorenlouv Aug 28, 2024
5fdcc4d
Add test for shortIdTable
sorenlouv Aug 28, 2024
d470e39
Revert change to reduce diff
sorenlouv Aug 28, 2024
fb7370b
Keep original id
sorenlouv Aug 28, 2024
c63a8f9
Omit “User” suffix from API client methods
sorenlouv Aug 28, 2024
a6d55ed
Type fix
sorenlouv Aug 28, 2024
b3f7d3a
Fix categorization bug
sorenlouv Aug 28, 2024
b20bd1b
Revert changes to files
sorenlouv Aug 28, 2024
0f531e4
Add functional test
sorenlouv Aug 29, 2024
63b171d
Merge branch 'main' of github.com:elastic/kibana into add-uuid-to-kb-…
sorenlouv Aug 29, 2024
08aa578
i18n
sorenlouv Aug 29, 2024
ad425be
Revert tsconfig changes
sorenlouv Aug 29, 2024
6da17f3
Merge branch 'main' of github.com:elastic/kibana into add-uuid-to-kb-…
sorenlouv Aug 29, 2024
90b5484
Include KB entries when they contain contradicting info
sorenlouv Aug 29, 2024
923681c
[CI] Auto-commit changed files from 'node scripts/build_plugin_list_d…
kibanamachine Aug 29, 2024
03e8fe6
Fix tsc and jest
sorenlouv Aug 30, 2024
9b2cf9b
Improve functional test
sorenlouv Aug 30, 2024
5ea4ac2
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
sorenlouv Aug 31, 2024
91c84d2
Merge branch 'main' into add-uuid-to-kb-entries-to-avoid-overwriting
sorenlouv Sep 2, 2024
b7d2f8e
Merge branch 'main' of github.com:elastic/kibana into add-uuid-to-kb-…
sorenlouv Oct 30, 2024
1d6beef
Merge branch 'main' of github.com:elastic/kibana into add-uuid-to-kb-…
sorenlouv Oct 31, 2024
fc0a970
Fix tsc
sorenlouv Oct 31, 2024
0aaf657
Fix serverless test
sorenlouv Oct 31, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -81,9 +81,10 @@ export type ConversationUpdateRequest = ConversationRequestBase & {

export interface KnowledgeBaseEntry {
'@timestamp': string;
id: string;
id: string; // unique ID
doc_id?: string; // human readable ID generated by the LLM and used by the LLM to lookup and update existing entries. TODO: rename `doc_id` to `lookup_id`
Copy link
Member Author

@sorenlouv sorenlouv Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id is globally unique, doc_id is only unique per user. Multiple entries can be assigned the same doc_id if they are created for different users.

title?: string;
text: string;
doc_id: string;
confidence: 'low' | 'medium' | 'high';
is_correction: boolean;
type?: 'user_instruction' | 'contextual';
Expand All @@ -96,12 +97,12 @@ export interface KnowledgeBaseEntry {
}

export interface Instruction {
doc_id: string;
id: string;
text: string;
}

export interface AdHocInstruction {
doc_id?: string;
id?: string;
Comment on lines -104 to +105
Copy link
Member Author

@sorenlouv sorenlouv Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc_id can be used by the LLM to lookup entries. I see no reason to expand that concept to instructions. instructions can still have pre-determined id's - they do not have to be UUIDs. See the lens docs for an example of this

text: string;
instruction_type: 'user_instruction' | 'application_instruction';
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,16 @@
import { ShortIdTable } from './short_id_table';

describe('shortIdTable', () => {
it('generates a short id from a uuid', () => {
const table = new ShortIdTable();

const uuid = 'd877f65c-4036-42c4-b105-19e2f1a1c045';
const shortId = table.take(uuid);

expect(shortId.length).toBe(4);
expect(table.lookup(shortId)).toBe(uuid);
});

it('generates at least 10k unique ids consistently', () => {
const ids = new Set();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,9 +52,9 @@ const schema: RootSchema<RecallRanking> = {
},
};

export const RecallRankingEventType = 'observability_ai_assistant_recall_ranking';
export const recallRankingEventType = 'observability_ai_assistant_recall_ranking';

export const recallRankingEvent: EventTypeOpts<RecallRanking> = {
eventType: RecallRankingEventType,
eventType: recallRankingEventType,
schema,
};
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
* 2.0.
*/

import { KnowledgeBaseType } from '../../common/types';
import { v4 } from 'uuid';
import type { FunctionRegistrationParameters } from '.';
import { KnowledgeBaseEntryRole } from '../../common';

Expand All @@ -14,6 +14,7 @@ export const SUMMARIZE_FUNCTION_NAME = 'summarize';
export function registerSummarizationFunction({
client,
functions,
resources,
}: FunctionRegistrationParameters) {
functions.registerFunction(
{
Expand All @@ -31,7 +32,7 @@ export function registerSummarizationFunction({
id: {
type: 'string',
description:
'An id for the document. This should be a short human-readable keyword field with only alphabetic characters and underscores, that allow you to update it later.',
'A lookup id for the document. This should be a short human-readable keyword field with only alphabetic characters and underscores, that allow you to find and update it later.',
},
text: {
type: 'string',
Expand Down Expand Up @@ -62,21 +63,31 @@ export function registerSummarizationFunction({
],
},
},
(
{ arguments: { id, text, is_correction: isCorrection, confidence, public: isPublic } },
async (
{ arguments: { id: docId, text, is_correction: isCorrection, confidence, public: isPublic } },
signal
) => {
// The LLM should be able to update an existing entry by providing the same doc_id
// if no existing entry is found, we generate a uuid
const id = await client.getUuidFromDocId(docId);
Copy link
Member Author

@sorenlouv sorenlouv Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLM will (blindly) suggest a doc_id, without any information about existing entries. With the doc_id we can retrieve the _id. It does work but I don't like it very much because the LLM does not consistently produce the same doc_id's even when it should.

A better approach might be to get rid of doc_id entirely. We already provide the LLM with relevant entries via recall. By improving the recall to also include contradicting entries (which I've done in this PR) the LLM should be able to get the _id for the existing entry and use that in order to update it.


resources.logger.debug(
id
? `Updating knowledge base entry with id: ${id}, doc_id: ${docId}`
: `Creating new knowledge base entry with doc_id: ${docId}`
);

return client
.addKnowledgeBaseEntry({
entry: {
doc_id: id,
role: KnowledgeBaseEntryRole.AssistantSummarization,
id,
id: id ?? v4(),
doc_id: docId,
title: docId, // use doc_id as title for now
text,
is_correction: isCorrection,
type: KnowledgeBaseType.Contextual,
confidence,
public: isPublic,
role: KnowledgeBaseEntryRole.AssistantSummarization,
confidence,
is_correction: isCorrection,
labels: {},
},
// signal,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ const chatCompleteBaseRt = t.type({
]),
instructions: t.array(
t.intersection([
t.partial({ doc_id: t.string }),
t.partial({ id: t.string }),
Copy link
Member Author

@sorenlouv sorenlouv Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still possible to overwrite existing instructions by specifying the id

t.type({
text: t.string,
instruction_type: t.union([
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
import { notImplemented } from '@hapi/boom';
import { nonEmptyStringRt, toBooleanRt } from '@kbn/io-ts-utils';
import * as t from 'io-ts';
import { v4 } from 'uuid';
import { FunctionDefinition } from '../../../common/functions/types';
import { KnowledgeBaseEntryRole } from '../../../common/types';
import type { RecalledEntry } from '../../service/knowledge_base_service';
Expand Down Expand Up @@ -114,19 +115,19 @@ const functionRecallRoute = createObservabilityAIAssistantServerRoute({
throw notImplemented();
}

return client.recall({ queries, categories });
const entries = await client.recall({ queries, categories });
return { entries };
},
});

const functionSummariseRoute = createObservabilityAIAssistantServerRoute({
endpoint: 'POST /internal/observability_ai_assistant/functions/summarize',
params: t.type({
body: t.type({
id: t.string,
doc_id: t.string,
text: nonEmptyStringRt,
confidence: t.union([t.literal('low'), t.literal('medium'), t.literal('high')]),
is_correction: toBooleanRt,
type: t.union([t.literal('user_instruction'), t.literal('contextual')]),
public: toBooleanRt,
labels: t.record(t.string, t.string),
}),
Expand All @@ -142,22 +143,21 @@ const functionSummariseRoute = createObservabilityAIAssistantServerRoute({
}

const {
doc_id: docId,
confidence,
id,
is_correction: isCorrection,
type,
text,
public: isPublic,
labels,
} = resources.params.body;

const id = await client.getUuidFromDocId(docId);
return client.addKnowledgeBaseEntry({
entry: {
confidence,
id,
doc_id: id,
id: id ?? v4(),
doc_id: docId,
is_correction: isCorrection,
type,
text,
public: isPublic,
labels,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,12 +13,7 @@ import { notImplemented } from '@hapi/boom';
import { nonEmptyStringRt, toBooleanRt } from '@kbn/io-ts-utils';
import * as t from 'io-ts';
import { createObservabilityAIAssistantServerRoute } from '../create_observability_ai_assistant_server_route';
import {
Instruction,
KnowledgeBaseEntry,
KnowledgeBaseEntryRole,
KnowledgeBaseType,
} from '../../../common/types';
import { Instruction, KnowledgeBaseEntry, KnowledgeBaseEntryRole } from '../../../common/types';

const getKnowledgeBaseStatus = createObservabilityAIAssistantServerRoute({
endpoint: 'GET /internal/observability_ai_assistant/kb/status',
Expand Down Expand Up @@ -108,18 +103,8 @@ const saveKnowledgeBaseUserInstruction = createObservabilityAIAssistantServerRou
}

const { id, text, public: isPublic } = resources.params.body;
return client.addKnowledgeBaseEntry({
entry: {
id,
doc_id: id,
text,
public: isPublic,
confidence: 'high',
is_correction: false,
type: KnowledgeBaseType.UserInstruction,
labels: {},
role: KnowledgeBaseEntryRole.UserEntry,
},
return client.addUserInstruction({
entry: { id, text, public: isPublic },
});
},
});
Expand Down Expand Up @@ -159,17 +144,19 @@ const saveKnowledgeBaseEntry = createObservabilityAIAssistantServerRoute({
body: t.intersection([
t.type({
id: t.string,
title: t.string,
text: nonEmptyStringRt,
}),
t.partial({
doc_id: t.string,
confidence: t.union([t.literal('low'), t.literal('medium'), t.literal('high')]),
is_correction: toBooleanRt,
public: toBooleanRt,
labels: t.record(t.string, t.string),
role: t.union([
t.literal('assistant_summarization'),
t.literal('user_entry'),
t.literal('elastic'),
t.literal(KnowledgeBaseEntryRole.AssistantSummarization),
t.literal(KnowledgeBaseEntryRole.UserEntry),
t.literal(KnowledgeBaseEntryRole.Elastic),
]),
}),
]),
Expand All @@ -186,6 +173,8 @@ const saveKnowledgeBaseEntry = createObservabilityAIAssistantServerRoute({

const {
id,
doc_id: docId,
title,
text,
public: isPublic,
confidence,
Expand All @@ -196,15 +185,15 @@ const saveKnowledgeBaseEntry = createObservabilityAIAssistantServerRoute({

return client.addKnowledgeBaseEntry({
entry: {
id,
text,
doc_id: id,
id,
doc_id: docId,
title,
confidence: confidence ?? 'high',
is_correction: isCorrection ?? false,
type: 'contextual',
public: isPublic ?? true,
labels: labels ?? {},
role: (role as KnowledgeBaseEntryRole) ?? KnowledgeBaseEntryRole.UserEntry,
role: role ?? KnowledgeBaseEntryRole.UserEntry,
},
});
},
Expand Down Expand Up @@ -236,10 +225,15 @@ const importKnowledgeBaseEntries = createObservabilityAIAssistantServerRoute({
params: t.type({
body: t.type({
entries: t.array(
t.type({
id: t.string,
text: nonEmptyStringRt,
})
t.intersection([
t.type({
id: t.string,
text: nonEmptyStringRt,
}),
t.partial({
title: t.string,
}),
])
),
}),
}),
Expand All @@ -253,18 +247,48 @@ const importKnowledgeBaseEntries = createObservabilityAIAssistantServerRoute({
throw notImplemented();
}

const entries = resources.params.body.entries.map((entry) => ({
doc_id: entry.id,
const formattedEntries = resources.params.body.entries.map((entry) => ({
id: entry.id,
title: entry.title,
text: entry.text,
confidence: 'high' as KnowledgeBaseEntry['confidence'],
is_correction: false,
type: 'contextual' as const,
public: true,
labels: {},
role: KnowledgeBaseEntryRole.UserEntry,
...entry,
}));

return await client.importKnowledgeBaseEntries({ entries });
return await client.importKnowledgeBaseEntries({ entries: formattedEntries });
},
});

const importKnowledgeBaseCategoryEntries = createObservabilityAIAssistantServerRoute({
endpoint: 'POST /internal/observability_ai_assistant/kb/entries/category/import',
params: t.type({
body: t.type({
category: t.string,
entries: t.array(
t.type({
id: t.string,
texts: t.array(t.string),
})
),
}),
}),
options: {
tags: ['access:ai_assistant'],
},
handler: async (resources): Promise<void> => {
const client = await resources.service.getClient({ request: resources.request });

if (!client) {
throw notImplemented();
}

const { entries, category } = resources.params.body;

return resources.service.addCategoryToKnowledgeBase(category, entries);
},
});

Expand All @@ -275,6 +299,7 @@ export const knowledgeBaseRoutes = {
...saveKnowledgeBaseUserInstruction,
...getKnowledgeBaseUserInstructions,
...importKnowledgeBaseEntries,
...importKnowledgeBaseCategoryEntries,
...saveKnowledgeBaseEntry,
...deleteKnowledgeBaseEntry,
};
Loading