feat: Enhance ai-cache Plugin with Vector Similarity-Based LLM Cache Recall and Multi-DB Support #1248

Merged
merged 92 commits into from
Nov 21, 2024
Changes from 87 commits
92 commits
4f7bfbd
fix bugs
johnlanni Jul 31, 2024
0f9e816
fix bugs
Suchun-sv Aug 1, 2024
ff1bce6
fix bugs
Suchun-sv Aug 12, 2024
1e9d42e
init
EnableAsync Aug 15, 2024
f2a9ff6
fix conflict
Suchun-sv Aug 23, 2024
5cbae03
Merge branch 'alibaba:main' into main
Suchun-sv Aug 23, 2024
27b2f71
alter some errors
Suchun-sv Aug 24, 2024
130f2ee
fix: embedding error
EnableAsync Aug 24, 2024
56314d7
fix bugs && update interface design
Suchun-sv Aug 24, 2024
3d7e85c
feat: add elasticsearch
EnableAsync Aug 25, 2024
85549d0
fix bugs && refine the variable names
Suchun-sv Aug 25, 2024
8444f5e
update design for cache to support extension
Suchun-sv Aug 25, 2024
a655bc4
Merge branch 'alibaba:main' into main
Suchun-sv Sep 5, 2024
57bc863
Merge branch 'alibaba:main' into feat/chroma
Suchun-sv Sep 5, 2024
d68fa88
Refined the code; README.md content needs to be updated.
Suchun-sv Sep 5, 2024
d6c643f
add: makefile for weaviate
EnableAsync Sep 6, 2024
3f3a1bc
feat: add weaviate
EnableAsync Sep 6, 2024
71cc25b
feat: add pinecone
EnableAsync Sep 6, 2024
5179392
fix bugs, README.md to be updated
Suchun-sv Sep 6, 2024
ece7e2f
fix bugs, refine variable name, update README.md
Suchun-sv Sep 6, 2024
e868a1a
Merge branch 'alibaba:main' into main
Suchun-sv Sep 6, 2024
138a526
delete folder
Suchun-sv Sep 6, 2024
65aafbd
Merge branch 'feat/chroma' of https://github.com/Suchun-sv/higress in…
EnableAsync Sep 6, 2024
bfaed4c
fix: format
EnableAsync Sep 6, 2024
e8ad550
fix typos
Suchun-sv Sep 6, 2024
95b06b7
Merge branch 'alibaba:main' into feat/chroma
Suchun-sv Sep 6, 2024
a40f5e9
update
EnableAsync Sep 6, 2024
c83f5c4
fix typos
Suchun-sv Sep 6, 2024
f3d3292
change append to appendMsg
Suchun-sv Sep 6, 2024
b0cf29d
fix bugs and refine code
Suchun-sv Sep 11, 2024
4a18f96
Merge branch 'main' into main
Suchun-sv Sep 11, 2024
21c9a79
fix bugs and update the SetEx function
Suchun-sv Sep 12, 2024
1767896
Merge branch 'main' into main
Suchun-sv Sep 12, 2024
71b9530
Optimize query flow logic (not fully tested)
Suchun-sv Sep 17, 2024
51b9ccc
Fix bugs and verify removal of cache setting
Suchun-sv Sep 21, 2024
3583bc9
fix bugs and update logic as requested
Suchun-sv Sep 21, 2024
10cc7ef
Merge branch 'main' into main
Suchun-sv Oct 10, 2024
36ca3f1
Merge branch 'alibaba:main' into main
Suchun-sv Oct 13, 2024
c261583
add cacheKeyStrategy and enableSemanticCache
Suchun-sv Oct 14, 2024
fa22d63
add cacheKeyStrategy and enableSemanticCache
Suchun-sv Oct 14, 2024
9145132
Vector or cache database must be configured
Suchun-sv Oct 14, 2024
ef443bf
new version envoy
EnableAsync Oct 18, 2024
14a2a3d
fix: GetContext type
EnableAsync Oct 18, 2024
b862ef9
feat: chroma
EnableAsync Oct 18, 2024
7bc5f65
merge
EnableAsync Oct 18, 2024
303f6ed
feat: weaviate
EnableAsync Oct 18, 2024
fb2c26c
fix: clean useless code
EnableAsync Oct 18, 2024
8486555
fix: clean useless code
EnableAsync Oct 18, 2024
e9a14d8
feat: es
EnableAsync Oct 18, 2024
32eccd7
feat: pinecone
EnableAsync Oct 18, 2024
e6f700c
feat: chroma dasvector es pinecone weaviate
EnableAsync Oct 18, 2024
02bc9a2
Merge remote-tracking branch 'origin/main' into feat/chroma
EnableAsync Oct 18, 2024
440cd8d
fix: bugs
EnableAsync Oct 18, 2024
628b74b
fix: bugs
EnableAsync Oct 18, 2024
342bd94
fix: remove uesless files
EnableAsync Oct 18, 2024
cbeb71b
fix: remove uesless files
EnableAsync Oct 18, 2024
43cfdaf
feat: qdrant
EnableAsync Oct 19, 2024
2a4363a
feat: milvus
EnableAsync Oct 19, 2024
9603479
feat: custom threshold
EnableAsync Oct 19, 2024
3d615cc
feat: custom threshold
EnableAsync Oct 19, 2024
558e75e
fix: code format
EnableAsync Oct 20, 2024
2cfcda6
add ai cache test
Suchun-sv Oct 20, 2024
4caf9be
update test
Suchun-sv Oct 20, 2024
d04d78a
fix bugs
Suchun-sv Oct 24, 2024
81bde6d
update
EnableAsync Oct 24, 2024
ea34f4a
fix: bugs
EnableAsync Oct 24, 2024
784740f
Merge branch 'main' into main
Suchun-sv Oct 24, 2024
f5b50fd
add support for skip-cache
Suchun-sv Oct 24, 2024
a1fe701
update README.md and change to FQDNCluster
Suchun-sv Oct 24, 2024
730d951
change to FQDNCluster
Suchun-sv Oct 24, 2024
335c04c
provide support for the legacy configuration
Suchun-sv Oct 25, 2024
59bddf6
simplify resp func, add func name when debug
Suchun-sv Oct 26, 2024
e4901d9
Merge branch 'alibaba:main' into main
Suchun-sv Oct 26, 2024
36f0d77
change *.typ to *
Suchun-sv Oct 26, 2024
009a1b1
add support for legacy config
Suchun-sv Oct 26, 2024
4515f43
update content_type in stream resp
Suchun-sv Oct 26, 2024
c048280
fix bugs
Suchun-sv Oct 26, 2024
0ec24f3
add support for legacy configuration
Suchun-sv Oct 26, 2024
a658bfe
fix bugs
Suchun-sv Oct 26, 2024
a199144
handle the data: [DONE] and return in escaped string
Suchun-sv Oct 26, 2024
77f05d6
dont read resp when ERROR_PARTIAL_MESSAGE_KEY not nil
Suchun-sv Oct 26, 2024
28c629c
Update redis_wrapper.go
CH3CHO Oct 27, 2024
bd84cd0
merge
EnableAsync Oct 29, 2024
d9ce358
merge
EnableAsync Oct 29, 2024
4a95557
update: README.md
EnableAsync Oct 29, 2024
04f288c
merge
EnableAsync Oct 29, 2024
902d810
fix: READMME.md
EnableAsync Oct 29, 2024
a1a7eef
Update README.md
EnableAsync Oct 29, 2024
d1b99b3
Merge remote-tracking branch 'my/feat/chroma' into feat/chroma
EnableAsync Nov 17, 2024
6a782a4
update
EnableAsync Nov 17, 2024
014d3ea
update
EnableAsync Nov 19, 2024
134aecc
Merge branch 'main' into feat/chroma
CH3CHO Nov 20, 2024
27 changes: 26 additions & 1 deletion plugins/wasm-go/extensions/ai-cache/README.md
@@ -99,6 +99,31 @@ LLM result cache plugin; the default configuration can be used directly with the OpenAI protocol's
| responseTemplate | string | optional | `{"id":"ai-cache.hit","choices":[{"index":0,"message":{"role":"assistant","content":%s},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for the returned HTTP response; `%s` marks the part to be replaced by the cache value |
| streamResponseTemplate | string | optional | `data:{"id":"ai-cache.hit","choices":[{"index":0,"delta":{"role":"assistant","content":%s},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for the returned streaming HTTP response; `%s` marks the part to be replaced by the cache value |

# Vector database provider-specific configuration
## Chroma
The `vector.type` for Chroma is `chroma`. It has no provider-specific configuration fields.

## DashVector
The `vector.type` for DashVector is `dashvector`. It has no provider-specific configuration fields.

## ElasticSearch
The `vector.type` for ElasticSearch is `elasticsearch`. Its provider-specific configuration fields are:

| Name | Data type | Requirement | Default | Description |
|-------------------|----------|----------|--------|-------------------------------------------------------------------------------|
| `vector.esUsername` | string | optional | - | ElasticSearch username |
| `vector.esPassword` | string | optional | - | ElasticSearch password |
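
The README does not say how these credentials are sent. One common option, assumed here, is HTTP Basic authentication (RFC 7617), which Elasticsearch supports. This is a minimal sketch; `basicAuthHeader` and `withESAuth` are hypothetical helpers, and the `[][2]string` header shape simply mirrors the header lists passed to `client.Post` in `chroma.go`.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// basicAuthHeader (hypothetical) builds "Authorization: Basic base64(user:pass)".
func basicAuthHeader(username, password string) [2]string {
	credentials := username + ":" + password
	return [2]string{"Authorization", "Basic " + base64.StdEncoding.EncodeToString([]byte(credentials))}
}

// withESAuth (hypothetical) appends the Authorization header only when a
// username is configured, because both fields are optional.
func withESAuth(headers [][2]string, username, password string) [][2]string {
	if username == "" {
		return headers
	}
	return append(headers, basicAuthHeader(username, password))
}

func main() {
	headers := [][2]string{{"Content-Type", "application/json"}}
	fmt.Println(withESAuth(headers, "elastic", "changeme"))
}
```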

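## Milvus
The `vector.type` for Milvus is `milvus`. It has no provider-specific configuration fields.

## Pinecone
The `vector.type` for Pinecone is `pinecone`. It has no provider-specific configuration fields.

## Qdrant
The `vector.type` for Qdrant is `qdrant`. It has no provider-specific configuration fields.

## Weaviate
The `vector.type` for Weaviate is `weaviate`. It has no provider-specific configuration fields.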
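
The providers above are selected by the value of `vector.type`. Below is a hypothetical sketch of that dispatch as a lookup table. The plugin's real initializer registry is not part of this diff, so the map and `resolveProvider` are illustrative only.

```go
package main

import "fmt"

// supportedVectorTypes is a hypothetical registry keyed by the vector.type
// values documented above; the values are just display names here.
var supportedVectorTypes = map[string]string{
	"chroma":        "Chroma",
	"dashvector":    "DashVector",
	"elasticsearch": "ElasticSearch",
	"milvus":        "Milvus",
	"pinecone":      "Pinecone",
	"qdrant":        "Qdrant",
	"weaviate":      "Weaviate",
}

// resolveProvider rejects unknown types early, before any request is sent.
func resolveProvider(vectorType string) (string, error) {
	name, ok := supportedVectorTypes[vectorType]
	if !ok {
		return "", fmt.Errorf("unknown vector.type %q", vectorType)
	}
	return name, nil
}

func main() {
	fmt.Println(resolveProvider("qdrant"))
	fmt.Println(resolveProvider("redis"))
}
```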

## Configuration examples
### Basic configuration
@@ -144,4 +169,4 @@ GJSON PATH supports conditional syntax; for example, to take the last message whose role is user

## FAQ

1. If the returned error is `error status returned by host: bad argument`, check whether `serviceName` correctly includes the service type suffix (e.g. `.dns`).
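
A quick pre-flight check for this FAQ item could look like the sketch below. The suffix list is an assumption for illustration (only `.dns` is named above); adjust it to the service sources registered in your Higress instance. `hasServiceTypeSuffix` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// knownSuffixes is an ASSUMED, non-exhaustive list of service-type suffixes.
var knownSuffixes = []string{".dns", ".static", ".svc.cluster.local"}

// hasServiceTypeSuffix reports whether serviceName ends with one of knownSuffixes.
func hasServiceTypeSuffix(serviceName string) bool {
	for _, suffix := range knownSuffixes {
		if strings.HasSuffix(serviceName, suffix) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{"chroma.dns", "chroma"} {
		fmt.Printf("%s -> has suffix: %v\n", name, hasServiceTypeSuffix(name))
	}
}
```
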
201 changes: 201 additions & 0 deletions plugins/wasm-go/extensions/ai-cache/vector/chroma.go
@@ -0,0 +1,201 @@
package vector

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"

	"github.com/alibaba/higress/plugins/wasm-go/pkg/wrapper"
)

type chromaProviderInitializer struct{}

func (c *chromaProviderInitializer) ValidateConfig(config ProviderConfig) error {
	if len(config.collectionID) == 0 {
		return errors.New("[Chroma] collectionID is required")
	}
	if len(config.serviceName) == 0 {
		return errors.New("[Chroma] serviceName is required")
	}
	return nil
}

func (c *chromaProviderInitializer) CreateProvider(config ProviderConfig) (Provider, error) {
	return &ChromaProvider{
		config: config,
		client: wrapper.NewClusterClient(wrapper.FQDNCluster{
			FQDN: config.serviceName,
			Host: config.serviceHost,
			Port: int64(config.servicePort),
		}),
	}, nil
}

type ChromaProvider struct {
	config ProviderConfig
	client wrapper.HttpClient
}

func (d *ChromaProvider) GetProviderType() string {
	return PROVIDER_TYPE_CHROMA
}

func (d *ChromaProvider) QueryEmbedding(
	emb []float64,
	ctx wrapper.HttpContext,
	log wrapper.Log,
	callback func(results []QueryResult, ctx wrapper.HttpContext, log wrapper.Log, err error)) error {
	// The minimum required parameters are collection_id (in the URL path) and query_embeddings.
	// Example request body:
	// {
	//   "where": {},          // metadata filter, optional
	//   "where_document": {}, // document filter, optional
	//   "query_embeddings": [
	//     [1.1, 2.3, 3.2]
	//   ],
	//   "n_results": 5,       // number of nearest neighbours to return
	//   "include": [
	//     "metadatas", // optional
	//     "documents", // needed to get the cached answer back
	//     "distances"
	//   ]
	// }

	requestBody, err := json.Marshal(chromaQueryRequest{
		QueryEmbeddings: []chromaEmbedding{emb},
		Limit:           d.config.topK,
		Include:         []string{"distances", "documents"},
	})

	if err != nil {
		log.Errorf("[Chroma] Failed to marshal query embedding request body: %v", err)
		return err
	}

	return d.client.Post(
		fmt.Sprintf("/api/v1/collections/%s/query", d.config.collectionID),
		[][2]string{
			{"Content-Type", "application/json"},
		},
		requestBody,
		func(statusCode int, responseHeaders http.Header, responseBody []byte) {
			log.Debugf("[Chroma] Query embedding response: %d, %s", statusCode, responseBody)
			// Report non-200 responses instead of trying to parse an error body as results.
			if statusCode != http.StatusOK {
				callback(nil, ctx, log, fmt.Errorf("[Chroma] Query request failed, status %d: %s", statusCode, responseBody))
				return
			}
			results, err := d.parseQueryResponse(responseBody, log)
			if err != nil {
				err = fmt.Errorf("[Chroma] Failed to parse query response: %v", err)
			}
			callback(results, ctx, log, err)
		},
		d.config.timeout,
	)
}

func (d *ChromaProvider) UploadAnswerAndEmbedding(
	queryString string,
	queryEmb []float64,
	queryAnswer string,
	ctx wrapper.HttpContext,
	log wrapper.Log,
	callback func(ctx wrapper.HttpContext, log wrapper.Log, err error)) error {
	// The minimum required parameters are collection_id (in the URL path), embeddings and ids.
	// The answer is stored in documents so that a later query can return it.
	// Example request body:
	// {
	//   "embeddings": [
	//     [1.1, 2.3, 3.2]
	//   ],
	//   "ids": [
	//     "Have you eaten yet?"
	//   ],
	//   "documents": [
	//     "Yes, I have."
	//   ]
	// }
	requestBody, err := json.Marshal(chromaInsertRequest{
		Embeddings: []chromaEmbedding{queryEmb},
		IDs:        []string{queryString}, // queryString is the user's question
		Documents:  []string{queryAnswer}, // queryAnswer is the answer to that question
	})

	if err != nil {
		log.Errorf("[Chroma] Failed to marshal upload embedding request body: %v", err)
		return err
	}

	err = d.client.Post(
		fmt.Sprintf("/api/v1/collections/%s/add", d.config.collectionID),
		[][2]string{
			{"Content-Type", "application/json"},
		},
		requestBody,
		func(statusCode int, responseHeaders http.Header, responseBody []byte) {
			log.Debugf("[Chroma] statusCode:%d, responseBody:%s", statusCode, string(responseBody))
			// Derive the error from the HTTP status instead of passing the outer err,
			// which only says whether the request was dispatched. Accepting both
			// 200 and 201 is a defensive choice.
			var uploadErr error
			if statusCode != http.StatusOK && statusCode != http.StatusCreated {
				uploadErr = fmt.Errorf("[Chroma] Failed to add embedding, status %d: %s", statusCode, responseBody)
			}
			callback(ctx, log, uploadErr)
		},
		d.config.timeout,
	)
	return err
}

type chromaEmbedding []float64
type chromaMetadataMap map[string]string
type chromaInsertRequest struct {
	Embeddings []chromaEmbedding   `json:"embeddings"`
	Metadatas  []chromaMetadataMap `json:"metadatas,omitempty"` // optional
	Documents  []string            `json:"documents,omitempty"` // optional
	IDs        []string            `json:"ids"`
}

type chromaQueryRequest struct {
	Where           map[string]string `json:"where,omitempty"`          // optional
	WhereDocument   map[string]string `json:"where_document,omitempty"` // optional
	QueryEmbeddings []chromaEmbedding `json:"query_embeddings"`
	Limit           int               `json:"n_results"` // Chroma's query endpoint names the result count n_results
	Include         []string          `json:"include"`
}

type chromaQueryResponse struct {
	Ids        [][]string            `json:"ids"`                  // outer: one entry per query embedding; inner: the matched ids
	Distances  [][]float64           `json:"distances,omitempty"`  // aligned one-to-one with Ids
	Metadatas  [][]chromaMetadataMap `json:"metadatas,omitempty"`  // optional, nested like Ids
	Embeddings [][]chromaEmbedding   `json:"embeddings,omitempty"` // optional, nested like Ids
	Documents  [][]string            `json:"documents,omitempty"`  // aligned one-to-one with Ids
	Uris       [][]string            `json:"uris,omitempty"`       // optional, nested like Ids
	Data       []interface{}         `json:"data,omitempty"`       // optional
	Included   []string              `json:"included"`
}

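func (d *ChromaProvider) parseQueryResponse(responseBody []byte, log wrapper.Log) ([]QueryResult, error) {
	var queryResp chromaQueryResponse
	err := json.Unmarshal(responseBody, &queryResp)
	if err != nil {
		return nil, err
	}

	log.Debugf("[Chroma] queryResp Ids len: %d", len(queryResp.Ids))
	// Guard against an empty batch as well as an empty result list, so Ids[0] cannot panic.
	if len(queryResp.Ids) == 0 || len(queryResp.Ids[0]) == 0 {
		return nil, errors.New("no query results found in response")
	}
	ids := queryResp.Ids[0]
	// distances and documents are requested via "include"; check they line up with ids before indexing.
	if len(queryResp.Distances) == 0 || len(queryResp.Distances[0]) != len(ids) ||
		len(queryResp.Documents) == 0 || len(queryResp.Documents[0]) != len(ids) {
		return nil, errors.New("distances or documents in response do not match ids")
	}
	results := make([]QueryResult, 0, len(ids))
	for i := range ids {
		results = append(results, QueryResult{
			Text:   ids[i],                    // the stored question (used as the Chroma id)
			Score:  queryResp.Distances[0][i], // a distance: smaller means more similar
			Answer: queryResp.Documents[0][i], // the cached answer
		})
	}
	return results, nil
}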