Update SDK CreateKnowledgeBase docs and fix some bugs (#728)
* Update SDK CreateKnowledgeBase docs and fix some bugs

* Fix the incompatible parsing of the get_knowledge_base_list response that appeared after KnowledgeBaseConfigIndex was updated

---------

Co-authored-by: yinjiaqi <[email protected]>
C9luster and yinjiaqi authored Jan 24, 2025
1 parent bc1bf19 commit 7577c3c
Showing 6 changed files with 41 additions and 104 deletions.
118 changes: 20 additions & 98 deletions docs/BasisModule/Platform/KnowledgeBase/knowledgebase.md
@@ -18,18 +18,18 @@

## Basic Python usage

### 1. Create a knowledge base `KnowledgeBase().create_knowledge_base(name: str, description: str, type: str, esUrl: str, esUserName: str, esPassword: str) -> KnowledgeBaseDetailResponse`
### 1. Create a knowledge base `KnowledgeBase().create_knowledge_base(name: str, description: str, type: str, clusterId: str, esUserName: str, esPassword: str, location: str) -> KnowledgeBaseDetailResponse`

#### Method parameters
| Parameter | Type | Required | Description | Example |
| ----------- | -------- | -------- | ------------------------------------- | ------------------ |
| name | string | | Name of the knowledge base to create | "My knowledge base" |
| description | string | | Knowledge base description | "My knowledge base" |
| type | string | | Index storage configuration (public, bes, vdb) | "public" |
| clusterId | string | | Cluster/instance ID | "clusterId" |
| esUserName | string | | BES username; fill in when type is bes | "username" |
| esPassword | string | | BES password; fill in when type is bes | "password" |
| location | string | | Region of the hosted resource; fill in when type is vdb<br>Allowed values:<br>- bj: Beijing<br>- bd: Baoding<br>- sz: Suzhou<br>- gz: Guangzhou | "bj" |
| clusterId | string | When type is bes or vdb | Cluster/instance ID<br>Find it in the BES or VDB console. | "clusterId" |
| esUserName | string | When type is bes or vdb | Username/account<br>Find it in the BES or VDB console. | "username" |
| esPassword | string | When type is bes or vdb | Password/API key<br>Find it in the BES or VDB console. | "password" |
| location | string | When type is bes or vdb | Region of the hosted resource; fill in when type is vdb<br>Allowed values:<br>- bj: Beijing<br>- bd: Baoding<br>- sz: Suzhou<br>- gz: Guangzhou | "bj" |
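
Read together, the rows above amount to a small rule set: `public` needs no extra fields, `bes` and `vdb` need a cluster ID and credentials, and `location` must be one of four region codes. The sketch below checks those rules up front and returns the same `{"index": {...}}` dict that `create_knowledge_base` assembles, including the renaming of `esUserName`/`esPassword` to `username`/`password`. `build_index_config` is a hypothetical helper, not part of the SDK; requiring `location` only for `vdb` follows the Description column (the Required column also lists `bes`), so treat that rule as an assumption.

```python
from typing import Optional

INDEX_TYPES = {"public", "bes", "vdb"}
REGIONS = {"bj": "Beijing", "bd": "Baoding", "sz": "Suzhou", "gz": "Guangzhou"}


def build_index_config(
    type: str = "public",
    clusterId: Optional[str] = None,
    esUserName: Optional[str] = None,
    esPassword: Optional[str] = None,
    location: Optional[str] = None,
) -> dict:
    """Hypothetical pre-flight check; parameter names mirror create_knowledge_base."""
    if type not in INDEX_TYPES:
        raise ValueError(f"type must be one of {sorted(INDEX_TYPES)}, got {type!r}")
    if type in ("bes", "vdb"):
        # Cluster ID and credentials come from the BES / VDB console.
        required = {"clusterId": clusterId, "esUserName": esUserName, "esPassword": esPassword}
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"type={type!r} requires: {', '.join(missing)}")
    # Assumption: location is mandatory only for vdb, per the Description column.
    if type == "vdb" and not location:
        raise ValueError("type='vdb' requires location")
    if location is not None and location not in REGIONS:
        raise ValueError(f"location must be one of {sorted(REGIONS)}, got {location!r}")
    # Same key mapping as the SDK: esUserName -> username, esPassword -> password.
    return {
        "index": {
            "type": type,
            "clusterId": clusterId,
            "username": esUserName,
            "password": esPassword,
            "location": location,
        }
    }
```

With the defaults, `build_index_config()` yields `type="public"` and every other field `None`, the same `config` dict the SDK assembles for a public index.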

#### Return value

@@ -55,9 +55,10 @@ class KnowledgeBaseConfig(BaseModel):
```python
class KnowledgeBaseConfigIndex(BaseModel):
    type: str = Field(..., description="index type", enum=["public", "bes", "vdb"])
    esUrl: Optional[str] = Field(..., description="ES URL")
    username: Optional[str] = Field(None, description="ES username")
    password: Optional[str] = Field(None, description="ES password")
    clusterId: Optional[str] = Field(None, description="cluster/instance ID")
    username: Optional[str] = Field(None, description="BES username")
    password: Optional[str] = Field(None, description="BES password")
    location: Optional[str] = Field(None, description="region of the hosted resource", enum=["bj", "bd", "sz", "gz"])
```

#### Example
@@ -442,57 +443,7 @@ knowledge.upload_documents(
)
```

### 9. Add documents to the knowledge base `KnowledgeBase().add_document()->KnowledgeBaseAddDocumentResponse`

#### Method parameters
| Parameter | Type | Required | Description | Example |
| ------------------- | ----------------- | -------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| content_type | enum[str] | | Document type: `raw_text` (plain-text document) or `qa` (structured Excel Q&A pairs) | "raw_text" |
| file_ids | list[str] | | List of file IDs, obtained from `upload_file` | ['ef12119b-d5be-492a-997c-77f8e84axxxx'] |
| is_enhanced | bool | | Whether to enable LLM-based knowledge enhancement for the document | False |
| custom_process_rule | CustomProcessRule | | Custom chunking rule for the document | `appbuilder.CustomProcessRule(separators=["?"], target_length=400, overlap_rate=0.2)` |


The `CustomProcessRule` class is defined as follows:
```python
class CustomProcessRule(BaseModel):
    separators: list[str] = Field(..., description="list of separators used to split segments", example=[",", "?"])
    target_length: int = Field(..., description="maximum segment length", ge=300, le=1200)
    overlap_rate: float = Field(..., description="maximum share of overlapping characters between segments; 0.25 recommended", ge=0, le=0.3, example=0.2)

```
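
The `Field` bounds on `CustomProcessRule` can be checked without pydantic. A minimal stdlib sketch follows; `check_process_rule` is a hypothetical helper, and reading `overlap_rate` as a fraction of `target_length` (so the return value is the maximum overlap in characters) is an assumption, not documented service behavior.

```python
def check_process_rule(separators: list[str], target_length: int, overlap_rate: float) -> int:
    """Hypothetical helper mirroring the CustomProcessRule Field constraints.

    Returns the maximum overlap in characters, assuming overlap_rate is a
    fraction of target_length.
    """
    if not isinstance(separators, list) or not all(isinstance(s, str) for s in separators):
        raise TypeError("separators must be a list of str")
    if not 300 <= target_length <= 1200:  # ge=300, le=1200
        raise ValueError(f"target_length must be in [300, 1200], got {target_length}")
    if not 0 <= overlap_rate <= 0.3:  # ge=0, le=0.3
        raise ValueError(f"overlap_rate must be in [0, 0.3], got {overlap_rate}")
    return int(target_length * overlap_rate)


# The example values from the parameter table: 400 * 0.2
print(check_process_rule(["?"], 400, 0.2))  # 80
```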

#### Return value
The method returns `KnowledgeBaseAddDocumentResponse`, defined as follows:
```python
class KnowledgeBaseAddDocumentResponse(BaseModel):
    request_id: str = Field(..., description="request ID")
    knowledge_base_id: str = Field(..., description="knowledge base ID")
    document_ids: list[str] = Field(..., description="IDs of the documents created successfully")
```

#### Example
```python
import os
import appbuilder
os.environ["APPBUILDER_TOKEN"] = "your_appbuilder_token"

my_knowledge_base_id = "your_knowledge_base_id"
my_knowledge = appbuilder.KnowledgeBase(my_knowledge_base_id)
print("Knowledge base ID: ", my_knowledge.knowledge_id)

upload_res = my_knowledge.upload_file("./test.txt")
print("File upload result: ", upload_res)

add_res = my_knowledge.add_document(content_type='raw_text', file_ids=[upload_res.id])
print("Add document result: ", add_res)

# Knowledge base ID: da51a988-cbe7-4b24-aa5b-768985e8xxxx
# File upload result: request_id='255eec22-ec87-4564-bdeb-3e5623eaxxxx' id='ef12119b-d5be-492a-997c-77f8e84axxxx' name='test.txt'
# Add document result: request_id='412e1630-b570-47c9-a042-caf3cd9dxxxx' knowledge_base_id='da51a988-cbe7-4b24-aa5b-768985e8xxxx' document_ids=['5e0eb279-7688-4100-95d1-241f3d19xxxx']
```

### 10. Delete a document from the knowledge base `KnowledgeBase().delete_document()->KnowledgeBaseDeleteDocumentResponse`
### 9. Delete a document from the knowledge base `KnowledgeBase().delete_document()->KnowledgeBaseDeleteDocumentResponse`

#### Method parameters
| Parameter | Type | Required | Description | Example |
@@ -533,7 +484,7 @@ print("Delete document result: ",delete_res)
# Delete document result: request_id='ba0e8bc0-b799-45b5-bdac-0d4c50e2xxxx'
```

### 11. List the documents in a knowledge base `KnowledgeBase().get_documents_list()->KnowledgeBaseGetDocumentsListResponse`
### 10. List the documents in a knowledge base `KnowledgeBase().get_documents_list()->KnowledgeBaseGetDocumentsListResponse`

#### Method parameters
| Parameter | Type | Required | Description | Example |
@@ -584,7 +535,7 @@ print("Document list: ", list_res)
# Document list: request_id='f66c2193-6035-4022-811b-c4cd7743xxxx' data=[{'id': '8f388b10-5e6a-423f-8acc-dd5fdc2fxxxx', 'name': 'test.txt', 'created_at': 1719988868, 'word_count': 16886, 'enabled': True, 'meta': {'source': 'upload_file', 'file_id': '0ebb03fb-ea48-4c49-b494-cf0cec11xxxx'}}, {'id': '5e0eb279-7688-4100-95d1-241f3d19xxxx', 'name': 'test.txt', 'created_at': 1719987921, 'word_count': 16886, 'enabled': True, 'meta': {'source': 'upload_file', 'file_id': '059e2ae2-1e3c-43ea-8b42-5d988f93xxxx'}}]
```

### 12. Get all documents in a knowledge base `KnowledgeBase().get_all_documents()->list`
### 11. Get all documents in a knowledge base `KnowledgeBase().get_all_documents()->list`

#### Method parameters
| Parameter | Type | Required | Description | Example |
@@ -625,7 +576,7 @@ for message in doc_list:
    print(message)
```

### 13. Create a chunk `create_chunk(documentId: str, content: str) -> CreateChunkResponse`
### 12. Create a chunk `create_chunk(documentId: str, content: str) -> CreateChunkResponse`

#### Method parameters

@@ -659,7 +610,7 @@ print("Chunk ID: ", resp.id)
chunk_id = resp.id
```

### 14. Modify a chunk `modify_chunk(chunkId: str, content: str, enable: bool)`
### 13. Modify a chunk `modify_chunk(chunkId: str, content: str, enable: bool)`

#### Method parameters

@@ -683,7 +634,7 @@ print("Knowledge base ID: ", my_knowledge.knowledge_id)
my_knowledge.modify_chunk("your_chunk_id", "content", True, knowledgebase_id=my_knowledge_base_id)
```

### 15. Delete a chunk `delete_chunk(chunkId: str)`
### 14. Delete a chunk `delete_chunk(chunkId: str)`

#### Method parameters

@@ -705,7 +656,7 @@ print("Knowledge base ID: ", my_knowledge.knowledge_id)
my_knowledge.delete_chunk("your_chunk_id", knowledgebase_id=my_knowledge_base_id)
```

### 16. Get chunk details `describe_chunk(chunkId: str)`
### 15. Get chunk details `describe_chunk(chunkId: str)`

#### Method parameters

@@ -750,7 +701,7 @@ print("Chunk details:")
print(resp)
```

### 17. List chunks `describe_chunks(documentId: str, marker: str = None, maxKeys: int = None, type: str = None) -> DescribeChunksResponse`
### 16. List chunks `describe_chunks(documentId: str, marker: str = None, maxKeys: int = None, type: str = None) -> DescribeChunksResponse`

#### Method parameters

@@ -828,7 +779,7 @@ public class KnowledgebaseTest {
}

@Test
public void testAddDocument() throws IOException, AppBuilderServerException {
public void testDocument() throws IOException, AppBuilderServerException {
// Instantiate Knowledgebase
String knowledgeBaseId = "";
Knowledgebase knowledgebase = new Knowledgebase();
@@ -838,24 +789,11 @@ public class KnowledgebaseTest {
listRequest.setKonwledgeBaseId(knowledgeBaseId);
listRequest.setLimit(10);
Document[] documents = knowledgebase.getDocumentList(listRequest);

// Add a document to the knowledge base
DocumentAddRequest request = new DocumentAddRequest();
request.setKnowledgeBaseId(knowledgeBaseId);
request.setContentType("raw_text");
request.setFileIds(new String[] { fileId });
DocumentAddRequest.CustomProcessRule customProcessRule = new DocumentAddRequest.CustomProcessRule();
customProcessRule.setSeparators(new String[] { "" });
customProcessRule.setTargetLength(300);
customProcessRule.setOverlapRate(0.25);
request.setCustomProcessRule(customProcessRule);
String[] documentsRes = knowledgebase.addDocument(request);
assertNotNull(documentsRes);

// Delete a document from the knowledge base
DocumentDeleteRequest deleteRequest = new DocumentDeleteRequest();
deleteRequest.setKonwledgeBaseId(knowledgeBaseId);
deleteRequest.setDocumentId(documentsRes[0]);
deleteRequest.setDocumentId("DocumentId to delete");
knowledgebase.deleteDocument(deleteRequest);
}

@@ -994,26 +932,10 @@ func TestKnowledgeBase(t *testing.T) {
}
fmt.Println(documentsRes)

// Add a document to the knowledge base
createDocumentRes, err := client.CreateDocument(CreateDocumentRequest{
KnowledgeBaseID: knowledgeBaseID,
ContentType: ContentTypeRawText,
FileIDS: []string{fileID},
CustomProcessRule: &CustomProcessRule{
Separators: []string{""},
TargetLength: 300,
OverlapRate: 0.25,
},
})
if err != nil {
t.Fatalf("create document failed: %v", err)
}
fmt.Println(createDocumentRes)

// Delete a document from the knowledge base
err = client.DeleteDocument(DeleteDocumentRequest{
KnowledgeBaseID: knowledgeBaseID,
DocumentID: createDocumentRes.DocumentsIDS[0]})
DocumentID: "DocumentID to delete"})
if err != nil {
t.Fatalf("delete document failed: %v", err)
}
1 change: 1 addition & 0 deletions go/appbuilder/knowledge_base.go
@@ -437,6 +437,7 @@ func (t *KnowledgeBase) deleteKnowledgeBase(knowledgeBaseID string, clientToken
return nil
}

// Deprecated: this feature will be removed soon.
func (t *KnowledgeBase) CreateDocuments(req CreateDocumentsRequest) error {
request := http.Request{}
header := t.sdkConfig.AuthHeaderV2()
@@ -95,6 +95,7 @@ private String innerUploadFile(String filePath, String clientToken) throws IOExc
* @throws IOException if the file upload fails
* @throws AppBuilderServerException if the server returns an error code
*/
@Deprecated
public String[] addDocument(DocumentAddRequest req)
throws IOException, AppBuilderServerException {
return innerAddDocument(req, java.util.UUID.randomUUID().toString());
@@ -109,6 +110,7 @@ public String[] addDocument(DocumentAddRequest req)
* @throws IOException if an I/O error occurs
* @throws AppBuilderServerException if the AppBuilder server returns an error
*/
@Deprecated
public String[] addDocument(DocumentAddRequest req, String clientToken)
throws IOException, AppBuilderServerException {
return innerAddDocument(req, clientToken);
14 changes: 13 additions & 1 deletion python/core/console/knowledge_base/data_class.py
@@ -142,10 +142,22 @@ class KnowledgeBaseGetListRequest(BaseModel):
        10, description="number of documents to return; default 10, max 100", le=100, ge=1
    )

class KnowledgeBaseGetListConfigIndex(BaseModel):
    type: str = Field(None, description="index type")
    esUrl: Optional[str] = Field('', description="ES URL")

class KnowledgeBaseGetListConfig(BaseModel):
    index: Optional[KnowledgeBaseGetListConfigIndex] = Field(..., description="index configuration")

class KnowledgeBaseGetListDetailResponse(BaseModel):
    id: str = Field(..., description="knowledge base ID")
    name: str = Field(..., description="knowledge base name")
    description: Optional[str] = Field(None, description="knowledge base description")
    config: Optional[KnowledgeBaseGetListConfig] = Field(..., description="knowledge base configuration")

class KnowledgeBaseGetListResponse(BaseModel):
    requestId: str = Field(..., description="request ID")
    data: list[KnowledgeBaseDetailResponse] = Field([], description="list of knowledge base details")
    data: list[KnowledgeBaseGetListDetailResponse] = Field([], description="list of knowledge base details")
    marker: str = Field(..., description="start position")
    nextMarker: str = Field(..., description="start position of the next page")
    maxKeys: int = Field(10, description="number of documents to return; default 10, max 100")
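
The fix decouples list parsing from `KnowledgeBaseConfigIndex`: entries in the `get_knowledge_base_list` response now parse into `KnowledgeBaseGetListDetailResponse`, whose `config.index` declares only `type` and `esUrl` (defaulting to `''`). The stdlib sketch below imitates that tolerant parsing (unknown keys are dropped, as pydantic models do by default) and pages through results with `marker` / `nextMarker`. The `fetch_page` callable, the dataclass names, and the convention that an empty `nextMarker` marks the last page are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class ListIndex:
    type: Optional[str] = None
    esUrl: str = ""


@dataclass
class ListEntry:
    id: str
    name: str
    description: Optional[str] = None
    index: Optional[ListIndex] = None


def parse_entry(raw: dict) -> ListEntry:
    """Keep only the fields the list model declares; extra keys such as clusterId are ignored."""
    index_raw = (raw.get("config") or {}).get("index")
    index = None
    if index_raw is not None:
        index = ListIndex(type=index_raw.get("type"), esUrl=index_raw.get("esUrl") or "")
    return ListEntry(id=raw["id"], name=raw["name"], description=raw.get("description"), index=index)


def iter_knowledge_bases(fetch_page: Callable[[str, int], dict], max_keys: int = 10) -> Iterator[ListEntry]:
    """Yield every entry, following nextMarker until it is empty (hypothetical paging contract)."""
    # Same bounds (ge=1, le=100) as the page-size field of KnowledgeBaseGetListRequest.
    if not 1 <= max_keys <= 100:
        raise ValueError("max_keys must be between 1 and 100")
    marker = ""
    while True:
        page = fetch_page(marker, max_keys)
        for raw in page.get("data", []):
            yield parse_entry(raw)
        next_marker = page.get("nextMarker") or ""
        # Stop on the last page, and guard against a server echoing the same marker.
        if not next_marker or next_marker == marker:
            break
        marker = next_marker
```

In a real client, `fetch_page` would wrap the list request and return its JSON body.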
7 changes: 5 additions & 2 deletions python/core/console/knowledge_base/knowledge_base.py
@@ -129,6 +129,7 @@ def upload_file(

return resp

@deprecated()
def add_document(
self,
content_type: str,
@@ -286,9 +287,10 @@ def create_knowledge_base(
name: str,
description: str,
type: str = "public",
esUrl: str = None,
clusterId: str = None,
esUserName: str = None,
esPassword: str = None,
location: str = None,
client_token: str = None,
) -> data_class.KnowledgeBaseDetailResponse:
r"""
@@ -325,9 +327,10 @@ def create_knowledge_base(
config={
"index": {
"type": type,
"esUrl": esUrl,
"clusterId": clusterId,
"username": esUserName,
"password": esPassword,
"location": location,
}
},
)
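
`add_document` is now wrapped with `@deprecated()`, in line with the `@Deprecated` annotations on the Java `addDocument` overloads and the `// Deprecated:` comment on the Go `CreateDocuments`. The SDK's decorator itself is not part of this diff; the sketch below assumes a typical decorator factory with that call shape that emits a `DeprecationWarning` through the standard `warnings` module. `deprecated` and `KnowledgeBaseStub` here are hypothetical stand-ins.

```python
import functools
import warnings
from typing import Callable, Optional


def deprecated(reason: Optional[str] = None) -> Callable:
    """Hypothetical stand-in for the SDK's deprecated() decorator factory."""

    def decorator(func: Callable) -> Callable:
        message = f"{func.__qualname__} is deprecated"
        if reason:
            message += f": {reason}"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not to this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


class KnowledgeBaseStub:
    """Hypothetical stand-in; the real add_document calls the AppBuilder API."""

    @deprecated()
    def add_document(self, content_type: str, file_ids: list[str]) -> list[str]:
        return [f"doc-{file_id}" for file_id in file_ids]
```

Python ignores `DeprecationWarning` by default unless it is triggered directly from `__main__` (PEP 565); pass `-W default::DeprecationWarning` to surface it elsewhere.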
3 changes: 0 additions & 3 deletions python/tests/test_knowledge_base.py
@@ -66,9 +66,6 @@ def test_create_knowledge_base(self):
name="test",
description="test",
type="public",
esUrl="http://localhost:9200",
esUserName="elastic",
esPassword="changeme",
)
knowledge_base_id = resp.id
knowledge.get_knowledge_base_detail(knowledge_base_id)