fix(solver): catch solver exception in main solver #305

Closed
wants to merge 323 commits into from
Changes from 27 commits
dba5781
spg guided relation extraction
zhuzhongshu123 Dec 6, 2024
39a27ca
Merge pull request #111 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 6, 2024
2ffada9
fix dict parse with same key
zhuzhongshu123 Dec 9, 2024
b542252
Merge pull request #113 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 9, 2024
dc1e646
rename graphalgoclient to graphclient
northmachine Dec 10, 2024
a82bbad
rename graphalgoclient to graphclient
northmachine Dec 10, 2024
f13f82d
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 10, 2024
8c4f6e7
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 11, 2024
b868e06
file reader supports http url
zhuzhongshu123 Dec 11, 2024
c241ab9
Merge pull request #117 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 11, 2024
2eba9bf
add checkpointer class
zhuzhongshu123 Dec 11, 2024
507a510
parser supports checkpoint
zhuzhongshu123 Dec 12, 2024
d9a071a
add build
northmachine Dec 12, 2024
13fa713
remove incorrect logs
zhuzhongshu123 Dec 12, 2024
38ac5ad
Merge pull request #121 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 12, 2024
1986b1e
remove logs
zhuzhongshu123 Dec 13, 2024
6351fa5
update examples
zhuzhongshu123 Dec 13, 2024
5665991
update chain checkpointer
zhuzhongshu123 Dec 13, 2024
cf37bb6
Merge pull request #124 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 13, 2024
ed349f1
vectorizer batch size set to 32
zhuzhongshu123 Dec 13, 2024
d878d82
Merge pull request #125 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 13, 2024
66a6550
add a zodb backended checkpointer
zhuzhongshu123 Dec 13, 2024
433bc50
add a zodb backended checkpointer
zhuzhongshu123 Dec 13, 2024
03f5153
Merge pull request #127 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 13, 2024
263b073
fix zodb based checkpointer
zhuzhongshu123 Dec 16, 2024
115515b
Merge pull request #128 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 16, 2024
b78b7a3
add thread for zodb IO
zhuzhongshu123 Dec 16, 2024
5a9c73e
fix(common): resolve mutlithread conflict in zodb IO
zhuzhongshu123 Dec 16, 2024
1b147d0
Merge pull request #131 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 16, 2024
8f61b2a
fix(common): load existing zodb checkpoints
zhuzhongshu123 Dec 16, 2024
fd3e5f9
Merge pull request #132 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 16, 2024
3261cda
update examples
zhuzhongshu123 Dec 16, 2024
48214ea
Merge pull request #133 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 16, 2024
0f0481c
update examples
zhuzhongshu123 Dec 16, 2024
519dbcf
Merge pull request #134 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 16, 2024
958f341
fix zodb writer
zhuzhongshu123 Dec 16, 2024
b7e074d
Merge pull request #135 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 16, 2024
1211d8a
add docstring
zhuzhongshu123 Dec 16, 2024
7cab3db
fix jieba version mismatch
zhuzhongshu123 Dec 16, 2024
eae6b4c
Merge pull request #136 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 16, 2024
0bb611b
commit kag_config-tc.yaml
caszkgui Dec 17, 2024
c69cd59
commit kag_config-tc.yaml
caszkgui Dec 17, 2024
5dc70ab
1. fix bug in base_table_splitter
caszkgui Dec 17, 2024
f0231d2
1. fix bug in base_table_splitter
caszkgui Dec 17, 2024
5ad69c3
1. fix bug in default_chain
caszkgui Dec 17, 2024
8daabee
add solver
royzhao Dec 17, 2024
9490a99
add kag
royzhao Dec 17, 2024
0a95c98
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 18, 2024
135dc56
update outline splitter
northmachine Dec 18, 2024
eb70c53
add main test
northmachine Dec 18, 2024
689f5df
add op
royzhao Dec 18, 2024
c933df1
code refactor
zhuzhongshu123 Dec 18, 2024
e090432
Merge pull request #140 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 18, 2024
6d60513
add tools
royzhao Dec 18, 2024
4da8a06
fix outline splitter
zhuzhongshu123 Dec 18, 2024
d4d2a62
fix outline prompt
zhuzhongshu123 Dec 18, 2024
43e41ca
Merge pull request #142 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 18, 2024
344cee6
graph api pass
royzhao Dec 18, 2024
f5666fd
commit with page rank
royzhao Dec 18, 2024
45837b6
add search api and graph api
royzhao Dec 18, 2024
aeb980c
add markdown report
northmachine Dec 13, 2024
c5aa82a
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 19, 2024
e8511dd
fix vectorizer num batch compute
zhuzhongshu123 Dec 19, 2024
2d5c3a4
add retry for vectorize model call
zhuzhongshu123 Dec 19, 2024
eeffee2
Merge pull request #145 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 19, 2024
2577bdb
update markdown reader
northmachine Dec 19, 2024
eecc99b
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 19, 2024
140c02b
update markdown reader
northmachine Dec 19, 2024
4ae44a3
update pdf reader
northmachine Dec 19, 2024
cf32022
raise extractor failure
zhuzhongshu123 Dec 19, 2024
08bb5b8
Merge pull request #146 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 19, 2024
645230c
add default expr
royzhao Dec 19, 2024
b2d68c7
add log
royzhao Dec 19, 2024
43ab3d9
merge jc reader features
northmachine Dec 19, 2024
86d2189
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 19, 2024
af2d820
rm import
northmachine Dec 19, 2024
e584db5
add build
northmachine Dec 12, 2024
f2853bc
fix zodb based checkpointer
zhuzhongshu123 Dec 16, 2024
8efa84b
add thread for zodb IO
zhuzhongshu123 Dec 16, 2024
b67bf49
fix(common): resolve mutlithread conflict in zodb IO
zhuzhongshu123 Dec 16, 2024
2517812
fix(common): load existing zodb checkpoints
zhuzhongshu123 Dec 16, 2024
fb55f0e
update examples
zhuzhongshu123 Dec 16, 2024
f566020
update examples
zhuzhongshu123 Dec 16, 2024
56e244a
fix zodb writer
zhuzhongshu123 Dec 16, 2024
c8f2413
add docstring
zhuzhongshu123 Dec 16, 2024
3109e47
fix jieba version mismatch
zhuzhongshu123 Dec 16, 2024
896abea
commit kag_config-tc.yaml
caszkgui Dec 17, 2024
b1f7dce
commit kag_config-tc.yaml
caszkgui Dec 17, 2024
bd39d30
1. fix bug in base_table_splitter
caszkgui Dec 17, 2024
f0260fa
1. fix bug in base_table_splitter
caszkgui Dec 17, 2024
7fb0de8
1. fix bug in default_chain
caszkgui Dec 17, 2024
7f0cf73
update outline splitter
northmachine Dec 18, 2024
5609b69
add main test
northmachine Dec 18, 2024
346b3cb
add markdown report
northmachine Dec 13, 2024
2d7d5ac
code refactor
zhuzhongshu123 Dec 18, 2024
524a605
fix outline splitter
zhuzhongshu123 Dec 18, 2024
ae3758f
fix outline prompt
zhuzhongshu123 Dec 18, 2024
d523cc8
update markdown reader
northmachine Dec 19, 2024
7e7cfae
fix vectorizer num batch compute
zhuzhongshu123 Dec 19, 2024
46388f6
add retry for vectorize model call
zhuzhongshu123 Dec 19, 2024
b7507fe
update markdown reader
northmachine Dec 19, 2024
82b57de
raise extractor failure
zhuzhongshu123 Dec 19, 2024
97c936e
rm parser
northmachine Dec 19, 2024
c213ceb
run pipeline
royzhao Dec 19, 2024
5f51fbf
add config option of whether to perform llm config check, default to …
zhuzhongshu123 Dec 20, 2024
726ecd7
Merge pull request #148 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 20, 2024
738969e
fix
royzhao Dec 20, 2024
0535526
merge
royzhao Dec 20, 2024
ba209ec
recover pdf reader
royzhao Dec 20, 2024
a90a501
several components can be null for default chain
zhuzhongshu123 Dec 20, 2024
e73aad0
Merge pull request #149 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 20, 2024
0d0f31d
support full QA run
royzhao Dec 20, 2024
e119af3
add if
royzhao Dec 20, 2024
de5b207
remove unused code
royzhao Dec 20, 2024
f7c13d6
use chunk as fallback
royzhao Dec 20, 2024
d017fd7
excluded source relation to choose
royzhao Dec 20, 2024
aa2085d
add generate
royzhao Dec 20, 2024
47aadc5
default recall 10
royzhao Dec 20, 2024
e685e1c
add local memory
royzhao Dec 20, 2024
c5c4a38
exclude similar edges
royzhao Dec 20, 2024
a13cf4f
add safeguard
royzhao Dec 20, 2024
2444084
fix concurrency issue
royzhao Dec 20, 2024
0805229
add debug logger
royzhao Dec 20, 2024
c1014bf
support parameterized topk
royzhao Dec 20, 2024
0673bdf
support chunk truncation and adjust the spo select prompt
royzhao Dec 20, 2024
8e6259e
add query request safeguard
royzhao Dec 21, 2024
4398317
add force_chunk config
royzhao Dec 22, 2024
317d94c
fix entity linker algorithm
royzhao Dec 24, 2024
f27355d
add sub query rewriting
royzhao Dec 24, 2024
1d7a44b
fix md reader dup in test
northmachine Dec 24, 2024
c93600a
fix
northmachine Dec 24, 2024
29865db
merge knext to kag parallel
northmachine Dec 24, 2024
d40e9fc
fix package
northmachine Dec 24, 2024
792383d
fix metric drop issue
royzhao Dec 24, 2024
e736c0a
scanner update
zhuzhongshu123 Dec 24, 2024
a523daa
scanner update
zhuzhongshu123 Dec 24, 2024
aabe820
Merge pull request #156 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 24, 2024
9499c3f
add doc and update example scripts
zhuzhongshu123 Dec 24, 2024
0f6071e
Merge branch '0.6_dev' of github.com:zhuzhongshu123/KAG into 0.6_dev
zhuzhongshu123 Dec 24, 2024
82c75ff
Merge pull request #157 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 24, 2024
f90b242
fix
northmachine Dec 13, 2024
ee3bc50
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 24, 2024
3768a81
add bridge to spg server
zhuzhongshu123 Dec 24, 2024
f281b57
Merge pull request #158 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 24, 2024
7796b61
add format
royzhao Dec 24, 2024
adf8e01
Merge branch '0.6_dev' into v0.6_solver
royzhao Dec 24, 2024
7515112
fix bridge
zhuzhongshu123 Dec 24, 2024
80b152f
Merge pull request #159 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 24, 2024
63308ff
update conf for baike
caszkgui Dec 25, 2024
2ecb439
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
caszkgui Dec 25, 2024
6055c46
disable ckpt for spg server runner
zhuzhongshu123 Dec 25, 2024
69ce1f4
Merge pull request #163 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 25, 2024
d9019b8
llm invoke error default raise exceptions
zhuzhongshu123 Dec 25, 2024
8c7b454
Merge pull request #164 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 25, 2024
e0b1056
chore(version): bump version to X.Y.Z
zhuzhongshu123 Dec 25, 2024
1535672
Merge pull request #165 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 25, 2024
a9082bd
update default response generation prompt
xionghuaidong Dec 24, 2024
46ee79f
add method getSummarizationMetrics
xionghuaidong Dec 24, 2024
c0f1a86
Merge pull request #166 from xionghuaidong/0.6_dev
xionghuaidong Dec 25, 2024
7bb2b91
fix(common): fix project conf empty error
zhuzhongshu123 Dec 25, 2024
a7942df
Merge pull request #167 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 25, 2024
60871d7
fix typo
zhuzhongshu123 Dec 25, 2024
707e093
Merge pull request #168 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 25, 2024
5418120
add reporting info
royzhao Dec 25, 2024
1e382ed
Merge remote-tracking branch 'origin/0.6_dev' into 0.6_dev
royzhao Dec 25, 2024
b8d0a93
modify main solver
royzhao Dec 25, 2024
78d63dd
postprocessor support spg server
zhuzhongshu123 Dec 25, 2024
2f0a750
Merge pull request #169 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 25, 2024
0ad0f40
update solver to support names
royzhao Dec 25, 2024
8e9f8f5
fix language
royzhao Dec 25, 2024
03bb962
Merge remote-tracking branch 'origin/0.6_dev' into 0.6_dev
royzhao Dec 25, 2024
fd272d7
modify chunker interface, add openapi
royzhao Dec 25, 2024
788ba06
rename vectorizer to vectorize_model in spg server config
zhuzhongshu123 Dec 26, 2024
9599740
Merge pull request #172 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 26, 2024
b64fee5
generate_random_string start with gen
royzhao Dec 26, 2024
a905f37
Merge remote-tracking branch 'origin/0.6_dev' into 0.6_dev
royzhao Dec 26, 2024
2214389
merge master
zhuzhongshu123 Dec 26, 2024
428e371
Merge pull request #173 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 26, 2024
697f6ca
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 26, 2024
5c52367
add knext llm vector checker
northmachine Dec 26, 2024
b4d878a
add knext llm vector checker
northmachine Dec 26, 2024
38ff935
add knext llm vector checker
northmachine Dec 26, 2024
f823cf2
remove default values from solver
royzhao Dec 26, 2024
2f82bef
Merge remote-tracking branch 'origin/0.6_dev' into 0.6_dev
royzhao Dec 26, 2024
20c447d
udpate yaml and register_name for baike
caszkgui Dec 27, 2024
ef8cbea
udpate yaml and register_name for baike
caszkgui Dec 27, 2024
b3034d5
remove config key check
zhuzhongshu123 Dec 27, 2024
a3a5620
Merge pull request #176 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 27, 2024
870ee22
fix llmmodule
royzhao Dec 27, 2024
3b7dd83
Merge remote-tracking branch 'origin/0.6_dev' into 0.6_dev
royzhao Dec 27, 2024
e72f257
fix knext project
northmachine Dec 27, 2024
cc73a02
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Dec 27, 2024
2e77759
udpate yaml and register_name for examples
caszkgui Dec 27, 2024
b624233
Merge branch '0.6_dev' into 0.6_dev_tc
caszkgui Dec 27, 2024
b3fa5ca
udpate yaml and register_name for examples
caszkgui Dec 27, 2024
63a48bb
Revert "udpate yaml and register_name for examples"
caszkgui Dec 27, 2024
26fdbc0
Merge branch '0.6_dev_tc' of github.com:OpenSPG/KAG into 0.6_dev_tc
caszkgui Dec 27, 2024
192f213
update register name
caszkgui Dec 27, 2024
da5dfb5
fix
northmachine Dec 27, 2024
9b3a73b
fix
northmachine Dec 27, 2024
058bb64
support multiple resigter names
zhuzhongshu123 Dec 28, 2024
023272d
Merge pull request #179 from zhuzhongshu123/0.6_dev_tc
zhuzhongshu123 Dec 28, 2024
10be305
Merge pull request #177 from OpenSPG/0.6_dev_tc
zhuzhongshu123 Dec 28, 2024
67081e8
update component
zhuzhongshu123 Dec 30, 2024
67098c7
Merge pull request #182 from zhuzhongshu123/0.6_dev
zhuzhongshu123 Dec 30, 2024
ce2145b
update reader register names (#183)
zhuzhongshu123 Dec 30, 2024
f366dbd
fix markdown reader
northmachine Dec 30, 2024
b569454
fix llm client for retry
northmachine Dec 30, 2024
7200591
feat(common): add processed chunk id checkpoint (#185)
zhuzhongshu123 Dec 30, 2024
0046f1d
feat(example): add example config (#186)
zhuzhongshu123 Dec 30, 2024
33c4c29
add max_workers parameter for getSummarizationMetrics to make it faster
xionghuaidong Dec 31, 2024
64ae4c1
add csqa data generation script generate_data.py
xionghuaidong Dec 31, 2024
d4fa519
commit generated csqa builder and solver data
xionghuaidong Dec 31, 2024
6abb2ed
add csqa basic project files
xionghuaidong Dec 31, 2024
76ee808
adjust split_length and num_threads_per_chain to match lightrag settings
xionghuaidong Dec 31, 2024
1b4537f
ignore ckpt dirs
xionghuaidong Dec 31, 2024
7e7b337
add csqa evaluation script eval.py
xionghuaidong Dec 31, 2024
725e5b2
save evaluation scripts summarization_metrics.py and factual_correctn…
xionghuaidong Dec 31, 2024
9d368d7
save LightRAG output csqa_lightrag_answers.json
xionghuaidong Dec 31, 2024
5ad618a
ignore KAG output csqa_kag_answers.json
xionghuaidong Dec 31, 2024
a4bf052
add README.md for CSQA
xionghuaidong Dec 31, 2024
a3aaad3
Merge pull request #189 from xionghuaidong/0.6_dev
xionghuaidong Dec 31, 2024
6a098c0
fix(solver): fix solver pipeline conf (#191)
zhuzhongshu123 Dec 31, 2024
3342699
update links and file paths
xionghuaidong Dec 31, 2024
def77b0
reformat csqa kag_config.yaml
xionghuaidong Dec 31, 2024
75698af
reformat csqa python files
xionghuaidong Dec 31, 2024
6a7a7e0
reformat getSummarizationMetrics and compare_summarization_answers
xionghuaidong Dec 31, 2024
fee2f3f
Merge pull request #193 from xionghuaidong/0.6_dev
xionghuaidong Dec 31, 2024
c63959d
fix(solver): fix solver config (#192)
zhuzhongshu123 Dec 31, 2024
de23e1b
add except
royzhao Dec 31, 2024
d68db6d
fix typo in csqa README.md
xionghuaidong Dec 31, 2024
51b08f3
feat(conf): support reinitialize config for call from java side (#199)
zhuzhongshu123 Jan 2, 2025
597eb6e
revert default response generation prompt
xionghuaidong Jan 2, 2025
94026f1
update project list
northmachine Jan 2, 2025
ba552cf
Merge pull request #201 from xionghuaidong/0.6_dev
xionghuaidong Jan 2, 2025
e6948ef
Merge branch '0.6_dev' of github.com:OpenSPG/KAG into 0.6_dev
northmachine Jan 2, 2025
1ff36a2
add README.md for the hotpotqa, 2wiki and musique examples
xionghuaidong Jan 2, 2025
1659147
add spo retrieval
royzhao Dec 27, 2024
bc14276
turn off kag config dump by default
xionghuaidong Jan 3, 2025
a0053f4
turn off knext schema dump by default
xionghuaidong Jan 3, 2025
457a5c6
add .gitignore and fix kag_config.yaml
xionghuaidong Jan 3, 2025
1e04078
add README.md for the medicine example
xionghuaidong Jan 3, 2025
97fbcad
add README.md for the supplychain example
xionghuaidong Jan 3, 2025
e7b7fdb
Merge pull request #204 from xionghuaidong/0.6_dev
xionghuaidong Jan 3, 2025
a41ca02
bugfix for risk mining
royzhao Jan 3, 2025
867944b
Merge remote-tracking branch 'origin/0.6_dev' into 0.6_dev
royzhao Jan 3, 2025
eb87e45
use exact out
royzhao Jan 3, 2025
edd170f
refactor(solver): format solver code (#205)
zhuzhongshu123 Jan 3, 2025
419ea61
add chunk
royzhao Jan 17, 2025
8027a43
add retry
royzhao Jan 20, 2025
9 changes: 9 additions & 0 deletions kag/bridge/spg_server_bridge.py
@@ -9,8 +9,17 @@
 # Unless required by applicable law or agreed to in writing, software distributed under the License
 # is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
 # or implied.
 import os
 import json
 import kag.interface as interface
+from kag.common.conf import KAGConstants, init_env
+
+
+def init_kag_config(project_id: str, host_addr: str):
+
+    os.environ[KAGConstants.ENV_KAG_PROJECT_ID] = project_id
+    os.environ[KAGConstants.ENV_KAG_PROJECT_HOST_ADDR] = host_addr
+    init_env()
+

 class SPGServerBridge:
20 changes: 15 additions & 5 deletions kag/builder/default_chain.py
@@ -9,7 +9,6 @@
 # Unless required by applicable law or agreed to in writing, software distributed under the License
 # is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
 # or implied.
-
 import logging
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from kag.interface import (
@@ -22,7 +21,6 @@
     SinkWriterABC,
     KAGBuilderChain,
 )
-
 from kag.common.utils import generate_hash_id

 logger = logging.getLogger(__name__)
@@ -155,16 +153,28 @@ def run_extract(chunk):
                 if node is None:
                     continue
                 flow_data = execute_node(node, flow_data, key=input_key)
-            return flow_data
+            return {input_key: flow_data[0]}

         reader_output = self.reader.invoke(input_data, key=generate_hash_id(input_data))
         splitter_output = []

         for chunk in reader_output:
             splitter_output.extend(self.splitter.invoke(chunk, key=chunk.hash_key))
+
+        processed_chunk_keys = kwargs.get("processed_chunk_keys", set())
+        filtered_chunks = []
+        processed = 0
+        for chunk in splitter_output:
+            if chunk.hash_key not in processed_chunk_keys:
+                filtered_chunks.append(chunk)
+            else:
+                processed += 1
+        logger.debug(
+            f"Total chunks: {len(splitter_output)}. Checkpointed: {processed}, Pending: {len(filtered_chunks)}."
+        )
         result = []
         with ThreadPoolExecutor(max_workers) as executor:
-            futures = [executor.submit(run_extract, chunk) for chunk in splitter_output]
+            futures = [executor.submit(run_extract, chunk) for chunk in filtered_chunks]

             from tqdm import tqdm

@@ -176,5 +186,5 @@ def run_extract(chunk):
                 leave=False,
             ):
                 ret = inner_future.result()
-                result.extend(ret)
+                result.append(ret)
         return result
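The change above can be read as "drop chunks whose hash is already checkpointed, then return each extraction result keyed by its chunk hash". A minimal, self-contained sketch of that pattern follows. `Chunk`, `run_pending`, and the `extract` callable are hypothetical stand-ins for illustration, not KAG's actual classes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical stand-in for the chain's chunk type; only hash_key matters here.
    hash_key: str
    content: str


def run_pending(chunks, processed_chunk_keys, extract, max_workers=4):
    """Skip chunks whose key is already checkpointed and extract the rest in parallel."""
    pending = [c for c in chunks if c.hash_key not in processed_chunk_keys]

    def run_extract(chunk):
        # Key each result by the chunk hash so the caller can checkpoint it later.
        return {chunk.hash_key: extract(chunk)}

    results = []
    with ThreadPoolExecutor(max_workers) as executor:
        futures = [executor.submit(run_extract, c) for c in pending]
        for future in as_completed(futures):
            # One dict per chunk is appended; completion order is not guaranteed.
            results.append(future.result())
    return results


chunks = [Chunk("a", "alpha"), Chunk("b", "beta"), Chunk("c", "gamma")]
results = run_pending(chunks, {"b"}, lambda c: c.content.upper())
merged = {key: value for item in results for key, value in item.items()}
```

Because results arrive in completion order, the sketch merges them by key rather than relying on list position.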
23 changes: 22 additions & 1 deletion kag/builder/runner.py
@@ -108,6 +108,14 @@ def __init__(
                 "world_size": self.scanner.sharding_info.get_world_size(),
             }
         )
+        self.processed_chunks = CheckpointerManager.get_checkpointer(
+            {
+                "type": "zodb",
+                "ckpt_dir": os.path.join(self.ckpt_dir, "chain"),
+                "rank": self.scanner.sharding_info.get_rank(),
+                "world_size": self.scanner.sharding_info.get_world_size(),
+            }
+        )
         self._local = threading.local()

     def invoke(self, input):
@@ -135,7 +143,11 @@ def invoke(self, input):

         def process(data, data_id, data_abstract):
             try:
-                result = self.chain.invoke(data, max_workers=self.num_threads_per_chain)
+                result = self.chain.invoke(
+                    data,
+                    max_workers=self.num_threads_per_chain,
+                    processed_chunk_keys=self.processed_chunks.keys(),
+                )
                 return data, data_id, data_abstract, result
             except Exception:
                 traceback.print_exc()
@@ -177,6 +189,15 @@ def process(data, data_id, data_abstract):
                             num_nodes += len(item.nodes)
                             num_edges += len(item.edges)
                             num_subgraphs += 1
+                        elif isinstance(item, dict):
+
+                            for k, v in item.items():
+                                self.processed_chunks.write_to_ckpt(k, k)
+                                if isinstance(v, SubGraph):
+                                    num_nodes += len(v.nodes)
+                                    num_edges += len(v.edges)
+                                    num_subgraphs += 1
+
                     info = {
                         "num_nodes": num_nodes,
                         "num_edges": num_edges,
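The runner side of the change appears to do two things per chain output item: record the chunk key as processed, and count nodes and edges when the value is a subgraph. A rough sketch under that reading is below. The `SubGraph` class and the plain `set` used in place of the ZODB checkpointer are hypothetical simplifications.

```python
class SubGraph:
    """Hypothetical minimal stand-in holding only node and edge lists."""

    def __init__(self, nodes, edges):
        self.nodes = nodes
        self.edges = edges


def tally(chain_output, processed_keys):
    """Count graph elements and record the chunk keys that finished."""
    num_nodes = num_edges = num_subgraphs = 0
    for item in chain_output:
        if isinstance(item, SubGraph):
            # Older output shape: a bare subgraph with no chunk key attached.
            num_nodes += len(item.nodes)
            num_edges += len(item.edges)
            num_subgraphs += 1
        elif isinstance(item, dict):
            # Newer output shape: {chunk_hash_key: result}.
            for key, value in item.items():
                # Stand-in for write_to_ckpt(k, k) in the diff above.
                processed_keys.add(key)
                if isinstance(value, SubGraph):
                    num_nodes += len(value.nodes)
                    num_edges += len(value.edges)
                    num_subgraphs += 1
    return {
        "num_nodes": num_nodes,
        "num_edges": num_edges,
        "num_subgraphs": num_subgraphs,
    }


processed = set()
chain_output = [
    {"c1": SubGraph(["n1", "n2"], ["e1"])},
    {"c2": None},
    SubGraph(["n3"], []),
]
info = tally(chain_output, processed)
```

As in the diff, a key is recorded even when its value is not a subgraph, so a chunk that produced nothing countable is still treated as processed on the next run.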
30 changes: 20 additions & 10 deletions kag/common/benchmarks/evaUtils.py
@@ -122,9 +122,17 @@ def get_em_f1(prediction, gold):
     return float(em), f1


-def compare_summarization_answers(query, answer1, answer2, *,
-                                  api_key="EMPTY", base_url="http://127.0.0.1:38080/v1", model="gpt-4o-mini",
-                                  language="English", retries=3):
+def compare_summarization_answers(
+    query,
+    answer1,
+    answer2,
+    *,
+    api_key="EMPTY",
+    base_url="http://127.0.0.1:38080/v1",
+    model="gpt-4o-mini",
+    language="English",
+    retries=3,
+):
     """
     Given a query and two answers, compare the answers with an LLM for Comprehensiveness, Diversity and Empowerment.

@@ -213,7 +221,7 @@ def compare_summarization_answers(query, answer1, answer2, *,
                 messages=[
                     {"role": "system", "content": sys_prompt},
                     {"role": "user", "content": prompt},
-                ]
+                ],
             )
             content = response.choices[0].message.content
             if content.startswith("```json") and content.endswith("```"):
@@ -222,11 +230,13 @@ def compare_summarization_answers(query, answer1, answer2, *,
             return metrics
         except Exception:
             if index == retries - 1:
-                message = (f"Comparing summarization answers failed.\n"
-                           f"query: {query}\n"
-                           f"answer1: {answer1}\n"
-                           f"answer2: {answer2}\n"
-                           f"content: {content}\n"
-                           f"exception:\n{traceback.format_exc()}")
+                message = (
+                    f"Comparing summarization answers failed.\n"
+                    f"query: {query}\n"
+                    f"answer1: {answer1}\n"
+                    f"answer2: {answer2}\n"
+                    f"content: {content}\n"
+                    f"exception:\n{traceback.format_exc()}"
+                )
                 print(message)
     return None
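The visible hunks suggest a retry loop that strips an optional JSON code fence from the LLM reply, parses it, and returns `None` once the retries are exhausted. The sketch below illustrates that shape only. `parse_llm_json`, `call_with_retries`, and the `fetch` callable are hypothetical names, and the fence-stripping slice is an assumption, since the original parsing lines are not shown in the diff.

```python
import json

# Built programmatically so the marker never appears at the start of a line here.
FENCE = "`" * 3


def parse_llm_json(content):
    """Strip an optional json code fence, then parse the payload."""
    opener = FENCE + "json"
    if content.startswith(opener) and content.endswith(FENCE):
        content = content[len(opener):-len(FENCE)]
    return json.loads(content)


def call_with_retries(fetch, retries=3):
    """Call fetch() up to `retries` times; return parsed JSON or None."""
    for index in range(retries):
        try:
            return parse_llm_json(fetch())
        except Exception as exc:
            if index == retries - 1:
                # Report only on the last attempt, as the diff above does.
                print(f"Parsing the reply failed after {retries} attempts: {exc}")
    return None


replies = iter(["not json", FENCE + 'json\n{"Overall": 1}\n' + FENCE])
result = call_with_retries(lambda: next(replies), retries=3)
failed = call_with_retries(lambda: "still not json", retries=2)
```

Returning `None` instead of raising lets a batch caller skip a failed sample and keep averaging the rest.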
74 changes: 55 additions & 19 deletions kag/common/benchmarks/evaluate.py
@@ -1,4 +1,6 @@
 from typing import List
+from tqdm import tqdm
+from concurrent.futures import ThreadPoolExecutor, as_completed

 from .evaUtils import get_em_f1
 from .evaUtils import compare_summarization_answers
@@ -67,10 +69,19 @@ def getBenchMark(self, predictionlist: List[str], goldlist: List[str]):
         # Return evaluation metrics dictionary
         return total_metrics

-
-    def getSummarizationMetrics(self, queries: List[str], answers1: List[str], answers2: List[str], *,
-                                api_key="EMPTY", base_url="http://127.0.0.1:38080/v1", model="gpt-4o-mini",
-                                language="English", retries=3):
+    def getSummarizationMetrics(
+        self,
+        queries: List[str],
+        answers1: List[str],
+        answers2: List[str],
+        *,
+        api_key="EMPTY",
+        base_url="http://127.0.0.1:38080/v1",
+        model="gpt-4o-mini",
+        language="English",
+        retries=3,
+        max_workers=50,
+    ):
         """
         Calculates and returns QFS (query-focused summarization) evaluation metrics
         for the given queries, answers1 and answers2.
@@ -87,31 +98,56 @@ def getSummarizationMetrics(self, queries: List[str], answers1: List[str], answe
             model (str): model name to use when invoke the evaluating LLM.
             language (str): language of the explanation
             retries (int): number of retries
+            max_workers (int): number of workers

         Returns:
             dict: Dictionary containing the average metrics and the responses
                 generated by the evaluating LLM.
         """
-        responses = []
+        responses = [None] * len(queries)
         all_keys = "Comprehensiveness", "Diversity", "Empowerment", "Overall"
         all_items = "Score 1", "Score 2"
         average_metrics = {key: {item: 0.0 for item in all_items} for key in all_keys}
         success_count = 0
-        for index, (query, answer1, answer2) in enumerate(zip(queries, answers1, answers2)):
-            metrics = compare_summarization_answers(query, answer1, answer2,
-                                                    api_key=api_key, base_url=base_url, model=model,
-                                                    language=language, retries=retries)
+
+        def process_sample(index, query, answer1, answer2):
+            metrics = compare_summarization_answers(
+                query,
+                answer1,
+                answer2,
+                api_key=api_key,
+                base_url=base_url,
+                model=model,
+                language=language,
+                retries=retries,
+            )
             if metrics is None:
-                print(f"fail to compare answers of query {index + 1}.\n"
-                      f" query: {query}\n"
-                      f" answer1: {answer1}\n"
-                      f" answer2: {answer2}\n")
-            responses.append(metrics)
-            if metrics is not None:
-                for key in all_keys:
-                    for item in all_items:
-                        average_metrics[key][item] += metrics[key][item]
-                success_count += 1
+                print(
+                    f"fail to compare answers of query {index + 1}.\n"
+                    f" query: {query}\n"
+                    f" answer1: {answer1}\n"
+                    f" answer2: {answer2}\n"
+                )
+            else:
+                responses[index] = metrics
+            return metrics
+
+        with ThreadPoolExecutor(max_workers=max_workers) as executor:
+            futures = [
+                executor.submit(process_sample, index, query, answer1, answer2)
+                for index, (query, answer1, answer2) in enumerate(
+                    zip(queries, answers1, answers2)
+                )
+            ]
+            for future in tqdm(
+                as_completed(futures), total=len(futures), desc="Evaluating: "
+            ):
+                metrics = future.result()
+                if metrics is not None:
+                    for key in all_keys:
+                        for item in all_items:
+                            average_metrics[key][item] += metrics[key][item]
+                    success_count += 1
         if success_count > 0:
             for key in all_keys:
                 for item in all_items:
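The parallel version keeps `responses` aligned with the input order by pre-sizing the list and writing each result into its own slot, while the aggregation happens in the main thread as futures complete. A condensed sketch of that pattern is below. `evaluate_all` and `fake_compare` are hypothetical, and the metrics are flattened to a single level for brevity.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def evaluate_all(samples, compare, max_workers=8):
    """Run compare() on every sample in parallel and average the successes."""
    # Pre-sized so each worker writes only its own slot, preserving input order.
    responses = [None] * len(samples)
    totals = {}
    success_count = 0

    def process_sample(index, sample):
        metrics = compare(sample)
        if metrics is not None:
            responses[index] = metrics
        return metrics

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(process_sample, index, sample)
            for index, sample in enumerate(samples)
        ]
        # Aggregation stays in the calling thread, so no lock is needed for totals.
        for future in as_completed(futures):
            metrics = future.result()
            if metrics is not None:
                for key, value in metrics.items():
                    totals[key] = totals.get(key, 0.0) + value
                success_count += 1

    if success_count > 0:
        averages = {key: value / success_count for key, value in totals.items()}
    else:
        averages = {}
    return averages, responses


def fake_compare(sample):
    # Hypothetical comparer: negative inputs simulate a failed LLM comparison.
    return None if sample < 0 else {"Overall": float(sample)}


averages, responses = evaluate_all([2, -1, 4], fake_compare)
```

Failed samples leave `None` in their slot, so a caller can still tell which query each response belongs to.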
10 changes: 10 additions & 0 deletions kag/common/checkpointer/base.py
@@ -111,6 +111,16 @@ def exists(self, key):
         """
         raise NotImplementedError("close not implemented yet.")

+    def keys(self):
+        """
+        Returns the key set contained in the checkpoint file.
+
+        Returns:
+            set: The key set contained in the checkpoint.
+        """
+
+        raise NotImplementedError("keys not implemented yet.")
+
     def size(self):
         """
         Return the number of records in the checkpoint file.
8 changes: 8 additions & 0 deletions kag/common/checkpointer/bin_checkpointer.py
@@ -92,6 +92,9 @@ def size(self):

         return len(self._ckpt)

+    def keys(self):
+        return set(self._ckpt.keys())
+


 @CheckPointer.register("zodb")
 class ZODBCheckPointer(CheckPointer):
@@ -207,3 +210,8 @@ def size(self):
         with self._lock:
             with self._ckpt.transaction() as conn:
                 return len(conn.root.data)
+
+    def keys(self):
+        with self._lock:
+            with self._ckpt.transaction() as conn:
+                return set(conn.root.data.keys())
3 changes: 3 additions & 0 deletions kag/common/checkpointer/txt_checkpointer.py
@@ -87,3 +87,6 @@ def _close(self):

     def size(self):
         return len(self._ckpt)
+
+    def keys(self):
+        return set(self._ckpt.keys())
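Each `keys()` implementation above returns a new `set` rather than a live view of the underlying store. A small in-memory sketch of why that matters is below. `InMemoryCheckPointer` is a hypothetical class written for illustration and does not derive from KAG's `CheckPointer` base.

```python
import threading


class InMemoryCheckPointer:
    """Hypothetical dict-backed checkpointer exposing the same method names."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ckpt = {}

    def write_to_ckpt(self, key, value):
        with self._lock:
            self._ckpt[key] = value

    def exists(self, key):
        with self._lock:
            return key in self._ckpt

    def keys(self):
        # Copy into a set so callers hold a snapshot, not a live dict view.
        with self._lock:
            return set(self._ckpt.keys())

    def size(self):
        with self._lock:
            return len(self._ckpt)


ckpt = InMemoryCheckPointer()
ckpt.write_to_ckpt("chunk-1", "chunk-1")
snapshot = ckpt.keys()
ckpt.write_to_ckpt("chunk-2", "chunk-2")
```

The snapshot taken before the second write still holds one key, so a worker filtering chunks against it is unaffected by concurrent checkpoint writes.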