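How do I connect to ClickHouse, and what environment do I need? Python? A Linux virtual machine? Are there any recommended CSDN tutorial blogs? (I've read quite a few, but none of them got me connected to ClickHouse, and I'm completely lost.) My goal is to collect the actual text of issues on GitHub. How should I use OpenDigger to mine that data? Can I also collect each issue's labels along with the text? Or, if I don't use OpenDigger, how else can I mine issue text content?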
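Hi, here's the situation: because of time constraints, I can't quickly walk through the full OpenDigger workflow right now. I can, however, offer another way to get issue text data.
The approach is to use GitHub's REST API and request the data from GitHub directly. For usage, see: https://docs.github.com/zh/rest/issues/comments?apiVersion=2022-11-28
If you're not familiar with the REST API, start with the quickstart guide. In essence, it comes down to setting up your access token. The tutorial is in English, so please be patient with it. A simple example follows. Note that it does not fetch the issue text itself, so adapt it with the help of the tutorial.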
```python
import csv
import json
import requests

# Base URL for issues in the target repository
base_url = 'https://api.github.com/repos/microsoft/vscode/issues/'

# Request headers
headers = {
    'Accept': 'application/vnd.github+json',
    # Authorization header (if needed). Put your own token here and never post a real one publicly.
    'Authorization': 'Bearer <YOUR_GITHUB_TOKEN>',
    'X-GitHub-Api-Version': '2022-11-28',
    # Add other custom headers here
}


# Read the issue numbers from the "number" column of a CSV file
def read_csv_file(filename):
    issue_numbers = []
    with open(filename, 'r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            issue_numbers.append(row['number'])
    return issue_numbers


# Request the comments of one issue; return None on any non-200 response
def send_api_request(issue_number):
    api_url = base_url + issue_number + '/comments'
    response = requests.get(api_url, headers=headers)
    if response.status_code == 200:
        print("success!")
        return response.json()
    else:
        return None


# Get the issue numbers from the CSV file
csv_filename = 'issue_id_30.csv'
issue_numbers = read_csv_file(csv_filename)

# Send the API requests and collect the results
results = []
for issue_number in issue_numbers:
    result = send_api_request(issue_number)
    if result is not None:
        results.append(result)

# Write the results to a JSON file
json_filename = '30_issues_comments.json'
with open(json_filename, 'w', encoding='utf-8') as file:
    json.dump(results, file, ensure_ascii=False)

print('Requests finished, results saved to', json_filename)
```
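Since the example above fetches comments rather than the issue text itself, here is a rough sketch of how the issue body and its labels could be pulled in one request. It assumes the repository's issue list endpoint (`GET /repos/{owner}/{repo}/issues`), which returns each issue's `title`, `body`, and `labels`. The function names `extract_issue_fields`, `is_pull_request`, and `fetch_issues` are made up for illustration, and the sketch uses only the standard library in place of `requests`. Treat it as a starting point, not a finished crawler: it fetches a single page and does no rate-limit handling.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def extract_issue_fields(issue):
    """Keep only the number, title, body text, and label names of one issue."""
    labels = []
    for label in issue.get("labels", []):
        # Labels normally arrive as objects with a "name" key; tolerate plain strings too.
        labels.append(label["name"] if isinstance(label, dict) else label)
    return {
        "number": issue["number"],
        "title": issue.get("title", ""),
        "body": issue.get("body") or "",  # body is null when the issue has no description
        "labels": labels,
    }


def is_pull_request(item):
    """The issue list endpoint also returns pull requests; they carry a "pull_request" key."""
    return "pull_request" in item


def fetch_issues(owner, repo, token=None, state="all", per_page=100, page=1):
    """Fetch one page of issues and return their text and labels (hypothetical helper)."""
    query = urllib.parse.urlencode({"state": state, "per_page": per_page, "page": page})
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:  # unauthenticated requests work too, but with a much lower rate limit
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        items = json.load(response)
    return [extract_issue_fields(item) for item in items if not is_pull_request(item)]
```

Calling `fetch_issues("microsoft", "vscode", token=...)` with your own token (for example, read from an environment variable) should return a list of dictionaries that can be written out with `json.dump`, as in the example above. To get more than one page, increase `page` until an empty list comes back.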