
How can I use OpenDigger to mine the content of GitHub issues? #44

Open
byktue opened this issue Dec 17, 2024 · 1 comment

Comments


byktue commented Dec 17, 2024

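How do I connect to ClickHouse, and what environment do I need? Python? Or a Linux virtual machine? Is there a recommended CSDN tutorial or blog post? (I have read quite a few, but none of them got me connected to ClickHouse, and I am completely lost.)
My goal is to collect the actual text of issues on GitHub. How should I use OpenDigger to mine that data? Can I also collect the labels attached to each issue?
Alternatively, if I don't use OpenDigger, how else can I mine the text content of issues?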

Collaborator

Tenth-crew commented Dec 17, 2024

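Hi, here is the situation: I am short on time, so I can't quickly walk through the whole OpenDigger workflow right now. But I can offer another way to get the issue text data.

Specifically, use GitHub's REST API and request the data from GitHub through it. Usage reference: https://docs.github.com/zh/rest/issues/comments?apiVersion=2022-11-28

If the REST API is unfamiliar, start with the quickstart guide; in essence it comes down to setting up your token. The tutorial is in English, so please be patient with it. A simple example follows. Note that it does not fetch the issue text itself (it requests the comments of each issue), so adapt it with the help of the tutorial.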

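import csv
import json
import os
import requests

# Base URL of the issues endpoint for the target repository
base_url = 'https://api.github.com/repos/microsoft/vscode/issues/'

# Request headers
headers = {
    'Accept': 'application/vnd.github+json',
    'X-GitHub-Api-Version': '2022-11-28'  # Pin the REST API version
}

# Add the authorization header (if needed). The token is read from an
# environment variable instead of being hard-coded in the script; the
# variable name GITHUB_TOKEN is a choice made for this example.
token = os.environ.get('GITHUB_TOKEN')
if token:
    headers['Authorization'] = 'Bearer ' + token

# Read the issue numbers from the "number" column of a CSV file
def read_csv_file(filename):
    issue_numbers = []
    with open(filename, 'r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            issue_numbers.append(row['number'])
    return issue_numbers

# Send the API request for the comments of one issue
def send_api_request(issue_number):
    api_url = base_url + issue_number + '/comments'
    response = requests.get(api_url, headers=headers, timeout=30)
    if response.status_code == 200:
        print("success!")
        return response.json()
    else:
        # Report the failure instead of skipping the issue silently
        print('request failed for issue', issue_number, 'status', response.status_code)
        return None

# Get the issue numbers from the CSV file
csv_filename = 'issue_id_30.csv'
issue_numbers = read_csv_file(csv_filename)

# Send the API requests and collect the results
results = []
for issue_number in issue_numbers:
    result = send_api_request(issue_number)
    if result is not None:
        results.append(result)

# Write the results to a JSON file
json_filename = '30_issues_comments.json'
with open(json_filename, 'w', encoding='utf-8') as file:
    json.dump(results, file)

print('Requests finished; results saved to', json_filename)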
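
Since the original question asks for the issue text together with its labels, here is a minimal sketch of one way to get both from the "list repository issues" endpoint (GET /repos/{owner}/{repo}/issues). It is an illustration under assumptions, not a tested pipeline: the helper names `extract_issue_fields`, `is_pull_request`, and `fetch_issues`, the `max_pages` limit, and the `GITHUB_TOKEN` environment variable are all made up for this example. It uses only the Python standard library so that it runs without extra packages; `requests` would work just as well.

```python
import json
import os
import urllib.parse
import urllib.request


def is_pull_request(item):
    """The issues endpoint also returns pull requests; they carry a 'pull_request' key."""
    return "pull_request" in item


def extract_issue_fields(issue):
    """Keep only the number, title, body text, and label names of one issue object."""
    return {
        "number": issue["number"],
        "title": issue["title"],
        "body": issue.get("body") or "",  # the body is null for issues without text
        "labels": [label["name"] for label in issue.get("labels", [])],
    }


def fetch_issues(owner, repo, token=None, state="all", per_page=100, max_pages=1):
    """Fetch up to max_pages pages of issues (hypothetical helper, page limit is an assumption)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = "Bearer " + token

    issues = []
    for page in range(1, max_pages + 1):
        query = urllib.parse.urlencode(
            {"state": state, "per_page": per_page, "page": page}
        )
        url = f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"
        request = urllib.request.Request(url, headers=headers)
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
        with urllib.request.urlopen(request, timeout=30) as response:
            batch = json.load(response)
        if not batch:
            break  # no more pages
        issues.extend(
            extract_issue_fields(item) for item in batch if not is_pull_request(item)
        )
    return issues


# Example call (needs network access; the token is optional but raises the rate limit):
# issues = fetch_issues("microsoft", "vscode", token=os.environ.get("GITHUB_TOKEN"))
# with open("issues_with_labels.json", "w", encoding="utf-8") as f:
#     json.dump(issues, f, ensure_ascii=False, indent=2)
```

Each returned entry holds the issue body and a plain list of label names, so the text and its labels stay together in one record. Raise `max_pages` to collect more than the first 100 issues.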
