
KMQ-Iterative: weight update for next-round sampling differs from the paper? #10

Open
Aurelius84 opened this issue Jan 9, 2025 · 0 comments


@Aurelius84

In the paper, each cluster's initial sampling weight is uniform, 1/k for every cluster; in each subsequent round, a cluster's sampling weight is obtained by reweighting the previous round's weight with the softmax of reward_diff.

However, judging from the implementation in iter.py, the softmax of reward_diff is used directly as the next round's sampling weight. Which of the two was used to produce the results reported in the paper?

def select_new_iter(rewards_gathered, dataset, indices_path, calculate_method="exp_reward_diff"):
    # .... (omitted)
    merged_df = subset_df.merge(rewards_df, left_index=True, right_index=True)
    merged_df = merged_df.groupby('cluster')['reward_diff'].mean().reset_index()
    if calculate_method == "ppl":
        merged_df['exp_reward_diff'] = merged_df['reward_diff']
    else:
        merged_df['exp_reward_diff'] = np.exp(merged_df['reward_diff'])
        merged_df['exp_reward_diff'] = merged_df['exp_reward_diff'] / merged_df['exp_reward_diff'].sum()
    size = (len(dataset) * portion) / K / round
    exp_reward_diff = merged_df['exp_reward_diff']
    
    # The line below uses the softmax directly as the new round's sampling weights,
    # without multiplying by the previous round's weights; inconsistent with the paper?
    select_new_iter = np.random.choice(K, size=int(size), p=exp_reward_diff, replace=True)  
    selected_clusters_size = Counter(select_new_iter)
    # .... (omitted)
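
For reference, here is a minimal sketch of the two update rules as I understand them; prev_weights, reward_diff, and K below are illustrative names, not taken from iter.py:

    import numpy as np

    K = 8                                # number of clusters (illustrative)
    prev_weights = np.full(K, 1.0 / K)   # round-0 weights are uniform, 1/k
    reward_diff = np.random.randn(K)     # per-cluster mean reward_diff (illustrative)

    # Softmax of reward_diff, as computed in select_new_iter:
    softmax = np.exp(reward_diff)
    softmax /= softmax.sum()

    # The paper's rule (as I read it): reweight the previous round's
    # weights by the softmax, then renormalize.
    paper_weights = prev_weights * softmax
    paper_weights /= paper_weights.sum()

    # iter.py's rule: use the softmax directly as the new weights.
    code_weights = softmax

Note that with uniform initial weights the two rules coincide on the first update (multiplying by 1/k and renormalizing just returns the softmax), but they diverge from the second round onward, since the paper's rule accumulates weight history across rounds.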