In the paper, the initial sampling weight of every cluster is uniform, i.e. 1/k for each. In every subsequent round, each cluster's sampling weight is obtained by reweighting the previous round's weight with the softmax of reward_diff.
However, the implementation in iter.py appears to use the softmax of reward_diff directly as the next round's sampling weights. Which of the two methods was used to produce the results reported in the paper?
```python
def select_new_iter(rewards_gathered, dataset, indices_path, calculate_method="exp_reward_diff"):
    # ....(omitted)
    merged_df = subset_df.merge(rewards_df, left_index=True, right_index=True)
    merged_df = merged_df.groupby('cluster')['reward_diff'].mean().reset_index()
    if calculate_method == "ppl":
        merged_df['exp_reward_diff'] = merged_df['reward_diff']
    else:
        merged_df['exp_reward_diff'] = np.exp(merged_df['reward_diff'])
    merged_df['exp_reward_diff'] = merged_df['exp_reward_diff'] / merged_df['exp_reward_diff'].sum()
    size = (len(dataset) * portion) / K / round
    exp_reward_diff = merged_df['exp_reward_diff']
    # The line below uses the softmax directly as the new round's sampling weights,
    # without multiplying by the previous round's weights — different from the paper?
    select_new_iter = np.random.choice(K, size=int(size), p=exp_reward_diff, replace=True)
    selected_clusters_size = Counter(select_new_iter)
    # ....(omitted)
```
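For reference, here is a minimal sketch of the update rule as I read it from the paper, for comparison with the code above. The names (`weights`, `update_weights`, `reward_diff`) and the values of `K` and `reward_diff` are my own illustration, not from the repo:

```python
import numpy as np

K = 8  # number of clusters (hypothetical value for illustration)

# Round 0: uniform weights, 1/K per cluster
weights = np.full(K, 1.0 / K)

def update_weights(weights, reward_diff):
    # Paper's rule as I understand it: reweight the previous round's
    # weights by softmax(reward_diff), then renormalize to sum to 1.
    softmax = np.exp(reward_diff) / np.exp(reward_diff).sum()
    new_weights = weights * softmax
    return new_weights / new_weights.sum()

# One example round with hypothetical per-cluster mean reward_diff values
reward_diff = np.random.randn(K)
weights = update_weights(weights, reward_diff)
```

Under this multiplicative rule the weights accumulate history across rounds, whereas the code in iter.py resets them each round to the current softmax alone.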