[BUG] Ranking Evaluation Metrics Exceed 1 with "by_threshold" Relevancy Method #2154

mnhqut opened this issue Aug 23, 2024 · 4 comments
mnhqut commented Aug 23, 2024

Description

Hello!

I encountered an issue while evaluating the BPR (Bayesian Personalized Ranking) model using essentially the same code as the example notebook, but on a different dataset. Specifically, when using the "by_threshold" relevancy method with the ranking metrics, the computed values for precision@k, ndcg@k, and map@k exceed 1, which seems incorrect. The issue does not occur when switching the relevancy method to "top_k".

How do we replicate the issue?

I use the following parameters for BPR (all with the default seed):

import cornac

bpr = cornac.models.BPR(
    k=200,
    max_iter=100,
    learning_rate=0.01,
    lambda_reg=0.001,
    verbose=True
)
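Since the evaluation follows the example notebook, all_predictions would presumably be generated roughly as below. This is only a sketch: the column names userID/itemID and the use of predict_ranking are assumptions on my part, not confirmed details.

from recommenders.models.cornac.cornac_utils import predict_ranking

# Build the Cornac train set from the train split (the split itself is described
# in the follow-up comment below) and fit the model.
train_set = cornac.data.Dataset.from_uir(train.itertuples(index=False))
bpr.fit(train_set)

# Score every (user, item) pair not seen during training; column names are assumed.
all_predictions = predict_ranking(
    bpr, train, usercol="userID", itemcol="itemID", remove_seen=True
)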

Using this evaluation code:

from recommenders.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k

TOP_K = 10
threshold = 50
eval_map = map_at_k(test, all_predictions, col_prediction="prediction",
                    relevancy_method='by_threshold', threshold=threshold, k=TOP_K)
eval_ndcg = ndcg_at_k(test, all_predictions, col_prediction="prediction",
                      relevancy_method='by_threshold', threshold=threshold, k=TOP_K)
eval_precision = precision_at_k(
    test, all_predictions, col_prediction="prediction",
    relevancy_method='by_threshold', threshold=threshold, k=TOP_K)

Here is the dataset I test on: https://github.com/mnhqut/rec_sys-dataset/blob/main/data.csv

My results:
MAP: 1.417529
NDCG: 1.359902
Precision@K: 2.256466

Willingness to contribute

  • [ ] Yes, I can contribute to this issue independently.
  • [x] Yes, I can contribute to this issue with guidance from the Recommenders community.
  • [ ] No, I cannot contribute at this time.
@mnhqut mnhqut added the bug Something isn't working label Aug 23, 2024

mnhqut commented Aug 24, 2024

I forgot to mention that the way I split the training and testing data was:

from recommenders.datasets.python_splitters import python_stratified_split

train, test = python_stratified_split(df, ratio=0.75)


miguelgfierro commented Aug 27, 2024

We need to review this @SimonYansenZhao @anargyri @daviddavo @loomlike and even @yueguoguo

@daviddavo

How do you get all_predictions? Can you provide the full code to reproduce the issue, or are you just using the deep dive notebook?

@daviddavo

The problem seems to be that when you use by_threshold, the k in some terms of the equations still remains the top_k value.

For example, in precision@k:

return (df_hit_count["hit"] / k).sum() / n_users

It divides by 10 (the k value, which is also the default), instead of by 50 (the threshold specified for by_threshold).

The other metrics have similar problems.
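To make that concrete, here is a small numeric illustration (made-up numbers, not library code) of how the value ends up above 1:

# Hypothetical numbers for a single user: with by_threshold the hit set is built
# from the user's top-threshold predictions, but the denominator stays at k.
k = 10
threshold = 50
hits_for_user = 25                       # hits found among the top-50 recommended items
precision_for_user = hits_for_user / k   # 25 / 10 = 2.5, already greater than 1
print(precision_for_user)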

Perhaps this is what by_threshold is intended to do. Is it a way of changing how many items are taken, even though you are still calculating with k? I don't really understand how by_threshold is supposed to work, so I can't tell whether this is a bug or intended behaviour.

I can solve the bug by just using threshold instead of k when necessary, but then by_threshold and top_k would be exactly the same.
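For reference, one possible shape of that change (just a sketch of the idea, not the actual library code) would be to let the denominator follow the relevancy method that built the hit set:

# Sketch only: df_hit_count is assumed to be the per-user hit-count DataFrame
# used in the snippet above. With by_threshold, dividing by threshold keeps the
# metric in [0, 1], but then it behaves exactly like top_k with k = threshold.
def precision_sketch(df_hit_count, n_users, k, threshold, relevancy_method):
    denominator = k if relevancy_method == "top_k" else threshold
    return (df_hit_count["hit"] / denominator).sum() / n_users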

Btw, here is a notebook that replicates the issue in Google Colab
