
How is NDCG computed for queries where there is only one document? #6823

Open
cjsombric opened this issue Feb 11, 2025 · 2 comments

@cjsombric
I am able to train on queries with only one document without any problem, but when I try to compute the NDCG "from scratch" for a single query, I get a very reasonable error:

ValueError: Computing NDCG is only meaningful when there is more than 1 document. Got 1 instead.

How can I reconcile the NDCG in model.evals_result_ with a hand-computed one, given this error?

Here is the code I use to generate the NDCG one query at a time:

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMRanker
from sklearn.metrics import ndcg_score

model = LGBMRanker(
    objective="lambdarank",
    boosting_type="gbdt",
    random_state=42,
    max_depth=10,
    min_data_in_leaf=200,
    n_estimators=100,
    subsample=0.5,
    colsample_bytree=0.6,
    lambda_l1=0.9,
    lambda_l2=0.9,
    n_jobs=-1,
)

model.fit(
    X_train, y_train, group=X_train_group,
    eval_set=[(X_train, y_train)],
    eval_group=[X_train_group],
    eval_metric=["ndcg"],
    eval_at=[1, 5, 20, 100],
)

model.evals_result_["training"]["ndcg@5"][-1]  # Returns an NDCG of ~0.70

def ndcg_scorer(y_true, y_pred, groups, K):
    # Compute NDCG@K per query, then average across queries
    scores = []
    start = 0
    for group_size in groups:
        end = start + group_size
        y_true_group = np.array([y_true[start:end]])
        y_pred_group = np.array([y_pred[start:end]])
        score = ndcg_score(y_true_group, y_pred_group, k=K)
        scores.append(score)
        start = end
    print(f"ndcg@{K}: {np.mean(scores)}")
    return np.mean(scores)

def custom_scorer(estimator, X, y, K):
    # get_group_size is my own helper that returns the per-query group sizes
    groups = get_group_size(pd.DataFrame(X))
    y_pred = estimator.predict(X)
    return ndcg_scorer(y, y_pred, groups, K)

custom_scorer(model, X_train, y_train, 5)
# Raises: ValueError: Computing NDCG is only meaningful when there is more than 1 document. Got 1 instead.
```
When I remove the queries with only one document, I am able to compute an NDCG, but the score given by model.evals_result_ (~0.70) does not match the one from the custom scorer (~0.02).

Why would this be happening?

@shiyu1994
Collaborator

According to the definition of NDCG, a query with only 1 document should always have an NDCG score of 1. Since there's only 1 possible ranking for the 1 document, the DCG score given by the model always equals the ideal DCG score.
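A minimal sketch (with a hypothetical relevance label) showing why: for a single-document query, the only possible ordering is also the ideal ordering, so DCG equals IDCG and the ratio is 1 regardless of what the model predicts.

```python
import numpy as np

def dcg(relevances):
    # Standard discounted cumulative gain: sum of rel_i / log2(i + 1)
    ranks = np.arange(1, len(relevances) + 1)
    return np.sum(np.asarray(relevances, dtype=float) / np.log2(ranks + 1))

def ndcg(relevances_in_predicted_order):
    # NDCG = DCG of the predicted ordering / DCG of the ideal ordering
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    return dcg(relevances_in_predicted_order) / dcg(ideal)

print(ndcg([3.0]))  # 1.0: a single document is always "perfectly" ranked
```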

I'm not sure how sklearn handles these single document queries. Do you remove them for both sklearn evaluation and lightgbm?

@cjsombric
Author

cjsombric commented Feb 18, 2025

> According to the definition of NDCG, a query with only 1 document should always have an NDCG score of 1. Since there's only 1 possible ranking for the 1 document, the DCG score given by the model always equals the ideal DCG score.
>
> I'm not sure how sklearn handles these single document queries. Do you remove them for both sklearn evaluation and lightgbm?

I have been unclear, so let me try again to explain. I isolated the validation data to one query with 20 documents, in which case it should be simple to compute the NDCG 3 ways:

  1. using lightGBM's fit function, by including the validation data as an "eval_set" and extracting the NDCG with the "evals_result_" functionality
  2. using lightGBM's predict function and then hand-computing the NDCG from the observed predictions for the validation data set
  3. using sklearn's implementation of NDCG, which should be fine here: sklearn's inability to take the query grouping into account doesn't matter, since there is only one query

The NDCG from lightGBM's predict function plus the hand calculation matches sklearn's implementation, but I cannot replicate the NDCG from lightGBM's fit function.

There are two reasons I can think of for why this is happening:
(1) lightGBM handles discounting when computing the NDCG in a way that I am not familiar with (the two conventions I tried are documented here: https://en.wikipedia.org/wiki/Discounted_cumulative_gain#cite_note-4)
OR
(2) lightGBM's predicted rankings for the validation set differ between those generated as part of the fit function and those from the predict function.

Can you clarify which discounting formulation is used for NDCG, and indicate whether the predictions should/could differ when generated as part of lightGBM's fit vs. predict functions?
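For reference, here is a sketch of the two DCG conventions from the Wikipedia article above, using hypothetical relevance labels. A mismatch between them could plausibly explain the discrepancy: if I read the docs correctly, sklearn's ndcg_score uses the linear-gain form by default, while learning-to-rank toolkits often use the exponential-gain form.

```python
import numpy as np

def dcg_linear(rels):
    # "Traditional" form: rel_i / log2(i + 1)
    ranks = np.arange(1, len(rels) + 1)
    return np.sum(np.asarray(rels, dtype=float) / np.log2(ranks + 1))

def dcg_exponential(rels):
    # Alternative form: (2^rel_i - 1) / log2(i + 1), which rewards
    # highly relevant documents much more strongly
    ranks = np.arange(1, len(rels) + 1)
    return np.sum((2.0 ** np.asarray(rels, dtype=float) - 1) / np.log2(ranks + 1))

rels = [3, 2, 0, 1]  # hypothetical relevance labels in predicted order
print(dcg_linear(rels), dcg_exponential(rels))
```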

Regarding your question about removing queries: I am finding that I have to remove validation queries which don't have any relevant documents, but as long as there is at least one relevant document, everything seems to run. When lightGBM computes a training NDCG via the "evals_result_" functionality, how does it handle queries where there are no relevant documents, or only one document which is not relevant?
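In the meantime, this is the workaround sketch I am using: a per-query scorer that skips the degenerate cases (single-document queries, and queries whose labels are all zero, where the ideal DCG would be zero). This is only my workaround, not a claim about how lightGBM itself handles these queries, and it assumes the same y_true / y_pred / groups layout as the code above.

```python
import numpy as np
from sklearn.metrics import ndcg_score

def ndcg_scorer_skipping_degenerate(y_true, y_pred, groups, k):
    # Average per-query NDCG@k, skipping queries where NDCG is
    # undefined or uninformative: single-document queries and
    # queries with no relevant documents (IDCG would be zero).
    scores = []
    start = 0
    for group_size in groups:
        end = start + group_size
        rels = np.asarray(y_true[start:end], dtype=float)
        preds = np.asarray(y_pred[start:end], dtype=float)
        start = end
        if group_size < 2 or rels.max() <= 0:
            continue  # skip degenerate query
        scores.append(ndcg_score([rels], [preds], k=k))
    return float(np.mean(scores)) if scores else float("nan")
```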
