
How is NDCG computed for queries where there is only one document? #6823

Open
cjsombric opened this issue Feb 11, 2025 · 2 comments

@cjsombric
I am able to train on queries with only one document without any problem, but when I try to compute the NDCG "from scratch" for a single query, I get a very reasonable error:

ValueError: Computing NDCG is only meaningful when there is more than 1 document. Got 1 instead.

How can I reconcile the NDCG in model.evals_result_ with a hand-computed one, given this error?

Here is the code I use to generate the NDCG one query at a time:

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMRanker
from sklearn.metrics import ndcg_score

model = LGBMRanker(
    objective="lambdarank",
    boosting_type="gbdt",
    random_state=42,
    max_depth=10,
    min_data_in_leaf=200,
    n_estimators=100,
    subsample=0.5,
    colsample_bytree=0.6,
    lambda_l1=0.9,
    lambda_l2=0.9,
    n_jobs=-1,
)

model.fit(
    X_train, y_train, group=X_train_group,
    eval_set=[(X_train, y_train)],
    eval_group=[X_train_group],
    eval_metric=["ndcg"],
    eval_at=[1, 5, 20, 100],
)

model.evals_result_["training"]["ndcg@5"][-1]  # Returns an NDCG of ~0.70

def ndcg_scorer(y_true, y_pred, groups, K):
    # Compute NDCG@K per query, then average across queries
    scores = []
    start = 0
    for group_size in groups:
        end = start + group_size
        y_true_group = np.array([y_true[start:end]])
        y_pred_group = np.array([y_pred[start:end]])
        score = ndcg_score(y_true_group, y_pred_group, k=K)
        scores.append(score)
        start = end
    print(f"ndcg@{K}: {np.mean(scores)}")
    return np.mean(scores)

def custom_scorer(estimator, X, y, K):
    # get_group_size is my own helper that returns the per-query group sizes
    groups = get_group_size(pd.DataFrame(X))
    y_pred = estimator.predict(X)
    return ndcg_scorer(y, y_pred, groups, K)

custom_scorer(model, X_train, y_train, 5)
# Raises: ValueError: Computing NDCG is only meaningful when there is more than 1 document. Got 1 instead.
```
When I remove the queries with only one document, I am able to compute an NDCG, but the score given by model.evals_result_ (~0.70) does not match the one from the custom scorer (~0.02).

Why would this be happening?

@shiyu1994
Collaborator

According to the definition of NDCG, a query with only 1 document should always have an NDCG score of 1. Since there's only 1 possible ranking for the 1 document, the DCG score given by the model always equals the ideal DCG score.
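A minimal sketch (with a hypothetical relevance label) showing why: for a single-document query, the only possible ordering is also the ideal ordering, so DCG equals IDCG and the ratio is 1 regardless of what the model predicts.

```python
import numpy as np

def dcg(relevances):
    # Standard discounted cumulative gain: sum of rel_i / log2(i + 1)
    ranks = np.arange(1, len(relevances) + 1)
    return np.sum(np.asarray(relevances, dtype=float) / np.log2(ranks + 1))

def ndcg(relevances_in_predicted_order):
    # NDCG = DCG of the predicted ordering / DCG of the ideal ordering
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    return dcg(relevances_in_predicted_order) / dcg(ideal)

print(ndcg([3.0]))  # 1.0: a single document is always "perfectly" ranked
```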

I'm not sure how sklearn handles these single document queries. Do you remove them for both sklearn evaluation and lightgbm?

@cjsombric
Author

cjsombric commented Feb 18, 2025

> According to the definition of NDCG, a query with only 1 document should always have an NDCG score of 1. Since there's only 1 possible ranking for the 1 document, the DCG score given by the model always equals the ideal DCG score.
>
> I'm not sure how sklearn handles these single document queries. Do you remove them for both sklearn evaluation and lightgbm?

I have been unclear, so let me try again to explain. I isolated the validation data to one query with 20 documents, in which case it should be simple to compute the NDCG 3 ways:

  1. using lightGBM's fit function, by including the validation data as an "eval_set" and extracting the NDCG with the "evals_result_" functionality
  2. using lightGBM's predict function and then hand-computing the NDCG from the observed predictions for the validation data set
  3. using sklearn's implementation of NDCG, which should be fine here: sklearn's inability to take the query grouping into account doesn't matter, since there is only one query

The NDCG from lightGBM's predict function plus the hand calculation matches sklearn's implementation, but I cannot replicate the NDCG from lightGBM's fit function.

There are two reasons I can think of for why this is happening:
(1) lightGBM handles discounting when computing the NDCG in a way that I am not familiar with (the two conventions I tried are documented here: https://en.wikipedia.org/wiki/Discounted_cumulative_gain#cite_note-4)
OR
(2) lightGBM's predicted rankings for the validation set differ between those generated as part of the fit function and those from the predict function.

Can you clarify which discounting formulation is used for NDCG, and indicate whether the predictions should/could differ when generated as part of lightGBM's fit vs. predict functions?
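For reference, here is a sketch of the two DCG conventions from the Wikipedia article above, using hypothetical relevance labels. A mismatch between them could plausibly explain the discrepancy: if I read the docs correctly, sklearn's ndcg_score uses the linear-gain form by default, while learning-to-rank toolkits often use the exponential-gain form.

```python
import numpy as np

def dcg_linear(rels):
    # "Traditional" form: rel_i / log2(i + 1)
    ranks = np.arange(1, len(rels) + 1)
    return np.sum(np.asarray(rels, dtype=float) / np.log2(ranks + 1))

def dcg_exponential(rels):
    # Alternative form: (2^rel_i - 1) / log2(i + 1), which rewards
    # highly relevant documents much more strongly
    ranks = np.arange(1, len(rels) + 1)
    return np.sum((2.0 ** np.asarray(rels, dtype=float) - 1) / np.log2(ranks + 1))

rels = [3, 2, 0, 1]  # hypothetical relevance labels in predicted order
print(dcg_linear(rels), dcg_exponential(rels))
```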

Regarding your question about removing queries: I am finding that I have to remove validation queries which don't have any relevant documents, but as long as there is at least one relevant document, everything seems to run. When lightGBM computes a training NDCG via the "evals_result_" functionality, how does it handle queries where there are no relevant documents, or only one document which is not relevant?
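In the meantime, this is the workaround sketch I am using: a per-query scorer that skips the degenerate cases (single-document queries, and queries whose labels are all zero, where the ideal DCG would be zero). This is only my workaround, not a claim about how lightGBM itself handles these queries, and it assumes the same y_true / y_pred / groups layout as the code above.

```python
import numpy as np
from sklearn.metrics import ndcg_score

def ndcg_scorer_skipping_degenerate(y_true, y_pred, groups, k):
    # Average per-query NDCG@k, skipping queries where NDCG is
    # undefined or uninformative: single-document queries and
    # queries with no relevant documents (IDCG would be zero).
    scores = []
    start = 0
    for group_size in groups:
        end = start + group_size
        rels = np.asarray(y_true[start:end], dtype=float)
        preds = np.asarray(y_pred[start:end], dtype=float)
        start = end
        if group_size < 2 or rels.max() <= 0:
            continue  # skip degenerate query
        scores.append(ndcg_score([rels], [preds], k=k))
    return float(np.mean(scores)) if scores else float("nan")
```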
