Hi, I noticed that the description mentions:

> In addition, to help researchers become familiar with our data and run quick experiments, we are releasing a demo and a small version of the EB-NeRD by randomly sampling 5,000 and 50,000 users and their behavior logs from the full dataset.
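As I understand the quoted sentence, the sampling happens at the user level: a set of users is drawn, and all of their behavior logs are kept. The sketch below shows that reading. It is my own illustration under that assumption, not the dataset authors' pipeline, and the `sample_users` helper and the toy log rows are hypothetical.

```python
import random
from typing import Dict, List


def sample_users(logs: List[Dict], n_users: int, seed: int = 0) -> List[Dict]:
    """Keep every log row belonging to n_users randomly chosen users (hypothetical helper)."""
    # Distinct user IDs, sorted so the draw is reproducible for a given seed
    all_users = sorted({row["user_id"] for row in logs})
    rng = random.Random(seed)
    chosen = set(rng.sample(all_users, n_users))
    # Filter at the user level: a sampled user keeps all of their rows
    return [row for row in logs if row["user_id"] in chosen]


# Toy log rows for illustration only
logs = [{"user_id": u, "impression_id": i} for i, u in enumerate([1, 1, 2, 3, 3, 3, 4])]
subset = sample_users(logs, n_users=2)
print(len({row["user_id"] for row in subset}))  # 2
```

Under this reading, counting the distinct user IDs in the sampled logs should give back the number of sampled users.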
However, when I count the unique user IDs in the small version (train and validation splits combined), I get 18,827 users instead of 50,000. This is the code I used:
```python
import os

import pyarrow.parquet as pq

DATA_DIR = './small'

# Load the behavior logs of both splits of the small dataset
train_df = pq.read_table(os.path.join(DATA_DIR, 'train', 'behaviors.parquet')).to_pandas()
test_df = pq.read_table(os.path.join(DATA_DIR, 'validation', 'behaviors.parquet')).to_pandas()
# or count the users in the history files instead:
# train_df = pq.read_table(os.path.join(DATA_DIR, 'train', 'history.parquet')).to_pandas()
# test_df = pq.read_table(os.path.join(DATA_DIR, 'validation', 'history.parquet')).to_pandas()

# Collect the distinct user IDs across both splits
user_ids = set(train_df['user_id']) | set(test_df['user_id'])
print(len(user_ids))
# Output: 18827
```
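To narrow down where the gap comes from, it may help to report the distinct users per split and their overlap, not only the union. This is a minimal sketch; the helper name `user_count_breakdown` is hypothetical, and the toy IDs stand in for the real `user_id` columns.

```python
from typing import Dict, Iterable


def user_count_breakdown(train_ids: Iterable, valid_ids: Iterable) -> Dict[str, int]:
    """Count distinct users per split, in both splits, and in their union (hypothetical helper)."""
    train_users, valid_users = set(train_ids), set(valid_ids)
    return {
        "train": len(train_users),
        "validation": len(valid_users),
        "both": len(train_users & valid_users),
        "union": len(train_users | valid_users),
    }


# Toy IDs for illustration; with the real data you would pass
# train_df['user_id'] and test_df['user_id'] from the script above
print(user_count_breakdown([1, 2, 2, 3], [3, 4]))
# {'train': 3, 'validation': 2, 'both': 1, 'union': 4}
```

The "union" entry corresponds to the 18,827 figure; the per-split entries show whether one split accounts for most of the users.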
I would like to know the reason for this. Thanks!