Question regarding the reward of sales promotion training dataset #10
Comments
Hi, the calculation of each user's reward is the same. Alternatively, note that the dataset's actions were made by a human operator (after data anonymization), and we retain the original reward in the dataset for researchers with specific needs.
Thanks for reporting this issue. This environment was originally designed for online evaluation, so some of the code is tailored to evaluation rather than training. We have fixed this reset issue locally for training, but that branch has not been committed yet; it will arrive soon together with the newer sales promotion environment with a budget constraint. For now, you can revise this line with `deepcopy()` as a quick fix, i.e., `self.states = deepcopy(self.val_initial_states)`.
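A minimal sketch of that quick fix, assuming the reset method assigns the stored validation initial states; the class skeleton below is illustrative only, not the actual sp_env code:

```python
from copy import deepcopy

class SalesPromotionEnv:
    """Skeleton only; the real sp_env has many more members (assumption)."""

    def __init__(self, val_initial_states):
        # Fixed per-user initial states used to (re)start evaluation episodes.
        self.val_initial_states = val_initial_states

    def reset(self):
        # Quick fix: deep-copy the stored initial states instead of aliasing
        # them, so in-place updates during an episode cannot corrupt the
        # saved initial states for subsequent resets.
        self.states = deepcopy(self.val_initial_states)
        return self.states
```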
For "I require it to recalculate the reward based on variations in user orders", as mentioned above, the current sp_env does not support this calculation. You may need to use the raw order_number (the user network output) and gmv&cost data in
and
|
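If a per-user reward is needed, a rough sketch of how one might recompute it from such raw data; the array names and the per-user formula here are assumptions for illustration, not part of sp_env:

```python
import numpy as np

def per_user_reward(order_number, gmv_per_order, coupon_cost_per_order):
    """Hypothetical per-user reward: each user's GMV delta minus the coupon
    cost incurred by their orders (1-D arrays indexed by user)."""
    d_gmv = order_number * gmv_per_order            # GMV contributed per user
    d_cost = order_number * coupon_cost_per_order   # coupon cost per user
    return d_gmv - d_cost

# The environment's scalar reward is then the mean of these per-user values:
# (d_total_gmv - d_total_cost) / num_users == per_user_reward(...).mean()
```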
Hi,
In the sales promotion environment the reward is given by `rew = (d_total_gmv - d_total_cost) / self.num_users`, which means the operator observes a single reward signal shared by all users. However, in the offline training dataset the reward is different for each user across the 50 days. For example, refer to the user orders and reward graph below.
As per my understanding, the reward should be the same each day for the three users and should gradually increase over the 50 days as sales increase. Could you kindly let me know how the reward in the training dataset was calculated?
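To make the mismatch concrete, a toy numerical sketch (the values are made up, and the per-user formula is only a guess at how the per-user rewards in the dataset might arise):

```python
# Three users on a single day, with made-up GMV and coupon-cost deltas.
d_gmv = [30.0, 10.0, 20.0]
d_cost = [5.0, 2.0, 3.0]
num_users = len(d_gmv)

# Environment reward: one scalar shared by all users that day.
env_reward = (sum(d_gmv) - sum(d_cost)) / num_users        # 16.67 for everyone

# Dataset reward (as observed): a different value per user, here guessed as
# each user's own gmv minus cost.
dataset_rewards = [g - c for g, c in zip(d_gmv, d_cost)]   # [25.0, 8.0, 17.0]
```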