Resume Training with Previous Experience (state-action-state')? #1134

Open
wenjunli-0 opened this issue Aug 26, 2021 · 6 comments
Labels: question (Further information is requested)

Comments

@wenjunli-0

I am using Stable Baselines and I want to train an agent in varying environments, i.e. the environment hyper-parameter is adjusted every 1000 timesteps.

```python
for i in range(100):
    a = i * 2
    env = CustomizedEnv(parameter=a)  # new environment with the updated hyper-parameter
    model.set_env(env)  # attach the new environment to the existing model; learn() resets it

    model.learn(total_timesteps=1000, reset_num_timesteps=False)
    model.save(save_dir + 'timestep_{}'.format(i))
```

Describe the bug
I want to know whether, if I resume training this way, the previous interaction experience will automatically be used in the current training. As i increases, will the model have access to a growing experience space in the buffer?

If not, could you please let me know how I can do this with stable-baselines? Thanks.

Miffyli added the question label Aug 26, 2021
@Miffyli
Collaborator

Miffyli commented Aug 26, 2021

The exact answer depends on the algorithm you use, but at least with DQN the code re-creates the replay buffer on every call to learn.

However, in stable-baselines3 the buffer is not re-created, so calling learn again would use the samples from the previous learn call as well.
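
For illustration, a minimal sketch of this behaviour with SB3's DQN (the environment id, buffer size and timestep counts below are placeholders, not taken from the thread):

```python
from stable_baselines3 import DQN

# In SB3 the replay buffer lives on the model object, so repeated learn()
# calls keep adding to (and sampling from) the same buffer.
model = DQN("MlpPolicy", "CartPole-v1", buffer_size=50_000, verbose=0)

model.learn(total_timesteps=1_000, reset_num_timesteps=False)
print(model.replay_buffer.size())  # transitions stored so far

model.learn(total_timesteps=1_000, reset_num_timesteps=False)
print(model.replay_buffer.size())  # larger: the earlier transitions were kept
```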

@wenjunli-0
Author

> The exact answer depends on the algorithm you use, but at least with DQN the code re-creates the replay buffer on every call to learn.
>
> However, in stable-baselines3 the buffer is not re-created, so calling learn again would use the samples from the previous learn call as well.

Thanks for your swift response. I am using TRPO and PPO. So, you mean stable-baselines3 would be more suitable for this problem (because stable-baselines3 keeps both the previous and the current samples in the buffer), right?

@Miffyli
Collaborator

Miffyli commented Aug 26, 2021

> Thanks for your swift response. I am using TRPO and PPO. So, you mean stable-baselines3 would be more suitable for this problem (because stable-baselines3 keeps both the previous and the current samples in the buffer), right?

I would recommend using SB3 in any case (unless you really need TRPO), as it is more up-to-date and is actively supported/maintained :)

But: if you are using TRPO/PPO, then the answer to your original question is "no". These algorithms use a rollout buffer to collect samples, which are then discarded after they have been used to update the policy, so no samples are retained for a longer time (this is a "feature" of these algorithms).
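
For illustration, a minimal sketch of the on-policy case with SB3's PPO (the environment id and n_steps value are placeholders): each learn() call fills a fresh rollout buffer, uses it once for the policy update, and then discards it, so nothing is carried over between calls.

```python
from stable_baselines3 import PPO

# PPO/TRPO are on-policy: the rollout buffer is refilled with fresh
# transitions before every update and emptied afterwards.
model = PPO("MlpPolicy", "CartPole-v1", n_steps=256, verbose=0)

for i in range(3):
    # Every call gathers brand-new rollouts with the current environment;
    # samples from previous iterations are never revisited.
    model.learn(total_timesteps=1_000, reset_num_timesteps=False)
```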

@wenjunli-0
Author

> > Thanks for your swift response. I am using TRPO and PPO. So, you mean stable-baselines3 would be more suitable for this problem (because stable-baselines3 keeps both the previous and the current samples in the buffer), right?
>
> I would recommend using SB3 in any case (unless you really need TRPO), as it is more up-to-date and is actively supported/maintained :)
>
> But: if you are using TRPO/PPO, then the answer to your original question is "no". These algorithms use a rollout buffer to collect samples, which are then discarded after they have been used to update the policy, so no samples are retained for a longer time (this is a "feature" of these algorithms).

Okay, I will stick to SB3 in my later experiments. SB3 has A2C, DDPG, DQN, HER, PPO, SAC and TD3; could you please point out which algorithms support this continued-training feature? I am not that familiar with some of the algorithms, so an explicit answer would be a great help.

@Miffyli
Collaborator

Miffyli commented Aug 26, 2021

I think any algorithm with a replay buffer should work like this, so: DDPG, DQN, SAC and TD3.
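
For illustration, a minimal sketch of the off-policy case with SB3's SAC (the environment id, buffer size and file names are placeholders, not from the thread): the replay buffer lives on the model between learn() calls, and SB3's off-policy algorithms also provide save_replay_buffer()/load_replay_buffer() so the collected experience can survive a save/load cycle.

```python
from stable_baselines3 import SAC

# Off-policy algorithms (DDPG, DQN, SAC, TD3) keep their replay buffer on
# the model, so experience accumulates across learn() calls.
model = SAC("MlpPolicy", "Pendulum-v1", buffer_size=100_000, verbose=0)

for i in range(3):
    model.learn(total_timesteps=1_000, reset_num_timesteps=False)
    print(model.replay_buffer.size())  # grows with every call

# The replay buffer is not included in model.save(), but it can be stored
# and restored explicitly alongside the model checkpoint.
model.save("sac_checkpoint")
model.save_replay_buffer("sac_replay_buffer")

model = SAC.load("sac_checkpoint")
model.load_replay_buffer("sac_replay_buffer")
```

If the environment changes between calls, as in the loop from the original question, model.set_env(new_env) swaps in the new environment without touching the replay buffer.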

@rambo1111

#1192
