How to speed up the training process? #36

Open
BrightMoonStar opened this issue Mar 12, 2024 · 0 comments

GPU utilization is very low when I run the training script:

python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical_cl --val_data_size=160

Could you suggest how to make full use of the GPU and speed up training? I tried increasing num_worker, but that does not seem to help, and when I increase batch_size I get the following error:

len val dataset 160
Running Testing
Traceback (most recent call last):
  File "spirl/spirl/train.py", line 390, in <module>
    ModelTrainer(args=get_args())
  File "spirl/spirl/train.py", line 76, in __init__
    self.train(start_epoch)
  File "spirl/spirl/train.py", line 105, in train
    self.val()
  File "spirl/spirl/train.py", line 199, in val
    self.evaluator.dump_results(self.global_step)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 66, in dump_results
    self.dump_metrics(it)
  File "/home/lyf/Videos/bin/skild/skild/spirl/spirl/components/evaluator.py", line 72, in dump_metrics
    best_idxs = 0 if self._top_of_n == 1 else self._get_best_idxs(self.full_eval_buffer[self._top_comp_metric])
TypeError: 'NoneType' object is not subscriptable
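
One possible reading of the last frame, offered as an assumption rather than a confirmed diagnosis: `self.full_eval_buffer` is still `None` when `dump_metrics` subscripts it, which would happen if no validation batch was ever evaluated. The sketch below reproduces that pattern in isolation. `ToyEvaluator`, `num_full_batches`, and `run_validation` are hypothetical names, not SPiRL code, and the idea that the validation loader drops incomplete batches is likewise an assumption.

```python
# Hypothetical stand-in for an evaluator whose buffer is filled per batch.
# This is NOT the SPiRL evaluator; all names here are illustrative only.
class ToyEvaluator:
    def __init__(self):
        self.full_eval_buffer = None  # stays None until a batch is evaluated

    def eval(self, batch_metric):
        if self.full_eval_buffer is None:
            self.full_eval_buffer = {"mse": []}
        self.full_eval_buffer["mse"].append(batch_metric)

    def dump_metrics(self):
        # Same access pattern as the last traceback frame.
        return self.full_eval_buffer["mse"]


def num_full_batches(dataset_len, batch_size):
    """Batches yielded by a loader that drops the last incomplete batch (assumed)."""
    return dataset_len // batch_size


def run_validation(dataset_len, batch_size):
    evaluator = ToyEvaluator()
    for _ in range(num_full_batches(dataset_len, batch_size)):
        evaluator.eval(0.0)  # placeholder metric value
    return evaluator.dump_metrics()
```

Under these assumptions, `run_validation(160, 16)` evaluates ten batches, while `run_validation(160, 256)` evaluates none and raises the same `TypeError`. If that is what happens here, a batch size above the 160 validation samples would be enough to trigger it.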

Thank you very much!
