Background

Our Hobot pipeline is long and has four stages:

Each stage can be called a 'task'. For any task, the tasks on its left are 'upstream' tasks. Generally, a task needs its upstream tasks' config files and model weights for both training and deployment (and deployment additionally requires the task's own configs and model weights).
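For illustration, the dependency could be pictured roughly as below. Only the three stage names mentioned later in this issue are shown (the actual pipeline has four stages):

```
low-level  →  expert  →  agent  →  ...
```

e.g., the low-level and the expert are both upstream tasks of the agent.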
A desirable scenario
In a brief discussion with @Haichao-Zhang on Friday, we agreed that in order to easily manage and use training results of upstream tasks, two important properties are desired:
1. The model weights are always stored as one ckpt file, regardless of where the task is in the pipeline. For example, an agent's ckpt contains the model weights of the low-level, the expert, and itself.
2. We only need to look at one stage to get all needed configurations. For example, agent training only needs the expert's job dir, not the low-level's; similarly, deployment only needs the agent's job dir.
The above two properties simplify ckpt and conf management, because we don't want multiple training dirs passed to a downstream task.
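To make property 1 concrete, here is a minimal sketch of what "one ckpt" could look like. The key names (low_level, expert, agent) and the PyTorch-style state dicts are assumptions for illustration, not ALF's actual ckpt format:

```python
import torch

# Hypothetical sketch of property 1: the agent's ckpt bundles its own
# weights with everything from the expert's ckpt, which in turn already
# bundles the low-level's weights, so any downstream task only ever
# loads a single file.
def save_bundled_ckpt(path, agent_model, expert_ckpt_path):
    bundled = torch.load(expert_ckpt_path)  # e.g. {'low_level': ..., 'expert': ...}
    bundled['agent'] = agent_model.state_dict()
    torch.save(bundled, path)
```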
Solution
For model weights, it's straightforward to store all as one ckpt. However, it's a little tricky when it comes to conf management. Below is a simple hack for that purpose.
```python
import glob
import os
import pathlib


def save_upstream_confs(upstream_task_root_dir: str):
    """When training the current task B, we copy all upstream task (C, D, ...)
    confs to './.upstream_confs', and then add them to ``_CONF_FILES``. This
    will make them further copied to 'config_files' under the TB directory of
    B when ALF later writes the config. So when one later wants to use the
    ckpt of B for a new downstream task A, he doesn't need the trained dirs
    of C, D, ..., because their conf files have been included in B.

    To use any cached upstream conf ``x_conf.py``, one only needs to do

    .. code-block:: python

        alf.import_config('./.upstream_confs/x_conf.py')

    This will also work if ``x_conf.py`` itself imports some upstream conf
    ``y_conf.py``, provided that inside ``x_conf.py`` it's written as

    .. code-block:: python

        alf.import_config('./.upstream_confs/y_conf.py')

    A general template of using/saving upstream confs:

    .. code-block:: python

        if is_training:
            save_upstream_confs(upstream_task_root_dir)

        # import conf files of the current task
        alf.import_config('x_conf.py')
        alf.import_config('y_conf.py')

        # import conf files of upstream tasks
        alf.import_config('./.upstream_confs/z_conf.py')

    Args:
        upstream_task_root_dir: the root dir of the upstream task
    """
    root_dir = upstream_task_root_dir
    dst = pathlib.Path(__file__).parent / ".upstream_confs"
    os.system(f"mkdir -p {dst}")
    # Copy the upstream task config files, along with its own upstream task
    # conf files if existing.
    if os.path.isdir(f"{root_dir}/config_files/.upstream_confs"):
        os.system(f"cp -r {root_dir}/config_files/.upstream_confs {dst}")
    os.system(f"cp {root_dir}/config_files/*.py {dst}")
    for f in glob.glob(f"{dst}/**/*.py", recursive=True):
        # ``_add_conf_file`` is ALF's internal helper that registers a conf
        # file into ``_CONF_FILES`` (assumed importable in this module).
        _add_conf_file(f)
```
Generally, we copy all files under config_files of an upstream root_dir, recursively, to the dir of the current conf file, under a special dir called .upstream_confs. Then we recursively add all files in this special dir to ALF's _CONF_FILES, so that ALF copies them to config_files under the training root dir after one training iteration of the current task.
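For example, suppose the expert's root dir is expert_root (a hypothetical path) containing config_files/z_conf.py plus its own cached low-level confs. After save_upstream_confs(expert_root), the current task's conf dir would look roughly like:

```
<dir of the current conf file>/
└── .upstream_confs/
    ├── z_conf.py                # from expert_root/config_files/*.py
    └── .upstream_confs/         # the expert's own cache, copied by cp -r
        └── low_level_conf.py    # hypothetical low-level conf name
```

All of these .py files are then registered in _CONF_FILES by the recursive glob.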
This satisfies the second property, provided that in any conf file x_conf.py of the current task, we import a conf file y_conf.py of the immediate upstream task via
```python
alf.import_config('./.upstream_confs/y_conf.py')
```
This works for both the training and deployment modes of the task. We only call the above function in training mode:
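A minimal sketch of that training-mode guard, following the template in the docstring above (is_training, upstream_task_root_dir, and all conf file names here are placeholders, not actual Hobot names):

```python
# agent_conf.py (hypothetical conf file of the current task)
if is_training:
    # Cache the expert's confs (and, transitively, the low-level's)
    # under ./.upstream_confs before ALF writes out the config.
    save_upstream_confs(upstream_task_root_dir)

# import conf files of the current task
alf.import_config('agent_base_conf.py')

# import conf files of upstream tasks, now served from the local cache
alf.import_config('./.upstream_confs/expert_conf.py')
```

In deployment mode, the same imports resolve against the cached copies that ALF already placed under config_files, so no upstream job dirs are needed.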