Implementing encoding networks using Sequential #989
Conversation
Since Sequential is very flexible, most networks can be implemented using it, and it automatically comes with support for make_parallel.
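As a rough illustration of what this means in practice (a minimal sketch in plain PyTorch; the layer sizes and the use of nn.Sequential are illustrative stand-ins, not ALF's actual container API), an encoding network is simply a composition of conv and FC layers, so a sequential container can provide forward() once, and ALF's container additionally provides make_parallel on top of the same composition:

```python
# Hedged sketch: an encoding network written as a plain sequence of layers.
# Sizes are arbitrary; ALF's Sequential container plays the role of
# nn.Sequential here and additionally supports make_parallel.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2),  # conv stem
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 15 * 15, 64),                # FC head for a 3x32x32 input
    nn.ReLU(),
    nn.Linear(64, 16),
)

obs = torch.randn(4, 3, 32, 32)  # a batch of fake image observations
print(encoder(obs).shape)        # torch.Size([4, 16])
```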
spec = input_tensor_spec
nets = []

if input_preprocessors:
There is a slight change in the handling of input_preprocessors. Previously, in PreprocessorNetwork, each input preprocessor was copied first. The new version does not copy. @Haichao-Zhang do you remember why the copy is needed?
@emailweixu Yes, the reason for this is to make sure that when we create parallel networks with preprocessors (which may have learnable parameters), the preprocessors are also copied, unless explicitly specified not to copy.
This makes the expected behavior more natural, and it fixed some previously encountered errors caused by the mismatch between the intended behavior and the actual behavior.
The make_parallel of the container will copy all the components, so there should not be any problem with this change if the original motivation was to ensure that make_parallel works correctly.
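As a toy illustration of the copy-vs-share distinction being discussed (plain PyTorch, not ALF's code): deep-copying a component per replica gives each replica independent parameters, while reusing the same module object shares them, which also changes the total number of parameters and hence the parameter initialization mentioned below.

```python
# Toy illustration: copied vs. shared input preprocessor across replicas.
import copy
import torch.nn as nn

preprocessor = nn.Linear(8, 8)  # stand-in for a learnable input preprocessor
replicas = 3

copied = nn.ModuleList([copy.deepcopy(preprocessor) for _ in range(replicas)])
shared = nn.ModuleList([preprocessor for _ in range(replicas)])

print(len(list(copied.parameters())))  # 6: independent weight/bias per replica
print(len(list(shared.parameters())))  # 2: one weight/bias reused by all replicas
```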
@@ -142,7 +142,7 @@ def _train():
inputs=None, loss_func=_neglogprob, batch_size=batch_size)
generator.update_with_gradient(alg_step.info)

- for i in range(2000):
+ for i in range(2100):
This is because the input preprocessor is no longer copied, which causes different parameter initialization. See the comment for the encoding network.
@@ -112,6 +113,7 @@ def test_continuous_actor_distribution(self, lstm_hidden_size):
conv_layer_params=self._conv_layer_params,
continuous_projection_net_ctor=functools.partial(
NormalProjectionNetwork, scale_distribution=True))
+ logging.info("---- %s" % str(actor_dist_net.state_spec))
can remove this line
This line is not removed yet.
alf/networks/encoding_networks.py
Outdated
last_kernel_initializer=None,
last_use_fc_bn=False,
name="ParallelEncodingNetwork"):
"""Parallel feed-forward network with FC layers which allows the last layer
"feed-forward network with FC layers" -> "encoding network"
fixed
alf/networks/value_networks_test.py
Outdated
@@ -66,6 +67,7 @@ def test_value_distribution(self, lstm_hidden_size):
conv_layer_params=conv_layer_params), None
],
preprocessing_combiner=NestConcat())
+ logging.info("----%s" % str(value_net.state_spec))
can remove this line
removed.
@@ -187,7 +183,7 @@ def test_encoding_network_input_preprocessor(self):
self.assertEqual(output.size()[1], 1)

@parameterized.parameters((True, ), (False, ))
- def test_encoding_network_nested_input(self, lstm):
+ def test_encoding_network_nested_input(self, lstm=False):
Why do we need to set lstm=False here?
It was for debugging. Restored.
TODO: remove ``allow_non_parallel_input``. This means to make parallel network |
The option allow_non_parallel_input seems to be useful and we can keep it. Does it make sense to set the default value of allow_non_parallel_input=True, so that the input will always be handled correctly by default?
A network can be a component of a bigger network (container). It does not need the ability to handle non-parallel input in those situations.
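For concreteness, here is a hedged sketch of the shape handling under discussion (function and argument names are illustrative, not ALF's actual API): a parallel network normally expects inputs shaped [B, replicas, ...], and allowing non-parallel input simply means a plain [B, ...] batch is expanded across the replica dimension first; a component inside a larger container already receives parallel tensors and does not need this.

```python
# Hedged sketch of "allow_non_parallel_input"; the shape check below is a
# simplification (ALF would rely on tensor specs rather than guessing).
import torch


def maybe_add_replica_dim(x: torch.Tensor, replicas: int,
                          allow_non_parallel_input: bool) -> torch.Tensor:
    if x.dim() >= 2 and x.shape[1] == replicas:
        return x  # already parallel: [B, replicas, ...]
    if allow_non_parallel_input:
        # Replicate the plain [B, ...] batch across the replica dimension.
        return x.unsqueeze(1).expand(-1, replicas, *x.shape[1:])
    raise ValueError("expected parallel input of shape [B, %d, ...]" % replicas)


batch = torch.randn(4, 8)                           # non-parallel input [B, ...]
print(maybe_add_replica_dim(batch, 3, True).shape)  # torch.Size([4, 3, 8])
```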
alf/utils/checkpoint_utils_test.py
Outdated
len(p_net_w_preprocessor.state_dict()),
len(p_net_wo_preprocessor.state_dict()) +
replicas * len(input_preprocessors.state_dict()))
if lstm:
Why do we need this if lstm condition here and also in L587?
This test relied on the fact that p_net_w_preprocessor is naively parallelized, which is no longer correct.
Changed the test to not rely on this.
Looks great! Just one final comment.
alf/utils/checkpoint_utils_test.py
Outdated
input_preprocessors.state_dict()))
# the number of parameters of a parallel network with a shared
# input_preprocessor should be equal to that of the parallel network
# with non-shared input processor - replicas * the number of parameters
In the comment, it should be (replicas-1) * ..., same as the code.
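For reference, here is a toy reconstruction of the relationship the comment should state (layer sizes are made up; this is not the actual test): sharing one input preprocessor across the replicas removes (replicas - 1) copies of its parameters compared to the parallel network with per-replica preprocessors.

```python
# Toy check of the (replicas - 1) relationship between the two parameter counts.
import torch.nn as nn

replicas = 3
preproc = nn.Linear(8, 8)  # stand-in for a shared, learnable input preprocessor

# Non-shared: every replica owns its own preprocessor and body.
nonshared = nn.ModuleList(
    [nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 4)) for _ in range(replicas)])
# Shared: one preprocessor feeding per-replica bodies.
shared = nn.ModuleDict({
    "preproc": preproc,
    "bodies": nn.ModuleList([nn.Linear(8, 4) for _ in range(replicas)])
})

assert (len(shared.state_dict()) ==
        len(nonshared.state_dict()) - (replicas - 1) * len(preproc.state_dict()))
```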
This PR only converts the encoding networks to use containers. Future PRs will change the other networks.