
[onert/train] Attach auxiliary tensors to tensor builder #13282

Open
zetwhite opened this issue Jun 25, 2024 · 4 comments

zetwhite commented Jun 25, 2024

Background

backend/train/ops/*Layer has extra (auxiliary) tensors that are used for backward().

For example,

// TODO Optimize memory
std::unique_ptr<Tensor> _transposed_weights;
std::unique_ptr<Tensor> _transposed_input;
std::unique_ptr<Tensor> _transposed_back_prop_output;
std::unique_ptr<Tensor> _act_back_prop_output;

// TODO Consider if these tensors should be built in TensorBuilder
std::unique_ptr<Tensor> _transposed_weights;
std::unique_ptr<BackPropTensor> _conv_back_prop_output;
std::unique_ptr<BackPropTensor> _act_back_prop_output;
std::unique_ptr<GradientTensor> _transposed_grad_weights;

These tensors are allocated when KernelGenerator visits each operation.

_transposed_weights = createTransposedTensor(weights);
_transposed_weights->setBuffer(std::make_shared<basic::Allocator>(weights->total_size()));
_transposed_input = createTransposedTensor(input);
_transposed_input->setBuffer(std::make_shared<basic::Allocator>(input->total_size()));
_transposed_back_prop_output = createTransposedTensor(back_prop_output);
_transposed_back_prop_output->setBuffer(
std::make_shared<basic::Allocator>(back_prop_output->total_size()));

What

These auxiliary tensors keep holding memory once they are configured.
So it might be helpful to add them to TensorBuilder so that a memory planner can manage their allocation.
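
To illustrate the benefit, here is a minimal, self-contained sketch of lifetime-based planning (toy code, not onert; Plan, NaivePlanner, and the sizes are made up): tensors whose live ranges do not overlap can share one arena instead of each holding its own buffer.

// Toy sketch, not onert code: a planner packs tensors whose live ranges
// do not overlap into one arena, instead of each tensor owning a buffer.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct Plan
{
  uint32_t first_use, last_use; // operation indices where the tensor is live
  size_t size;
  size_t offset; // assigned by plan()
};

class NaivePlanner
{
public:
  void claim(uint32_t first_use, uint32_t last_use, size_t size)
  {
    _plans.push_back({first_use, last_use, size, 0});
  }

  // Greedy offset assignment: a tensor only has to avoid tensors whose
  // lifetime overlaps its own, so regions of dead tensors get reused.
  size_t plan()
  {
    size_t capacity = 0;
    for (size_t i = 0; i < _plans.size(); ++i)
    {
      size_t offset = 0;
      for (size_t j = 0; j < i; ++j)
      {
        bool overlap = !(_plans[i].last_use < _plans[j].first_use ||
                         _plans[j].last_use < _plans[i].first_use);
        if (overlap)
          offset = std::max(offset, _plans[j].offset + _plans[j].size);
      }
      _plans[i].offset = offset;
      capacity = std::max(capacity, offset + _plans[i].size);
    }
    return capacity; // size of a single arena covering all planned tensors
  }

private:
  std::vector<Plan> _plans;
};

int main()
{
  NaivePlanner planner;
  planner.claim(0, 0, 4 << 20); // two auxiliary tensors of op #0
  planner.claim(0, 0, 4 << 20);
  planner.claim(5, 5, 6 << 20); // one auxiliary tensor of op #5
  // Prints 8388608 (8 MB) instead of the 14 MB needed when every tensor
  // holds its own allocation for the whole training step.
  std::cout << "arena capacity: " << planner.plan() << " bytes\n";
}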

A comment from @zetwhite was marked as outdated.


zetwhite commented Jul 31, 2024

After some work on #13486,
I checked how much memory was reduced compared to the master branch.

mnist

	[      ALLOC     ] allocation capacity: 11360128 # non-const 
	[      ALLOC     ] allocation capacity: 1938880  # trainable 
	[      ALLOC     ] allocation capacity: 11360000 # back-prop 
	[      ALLOC     ] allocation capacity: 1938880  # gradient
	[      ALLOC     ] allocation capacity: 3877760  #  opt variable
	[      ALLOC     ] allocation capacity: 3211264  # disposable 
	[      ALLOC     ] allocation capacity: 6627328  # extra tensors 

mobile net v2

	[      ALLOC     ] allocation capacity: 361362288 # non-const
	[      ALLOC     ] allocation capacity: 13951408  # trainable 
	[      ALLOC     ] allocation capacity: 361362240 # back-prop
	[      ALLOC     ] allocation capacity: 13951408  # gradient 
	[      ALLOC     ] allocation capacity: 27902816  # opt variable 
	[      ALLOC     ] allocation capacity: 49032960  # disposable
	[      ALLOC     ] allocation capacity: 96350208  # extra tensors 

/cc @ragmani


ragmani commented Aug 1, 2024

After applying all PRs related to the draft #13305, the other allocation capacities will be reduced as follows:

mnist

33686912 (32.1 MB) -> 25187648 (24.0 MB)

non-const : 11360128 -> 11341056
trainable : 1938880 -> 1938880
back-prop : 11360000 -> 6423808
gradient : 1938880 -> 1606144
optimizer variables : 3877760 -> 3877760
disposable : 3211264 -> 0

mobile net v2

~~827562720 (789.2 MB) -> 490938032 (468.1 MB)~~
827562720 (789.2 MB) -> 508592656 (485.0 MB)

non-const : 361362288 -> 361362240
trainable : 13951312 -> 13951312
back-prop : 361362240 -> 97241920
gradient : 13951312 -> 5124000
~~optimizer variables : 27902608 -> 10248000~~
optimizer variables : 27902608 -> 27902624
disposable : 49032960 -> 3010560

The capacity of optimizer variables is 27902624, not 10248000.


zetwhite commented Aug 1, 2024

I'll start making PRs based on the draft (#13486).
The draft is somewhat rough; I'll trim it while splitting it into PRs.

TODO

zetwhite added a commit to zetwhite/ONE that referenced this issue Sep 11, 2024
This PR adds registerLayerScopeTensors to ITrainableFunction.
registerLayerScopeTensors is used to register LayerScopeTensors into the TensorRegistry.

ONE-DCO-1.0-Signed-off-by: seunghui youn <[email protected]>
draft : Samsung#13486
for : Samsung#13282
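
For context, a rough sketch of what such a hook could look like. The signature and the LayerScopeTensor stub below are assumptions for illustration, not the merged onert API.

#include <memory>
#include <vector>

struct LayerScopeTensor { /* stand-in for the real layer-scope tensor type */ };

class ITrainableFunction
{
public:
  virtual ~ITrainableFunction() = default;
  virtual void forward(bool training) = 0;
  virtual void backward() = 0;
  // Hypothetical hook: layers that need auxiliary tensors for backward()
  // report them here so they can be put into the tensor registry and planned,
  // instead of allocating their own buffers in configureBackward().
  virtual std::vector<std::shared_ptr<LayerScopeTensor>> registerLayerScopeTensors()
  {
    return {}; // default: the layer has no layer-scope tensors
  }
};
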
zetwhite added a commit to zetwhite/ONE that referenced this issue Sep 11, 2024
This PR templatize memory planner factory in train backend.
MemoryPlannerFactory currently used for DisposableTensorIndex, but it will be also used for LayerScopeTensorIndex.

ONE-DCO-1.0-Signed-off-by: seunghui youn <[email protected]>
draft : Samsung#13486
for : Samsung#13282
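
A minimal sketch of that shape, assuming made-up planner classes (only MemoryPlannerFactory, DisposableTensorIndex, and LayerScopeTensorIndex come from the commit message):

#include <cstddef>
#include <memory>
#include <string>

template <typename Index> struct IMemoryPlanner
{
  virtual ~IMemoryPlanner() = default;
  virtual void claim(const Index &, std::size_t size) = 0;
  virtual void release(const Index &) = 0;
  virtual std::size_t capacity() const = 0;
};

// Simplest possible planner: never reuses memory, just sums the claims.
template <typename Index> struct BumpPlanner : IMemoryPlanner<Index>
{
  void claim(const Index &, std::size_t size) override { _capacity += size; }
  void release(const Index &) override {}
  std::size_t capacity() const override { return _capacity; }
  std::size_t _capacity = 0;
};

// Templated over the index type, so the same factory code can serve
// DisposableTensorIndex and LayerScopeTensorIndex.
template <typename Index> class MemoryPlannerFactory
{
public:
  static MemoryPlannerFactory &get()
  {
    static MemoryPlannerFactory instance;
    return instance;
  }
  std::unique_ptr<IMemoryPlanner<Index>> create(const std::string &key)
  {
    (void)key; // a real factory would dispatch on "Bump", "FirstFit", ...
    return std::make_unique<BumpPlanner<Index>>();
  }
};
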
zetwhite added a commit to zetwhite/ONE that referenced this issue Sep 13, 2024
This PR introduces LayerScopeMemoryManager.
This Manager will be added to TensorManager and used to allocate LayerScopeTensors.

ONE-DCO-1.0-Signed-off-by: seunghui youn <[email protected]>

draft : Samsung#13486
for : Samsung#13282
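
Roughly, such a manager could wrap a planner and a single arena like this (a simplified stand-in for illustration; the real LayerScopeMemoryManager interface may differ):

#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

class LayerScopeMemoryManager // simplified stand-in, not the merged class
{
public:
  // Planning phase: record each tensor's size. A real manager would forward
  // this to a memory planner that can reuse released regions.
  void claimPlan(uint32_t index, std::size_t size)
  {
    _offsets[index] = _capacity;
    _capacity += size;
  }
  void releasePlan(uint32_t /*index*/) { /* a planner could reclaim here */ }

  // Allocation phase: one arena for all planned layer-scope tensors.
  void allocate() { _arena.resize(_capacity); }

  // Each layer-scope tensor gets its buffer at the planned offset.
  uint8_t *getBuffer(uint32_t index) { return _arena.data() + _offsets.at(index); }

private:
  std::unordered_map<uint32_t, std::size_t> _offsets;
  std::size_t _capacity = 0;
  std::vector<uint8_t> _arena;
};
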
zetwhite added a commit to zetwhite/ONE that referenced this issue Sep 13, 2024
This PR adds LayerScopeTensors into TensorRegistry.

ONE-DCO-1.0-Signed-off-by: seunghui youn <[email protected]>

draft : Samsung#13486
for : Samsung#13282
zetwhite reopened this Sep 13, 2024
chunseoklee pushed a commit that referenced this issue Sep 20, 2024
This PR introduces LayerScopeMemoryManager.
This Manager will be added to TensorManager and used to allocate LayerScopeTensors.

ONE-DCO-1.0-Signed-off-by: seunghui youn <[email protected]>

draft : #13486
for : #13282
zetwhite added a commit to zetwhite/ONE that referenced this issue Sep 20, 2024
This PR adds LayerScopeManager into TensorManager.

ONE-DCO-1.0-Signed-off-by: seunghui youn <[email protected]>

draft : Samsung#13486
for : Samsung#13282
zetwhite added a commit to zetwhite/ONE that referenced this issue Sep 20, 2024
This PR adds LayerScopeTensors into TensorRegistry.

ONE-DCO-1.0-Signed-off-by: seunghui youn <[email protected]>

draft : Samsung#13486
for : Samsung#13282
zetwhite added a commit to zetwhite/ONE that referenced this issue Sep 26, 2024
This PR adds LayerScopeManager into TensorManager.

ONE-DCO-1.0-Signed-off-by: seunghui youn <[email protected]>

draft : Samsung#13486
for : Samsung#13282