forked from BVLC/caffe
Optimize Backward Time Complexity to O(MK) #1
Open
mfs6174 wants to merge 6 commits into ydwen:caffe-face from mfs6174:ydwen-face-mod
Commits (6):
- cfd64a6 The time complexity of the backward process of the center loss layer …
- 86d8200 add details regarding the modification into README.md
- 5831099 fix README.md
- bfe50f8 not setting the diff to zeros in Backward(), allowing the gradient to…
- dccea74 add normalize layer to help training center loss
- 201f79a add normalize layer header
README.md
@@ -1,3 +1,24 @@
# Faster Center Loss Implementation

This branch is forked from [ydwen's caffe-face](https://github.com/ydwen/caffe-face) and modified by mfs6174 ( [email protected] ).

Compared to the original implementation by the paper author, the backward time complexity of this implementation is reduced from O(MK+NM) to O(MK).

In the original implementation, the time complexity of the backward pass of the center loss layer is O(MK+NM). Training becomes very slow with a large number of classes, since the running time of the backward pass grows with the class count N. Unfortunately, this is a common situation when training face recognition models (e.g. 750k unique persons).

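For reference, the center loss and the two quantities its backward pass must compute are given below, following the center loss paper referenced later in this README (x_i is the K-dimensional feature of sample i, y_i its label, c_j the center of class j, and δ(·) the indicator function):

```latex
% Center loss over a batch of M samples
L_C = \frac{1}{2} \sum_{i=1}^{M} \lVert x_i - c_{y_i} \rVert_2^2

% Gradient w.r.t. each feature: O(MK) in total
\frac{\partial L_C}{\partial x_i} = x_i - c_{y_i}

% Center update for class j: nonzero only for classes present in the batch
\Delta c_j = \frac{\sum_{i=1}^{M} \delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{M} \delta(y_i = j)}
```

The O(NM) term in the original code comes from evaluating the Δc_j sums by looping over all N classes and, for each class, over the whole batch, even though at most M classes can have a nonzero update.
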
This implementation rewrites the backward code. The time complexity is reduced to O(MK), using O(N) additional space. Because M (batch size) << N and K (feature length) << N usually hold for face recognition problems, this modification improves the training speed significantly.

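The following is a minimal CPU sketch of the idea, with assumed flat row-major buffers and illustrative names (features, labels, centers, feat_diff, center_diff); it is not the PR's exact layer code. One pass over the batch counts how many samples of each class are present (the O(N) array), and a second pass writes the feature gradients and accumulates the center updates, so there is no loop over all N classes.

```cpp
#include <vector>

// Sketch of an O(MK)-time, O(N)-extra-space center loss backward pass.
// M = batch size, K = feature length, N = number of classes.
// features: M x K, labels: M, centers: N x K (all row-major, pre-sized).
// feat_diff (M x K) receives dL/dx_i; center_diff (N x K) is accumulated into.
void CenterLossBackward(int M, int K, int N,
                        const std::vector<float>& features,
                        const std::vector<int>& labels,
                        const std::vector<float>& centers,
                        std::vector<float>* feat_diff,
                        std::vector<float>* center_diff) {
  // O(N) extra space: per-class sample counts for this batch.
  std::vector<int> count(N, 0);
  for (int i = 0; i < M; ++i) {
    ++count[labels[i]];
  }
  // Single pass over the batch: O(MK). Classes absent from the batch are
  // never touched, which removes the O(NM) term of the original code.
  for (int i = 0; i < M; ++i) {
    const int y = labels[i];
    for (int k = 0; k < K; ++k) {
      const float d = features[i * K + k] - centers[y * K + k];
      (*feat_diff)[i * K + k] = d;                       // dL/dx_i = x_i - c_{y_i}
      (*center_diff)[y * K + k] += -d / (1 + count[y]);  // accumulate center update
    }
  }
}
```

Note that the sketch accumulates into center_diff instead of clearing it, which is exactly the point raised in the review comments below about iter_size > 1.
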
For a GoogLeNet v2 model trained on Everphoto's 750k-unique-person dataset, on a single Nvidia GTX Titan X with a batch size of 24 and iter_size = 5, the average backward iteration times are:

1. Softmax only: 230ms
2. Softmax + center loss, original implementation: 3485ms (center loss layer: 3332ms)
3. Softmax + center loss, implementation in this PR: 235.6ms (center loss layer: 5.4ms)

That is a more than 600x improvement for the center loss layer (3332ms / 5.4ms ≈ 617x).

For the author's "mnist_example", running on a single GTX Titan X, the training time of the original implementation vs. this PR is 4min20s vs. 3min50s, so even when training on a small dataset with only 10 classes there is still some improvement.

The PR also fixes the code style to pass Caffe's lint test (make lint), so that it may be ready to be merged into Caffe's master.

# Deep Face Recognition with Caffe Implementation

This branch is developed for deep face recognition; the related paper is as follows.

@@ -185,4 +206,4 @@ Please cite Caffe in your publications if it helps your research:
    Journal = {arXiv preprint arXiv:1408.5093},
    Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
    Year = {2014}
  }
The blob's diff should not be reset in Backward(). Doing so breaks gradient accumulation when iter_size > 1. See the discussion in https://groups.google.com/forum/#!searchin/caffe-users/iter_size|sort:relevance/caffe-users/PMbycfbpKcY/FTBiMKunEQAJ
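To illustrate the point with a toy standalone example (hypothetical functions, not Caffe code): the solver zeroes the diffs once per iteration, then calls Backward() iter_size times and expects each call to add its contribution, so resetting the diff inside Backward() keeps only the last sub-batch.

```cpp
#include <iostream>
#include <vector>

// Correct behavior: each sub-batch adds its gradient to the existing diff.
void backward_accumulating(std::vector<double>& diff, double g) {
  for (double& d : diff) d += g;
}

// Buggy behavior: overwriting the diff discards earlier sub-batches.
void backward_resetting(std::vector<double>& diff, double g) {
  for (double& d : diff) d = g;
}

int main() {
  const int iter_size = 5;
  std::vector<double> ok(1, 0.0), buggy(1, 0.0);
  for (int i = 0; i < iter_size; ++i) {
    backward_accumulating(ok, 1.0);
    backward_resetting(buggy, 1.0);
  }
  std::cout << ok[0] << " vs " << buggy[0] << std::endl;  // prints "5 vs 1"
}
```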
Thank you for your review. It is true that the diff is already initialized elsewhere and should not be reset here; I have just fixed it.
Interestingly, I have successfully trained a network with center loss on a large dataset using iter_size > 1, and the result did not seem to suffer much from this bug.
Everything you wrote wrong works as a regularizer :)