onert-micro training api #12996
base: master
Conversation
// 4. check loss
float loss;
om_train_get_loss(ctx, 0, &loss);
As discussed in the Knox meeting on 5/17, I'd like to request adding EarlyStop support, based on this PR.
As you know, EarlyStop is a feature that keeps tracking the global minimum loss at roughly this point (the app may also configure the metric, i.e. whether to use loss or accuracy):
- If the loss of the current step does not improve on (go below) the global minimum loss, increment a counter.
- If the counter exceeds a number N specified by the app, stop training (early stop) even before reaching the maximum epoch, on the grounds that further training is no longer meaningful.
If the cost in complexity or schedule is not too burdensome, it would be structurally better for the EarlyStop feature to be implemented at this level.
I'd like to keep onert-micro as simple as possible, since a variant introduces more maintenance cost. IMHO, it is better to implement this at the platform (AIFW) level. That is, while onert-micro provides basic features like this API, AIFW can provide high-level features by assembling them.
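For illustration only, a minimal sketch of how an application (or AIFW) could assemble early stopping on top of `om_train_get_loss` from this PR. The patience value and the helper name are made up, and the training-step call itself is left out because its name is not part of the quoted snippet.

```c
/* App/AIFW-level early stopping sketch built on om_train_get_loss.
 * PATIENCE (N) and early_stop_check are illustrative names, not PR API. */
#include <float.h>
#include <stdbool.h>

#define PATIENCE 5 /* N: number of non-improving epochs the app tolerates */

static float best_loss = FLT_MAX;
static int stale_epochs = 0;

/* Returns true when training should stop early. */
static bool early_stop_check(float loss)
{
  if (loss < best_loss)
  {
    best_loss = loss; /* new global minimum: reset the counter */
    stale_epochs = 0;
    return false;
  }
  return ++stale_epochs > PATIENCE; /* stop once N epochs bring no improvement */
}

/* Usage after each epoch of the training loop:
 *   float loss;
 *   om_train_get_loss(ctx, 0, &loss);
 *   if (early_stop_check(loss)) break;  // stop before reaching max epoch
 */
```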
}
else {
om_train_save_as_inferencemodel(ctx, PATH);
}
Couldn't every checkpoint be saved as an inference-capable artifact each time? I'm curious whether the cost difference is significant.
I'd like to unify the inference and checkpoint formats, but not now.
Moreover, once a checkpoint is converted into an inference model, it is no longer trainable.
But the app should validate the original model against the new model when a checkpoint happens.
How is that possible without an updated file? (IMO, at least some kind of temp file is necessary.)
`om_train_save_as_inferencemodel(ctx, PATH);` is for producing an inference model. To validate the current model (during training), you can use `om_train_inference(om_context *context);`
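A rough sketch of that in-memory validation flow, assuming the `om_train_set_input` and `om_train_inference` calls from this PR; the output accessor is only hinted at in a comment, since its name is not quoted here and onert-micro keeps the result in an internal buffer.

```c
/* Validate the model under training in memory, without saving a file.
 * om_train_set_input / om_train_inference come from this PR. */
static void validate_one_sample(om_context *ctx, const float *val_input, int input_bytes)
{
  om_train_set_input(ctx, 0, val_input, input_bytes); /* bind a validation sample  */
  om_train_inference(ctx);                            /* forward pass, no training */

  /* read the internal output buffer via the (not yet quoted) output accessor
   * and compare it against the reference label to compute val_loss/accuracy */
}
```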
// 4. check loss
float loss;
om_train_get_loss(ctx, 0, &loss);
For reference, the point at which a checkpoint is saved is usually determined by settings the user provides. For example:

```python
best_model_name = f"./inverter_model/{self.get_model_name()}_{str(now_ts)}.h5"
mc = ModelCheckpoint(best_model_name, monitor="val_loss", save_best_only=True, mode="min", verbose=0)
```

=> The monitored value "val_loss" is judged against the "min" criterion, and a checkpoint is saved to the best_model_name path only when it is the best so far.
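For comparison, a hedged sketch of the same "save best only" idea expressed with the `om_train_*` calls from this PR; `BEST_MODEL_PATH` and `save_if_best` are illustrative, and whether the snapshot should instead go through the checkpoint API (#12997) is still an open question in this thread.

```c
/* "save_best_only" analogue: monitor the loss and keep a snapshot only when
 * a new minimum is reached. Names and the path below are illustrative. */
#include <float.h>

#define BEST_MODEL_PATH "/tmp/best_model.circle" /* illustrative path */

static float min_monitored_loss = FLT_MAX;

static void save_if_best(om_context *ctx)
{
  float loss;
  om_train_get_loss(ctx, 0, &loss); /* monitored value (analogue of val_loss) */

  if (loss < min_monitored_loss) /* mode="min": smaller is better */
  {
    min_monitored_loss = loss;
    om_train_save_as_inferencemodel(ctx, BEST_MODEL_PATH); /* keep only the best */
  }
}
```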
Let's try to use the same API as onert ( https://github.com/Samsung/ONE/blob/master/runtime/onert/api/nnfw/include/nnfw_experimental.h and https://github.com/Samsung/ONE/blob/master/runtime/onert/api/nnfw/include/nnfw.h ). I will update the PR soon.
 *
 * @return @c OM_STATUS_NO_ERROR if successful
 */
OM_STATUS om_train_compile(om_context *ctx);
Some more parameters need to be exposed here, as in the API set I introduced at the last meeting (learning rate, loss, metrics, ...).
I will add a config API like `set_train_info`. But note that this config is optional; basically, the circle model itself contains the training info (loss, ...).
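Purely for illustration, a sketch of what such an optional config call could look like. None of these names exist in this PR yet (the author only says a `set_train_info`-like API will be added), so every identifier below is hypothetical; by default the training info comes from the circle model itself.

```c
/* Hypothetical config API sketch; all names are made up to illustrate an
 * optional set_train_info-style override of what the circle model encodes. */
#include <stdint.h>

typedef struct
{
  float learning_rate; /* e.g. 0.001f */
  uint32_t batch_size; /* e.g. 32 */
  /* loss and optimizer selection would go here as well */
} om_train_info; /* hypothetical */

/* hypothetical declaration; intended to be called before om_train_compile(ctx) */
OM_STATUS om_train_set_train_info(om_context *ctx, const om_train_info *info);
```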
 * If it is nullptr, it will not change shape and batch size
 * @return @c OM_STATUS_NO_ERROR if successful
 */
OM_STATUS om_train_set_input(om_context *ctx, uint32_t index, const void *input, int size);
This can be combined with om_train_inference, I think.
As mentioned before, I'd like to keep one API for one role, so I hope this API stays as it is. The only difference between your suggestion and the current function is a single `om_train_set_input` call.
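For clarity, a small sketch of the intended "one API per role" flow using `om_train_set_input` and `om_train_inference` from this PR; the two-input assumption and buffer sizes are illustrative only.

```c
/* Each input is bound by its own om_train_set_input call,
 * then om_train_inference runs the model with the bound inputs. */
static void run_once(om_context *ctx)
{
  float input0[64]; /* illustrative sizes; the real model dictates these */
  float input1[16];

  /* ... fill input0 / input1 with one sample ... */

  om_train_set_input(ctx, 0, input0, (int)sizeof(input0)); /* bind input index 0 */
  om_train_set_input(ctx, 1, input1, (int)sizeof(input1)); /* bind input index 1 */
  om_train_inference(ctx);                                  /* run with bound inputs */
}
```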
FYI, we will update checkpoint api based on #12997.
While migrating to nnfw's API, I found that we need an extra (output buffer) copy to support nnfw_set_output on the onert-micro side: onert-micro uses an internal buffer for output, while onert uses an output buffer allocated by the user.
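A hedged sketch of that extra copy: onert lets the user pre-register an output buffer via `nnfw_set_output`, while onert-micro writes the result into an internal buffer, so matching the nnfw contract needs one more memcpy. `om_internal_output_buffer` below is a hypothetical accessor used only to show where the copy happens.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical accessor for onert-micro's internal output buffer */
const void *om_internal_output_buffer(om_context *ctx, uint32_t index);

static void copy_output_to_user_buffer(om_context *ctx, void *user_buf, size_t size)
{
  om_train_inference(ctx);                                   /* result -> internal buffer */
  memcpy(user_buf, om_internal_output_buffer(ctx, 0), size); /* the extra copy in question */
}
```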
The API implementation based on #13107 will be on https://github.com/chunseoklee/ONE/commits/v3
- onert micro c api
- onert-micro-dev library

Signed-off-by: chunseoklee <[email protected]>