Addition of the MSE loss function #173
Conversation
@milancurcic would such an implementation be useful/appropriate?
I think it's a good approach (simple). Do you foresee us needing to also carry the loss function itself, and not only its derivative? If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative), then I think the abstract derived type approach would be more appropriate. If you agree, we could just model this after the activation or optimizer module, e.g.:

```fortran
module nf_loss

  implicit none

  type, abstract :: loss_type
  contains
    procedure(loss_interface), deferred :: eval, derivative
  end type loss_type

  abstract interface
    pure function loss_interface(self, true, predicted) result(res)
      import :: loss_type
      class(loss_type), intent(in) :: self
      real, intent(in) :: true(:)
      real, intent(in) :: predicted(:)
      ! with a single shared interface, eval and derivative both return arrays here
      real :: res(size(true))
    end function loss_interface
  end interface

  type, extends(loss_type) :: mse
  contains
    procedure :: eval => eval_mse
    procedure :: derivative => derivative_mse
  end type mse

contains

  pure function eval_mse(self, true, predicted) result(res)
    class(mse), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    ...
  end function eval_mse

  pure function derivative_mse(self, true, predicted) result(res)
    class(mse), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    ...
  end function derivative_mse

  ...
end module nf_loss
```

Then in the network type, the component for the loss would be:

```fortran
type network
  ...
  class(loss_type), allocatable :: loss
  ...
end type network
```

and we'd call the respective functions with `self % loss % eval` and `self % loss % derivative`. This way the pattern is also consistent with the optimizers and activations modules. Let me know what you think.
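For reference, the elided `eval_mse` and `derivative_mse` bodies would essentially be the standard mean squared error and its gradient with respect to the predictions. A minimal sketch, consistent with the single array-valued interface above and not necessarily the exact code this PR will end up with:

```fortran
pure function eval_mse(self, true, predicted) result(res)
  ! per-element squared error scaled by 1/n, so that sum(res) is the MSE value
  class(mse), intent(in) :: self
  real, intent(in) :: true(:), predicted(:)
  real :: res(size(true))
  res = (predicted - true)**2 / size(true)
end function eval_mse

pure function derivative_mse(self, true, predicted) result(res)
  ! d(MSE)/d(predicted(i)) = 2 * (predicted(i) - true(i)) / n
  class(mse), intent(in) :: self
  real, intent(in) :: true(:), predicted(:)
  real :: res(size(true))
  res = 2 * (predicted - true) / size(true)
end function derivative_mse
```

In the backward pass, `self % loss % derivative(y, output)` would then supply the initial gradient that backpropagation consumes.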
I think it would be good to provide a procedure
I actually started to implement it like the optimizer DT, but then I noticed that only the derivative was used, so I switched to the current, simpler approach. It should be easy to change it back to a DT.
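For context, the current proposition boils down to a plain procedure rather than a derived type, roughly like the sketch below (the name is illustrative, not necessarily the exact code in the PR):

```fortran
! Derivative-only variant: a plain, stateless function the network can
! call directly, since backpropagation only needs the derivative.
pure function mse_derivative(true, predicted) result(res)
  real, intent(in) :: true(:)
  real, intent(in) :: predicted(:)
  real :: res(size(true))
  res = 2 * (predicted - true) / size(true)
end function mse_derivative
```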
I think that the
This makes sense, and it is also easier to follow the code (as the same approach is used for all components). Should I close this PR and open a new PR with a DT approach? Or just modify this PR?
You're correct regarding the different interfaces (scalar and vector) between the two procedures. Regarding the PR, whatever is easier is fine; you can keep this PR if it's convenient for you.
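To make the scalar/vector distinction concrete, one option, assuming we keep the abstract derived type, is to give `eval` and `derivative` separate abstract interfaces (the interface names below are placeholders):

```fortran
abstract interface

  ! scalar result: the loss value itself
  pure function loss_eval_interface(self, true, predicted) result(res)
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:)
    real, intent(in) :: predicted(:)
    real :: res
  end function loss_eval_interface

  ! array result: the derivative with respect to each prediction
  pure function loss_derivative_interface(self, true, predicted) result(res)
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:)
    real, intent(in) :: predicted(:)
    real :: res(size(true))
  end function loss_derivative_interface

end interface
```

The deferred bindings in `loss_type` would then become `procedure(loss_eval_interface), deferred :: eval` and `procedure(loss_derivative_interface), deferred :: derivative`.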
Here is a PR to support MSE as a loss function.
Additional commits should provide users with options for choosing among different loss functions (similar to the optimizers).
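As a rough sketch of what that user-facing choice could look like, mirroring how the optimizer is already passed to `train` (the `loss` keyword and the `mse()` constructor below are assumptions about the eventual API, not something that exists yet):

```fortran
program mse_example
  ! Hypothetical usage sketch: the `loss` argument and the export of `mse`
  ! from the nf module are assumptions; the rest mirrors the existing examples.
  use nf, only: dense, input, mse, network, sgd
  implicit none
  type(network) :: net
  real :: x(3, 200), y(1, 200)

  ! toy regression data: predict the mean of the three inputs
  call random_number(x)
  y(1,:) = sum(x, dim=1) / 3

  net = network([input(3), dense(8), dense(1)])

  ! hypothetical `loss` keyword, analogous to how the optimizer is chosen
  call net % train(x, y, batch_size=20, epochs=10, &
    optimizer=sgd(learning_rate=0.1), loss=mse())

end program mse_example
```

Presumably, omitting `loss` would keep the current default behaviour.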