
Addition of the MSE loss function #173

Closed
wants to merge 12 commits
Conversation

@jvdp1 (Collaborator) commented Apr 16, 2024

Here is a PR to support MSE as a loss function.
Additional commits should provide options for users to choose among different loss functions (similar to the optimizers).
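
For reference, a minimal sketch of what the loss function and its derivative could look like (names and exact form are illustrative, not necessarily what the commits in this PR use):

pure function mse(true, predicted) result(res)
  ! Mean squared error: mean of the squared differences
  real, intent(in) :: true(:)
  real, intent(in) :: predicted(:)
  real :: res
  res = sum((predicted - true)**2) / size(true)
end function mse

pure function mse_derivative(true, predicted) result(res)
  ! Derivative of the MSE w.r.t. each prediction: 2*(predicted - true)/n
  real, intent(in) :: true(:)
  real, intent(in) :: predicted(:)
  real :: res(size(true))
  res = 2 * (predicted - true) / size(true)
end function mse_derivative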

@jvdp1 marked this pull request as ready for review April 16, 2024 16:12
@jvdp1 (Collaborator, Author) commented Apr 16, 2024

@milancurcic would such an implementation be useful/appropriate?

@milancurcic (Member) commented:

I think it's a good approach (simple).

Do you foresee us needing to also carry the loss function itself (e.g. mse) in addition to the derivative one? Only the derivative is used in training, but I can imagine wanting to carry the loss function itself, for example for evaluating the loss of the network on the fly during training.

If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative), then I think the abstract derived type approach would be more appropriate. If you agree, we could model this after the activation or optimizer module, e.g.:

module nf_loss

  implicit none

  type, abstract :: loss_type
  contains
    procedure(loss_interface), deferred :: eval, derivative
  end type loss_type

  abstract interface
    pure function loss_interface(self, true, predicted) result(res)
      import :: loss_type
      class(loss_type), intent(in) :: self
      real, intent(in) :: true(:)
      real, intent(in) :: predicted(:)
      real :: res
    end function loss_interface
  end interface

  type, extends(loss_type) :: mse
  contains
    procedure :: eval => eval_mse
    procedure :: derivative => derivative_mse
  end type mse

contains

  pure function eval_mse(self, true, predicted) result(res)
    class(mse), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    ...
  end function eval_mse

  pure function derivative_mse(self, true, predicted) result(res)
    class(mse), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    ...
  end function derivative_mse

  ...

end module nf_loss

Then in the network type, the component for the loss would be:

type network
  ...
  class(loss_type), allocatable :: loss
  ...
end type network

and we'd call the respective functions with net % loss % eval and net % loss % derivative.
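
For illustration, call sites in the training loop could then look roughly like this (the variable names are hypothetical):

! hypothetical names; y_true holds targets, y_pred holds network outputs
current_loss = net % loss % eval(y_true, y_pred)
gradient     = net % loss % derivative(y_true, y_pred)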

This way the pattern is also consistent with the optimizers and activations modules.

Let me know what you think.

@jvdp1 (Collaborator, Author) commented Apr 16, 2024

Do you foresee us needing to also carry the loss function itself (e.g. mse) in addition to the derivative one? Only the derivative is used in training, but I can imagine wanting to carry the loss function itself, for example for evaluating the loss of the network on the fly during training.
If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative) then I think the abstract derived type approach would be more appropriate.

I think it would be good to provide an evaluate procedure similar to Keras' model.evaluate. In this case, a DT is indeed more appropriate.
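
For example, a network-level evaluate could be a thin wrapper around the loss object. This is only a sketch, assuming a predict-like forward pass is available on the network type:

function evaluate(self, input, target) result(res)
  ! Sketch only: forward pass followed by a scalar loss evaluation
  class(network), intent(in) :: self
  real, intent(in) :: input(:)
  real, intent(in) :: target(:)
  real :: res
  res = self % loss % eval(target, self % predict(input))
end function evaluate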

If you agree, we could just model this after the activation or optimizer module e.g.

I actually started to implement it like the optimizer DT, but then I noticed that only the derivative was used, and switched to the current proposal. It should be easy to change it back to a DT.

and we'd call the respective functions with net % loss % eval and net % loss % derivative.

I think that eval and derivative should be associated with different interfaces, because eval should return a scalar (e.g., the MSE value) while derivative should return a vector (e.g., dMSE/dx).
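
Concretely, the two abstract interfaces could be declared along these lines (a sketch using the naming from the earlier comment):

abstract interface

  pure function loss_eval_interface(self, true, predicted) result(res)
    ! Scalar result, e.g. the MSE value itself
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:)
    real, intent(in) :: predicted(:)
    real :: res
  end function loss_eval_interface

  pure function loss_derivative_interface(self, true, predicted) result(res)
    ! Vector result, e.g. dMSE/dx for each prediction
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:)
    real, intent(in) :: predicted(:)
    real :: res(size(true))
  end function loss_derivative_interface

end interface

with the deferred bindings becoming, e.g., procedure(loss_eval_interface), deferred :: eval and procedure(loss_derivative_interface), deferred :: derivative.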

This way the pattern is also consistent with the optimizers and activations modules.

This makes sense, and it also makes the code easier to follow (since the same approach is used for all components).

Should I close this PR, and open a new PR with a DT approach? Or just modify this PR?

@milancurcic (Member) commented:

You're correct regarding the different interfaces (scalar and vector) between eval and derivative!

Regarding the PR, whatever is easier is fine; you can keep this one if that's convenient for you.
