
Addition of the MSE loss function #173

Closed
wants to merge 12 commits
Conversation

@jvdp1 (Collaborator) commented Apr 16, 2024

Here is a PR to support MSE as a loss function.
Additional commits should provide options for users to choose among different loss functions (similar to the optimizers).
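
For reference, a minimal sketch of what the loss function and its derivative could look like (names and exact form are illustrative, not necessarily what the commits in this PR use):

pure function mse(true, predicted) result(res)
  ! Mean squared error: mean of the squared differences
  real, intent(in) :: true(:)
  real, intent(in) :: predicted(:)
  real :: res
  res = sum((predicted - true)**2) / size(true)
end function mse

pure function mse_derivative(true, predicted) result(res)
  ! Derivative of the MSE w.r.t. each prediction: 2*(predicted - true)/n
  real, intent(in) :: true(:)
  real, intent(in) :: predicted(:)
  real :: res(size(true))
  res = 2 * (predicted - true) / size(true)
end function mse_derivative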

@jvdp1 marked this pull request as ready for review April 16, 2024 16:12
@jvdp1 (Collaborator, Author) commented Apr 16, 2024

@milancurcic would such an implementation be useful/appropriate?

@milancurcic (Member) commented:

I think it's a good approach (simple).

Do you foresee us needing to also carry the loss function itself (e.g. mse) in addition to the derivative one? Only the derivative is used in training, but I can imagine wanting to carry the loss function itself, for example for evaluating the loss of the network on the fly during training.

If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative), then I think the abstract derived type approach would be more appropriate. If you agree, we could model this after the activation or optimizer module, e.g.:

module nf_loss

  implicit none

  type, abstract :: loss_type
  contains
    procedure(loss_interface), deferred :: eval, derivative
  end type loss_type

  abstract interface
    pure function loss_interface(self, true, predicted) result(res)
      import :: loss_type
      class(loss_type), intent(in) :: self
      real, intent(in) :: true(:)
      real, intent(in) :: predicted(:)
      real :: res
    end function loss_interface
  end interface

  type, extends(loss_type) :: mse
  contains
    procedure :: eval => eval_mse
    procedure :: derivative => derivative_mse
  end type mse

contains

  pure function eval_mse(self, true, predicted) result(res)
    class(mse), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    ...
  end function eval_mse

  pure function derivative_mse(self, true, predicted) result(res)
    class(mse), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    ...
  end function derivative_mse

  ...

end module nf_loss

Then in the network type, the component for the loss would be:

type network
  ...
  class(loss_type), allocatable :: loss
  ...
end type network

and we'd call the respective functions with net % loss % eval and net % loss % derivative.
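
For illustration, call sites in the training loop could then look roughly like this (the variable names are hypothetical):

! hypothetical names; y_true holds targets, y_pred holds network outputs
current_loss = net % loss % eval(y_true, y_pred)
gradient     = net % loss % derivative(y_true, y_pred)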

This way the pattern is also consistent with the optimizers and activations modules.

Let me know what you think.

@jvdp1 (Collaborator, Author) commented Apr 16, 2024

Do you foresee us needing to also carry the loss function itself (e.g. mse) in addition to the derivative one? Only the derivative is used in training, but I can imagine wanting to carry the loss function itself, for example for evaluating the loss of the network on the fly during training.
If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative) then I think the abstract derived type approach would be more appropriate.

I think it would be good to provide an evaluate procedure similar to Keras' model.evaluate. In this case, a DT is indeed more appropriate.
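
For example, a network-level evaluate could be a thin wrapper around the loss object. This is only a sketch, assuming a predict-like forward pass is available on the network type:

function evaluate(self, input, target) result(res)
  ! Sketch only: forward pass followed by a scalar loss evaluation
  class(network), intent(in) :: self
  real, intent(in) :: input(:)
  real, intent(in) :: target(:)
  real :: res
  res = self % loss % eval(target, self % predict(input))
end function evaluate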

If you agree, we could just model this after the activation or optimizer module e.g.

I actually started to implement it like the optimizer DT, but then I noticed that only the derivative was used, and switched to the current proposal. It should be easy to change it back to a DT.

and we'd call the respective functions with net % loss % eval and net % loss % derivative.

I think that eval and derivative should be associated with different interfaces, because eval should return a scalar (e.g., the MSE value) while derivative should return a vector (e.g., dMSE/dx).
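
Concretely, the two abstract interfaces could be declared along these lines (a sketch using the naming from the earlier comment):

abstract interface

  pure function loss_eval_interface(self, true, predicted) result(res)
    ! Scalar result, e.g. the MSE value itself
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:)
    real, intent(in) :: predicted(:)
    real :: res
  end function loss_eval_interface

  pure function loss_derivative_interface(self, true, predicted) result(res)
    ! Vector result, e.g. dMSE/dx for each prediction
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:)
    real, intent(in) :: predicted(:)
    real :: res(size(true))
  end function loss_derivative_interface

end interface

with the deferred bindings becoming, e.g., procedure(loss_eval_interface), deferred :: eval and procedure(loss_derivative_interface), deferred :: derivative.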

This way the pattern is also consistent with the optimizers and activations modules.

This makes sense, and it also makes the code easier to follow (since the same approach is used for all components).

Should I close this PR, and open a new PR with a DT approach? Or just modify this PR?

@milancurcic (Member) commented:

You're correct regarding the different interfaces (scalar and vector) between eval and derivative!

Regarding the PR, whatever is easier is fine; you can keep this one if that's convenient for you.
