Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulate ground truth and calculate MSE #105

Open
ReneSkukies opened this issue Sep 9, 2024 · 4 comments
Open

Simulate ground truth and calculate MSE #105

ReneSkukies opened this issue Sep 9, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@ReneSkukies
Copy link
Member

Ground Truth

To test the validity of a method, it's important to simulate data and test the results against the ground truth. Currently, the ground truth can be obtained via UnfoldSim.simulate_responses (in the current version, v.0.3.2, not exported). However, the way to do this is a bit clunky since simulate_responses takes Simulation as an input, although only the design is needed:

simulation = UnfoldSim.Simulation(
        SingleSubjectDesign(;
        conditions = Dict(
            :condition => ["car", "face"],
            :continuous => range(0, 5, length = 10),
        )),
        components, # Not needed/ Redundant since it's already an input to simulate_responses
        UniformOnset(; width = 30, offset = 5), # Not needed
        NoNoise() # Not needed
    )

ground_truth = UnfoldSim.simulate_responses(MersenneTwister(1),components, simulation)

Additionally, simulate_responses results in a simple Matrix, with one column per event. For coherence with Unfold.effects it might be nice to turn this into a DataFrame. However, the resulting Matrix is needed internally, so we might need a new function entirely...

MSE

Further, as a possible measure of "goodness of fit", it might be nice to have a function that calculates the mean-squared-error between a ground truth and effects from an Unfold model. This might be also rather fit into UnfoldStats however.

Anyway, an issue with that is that ground truth gives me the full ground truth with all true effects for each component. But what if I "just" control certain covariates with Unfold.effects (i.e. set them to the mean) and only want to look at the effect of one. Then (I think) I would have to:

  1. Change the components in a way that the other covariates don't influence them (but how do I set them to the mean?)
  2. Be careful that effects has the same marginalization as the original design (e.g. [0:1:5] or so)

From @jschepers

Maybe it's worth looking at how the effects function is implemented in Unfold and then do something similar for UnfoldSim.
We have to check whether it makes more sense to manipulate the modelmatrix/design matrix after creating it or two manipulate the event table accordingly.
@behinger
Copy link
Member

hi!
Thanks for the detailed thoughts.

Groundtruth

The issue is that we defined our simulate_component including simulation, so that you can e.g. make component dependent on onsets. To work around this, we could add a simple wrapper
`simulate_responses(rng,design::AbstractDesign,components::AbstractVector{<:AbstractComponents}) = Simulation(design,components,NoOnset(),NoNoise())

Your proposal to move the Matrix to a DataFrame is more complex. We could try to use the function

function Unfold.result_to_table(
    eff::Vector{<:AbstractArray},
    events::Vector{<:DataFrame},
    times::Vector,
    eventnames::Vector,
)

maybe even something like Unfold.result_to_table([ground_truth],[generate_design(design)],[1:size(ground_truth,2)],["simulated"]) works directly; I havent tried it - note that all the brackets are necessary as in Unfold.jl we split by event-type.

Interaction with the sequence panel has to be checked for sure... but maybe a start


MSE

I don't think I understand enough of what you want to do and what the problem is. Could you write out in pseudo-math what you want? or describe it in more details?

My naive point: Why not use mean((ground_truth .- predict(::UnfoldModel).^2), why do you need effects at all?

@ReneSkukies
Copy link
Member Author

Ground truth

The simple wrapper seems like an elegant solution. However, I encountered an additional problem:
Simulate responses only gives you a Matrix of size(0:length_of_longest_compoent, n_of_design_factorial); the problem being that this means your ground truth ERP only goes from zero to the end of the longest component, which typically is inconsistent with the estimates that you want to compare it to (e.g. \tau = (-0.2, 1.0)). I worked around this by padding the ground truth to be the same length as the estimates:

using ArrayPadding

τ = [-0.1, 1]
sfreq = 100

if τ[1] < 0
	pad_before = Int64(abs(τ[1] * sfreq))
else
	pad_before = 0
end

pad_after = Int64((τ[2] * sfreq) - size(ground_truth, 1)) + 1 # I had to add +1 here to get same length, not sure why

pad(ground_truth, 0, (pad_before, 0), (pad_after, 0))

For convenience, I used ArrayPadding.jl, but there might be a more direct way that doesn't require and additional dependency.

MSE

For your naive point: In my understanding, this doesn't work due to two points

  1. Afaik the predict(::UnfoldModel) returns a predicted ("y_hat = X*b"), i.e. a continuous stream. However, the ground truth is represented as an ERP
  2. As stated above simulate response_resonse() gives you a Matrix of size(0:length_of_longest_compoent, n_of_design_factorial); meaning a GT calculation like this:
components = [p1, n1, p3]

sim = UnfoldSim.Simulation(
	SingleSubjectDesign(;
        conditions = Dict(
            :condition => ["car", "face"],
            :continuous => range(0, 5, length = 10),
	    :cont2 => range(0, 4, length = 4),
        )),
		components,
		NoOnset(),
		NoNoise()
	)
gt = UnfoldSim.simulate_responses(MersenneTwister(1),components, sim)

results in a matrix of size(45, 80).
Thus, I think there should be a way to control which conditions are actually calculated, since you might not always want to look at all possible combinations.

@behinger
Copy link
Member

  • there is a pad_array function in UnfoldSim :), not documented is that you can use -len to pad before the signal
  • I guess the padding could go into a "difference" or "mse" function? Problem interface wise is that you have to provide the sfreq + time limits of your unfold model fit, which depending on the intended interface is vailable or not
  • predict(UnfoldModel, epoch_to=Any,epoch_timewindow=[-0.1,1])?
  • I still dont understand your conditions problem - do you mean to circumvent needing a simulate_responseSingleSubjectDesign(conditions=effectsDict),...) call?

@ReneSkukies ReneSkukies added the enhancement New feature or request label Sep 12, 2024
@behinger
Copy link
Member

struct GroundTruthDesign
    design::AbstractDesign
    effectsdict::Dict
end

"""
copied from Effects.jl
"""
function expand_grid(design)
           colnames = tuple(keys(design)...)
           rowtab = NamedTuple{colnames}.(product(values(design)...))

           return DataFrame(vec(rowtab))
       end

UnfoldSim.size(t::GroundTruthDesign) = size(generate_events(t))

typical_value(v::Vector{<:Number}) = [mean(v)]
typical_value(v) = unique(v)

function UnfoldSim.generate_events(t::GroundTruthDesign)
    effects_dict = Dict{Any,Any}(t.effects_dict)
    current_design = generate_events(t.des)
    to_be_added = setdiff(names(current_design),keys(effects_dict))
    for tba in to_be_added
           effects_dict[tba] = typical_value(current_design[:,tba])
    end
    return expand_grid(effects_grid)
end

effects_dict = Dict{Symbol,Union{<:Number,<:String}}(:conditionA=>[0,1])
SingleSubjectDesign(...) |> x-> GroundTruthDesign(x,effects_dict)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants