Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide variable in pool dataframe showing which visit the row belongs to #255

Open
gowerc opened this issue Nov 12, 2021 · 8 comments · May be fixed by #362
Open

Provide variable in pool dataframe showing which visit the row belongs to #255

gowerc opened this issue Nov 12, 2021 · 8 comments · May be fixed by #362
Labels
enhancement New feature or request

Comments

@gowerc
Copy link
Collaborator

gowerc commented Nov 12, 2021

Potentially useful for reporting to enable grouping / categorisation by visit

@nociale nociale added the To be done Must be done label Nov 16, 2021
@gowerc gowerc added enhancement New feature or request and removed To be done Must be done labels Dec 22, 2021
@gowerc
Copy link
Collaborator Author

gowerc commented Feb 21, 2022

Am thinking that this could be acheived by converting the analysis function to return a dataframe instead of a list. User can then insert arbitrary variables into the dataframe which are retained and combined into a final dataset. In theory this is also acheivable by keeping the list structure and just allowing the user/analysis function to insert arbitrary meta variables into it...

@pengguanya
Copy link
Collaborator

pengguanya commented Apr 27, 2022

Am thinking that this could be acheived by converting the analysis function to return a dataframe instead of a list. User can then insert arbitrary variables into the dataframe which are retained and combined into a final dataset. In theory this is also acheivable by keeping the list structure and just allowing the user/analysis function to insert arbitrary meta variables into it...

@gowerc I see the nested list with concatenated visit names was originally created in ancova function here

    res <- lapply(
        visits,
        function(x) {
            data2 <- data[data[[visit]] == x, ]
            res <- ancova_single(data2, outcome, group, covariates, weights)
            names(res) <- paste0(names(res), "_", x)
            return(res)
        }
    )

You said you want to convert the return value of analysis from this list to a data.frame. Should we change directly from ancova? Or you want to add some patch in analysis to transform the list to a data.frame after applying ancova ? In this case, we also need to modify pool accordingly since transpose_results in pool relies on the assumption of nested list data-structure.

Regarding your alternative suggestion, could you specify what does it mean by allowing the user/analysis function to insert arbitrary meta variables into it. You want to add a method to pool object or object returned by analysis allowing user modify the names (appendixed by visit number) in the nest list with user specified visit number? Is that what you mean?

Another idea is to modify the data structure of returned attribute pars in pool object from nest list to data.frame containing a visit column. But then the par attribute from pool object and results attribute from the returned object (rubin class) from analysis function will have different data-structures and formats: data.frame vs. list. Is that what you would do?

@gowerc
Copy link
Collaborator Author

gowerc commented May 3, 2022

From @nociale

Hi Guanya

Thanks for presenting your thoughts on this topic today. I thought a bit more about your proposal and I think that the best option is to structure the output of the analysis function as follows:

res <- list(
    trt_1 = list(
        est = ...,
        se = ...,
        df = ...,
        vars_no_pool = list(
            var1 = ...,
            var2 = ...,
            ...
        )
    ),
    trt_2 = list(
        est = ...,
        se = ...,
        df = ...,
        vars_no_pool = list(
            var1 = ...,
            var2 = ...,
            ...
        )
    ),
    ...
)

This was one of the options you suggested. I think this is the only feasible solution since vars_no_pool are specific to trt_1, trt_2 etc.. For example, if we want to include the visit at which the analysis is performed, trt_1 might refer to the first visit, trt_2 to the second visit etc..
Including these vars_no_pool variables as an element of the higher level list might create confusion.

Of course, these vars_no_pool variables must respect certain constraints, since the chosen pooling method is not applied to them. For example, I think that these variables should have the same value on each imputed dataset. For example, the visit of trt_1 has value 1 on each imputed dataset.

Also, the name of this list element should also be chosen carefully to make it easy for the user to understand (and documented properly..).

This is just my opinion of course! I am happy to discuss more.
Kind regards,
Alessandro

@gowerc
Copy link
Collaborator Author

gowerc commented May 3, 2022

@nociale & @pengguanya ,

I think this is roughly in line with my original thoughts for how to handle this. I'm not a fan of the naming vars_no_pool at least to me this name doesn't make it immediately clear what the sub-list is for. Though having said that I'm not sure on an alternative name, perhaps meta or metadata or additional_vars or extravars.

Another comment, I'm wondering if it makes sense to just take these meta values from the first iteration only and accept that as the "true" values. It would save us having to do a load of consistency checking and error handling. Just push it to the user in the documentation to say only the first returned result is used.

@gowerc
Copy link
Collaborator Author

gowerc commented May 3, 2022

@nociale, A suggestion that @pengguanya had mentioned was getting rid of the inner-lists and use an instantiation function / class instead. I liked the idea but he also correctly mentioned that there is a risk that we would then be changing exposed UI. I am wondering if the package is still niche enough for us to get away with changing this UI or not ?

For example the analysis function would return like:

x <- list(
    analysis_result(
        name = "trt_1",
        est = 1.4,
        sd = 0.5,
        df = NA
    ),
    analysis_result(
        name = "trt_2",
        est = 1.4,
        sd = 0.5,
        df = NA
    )
)

Perhaps this is overkill though...

@nociale
Copy link
Collaborator

nociale commented May 3, 2022

Hi both!

Good that we agree on the structure! Regarding the name, I was making an example without focusing too much on it. I like extravars or metavars.

I think it would be nice to get rid of one level of lists, and I think the solution is also smart. How would you change the UI then? For example, would a user be able to set name when he just runs an ANCOVA?
Also, I am thinking whether this solution might complicate (or simplify) the user's life when defining a custom analysis function. The user should maintain the structure using analysis_result(...), so he needs to know where this function comes from and how to use it. I am not discarding this idea, I think we need to think both how the UI changes in analyse() and whether the definition of a custom analysis function gets easier or more difficult.

@gowerc
Copy link
Collaborator Author

gowerc commented May 3, 2022

O yes, to be clear the only place this impacts end-users would be in the custom analysis functions. Excluding the custom analysis functions the only thing this impacts is our own internal ancova function which is purely internal.

I.e. we would be asking users in custom analysis functions to return a list of analysis_results (or perhaps just "results"?) rather than a list of lists with fixed names.

@nociale
Copy link
Collaborator

nociale commented May 6, 2022

It looks good to me, this solution would simplify the output structure. Giving the possibility of inserting a "name" to each parameter being estimated looks also good to have a nicer output table from the pool() step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants