Provide variable in pool dataframe showing which visit the row belongs to #255

gowerc · 2021-11-12T10:14:52Z

Potentially useful for reporting to enable grouping / categorisation by visit

gowerc · 2022-02-21T14:56:02Z

Am thinking that this could be acheived by converting the analysis function to return a dataframe instead of a list. User can then insert arbitrary variables into the dataframe which are retained and combined into a final dataset. In theory this is also acheivable by keeping the list structure and just allowing the user/analysis function to insert arbitrary meta variables into it...

pengguanya · 2022-04-27T15:41:52Z

Am thinking that this could be acheived by converting the analysis function to return a dataframe instead of a list. User can then insert arbitrary variables into the dataframe which are retained and combined into a final dataset. In theory this is also acheivable by keeping the list structure and just allowing the user/analysis function to insert arbitrary meta variables into it...

@gowerc I see the nested list with concatenated visit names was originally created in ancova function here

    res <- lapply(
        visits,
        function(x) {
            data2 <- data[data[[visit]] == x, ]
            res <- ancova_single(data2, outcome, group, covariates, weights)
            names(res) <- paste0(names(res), "_", x)
            return(res)
        }
    )

You said you want to convert the return value of analysis from this list to a data.frame. Should we change directly from ancova? Or you want to add some patch in analysis to transform the list to a data.frame after applying ancova ? In this case, we also need to modify pool accordingly since transpose_results in pool relies on the assumption of nested list data-structure.

Regarding your alternative suggestion, could you specify what does it mean by allowing the user/analysis function to insert arbitrary meta variables into it. You want to add a method to pool object or object returned by analysis allowing user modify the names (appendixed by visit number) in the nest list with user specified visit number? Is that what you mean?

Another idea is to modify the data structure of returned attribute pars in pool object from nest list to data.frame containing a visit column. But then the par attribute from pool object and results attribute from the returned object (rubin class) from analysis function will have different data-structures and formats: data.frame vs. list. Is that what you would do?

gowerc · 2022-05-03T11:21:56Z

From @nociale

Hi Guanya

Thanks for presenting your thoughts on this topic today. I thought a bit more about your proposal and I think that the best option is to structure the output of the analysis function as follows:
res <- list(
    trt_1 = list(
        est = ...,
        se = ...,
        df = ...,
        vars_no_pool = list(
            var1 = ...,
            var2 = ...,
            ...
        )
    ),
    trt_2 = list(
        est = ...,
        se = ...,
        df = ...,
        vars_no_pool = list(
            var1 = ...,
            var2 = ...,
            ...
        )
    ),
    ...
)
This was one of the options you suggested. I think this is the only feasible solution since vars_no_pool are specific to trt_1, trt_2 etc.. For example, if we want to include the visit at which the analysis is performed, trt_1 might refer to the first visit, trt_2 to the second visit etc..
Including these vars_no_pool variables as an element of the higher level list might create confusion.

Of course, these vars_no_pool variables must respect certain constraints, since the chosen pooling method is not applied to them. For example, I think that these variables should have the same value on each imputed dataset. For example, the visit of trt_1 has value 1 on each imputed dataset.

Also, the name of this list element should also be chosen carefully to make it easy for the user to understand (and documented properly..).

This is just my opinion of course! I am happy to discuss more.
Kind regards,
Alessandro

gowerc · 2022-05-03T11:26:11Z

@nociale & @pengguanya ,

I think this is roughly in line with my original thoughts for how to handle this. I'm not a fan of the naming vars_no_pool at least to me this name doesn't make it immediately clear what the sub-list is for. Though having said that I'm not sure on an alternative name, perhaps meta or metadata or additional_vars or extravars.

Another comment, I'm wondering if it makes sense to just take these meta values from the first iteration only and accept that as the "true" values. It would save us having to do a load of consistency checking and error handling. Just push it to the user in the documentation to say only the first returned result is used.

gowerc · 2022-05-03T11:31:09Z

@nociale, A suggestion that @pengguanya had mentioned was getting rid of the inner-lists and use an instantiation function / class instead. I liked the idea but he also correctly mentioned that there is a risk that we would then be changing exposed UI. I am wondering if the package is still niche enough for us to get away with changing this UI or not ?

For example the analysis function would return like:

x <- list(
    analysis_result(
        name = "trt_1",
        est = 1.4,
        sd = 0.5,
        df = NA
    ),
    analysis_result(
        name = "trt_2",
        est = 1.4,
        sd = 0.5,
        df = NA
    )
)

Perhaps this is overkill though...

nociale · 2022-05-03T15:42:51Z

Hi both!

Good that we agree on the structure! Regarding the name, I was making an example without focusing too much on it. I like extravars or metavars.

I think it would be nice to get rid of one level of lists, and I think the solution is also smart. How would you change the UI then? For example, would a user be able to set name when he just runs an ANCOVA?
Also, I am thinking whether this solution might complicate (or simplify) the user's life when defining a custom analysis function. The user should maintain the structure using analysis_result(...), so he needs to know where this function comes from and how to use it. I am not discarding this idea, I think we need to think both how the UI changes in analyse() and whether the definition of a custom analysis function gets easier or more difficult.

gowerc · 2022-05-03T15:59:10Z

O yes, to be clear the only place this impacts end-users would be in the custom analysis functions. Excluding the custom analysis functions the only thing this impacts is our own internal ancova function which is purely internal.

I.e. we would be asking users in custom analysis functions to return a list of analysis_results (or perhaps just "results"?) rather than a list of lists with fixed names.

nociale · 2022-05-06T11:57:40Z

It looks good to me, this solution would simplify the output structure. Giving the possibility of inserting a "name" to each parameter being estimated looks also good to have a nicer output table from the pool() step.

nociale added the To be done Must be done label Nov 16, 2021

gowerc added enhancement New feature or request and removed To be done Must be done labels Dec 22, 2021

gowerc linked a pull request Jul 8, 2022 that will close this issue

Restructure analysis results + add visit variable to pool data.frame #362

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide variable in pool dataframe showing which visit the row belongs to #255

Provide variable in pool dataframe showing which visit the row belongs to #255

gowerc commented Nov 12, 2021

gowerc commented Feb 21, 2022

pengguanya commented Apr 27, 2022 •

edited

Loading

gowerc commented May 3, 2022 •

edited

Loading

gowerc commented May 3, 2022

gowerc commented May 3, 2022

nociale commented May 3, 2022

gowerc commented May 3, 2022 •

edited

Loading

nociale commented May 6, 2022

Provide variable in pool dataframe showing which visit the row belongs to #255

Provide variable in pool dataframe showing which visit the row belongs to #255

Comments

gowerc commented Nov 12, 2021

gowerc commented Feb 21, 2022

pengguanya commented Apr 27, 2022 • edited Loading

gowerc commented May 3, 2022 • edited Loading

gowerc commented May 3, 2022

gowerc commented May 3, 2022

nociale commented May 3, 2022

gowerc commented May 3, 2022 • edited Loading

nociale commented May 6, 2022

pengguanya commented Apr 27, 2022 •

edited

Loading

gowerc commented May 3, 2022 •

edited

Loading

gowerc commented May 3, 2022 •

edited

Loading