Specify Multiple Groups for One Feature #10

Open
jlevy44 opened this issue Feb 16, 2020 · 3 comments

jlevy44 commented Feb 16, 2020

Nice package! Just following up from another thread: is it possible to specify multiple groups for one feature (e.g., overlapping groups)? Thanks!

yngvem (Owner) commented Feb 20, 2020

There is currently no way of specifying that. However, it is on my wishlist for the future.

The reason this is not supported yet is that it is not immediately clear how it should be done, and all options are mathematically much more complex than the non-overlapping case.

  1. We could have overlapping groups so that if a group is excluded, then the coefficients of all the covariates in the group are set to zero. This leads to a prox operator that is very difficult to compute.
  2. We could introduce auxiliary variables, essentially copying each repeated column once for every group that contains it. This leads to a Lipschitz bound that is difficult to compute efficiently. (This is the latent group lasso approach.)

I am looking into it, but it is a low-priority issue unless I get many more requests for it. If your dataset is small, then you can manually implement the second option by creating multiple copies of the columns that correspond to covariates in more than one group.
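The manual workaround described above can be sketched roughly as follows. This is a minimal illustration, not part of the package: it assumes the data matrix has shape `(n_samples, n_features)` so that covariates are columns, and the helper names `expand_overlapping_groups` and `collapse_coefficients` are hypothetical.

```python
import numpy as np


def expand_overlapping_groups(X, groups):
    """Duplicate columns of X so that every group owns its own copy.

    X      : array of shape (n_samples, n_features)
    groups : list of lists; groups[g] holds the column indices in group g
             (a column may appear in several groups)
    """
    # Original column index for every column of the expanded matrix
    source_cols = np.concatenate([np.asarray(g, dtype=int) for g in groups])
    # Non-overlapping group label for every column of the expanded matrix
    group_ids = np.concatenate(
        [np.full(len(g), gid, dtype=int) for gid, g in enumerate(groups)]
    )
    # Fancy indexing copies the selected columns (this is the memory cost)
    X_expanded = X[:, source_cols]
    return X_expanded, group_ids, source_cols


def collapse_coefficients(w_expanded, source_cols, n_features):
    """Sum the latent coefficients that map to the same original column."""
    w = np.zeros(n_features)
    np.add.at(w, source_cols, w_expanded)
    return w
```

The expanded matrix and the non-overlapping `group_ids` could then be handed to an ordinary group lasso solver, and the fitted coefficients folded back with `collapse_coefficients`. Memory grows with the total number of group memberships, which is why this is only practical for small datasets.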

jlevy44 (Author) commented Feb 23, 2020

I have a pretty big dataset. One of my ideas was to parallelize/scatter the l2 norms of the groups, but indexing/copying parts of the parameter matrix can be costly, as you mentioned.
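As a rough illustration of the scatter idea, the group norms can be accumulated in one pass from a flat list of (feature, group) membership pairs, without slicing the parameter vector once per group. This is only a sketch under assumptions: the coefficients are a 1-D vector, the function name `group_l2_norms` and the pair encoding are hypothetical, and it computes norms over shared coefficients (the first option above) rather than the latent variant.

```python
import numpy as np


def group_l2_norms(w, feature_idx, group_idx, n_groups):
    """l2 norm of each group, where groups may share features.

    feature_idx[k], group_idx[k] describe the k-th (feature, group)
    membership pair; a feature in several groups appears several times.
    """
    # One gather over all memberships instead of one slice per group
    squared = w[feature_idx] ** 2
    # Scatter-add the squared entries into their groups
    sums = np.bincount(group_idx, weights=squared, minlength=n_groups)
    return np.sqrt(sums)
```

The gather `w[feature_idx]` still allocates one temporary array whose length is the total number of memberships, so this reduces the number of copies but does not remove the cost entirely.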

yngvem (Owner) commented Mar 3, 2020

The main reason why it will be difficult to implement the latent group lasso is that there is no longer an easy closed-form solution to the Lipschitz bound of the loss gradient.

However, if I at some point get the time to implement Poisson regression, then I first need to implement a line-search-based FISTA method. Once this is done, latent group lasso seems relatively straightforward.

Unfortunately, I do not have much time to develop this project before the summer, and a line search will require substantial rewriting of the code, so I will not add latent group lasso before July at the earliest.

Edit: I am now using a line search for the step size, so this could in theory be implemented. Unfortunately I don't have time for that now, but I would welcome a pull request for it.
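For anyone considering such a pull request, here is a minimal sketch of FISTA with a backtracking line search, which avoids needing a closed-form Lipschitz bound. It is not the package's implementation: it assumes a least-squares loss, non-overlapping integer group labels (for example, those obtained after duplicating columns), and an unweighted penalty `reg * sum_g ||w_g||`. The names `group_prox` and `fista_backtracking` are hypothetical.

```python
import numpy as np


def group_prox(w, group_ids, n_groups, threshold):
    """Block soft-thresholding for non-overlapping groups."""
    norms = np.sqrt(np.bincount(group_ids, weights=w**2, minlength=n_groups))
    # Shrink each group towards zero; groups with norm <= threshold vanish
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return w * scale[group_ids]


def fista_backtracking(X, y, group_ids, reg, n_iter=200, L=1.0, eta=2.0):
    """Minimise 0.5/n * ||Xw - y||^2 + reg * sum_g ||w_g|| (a sketch)."""
    n_samples, n_features = X.shape
    n_groups = int(group_ids.max()) + 1

    def loss(w):
        r = X @ w - y
        return 0.5 * (r @ r) / n_samples

    def grad(w):
        return X.T @ (X @ w - y) / n_samples

    w = np.zeros(n_features)
    z = w.copy()  # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        g = grad(z)
        f_z = loss(z)
        # Backtracking: increase L until the quadratic upper bound holds
        while True:
            w_new = group_prox(z - g / L, group_ids, n_groups, reg / L)
            d = w_new - z
            if loss(w_new) <= f_z + g @ d + 0.5 * L * (d @ d) + 1e-12:
                break
            L *= eta
        # Standard FISTA momentum update
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = w_new + ((t - 1.0) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w
```

Because the step size is found by the inner loop, the same routine applies unchanged to a column-duplicated design matrix, where the Lipschitz constant of the gradient is not known in advance.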
