Interplay of max_subproblem_frac and max_subproblem_size #8

smirarab · 2011-12-14T16:22:57Z

In the current implementation, if both max_subproblem_frac and max_subproblem_size are provided, the larger of the two is used. I think this is counter-intuitive. If two numbers both specify the maximum size for a quantity, it makes more sense to me to take the smaller of the two.

Basically, the two options are "OR"ed in the current implementation, but I think "AND"ing them makes more sense. Here is an example. If max_subproblem_frac is set to 20% and max_subproblem_size to 200, I think the user means she wants her subproblems to be no larger than 20% of the original problem AND no larger than 200 taxa. That is, each subproblem should be both under 200 taxa and under 20% of the original problem. This is not what the current code (see below) does.

        configuration = self.configuration()
        # Here we check if the max_subproblem_frac is more stringent than max_subproblem_size
        frac_max = int(math.ceil(self.max_subproblem_frac*self.tree.n_leaves))
        if frac_max > self.max_subproblem_size:
            configuration['max_subproblem_size'] = frac_max

And here is one more motivation for the approach I am suggesting.

In the current implementation of SATe, if a subproblem is larger than 200 taxa, the fast and inaccurate version of MAFFT is used on it. So if your subproblem turns out to be 210 taxa, you get much worse results than with 200 taxa. For that reason, I always like to limit the size of my alignment subsets to 200 taxa. But then if the number of sequences is less than 1000, the 200 limit amounts to more than 20%, which is not good either. This means I would need to create separate config files for different inputs, based on their size. But if the conditions were interpreted as I am suggesting, I could simply set max_subproblem_size to 200 and max_subproblem_frac to 0.2. This way the subsets would never be more than 200 taxa, and at the same time they would never be too big relative to the input (i.e., when the input alignment is small, say 400 taxa).
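To make the suggestion concrete, here is a minimal sketch of the "AND" interpretation. It is not the actual SATe code: the function name `effective_max_subproblem_size` and its standalone signature are hypothetical, and the fallback when neither option is set is my own assumption. The fraction is converted to a taxon count the same way the current code does it.

```python
import math


def effective_max_subproblem_size(n_leaves, max_subproblem_frac=None,
                                  max_subproblem_size=None):
    """Hypothetical helper: return the smaller ("AND") of the two limits.

    n_leaves is the number of taxa in the full problem. Either option may
    be None, meaning the user did not provide it.
    """
    limits = []
    if max_subproblem_frac is not None:
        # Same conversion as the current code: fraction -> number of taxa.
        limits.append(int(math.ceil(max_subproblem_frac * n_leaves)))
    if max_subproblem_size is not None:
        limits.append(max_subproblem_size)
    if not limits:
        # Assumption: with no limit given, the whole problem is allowed.
        return n_leaves
    # Both conditions must hold, so the stricter (smaller) limit wins.
    return min(limits)
```

With max_subproblem_size set to 200 and max_subproblem_frac set to 0.2, this gives a limit of 80 taxa for a 400-taxon input and 200 taxa for a 2000-taxon input, so a single config file would work for both.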

Also note that the comment in the code above is very confusing. In my mind, "more stringent" means a lower limit here, not a higher one.
