switch solver backend to optlang / solverbasedmodel from cameo #289

Closed · hredestig opened this issue Oct 14, 2016 · 34 comments

@hredestig (Contributor)

No description provided.

@hredestig (Contributor, Author)

current status: all tests pass except

  • optknock, which uses the dual model - cobrapy's implementation is not compatible with the optlang solver
  • tests using constraints on the metabolites themselves
  • an issue with reading Matlab models (unclear why)

timings are currently

--------------------------------------------------------------------------
Name (time in us)                                optlang           cobrapy
--------------------------------------------------------------------------
test_add_metabolite_benchmark                   443.9320            9.5420 
test_add_remove_reaction_benchmark              524.6990           73.9210 
test_benchmark_read                          16,380.6060        8,477.8000 
test_benchmark_write                         60,923.8430       55,200.8510 
test_change_objective_benchmark              15,938.2120            8.1800 
test_copy_benchmark                         458,085.4020        2,775.4770 
test_deepcopy_benchmark                     415,341.8570       17,943.2370 
test_double_gene_deletion_benchmark          48,784.4910       41,247.7540 
test_flux_variability_benchmark             940,350.4910       10,299.4770 
test_loopless_benchmark                      48,038.1010          456.5510 
test_pfba_benchmark                          29,467.6280        3,098.5290 
test_phenotype_phase_plane_benchmark         29,990.3930       26,758.3250 
test_single_gene_deletion_benchmark           3,718.6280        2,369.2300 
test_single_gene_deletion_fba_benchmark       3,814.5840        1,802.5000 
test_subtract_metabolite_benchmark                9.5290            8.0390 
--------------------------------------------------------------------------

copy-pasted together; to reproduce in more verbose form do:

git checkout devel
pytest --benchmark-save
git checkout optlang_model
pytest --benchmark-compare

@cdiener (Member) commented Nov 1, 2016

Do I see correctly that FVA became 90 times slower? There seem to be some performance issues...

Edit: No it didn't, see below...

@hredestig (Contributor, Author)

I just realised the benchmarks are actually quite unfair: the optlang model is slower at model creation etc., and with the tiny "textbook" model that is what takes most of the time when re-running FVA the number of times needed to get data. So I am switching these to a larger, more realistic model (but doing so ran into some unexpected issues... story continues).

Either way, don't worry, won't merge anything like this to devel :)

@cdiener (Member) commented Nov 1, 2016

I also saw that your benchmarks call the entire test function along with all the asserts. That creates noise too. Why does model creation take so long with optlang? Is it the Sympy parsing of the constraints?

@cdiener (Member) commented Nov 1, 2016

Just an addition: I just ran some benchmarks with %time and could not reproduce the performance differences. optlang and devel take about the same time for FBA and FVA on my system; model creation takes about twice as long with optlang. So it seems to be a problem with pytest-benchmark rather than optlang. Much ado about nothing ;)

@cdiener (Member) commented Nov 1, 2016

And another comment (sorry): I think we should keep the __repr__ function for the new Solution class. Many people have gotten used to seeing <Solution 1.7 at bla>. Also, be careful with using pandas for __str__ since it is not a hard dependency of cobrapy.
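
A minimal sketch of what that could look like, with the familiar repr kept and pandas demoted to an optional extra (the attribute names and the fallback format are assumptions, not cobrapy's actual implementation):

class Solution(object):
    def __init__(self, objective_value, fluxes):
        self.f = objective_value
        self.fluxes = fluxes  # assumed: a plain dict {reaction_id: flux}

    def __repr__(self):
        # keep the familiar "<Solution 1.7 at 0x...>" form
        return "<Solution {:.3g} at 0x{:x}>".format(self.f, id(self))

    def __str__(self):
        try:
            import pandas as pd  # soft dependency: only used when available
        except ImportError:
            return "\n".join("{}\t{}".format(rid, flux)
                             for rid, flux in self.fluxes.items())
        return str(pd.Series(self.fluxes, name="fluxes"))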

@pstjohn (Contributor) commented Nov 1, 2016

Just a thought: would it be possible to refactor some of the optlang code in cobra.Model and cobra.Reaction into a separate file? Those classes are getting fairly long (~1000 lines), and the optlang hooks seem like they might be better formulated as functions of a reaction rather than as class methods. Idk, you'll obviously know better about when/where they're called; just a quick observation.

@hredestig (Contributor, Author)

updated status (including #307)

----------------------------------------------------------------------------
Name (time in us)                                  optlang          cobrapy
----------------------------------------------------------------------------
test_add_metabolite_benchmark                     445.1630           9.3970 
test_add_remove_reaction_benchmark                535.4990          74.5510 
test_benchmark_read                            20,031.7060       8,252.1710 
test_benchmark_write                           64,713.6160      55,232.2170 
test_change_objective_benchmark                16,054.4890           8.1210 
test_copy_benchmark                           472,290.4230       2,738.8600 
test_deepcopy_benchmark                       437,081.7910      18,562.3210 
test_double_gene_deletion_benchmark         1,443,851.7150   1,277,075.5060 
test_flux_variability_benchmark             1,608,795.1180   1,493,266.9820 
test_loopless_benchmark                        55,025.4340         453.5710 
test_pfba_benchmark                           498,052.5540     543,615.4500 
test_phenotype_phase_plane_benchmark           32,687.4700      26,568.9930 
test_single_gene_deletion_benchmark           365,926.8060     323,907.7010 
test_single_gene_deletion_fba_benchmark       444,926.5630     397,567.3320 
test_subtract_metabolite_benchmark                  9.4470           8.1380 
----------------------------------------------------------------------------

good points above, will consider them

@hredestig (Contributor, Author)

adjusting the benchmarks to reflect that the optlang version also updates the solver problem (which cobrapy eventually has to do as well), we arrive at these numbers:


----------------------------------------------------------------------------
Name (time in us)                                    optlang        cobrapy
----------------------------------------------------------------------------
test_add_metabolite_benchmark[cglpk]             13,232.1950       375.6060 
test_add_remove_reaction_benchmark[cglpk]           538.7850       295.7280 
test_benchmark_read                              17,018.4280     8,838.1990 
test_benchmark_write                             61,193.0380    64,410.8760 
test_change_objective_benchmark[cglpk]              411.9710       216.0620 
test_copy_benchmark                             465,763.4080     2,793.9070 
test_deepcopy_benchmark                         432,935.5270    19,048.1710 
test_double_gene_deletion_benchmark           1,433,433.8710 1,344,390.2910 
test_flux_variability_benchmark[cglpk]        1,600,486.5420 1,515,880.8660 
test_loopless_benchmark                          51,907.1700       457.9230 
test_pfba_benchmark[cglpk]                      503,139.0660   497,062.8120 
test_phenotype_phase_plane_benchmark             46,886.1440    31,784.9910 
test_single_gene_deletion_benchmark             366,714.6620   318,752.4890 
test_single_gene_deletion_fba_benchmark         438,964.5880   400,384.6360 
test_subtract_metabolite_benchmark[cglpk]             9.8440       216.4100 
----------------------------------------------------------------------------

@phantomas1234 test_loopless and test_gapfilling are both failing with cplex
@cdiener solution.__repr__ and the pandas dependency are fixed
@pstjohn I see what you mean: Reaction.py (706 vs 1089 lines) and Model.py (382 vs 656) are getting heavy, but somehow I feel that reorganizing them isn't really going to make the code easier to read, since the attributes there still very much concern only those classes.

@pstjohn (Contributor) commented Nov 4, 2016

I tried pulling the optlang branch, but got a

In [1]: import cobra
No solvers available.
In [2]: import cobra.test
In [3]: model = cobra.test.create_test_model('ecoli')
...
AttributeError: module 'optlang' has no attribute 'Model'
...

Should I not be installing optlang via pip?

Also, it complained about pytest in a new conda environment, so that might need to get added as a requirement.

In terms of refactoring the properties, you can always define functions that operate on reactions, i.e. def blah(reaction, *args) rather than def blah(self). Or we could use some inheritance, which would also allow breaking up the methods into different files. Maybe just more commenting would be enough too 😄
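
As a rough illustration of that pattern (the helper name and module layout are hypothetical):

# e.g. in a hypothetical cobra/core/solver_helpers.py
def update_variable_bounds(reaction, solver_model):
    """Push a reaction's bounds into the optlang problem it lives in."""
    variable = solver_model.variables[reaction.id]
    variable.lb = reaction.lower_bound
    variable.ub = reaction.upper_bound

Helpers like this would keep Reaction slim while collecting the solver-specific logic in one place.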

@cdiener (Member) commented Nov 4, 2016

Hi @hredestig, I ran the benchmark comparison but get pretty different results from yours. Could you post how you got that nice formatting with "optlang" and "cobrapy" in different columns so I can post my results here? The major difference I found now was MOMA, which is about 20 times slower with optlang...

test_gapfilling works fine for me. test_loopless fails for cplex as well.

EDIT: Never mind, figured it out. Here are my results. Factor means optlang/cobrapy:

                                          test  time [ms] cobrapy  time [ms] optlang      factor
18  test_subtract_metabolite_benchmark[gurobi]           4.582203           0.006804    0.001485
17   test_subtract_metabolite_benchmark[cplex]           4.884656           0.007282    0.001491
19   test_subtract_metabolite_benchmark[cglpk]           0.150635           0.007030    0.046672
25      test_change_objective_benchmark[cplex]           4.758334           0.550066    0.115601
26     test_change_objective_benchmark[gurobi]           4.565922           0.546461    0.119683
21  test_add_remove_reaction_benchmark[gurobi]           4.703368           0.610961    0.129899
20   test_add_remove_reaction_benchmark[cplex]           4.523443           0.626644    0.138533
1                  test_pfba_benchmark[gurobi]         233.341131         177.806495    0.762002
0                   test_pfba_benchmark[cplex]         282.916656         252.139153    0.891214
7       test_flux_variability_benchmark[cplex]        1082.724704        1166.438857    1.077318
8      test_flux_variability_benchmark[gurobi]        2177.550406        2367.999671    1.087460
5          test_single_gene_deletion_benchmark         249.077424         272.857519    1.095473
9       test_flux_variability_benchmark[cglpk]        1105.834278        1216.494740    1.100070
6          test_double_gene_deletion_benchmark         694.960794         764.976082    1.100747
3      test_single_gene_deletion_fba_benchmark         306.315370         338.928193    1.106468
2                   test_pfba_benchmark[cglpk]         325.843853         365.088172    1.120439
13                        test_benchmark_write          31.708558          35.595469    1.122582
11        test_phenotype_phase_plane_benchmark          17.913414          31.039275    1.732739
15       test_add_metabolite_benchmark[gurobi]           4.743980           8.348050    1.759714
14        test_add_metabolite_benchmark[cplex]           4.608387           8.259900    1.792363
22   test_add_remove_reaction_benchmark[cglpk]           0.214981           0.611174    2.842921
27      test_change_objective_benchmark[cglpk]           0.150194           0.550403    3.664625
12                         test_benchmark_read           5.687287          21.185443    3.725052
4     test_single_gene_deletion_moma_benchmark        2217.651304       34998.070239   15.781593
24                     test_deepcopy_benchmark          13.135935         278.250522   21.182391
16        test_add_metabolite_benchmark[cglpk]           0.261074           8.404744   32.192897
23                         test_copy_benchmark           2.189972         297.671050  135.924615
10                     test_loopless_benchmark           0.350967          50.506941  143.908147

Many of the benchmarks are not as bad for me with optlang. However, there are some weird performance differences we will need to check out. I will continue profiling the problematic operations a bit.

@cdiener (Member) commented Nov 4, 2016

Hi, sorry, just as a clarification because I think it's confusing: almost none of the benchmark results reported here actually use the optlang solvers. The optlang solvers are still not added to solver_dict, so they are not used in pFBA, FVA, or in model.optimize if the solver argument is set. What you see in the benchmarks is just the overhead generated by updating the solver interface on every model modification, plus the slower copy.

@pstjohn (Contributor) commented Nov 4, 2016

Modifying the solutions in-place should theoretically lead to speed improvements in the long run, right? That way the solver doesn't need to be re-initialized for repeated model changes...

@cdiener (Member) commented Nov 4, 2016

That depends on the use case. Basically, it should be faster if you have a workflow where you change only a few reactions and solve many times (the first few lines of the tests in my benchmark correspond to that case). However, there is also some overhead due to the optlang API. It is slower in cases where you modify the model a lot and only solve once, since in the worst case you update every reaction several times before solving (see loopless and moma, which do that more or less).

It really depends on what our default use case is supposed to be. For instance, there are a lot of people who only use cobrapy to convert between model formats or to construct models de novo (with little FBA), as one can see on the mailing list. Those will experience significant performance regressions.

In general, the nicer optlang API will cost some performance. I think that's okay given the better usability.

@pstjohn (Contributor) commented Nov 4, 2016

Would it be better to evaluate lazily then? I wonder if we could use something similar to the context manager approach proposed in #306 to build up a history of changes that need to be applied to the solver, and then flush the queue when we call optimize. (Sorry for being slow to put in a PR for that... scrambling for some upcoming presentations 😄)
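
A minimal sketch of that queue-and-flush idea, assuming a thin wrapper around the optlang model (all names here are hypothetical):

class LazySolverState(object):
    """Queue solver updates and apply them only when we actually solve."""

    def __init__(self, solver_model):
        self._solver = solver_model  # the underlying optlang model
        self._pending = []           # queued (update_fn, args) pairs

    def defer(self, update_fn, *args):
        self._pending.append((update_fn, args))

    def optimize(self):
        # flush every queued change right before solving
        for update_fn, args in self._pending:
            update_fn(self._solver, *args)
        del self._pending[:]
        return self._solver.optimize()

A setter like Reaction.lower_bound would then call defer(...) instead of touching the solver directly.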

@cdiener (Member) commented Nov 4, 2016

That would be insanely cool! However, I have no idea how to implement that, since you would also have to know when a change overwrites another one. For instance, calling:

model.lower_bound = 3
...
model.lower_bound = 4

The history should "know" that it can delete the first one since the second one supersedes it. Maybe just have some data structure that logs which bounds, reactions, metabolites and objectives have changed (constant memory usage, yay). Then only update those in the solver upon optimize or when getting the solver property. For a reaction, it would check whether it is still there and update bounds and stoichiometries, or just remove it.
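
One way that bookkeeping could look, assuming a set of changed reaction ids is enough (names are hypothetical):

class DirtyTracker(object):
    """Log which reactions changed; repeated changes collapse into one entry."""

    def __init__(self):
        self.dirty = set()

    def mark(self, reaction_id):
        # marking the same reaction twice is a no-op, so the second
        # bound change naturally supersedes the first
        self.dirty.add(reaction_id)

    def flush(self, model, solver_model):
        for rid in self.dirty:
            try:
                reaction = model.reactions.get_by_id(rid)
            except KeyError:
                continue  # reaction was removed; drop its variable here instead
            variable = solver_model.variables[rid]
            variable.lb = reaction.lower_bound
            variable.ub = reaction.upper_bound
        self.dirty.clear()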

@pstjohn (Contributor) commented Nov 4, 2016

You could actually say the same thing about the context managers: no need to re-write a bound many times over. But that's not a required feature for a first-pass implementation; it would still be as efficient as the current methods even with the re-writing.

If everyone's on board, we should try to merge the history classes into one; I would bet a lot of the code would be pretty similar. Anything we want to track for the solution would likely also want to get reverted by the context, so we might be able to similarly decorate lower_bound and the like and move some of the optlang code out of Reaction.py that way.

@cdiener (Member) commented Nov 4, 2016

Yeah. The only difference is that the solver history is not executed until solving the model, whereas the model history is applied immediately and reset afterwards. No idea how you would do that with the same history, but looking forward to your ideas.

@pstjohn (Contributor) commented Nov 5, 2016

I don't have it totally worked out either. In each case we'd build up a stack of modifications and then trigger their application at a later time... In the model context that happens on model.__exit__, for optlang it happens on model.optimize. The decorators could just keep track of two simultaneous histories, one for optlang and one for the context manager. Each would define different undo functions: one would reset the model to its original state, the other would act on the solution object.
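
A rough sketch of one setter feeding both histories at once (every name here is hypothetical, just to make the two-history idea concrete):

def set_lower_bound(reaction, value):
    old_value = reaction.lower_bound
    model = reaction.model

    # context history: undone on model.__exit__ to restore the old state
    model.context_history.append(
        lambda: set_lower_bound(reaction, old_value))

    # solver history: applied lazily on model.optimize()
    model.solver_history.append(
        lambda solver: setattr(solver.variables[reaction.id], "lb", value))

    reaction._lower_bound = value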

@cdiener (Member) commented Nov 7, 2016

Using the reversible representation in the solver instead of the irreversible one does give a significant performance improvement. For a large model, reading from SBML gets about 1/3 faster and calling optimize the first time is twice as fast. Also muuuuch less code in the bound setters and _populate_solver...
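
For reference, the two encodings look roughly like this (reaction id and bounds are just example values):

from optlang import Variable

# irreversible split: two non-negative variables, flux = forward - reverse
v_forward = Variable("PGI", lb=0, ub=1000)
v_reverse = Variable("PGI_reverse", lb=0, ub=1000)

# reversible: one variable whose negative lower bound encodes reversibility
v = Variable("PGI", lb=-1000, ub=1000)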

EDIT:

Also, model.copy is slow due to the deepcopy of the solution, not the deepcopy of _solver. Resetting the solution of the copied object to a new LazySolution makes model.copy much faster. Still 5-10 times slower than the old copy, but much better. This conserves the solution basis, so solving the copied model still gains the speed advantage from previous solves.
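
A sketch of that fix, assuming the solution can simply be detached before the deepcopy (the import path for cameo's LazySolution is an assumption):

from copy import deepcopy

from cameo.core.solution import LazySolution  # assumed import path

def fast_copy(model):
    # detach the solution so it is not dragged through the deepcopy
    solution, model.solution = model.solution, None
    try:
        new_model = deepcopy(model)  # still copies _solver, keeping the basis
    finally:
        model.solution = solution
    new_model.solution = LazySolution(new_model)  # fresh lazy solution
    return new_model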

@phantomas1234 (Contributor)

@cdiener that sounds promising. Can you maybe share your code? @hredestig should we maybe open a pull request for the refactored code so we can discuss it more easily?

@cdiener (Member) commented Nov 7, 2016

Sure, I made a branch here: https://github.com/cdiener/cobrapy/tree/optlang_cdiener. Comparison with the old (current optlang branch):

                                        test  time [ms] old  time [ms] new    factor
23                         test_copy_benchmark     297.671050     20.608631  0.069233
24                     test_deepcopy_benchmark     278.250522     33.817388  0.121536
4     test_single_gene_deletion_moma_benchmark   34998.070239   6889.473639  0.196853
10                     test_loopless_benchmark      50.506941     17.797014  0.352368
2                   test_pfba_benchmark[cplex]     252.139153    135.429851  0.537123
0                  test_pfba_benchmark[gurobi]     177.806495     95.888721  0.539287
26      test_change_objective_benchmark[cglpk]       0.550403      0.326667  0.593505
27      test_change_objective_benchmark[cplex]       0.550066      0.327242  0.594913
25     test_change_objective_benchmark[gurobi]       0.546461      0.327909  0.600059
22   test_add_remove_reaction_benchmark[cplex]       0.626644      0.435079  0.694301
20  test_add_remove_reaction_benchmark[gurobi]       0.610961      0.438270  0.717345
21   test_add_remove_reaction_benchmark[cglpk]       0.611174      0.439949  0.719842
1                   test_pfba_benchmark[cglpk]     365.088172    274.528417  0.751951
14       test_add_metabolite_benchmark[gurobi]       8.348050      6.311666  0.756065
15        test_add_metabolite_benchmark[cglpk]       8.404744      6.418215  0.763642
16        test_add_metabolite_benchmark[cplex]       8.259900      6.363597  0.770421
12                         test_benchmark_read      21.185443     16.960052  0.800552
3      test_single_gene_deletion_fba_benchmark     338.928193    311.072180  0.917811
7      test_flux_variability_benchmark[gurobi]    2367.999671   2175.041806  0.918514
8       test_flux_variability_benchmark[cglpk]    1216.494740   1122.802473  0.922982
9       test_flux_variability_benchmark[cplex]    1166.438857   1091.038845  0.935359
5          test_single_gene_deletion_benchmark     272.857519    256.449415  0.939866
13                        test_benchmark_write      35.595469     33.492812  0.940929
6          test_double_gene_deletion_benchmark     764.976082    722.069430  0.943911
19   test_subtract_metabolite_benchmark[cplex]       0.007282      0.006949  0.954214
11        test_phenotype_phase_plane_benchmark      31.039275     30.079134  0.969067
18   test_subtract_metabolite_benchmark[cglpk]       0.007030      0.006888  0.979790
17  test_subtract_metabolite_benchmark[gurobi]       0.006804      0.007063  1.038079

@phantomas1234 (Contributor)

@cdiener OK, this might be the way to go, but it will unfortunately break a lot of things in cameo. After getting rid of the 2 variables per reaction, what would be your take on implementing pFBA? Because I don't want to go back to things like convert_to_irreversible.

@cdiener (Member) commented Nov 8, 2016

The gain is not huge; we can still think about leaving it as two variables but might have to optimize a bit more. cProfile is your friend here. For pFBA I would temporarily add new variables to the cameo solver and change the bounds, then remove everything afterwards. As we can see from the benchmarks, the overhead of that is not huge. Another thing I found regarding speed is to optimize Reaction.objective_coefficient: getting the objective from the low-level solver, converting it to sympy, extracting the coefficients, and doing that for every reaction is not very efficient.
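
A sketch of that temporary-variables approach for pFBA against the optlang API (the function name and the exact hook into cameo/cobrapy are assumptions):

def pfba_fluxes(solver, optimum):
    """Minimize total |flux| while fixing the original objective at optimum."""
    interface = solver.interface
    fix_objective = interface.Constraint(solver.objective.expression, lb=optimum)
    aux_vars, extras = [], [fix_objective]
    for var in list(solver.variables):
        t = interface.Variable("pfba_aux_" + var.name, lb=0)
        aux_vars.append(t)
        # t >= v and t >= -v together enforce t >= |v|
        extras += [t, interface.Constraint(t - var, lb=0),
                   interface.Constraint(t + var, lb=0)]
    old_objective = solver.objective
    solver.add(extras)
    try:
        solver.objective = interface.Objective(sum(aux_vars), direction="min")
        solver.optimize()
        return {v.name: v.primal for v in solver.variables
                if not v.name.startswith("pfba_aux_")}
    finally:
        # remove everything afterwards, leaving the problem as we found it
        solver.objective = old_objective
        solver.remove(extras)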

@pstjohn (Contributor) commented Nov 8, 2016

Can we implement absolute value lower down in the stack? One of the 'future directions' for optlang lists handling absolute values. I'm thinking for things like linear MOMA and future methods, I'd rather have the solver interface handle that.

@phantomas1234 (Contributor)

We tested 2 vs. 1 variable per reaction and we see only a marginal increase in runtime for model.optimize.

@phantomas1234 (Contributor) commented Nov 8, 2016

Give me ~4 days and I'll work something out that I deem compatible with cameo (after all, I don't see any value in merging things from cameo into cobrapy in a way that breaks cameo; adding variables on the fly is not efficient, but I get that pfba might be the only method that would suffer from that). If, for example, pfba in cobrapy becomes a lot slower due to the merge because it uses convert_to_irreversible, then the solution is not to make add_reaction faster but to replace it with cameo's pfba function.

Edit:
Actually, there are many more methods in cameo that benefit from the 2-variables-per-reaction setup.

@phantomas1234 (Contributor)

After fixing model.copy() by not deepcopying model.solution (thanks @cdiener), it is about 1.7x slower than devel-2 cobrapy (for the cplex interface). I don't think we'll have a real issue here anymore.

@cdiener (Member) commented Nov 8, 2016

Yeah, for solving it's only about 30% faster. I think the issue we have is more with model creation. For instance, reading from SBML became twice as slow due to _populate_solver. However, I think the right way to go here is to implement the lazy solver as proposed by @pstjohn. That way, model creation for models that are not going to be solved (converting between Matlab and SBML, for example) wouldn't suffer because of some optimization for a future FBA that is never run. In hindsight, I also think leaving the problem in standard form might be better for allowing dualizing methods to be formulated faster.

Regarding pFBA, I think all methods have to be ported to the new solver interface. Some operations that were cheap before are slow now, and vice versa.

@phantomas1234 I noted that switching the solver interface with optlang is really slow (model.solver = "glpk"). Also, optlang is not deterministic in the order of the solvers it picks. Might be worthwhile having a look at optlang to see whether we can squeeze some speed out of there too...

@phantomas1234 (Contributor)

I like the lazy solver idea from @pstjohn. We actually would have use for that too. Would you and @pstjohn maybe like to get started on that in #309? Should we maybe move discussions to #309 or should we leave the PR for code review and keep the issue for discussions?

@cdiener (Member) commented Nov 8, 2016

Sure, I can push some stuff. For the functions, I would like to take the moma module. I have some ideas on how to implement that without any new variables.
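
One way to get quadratic MOMA without auxiliary variables is to expand the squared distance directly over the existing flux variables; a sketch in optlang terms (the function name and the reference-flux format are assumptions):

def quadratic_moma_objective(solver, reference_fluxes):
    """Build min sum_i (v_i - v_ref_i)**2 over the existing variables only."""
    expression = sum((var - reference_fluxes[var.name]) ** 2
                     for var in solver.variables
                     if var.name in reference_fluxes)
    return solver.interface.Objective(expression, direction="min")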

@phantomas1234 (Contributor)

Sounds good. Do you mean quadratic or linear MOMA? Also, when it comes to benchmarks, one should keep in mind the huge timing differences for create_problem between the current cglpk and cplex interfaces. In order to reach the speed of cglpk (which uses glp_load_matrix) we would have to optimize optlang even further. I doubt it would be worth it.

@cdiener (Member) commented Nov 8, 2016

Yeah, this would be solved by the lazy solver anyway, since that extra time is added only when the model is solved the first time. I meant quadratic MOMA.


@hredestig (Contributor, Author)

With the creation of the devel-2 branch featuring the first version of the solver-based model, this ticket is no longer specific enough, so I am closing it; further refactoring will be discussed in other issues.
