-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🏗️ Integrate lnschema-core
into lamindb
#921
Conversation
lnschema-core
lnschema-core
lnschema-core
into lamindb
@@ -42,8 +42,8 @@ def get_schema_module_name(schema_name, raise_import_error: bool = True) -> str | |||
|
|||
|
|||
def register_storage_in_instance(ssettings: StorageSettings): | |||
from lnschema_core.models import Storage | |||
from lnschema_core.users import current_user_id | |||
from lamindb.models import Storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid this is much slower, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it be much slower?
(1) Do you mean because of a few additional dependencies that lamindb has over lamindb_setup? If that's the reason, I'm pretty sure it should be negligible. If it's not negligible we ought to think through optimizing import time on lamindb.
(2) If something else: 🤔 -- I can only speculate that you're having the auto-connect behavior in mind. Also here there is no additional slow-ness because you need to have django setup (connection established, if you will) also in the case of lnschema_core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean 1), yes. lamindb is much much heavier in terms of imports, and I don’t really think it can be improved. We definitely need to test the import times. In my opinion significant degradation of performance is not justified by more convenient code organization (and I am not even sure about convenience here). lamin-cli is already pretty slow and we shouldn’t make it even slower I think.
In summary I think we need to check the performance carefully before going forward with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though I don't believe it, let's assume that lamindb is so much heavier that it leads to noteworthy degradation of import times; but even if noteworthy, it's hard to believe that they'd ever degrade more than 300ms.
Now consider that from lamindb.models import Storage
and all similar statements is a dynamic import in lamindb-setup
. It has to be because django needs to be setup before it runs through.
All instances in which the dynamic import is being run are cases in which run time is a few seconds. The worst case scenario of 300ms then isn't nice, but is also no big deal.
I've played with this for the past hours and I don't notice any slow down. If anything, I notice that several parts of the UX are faster (because I disentangled some recursive imports).
more convenient code organization (and I am not even sure about convenience here)
This is not at all about convenience. It's about:
- achieving a mono repo
- https://github.com/laminlabs/pfizer-lamin-usage/issues/75
- no longer depending on an external Django (https://laminlabs.slack.com/archives/C04FPE8V01W/p1735587801694429)
Points 2. and 3. are enabled through 1. -- I imagine that 1. will enable more fundamental code organization improvements like 2. and 3. beyond easing operational complications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not that optimistic about performance. I know that it is fine on Linux (and probably macOS), but on windows it is already inconveniently slow and probably will be even slower with these changes. On windows it looks like additional imports make a difference.
Re part 2, I don’t entirely get why 1 and 2 are really needed, but we can discuss of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because I disentangled some recursive imports
maybe this will make a positive difference on windows also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also systems with pretty slow import of modules, like Helmholtz compute cluster, we need to have them in mind also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not that optimistic about performance. I know that it is fine on Linux (and probably macOS), but on windows it is already inconveniently slow and probably will be even slower with these changes. On windows it looks like additional imports make a difference.
Interesting, I wasn't aware of this at all. What is inconveniently slow? Can you make an issue on the lamindb
repo and we discuss there?
Re part 2, I don’t entirely get why 1 and 2 are really needed, but we can discuss of course.
- is a big discussion, which we should rather have in person. 2. is what Andreas wants, and I agree with him in particular because it'll enable even deeper improvements like 3.
maybe this will make a positive difference on windows also.
I hope so!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I wasn't aware of this at all. What is inconveniently slow? Can you make an issue on the lamindb repo and we discuss there?
Actually I am pretty sure we discussed this once or twice but never did anything. I will make an issue with measured loading times vs Linux. Also maybe on Helmholtz cluster (I believe they have pretty standard setup for compute clusters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw import times on Helmholtz cluster are pretty abysmal at times.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #921 +/- ##
==========================================
+ Coverage 83.60% 83.68% +0.07%
==========================================
Files 43 43
Lines 3435 3426 -9
==========================================
- Hits 2872 2867 -5
+ Misses 563 559 -4 ☔ View full report in Codecov by Sentry. |
See:
lnschema-core
intolamindb
lamindb#2309