Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🏗️ Integrate lnschema-core into lamindb #921

Merged
merged 32 commits into from
Jan 2, 2025
Merged

Conversation

falexwolf
Copy link
Member

@falexwolf falexwolf commented Dec 31, 2024

@falexwolf falexwolf changed the title 🚚 Prepare migration away from lnschema-core 🚚 Prepare migration away from lnschema-core Dec 31, 2024
@falexwolf falexwolf changed the title 🚚 Prepare migration away from lnschema-core 🏗️ Integrate lnschema-core into lamindb Dec 31, 2024
@@ -42,8 +42,8 @@ def get_schema_module_name(schema_name, raise_import_error: bool = True) -> str


def register_storage_in_instance(ssettings: StorageSettings):
from lnschema_core.models import Storage
from lnschema_core.users import current_user_id
from lamindb.models import Storage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid this is much slower, no?

Copy link
Member Author

@falexwolf falexwolf Jan 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be much slower?

(1) Do you mean because of a few additional dependencies that lamindb has over lamindb_setup? If that's the reason, I'm pretty sure it should be negligible. If it's not negligible we ought to think through optimizing import time on lamindb.

(2) If something else: 🤔 -- I can only speculate that you're having the auto-connect behavior in mind. Also here there is no additional slow-ness because you need to have django setup (connection established, if you will) also in the case of lnschema_core.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean 1), yes. lamindb is much much heavier in terms of imports, and I don’t really think it can be improved. We definitely need to test the import times. In my opinion significant degradation of performance is not justified by more convenient code organization (and I am not even sure about convenience here). lamin-cli is already pretty slow and we shouldn’t make it even slower I think.

In summary I think we need to check the performance carefully before going forward with this.

Copy link
Member Author

@falexwolf falexwolf Jan 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I don't believe it, let's assume that lamindb is so much heavier that it leads to noteworthy degradation of import times; but even if noteworthy, it's hard to believe that they'd ever degrade more than 300ms.

Now consider that from lamindb.models import Storage and all similar statements is a dynamic import in lamindb-setup. It has to be because django needs to be setup before it runs through.

All instances in which the dynamic import is being run are cases in which run time is a few seconds. The worst case scenario of 300ms then isn't nice, but is also no big deal.

I've played with this for the past hours and I don't notice any slow down. If anything, I notice that several parts of the UX are faster (because I disentangled some recursive imports).

more convenient code organization (and I am not even sure about convenience here)

This is not at all about convenience. It's about:

  1. achieving a mono repo
  2. https://github.com/laminlabs/pfizer-lamin-usage/issues/75
  3. no longer depending on an external Django (https://laminlabs.slack.com/archives/C04FPE8V01W/p1735587801694429)

Points 2. and 3. are enabled through 1. -- I imagine that 1. will enable more fundamental code organization improvements like 2. and 3. beyond easing operational complications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not that optimistic about performance. I know that it is fine on Linux (and probably macOS), but on windows it is already inconveniently slow and probably will be even slower with these changes. On windows it looks like additional imports make a difference.

Re part 2, I don’t entirely get why 1 and 2 are really needed, but we can discuss of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I disentangled some recursive imports

maybe this will make a positive difference on windows also.

Copy link
Member

@Koncopd Koncopd Jan 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also systems with pretty slow import of modules, like Helmholtz compute cluster, we need to have them in mind also.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not that optimistic about performance. I know that it is fine on Linux (and probably macOS), but on windows it is already inconveniently slow and probably will be even slower with these changes. On windows it looks like additional imports make a difference.

Interesting, I wasn't aware of this at all. What is inconveniently slow? Can you make an issue on the lamindb repo and we discuss there?

Re part 2, I don’t entirely get why 1 and 2 are really needed, but we can discuss of course.

  1. is a big discussion, which we should rather have in person. 2. is what Andreas wants, and I agree with him in particular because it'll enable even deeper improvements like 3.

maybe this will make a positive difference on windows also.

I hope so!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I wasn't aware of this at all. What is inconveniently slow? Can you make an issue on the lamindb repo and we discuss there?

Actually I am pretty sure we discussed this once or twice but never did anything. I will make an issue with measured loading times vs Linux. Also maybe on Helmholtz cluster (I believe they have pretty standard setup for compute clusters).

Copy link
Member

@Koncopd Koncopd Jan 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw import times on Helmholtz cluster are pretty abysmal at times.

Copy link

codecov bot commented Jan 1, 2025

Codecov Report

Attention: Patch coverage is 78.00000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 83.68%. Comparing base (fada60e) to head (a96cc7c).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
lamindb_setup/_connect_instance.py 53.84% 6 Missing ⚠️
lamindb_setup/_migrate.py 60.00% 2 Missing ⚠️
lamindb_setup/_init_instance.py 90.00% 1 Missing ⚠️
lamindb_setup/_schema_metadata.py 80.00% 1 Missing ⚠️
lamindb_setup/core/_settings_user.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #921      +/-   ##
==========================================
+ Coverage   83.60%   83.68%   +0.07%     
==========================================
  Files          43       43              
  Lines        3435     3426       -9     
==========================================
- Hits         2872     2867       -5     
+ Misses        563      559       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Jan 2, 2025

@github-actions github-actions bot temporarily deployed to pull request January 2, 2025 01:12 Inactive
@falexwolf falexwolf merged commit 54cc927 into main Jan 2, 2025
12 of 13 checks passed
@falexwolf falexwolf deleted the migrate-lnschema-core branch January 2, 2025 01:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants