Simplify api hierarchy #712

WaylonWalker · 2021-03-05T13:52:58Z

Description

When using kedro as a user it can sometimes be hard to remember where even the most common kedro features are located.

Context

This will lower the barrier to entry to using kedro, and make it easier to use without needing to use the docs for even the most simple task such as "How do I make a node".

Possible Implementation

To simplify the barrier to entry into kedro import the most commonly used features right into main.

WARNING - for the person browsing through, this is not the real api, this is simply a suggestion.

import kedro as kd

my_pipeline = kd.Pipeline([kd.node(create_cars, None, "cars")])

Alternatively users can also use the same api as so.

from kedro import Pipeline, node

my_pipeline = Pipeline([node(create_cars, None, "cars")])

At a minimum I feel that the very core features that every kedro user needs should be at the top of the api.

Pipeline
node

The next priority would be those used by hooks authors.

KedroSession
ConfigLoader

Finally the full scope of what should be considered to be made easy to reach should be every occurance of from kedro import x in the project template.

The text was updated successfully, but these errors were encountered:

limdauto · 2021-03-06T19:47:41Z

I think this is a great idea.

Just thinking out loud here: one thing that will land in 0.17.2 is the ability to import user-defined pipelines as follows:

from kedro.framework.project import pipelines

This pipelines reference will be bounded with the pipelines value defined by users in their project at run time. Hopefully this will be the way forward to access other user-defined values such as catalog and config_loader as well (we are still looking for use cases to support making this move wholesale, so suggestion is very welcome). We do have a big use case that needs pipelines to be importable though, rather than accessing through the context as it is at the moment.

So essentially there will 2 things you can to import from kedro:

Things you define in your project, e.g. settings, pipelines
Library components define by kedro, e.g. Pipeline, node(), pipeline(), etc.

For the first one, importing from kedro.framework.project makes sense because it reads as "importing pipelines from my project". I wonder if we can come up with a similar API for the second one?

WaylonWalker · 2021-03-06T22:57:41Z

Do other frameworks do this? I feel that its a bit confusing to get project defined objects from kedro rather than from my_project import settings, pipelines What is the benefit of importing from kedro rather than from my_project. I do not have any familiarity with a framwork that behaves this way, so it might be more common than I realize.

mzjp2 · 2021-03-07T12:12:52Z

Do other frameworks do this?

Django is a (pretty big) example that I've seen work really well. You define your settings in project/setting.py and then use them in the rest of your app by doing:

# project/settings.py
MY_SETTING = 1 if os.environ["env"] == "production" else 0

# project/any_file.py
from django.conf import settings

if settings.MY_SETTING == 1:
  blah

with this justification (that I've seen work well in practice):

Also note that your code should not import from either global_settings or your own settings file. django.conf.settings abstracts the concepts of default settings and site-specific settings; it presents a single interface. It also decouples the code that uses settings from the location of your settings

limdauto · 2021-03-07T22:02:54Z

The main reason why we have to provide these imports on the kedro namespace and not the user-defined namespace is because we have to refer to the user-defined objects in the framework as well. Kedro doesn't know which value my_project will be ahead of time.

As @mzjp2 pointed out, this approach is very similar to Django's and dynaconf's.

Pinning setuptools's upper bound will cause conflict with dependencies in user's projects that don't pin it. For example, consider a user's project which has the following in their requirements.txt, which is very common: ``` ipython kedro ``` Ipython also specifies setuptools as a dependency without an upper bound and it is always listed before Kedro in alphabetical order. When pip tries to resolve dependency, at the moment, since it only performs conflicts resolution in a "first-encounter wins" manner, it will register latest setuptools from IPython as a dependency, which will conflict with Kedro's pinned version. If user moves IPython below Kedro, it should work correctly again. So to be absolutely certain, this commit unpins setuptools.

WaylonWalker · 2021-05-16T14:16:33Z

This is a very interesting approach that is a mix of mind-blowing and confusing to me, mostly because I have not used a framework that does this and not something I expect. It's definitely a big mental shift to import something that I created starting from from kedro..

antonymilne · 2022-07-01T08:22:57Z

I think we should make this happen. pipeline (note lowercase P) and node are now the recommended user-facing API, and we can make this recommendation clearer by exposing these (but not the underlying classes) as a higher level namespace.

As @WaylonWalker says, we don't have to work everything out now, but pipeline and node would be a great starting place as they seem very stable. There's also a suggestion to alias kedro.extras.extensions.ipython at a higher level, like kedro.ipython.

Overall the only question in my mind here is what exactly the structure should be:

kedro.node, kedro.pipeline as per the initial suggestion
kedro.something.node, kedro.something.pipeline as per @limdauto's idea that imports from the kedro library would contrast with imports from kedro.framework.project (maybe kedro.library.node, kedro.library.pipeline? I don't like that though)

1 seems easiest for users to remember, and I don't imagine us ending up with too many components at that top level, but maybe 2 is more correct somehow. I don't know - thoughts very welcome! Maybe @deepyaman @noklam?

Given this issue has an unprecedented number of positive reactions, it's clearly popular and so it would be nice if we could move forward with it.

noklam · 2022-07-01T10:04:03Z

@AntonyMilneQB I think there are definitely opportunities to simplify the kedro, I agree with kedro.extras.extensions.ipython together with the line magic need to be simplified.

I am not sure though if the original issue suggests changing our folder structure. I thought it can be done with a kedro/__init__.py for the most common kedro object. This comes with the drawback that importing any sub-module will also import whatever defined the kedro/__init__.py.

This is more inline with from kedro import node (What the issue suggests) since having a kedro.node module will still resulted in from kedro.node import node.

We should also do something with __all__, as we don't really want people to import Node but node only. So even when the user do from kedro.pipeline.node import *.

In reality, I don't know how often people actually need to import these stuff, but the most useful one for me is inside an interactive environment like Jupyter, so I can do kedro. + Tab to discover what I need to import. There are some comments above mention catalog too, I think we can also make auto-discovery happens, user should be able to do catalog. + Tab and see the datasets without checking catalog.yml, this will be valuable to interactive workflow.

limdauto · 2022-07-01T10:20:54Z

I think the autosuggestion of dataset names in the catalog in an interactive workflow is already available. I think @datajoely added it iirc.

Re API: I have been doing a lot of Django lately and they expose some commonly used functions through a shortcuts namespace: https://docs.djangoproject.com/en/4.0/topics/http/shortcuts/ -- I personally don't like it too much because I hardly ever type in the import statement myself anymore (always the IDE doing it), but could be an option.

noklam · 2022-07-01T10:41:38Z

The last time I tried I don't remember seeing the datasets shortcuts, maybe I did something wrong there.

I think there are some gaps between using IDE and Jupyter/Databricks, it doesn't come with things like IntelliSense / Pylance that guess the import for you.

limdauto · 2022-07-01T10:54:00Z

A bit off topic but @noklam if you are interested in IDE/Notebook tooling I had this project that I haven't found the time to develop further: https://github.com/Kedro-Zero-to-Hero/kedro-lsp -- if you are interested, pls let me know.

datajoely · 2022-07-01T10:55:11Z

My view is that LSP is the right solution for all of this :)

antonymilne · 2022-07-01T12:01:01Z

@noklam - sorry I wasn't clear. What I meant by kedro.node was exactly from kedro import node rather than a new kedro.node package. I don't think we should be changing any of our folder structure here (that would be a huge breaking change for a start), just aliasing things so you can import from the top level.

@limdauto that's interesting about shortcuts. I don't think I like the idea of kedro.shortcuts, but I also very rarely type in the import myself due to IDE autocompletion. Do you think it would be bad to make these things available at top level like from kedro import node? Presumably Django decided against it for some reason in favour of a dedicated shortcuts namespace, but they're a much bigger package.

noklam · 2022-07-01T12:18:47Z

@limdauto This is something that I have no experience but it looks very interesting! Any recommended resource for a complete beginner :) I'll try to look at it sometimes

limdauto · 2022-07-02T15:22:34Z

@noklam I think kedro-lsp is a PoC so it's dead simple and therefore could be a good place to start. Apart from that, I think this is a good guide to language server protocol: https://microsoft.github.io/language-server-protocol/#:~:text=A%20Language%20Server%20is%20meant,servers%20and%20development%20tools%20communicate.

In the end of the day it's a JSON-RPC server that responds to requests from an IDE.

merelcht · 2022-07-21T12:52:58Z

Notes from Technical Design session 20/7:

Discussion

This issue was discussed during a Technical Design session and the following points where brought up:

The idea of moving our imports to higher level or even top level for the core Kedro components, sounds like a good idea, but it depends on a couple of things:

If you do from kedro import ... , would it import everything under this path? This wouldn't be desirable so we need to investigate if this is the case.
Investigate which other packages do top-level imports and how they handle it.

Instead of moving imports all the way to top-level we could have something like a "shortcuts" or "library" section, which is what Django is doing.

Why is Django doing it this way?
How deep are the imports? Is it worth changing it? E.g. if an import currently is from kedro.pipelines import Pipeline making it from kedro.shortcuts import Pipeline wouldn't improve anything.
If we decide to move components to higher level imports which components would that be and which components from Kedro would we want to be importable from the top level? One good candidate for change is the line magic imports. These are very long and could become from kedro.ipython
Is this just for the interactive workflow, (IDE can autocomplete), if so can we put these in ipython extension?

noklam · 2022-08-26T18:07:13Z

If you do from kedro import ... , would it import everything under this path? This wouldn't be desirable so we need to investigate if this is the case.

I am pretty sure this is the case and it's just how Python works, and there isn't much we can do here.

For example if you have a __init__.py which just has 1 line.

import kedro.xxx.yyy.zzz

Everytime you do import kedro, it will also import kedro.xxx and kedro.xxx.yyy and kedro.xxx.yyy.zzz.
Equally when people do from xxx.yyy.zzz import something, it doesn't save any memory magically as the top-level library is imported, it just import it into a local namespace and optionally have an alias.

Direct Import

From import

deepyaman · 2023-03-27T14:06:46Z

Vomiting some initial thoughts (based on what I remember):

A lot of libraries have a top-level api module in the package that defines the public API (e.g. from kedro.api import pipeline).
Some larger libraries take this further by having api modules for each submodule. Then, the higher-level APIs can import * from each of the submodules.
- Personal preference: I like when the imports from each submodule are explicit, instead of using import *, because that makes it really difficult to track where some things are actually defined IMO.
There are also libraries like https://pypi.org/project/atpublic/ to keep __all__ more maintainable, though I haven't used it (and honestly only seen it used in one place).

astrojuanlu · 2023-09-25T15:04:58Z

Importing everything in kedro/__init__.py has the potential of making initialization even slower #1476

And, as mentioned in #712 (comment), some imports are already only 1 level deep.

At the moment (2.5 years after this issue was opened), pipeline code does the following:

from kedro.pipeline import Pipeline, node, pipeline

from .nodes import create_model_input_table, preprocess_companies, preprocess_shuttles

I don't think that's excessively verbose, and changing it to

from kedro.api import pipeline, node

certainly does not seem like a huge life saver compared to the churn it would cause.

The issue of tab-completion confusing users is being tracked in #2805.

I am voting to close this one.

astrojuanlu · 2023-10-23T13:14:10Z

We decided not to do this for now, we are open to reconsidering this in the future. If you'd like to see this happening, please drop a comment here.

WaylonWalker added the Issue: Feature Request New feature or improvement to existing feature label Mar 5, 2021

lorenabalan changed the title ~~Simplify api heirarchy~~ Simplify api hierarchy Apr 19, 2021

lorenabalan added the pinned Issue shouldn't be closed by stale bot label Apr 21, 2021

lorenabalan mentioned this issue Jun 24, 2021

Alternative to accessing attributes of KedroContext after 0.18.0 #807

Closed

Galileo-Galilei mentioned this issue Jul 13, 2021

Universal Kedro deployment (Part 1) - Separate external and applicative configuration to make Kedro cloud native #770

Open

lorenabalan removed the pinned Issue shouldn't be closed by stale bot label Mar 2, 2022

merelcht added the Community Issue/PR opened by the open-source community label Mar 7, 2022

merelcht added this to Kedro Framework Mar 7, 2022

merelcht removed the Community Issue/PR opened by the open-source community label May 16, 2022

merelcht added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Jul 6, 2022

antonymilne mentioned this issue Aug 5, 2022

Alias IPython extension #1763

Closed

AhdraMeraliQB added the Community Issue/PR opened by the open-source community label Mar 20, 2023

AhdraMeraliQB mentioned this issue Mar 20, 2023

Investigate simplifying the api hierarchy #2441

Closed

antonymilne mentioned this issue Mar 24, 2023

Narrow down kedro-datasets public API kedro-org/kedro-plugins#139

Open

merelcht removed the Community Issue/PR opened by the open-source community label Apr 4, 2023

noklam mentioned this issue Jun 20, 2023

Lazy sub-module import #2702

Open

astrojuanlu mentioned this issue Jun 23, 2023

Rectify "modular pipelines" terminology #2723

Open

noklam mentioned this issue Jun 26, 2023

Pipeline builder #2726

Closed

This was referenced Jul 17, 2023

pipeline.pipeline confuses IDE autocompletion #2805

Open

Create IDE plugins for major editors #2821

Closed

astrojuanlu closed this as not planned Won't fix, can't repro, duplicate, stale Oct 23, 2023

github-project-automation bot moved this to Done in Kedro Framework Oct 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify api hierarchy #712

Simplify api hierarchy #712

WaylonWalker commented Mar 5, 2021

limdauto commented Mar 6, 2021 •

edited

Loading

WaylonWalker commented Mar 6, 2021

mzjp2 commented Mar 7, 2021 •

edited

Loading

limdauto commented Mar 7, 2021

WaylonWalker commented May 16, 2021

antonymilne commented Jul 1, 2022 •

edited

Loading

noklam commented Jul 1, 2022 •

edited

Loading

limdauto commented Jul 1, 2022

noklam commented Jul 1, 2022

limdauto commented Jul 1, 2022

datajoely commented Jul 1, 2022

antonymilne commented Jul 1, 2022

noklam commented Jul 1, 2022

limdauto commented Jul 2, 2022

merelcht commented Jul 21, 2022 •

edited by antonymilne

Loading

noklam commented Aug 26, 2022 •

edited

Loading

deepyaman commented Mar 27, 2023

astrojuanlu commented Sep 25, 2023

astrojuanlu commented Oct 23, 2023

Simplify api hierarchy #712

Simplify api hierarchy #712

Comments

WaylonWalker commented Mar 5, 2021

Description

Context

Possible Implementation

limdauto commented Mar 6, 2021 • edited Loading

WaylonWalker commented Mar 6, 2021

mzjp2 commented Mar 7, 2021 • edited Loading

limdauto commented Mar 7, 2021

WaylonWalker commented May 16, 2021

antonymilne commented Jul 1, 2022 • edited Loading

noklam commented Jul 1, 2022 • edited Loading

limdauto commented Jul 1, 2022

noklam commented Jul 1, 2022

limdauto commented Jul 1, 2022

datajoely commented Jul 1, 2022

antonymilne commented Jul 1, 2022

noklam commented Jul 1, 2022

limdauto commented Jul 2, 2022

merelcht commented Jul 21, 2022 • edited by antonymilne Loading

Discussion

noklam commented Aug 26, 2022 • edited Loading

Direct Import

From import

deepyaman commented Mar 27, 2023

astrojuanlu commented Sep 25, 2023

astrojuanlu commented Oct 23, 2023

limdauto commented Mar 6, 2021 •

edited

Loading

mzjp2 commented Mar 7, 2021 •

edited

Loading

antonymilne commented Jul 1, 2022 •

edited

Loading

noklam commented Jul 1, 2022 •

edited

Loading

merelcht commented Jul 21, 2022 •

edited by antonymilne

Loading

noklam commented Aug 26, 2022 •

edited

Loading