Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree contract #250

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ dependencies = [
"grimp>=3.2",
"tomli>=1.2.1; python_version < '3.11'",
"typing-extensions>=3.10.0.0",
"networkx>=3",
Copy link
Owner

@seddonym seddonym Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reluctant to include networkx as a dependency, it used to be one, and we worked hard to remove it from Grimp (the underlying graph algorithm library, also maintained by me) so that Import Linter had fewer dependencies.

I would probably be more open to us copying the relevant simple cycles code from networkx - with a view to eventually moving the algorithm into Grimp (where it could use faster Rust graph libraries).

That said, it's not an unreasonable approach (though probably suboptimal performance-wise). One option you could consider is for you to create a separate Python library that defines this import linter contract - users could install that (along with networkx) if they want to use it. They could use the custom contract types mechanism to register the tree contract type. We might then be able to incorporate the work into Import Linter in due course.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only use simple_cycles algorithm from networkx it may be a good idea to just create such a function than creating an additional dependency. On the other hand, we delegate the maintenance of the functionality to a trusted and supported package. That was my thinking before I read this comment.

I guess the dependency has been already discussed and the decision is to not use networkx and implement graph-related functionalities from a scratch in Grimp. I will follow the concept.

So the next move will be to implement simple_cycles functionality in Grimp using Python. After some stress tests we will have data about potential algorithm optimization needs. I like the idea of using Rust to optimize. If the feature will not be useful for a huge code bases due to significant computational time, the Rust implementation seems like a really good approach to fix the problem.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that's where we want to get to.

In the interests of not wasting your time unnecessarily, I'm actively working with others on a Rust rewrite which isn't far off. So if you did want to contribute to Grimp on this, it's probably better to hold off until that lands (or just wait for me to implement). If you'd like to see this feature sooner, as I said you could experiment with releasing it as a separate library with a custom contract.

]
classifiers = [
"Development Status :: 5 - Production/Stable",
Expand Down
86 changes: 86 additions & 0 deletions src/importlinter/contracts/tree.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
from grimp import ImportGraph
from importlinter.domain.contract import Contract, ContractCheck
import networkx as nx # type: ignore[import-untyped]
from importlinter.application import output
from importlinter.domain import fields


class TreeContract(Contract):
"""
A contract that checks whether the dependency graph of modules forms a tree structure.

This contract verifies that the directed graph of module imports does not contain any cycles.
A cycle in the graph implies that there is a circular dependency between modules, which
violates the tree structure requirement.

The `check` method constructs a directed graph using NetworkX, where nodes represent modules
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation detail, probably doesn't belong in the docstring.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. To be removed.

and edges represent import relationships. It then uses NetworkX's `simple_cycles` function
to detect any cycles in the graph.

Configuration options:
- consider_package_dependencies: Whether to consider cyclic dependencies between packages.
"True" or "true" will be treated as True. (Optional.)
"""

type_name = "tree"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had envisaged 'modular' being a good description of this structure, but the tree idea is an interesting take that I had not considered. I'd like to think a bit about whether we call it this, but maybe it's better than 'modular'.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "Tree" indicates directly what structure should modules/packages form in order to fill the contact. But I don't mind changing the name to "Modular". Discussion can be continued. Since it is just an easy to change naming, we can postpone the decision up until the day of the potential merge.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!


consider_package_dependencies = fields.BooleanField(required=False, default=True)

_CYCLES_METADATA_KEY = "cycles"

def check(self, graph: ImportGraph, verbose: bool) -> ContractCheck:
nx_graph = nx.DiGraph()

for importer_module in graph.modules:
importer_module_family = [importer_module] + (
self._get_module_ancestors(module=importer_module)
if self._consider_package_dependencies
else []
)
imported_modules = graph.find_modules_directly_imported_by(module=importer_module)

for importer_module_family_member in importer_module_family:
nx_graph.add_node(node_for_adding=importer_module_family_member)

for imported_module in imported_modules:
imported_module_family = [imported_module] + (
self._get_module_ancestors(module=imported_module)
if self._consider_package_dependencies
else []
)

for imported_module_family_member in imported_module_family:
if importer_module_family_member == imported_module_family_member:
continue

if not nx_graph.has_node(n=imported_module_family_member):
nx_graph.add_node(node_for_adding=imported_module_family_member)

nx_graph.add_edge(
u_of_edge=importer_module_family_member,
v_of_edge=imported_module_family_member,
)

cycles = list(nx.simple_cycles(G=nx_graph))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth running this on a real-life Python project that doesn't follow a tree structure. I suspect this may return a huge number of results, and possibly never complete. Perhaps we should add some kind of break after a certain number of cycles.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I will run stress tests on some huge Python code bases.

return ContractCheck(kept=len(cycles) == 0, metadata={self._CYCLES_METADATA_KEY: cycles})

@property
def _consider_package_dependencies(self) -> bool:
return str(self.consider_package_dependencies).lower() == "true"

@staticmethod
def _get_module_ancestors(module: str) -> list[str]:
module_ancestors = []
module_split = module.split(".")
del module_split[-1]

while module_split:
module_ancestors.append(".".join(module_split))
del module_split[-1]

return module_ancestors

def render_broken_contract(self, check: ContractCheck) -> None:
for cycle in check.metadata.get(self._CYCLES_METADATA_KEY, []):
output.print_error(text=f"Cycle found: {cycle}")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be interesting to see what cycles in a real repo look like, and if this is a helpful way to present them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After running some real-life tests we can take a look at the log and conclude how to modify the logging to make the message easier to understand.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

output.new_line()
90 changes: 90 additions & 0 deletions tests/unit/contracts/test_tree.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
from unittest.mock import MagicMock, patch
from grimp.adaptors.graph import ImportGraph
from importlinter.contracts.tree import TreeContract
from importlinter.domain.contract import ContractCheck


def _build_contract() -> TreeContract:
return TreeContract(name="test", session_options=dict(), contract_options=dict())


class TestTreeContractCheck:
"""
Naming convention for modules:

<level_in_the_directory_tree>_<hierarchy_lvl_in_the_package>

"""

def test_tree_indeed(self) -> None:
# Given
graph = ImportGraph()

for module in (
"1_a",
"1_a.2_a",
"1_a.2_b",
"1_b",
"1_b.2_a",
"1_b.2_b",
"1_c",
"1_c.2_a",
"1_c.2_b",
):
graph.add_module(module)

graph.add_import(importer="1_a.2_a", imported="1_b.2_b")
graph.add_import(importer="1_a.2_a", imported="1_a.2_b")
graph.add_import(importer="1_a.2_b", imported="1_c.2_a")
graph.add_import(importer="1_b.2_a", imported="1_c.2_b")
contract = _build_contract()
# When
contract_check = contract.check(graph=graph, verbose=False)
# Then
assert contract_check.kept

def test_not_a_tree(self) -> None:
# Given
graph = ImportGraph()

for module in ("1_a", "1_a.2_a", "1_a.2_b", "1_b", "1_b.2_a", "1_b.2_b"):
graph.add_module(module)

graph.add_import(importer="1_a.2_a", imported="1_b.2_b")
graph.add_import(importer="1_b.2_a", imported="1_a.2_b")
contract = _build_contract()
# When
contract_check = contract.check(graph=graph, verbose=False)
# Then
assert not contract_check.kept

def test_do_not_consider_package_dependencies(self) -> None:
pass # TODO


class TestTreeContractRenderBrokenContract:
@patch("importlinter.contracts.tree.output.print_error")
@patch("importlinter.contracts.tree.output.new_line")
def test_no_cycles(self, new_line_mock: MagicMock, print_error_mock: MagicMock) -> None:
# Given
contract = _build_contract()
contract_check = ContractCheck(kept=True)
# When
contract.render_broken_contract(check=contract_check)
# Then
print_error_mock.assert_not_called()
new_line_mock.assert_not_called()

@patch("importlinter.contracts.tree.output.print_error")
@patch("importlinter.contracts.tree.output.new_line")
def test_cycle_exists(self, new_line_mock: MagicMock, print_error_mock: MagicMock) -> None:
# Given
contract = _build_contract()
contract_check = ContractCheck(
kept=True, metadata={contract._CYCLES_METADATA_KEY: [["1_b", "1_a"]]}
)
# When
contract.render_broken_contract(check=contract_check)
# Then
print_error_mock.assert_called_once_with(text="Cycle found: ['1_b', '1_a']")
new_line_mock.assert_called_once_with()
Loading