-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tree contract #250
base: master
Are you sure you want to change the base?
Tree contract #250
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
from grimp import ImportGraph | ||
from importlinter.domain.contract import Contract, ContractCheck | ||
import networkx as nx # type: ignore[import-untyped] | ||
from importlinter.application import output | ||
from importlinter.domain import fields | ||
|
||
|
||
class TreeContract(Contract): | ||
""" | ||
A contract that checks whether the dependency graph of modules forms a tree structure. | ||
|
||
This contract verifies that the directed graph of module imports does not contain any cycles. | ||
A cycle in the graph implies that there is a circular dependency between modules, which | ||
violates the tree structure requirement. | ||
|
||
The `check` method constructs a directed graph using NetworkX, where nodes represent modules | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is an implementation detail, probably doesn't belong in the docstring. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree. To be removed. |
||
and edges represent import relationships. It then uses NetworkX's `simple_cycles` function | ||
to detect any cycles in the graph. | ||
|
||
Configuration options: | ||
- consider_package_dependencies: Whether to consider cyclic dependencies between packages. | ||
"True" or "true" will be treated as True. (Optional.) | ||
""" | ||
|
||
type_name = "tree" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had envisaged 'modular' being a good description of this structure, but the tree idea is an interesting take that I had not considered. I'd like to think a bit about whether we call it this, but maybe it's better than 'modular'. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think "Tree" indicates directly what structure should modules/packages form in order to fill the contact. But I don't mind changing the name to "Modular". Discussion can be continued. Since it is just an easy to change naming, we can postpone the decision up until the day of the potential merge. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed! |
||
|
||
consider_package_dependencies = fields.BooleanField(required=False, default=True) | ||
|
||
_CYCLES_METADATA_KEY = "cycles" | ||
|
||
def check(self, graph: ImportGraph, verbose: bool) -> ContractCheck: | ||
nx_graph = nx.DiGraph() | ||
|
||
for importer_module in graph.modules: | ||
importer_module_family = [importer_module] + ( | ||
self._get_module_ancestors(module=importer_module) | ||
if self._consider_package_dependencies | ||
else [] | ||
) | ||
imported_modules = graph.find_modules_directly_imported_by(module=importer_module) | ||
|
||
for importer_module_family_member in importer_module_family: | ||
nx_graph.add_node(node_for_adding=importer_module_family_member) | ||
|
||
for imported_module in imported_modules: | ||
imported_module_family = [imported_module] + ( | ||
self._get_module_ancestors(module=imported_module) | ||
if self._consider_package_dependencies | ||
else [] | ||
) | ||
|
||
for imported_module_family_member in imported_module_family: | ||
if importer_module_family_member == imported_module_family_member: | ||
continue | ||
|
||
if not nx_graph.has_node(n=imported_module_family_member): | ||
nx_graph.add_node(node_for_adding=imported_module_family_member) | ||
|
||
nx_graph.add_edge( | ||
u_of_edge=importer_module_family_member, | ||
v_of_edge=imported_module_family_member, | ||
) | ||
|
||
cycles = list(nx.simple_cycles(G=nx_graph)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be worth running this on a real-life Python project that doesn't follow a tree structure. I suspect this may return a huge number of results, and possibly never complete. Perhaps we should add some kind of break after a certain number of cycles. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree. I will run stress tests on some huge Python code bases. |
||
return ContractCheck(kept=len(cycles) == 0, metadata={self._CYCLES_METADATA_KEY: cycles}) | ||
|
||
@property | ||
def _consider_package_dependencies(self) -> bool: | ||
return str(self.consider_package_dependencies).lower() == "true" | ||
|
||
@staticmethod | ||
def _get_module_ancestors(module: str) -> list[str]: | ||
module_ancestors = [] | ||
module_split = module.split(".") | ||
del module_split[-1] | ||
|
||
while module_split: | ||
module_ancestors.append(".".join(module_split)) | ||
del module_split[-1] | ||
|
||
return module_ancestors | ||
|
||
def render_broken_contract(self, check: ContractCheck) -> None: | ||
for cycle in check.metadata.get(self._CYCLES_METADATA_KEY, []): | ||
output.print_error(text=f"Cycle found: {cycle}") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would be interesting to see what cycles in a real repo look like, and if this is a helpful way to present them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After running some real-life tests we can take a look at the log and conclude how to modify the logging to make the message easier to understand. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea! |
||
output.new_line() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
from unittest.mock import MagicMock, patch | ||
from grimp.adaptors.graph import ImportGraph | ||
from importlinter.contracts.tree import TreeContract | ||
from importlinter.domain.contract import ContractCheck | ||
|
||
|
||
def _build_contract() -> TreeContract: | ||
return TreeContract(name="test", session_options=dict(), contract_options=dict()) | ||
|
||
|
||
class TestTreeContractCheck: | ||
""" | ||
Naming convention for modules: | ||
|
||
<level_in_the_directory_tree>_<hierarchy_lvl_in_the_package> | ||
|
||
""" | ||
|
||
def test_tree_indeed(self) -> None: | ||
# Given | ||
graph = ImportGraph() | ||
|
||
for module in ( | ||
"1_a", | ||
"1_a.2_a", | ||
"1_a.2_b", | ||
"1_b", | ||
"1_b.2_a", | ||
"1_b.2_b", | ||
"1_c", | ||
"1_c.2_a", | ||
"1_c.2_b", | ||
): | ||
graph.add_module(module) | ||
|
||
graph.add_import(importer="1_a.2_a", imported="1_b.2_b") | ||
graph.add_import(importer="1_a.2_a", imported="1_a.2_b") | ||
graph.add_import(importer="1_a.2_b", imported="1_c.2_a") | ||
graph.add_import(importer="1_b.2_a", imported="1_c.2_b") | ||
contract = _build_contract() | ||
# When | ||
contract_check = contract.check(graph=graph, verbose=False) | ||
# Then | ||
assert contract_check.kept | ||
|
||
def test_not_a_tree(self) -> None: | ||
# Given | ||
graph = ImportGraph() | ||
|
||
for module in ("1_a", "1_a.2_a", "1_a.2_b", "1_b", "1_b.2_a", "1_b.2_b"): | ||
graph.add_module(module) | ||
|
||
graph.add_import(importer="1_a.2_a", imported="1_b.2_b") | ||
graph.add_import(importer="1_b.2_a", imported="1_a.2_b") | ||
contract = _build_contract() | ||
# When | ||
contract_check = contract.check(graph=graph, verbose=False) | ||
# Then | ||
assert not contract_check.kept | ||
|
||
def test_do_not_consider_package_dependencies(self) -> None: | ||
pass # TODO | ||
|
||
|
||
class TestTreeContractRenderBrokenContract: | ||
@patch("importlinter.contracts.tree.output.print_error") | ||
@patch("importlinter.contracts.tree.output.new_line") | ||
def test_no_cycles(self, new_line_mock: MagicMock, print_error_mock: MagicMock) -> None: | ||
# Given | ||
contract = _build_contract() | ||
contract_check = ContractCheck(kept=True) | ||
# When | ||
contract.render_broken_contract(check=contract_check) | ||
# Then | ||
print_error_mock.assert_not_called() | ||
new_line_mock.assert_not_called() | ||
|
||
@patch("importlinter.contracts.tree.output.print_error") | ||
@patch("importlinter.contracts.tree.output.new_line") | ||
def test_cycle_exists(self, new_line_mock: MagicMock, print_error_mock: MagicMock) -> None: | ||
# Given | ||
contract = _build_contract() | ||
contract_check = ContractCheck( | ||
kept=True, metadata={contract._CYCLES_METADATA_KEY: [["1_b", "1_a"]]} | ||
) | ||
# When | ||
contract.render_broken_contract(check=contract_check) | ||
# Then | ||
print_error_mock.assert_called_once_with(text="Cycle found: ['1_b', '1_a']") | ||
new_line_mock.assert_called_once_with() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reluctant to include networkx as a dependency, it used to be one, and we worked hard to remove it from Grimp (the underlying graph algorithm library, also maintained by me) so that Import Linter had fewer dependencies.
I would probably be more open to us copying the relevant simple cycles code from networkx - with a view to eventually moving the algorithm into Grimp (where it could use faster Rust graph libraries).
That said, it's not an unreasonable approach (though probably suboptimal performance-wise). One option you could consider is for you to create a separate Python library that defines this import linter contract - users could install that (along with networkx) if they want to use it. They could use the custom contract types mechanism to register the tree contract type. We might then be able to incorporate the work into Import Linter in due course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we only use
simple_cycles
algorithm fromnetworkx
it may be a good idea to just create such a function than creating an additional dependency. On the other hand, we delegate the maintenance of the functionality to a trusted and supported package. That was my thinking before I read this comment.I guess the dependency has been already discussed and the decision is to not use
networkx
and implement graph-related functionalities from a scratch inGrimp
. I will follow the concept.So the next move will be to implement
simple_cycles
functionality inGrimp
using Python. After some stress tests we will have data about potential algorithm optimization needs. I like the idea of using Rust to optimize. If the feature will not be useful for a huge code bases due to significant computational time, the Rust implementation seems like a really good approach to fix the problem.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, that's where we want to get to.
In the interests of not wasting your time unnecessarily, I'm actively working with others on a Rust rewrite which isn't far off. So if you did want to contribute to Grimp on this, it's probably better to hold off until that lands (or just wait for me to implement). If you'd like to see this feature sooner, as I said you could experiment with releasing it as a separate library with a custom contract.