Do not treat missing ancestor as input mismatch #14

jl-wynen · 2024-09-30T14:04:45Z

If new_branch has no ancestors, it should be safe to insert it and use the ancestors of self.

@SimonHeybrock test_setitem_fails_when_grandparents_change is failing. I think this test should pass. But in __setitem__, we get intersection_nodes = set() because

        if branch in self.graph:
            graph = _remove_ancestors(self.graph, branch)
            graph.nodes[branch].clear()

removes all ancestors but 'c' such that

intersection_nodes = set(graph.nodes) & set(new_branch.nodes) - {branch}
# equals
intersection_nodes = {'c'} & {'a1', 'b', 'c'} - {'c'}

Is this how it should behave? Or is there currently a bug?
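For reference, the expression can be checked with plain Python sets. This is only a sketch using the node names from the comment above, not the Cyclebane code path. Note that `-` binds tighter than `&`, so the expression is read as `A & (B - {branch})`; in this case both groupings give the empty set.

```python
# Node sets taken from the comment above; 'c' is the key being assigned.
graph_nodes = {'c'}                  # what is left after removing the ancestors
new_branch_nodes = {'a1', 'b', 'c'}
branch = 'c'

# As written: '-' has higher precedence than '&'.
intersection_nodes = graph_nodes & new_branch_nodes - {branch}

# The two possible groupings, spelled out explicitly.
grouped_as_parsed = graph_nodes & (new_branch_nodes - {branch})
grouped_left_first = (graph_nodes & new_branch_nodes) - {branch}

print(intersection_nodes, grouped_as_parsed, grouped_left_first)
```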

SimonHeybrock · 2024-10-01T05:11:33Z

src/cyclebane/graph.py

+            new_pred = new_branch.pred[node]
+            if new_pred and graph.pred[node] != new_pred:


More generally, would it make sense to check if all keys in "new" have the same values in existing, basically if a dict update would have no effect?
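A minimal sketch of the two checks, using plain dicts as a hypothetical stand-in for `graph.pred` (node mapped to its predecessors and their edge attributes). The helper names `inputs_conflict` and `update_is_noop` are made up for illustration and are not part of Cyclebane.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical stand-in for graph.pred: node -> {predecessor: edge attributes}.
PredMap = Mapping[str, Mapping[str, Any]]


def inputs_conflict(existing: PredMap, new: PredMap, node: str) -> bool:
    """Check from the diff: an empty set of new inputs never conflicts."""
    new_pred = new[node]
    return bool(new_pred) and existing[node] != new_pred


def update_is_noop(existing: Mapping[str, Any], new: Mapping[str, Any]) -> bool:
    """Generalized check: would existing.update(new) leave existing unchanged?"""
    missing = object()  # sentinel so that a stored None is not mistaken for "absent"
    return all(existing.get(key, missing) == value for key, value in new.items())


existing = {'b': {'a1': {}}, 'c': {'b': {}}}
no_inputs = {'b': {}}             # new branch knows 'b' but none of its ancestors
other_inputs = {'b': {'a2': {}}}  # new branch gives 'b' a different input

results = {
    'missing ancestor conflicts': inputs_conflict(existing, no_inputs, 'b'),
    'different ancestor conflicts': inputs_conflict(existing, other_inputs, 'b'),
    'no-op with no inputs': update_is_noop(existing['b'], no_inputs['b']),
    'no-op with different inputs': update_is_noop(existing['b'], other_inputs['b']),
}
print(results)
```

One difference to keep in mind: the no-op check would also accept a new branch that lists only a subset of the existing inputs, whereas the check in the diff accepts only an empty or an identical set.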

SimonHeybrock · 2024-10-01T05:17:59Z

This should fix scipp/sciline#180.

If new_branch has no ancestors, it should be safe to insert it and use the ancestors of self.

@SimonHeybrock test_setitem_fails_when_grandparents_change is failing. I think this test should pass. But in __setitem__, we get intersection_nodes = set() because
        if branch in self.graph:
            graph = _remove_ancestors(self.graph, branch)
            graph.nodes[branch].clear()
removes all ancestors but 'c' such that
intersection_nodes = set(graph.nodes) & set(new_branch.nodes) - {branch}
# equals
intersection_nodes = {'c'} & {'a1', 'b', 'c'} - {'c'}
Is this how it should behave? Or is there currently a bug?

I may be wrong, but your call to __setitem__ will completely replace b. It is therefore ok to have different data/input nodes. If you add another node in the first graph which depends on the "old" b then we would want an exception.

Please see around

cyclebane/tests/graph_test.py

Lines 605 to 628 in ee09d24

    
    def test_setitem_raises_on_conflicting_input_nodes_in_ancestor() -> None:
        g1 = nx.DiGraph()
        g1.add_edge('a1', 'b')
        g1.add_edge('b', 'c')
        g1.add_edge('x', 'c')
        g2 = nx.DiGraph()
        g2.add_edge('a2', 'b')
        g2.add_edge('b', 'x')
        graph = cb.Graph(g1)
        with pytest.raises(ValueError, match="Node inputs differ for node 'b'"):
            graph['x'] = cb.Graph(g2)


    def test_setitem_replaces_nodes_that_are_not_ancestors_of_unrelated_node() -> None:
        g1 = nx.DiGraph()
        g1.add_edge('a', 'b')
        g1.add_edge('b', 'c')
        g1.add_edge('c', 'd')
        graph = cb.Graph(g1)
        g2 = nx.DiGraph()
        g2.add_edge('b', 'c')
        graph['c'] = cb.Graph(g2)
        assert 'a' not in graph.to_networkx()

for existing tests. Do you see any that are missing?

SimonHeybrock · 2024-10-02T05:46:20Z

Having thought about this some more, I still have doubts about whether auto-joining at nodes other than the key in __setitem__ is conceptually sound. However, I was wondering whether this is simply what an update, compose, or union function should do (not sure which one, see the NetworkX docs). I have needed one in other circumstances (but managed to work around it), so maybe we should consider adding one?
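For orientation, a small sketch of how the NetworkX operations mentioned here behave on two graphs that share a node. It uses only NetworkX itself; whether any of these matches the semantics wanted for Cyclebane is exactly the open question.

```python
import networkx as nx

g1 = nx.DiGraph()
g1.add_edge('a', 'b')
g1.add_edge('b', 'c')

g2 = nx.DiGraph()
g2.add_edge('x', 'b')  # shares node 'b' with g1

# compose: merges node and edge sets, joining the graphs at shared nodes.
composed = nx.compose(g1, g2)

# union: requires disjoint node sets, so it rejects graphs that share 'b'.
try:
    nx.union(g1, g2)
    union_failed = False
except nx.NetworkXError:
    union_failed = True

# update: adds the nodes and edges of g2 to an existing graph in place.
updated = g1.copy()
updated.update(g2)

print(sorted(composed.nodes), sorted(composed.pred['b']), union_failed)
```

After `compose` or `update`, node 'b' has both 'a' and 'x' as inputs, which is the auto-joining behaviour under discussion.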

jl-wynen · 2024-10-02T07:26:05Z

I'm in favour of using more explicitly named functions. But in this case, we should reconsider whether __setitem__ is the right choice. What does a graph contain? That is, what can you insert with __setitem__ and extract with __getitem__? And does the graph really behave like a container of these things?
I struggle to come up with an answer.

SimonHeybrock · 2024-10-02T11:48:27Z

I'm in favour of using more explicitly named functions. But in this case, we should reconsider whether __setitem__ is the right choice. What does a graph contain? That is, what can you insert with __setitem__ and extract with __getitem__? And does the graph really behave like a container of these things? I struggle to come up with an answer.

Not really like a container in the Python sense. I have found it most useful to think about Cyclebane graphs as functions, with input nodes being function arguments and output nodes being function return values. Then __setitem__ would "extend" the function by attaching another function at an existing function argument. Similarly, __delitem__ cuts the function, yielding a new function whose arguments were previously internal variables of the "function".
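As a rough illustration of this analogy in plain Python (not the Cyclebane API; all function names here are hypothetical), take a graph a -> b -> c written as a function of its input node:

```python
def g(a: int) -> int:
    """Graph a -> b -> c as a function: 'a' is the input, 'c' the output."""
    b = a + 1  # internal variable (intermediate node)
    c = 2 * b
    return c


def h(p: int, q: int) -> int:
    """A second 'graph' that computes a value for 'a' from inputs p and q."""
    return p * q


def g_extended(p: int, q: int) -> int:
    """Analogue of __setitem__ at 'a': attach h at the argument 'a' of g."""
    return g(h(p, q))


def g_cut(b: int) -> int:
    """Analogue of __delitem__ at 'b': the internal variable becomes an argument."""
    c = 2 * b
    return c


print(g(1), g_extended(2, 3), g_cut(2))
```

In this picture the extended function has the inputs of the attached function (p, q) in place of the argument it was attached to, and the cut function exposes what used to be an intermediate result.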

jl-wynen added 2 commits September 30, 2024 15:48

Do not treat missing ancestor as input mismatch

61574f2

Only bypass ancestors check if new branch has none

c794633

jl-wynen requested a review from SimonHeybrock September 30, 2024 14:04

SimonHeybrock reviewed Oct 1, 2024


SimonHeybrock mentioned this pull request Oct 10, 2024

docs: example writing multiple datasets to orso file scipp/essreflectometry#92

Merged
