PythonMutator: propagate source locations #1783

Merged
merged 11 commits into from
Jan 22, 2025

Conversation

kanterov
Contributor

@kanterov kanterov commented Sep 20, 2024

Changes

Add a mechanism to load Python source locations in the Python mutator. Previously, locations pointed to the generated YAML. Now, they point to the Python sources instead. The Python process outputs a "locations.json" file containing the locations of bundle paths, one JSON object per line, for example:

{"path": "resources.jobs.job_0", "file": "resources/job_0.py", "line": 3, "column": 5}
{"path": "resources.jobs.job_0.tasks[0].task_key", "file": "resources/job_0.py", "line": 10, "column": 5}
{"path": "resources.jobs.job_1", "file": "resources/job_1.py", "line": 5, "column": 7}
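
A minimal sketch of how such a line-delimited file could be decoded, assuming one JSON object per line with the four fields shown above. The names `locationEntry` and `parseLocations` are hypothetical and are not taken from the PR; the function takes the file content as a string only to keep the example self-contained.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// locationEntry mirrors one line of locations.json as shown in the examples.
// The type name is hypothetical; the field names come from the example lines.
type locationEntry struct {
	Path   string `json:"path"`
	File   string `json:"file"`
	Line   int    `json:"line"`
	Column int    `json:"column"`
}

// parseLocations decodes one JSON object per line and skips blank lines.
func parseLocations(data string) ([]locationEntry, error) {
	var entries []locationEntry
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		var entry locationEntry
		if err := json.Unmarshal([]byte(line), &entry); err != nil {
			return nil, fmt.Errorf("failed to parse location entry %q: %w", line, err)
		}
		entries = append(entries, entry)
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	input := `{"path": "resources.jobs.job_0", "file": "resources/job_0.py", "line": 3, "column": 5}
{"path": "resources.jobs.job_1", "file": "resources/job_1.py", "line": 5, "column": 7}`

	entries, err := parseLocations(input)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s -> %s:%d:%d\n", e.Path, e.File, e.Line, e.Column)
	}
}
```

Decoding line by line keeps each entry independent, so a malformed line can be reported together with its content.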

Such locations form a tree, and we assign locations of the closest ancestor to each dyn.Value based on its path. For example, resources.jobs.job_0.tasks[0].task_key is located at job_0.py:10:5 and resources.jobs.job_0.tasks[0].email_notifications is located at job_0.py:3:5, because we use the location of the job as the most precise approximation.
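
The closest-ancestor lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: the `node` and `location` types and the `splitPath` helper are hypothetical, and paths are split naively on "." and "[", which would not handle keys that contain those characters.

```go
package main

import (
	"fmt"
	"strings"
)

// location is a simplified stand-in for a source location (hypothetical type).
type location struct {
	File   string
	Line   int
	Column int
}

// node is one element of the location tree; loc is nil when no location
// was reported for this exact path.
type node struct {
	children map[string]*node
	loc      *location
}

// splitPath turns "a.b[0].c" into ["a", "b", "[0]", "c"].
// Naive on purpose: keys containing "." or "[" are not supported.
func splitPath(path string) []string {
	return strings.Split(strings.ReplaceAll(path, "[", ".["), ".")
}

// put records a location for the given path, creating nodes as needed.
func (n *node) put(path string, loc location) {
	cur := n
	for _, key := range splitPath(path) {
		if cur.children == nil {
			cur.children = map[string]*node{}
		}
		child, ok := cur.children[key]
		if !ok {
			child = &node{}
			cur.children[key] = child
		}
		cur = child
	}
	cur.loc = &loc
}

// find returns the location of the path itself or, if it has none,
// the location of its closest ancestor that has one.
func (n *node) find(path string) (location, bool) {
	cur := n
	var best *location
	for _, key := range splitPath(path) {
		child, ok := cur.children[key]
		if !ok {
			break
		}
		cur = child
		if cur.loc != nil {
			best = cur.loc
		}
	}
	if best == nil {
		return location{}, false
	}
	return *best, true
}

func main() {
	root := &node{}
	root.put("resources.jobs.job_0", location{"resources/job_0.py", 3, 5})
	root.put("resources.jobs.job_0.tasks[0].task_key", location{"resources/job_0.py", 10, 5})

	for _, p := range []string{
		"resources.jobs.job_0.tasks[0].task_key",
		"resources.jobs.job_0.tasks[0].email_notifications",
	} {
		loc, _ := root.find(p)
		fmt.Printf("%s -> %s:%d:%d\n", p, loc.File, loc.Line, loc.Column)
	}
	// Prints:
	// resources.jobs.job_0.tasks[0].task_key -> resources/job_0.py:10:5
	// resources.jobs.job_0.tasks[0].email_notifications -> resources/job_0.py:3:5
}
```

Walking the tree from the root and remembering the last node that carries a location yields the most specific ancestor in a single pass.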

This feature is only enabled if experimental/python is used.

Note: for now, we don't update the locations of relative paths, because doing so has the side effect of changing how these paths are resolved.

Example

% databricks bundle validate

Warning: job_cluster_key abc is not defined
  at resources.jobs.examples.tasks[0].job_cluster_key
  in resources/example.py:10:1

Tests

Unit tests and manual testing.

@kanterov kanterov requested a review from pietern September 20, 2024 16:09
Contributor

@pietern pietern left a comment

Nice work!

The line protocol in the locations file looks fine.

@kanterov kanterov force-pushed the kanterov/python-locations branch from def4744 to 43ce278 Compare October 8, 2024 08:24
@kanterov kanterov force-pushed the kanterov/python-locations branch from 96a6cef to d9bf157 Compare January 8, 2025 15:12
Contributor

@pietern pietern left a comment

Could you also merge main to make sure it passes with the latest linter settings?

// - resources.jobs.job_0.tasks[0].task_key is located at job_0.py:10:5
//
// - resources.jobs.job_0.tasks[0].email_notifications is located at job_0.py:3:5,
// because we use the location of the job as the most precise approximation.
Contributor

(indent mismatch between above and here)

The entries included (as an example of locations.json) made me think it should match the data structure below. Please include a reference to [pythonLocationEntry] to make it clear this is not the case.

Contributor Author

Added a reference. I also had to reformat the doc comment slightly, because "go fmt" kept producing a bad indent.

}

newLocations = append(newLocations, location)
}
Contributor

This logic is hard to parse in combination with the append above.

If I understand correctly, you want the output to:

  • Have the Python location as the first element
  • Filter out the locations with generatedFileName for the filename

It would be clearer if newLocations was initialized by assignment and if the loop got a comment saying that it only appends locations that are relevant (i.e. not the ones with the virtual file path).
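
The suggested shape can be sketched as follows. This is only an illustration of the reviewer's description, not the code that was merged: the `location` type and `mergeLocations` function are hypothetical, and the value given to `generatedFileName` is a placeholder because the real constant's value does not appear in this conversation.

```go
package main

import "fmt"

// location is a simplified stand-in for a source location (hypothetical type).
type location struct {
	File   string
	Line   int
	Column int
}

// generatedFileName stands for the virtual file that holds the generated YAML.
// The value below is a placeholder chosen for this sketch.
const generatedFileName = "__generated__.yml"

// mergeLocations returns the Python location first, followed by the existing
// locations that do not point into the virtual generated file.
func mergeLocations(pythonLocation location, existing []location) []location {
	// Initialize by assignment so the Python location is always the first element.
	merged := []location{pythonLocation}

	// Only append locations that are still relevant, i.e. not the ones
	// that refer to the virtual file path.
	for _, loc := range existing {
		if loc.File == generatedFileName {
			continue
		}
		merged = append(merged, loc)
	}
	return merged
}

func main() {
	python := location{"resources/job_0.py", 3, 5}
	existing := []location{
		{generatedFileName, 1, 1},
		{"databricks.yml", 12, 3},
	}
	for _, loc := range mergeLocations(python, existing) {
		fmt.Printf("%s:%d:%d\n", loc.File, loc.Line, loc.Column)
	}
	// Prints:
	// resources/job_0.py:3:5
	// databricks.yml:12:3
}
```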

Contributor Author

I've extracted the code into a separate function and added a detailed comment.

_, err = paths.VisitJobPaths(generated, func(p dyn.Path, kind paths.PathKind, v dyn.Value) (dyn.Value, error) {
	putPythonLocation(locations, p, v.Location())
	return v, nil
})
Contributor

We can remove this condition once:

  1. Variable interpolation runs before running the Python mutators
  2. Path normalization (to make all relative paths relative to the sync root) runs before running the Python mutators

Then this mutator must make sure all paths are relative to the sync root.

Contributor

Can you include a test to confirm that the value for a field that is a path has the right location?

Contributor Author

Added a test, and included the note you wrote in the code, so we don't forget when this code is safe to clean up.

@kanterov kanterov force-pushed the kanterov/python-locations branch from 4ec0f52 to a462c9f Compare January 22, 2025 14:14
An authorized user can trigger integration tests manually by following the instructions below:

Trigger:
go/deco-tests-run/cli

Inputs:

  • PR number: 1783
  • Commit SHA: f24fa78e32800f4f6e72d8fb32b8efc2f2c0d2fb

Checks will be approved automatically on success.

@kanterov kanterov requested a review from pietern January 22, 2025 14:31
Contributor

@pietern pietern left a comment

Thanks, @kanterov !

@pietern pietern added this pull request to the merge queue Jan 22, 2025
Merged via the queue into databricks:main with commit 3d91691 Jan 22, 2025
9 checks passed