Add support for --show-logs in cluster mode on EMR on EC2 #12

Open
dacort opened this issue Apr 6, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@dacort
Contributor

dacort commented Apr 6, 2023

With the recent --show-logs flag, we switch the deploy mode to client so that EMR steps can capture the driver stdout.

Unfortunately, client mode doesn't work with additional archives provided via the --archives flag or the --conf spark.archives parameter. See https://issues.apache.org/jira/browse/SPARK-36088 for a related issue.

To support --show-logs in cluster mode, we'd need to parse the step stderr logs to retrieve the YARN application ID, then fetch the YARN application logs from S3.
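A rough sketch of the parsing half of that idea, not an implementation from this project. It assumes the step stderr contains an ID of the form `application_<timestamp>_<sequence>` and that container logs land under `<log-uri>/<cluster-id>/containers/<application-id>/`; both should be checked against a real cluster. The function names, the sample log line, and the bucket and cluster ID are hypothetical.

```python
import re
from typing import Optional

# Assumption: YARN application IDs look like application_<cluster timestamp>_<sequence>.
_APP_ID_RE = re.compile(r"\bapplication_\d+_\d+\b")


def extract_yarn_app_id(stderr_text: str) -> Optional[str]:
    """Return the first YARN application ID found in the step stderr, if any."""
    match = _APP_ID_RE.search(stderr_text)
    return match.group(0) if match else None


def container_logs_prefix(log_uri: str, cluster_id: str, app_id: str) -> str:
    """Build the S3 prefix where the application's container logs are expected.

    Assumption: the cluster log URI is laid out as
    <log-uri>/<cluster-id>/containers/<application-id>/.
    """
    return f"{log_uri.rstrip('/')}/{cluster_id}/containers/{app_id}/"


# Illustrative stderr line only; real step output may differ.
sample_stderr = (
    "23/04/06 12:00:00 INFO Client: Submitted application "
    "application_1680783600000_0001\n"
)
app_id = extract_yarn_app_id(sample_stderr)
if app_id is not None:
    print(container_logs_prefix("s3://my-bucket/logs/", "j-EXAMPLE", app_id))
```

Listing and downloading the objects under that prefix is left out here. Log delivery to S3 is not immediate, so a real implementation would likely need to poll until the driver container's logs appear.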

@dacort dacort added the enhancement New feature or request label Apr 6, 2023