EMR Spark job with SigV4 signing #416

Open
joyfulwang opened this issue Mar 1, 2024 · 1 comment
@joyfulwang

Hi, my team is adding SigV4 signing to all read/write requests that our resources send to Elasticsearch. We've successfully added signing to requests from our backend Java service and from a Lambda function. We're now trying to add signing to our EMR Spark jobs, which use emr-6.6.0, Spark 3.x, Scala 2.12, and opensearch-hadoop (Maven-org-opensearch-client_opensearch-spark-30_2_12). The Elasticsearch cluster is version 7.10.

After reading the opensearch-hadoop User Guide and the Configuration Options for Maven-org-opensearch-client_opensearch-spark, I updated our OpenSearch config to the following:

private val basicOpenSearchConfig = Map(
    "opensearch.nodes" -> <opensearch_endpoint>,
    "opensearch.nodes.wan.only" -> "true",
    "opensearch.port" -> "443",
    "opensearch.net.ssl" -> "true",
    "opensearch.net.ssl.cert.allow.self.signed" -> "true",
    "opensearch.net.ssl.protocol" -> "SSL",
    "opensearch.aws.sigv4.enabled" -> "true",
    "opensearch.aws.sigv4.region" -> "us-east-1")

After enabling the SigV4 signing config, I tested whether the Spark job could read index names from a cluster that has fine-grained access control enabled, and got "Unauthorized" as the response. Here's what I've tried for troubleshooting:

  1. Copied aws-java-sdk-bundle-1.12.170.jar onto the EMR host during bootstrapping, as recommended by the opensearch-hadoop User Guide. It didn't make a difference, and the absence of this jar on the EMR host didn't cause any NoClassDefFoundError either.
  2. Made sure that the following policy statement is attached to the IAM role that the EMR cluster is using:
{
  "Effect": "Allow",
  "Action": "es:ESHttp*",
  "Resource": "arn:aws:es:us-east-1:<aws_account_id>:domain/<domain_name>/*"
}

Any ideas for troubleshooting?

@Xtansia
Collaborator

Xtansia commented Mar 3, 2024

@joyfulwang Have you mapped the EMR job's IAM role to an internal user within Elasticsearch/OpenSearch, as described in the documentation here: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html#fgac-access-control ?
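
For illustration only, here is a minimal sketch of what such a mapping request might look like when it is done through the security plugin's REST API rather than the dashboard UI. It assumes the IAM role ARN is mapped as a backend role of a security role, and that an Elasticsearch 7.10 domain exposes the API under the legacy `_opendistro` prefix (OpenSearch domains use `_plugins`). The object name `RoleMappingSketch`, the role name `readall`, the account ID, and the IAM role name are all placeholders chosen for this example; none of them come from the thread or from opensearch-hadoop.

```scala
// Hypothetical helper, not part of opensearch-hadoop: it only builds the
// request path and JSON body for a security-plugin role mapping.
object RoleMappingSketch {

  // Assumption: Elasticsearch 7.10 domains expose the security REST API under
  // the legacy "_opendistro" prefix, while OpenSearch domains use "_plugins".
  def rolesMappingPath(securityRole: String, legacyPrefix: Boolean): String = {
    val prefix = if (legacyPrefix) "_opendistro" else "_plugins"
    s"/$prefix/_security/api/rolesmapping/$securityRole"
  }

  // Body for a PUT to the path above: the IAM role ARNs go in "backend_roles".
  def rolesMappingBody(iamRoleArns: Seq[String]): String = {
    val arns = iamRoleArns.map(arn => "\"" + arn + "\"").mkString(", ")
    s"""{"backend_roles": [$arns], "hosts": [], "users": []}"""
  }

  def main(args: Array[String]): Unit = {
    // Placeholder account ID and role name; substitute the EMR cluster's role.
    val emrRoleArn = "arn:aws:iam::123456789012:role/example-emr-role"
    println("PUT " + rolesMappingPath("readall", legacyPrefix = true))
    println(rolesMappingBody(Seq(emrRoleArn)))
  }
}
```

The request itself would have to be sent by a principal that already has permission to manage the security configuration, such as the domain's master user. As far as I know, a PUT to this endpoint replaces the role's existing mapping, so any backend roles already mapped would need to be included in the body.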
