
[SUPPORT] hudi-common 0.14.0 jar in mavenCentral appears to have corrupt generated avro classes #11602

Open
lucasmo opened this issue Jul 9, 2024 · 5 comments
Labels: priority:critical (production down; pipelines stalled; need help ASAP), project-build

Comments


lucasmo commented Jul 9, 2024

Describe the problem you faced

When diagnosing a problem with XTable (see apache/incubator-xtable#466), I noticed that the generated Avro classes could not even be instantiated for their schemas in a very simple test case when using hudi-common-0.14.0 as a dependency.

However, this issue does not exist when using hudi-spark3.4-bundle_2.12-0.14.0 as a dependency, which contains the same avro autogenerated classes. A good specific example is org/apache/hudi/avro/model/HoodieCleanPartitionMetadata.class.

When compiling hudi locally (tag release-0.14.0, mvn clean package -DskipTests -Dspark3.4, java 1.8), both generated jar files have the correct implementations of avro autogenerated classes.

To Reproduce

Steps to reproduce the behavior:

  1. Download and uncompress hudi-spark3.4-bundle_2.12-0.14.0.jar and hudi-common-0.14.0.jar from mavencentral
  2. Build Hudi locally
  3. Run javap on org/apache/hudi/avro/model/HoodieCleanPartitionMetadata.class in all four of the jars
  4. Note that the javap text output is 4232 bytes for the class from every jar except the MavenCentral hudi-common jar, where it is only 2323 bytes.

OR

Run the following in Java 11, replacing $PATH_TO_A_HOODIE_AVRO_MODELS_JAR with the path to one of the four jar files:

jshell --class-path $HOME/.m2/repository/org/apache/avro/avro/1.11.3/avro-1.11.3.jar:$HOME/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.17.1/jackson-core-2.17.1.jar:$HOME/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.17.1/jackson-databind-2.17.1.jar:$HOME/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.17.1/jackson-annotations-2.17.1.jar:$HOME/.m2/repository/org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.jar:$PATH_TO_A_HOODIE_AVRO_MODELS_JAR

Then, copy and paste this into the shell:

org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"HoodieCleanPartitionMetadata\",\"namespace\":\"org.apache.hudi.avro.model\",\"fields\":[{\"name\":\"partitionPath\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"policy\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"deletePathPatterns\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"successDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"failedDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"isPartitionDeleted\",\"type\":[\"null\",\"boolean\"],\"default\":null}]}"); System.out.println("Class for schema: " + org.apache.avro.specific.SpecificData.get().getClass(schema));

On the MavenCentral hudi-common-0.14.0 jar, you should get:

|  Exception java.lang.ExceptionInInitializerError
|        at Class.forName0 (Native Method)
|        at Class.forName (Class.java:398)
...
|  Caused by: java.lang.IllegalStateException: Recursive update
|        at ConcurrentHashMap.computeIfAbsent (ConcurrentHashMap.java:1760)
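For context, the "Recursive update" message comes from java.util.concurrent.ConcurrentHashMap, which since JDK 9 detects a mapping function that re-enters computeIfAbsent on the same map. The stack trace (Class.forName followed by the recursive-update check) suggests a class cache being re-entered while the corrupt generated class is still initializing. A minimal stand-alone sketch of that failure mode (hypothetical, not Hudi or Avro code):

```java
import java.util.concurrent.ConcurrentHashMap;

public class RecursiveUpdateDemo {
    // Re-entering computeIfAbsent for the same key from inside its own
    // mapping function is detected by ConcurrentHashMap (JDK 9+), which
    // throws IllegalStateException("Recursive update"). This mimics a
    // class cache being re-entered during that class's own initialization.
    static String trigger() {
        ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
        try {
            cache.computeIfAbsent("HoodieCleanPartitionMetadata",
                    k -> cache.computeIfAbsent(k, k2 -> 1));
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(trigger()); // prints "Recursive update" on JDK 9+
    }
}
```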

Expected behavior

The above code snippet prints

Class for schema: class org.apache.hudi.avro.model.HoodieCleanPartitionMetadata

Environment Description

  • Hudi version : 0.14.0

Everything else is n/a; the issue was reproduced on both macOS and Ubuntu 22.04.


lucasmo commented Jul 9, 2024

#11378 appears to be caused by this same issue.

ad1happy2go added the project-build and priority:critical labels on Jul 10, 2024.

lucasmo commented Jul 13, 2024

Here is a reproducer script:

#!/usr/bin/env bash
MAVEN="https://repo1.maven.org/maven2"

ARTIFACTS="\
org/apache/avro/avro/1.11.3/avro-1.11.3.jar \
com/fasterxml/jackson/core/jackson-core/2.17.1/jackson-core-2.17.1.jar \
com/fasterxml/jackson/core/jackson-databind/2.17.1/jackson-databind-2.17.1.jar \
com/fasterxml/jackson/core/jackson-annotations/2.17.1/jackson-annotations-2.17.1.jar \
org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.jar \
org/apache/hudi/hudi-common/0.14.0/hudi-common-0.14.0.jar \
"

CLASSPATH=""

for artifact in $ARTIFACTS; do
  curl -fsS -O "${MAVEN}/${artifact}"
  jar=$(basename "$artifact")
  # Append with a separator only when CLASSPATH is non-empty,
  # avoiding a leading ":" (which would add the cwd to the classpath)
  CLASSPATH="${CLASSPATH:+${CLASSPATH}:}${jar}"
done

echo "$CLASSPATH"

echo 'org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"HoodieCleanPartitionMetadata\",\"namespace\":\"org.apache.hudi.avro.model\",\"fields\":[{\"name\":\"partitionPath\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"policy\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"deletePathPatterns\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"successDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"failedDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"isPartitionDeleted\",\"type\":[\"null\",\"boolean\"],\"default\":null}]}"); System.out.println("Class for schema: " + org.apache.avro.specific.SpecificData.get().getClass(schema));' |\
    jshell --class-path "${CLASSPATH}"

xushiyan (Member) commented:

Because build profiles vary across Spark and Flink versions (which changes the Avro version over time), we don't expect the hudi-common jars in the Maven repo to work with every Spark/Flink version; that mismatch causes compatibility issues. We expect people to use only the Hudi bundle jars, such as hudi-spark3.5-bundle, hudi-utilities-slim-bundle, hudi-flink1.18-bundle, etc.
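For illustration, switching a Maven project from hudi-common to the Spark bundle mentioned earlier in this thread might look like the following (coordinates for Spark 3.4 / Scala 2.12, matching the versions discussed in this issue; adjust the bundle artifact to your own Spark and Scala versions):

```xml
<!-- Instead of depending on hudi-common directly: -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-spark3.4-bundle_2.12</artifactId>
  <version>0.14.0</version>
</dependency>
```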


lucasmo commented Jul 29, 2024

@xushiyan understood. I am not an XTable developer. However, it seems pretty clear that the issue is with corrupted classes, not a Spark version mismatch.

I have asked the XTable devs in the linked ticket to comment here. I'm not sure what I can do to make this move forward.

xushiyan (Member) commented:

@lucasmo we should be able to fix this in 0.16.0 (tracking in https://issues.apache.org/jira/browse/HUDI-8028)

In the meantime, if you want the hudi-common jar to work, you may build the project yourself with the spark3.4 or spark3.5 profile, which will produce a hudi-common jar with an Avro dependency compatible with your Spark version (assuming you're using Spark 3.4 or 3.5).
