[Bug report] federation query using 2 hive metastores does not work when using gravitino #4932
Comments
@foryou7242, could you help clarify the questions below?
test1 cluster catalog
My suspicion is that Gravitino seems to be using kyuubihivetable for the Hive metastore table connection.
The Kyuubi Hive connector can support multiple Hive metastores, because Gravitino creates a separate Kyuubi Hive instance for each catalog, each containing its own Hive metastore URI. I had tested two Hive metastores with a shared HDFS cluster in the initial POC phase and it worked well. Could you share the SQL used to create the table? Does querying data work well?
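A sketch of what that per-catalog isolation looks like from spark-sql; the catalog, schema, and table names and the metastore URIs below are assumptions for illustration, not taken from this issue:

-- each Gravitino Hive catalog carries its own metastore URI, e.g.
--   catalog_test1 -> thrift://metastore-test1:9083 (assumed)
--   catalog_test2 -> thrift://metastore-test2:9083 (assumed)
spark-sql> SELECT count(*) FROM catalog_test1.db1.t1;  -- resolved against metastore-test1
spark-sql> SELECT count(*) FROM catalog_test2.db2.t2;  -- resolved against metastore-test2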
The table is the same as in the issue description, because it's an existing table.
> show create table portal_test_schema;
CREATE TABLE portal_test_schema (
...
month INT,
day INT,
hour INT
)
PARTITIONED BY (month, day, hour)
LOCATION 'hdfs://test1/test1'
TBLPROPERTIES (
'bucketing_version' = '2',
'discover.partitions' = 'true',
'input-format' = 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat',
...
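Assuming this table lives in a Gravitino catalog named catalog_test1 and a schema named portal_test_db (both hypothetical names), querying the existing data would look something like:

spark-sql> SELECT * FROM catalog_test1.portal_test_db.portal_test_schema
         >   WHERE month = 9 AND day = 23 LIMIT 10;  -- partition-pruned read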
I set up two Hive metastores with separate HDFS clusters, and couldn't reproduce this issue with the following SQLs in both of the two catalogs. @foryou7242, could you try a simple SQL like the following?
Did you know that spark-sql uses an HDFS that looks at a different metastore than the one on localhost? The query above results in the following error:

spark-sql (test)> create table a(a int) location 'hdfs://test1/warehouse/tablespace/external/hive/test.db/a';
24/09/23 18:07:58 ERROR SparkSQLDriver: Failed in [create table a(a int) location 'hdfs://test1/warehouse/tablespace/external/hive/test.db/a']
java.lang.RuntimeException: Failed to load the real sparkTable: test.a
at org.apache.gravitino.spark.connector.catalog.BaseCatalog.loadSparkTable(BaseCatalog.java:459)
at org.apache.gravitino.spark.connector.catalog.BaseCatalog.createTable(BaseCatalog.java:222)
at org.apache.spark.sql.connector.catalog.TableCatalog.createTable(TableCatalog.java:199)
at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:44)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `test`.`a` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:256)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:541)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:526)
at org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.$anonfun$loadTable$1(HiveTableCatalog.scala:166)
at org.apache.kyuubi.spark.connector.hive.HiveConnectorUtils$.withSQLConf(HiveConnectorUtils.scala:274)
at org.apache.kyuubi.spark.connector.hive.HiveTableCatalog.loadTable(HiveTableCatalog.scala:166)
at org.apache.gravitino.spark.connector.catalog.BaseCatalog.loadSparkTable(BaseCatalog.java:456)
... 60 more
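The tail of the trace shows Kyuubi's HiveTableCatalog delegating to Spark's SessionCatalog, whose metastore evidently does not know `test`.`a`; this is consistent with the suspicion that a default (localhost) metastore is being consulted instead of the catalog-specific one. A quick check of where names are being resolved, as a sketch (assuming Spark 3.1+ for current_catalog()):

spark-sql (test)> SELECT current_catalog(), current_database();  -- which catalog/schema is active
spark-sql (test)> SHOW TABLES IN test;  -- is table `a` visible through this catalog?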
You should set …
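The suggested setting is truncated above. Purely as an assumption (not recovered from this comment), the usual way to pin a Kyuubi Hive connector catalog to one metastore is a per-catalog URI in the Spark conf, along these lines; the catalog name and URI are illustrative:

# illustrative spark-defaults.conf entries (assumed, not from this thread)
spark.sql.catalog.test                      org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.test.hive.metastore.uris  thrift://metastore-test1:9083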
@FANNG1 Thank you so much for your help. First of all, the root cause is that … However, I still have the same problem with spark-sql; am I right in understanding that this is an issue that will be fixed in the future?
Version
main branch
Describe what's wrong
I want to run federated queries using Hive metastores stored in 2 Hadoop clusters, so we added two Hive catalogs to the metalake.
There is a difference between the location path shown by show create table and the actual location information used when querying with spark-sql.
It seems to be an effect of the spark.sql.metastore.uris option on the actual spark-sql query, so I'm wondering if it's possible to run a federated query across 2 Hive metastores (see the sketch at the end of this report).

Error message and/or stacktrace
explain query
How to reproduce
Gravitino main branch
Additional context
No response
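To make the goal in "Describe what's wrong" concrete: the intent is a single spark-sql statement that reads tables living behind the two different metastores. All catalog, schema, table, and column names below are hypothetical:

spark-sql> SELECT a.id, b.name
         >   FROM catalog_test1.db1.orders a
         >   JOIN catalog_test2.db2.customers b ON a.customer_id = b.id;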