Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed request to tables with Object Storage Engines #615

Open
wants to merge 20 commits into
base: antalya
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/Storages/ObjectStorage/Azure/Configuration.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -459,7 +459,7 @@ void StorageAzureConfiguration::addStructureAndFormatToArgsIfNeeded(
}
}

void StorageAzureConfiguration::setFunctionArgs(ASTs & args) const
void StorageAzureConfiguration::getTableFunctionArguments(ASTs & args) const
{
if (!args.empty())
{ /// Just check
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just remove this comment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if you implement this function in the way I suggested in #615 (comment), perhaps this whole if statement is no longer necessary?

Expand Down
2 changes: 1 addition & 1 deletion src/Storages/ObjectStorage/Azure/Configuration.h
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ class StorageAzureConfiguration : public StorageObjectStorage::Configuration
ContextPtr context,
bool with_structure) override;

void setFunctionArgs(ASTs & args) const override;
void getTableFunctionArguments(ASTs & args) const override;

protected:
void fromNamedCollection(const NamedCollection & collection, ContextPtr context) override;
Expand Down
2 changes: 1 addition & 1 deletion src/Storages/ObjectStorage/HDFS/Configuration.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -235,7 +235,7 @@ void StorageHDFSConfiguration::addStructureAndFormatToArgsIfNeeded(
}
}

void StorageHDFSConfiguration::setFunctionArgs(ASTs & args) const
void StorageHDFSConfiguration::getTableFunctionArguments(ASTs & args) const
{
if (!args.empty())
{ /// Just check
Expand Down
2 changes: 1 addition & 1 deletion src/Storages/ObjectStorage/HDFS/Configuration.h
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ class StorageHDFSConfiguration : public StorageObjectStorage::Configuration
ContextPtr context,
bool with_structure) override;

void setFunctionArgs(ASTs & args) const override;
void getTableFunctionArguments(ASTs & args) const override;

private:
void fromNamedCollection(const NamedCollection &, ContextPtr context) override;
Expand Down
2 changes: 1 addition & 1 deletion src/Storages/ObjectStorage/S3/Configuration.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -585,7 +585,7 @@ void StorageS3Configuration::addStructureAndFormatToArgsIfNeeded(
}
}

void StorageS3Configuration::setFunctionArgs(ASTs & args) const
void StorageS3Configuration::getTableFunctionArguments(ASTs & args) const
{
if (!args.empty())
{ /// Just check
Expand Down
2 changes: 1 addition & 1 deletion src/Storages/ObjectStorage/S3/Configuration.h
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ class StorageS3Configuration : public StorageObjectStorage::Configuration
ContextPtr context,
bool with_structure) override;

void setFunctionArgs(ASTs & args) const override;
void getTableFunctionArguments(ASTs & args) const override;

private:
void fromNamedCollection(const NamedCollection & collection, ContextPtr context) override;
Expand Down
4 changes: 2 additions & 2 deletions src/Storages/ObjectStorage/StorageObjectStorage.h
Original file line number Diff line number Diff line change
Expand Up @@ -247,9 +247,9 @@ class StorageObjectStorage::Configuration

virtual void update(ObjectStoragePtr object_storage, ContextPtr local_context);

virtual void setFunctionArgs(ASTs & /* args */) const
virtual void getTableFunctionArguments(ASTs & /* args */) const
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method setFunctionArgs is not supported by storage {}", getEngineName());
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method getTableFunctionArguments is not supported by storage {}", getEngineName());
}

protected:
Expand Down
13 changes: 10 additions & 3 deletions src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,13 @@ void StorageObjectStorageCluster::updateQueryForDistributedEngineIfNeeded(ASTPtr
return;

auto * tables = select_query->tables()->as<ASTTablesInSelectQuery>();

if (tables->children.empty())
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Expected SELECT query from table with engine {}, got '{}'",
configuration->getEngineName(), queryToString(query));

auto * table_expression = tables->children[0]->as<ASTTablesInSelectQueryElement>()->table_expression->as<ASTTableExpression>();
if (!table_expression->database_and_table_name)
return;
Expand All @@ -116,7 +123,7 @@ void StorageObjectStorageCluster::updateQueryForDistributedEngineIfNeeded(ASTPtr

auto table_alias = table_identifier_typed.tryGetAlias();

std::unordered_map<std::string, std::string> engine_to_function = {
static std::unordered_map<std::string, std::string> engine_to_function = {
{"S3", "s3"},
{"Azure", "azureBlobStorage"},
{"HDFS", "hdfs"},
Expand Down Expand Up @@ -153,7 +160,7 @@ void StorageObjectStorageCluster::updateQueryForDistributedEngineIfNeeded(ASTPtr
queryToString(query));
}

configuration->setFunctionArgs(arguments->children);
configuration->getTableFunctionArguments(arguments->children);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #615 (comment), I suggested the arguments to be created inside getTableFunctionArguments instead of populating a parameter. Any specific reason you did it this way?

If there is one, I am fine with this approach

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to avoid copying after function call. children is a vector, not a pointer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow.

On line 155, you do: auto arguments = std::make_shared<ASTExpressionList>();

Then you are passing children as a reference on 167 and populating it inside the getTableFunctionArguments function.

On line 169, you read the values.

What I am suggesting is that you create the arguments list inside the getTableFunctionArguments function, populate and simply return it. It'll be a shared_ptr, there is no deep copy in this process. Then, the rest is the same.

What am I missing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes not in arguments but in arguments->children
children has type ASTs
https://github.com/ClickHouse/ClickHouse/blob/master/src/Parsers/IAST.h#L37
It's a variant of vector, not a single pointer
https://github.com/ClickHouse/ClickHouse/blob/master/src/Parsers/IAST_fwd.h#L14

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on slack, let's make the function return it as suggested in the original comment


function_ast->arguments = arguments;
function_ast->children.push_back(arguments);
Expand All @@ -163,7 +170,7 @@ void StorageObjectStorageCluster::updateQueryForDistributedEngineIfNeeded(ASTPtr

table_expression->database_and_table_name = nullptr;
table_expression->table_function = function_ast_ptr;
table_expression->children[0].swap(function_ast_ptr);
table_expression->children[0] = function_ast_ptr;

auto settings = select_query->settings();
if (settings)
Expand Down
12 changes: 12 additions & 0 deletions src/Storages/ObjectStorage/StorageObjectStorageCluster.h
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,18 @@ class StorageObjectStorageCluster : public IStorageCluster
const StorageSnapshotPtr & storage_snapshot,
const ContextPtr & context) override;

/*
In case the table was created with `object_storage_cluster` setting,
modify the AST query object so that it uses the table function implementation
by mapping the engine name to table function name and setting `object_storage_cluster`.
For table like
CREATE TABLE table ENGINE=S3(...) SETTINGS object_storage_cluster='cluster'
coverts request
SELECT * FROM table
to
SELECT * FROM s3(...) SETTINGS object_storage_cluster='cluster'
to make distributed request over cluster 'cluster'.
*/
void updateQueryForDistributedEngineIfNeeded(ASTPtr & query);

const String engine_name;
Expand Down
Loading