You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I recently encountered an issue while using dplyr::semi_join with Clickhouse. The default code generated by dplyr produces a subquery with dependencies, and this isn't supported in Clickhouse (or am I wrong?). However, I noticed that Clickhouse does support LEFT SEMI JOIN. Consequently, I've wrote the following function to address this:
#' @export
#' @importFrom dbplyr sql_query_semi_join
sql_query_semi_join.ClickhouseConnection <- function(con, x, y, anti, by, where, vars, ..., lvl = 0) {
x <- dbplyr:::dbplyr_sql_subquery(con, x, name = by$x_as, lvl = lvl)
y <- dbplyr:::dbplyr_sql_subquery(con, y, name = by$y_as, lvl = lvl)
on <- dbplyr:::sql_join_tbls(con, by)
JOIN <- ifelse(anti, dplyr::sql("ANTI LEFT JOIN"), dplyr::sql("SEMI LEFT JOIN"))
# Wrap with SELECT since callers assume a valid query is returned
clauses <- list(
dbplyr:::sql_clause_select(con, vars),
dbplyr:::sql_clause_from(x),
dbplyr:::sql_clause(JOIN, y),
dbplyr:::sql_clause("ON", on, sep = " AND", parens = TRUE, lvl = 1)
)
dbplyr:::sql_format_clauses(clauses, lvl, con)
}
Nonetheless, I'm aware that my function uses some internal dbplyr functions, and I'm uncertain about the permissibility of this approach. Could someone provide some directions on how to refine this function for a potential PR?
Thank you in advance.
The text was updated successfully, but these errors were encountered:
Hi Lucas! We appreciate your help! The package comprises two parts: (1) the Clickhouse-Driver and (2) the dbplyr interface. Your code does dplyr-code generation and, thus, targets the second part. We started to develop the package at a time when Clickhouse's Join-support was super basic, not in line with the standards, and highly restricted, but that changed a lot. Consequently, you can do a much better job than we did.
I am thrilled that you want to invest time here, and a pull request is welcome. Ideally, you would adapt the R/dplyr.R file, as this is where the dplyr-specific code is located. But here are a few more settings to keep in mind:
The code generation should be in line with the settings. Not all join algorithms support all join clauses.
Moreover, the function should also consider ANY/ALL (just in case someone changed join_default_strictness).
Please keep in mind that also any_join_distinct_right_table_keys is likely to affect the semantics.
Also test your code with the new planner/analyzer (allow_experimental_analyzer). ClickHouse#42648
Hi,
I recently encountered an issue while using dplyr::semi_join with Clickhouse. The default code generated by dplyr produces a subquery with dependencies, and this isn't supported in Clickhouse (or am I wrong?). However, I noticed that Clickhouse does support LEFT SEMI JOIN. Consequently, I've wrote the following function to address this:
Nonetheless, I'm aware that my function uses some internal dbplyr functions, and I'm uncertain about the permissibility of this approach. Could someone provide some directions on how to refine this function for a potential PR?
Thank you in advance.
The text was updated successfully, but these errors were encountered: