New IR -- WIP #24466

Open: kasiafi wants to merge 3 commits into master

kasiafi (Member) commented Dec 13, 2024

See core/trino-main/src/main/java/io/trino/sql/newir/README.md for details.

cla-bot added the cla-signed label Dec 13, 2024
kasiafi force-pushed the 526Abstractions branch 2 times, most recently from ed4c70f to 6ea030e, December 13, 2024 17:44
kasiafi (Member, Author) commented Dec 13, 2024

Example assembly printout:

```
IR version = 1
%0 = query() : () -> "boolean" ({
    ^query
        %1 = table_scan() : () -> "multiset(row(""f_1"" varchar(25),""f_2"" bigint))" ()
            {table_handle = "{""catalogHandle"":""tpch:normal:default"",""connectorHandle"":{""@type"":""../../plugin/trino-tpch/pom.xml:io.trino.plugin.tpch.TpchTableHandle"",""schemaName"":""tiny"",""tableName"":""nation"",""scaleFactor"":0.01,""constraint"":{""columnDomains"":[]}},""transaction"":[""../../plugin/trino-tpch/pom.xml:io.trino.plugin.tpch.TpchTransactionHandle"",""INSTANCE""]}", column_handles = "[{""@type"":""../../plugin/trino-tpch/pom.xml:io.trino.plugin.tpch.TpchColumnHandle"",""columnName"":""name"",""type"":""varchar(25)""},{""@type"":""../../plugin/trino-tpch/pom.xml:io.trino.plugin.tpch.TpchColumnHandle"",""columnName"":""regionkey"",""type"":""bigint""}]", constraint = "{""columnDomains"":[]}", update_target = "false", use_connector_node_partitioning = "false"}
        %2 = filter(%1) : ("multiset(row(""f_1"" varchar(25),""f_2"" bigint))") -> "multiset(row(""f_1"" varchar(25),""f_2"" bigint))" ({
            ^predicate (%3 : "row(""f_1"" varchar(25),""f_2"" bigint)")
                %4 = field_selection(%3) : ("row(""f_1"" varchar(25),""f_2"" bigint)") -> "bigint" ()
                    {field_name = "f_2"}
                %5 = constant() : () -> "bigint" ()
                    {constant_result = "{""type"":""bigint"",""value"":2}"}
                %6 = comparison(%4, %5) : ("bigint", "bigint") -> "boolean" ()
                    {comparison_operator = "GREATER_THAN"}
                %7 = return(%6) : ("boolean") -> "boolean" ()
                    {ir.terminal = "true"}
            })
        %8 = project(%2) : ("multiset(row(""f_1"" varchar(25),""f_2"" bigint))") -> "multiset(row(""f_1"" varchar(25)))" ({
            ^assignments (%9 : "row(""f_1"" varchar(25),""f_2"" bigint)")
                %10 = field_selection(%9) : ("row(""f_1"" varchar(25),""f_2"" bigint)") -> "varchar(25)" ()
                    {field_name = "f_1"}
                %11 = call(%10) : ("varchar(25)") -> "varchar(25)" ()
                    {resolved_function = "{""signature"":{""name"":{""catalogName"":""system"",""schemaName"":""builtin"",""functionName"":""lower""},""returnType"":""varchar(25)"",""argumentTypes"":[""varchar(25)""]},""catalogHandle"":""system:normal:system"",""functionId"":""lower(varchar(x)):varchar(x)"",""functionKind"":""SCALAR"",""deterministic"":true,""functionNullability"":{""returnNullable"":false,""argumentNullable"":[false]},""typeDependencies"":{},""functionDependencies"":[]}"}
                %12 = row(%11) : ("varchar(25)") -> "row(varchar(25))" ()
                %13 = return(%12) : ("row(varchar(25))") -> "row(varchar(25))" ()
                    {ir.terminal = "true"}
            })
        %14 = output(%8) : ("multiset(row(""f_1"" varchar(25)))") -> "boolean" ({
            ^outputFieldSelector (%15 : "row(""f_1"" varchar(25))")
                %16 = field_selection(%15) : ("row(""f_1"" varchar(25))") -> "varchar(25)" ()
                    {field_name = "f_1"}
                %17 = row(%16) : ("varchar(25)") -> "row(varchar(25))" ()
                %18 = return(%17) : ("row(varchar(25))") -> "row(varchar(25))" ()
                    {ir.terminal = "true"}
            })
            {output_names = "[""_col0""]", ir.terminal = "true"}
    })
    {ir.terminal = "true"}
```
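To make the printout easier to read, here is a minimal sketch of how one operation's header line and attribute line could be rendered in this format. The `IrPrinterSketch` class and its `Op` record are hypothetical illustrations, not the printer in this PR; regions are not modeled, so the region list is always printed as `()`. The quoting rule (wrap in double quotes, double any embedded quote, as in `""f_1""`) is inferred from the printout.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the printed operation format; these are not the PR's classes.
class IrPrinterSketch {
    // One operation: result value number, operation name, argument value numbers,
    // argument types, return type, and attributes. Regions are not modeled here,
    // so the region list is always printed as "()".
    record Op(int result, String name, List<Integer> args, List<String> argTypes,
              String returnType, Map<String, String> attributes) {}

    // Quote a string as the printout does: wrap it in double quotes and
    // double any embedded double quote.
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // Renders the header line and, if present, the attribute line indented by four more spaces.
    static String print(Op op, String indent) {
        String args = op.args().stream().map(a -> "%" + a).collect(Collectors.joining(", "));
        String types = op.argTypes().stream().map(IrPrinterSketch::quote).collect(Collectors.joining(", "));
        StringBuilder out = new StringBuilder()
                .append(indent).append('%').append(op.result())
                .append(" = ").append(op.name())
                .append('(').append(args).append(") : (").append(types).append(") -> ")
                .append(quote(op.returnType())).append(" ()");
        if (!op.attributes().isEmpty()) {
            // Attribute values are printed as quoted strings, e.g. {field_name = "f_2"}.
            String attrs = op.attributes().entrySet().stream()
                    .map(e -> e.getKey() + " = " + quote(e.getValue()))
                    .collect(Collectors.joining(", "));
            out.append('\n').append(indent).append("    {").append(attrs).append('}');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Op comparison = new Op(6, "comparison", List.of(4, 5), List.of("bigint", "bigint"),
                "boolean", Map.of("comparison_operator", "GREATER_THAN"));
        System.out.println(print(comparison, ""));
    }
}
```

Running `main` prints the `%6 = comparison(%4, %5)` header and its `{comparison_operator = "GREATER_THAN"}` line, in the same shape as the predicate region of the printout (without the leading indentation).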

kasiafi force-pushed the 526Abstractions branch 14 times, most recently from 30ed4b3 to 852f33b, December 20, 2024 15:17
kasiafi force-pushed the 526Abstractions branch 12 times, most recently from d71a5b9 to 8f19d4a, December 29, 2024 10:53
kasiafi force-pushed the 526Abstractions branch 11 times, most recently from 687d453 to fdaffa9, January 10, 2025 12:31
kasiafi force-pushed the 526Abstractions branch 5 times, most recently from 5383dcf to 0a5463b, February 4, 2025 14:49
kasiafi marked this pull request as ready for review February 4, 2025 14:50
kasiafi force-pushed the 526Abstractions branch 3 times, most recently from bf82af6 to 1b12725, February 5, 2025 19:35

kasiafi (Member, Author) commented:

I believe attributes should be defined at the top level of the dialect rather than as part of an operation.
The IR allows mixing dialects: an operation can carry attributes from another dialect. For example, the trino.query operation has the ir.terminal attribute. A dialect must therefore be able to interpret an attribute outside the context of any particular operation.
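A minimal sketch of what dialect-level attributes could look like, assuming each dialect registers a parser per attribute name and parses values without being given the operation that carries them. `DialectAttributesSketch`, `Dialect`, `registerAttribute` and `resolveAttribute` are hypothetical names, not classes from this PR. The name-resolution rule is also an assumption read off the printout: a qualified name such as `ir.terminal` goes to the named dialect, and an unqualified name such as `field_name` goes to the dialect that owns the operation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: attributes are registered at the dialect level, so a dialect can
// interpret its own attribute even when it appears on another dialect's operation.
class DialectAttributesSketch {
    static final class Dialect {
        private final String namespace;
        // Attribute name (without namespace) -> parser for its printed value.
        private final Map<String, Function<String, Object>> attributes = new HashMap<>();

        Dialect(String namespace) {
            this.namespace = namespace;
        }

        void registerAttribute(String name, Function<String, Object> parser) {
            attributes.put(name, parser);
        }

        // No operation is passed in: the dialect understands the attribute on its own.
        Object parseAttribute(String name, String rawValue) {
            Function<String, Object> parser = attributes.get(name);
            if (parser == null) {
                throw new IllegalArgumentException("unknown attribute " + namespace + "." + name);
            }
            return parser.apply(rawValue);
        }
    }

    // Assumption: a qualified name such as "ir.terminal" is resolved in the named dialect,
    // and an unqualified name such as "field_name" in the dialect that owns the operation.
    static Object resolveAttribute(Map<String, Dialect> dialects, String operationDialect,
                                   String attributeName, String rawValue) {
        int dot = attributeName.indexOf('.');
        String namespace = dot < 0 ? operationDialect : attributeName.substring(0, dot);
        String name = dot < 0 ? attributeName : attributeName.substring(dot + 1);
        Dialect dialect = dialects.get(namespace);
        if (dialect == null) {
            throw new IllegalArgumentException("unknown dialect " + namespace);
        }
        return dialect.parseAttribute(name, rawValue);
    }

    public static void main(String[] args) {
        Dialect ir = new Dialect("ir");
        ir.registerAttribute("terminal", Boolean::parseBoolean);
        Dialect trino = new Dialect("trino");
        trino.registerAttribute("field_name", value -> value);
        Map<String, Dialect> dialects = Map.of("ir", ir, "trino", trino);

        // ir.terminal on a trino operation (e.g. trino.query) is parsed by the ir dialect.
        System.out.println(resolveAttribute(dialects, "trino", "ir.terminal", "true"));
    }
}
```

The point of the design is that `parseAttribute` never receives an operation, so the `ir` dialect can validate `ir.terminal` wherever it appears.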

```java
{
// Intermediate result row type.
// Row without fields is supported and represented as EmptyRowType.
// If row fields are present, they must have valid unique names.
```

kasiafi (Member, Author) commented:

Currently, the intermediate relation row type has field names, and the fields are referenced by name with the FieldSelection operation. It will be refactored so that the row type is anonymous, and the fields will be referenced by index with the FieldReference operation.

Explanation:
The query program must work correctly with the Memo and its equivalence classes, so the intermediate relation type must be generic: every operation in an equivalence class must derive exactly the same output type, because each of them must be compatible with the same downstream program.
Because of this constraint, field names in the intermediate row type carry little information. We would have to use generic sequential names, for example f_1, f_2, f_3, and so on, so referencing fields by index is more concise.
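A minimal sketch of the constraint described above, assuming an anonymous row type is just the ordered list of field types and a field reference is a 0-based position. All names here (`FieldReferenceSketch`, `NamedRowType`, `AnonymousRowType`, `anonymize`, `fieldReference`, `sameOutputType`) and the column names `name`/`n_name` are hypothetical, not the PR's classes. Two alternatives that produce the same columns but name them differently derive unequal named types, while their anonymous types are equal.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the planned design: anonymous intermediate row types whose
// fields are referenced by position. These records are illustrations, not the PR's classes.
class FieldReferenceSketch {
    // Current design: every field of the intermediate row type carries a name.
    record NamedField(String name, String type) {}
    record NamedRowType(List<NamedField> fields) {}

    // Planned design: the row type is just the ordered list of field types.
    record AnonymousRowType(List<String> fieldTypes) {}

    static AnonymousRowType anonymize(NamedRowType row) {
        return new AnonymousRowType(row.fields().stream().map(NamedField::type).toList());
    }

    // FieldReference: the type of the field at the given position (assumed 0-based here).
    static String fieldReference(AnonymousRowType row, int index) {
        Objects.checkIndex(index, row.fieldTypes().size());
        return row.fieldTypes().get(index);
    }

    // All members of an equivalence class must derive exactly the same output type.
    static boolean sameOutputType(List<?> derivedTypes) {
        return derivedTypes.stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        // Two alternatives producing the same columns, but with different field names.
        NamedRowType first = new NamedRowType(List.of(
                new NamedField("name", "varchar(25)"), new NamedField("regionkey", "bigint")));
        NamedRowType second = new NamedRowType(List.of(
                new NamedField("n_name", "varchar(25)"), new NamedField("n_regionkey", "bigint")));

        System.out.println(sameOutputType(List.of(first, second)));                       // false
        System.out.println(sameOutputType(List.of(anonymize(first), anonymize(second)))); // true
        System.out.println(fieldReference(anonymize(first), 1));                          // bigint
    }
}
```

With named types the only way to make both alternatives agree is to force identical generic names on both, which is the f_1, f_2, ... scheme; dropping the names and referencing by position achieves the same effect more directly.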

Comment on lines +120 to +124

```java
if (leftCriteriaSelector.parameters().size() != 1 ||
        !trinoType(leftCriteriaSelector.parameters().getFirst().type()).equals(relationRowType(trinoType(left.type()))) ||
        !(trinoType(leftCriteriaSelector.getReturnedType()) instanceof RowType || trinoType(leftCriteriaSelector.getReturnedType()).equals(EMPTY_ROW))) {
    throw new TrinoException(IR_ERROR, "invalid left criteria selector for Join operation");
}
```

kasiafi (Member, Author) commented:

This validation code is repeated across many operation classes. It will be extracted into a shared helper and reused.
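One possible shape for the extracted check, sketched under simplifying assumptions: types are modeled by small stand-in records instead of Trino's `Type`/`RowType` (so no `trinoType(...)` conversion is needed), and a region is reduced to its parameter types and returned type. `SelectorValidationSketch`, `Selector` and `checkRowSelector` are hypothetical names. The three conditions mirror the Join snippet above: exactly one parameter, typed as the input relation's row type, and a returned type that is a row, possibly without fields.

```java
import java.util.List;

// Hypothetical sketch: the repeated selector check from the Join operation extracted into
// one shared helper. The type records below are simplified stand-ins, not Trino's types.
class SelectorValidationSketch {
    interface Type {}
    record ScalarType(String name) implements Type {}
    record RowType(List<Type> fields) implements Type {}
    record EmptyRowType() implements Type {}
    record MultisetType(Type element) implements Type {}

    // A region's parameter types and the type it returns.
    record Selector(List<Type> parameterTypes, Type returnedType) {}

    // Row type of a relation: the element type of multiset(row(...)).
    static Type relationRowType(Type relation) {
        if (relation instanceof MultisetType multiset) {
            return multiset.element();
        }
        throw new IllegalArgumentException("not a relation type: " + relation);
    }

    // Shared check: exactly one parameter, typed as the input relation's row type,
    // and a returned type that is a row, possibly without fields.
    static void checkRowSelector(Selector selector, Type inputRelation, String description) {
        boolean valid = selector.parameterTypes().size() == 1
                && selector.parameterTypes().get(0).equals(relationRowType(inputRelation))
                && (selector.returnedType() instanceof RowType
                        || selector.returnedType() instanceof EmptyRowType);
        if (!valid) {
            // The PR throws TrinoException(IR_ERROR, ...); a standard exception keeps the sketch self-contained.
            throw new IllegalArgumentException("invalid " + description);
        }
    }

    public static void main(String[] args) {
        Type varchar = new ScalarType("varchar(25)");
        Type bigint = new ScalarType("bigint");
        Type row = new RowType(List.of(varchar, bigint));
        Type left = new MultisetType(row);

        // Selects the join key of the left input as a single-field row.
        Selector leftCriteriaSelector = new Selector(List.of(row), new RowType(List.of(bigint)));
        checkRowSelector(leftCriteriaSelector, left, "left criteria selector for Join operation");
        System.out.println("left criteria selector is valid");
    }
}
```

Each operation would then make one call per selector region, for example `checkRowSelector(rightCriteriaSelector, right, "right criteria selector for Join operation")`, instead of repeating the condition inline.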
