whole stage generated code for a simple query #23
Comments
Hi @bithw1, I will take a look at your example today or tomorrow. Thank you.
Re @bithw1, I didn't manage to reproduce the same generated code. Do you have any specific setup?
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */ return new SpecificUnsafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
/* 006 */
/* 007 */ private Object[] references;
/* 008 */ private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[1];
/* 009 */
/* 010 */ public SpecificUnsafeProjection(Object[] references) {
/* 011 */ this.references = references;
/* 012 */ mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(2, 64);
/* 013 */
/* 014 */ }
/* 015 */
/* 016 */ public void initialize(int partitionIndex) {
/* 017 */
/* 018 */ }
Thanks @bartosz25. I print out the message from WholeStageCodegenExec#doExecute.
I am using Spark's master branch, so I am running against the latest Spark code base. I am not sure whether older versions print similar generated code.
OK, that explains why we got different results :) I launched the code against Spark 3 preview 2 and also got a single scan stage:
[2020-01-07 06:20:10,399] org.apache.spark.internal.Logging DEBUG
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */ return new SpecificUnsafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
/* 006 */
/* 007 */ private Object[] references;
/* 008 */ private boolean resultIsNull_0;
/* 009 */ private java.lang.String[] mutableStateArray_0 = new java.lang.String[1];
/* 010 */ private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] mutableStateArray_1 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[1];
/* 011 */
/* 012 */ public SpecificUnsafeProjection(Object[] references) {
/* 013 */ this.references = references;
/* 014 */
/* 015 */ mutableStateArray_1[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(2, 32);
/* 016 */
/* 017 */ }
/* 018 */
And the plans + result:
Maybe check the preview branch, because master still seems to be a work in progress (26 127 commits vs 26 004 for the preview), or stay with 2.4.4 for now :) To debug the generated code of a stage, you can also use this tip instead of adding print statements to the framework: https://www.waitingforcode.com/tips/spark-sql/how_show_generated_code :)
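For reference, here is a minimal sketch of one way to print the whole-stage generated code without patching Spark. It is not necessarily what the linked tip shows. It relies on the `org.apache.spark.sql.execution.debug` package that ships with Spark SQL (`debugCodegen()` and `codegenString`). The object name `ShowGeneratedCode`, the helper `generatedCode` and the query itself are hypothetical and only serve the illustration; they are not the test case from this issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.debug._

// Hypothetical helper object, named only for this sketch.
object ShowGeneratedCode {

  // Returns the generated code of every WholeStageCodegen subtree
  // of the physical plan as a single string.
  def generatedCode(df: DataFrame): String =
    codegenString(df.queryExecution.executedPlan)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("show-generated-code")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical query (assumption): a range scan followed by a filter
    // and a projection, so that the plan contains a codegen stage.
    val df = spark.range(0, 10)
      .filter($"id" > 5)
      .select(($"id" * 2).as("doubled"))

    // Prints the generated code to stdout; nothing is executed on the data.
    df.debugCodegen()

    spark.stop()
  }
}
```

Running the same snippet against two Spark versions (for example 2.4.x and a 3.0 preview) and comparing the printed code is one way to check whether the difference comes from the Spark version rather than from the local setup.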
Thanks @bartosz25. Hmm... I am kind of surprised that we saw different output. The above is 2.4.0; we should see line 549, which is:
Let's first confirm whether the output is really different for you when you launch the code against 2.4.0.
Maybe there is no change in WholeStageCodegenExec but somewhere earlier in the planning? I don't follow what happens on master every day; I only do so when I write new posts ;-) IMO, if you are trying to understand what happens, it's better to test on a stable version, or possibly a beta if you're really curious :P Could you then try to run the code on top of 2.4.0 to see if you also get a different plan than I do?
Hi @bartosz25,
I have a simple test case for which I would like to see the whole-stage generated code.
The following is a snippet of the generated code:
I don't understand the variable filter_mutableStateArray_0. It is created for FilterExec (the variable name starts with filter), but I think it should be created for ProjectExec and be named project_mutableStateArray_0 (it is of type UnsafeRowWriter). I am not sure why this variable is created for FilterExec.
Could you please have a look? Thanks!