
Why does Spark generate Java code and not Scala code? #18

Open
igreenfield opened this issue Nov 4, 2019 · 6 comments

Comments

@igreenfield

No description provided.

@bartosz25
Owner

Thank you @igreenfield for such an amazing question! I looked for the reasons in the documentation and old PRs but found no information about it. I've just posted a question on the Spark users mailing list. You can follow the conversation at https://mail-archives.apache.org/mod_mbox/spark-user/201911.mbox/browser or, if you prefer, I'll keep you up to date in this issue.

Cheers,
Bartosz.

@igreenfield
Author

@bartosz25 I was looking into the code generation phase, and I think that if the generated code were Scala, it would be easier to reduce the number of code lines, so many of the compilation failures caused by a method growing beyond 64KB would disappear.
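For reference, here is a minimal sketch of how to dump the Java source Spark generates for a query (including the processNext method), using the debug helpers shipped in org.apache.spark.sql.execution.debug; the query itself is just a made-up example:

```scala
// Minimal sketch: print the Java source Spark generates for a plan.
// debugCodegen() is an extension on Dataset provided by the
// org.apache.spark.sql.execution.debug package object.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._

object InspectCodegen {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("inspect-codegen")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
      .filter($"id" > 1)
      .select($"name")

    df.debugCodegen() // dumps the whole-stage generated code, processNext included
    spark.stop()
  }
}
```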

@bartosz25
Owner

bartosz25 commented Nov 10, 2019

Hi @igreenfield ,

I've some answers from the mailing list:

Long story short, it's all about the compilation performance :)

Regarding your point about the 64KB limitation, AFAIK Spark has protections against overly long methods. First, it can split a function that is too long into multiple methods (spark.sql.codegen.methodSplitThreshold). Second, it can also disable codegen entirely to stay under the JVM's maximum method length (spark.sql.codegen.hugeMethodLimit).
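For illustration, both guards can be set when building the session; the values below are only examples, not recommendations:

```scala
// Minimal sketch of tuning both codegen guards at session build time.
// The values are illustrative, not recommendations.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("codegen-guards")
  // threshold (in generated source size) above which a generated
  // function is split into smaller methods
  .config("spark.sql.codegen.methodSplitThreshold", "1024")
  // bytecode size above which whole-stage codegen falls back to the
  // interpreted path; 65535 matches the JVM method limit
  .config("spark.sql.codegen.hugeMethodLimit", "65535")
  .getOrCreate()
```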

Have you already had issues with a "too long" generated method making your pipeline fail? I've never experienced that, so I'm really curious to learn something new and maybe help you overcome the issue by reworking the code.

@igreenfield
Author

igreenfield commented Nov 10, 2019

Hi @bartosz25
First thanks for the help!!

  1. The compilation performance concern could be eliminated by using a compile server.
  2. Yes, I hit the 64KB limit all the time. My use case is very complex: we are migrating a SQL engine to Spark, and in most cases it's the processNext method that fails; see the sketch after this list.
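To give an idea, a synthetic query like the following (made-up columns and counts, purely an illustration) already inflates the single generated method:

```scala
// Hypothetical reproduction: a long chain of derived expressions tends to
// inflate the generated processNext method toward the 64KB bytecode limit.
// Column names and the column count are made up for illustration.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val base = Seq((1L, 2L)).toDF("a", "b")

// hundreds of derived columns, each adding code to the generated method
val wide = (1 to 500).foldLeft(base) { (df, i) =>
  df.withColumn(s"c$i", when($"a" > i, $"a" * i).otherwise($"b" + i))
}

wide.explain() // shows whether whole-stage codegen survived the plan size
```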

We can schedule a call and I can explain in more detail.

Another thing: one of the answers was:

Also for low-level code we can't use (due to perf concerns) any of the
edges Scala has over Java, e.g. we can't use the Scala collection library,
functional programming, map/flatMap. So using Scala doesn't really buy
anything even if there are no compilation speed concerns.

I think the ability to return more than one object from a function could make the difference when splitting the huge methods into smaller ones.
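A toy example of what I mean (nothing Spark-specific, just the language feature):

```scala
// Toy illustration: a Scala helper can hand back several intermediate
// values at once as a tuple, whereas split-off generated Java methods
// have to pass them through instance fields or arrays.
def evalRow(a: Long, b: Long): (Boolean, Long, Long) = {
  val keep = a > b   // filter outcome
  val sum  = a + b   // first projected value
  val diff = a - b   // second projected value
  (keep, sum, diff)
}

val (keep, sum, diff) = evalRow(5L, 3L)
if (keep) println(s"sum=$sum diff=$diff")
```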

@bartosz25
Owner

bartosz25 commented Nov 14, 2019

Re @igreenfield

At the moment I don't have much time, so I won't be able to help you. Sorry for that; it should be better in late January. In the meantime, maybe you can take a look at my series about Apache Spark customization. I cover how to alter logical and physical plans, how to add a new parser, and so forth. With that you may be able to write your own code generation that is much shorter than the code you've just shown me. The articles were published here: https://www.waitingforcode.com/tags/spark-sql-customization
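To give a taste of the extension point the series builds on, here is a minimal sketch; the rule below is a no-op placeholder, only the SparkSessionExtensions hook itself is the real API:

```scala
// Minimal sketch of injecting a custom rule via SparkSessionExtensions.
// MyNoopRule is a placeholder that returns the plan unchanged.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

case class MyNoopRule(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

val spark = SparkSession.builder()
  .master("local[*]")
  .withExtensions(ext => ext.injectOptimizerRule(session => MyNoopRule(session)))
  .getOrCreate()
```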

Anyway, I doubt the Spark community will agree to switch code generation to Scala because of a single request. But you can always give it a try and ask directly on the mailing list: https://spark.apache.org/community.html

Cheers,
Bartosz.

@igreenfield
Author

Hi @bartosz25, thanks! I will be in touch with you in late January.
