Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rcopy handles formular symbols #524

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions src/convert/formula.jl
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,11 @@

function rcopy(::Type{FormulaTerm}, l::Ptr{LangSxp})
expr = rcopy(Expr, l)
# special case of a simple variable, like in aes(x, y)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# special case of a simple variable, like in aes(x, y)
# special case of a non-standard evaluation / a "bare" expression / (implicit) one-sided formula in R, like in `aes(x, y)`

if Meta.isexpr(expr, :call) && length(expr.args) == 2 && expr.args[1] == :~
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means expr is Expr(:call, :~, x), i.e. :(~x), for some x. So this is keeping one-sided formulas from being passed to @formula, which can't handle one-sided formulas anyway. What I don't understand though is the accompanying comment here: "special case of a simple variable, like in aes(x, y)." I don't see how that example applies to this situation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the concrete example which fails for me in current RCall. Having an expression like aes(x, y) runs into this formula problem that simple symbols like x and y in this case are not yet supported

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this example, aes(x, y) appears inside of an R formula? Is that right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more like aes(x, y) is "lowered" to aes_(~x, ~y) via non standard evaluation because R uses one sided formulas to represent symbols/expressions. (The lowering rule is part of the function definition in ggplot, so it's maybe better described as an implicit macro.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to be honest, I don't actually know how RCall works internally. Is the following more or less accurate for this situation? The R surface syntax is aes(x, y) which gets evaluated in R as aes_(~x, ~y), so by the time this expression gets to Julia, Julia sees Expr(:call, :aes_, Expr(:call, :~, :x), Expr(:call, :~, :y)) and tries to turn it into aes_(::FormulaTerm, ::FormulaTerm), but that fails because FormulaTerm doesn't support one-sided formulas. Assuming that's accurate, this change seems safe to me and is arguably a bugfix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, the behavior is now

julia> using RCall

R> install.packages("^C

R> library("ggplot2")

julia> R"aes(x, y"^C

R> df <- data.frame(x=1,y=1)

julia> R"~x"
RObject{LangSxp}
~x


julia> rcopy(R"~x")
:(~x)

julia> rcopy(R"~y")
:(~y)

julia> rcopy(R"aes(df, x, y)")
OrderedCollections.OrderedDict{Symbol, Any} with 3 entries:
  :x         => :(~df)
  :y         => :(~x)
  Symbol("") => :(~y)

julia> R"aes(df, x, y)"
RObject{VecSxp}
Aesthetic mapping: 
* `x` -> `df`
* `y` -> `x`
* ``  -> `y`

julia> rcopy(R"y~x")
FormulaTerm
Response:
  y(unknown)
Predictors:
  x(unknown)

julia> rcopy(R"~x+y")
:(~(x + y))

So this PR converts a one-sided formula into a symbol representation of that formula. @ararslan do you have an opinion on whether it would be better to strip the tilde? I'm a bit conflicted myself and could argue for either direction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any examples in mind of when stripping the tilde would not be desirable? If not then stripping it seems sensible to me, but otherwise would there still be a way to recover the original behavior in those cases after this PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there are cases where you would want to preserve a raw one-sided formula as an explicit one-sided formula. I can't think of any:

  • you usually don't want to evaluate in R something already copied into Julia without thinking about how to convert it back. In other words, preserving the original literal expression isn't really valuable
  • if you want to create the closest Julia analog to a one-sided formula (either 0 or the left-hand side or a MatrixTerm), then the tilde needs to be stripped to programmatically manipulate the symbols anyway.

Going the other way: the tilde in one-sided formulae in R is generally analogous to Expr in Julia, so it seems having it be Expr in Julia eliminates the need for the tilde.

I guess we can solve the dilemma by adding a note to the documentation that one-sided formulae in R are converted to Expr in Julia without a tilde as a representation of unevaluated code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the explanation, that all sounds good to me. I think the documentation aspect will be particularly important in case anybody ends up surprised by this behavior for whatever reason, however unlikely. Consider me on board with this change!

return expr

Check warning on line 43 in src/convert/formula.jl

View check run for this annotation

Codecov / codecov/patch

src/convert/formula.jl#L43

Added line #L43 was not covered by tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return expr
return expr.args[2]

(so that we drop the tilde)

end
# complex formular
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# complex formular
# complex formula

@eval StatsModels.@formula($expr)
end

Expand Down
Loading