
[BUG]: num_features for TemplateExpressionSpec doesn't work as written in API Ref #811

Open
jc-umana-FI opened this issue Jan 20, 2025 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@jc-umana-FI

What happened?

Heyo PySR team! The num_features parameter in TemplateExpressionSpec is described in the API Reference as taking a Python dictionary as input, but doesn't behave that way, yielding a TypeError as shown below:

The Code & The Error

Code yielding TypeError:

import numpy as np
from pysr import PySRRegressor, TemplateExpressionSpec

# Create data
X = np.random.randn(1000, 5)
y = np.sin(X[:, 0] + X[:, 3] + X[:, 4]) * X[:, 2]**2

# Define template: we want f(x1, x4, x5) + g(x3).
# We'll obtain independent expressions for f and g,
# where the variables of each are printed as #1 for the
# first argument, #2 for the second, and so on.
# While the output can be evaluated, it is NOT compatible with
# the exporting features for typesetting and modeling in other libraries.
template_py = TemplateExpressionSpec(
    function_symbols=["f", "g"],
    combine="""((; f, g), (a, b, c, d, e)) -> let
    _f = f(a, d, e)
    _g = g(c)
    _out = _f + _g
    end""",
    num_features={"f": 3, "g": 1},  # f takes three arguments (a, d, e)
)

model = PySRRegressor(
    expression_spec=template_py,
    binary_operators=["+", "*", "-", "/"],
    unary_operators=["sin"],
    model_selection="score",
    maxsize=25,
    tournament_selection_n=15,
    tournament_selection_p=0.9,
    niterations=50,
    populations=100,
    population_size=80,
    should_simplify=True,
    topn=10,
    # annealing=True,
    # alpha=2,
    #full_objective=jl.seval(objective),
)
model.fit(X, y)

ERROR:
JuliaError: TypeError: in typeassert, expected Symbol, got a value of type String

The Fix

This error goes away if you construct the dictionary in Julia rather than Python, but I haven't managed to get it to accept a Python dict as described in the documentation. Here's the template with the single-line change that allows the model to run:

from pysr import jl  # PySR's Julia interface

template_jl = TemplateExpressionSpec(
    function_symbols=["f", "g"],
    combine="""((; f, g), (a, b, c, d, e)) -> let
    _f = f(a, d, e)
    _g = g(c)
    _out = _f + _g
    end""",
    num_features=jl.seval("""Base.Dict(Symbol("f") => 3, Symbol("g") => 1)"""),
)
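Until the bug is fixed, the workaround above can be wrapped in a small helper that builds the Julia `Dict{Symbol, Int}` expression string from an ordinary Python dict. This helper (`julia_symbol_dict`) is hypothetical, not part of PySR; it just generates the string you would hand to `jl.seval`:

```python
def julia_symbol_dict(num_features: dict) -> str:
    """Build a Julia expression string for a Dict{Symbol, Int}
    from a plain Python dict with string keys, e.g. {"f": 3}."""
    entries = ", ".join(
        f'Symbol("{name}") => {n}' for name, n in num_features.items()
    )
    return f"Base.Dict({entries})"
```

Usage would then look like `num_features=jl.seval(julia_symbol_dict({"f": 3, "g": 1}))`, keeping the Python-dict interface the documentation describes.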

The ExpressionSpec feature is a great addition, but it has some drawbacks that can be bypassed by using a custom objective instead, though not without writing a bit of Julia.

The expression output when using templates is a tad confusing at first: variables are printed as placeholder numbers (#1 for the first argument, #2 for the second, and so on), and the numbering restarts in each sub-expression. It would be good to note that in the relevant section of the documentation.
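To make the numbering concrete, here is a sketch based on the template above; `placeholder_map` is purely an illustrative helper, not PySR API:

```python
def placeholder_map(arg_names):
    """Each sub-expression numbers its own arguments from #1;
    the numbering restarts for every function in the template."""
    return {f"#{i}": name for i, name in enumerate(arg_names, start=1)}

# For the template above, f(a, d, e) and g(c):
f_map = placeholder_map(["a", "d", "e"])  # {"#1": "a", "#2": "d", "#3": "e"}
g_map = placeholder_map(["c"])            # {"#1": "c"}  -- note #1 now means c
```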

Version

1.3.1

Operating System

Linux

Package Manager

pip

Interface

Jupyter Notebook

Relevant log output

JuliaError: TypeError: in typeassert, expected Symbol, got a value of type String
Stacktrace:
 [1] merge(a::@NamedTuple{}, itr::PyDict{Any, Any})
   @ Base ./namedtuple.jl:372
 [2] _pysr_create_template_structure(function_symbols::AbstractVector, combine::Function, num_features::Union{Nothing, AbstractDict})
   @ Main ./none:10
 [3] pyjlany_call(self::typeof(_pysr_create_template_structure), args_::Py, kwargs_::Py)
   @ PythonCall.JlWrap ~/.julia/packages/PythonCall/Nr75f/src/JlWrap/any.jl:43
 [4] _pyjl_callmethod(f::Any, self_::Ptr{PythonCall.C.PyObject}, args_::Ptr{PythonCall.C.PyObject}, nargs::Int64)
   @ PythonCall.JlWrap ~/.julia/packages/PythonCall/Nr75f/src/JlWrap/base.jl:73
 [5] _pyjl_callmethod(o::Ptr{PythonCall.C.PyObject}, args::Ptr{PythonCall.C.PyObject})
   @ PythonCall.JlWrap.Cjl ~/.julia/packages/PythonCall/Nr75f/src/JlWrap/C.jl:63

Extra Info

No response

@jc-umana-FI jc-umana-FI added the bug Something isn't working label Jan 20, 2025
@MilesCranmer
Owner

Thanks for the report! Yes, this looks like a bug.

In the meantime, note that you don't need to provide num_features; it gets inferred by Julia. It is only needed if that inference fails.

@jc-umana-FI
Author

Thanks for the response!

I'm actually wondering whether the inference really takes effect. Both with this method and with a custom objective, the hall of fame still returns expressions that don't meet the "variable necessity" constraints of the forms I try to impose.

For example, in the above model, I'd want the hall of fame to only return expressions whose f part contains some combination of the variables a, d, and e, no fewer than those three. The reason I tried to use num_features is that my hall of fame was still producing constant expressions, or expressions containing only 1 or 2 of the necessary 3 variables. Granted, the chosen "best" expression does meet all the criteria I set, but I'd rather the model only output expressions that are actually comparable and meet the criteria for number of variables.

Do you have a proposed fix for this? Essentially, I've been trying to change what is presented in the hall of fame to relevant expressions only, but also constraining the search space further could accomplish the same thing.

@MilesCranmer
Owner

Oh if the inference doesn’t work, there will be an error. So it must already be getting it correctly.

If it never uses all 3 features, maybe it just doesn’t need to? Like maybe they are heavily correlated or something? The model is incentivised to favour simplicity and accuracy and nothing else, so it won’t try to add more features than are actually needed.
