
[BUG]: num_features for TemplateExpressionSpec doesn't work as written in API Ref #811

Open
jc-umana-FI opened this issue Jan 20, 2025 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@jc-umana-FI

What happened?

Heyo PySR team! The num_features parameter in TemplateExpressionSpec is described in the API Reference as taking a Python dictionary as input, but doesn't behave that way, yielding a TypeError as shown below:

The Code & The Error

Code yielding TypeError:

import numpy as np
from pysr import PySRRegressor, TemplateExpressionSpec

# Create data
X = np.random.randn(1000, 5)
y = np.sin(X[:, 0] + X[:, 3] + X[:, 4]) * X[:, 2]**2

# Define template: we want f(x1, x4, x5) + g(x3).
# We'll obtain independent expressions for f and g,
# where the variables of each are printed as #1 for the
# first argument, #2 for the second, and so on.
# While the output can be evaluated, it is NOT compatible with
# the exporting features for typesetting and modeling in other libraries.
template_py = TemplateExpressionSpec(
    function_symbols=["f", "g"],
    combine="""((; f, g), (a, b, c, d, e)) -> let
    _f = f(a, d, e)
    _g = g(c)
    _out = _f + _g
    end""",
    num_features={"f": 3, "g": 1},  # f takes three arguments (a, d, e)
)

model = PySRRegressor(
    expression_spec=template_py,
    binary_operators=["+", "*", "-", "/"],
    unary_operators=["sin"],
    model_selection="score",
    maxsize=25,
    tournament_selection_n=15,
    tournament_selection_p=0.9,
    niterations=50,
    populations=100,
    population_size=80,
    should_simplify=True,
    topn=10,
    # annealing=True,
    # alpha=2,
    #full_objective=jl.seval(objective),
)
model.fit(X, y)

ERROR:
JuliaError: TypeError: in typeassert, expected Symbol, got a value of type String

The Fix

This error goes away if you construct the dictionary in Julia rather than Python, but I haven't managed to get it to accept a Python dict as described in the documentation. Here's the template with the single-line change that allows the model to run:

from pysr import jl  # PySR's Julia interface

template_jl = TemplateExpressionSpec(
    function_symbols=["f", "g"],
    combine="""((; f, g), (a, b, c, d, e)) -> let
    _f = f(a, d, e)
    _g = g(c)
    _out = _f + _g
    end""",
    num_features=jl.seval("""Base.Dict(Symbol("f") => 3, Symbol("g") => 1)"""),
)
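Until the bug is fixed, the workaround above can be wrapped in a small helper that builds the Julia `Dict{Symbol, Int}` expression string from an ordinary Python dict. This helper (`julia_symbol_dict`) is hypothetical, not part of PySR; it just generates the string you would hand to `jl.seval`:

```python
def julia_symbol_dict(num_features: dict) -> str:
    """Build a Julia expression string for a Dict{Symbol, Int}
    from a plain Python dict with string keys, e.g. {"f": 3}."""
    entries = ", ".join(
        f'Symbol("{name}") => {n}' for name, n in num_features.items()
    )
    return f"Base.Dict({entries})"
```

Usage would then look like `num_features=jl.seval(julia_symbol_dict({"f": 3, "g": 1}))`, keeping the Python-dict interface the documentation describes.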

The ExpressionSpec feature is a great addition, but it has some drawbacks that can be bypassed by using a custom objective instead, though not without writing a bit of Julia.

The expression output when using templates is a tad confusing at first: variables are printed as placeholder numbers (#1 for the first argument, #2 for the second, and so on), and the numbering restarts in each sub-expression. It would be good to note that in the relevant section of the documentation.
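To make the numbering concrete, here is a sketch based on the template above; `placeholder_map` is purely an illustrative helper, not PySR API:

```python
def placeholder_map(arg_names):
    """Each sub-expression numbers its own arguments from #1;
    the numbering restarts for every function in the template."""
    return {f"#{i}": name for i, name in enumerate(arg_names, start=1)}

# For the template above, f(a, d, e) and g(c):
f_map = placeholder_map(["a", "d", "e"])  # {"#1": "a", "#2": "d", "#3": "e"}
g_map = placeholder_map(["c"])            # {"#1": "c"}  -- note #1 now means c
```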

Version

1.3.1

Operating System

Linux

Package Manager

pip

Interface

Jupyter Notebook

Relevant log output

JuliaError: TypeError: in typeassert, expected Symbol, got a value of type String
Stacktrace:
 [1] merge(a::@NamedTuple{}, itr::PyDict{Any, Any})
   @ Base ./namedtuple.jl:372
 [2] _pysr_create_template_structure(function_symbols::AbstractVector, combine::Function, num_features::Union{Nothing, AbstractDict})
   @ Main ./none:10
 [3] pyjlany_call(self::typeof(_pysr_create_template_structure), args_::Py, kwargs_::Py)
   @ PythonCall.JlWrap ~/.julia/packages/PythonCall/Nr75f/src/JlWrap/any.jl:43
 [4] _pyjl_callmethod(f::Any, self_::Ptr{PythonCall.C.PyObject}, args_::Ptr{PythonCall.C.PyObject}, nargs::Int64)
   @ PythonCall.JlWrap ~/.julia/packages/PythonCall/Nr75f/src/JlWrap/base.jl:73
 [5] _pyjl_callmethod(o::Ptr{PythonCall.C.PyObject}, args::Ptr{PythonCall.C.PyObject})
   @ PythonCall.JlWrap.Cjl ~/.julia/packages/PythonCall/Nr75f/src/JlWrap/C.jl:63

Extra Info

No response

@jc-umana-FI jc-umana-FI added the bug Something isn't working label Jan 20, 2025
@MilesCranmer
Owner

Thanks for the report! Yes, this looks like a bug.

In the meantime, note that you don't need to provide num_features; it gets inferred by Julia. It is only needed if that inference fails.

@jc-umana-FI
Author

Thanks for the response!

I'm actually wondering whether the inference really takes effect. Both with this method and with a custom objective, the hall of fame still returns expressions that don't meet the "variable necessity" constraints of the forms I try to impose.

For example, in the above model, I'd want the hall of fame to only return expressions whose f part contains some combination of the variables a, d, and e, no fewer than those three. The reason I tried to use num_features is that my hall of fame was still producing constant expressions, or expressions containing only 1 or 2 of the necessary 3 variables. Granted, the chosen "best" expression does meet all the criteria I set, but I'd rather the model only output expressions that are actually comparable and meet the criteria for number of variables.

Do you have a proposed fix for this? Essentially, I've been trying to change what is presented in the hall of fame to relevant expressions only, but also constraining the search space further could accomplish the same thing.

@MilesCranmer
Owner

Oh if the inference doesn’t work, there will be an error. So it must already be getting it correctly.

If it never uses all 3 features, maybe it just doesn’t need to? Like maybe they are heavily correlated or something? The model is incentivised to favour simplicity and accuracy and nothing else, so it won’t try to add more features than are actually needed.
