Merge tag 'v0.16.4' into store-options
MilesCranmer committed Dec 14, 2023
2 parents 6c92e1c + d39c0a6 commit 208307d
Showing 10 changed files with 109 additions and 56 deletions.
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -1,15 +1,15 @@
repos:
# General linting
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.2.0
rev: v4.5.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
# General formatting
- repo: https://github.com/psf/black
rev: 23.3.0
rev: 23.11.0
hooks:
- id: black
- id: black-jupyter
@@ -20,12 +20,12 @@ repos:
- id: nbstripout
# Unused imports
- repo: https://github.com/hadialqattan/pycln
rev: "v2.2.2"
rev: "v2.4.0"
hooks:
- id: pycln
# Sorted imports
- repo: https://github.com/PyCQA/isort
rev: "5.12.0"
rev: "5.13.0"
hooks:
- id: isort
additional_dependencies: [toml]
Binary file added docs/images/Planar_relation.png
2 changes: 1 addition & 1 deletion docs/interactive-docs.md
@@ -1,4 +1,4 @@
# Interactive Reference
# Interactive Reference

<!-- Display content from `astroautomata.com/pysr_interactive` -->

111 changes: 71 additions & 40 deletions docs/operators.md
@@ -2,46 +2,60 @@

## Pre-defined

All Base julia operators that take 1 or 2 float32 as input,
and output a float32 as output, are available. A selection
of these and other valid operators are stated below.
First, note that pretty much any valid Julia function which
takes one or two scalars as input, and returns one scalar as output,
is likely to be a valid operator[^1].
A selection of these and other valid operators is listed below.

**Binary**

`+`, `-`, `*`, `/`, `^`, `greater`, `mod`, `logical_or`,
`logical_and`
- `+`
- `-`
- `*`
- `/`
- `^`
- `cond`
- Equal to `(x, y) -> x > 0 ? y : 0`
- `greater`
- Equal to `(x, y) -> x > y ? 1 : 0`
- `logical_or`
- Equal to `(x, y) -> (x > 0 || y > 0) ? 1 : 0`
- `logical_and`
- Equal to `(x, y) -> (x > 0 && y > 0) ? 1 : 0`
- `mod`

**Unary**

`neg`,
`square`,
`cube`,
`exp`,
`abs`,
`log`,
`log10`,
`log2`,
`log1p`,
`sqrt`,
`sin`,
`cos`,
`tan`,
`sinh`,
`cosh`,
`tanh`,
`atan`,
`asinh`,
`acosh`,
`atanh_clip` (=atanh((x+1)%2 - 1)),
`erf`,
`erfc`,
`gamma`,
`relu`,
`round`,
`floor`,
`ceil`,
`round`,
`sign`.
- `neg`
- `square`
- `cube`
- `exp`
- `abs`
- `log`
- `log10`
- `log2`
- `log1p`
- `sqrt`
- `sin`
- `cos`
- `tan`
- `sinh`
- `cosh`
- `tanh`
- `atan`
- `asinh`
- `acosh`
- `atanh_clip`
- Equal to `atanh(mod(x + 1, 2) - 1)`
- `erf`
- `erfc`
- `gamma`
- `relu`
- `round`
- `floor`
- `ceil`
- `round`
- `sign`
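
For readers more at home in Python, the comparison-style operators defined above can be mirrored as follows. This is only a sketch of the stated Julia definitions, not PySR's implementation. It relies on Python's float `%` taking the sign of the divisor, as Julia's `mod` does. The handling of the `atanh` endpoints (a signed infinity instead of Python's `ValueError`) is an assumption of this sketch.

```python
import math


def cond(x: float, y: float) -> float:
    # Mirrors (x, y) -> x > 0 ? y : 0
    return y if x > 0 else 0.0


def greater(x: float, y: float) -> float:
    # Mirrors (x, y) -> x > y ? 1 : 0
    return 1.0 if x > y else 0.0


def logical_or(x: float, y: float) -> float:
    # Mirrors (x, y) -> (x > 0 || y > 0) ? 1 : 0
    return 1.0 if (x > 0 or y > 0) else 0.0


def logical_and(x: float, y: float) -> float:
    # Mirrors (x, y) -> (x > 0 && y > 0) ? 1 : 0
    return 1.0 if (x > 0 and y > 0) else 0.0


def atanh_clip(x: float) -> float:
    # Mirrors atanh(mod(x + 1, 2) - 1): wrap x towards [-1, 1) first.
    wrapped = (x + 1.0) % 2.0 - 1.0
    if abs(wrapped) >= 1.0:
        # Assumption: return a signed infinity at the endpoints,
        # where math.atanh would raise ValueError.
        return math.copysign(math.inf, wrapped)
    return math.atanh(wrapped)
```

Each function returns a float rather than a boolean, matching the `1`/`0` outputs in the definitions above.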

## Custom

@@ -52,15 +66,32 @@ you can define with by passing it to the `pysr` function, with, e.g.,
PySRRegressor(
...,
unary_operators=["myfunction(x) = x^2"],
binary_operators=["myotherfunction(x, y) = x^2*y"]
binary_operators=["myotherfunction(x, y) = x^2*y"],
extra_sympy_mappings={
"myfunction": lambda x: x**2,
"myotherfunction": lambda x, y: x**2 * y,
},
)
```


Make sure that it works with
`Float32` as a datatype. That means you need to write `1.5f3`
instead of `1.5e3`, if you write any constant numbers.
`Float32` as a datatype (for default precision, or `Float64` if you set `precision=64`). That means you need to write `1.5f3`
instead of `1.5e3`, if you write any constant numbers, or simply convert a result to `Float64(...)`.

Your operator should work with the entire real line (you can use
abs(x) for operators requiring positive input - see `log_abs`); otherwise
the search code will experience domain errors.
PySR expects that operators not throw an error for any input value over the entire real line from `-3.4e38` to `+3.4e38`.
Thus, for invalid inputs, such as negative numbers to a `sqrt` function, you may simply return a `NaN` of the same type as the input. For example,

```julia
my_sqrt(x) = x >= 0 ? sqrt(x) : convert(typeof(x), NaN)
```

would be a valid operator. The genetic algorithm
will preferentially select expressions which avoid
any invalid values over the training dataset.
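
To illustrate why returning `NaN` is enough, here is a minimal Python sketch of the idea. The scoring rule shown (an infinite loss as soon as any prediction is `NaN`) is an assumption made for illustration, not a description of how the search backend actually scores expressions. `mse_or_inf` is a hypothetical helper name.

```python
import math


def my_sqrt(x: float) -> float:
    # Python analogue of the Julia operator above: NaN instead of an error.
    return math.sqrt(x) if x >= 0 else math.nan


def mse_or_inf(predict, xs, ys) -> float:
    # Hypothetical scoring rule: any NaN prediction makes the loss infinite,
    # so candidates that stay valid on the training data always rank better.
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = predict(x)
        if math.isnan(prediction):
            return math.inf
        total += (prediction - y) ** 2
    return total / len(xs)


valid_loss = mse_or_inf(my_sqrt, [1.0, 4.0], [1.0, 2.0])
invalid_loss = mse_or_inf(my_sqrt, [-1.0, 4.0], [1.0, 2.0])
```

Under this rule, an expression that hits an invalid input anywhere in the dataset can never outrank one that stays finite.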


<!-- Footnote for 1: -->
<!-- (Will say "However, you may need to define a `extra_sympy_mapping`":) -->

[^1]: However, you will need to define a sympy equivalent in `extra_sympy_mappings` if you want to use a function not in the above list.
10 changes: 10 additions & 0 deletions docs/papers.yml
@@ -2,6 +2,16 @@
# information to generate the "Research Showcase"

papers:
- title: Discovery of a Planar Black Hole Mass Scaling Relation for Spiral Galaxies
authors:
- Benjamin L. Davis (1)
- Zehao Jin (1)
affiliations:
1: Center for Astrophysics and Space Science, New York University Abu Dhabi
link: https://arxiv.org/abs/2309.08986
abstract: Supermassive black holes (SMBHs) are tiny in comparison to the galaxies they inhabit, yet they manage to influence and coevolve along with their hosts. Evidence of this mutual development is observed in the structure and dynamics of galaxies and their correlations with black hole mass ($M_\bullet$). For our study, we focus on relative parameters that are unique to only disk galaxies. As such, we quantify the structure of spiral galaxies via their logarithmic spiral-arm pitch angles ($\phi$) and their dynamics through the maximum rotational velocities of their galactic disks ($v_\mathrm{max}$). In the past, we have studied black hole mass scaling relations between $M_\bullet$ and $\phi$ or $v_\mathrm{max}$, separately. Now, we combine the three parameters into a trivariate $M_\bullet$--$\phi$--$v_\mathrm{max}$ relationship that yields best-in-class accuracy in prediction of black hole masses in spiral galaxies. Because most black hole mass scaling relations have been created from samples of the largest SMBHs within the most massive galaxies, they lack certainty when extrapolated to low-mass spiral galaxies. Thus, it is difficult to confidently use existing scaling relations when trying to identify galaxies that might harbor the elusive class of intermediate-mass black holes (IMBHs). Therefore, we offer our novel relationship as an ideal predictor to search for IMBHs and probe the low-mass end of the black hole mass function by utilizing spiral galaxies. Already with rotational velocities widely available for a large population of galaxies and pitch angles readily measurable from uncalibrated images, we expect that the $M_\bullet$--$\phi$--$v_\mathrm{max}$ fundamental plane will be a useful tool for estimating black hole masses, even at high redshifts.
image: Planar_relation.png
date: 2023-10-03
- title: Interpretable machine learning methods applied to jet background subtraction in heavy-ion collisions
authors:
- Tanner Mengel (1)
9 changes: 5 additions & 4 deletions docs/tuning.md
@@ -12,15 +12,16 @@ I run from IPython (Jupyter Notebooks don't work as well[^1]) on the head node o

1. Use the default parameters.
2. Use only the operators I think it needs and no more.
3. Set `niterations` to some very large value, so it just runs for a week until my job finishes. If the equation looks good, I quit the job early.
4. Increase `populations` to `3*num_cores`.
5. Set `ncyclesperiteration` to maybe `5000` or so, until the head node occupation is under `10%`.
3. Increase `populations` to `3*num_cores`.
4. If my dataset is more than 1000 points, I either subsample it (low-dimensional and not much noise) or set `batching=True` (high-dimensional or very noisy, so it needs to evaluate on all the data).
5. On a laptop or single-node machine, you might leave `ncyclesperiteration` at its default, but on a cluster with ~100 cores I like to set `ncyclesperiteration` to maybe `5000` or so, until the head node occupation is under `10%`. (A larger value means the workers talk to each other less frequently, which is useful when you have many workers!)
6. Set `constraints` and `nested_constraints` as strict as possible. These can help quite a bit with exploration. Typically, if I am using `pow`, I would set `constraints={"pow": (9, 1)}`, so that power laws can only have a variable or constant as their exponent. If I am using `sin` and `cos`, I also like to set `nested_constraints={"sin": {"sin": 0, "cos": 0}, "cos": {"sin": 0, "cos": 0}}`, so that sin and cos can't be nested, which seems to happen frequently. (Although in practice I would just use `sin`, since the search could always add a phase offset!)
7. Set `maxsize` a bit larger than the final size you want. e.g., if you want a final equation of size `30`, you might set this to `35`, so that it has a bit of room to explore.
8. Set `maxdepth` strictly, but leave a bit of room for exploration. e.g., if you want a final equation limited to a depth of `5`, you might set this to `6` or `7`, so that it has a bit of room to explore.
8. I typically don't use `maxdepth`, but if I do, I set it strictly, while also leaving a bit of room for exploration. e.g., if you want a final equation limited to a depth of `5`, you might set this to `6` or `7`, so that it has a bit of room to explore.
9. Set `parsimony` equal to about the minimum loss you would expect, divided by 5-10. e.g., if you expect the final equation to have a loss of `0.001`, you might set `parsimony=0.0001`.
10. Set `weight_optimize` to some larger value, maybe `0.001`. This is very important if `ncyclesperiteration` is large, so that optimization happens more frequently.
11. Set `turbo` to `True`. This may or may not work; if there's an error, just turn it off (some operators are not SIMD-capable). If it does work, it should give you a nice 20% speedup.
12. For final runs, after I have tuned everything, I typically set `niterations` to some very large value, and just let it run for a week until my job finishes (genetic algorithms tend not to converge: they can look like they settle down, but then find a new family of expressions and explore a new space). If I am satisfied with the current equations (which are visible either in the terminal or in the saved csv file), I quit the job early.

Since I am running in IPython, I can just hit `q` and then `<enter>` to stop the job, tweak the hyperparameters, and then start the search again.
I can also use `warm_start=True` if I wish to continue where I left off (though note that changing some parameters, like `maxsize`, is incompatible with warm starts).
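
The checklist above can be collected into a single set of keyword arguments. The following is a rough sketch only: `tuned_kwargs` is a hypothetical helper, the `+ 5` margin on `maxsize`, the divisor of 10 for `parsimony`, and the `1_000_000` placeholder for "very large" `niterations` are assumptions chosen to match the examples in the list, and the parameter names are those used in this document.

```python
def tuned_kwargs(num_cores, n_points, target_size, expected_loss,
                 binary_operators, unary_operators, on_cluster):
    """Hypothetical helper: turn the tuning checklist into keyword arguments."""
    kwargs = {
        "binary_operators": list(binary_operators),
        "unary_operators": list(unary_operators),
        "populations": 3 * num_cores,
        # Above ~1000 points: batch (the alternative is to subsample instead).
        "batching": n_points > 1000,
        # Leave a bit of room above the size you actually want.
        "maxsize": target_size + 5,
        # Roughly the expected minimum loss divided by 5-10.
        "parsimony": expected_loss / 10,
        "weight_optimize": 0.001,
        "turbo": True,
        # Placeholder for "some very large value"; stop the job by hand.
        "niterations": 1_000_000,
    }
    if on_cluster:
        # Fewer synchronisations between workers on a many-core cluster.
        kwargs["ncyclesperiteration"] = 5000
    if "pow" in binary_operators:
        # Exponent limited to a single variable or constant.
        kwargs["constraints"] = {"pow": (9, 1)}
    trig = [op for op in ("sin", "cos") if op in unary_operators]
    if trig:
        # Forbid nesting trig functions inside each other.
        kwargs["nested_constraints"] = {
            outer: {inner: 0 for inner in trig} for outer in trig
        }
    return kwargs


cluster_kwargs = tuned_kwargs(
    num_cores=100, n_points=5000, target_size=30, expected_loss=0.001,
    binary_operators=["+", "*", "pow"], unary_operators=["sin", "cos"],
    on_cluster=True,
)
```

The resulting dictionary is meant to be unpacked into the regressor, e.g. `PySRRegressor(**cluster_kwargs)`, and then adjusted by hand between runs.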
7 changes: 3 additions & 4 deletions mkdocs.yml
@@ -25,12 +25,11 @@ theme:

nav:
- index.md
- options.md
- examples.md
- operators.md
- tuning.md
- Examples:
- examples.md
- papers.md
- options.md
- papers.md
- Reference:
- api.md
- api-advanced.md
4 changes: 4 additions & 0 deletions pysr/export_sympy.py
@@ -47,6 +47,10 @@
"ceil": sympy.ceiling,
"sign": sympy.sign,
"gamma": sympy.gamma,
"max": lambda x, y: sympy.Piecewise((y, x < y), (x, True)),
"min": lambda x, y: sympy.Piecewise((x, x < y), (y, True)),
"round": lambda x: sympy.ceiling(x - 0.5),
"cond": lambda x, y: sympy.Heaviside(x, H0=0) * y,
}


10 changes: 9 additions & 1 deletion pysr/julia_helpers.py
@@ -94,7 +94,15 @@ def install(julia_project=None, quiet=False, precompile=None): # pragma: no cov
],
)
# Try installing again:
julia.install(quiet=quiet)
try:
    julia.install(quiet=quiet)
except julia.tools.PyCallInstallError:
    warnings.warn(
        "PyCall.jl failed to install on second attempt. "
        + "Please consult the GitHub issue "
        + "https://github.com/MilesCranmer/PySR/issues/257 "
        + "for advice on fixing this."
    )

Main, init_log = init_julia(julia_project, quiet=quiet, return_aux=True)
io_arg = _get_io_arg(quiet)
4 changes: 2 additions & 2 deletions pysr/version.py
@@ -1,2 +1,2 @@
__version__ = "0.16.3"
__symbolic_regression_jl_version__ = "0.22.4"
__version__ = "0.16.4"
__symbolic_regression_jl_version__ = "0.22.5"
