Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update random_seed behavior in config to be less confusing. #1309

Merged
merged 8 commits into from
Sep 11, 2024
10 changes: 10 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,20 @@ Dependency Changes
API Changes
-----------

- Changed the behavior of random_seed. (See below) For most use cases, this is essentially a bug
fix, but if users were relying on the old behavior, you may need to change your config file to
work with the new behavior. See `Image Field Attributes` for more details about the new
behavior. (#1309)


Config Updates
--------------

- Changed the behavior of random_seed to be less confusing. Now the first random_seed is always
converted into a sequence based on obj_num, and later ones (if any) in a list are not.
If you want a non-standard seed sequence, you should now put it in a list somewhere after
the first item. The first item will always evaluate as an integer value and create a sequence
based on that indexed by obj_num. (#1309)


New Features
Expand Down
12 changes: 3 additions & 9 deletions docs/config_image.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,10 @@ Some attributes that are allowed for all image types are:
* ``random_seed`` = *int_value* or *list* (optional) The random seed to use for random numbers in the simulation.

* The typical use case is that this is a simple integer value. Then, internally we scramble this value in a deterministic way and use a sequence of seeds based on the scrambled value so that the output is deterministic even when using multiple processes to build each image.
* If ``random_seed`` is a string, then it is parsed as a integer and treated in the same way as above. This allows you to do a simple calculation using the ``$`` Eval shorthand to e.g. compute the initial seed from some other kind of ID number you might have in the config file. It also means you don't have to be careful in your YAML writing to not accidentally write e.g. ``'1234'`` rather than ``1234``.
* If ``random_seed`` is a dict, it should parse as an *int_value*, but in this case, GalSim will not convert it into a sequence based on a single value. Rather, it will respect your specification and evaluate the seed for each object as you specifiy. This let's you do advanced things like use a different cadence for your random numbers (e.g. potentially to repeat the same seed for multiple objects; cf. `Demo 7`).
* If ``random_seed`` is a list, then multiple random number generators will be available for each object according to the multiple seed specifications. This is normally used to have one random number behave normally for noise and such, and another one repeat with some cadence (e.g. repeat for each image in an exposure to make sure you generate the same PSFs for multiple CCDs in an exposure). Whenever you want to use an rng other than the first one, add ``rng_num`` to the field and set it to the number of the rng you want to use in this list. Each item in the list is parsed as above.
* If ``random_seed`` is a list, then the first one will be converted as described above, but the later ones will not. Rather, it will respect your specification and evaluate the seed for each object as you specifiy. This will create multiple random number generators, according to the multiple seed specifications. This is normally used to have one random number behave normally for noise and such, and another one (or more) repeat with some other cadence (e.g. repeat for each image in an exposure to make sure you generate the same PSFs for multiple CCDs in an exposure). See `Demo 7` and `Demo 13` for examples of this. Whenever you want to use an rng other than the first one, add ``rng_num`` to the field and set it to the number of the rng you want to use in this list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe specify that rng_num starts at 0.


.. warning::

Within this ``random_seed`` list, any items that are either an int or str will be
evaluated once and converted into a sequence in the normal way. So if you want an item
to be some complex Eval operation, you should make it a dict with an explicit
``type: Eval``, rather than use the ``$`` shorthand notation.
.. note::
We use 0-based indexing for ``rng_num``, so the first item in the ``random_seed`` list is ``rng_num=0``, the second one is ``rng_num=1``, etc.

* The default behavior, if ``random_seed`` is not given, is to get a seed from the system (/dev/urandom if possible, otherwise based on the time).

Expand Down
33 changes: 17 additions & 16 deletions examples/demo7.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,8 @@ def main(argv):

file_name = os.path.join('output','cube_phot.fits.gz')

random_seed = 553728
gal_random_seed = 553728
noise_random_seed = galsim.BaseDeviate(553728).raw()
sky_level = 1.e4 # ADU / arcsec^2
pixel_scale = 0.28 # arcsec
nx = 64
Expand Down Expand Up @@ -222,20 +223,25 @@ def main(argv):
logger.debug(' Start work on image %d',i)
t1 = time.time()

# Initialize the random number generator we will be using.
rng = galsim.UniformDeviate(random_seed+k+1)
# Initialize the random number generators we will be using.
# We have one for the galaxy parameters, and then a different one
# for the noise in each of the two stamps.
# This ensures that the results of this script match those of the config run.
gal_rng = galsim.UniformDeviate(gal_random_seed+k+1)
fft_noise_rng = galsim.UniformDeviate(noise_random_seed+2*k+1)
phot_noise_rng = galsim.UniformDeviate(noise_random_seed+2*k+2)

# Generate random variates:
flux = rng() * (gal_flux_max-gal_flux_min) + gal_flux_min
flux = gal_rng() * (gal_flux_max-gal_flux_min) + gal_flux_min

# Use a new variable name, since we'll want to keep the original unmodified.
this_gal = gal.withFlux(flux)

hlr = rng() * (gal_hlr_max-gal_hlr_min) + gal_hlr_min
hlr = gal_rng() * (gal_hlr_max-gal_hlr_min) + gal_hlr_min
this_gal = this_gal.dilate(hlr)

beta_ellip = rng() * 2*math.pi * galsim.radians
ellip = rng() * (gal_e_max-gal_e_min) + gal_e_min
beta_ellip = gal_rng() * 2*math.pi * galsim.radians
ellip = gal_rng() * (gal_e_max-gal_e_min) + gal_e_min
gal_shape = galsim.Shear(e=ellip, beta=beta_ellip)
this_gal = this_gal.shear(gal_shape)

Expand Down Expand Up @@ -273,27 +279,22 @@ def main(argv):

# Add Poisson noise
sky_level_pixel = sky_level * pixel_scale**2
fft_image.addNoise(galsim.PoissonNoise(rng, sky_level=sky_level_pixel))
fft_image.addNoise(galsim.PoissonNoise(fft_noise_rng, sky_level=sky_level_pixel))

t4 = time.time()

# The next two lines are just to get the output from this demo script
# to match the output from the parsing of demo7.yaml.
rng = galsim.UniformDeviate(random_seed+k+1)
rng(); rng(); rng(); rng();

# Repeat for photon shooting image.
# The max_extra_noise parameter indicates how much extra noise per pixel we are
# willing to tolerate. The sky noise will be adding a variance of sky_level_pixel,
# so we allow up to 1% of that extra.
final.drawImage(phot_image, method='phot', max_extra_noise=sky_level_pixel/100,
rng=rng)
rng=phot_noise_rng)
t5 = time.time()

# For photon shooting, galaxy already has Poisson noise, so we want to make
# For photon shooting, the galaxy already has Poisson noise, so we want to make
# sure not to add that noise again! Thus, we just add sky noise, which
# is Poisson with the mean = sky_level_pixel
pd = galsim.PoissonDeviate(rng, mean=sky_level_pixel)
pd = galsim.PoissonDeviate(phot_noise_rng, mean=sky_level_pixel)
# DeviateNoise just adds the action of the given deviate to every pixel.
phot_image.addNoise(galsim.DeviateNoise(pd))
# For PoissonDeviate, the mean is not zero, so for a background-subtracted
Expand Down
33 changes: 26 additions & 7 deletions examples/demo7.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,8 @@
# - stamp : draw_method (fft or phot)
# - stamp : gsparams
# - output : file_name with .gz, .bz2 or .fz extension automatically uses compression.
# - Multiple random seeds (particular so one can repeat the same values for multiple images)
# - rng_num specification in various fields


# Define the PSF profiles
Expand Down Expand Up @@ -135,6 +137,13 @@ gal :
e : { type : Random , min : 0.0 , max : 0.8 }
beta : { type : Random }

# This line tells GalSim to use the image_num-indexed rng for all the random parameters
# in this field of the config. Specifying rng_num anywhere in the config dict will apply
# to all items in that field at that level or lower, unless overridden by another rng_num
# specification at a lower level. (See image.random_seed below for how we set the seed
# for this rng.)
rng_num: 1


# Define some other information about the images
image :
Expand All @@ -158,13 +167,23 @@ image :
noise :
sky_level : 1.e4 # ADU / arcsec^2

# This is pretty contrived. It is really only relevant for this particular demo.
# The usual case is to just set a single number, which really means that each object
# gets a sequential value starting with this number. In this case we want to use the
# same value for both the fft and phot images to get the same values for the random
# parameters. To implement this, we tell it to use the image_num for the sequencing key
# rather than the usual obj_num.
random_seed : { type : Sequence , first : 553728 , index_key : 'image_num' }
# The usual way we set the random_seed is to just set a single number, which really means that
# each object gets a sequential value starting with something based on this number.
# This is fine for the noise, but it wouldn't work for the galaxy parameters, which we want
# to be the same for both fft and photon-shooting images.
# To get an rng that gives the same values for both, we need a second one indexed by
# image_num, rather than obj_num. This will repeat the seed values for the two galaxies
# in each image, which we want to have identical properties.
# Note: Starting in verson 2.6, GalSim always converts the first random_seed to a sequence
# indexed by obj_num, so each galaxy gets a new rng. Any more complicated seed specifications
# need to be done in rng_num=1 or higher (i.e. any but the first item in the list).
random_seed :
# rng_num=0 is implicitly converted to a sequence indexed by obj_num. Used for noise here.
- 553728

# rng_num=1 is the one we use for the galaxy parameters so they are the same for
# both fft and photon shooting images. We set rng_num=1 in the galaxy field above.
- { type : Sequence , first : 553728 , index_key : 'image_num' }


# Define some image properties that are specific to the stamp generation phase:
Expand Down
44 changes: 31 additions & 13 deletions galsim/config/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@

from ..utilities import SimpleGenerator, single_threaded
from ..random import BaseDeviate
from ..errors import GalSimConfigError, GalSimConfigValueError, GalSimError
from ..errors import GalSimConfigError, GalSimConfigValueError, GalSimError, galsim_warn

max_queue_size = 32767 # This is where multiprocessing.Queue starts to have trouble.
# We make it a settable parameter here really for unit testing.
Expand Down Expand Up @@ -392,18 +392,36 @@ def ParseRandomSeed(config, param_name, base, seed_offset):
# start a sequential sequence of seeds for each object.
# However, we allow for random_seed to be a gettable parameter, so for the
# normal case, we just convert it into a Sequence.
if isinstance(config[param_name], (int, str)):
first = ParseValue(config, param_name, base, int)[0]
seed_rng = BaseDeviate(first)
new_first = seed_rng.raw()
# Note: This new "first" seed is actually the seed value to use for anything at file or
# image scope using the obj_num of the first object in the file or image.
# Seeds for objects will start at 1 more than this.
config[param_name] = {
'type' : 'Sequence',
'index_key' : 'obj_num',
'first' : new_first
}
# We only do this for the first item in the list though. Any items after the first
# are left as they are, so the user can do something more complicated if they want.
if param_name in ['random_seed', 0] and not (
isinstance(config[param_name], dict) and config[param_name].get('default', False)):
if (isinstance(config[param_name], dict)
and config[param_name].get('type',None) == 'Sequence'):
# Not really a deprecation, but similar.
# Warn users of the change in behavior.
galsim_warn('You have a custom Sequence as the (first) random_seed. '
'As of GalSim version 2.6, this will be parsed as an integer '
'and converted to a new sequence indexed by obj_num, '
'rather than whatever sequence you have specified. '
'You probably want to put your custom random_seed sequence '
'as the second item in the random_seed list and use rng_num=1.')

first = ParseValue(config, param_name, base, int)[0]
seed_rng = BaseDeviate(first)
new_first = seed_rng.raw()
# Note: This new "first" seed is actually the seed value to use for anything at file or
# image scope using the obj_num of the first object in the file or image.
# Seeds for objects will start at 1 more than this.
config[param_name] = {
'type' : 'Sequence',
'index_key' : 'obj_num',
'first' : new_first,
# This is out indicator that the dict has been converted from an initial
# integer value to generate the sequence. After the first time, we don't
# want to redo this step.
'default' : True
}

index_key = base['index_key']
if index_key == 'obj_num':
Expand Down
12 changes: 11 additions & 1 deletion tests/test_config_image.py
Original file line number Diff line number Diff line change
Expand Up @@ -2332,7 +2332,7 @@ def test_multirng():
np.testing.assert_array_equal(im.array, images2[n].array)
np.testing.assert_array_equal(im.array, images3[n].array)

# Finally, test invalid rng_num
# Test invalid rng_num
config4 = galsim.config.CopyConfig(config)
config4['image']['world_pos']['rng_num'] = -1
with assert_raises(galsim.GalSimConfigError):
Expand All @@ -2354,6 +2354,16 @@ def test_multirng():
with assert_raises(galsim.GalSimConfigError):
galsim.config.BuildImage(config7)

# Check that a warning is given if the user uses a Sequence for the first random seed.
config8 = galsim.config.CopyConfig(config)
config8['image']['random_seed'] = {
'type': 'Sequence',
'repeat': 3
}
with assert_warns(galsim.GalSimWarning):
galsim.config.BuildImage(config8)


@timer
def test_sequential_seeds():
"""Test using sequential seeds for successive images.
Expand Down
32 changes: 17 additions & 15 deletions tests/test_config_output.py
Original file line number Diff line number Diff line change
Expand Up @@ -1459,7 +1459,7 @@ def test_eval_full_word():
# is basically the same.
'random_seed' : [
# Used for noise and nobjects.
{ 'type' : 'Sequence', 'index_key' : 'obj_num', 'first' : 1234 },
12345,
# Used for objects. Repeats sequence for each image in file
{
'type' : 'Eval',
Expand Down Expand Up @@ -1561,12 +1561,14 @@ def test_eval_full_word():
data0 = np.genfromtxt('output/test_eval_full_word_0.dat', names=True, deletechars='')
data1 = np.genfromtxt('output/test_eval_full_word_1.dat', names=True, deletechars='')

assert len(data0) == 18 # 9 obj each for first two exposures
assert len(data1) == 24 # 12 obj each for next two exposures
data00 = data0[:9]
data01 = data0[9:]
data10 = data1[:12]
data11 = data1[12:]
n1 = 11 # 11 obj each for first two exposures
n2 = 14 # 14 obj each for next two exposures
assert len(data0) == 2*n1
assert len(data1) == 2*n2
data00 = data0[:n1]
data01 = data0[n1:]
data10 = data1[:n2]
data11 = data1[n2:]

# Check exposure = image_num
np.testing.assert_array_equal(data00['exposure'], 0)
Expand All @@ -1575,21 +1577,21 @@ def test_eval_full_word():
np.testing.assert_array_equal(data11['exposure'], 3)

# Check obj_num
np.testing.assert_array_equal(data00['num'], range(0,9))
np.testing.assert_array_equal(data01['num'], range(9,18))
np.testing.assert_array_equal(data10['num'], range(18,30))
np.testing.assert_array_equal(data11['num'], range(30,42))
np.testing.assert_array_equal(data00['num'], range(0,n1))
np.testing.assert_array_equal(data01['num'], range(n1,2*n1))
np.testing.assert_array_equal(data10['num'], range(2*n1,2*n1+n2))
np.testing.assert_array_equal(data11['num'], range(2*n1+n2,2*n1+2*n2))

# Check that galaxy properties are identical within exposures, but different across exposures
for key in ['pos.x', 'pos.y', 'ra.rad', 'dec.rad', 'flux', 'size']:
np.testing.assert_array_equal(data00[key], data01[key])
np.testing.assert_array_equal(data10[key], data11[key])
assert np.all(np.not_equal(data00[key], data10[key][:9]))
assert np.all(np.not_equal(data00[key], data10[key][:n1]))

# PSFs should all be different, but only in the mean
assert np.all(np.not_equal(data00['psf_fwhm'], data01['psf_fwhm']))
assert np.all(np.not_equal(data10['psf_fwhm'], data11['psf_fwhm']))
assert np.all(np.not_equal(data00['psf_fwhm'], data10['psf_fwhm'][:9]))
assert np.all(np.not_equal(data00['psf_fwhm'], data10['psf_fwhm'][:n1]))
np.testing.assert_array_almost_equal(data00['psf_fwhm'] - np.mean(data00['psf_fwhm']),
data01['psf_fwhm'] - np.mean(data01['psf_fwhm']))
np.testing.assert_array_almost_equal(data10['psf_fwhm'] - np.mean(data10['psf_fwhm']),
Expand All @@ -1599,13 +1601,13 @@ def test_eval_full_word():
# be in the Gaussian noise.
im00, im01 = galsim.fits.readMulti('output/test_eval_full_word_0.fits')
assert np.all(np.not_equal(im00.array, im01.array))
assert abs(np.mean(im00.array - im01.array)) < 0.1
assert abs(np.mean(im00.array - im01.array)) < 0.5
assert 13.5 < np.std(im00.array - im01.array) < 15 # should be ~10 * sqrt(2)
assert np.max(np.abs(im00.array)) > 200 # Just verify that many values are quite large

im10, im11 = galsim.fits.readMulti('output/test_eval_full_word_1.fits')
assert np.all(np.not_equal(im10.array, im11.array))
assert abs(np.mean(im10.array - im11.array)) < 0.1
assert abs(np.mean(im10.array - im11.array)) < 0.5
assert 13.5 < np.std(im10.array - im11.array) < 15
assert np.max(np.abs(im10.array)) > 200

Expand Down
Loading