Commit f43397b: Merge branch 'dev'
r_prag01 committed May 21, 2023
2 parents 2dfd4a0 + 530f2d4
Showing 31 changed files with 2,171 additions and 65 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/gen-test-rsc.yml
name: Generate Linux test resources

on:
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.8"]

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies and generate test data
        run: |
          python -m pip install --upgrade pip
          python -m pip install -r requirements.txt -r requirements_test.txt
          python tests/generate_resources.py
      - name: Upload test data
        uses: actions/upload-artifact@v3
        with:
          name: test_data
          path: tests/resources/
2 changes: 2 additions & 0 deletions docs/source/api.rst
API
=====
This section contains the technical documentation and docstrings of all public functions of `pflacco` and is organized by the respective modules.

.. toctree::
:maxdepth: 2

130 changes: 130 additions & 0 deletions docs/source/cell_mapping.rst
Cell Mapping Features
=====================
The idea of cell mapping is that a continuous search space is partitioned in every dimension, thereby discretizing the original search space into cells.
This discretization of the original sample into cells allows the computation of features which help to characterize the global structure or multimodality of an optimization problem.
Based on this approach, three different feature sets can be computed: angle (``calculate_cm_angle``), convexity (``calculate_cm_conv``) and gradient homogeneity (``calculate_cm_grad``).

Prerequisites
-------------
As will become clear in the remainder of this page, all cell mapping features rely on the aforementioned cells. These cells are the product of the discretization of the decision space, where
each dimension is divided into ``blocks``. For the next example, we assume a two-dimensional objective function/optimization problem.
Then, we could define ``blocks = 3`` or ``blocks = [3, 4]``. The former would yield :math:`3 \cdot 3 = 9` cells and the latter :math:`3 \cdot 4 = 12` cells.

It is important that:

* **each dimension has at least 3 blocks** and
* **each cell has at least 3 observations**.

Otherwise, the resulting feature values would not have any inherent predictive power.
As a result, all cell mapping features can only be calculated when the sample size :math:`n` of a :math:`d`-dimensional problem instance fulfills the following requirement:

.. math::

   3 \cdot \text{blocks}^{d} \leq n

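As an illustrative aside, the requirement above can be checked before sampling. The helper below is a sketch and not part of pflacco's API:

```python
import numpy as np

def min_sample_size(blocks, dim):
    """Minimum sample size so that each of the blocks**dim cells
    can contain at least 3 observations (illustrative helper,
    not part of pflacco)."""
    # Allow a scalar (same number of blocks per dimension) or a list
    # with one entry per dimension.
    blocks = np.asarray(blocks)
    n_cells = int(blocks.prod()) if blocks.ndim > 0 else int(blocks) ** dim
    return 3 * n_cells

# A 2-dimensional problem with 3 blocks per dimension needs
# at least 3 * 3**2 = 27 observations.
print(min_sample_size(3, 2))       # 27
print(min_sample_size([3, 4], 2))  # 36
```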
Angle
-----
The initial idea of the angle features (``calculate_cm_angle``) is that the best and worst values within the cells might provide some insight into the underlying function’s landscape.
If those two observations lie in opposite directions, it indicates a trend within the cell.
In that case the angle between the vectors from cell center to worst value and cell center to best value would be close to 180°.
The angles of all cells from the grid will then be aggregated using the mean and the standard deviation.

.. code-block:: python3

   from pflacco.sampling import create_initial_sample
   from pflacco.classical_ela_features import calculate_cm_angle

   # Arbitrary objective function
   def objective_function(x):
       return sum(x**2)

   dim = 2
   # Create initial sample using Latin hypercube sampling
   X = create_initial_sample(dim, sample_type = 'lhs')
   # Calculate the objective values of the initial sample
   # using an arbitrary objective function
   y = X.apply(lambda x: objective_function(x), axis = 1)
   # Compute the cell mapping angle feature set from the conventional ELA features
   cm_angle = calculate_cm_angle(X, y, blocks = 3)

Furthermore, the standard deviation and mean of the lengths of the two above-mentioned vectors (i.e. distances from the center of a cell to the best/worst observation within that cell) are used as additional features. In case of simple functions (such as the sphere function), the variation should be low, as the majority of the cells should have similar distances, because the best/worst observations usually lie close to the borders of the cells. In case of very multimodal functions, the variation should be rather high, as cells with local optima result in contrary distances (short distances to the best values and long distances to the worst values) compared to cells without any local optima.

Since interactions between cells are ignored, i.e. these features are computed locally per cell, the features are considered to be independent from the search space dimensionality.
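The per-cell angle computation can be sketched as follows. This is an illustrative reimplementation with NumPy, not pflacco's internal code:

```python
import numpy as np

def cell_angle(center, best, worst):
    """Angle (in degrees) between the vectors from the cell center
    to the best and to the worst observation within the cell
    (illustrative sketch, not pflacco's internal implementation)."""
    v_best = np.asarray(best, dtype=float) - np.asarray(center, dtype=float)
    v_worst = np.asarray(worst, dtype=float) - np.asarray(center, dtype=float)
    cos = np.dot(v_best, v_worst) / (np.linalg.norm(v_best) * np.linalg.norm(v_worst))
    # Clip guards against floating-point values slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Best and worst observations on opposite sides of the cell center
# indicate a trend through the cell: the angle is close to 180°.
print(cell_angle([0, 0], [1, 0], [-1, 0]))  # 180.0
```

The angles of all cells would then be aggregated using the mean and the standard deviation, as described above.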

.. image:: ./figures/angle.svg


(Inspired by Kerschke, P. et al., 2014 [#r1]_)

Cell Convexity
--------------
For this feature set (``calculate_cm_conv``), all possible combinations of three (linearly) neighbouring cells within the grid are computed.
By default, only horizontally and vertically neighbouring cells are considered. By adding ``cm_conv_diag = True`` to the function call, diagonally neighbouring cells are considered as well.

.. code-block:: python3

   from pflacco.sampling import create_initial_sample
   from pflacco.classical_ela_features import calculate_cm_conv

   # Arbitrary objective function
   def objective_function(x):
       return sum(x**2)

   dim = 3
   # Create initial sample using Latin hypercube sampling
   X = create_initial_sample(dim, sample_type = 'lhs')
   # Calculate the objective values of the initial sample
   # using an arbitrary objective function
   y = X.apply(lambda x: objective_function(x), axis = 1)
   # Compute the cell mapping convexity feature set from the conventional ELA features
   cm_conv = calculate_cm_conv(X, y, blocks = 3)

During the computation of the cell mapping convexity features, only the cells’ representatives are considered. Based on those prototypes, the concavity or convexity of the landscape is approximated.

Given the function evaluations of the three neighbouring cells, this feature computes the convex combination of f(x\ :sub:`1`) and f(x\ :sub:`3`). That value is then compared to the corresponding value of f(x\ :sub:`2`).
The figure below illustrates the resulting decision, i.e. whether a combination indicates convexity or concavity. Just place the value of f(x\ :sub:`2`) above x\ :sub:`2` and read off the corresponding decision.
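The comparison can be sketched as below. This is an illustrative decision rule; the tolerance ``kappa`` is an assumed parameter, not pflacco's default:

```python
def convexity_decision(y1, y2, y3, kappa=0.1):
    """Compare the convex combination of the outer cells' objective
    values with the middle cell's value (illustrative sketch of the
    decision; kappa is an assumed tolerance, not pflacco's default)."""
    combination = 0.5 * (y1 + y3)  # convex combination of f(x1) and f(x3)
    if y2 < combination - kappa:
        return "convex"    # middle cell lies clearly below the combination
    if y2 > combination + kappa:
        return "concave"   # middle cell lies clearly above the combination
    return "linear"

print(convexity_decision(1.0, 0.2, 1.0))  # convex
print(convexity_decision(1.0, 1.9, 1.0))  # concave
```

Over all triples of neighbouring cells, the fractions of convex and concave decisions would then serve as features.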

.. image:: ./figures/convexity.svg


(Inspired by Kerschke, P. et al., 2014 [#r1]_)

Gradient Homogeneity
--------------------
For every point within a cell’s sample, the nearest neighbour is identified. Afterwards, the normalized vectors, which are always rotated towards the better point, are computed.
Then, all normalized vectors are summed up and divided by the maximal possible vector length (i.e. the number of points).
In case of rather randomly distributed objective values, this fraction should be close to zero, as the vectors point in different directions.
In case of a strong trend, the value should be close to one (i.e. all vectors point in the same direction).

.. code-block:: python3

   from pflacco.sampling import create_initial_sample
   from pflacco.classical_ela_features import calculate_cm_grad

   # Arbitrary objective function
   def objective_function(x):
       return sum(x**2)

   dim = 3
   # Create initial sample using Latin hypercube sampling
   X = create_initial_sample(dim, sample_type = 'lhs')
   # Calculate the objective values of the initial sample
   # using an arbitrary objective function
   y = X.apply(lambda x: objective_function(x), axis = 1)
   # Compute the cell mapping gradient homogeneity feature set from the conventional ELA features
   cm_grad = calculate_cm_grad(X, y, blocks = 3)

Those values are then aggregated over all cells, again using the mean and the standard deviation. Simple unimodal functions should thus generate very high mean values.
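The per-cell computation described above can be sketched with NumPy. This is an illustrative reimplementation, not pflacco's internal code:

```python
import numpy as np

def gradient_homogeneity(X, y):
    """Gradient homogeneity of a single cell's sample (illustrative
    sketch, not pflacco's internal implementation): for every point,
    take the normalized vector to its nearest neighbour, oriented
    towards the better (smaller) objective value, then relate the
    length of the summed vector to the number of points."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise distances; exclude each point from being its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nn = dist.argmin(axis=1)          # index of the nearest neighbour
    directions = X[nn] - X            # vector towards the neighbour
    # Rotate each vector so that it points towards the better point.
    signs = np.where(y[nn] <= y, 1.0, -1.0)
    directions *= signs[:, None]
    normalized = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    return np.linalg.norm(normalized.sum(axis=0)) / len(X)

# A clear trend: points on a line with monotonically decreasing y,
# so all rotated vectors point in the same direction.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([3.0, 2.0, 1.0, 0.0])
print(gradient_homogeneity(X, y))  # 1.0
```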

.. image:: ./figures/gradienthomogeneity.svg


(Inspired by Kerschke, P. et al., 2014 [#r1]_)

.. rubric:: Literature Reference

.. [#r1] Kerschke, P., Preuss, M., Hernandez, C., Schuetze, O., Sun, J.-Q., Grimme, C., Rudolph, G., Bischl, B., and Trautmann, H. (2014): “Cell Mapping Techniques for Exploratory Landscape Analysis”, in: EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V, pp. 115-131, Springer (http://dx.doi.org/10.1007/978-3-319-07494-8_9).
46 changes: 46 additions & 0 deletions docs/source/classical_ela_features.rst
Classical ELA Features
======================
The term *Exploratory Landscape Analysis (ELA)* features (as introduced by Mersmann et al., 2011 [#r1]_) summarizes a group of characteristics which quantify certain properties of a continuous optimization problem.
In its original version, ELA covered a total of 50 features - grouped into six so-called low-level properties (Convexity, Curvature, y-Distribution, Levelset, Local Search and Meta Model).
These (numerical values) were used to characterize (usually categorical and expert-designed) high-level properties, such as the Global Structure, Multimodality or Variable Scaling.
The figure below visualizes the connections between the low- and high-level properties.

.. image:: ./figures/ela1.png
:align: center


(Inspired by Mersmann et al., 2011 [#r1]_)

A detailed description of the features can be found in Mersmann et al. (2011) [#r1]_.
Below you find a code example.

.. code-block:: python3

   from pflacco.sampling import create_initial_sample
   from pflacco.classical_ela_features import *

   # Arbitrary objective function
   def objective_function(x):
       return sum(x**2)

   dim = 3
   # Create initial sample using Latin hypercube sampling
   X = create_initial_sample(dim, sample_type = 'lhs')
   # Calculate the objective values of the initial sample
   # using an arbitrary objective function
   y = X.apply(lambda x: objective_function(x), axis = 1)
   # Compute the 3 feature sets from the classical ELA features
   # which are solely based on the initial sample
   ela_meta = calculate_ela_meta(X, y)
   ela_distr = calculate_ela_distribution(X, y)
   ela_level = calculate_ela_level(X, y)
   # Compute the remaining 3 feature sets from the classical ELA features
   # which do require additional function evaluations
   ela_local = calculate_ela_local(X, y, f = objective_function, dim = dim, lower_bound = -1, upper_bound = 1)
   ela_curv = calculate_ela_curvate(X, y, f = objective_function, dim = dim, lower_bound = -1, upper_bound = 1)
   ela_conv = calculate_ela_conv(X, y, f = objective_function)

.. rubric:: Literature Reference

.. [#r1] Mersmann et al. (2011), “Exploratory Landscape Analysis”, in Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pp. 829-836, ACM (http://dx.doi.org/10.1145/2001576.2001690).
2 changes: 1 addition & 1 deletion docs/source/conf.py
copyright = '2022, Raphael Patrick Prager'
author = 'Raphael Patrick Prager'
version = '1.2'
-release = '1.2.0'
+release = '1.2.1'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
30 changes: 30 additions & 0 deletions docs/source/dispersion.rst
Dispersion Features
===================
The dispersion features compare the dispersion, i.e. the (aggregated) pairwise distances, of all points in the initial design with the dispersion among the best points in the initial design.
By default, this set of “best points” is based on the 2%, 5% and 10% quantiles of the objective values. Those dispersions are then compared based on their ratio as well as their difference.
For a complete overview of the features, please refer to the documentation of :func:`pflacco.classical_ela_features.calculate_dispersion` and the work of Lunacek and Whitley (2006) [#r1]_.
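The core ratio comparison can be sketched with NumPy. This is an illustrative reimplementation for a single quantile, not pflacco's internal code:

```python
import numpy as np

def dispersion_ratio(X, y, quantile=0.1):
    """Ratio of the mean pairwise distance among the best points
    (objective values at or below the given quantile) to the mean
    pairwise distance among all points (illustrative sketch, not
    pflacco's internal implementation)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def mean_pairwise_distance(points):
        dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
        iu = np.triu_indices(len(points), k=1)  # each pair counted once
        return dist[iu].mean()

    best = X[y <= np.quantile(y, quantile)]
    return mean_pairwise_distance(best) / mean_pairwise_distance(X)

# For a unimodal function, the best points cluster near the optimum,
# so the ratio is clearly below 1.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X ** 2).sum(axis=1)
print(dispersion_ratio(X, y) < 1)  # True
```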

Below you find a code example.

.. code-block:: python3

   from pflacco.sampling import create_initial_sample
   from pflacco.classical_ela_features import calculate_dispersion

   # Arbitrary objective function
   def objective_function(x):
       return sum(x**2)

   dim = 3
   # Create initial sample using Latin hypercube sampling
   X = create_initial_sample(dim, sample_type = 'lhs')
   # Calculate the objective values of the initial sample
   # using an arbitrary objective function
   y = X.apply(lambda x: objective_function(x), axis = 1)
   # Compute the dispersion feature set from the conventional ELA features
   disp = calculate_dispersion(X, y)

.. rubric:: Literature Reference

.. [#r1] Lunacek, M. and Whitley, D. (2006), “The Dispersion Metric and the CMA Evolution Strategy”, in Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pp. 477-484, ACM (http://dx.doi.org/10.1145/1143997.1144085).
19 changes: 19 additions & 0 deletions docs/source/feature_sets.rst
Feature Sets
============

The following subsections give a brief conceptual introduction to the different feature sets of pflacco.
This section draws heavily on the work of Prof. Dr. Pascal Kerschke, who graciously agreed to provide most of the textual and visual content in this section.

The technical documentation of the feature sets below can be found in the :doc:`api` section.



.. toctree::
:maxdepth: 2

classical_ela_features
cell_mapping
dispersion
information_content
nbc
pca