3_method.tex

\section{MOEA/D with priority Functions}

%MOEA/D-RAD, MOEA/D with online Resource Allocation by Diversity Metric, is the variant of MOEA/D proposed in this work. This algorithm uses the maximum relative diversity loss, MRDL, for determining the values of the priority function. 


\begin{algorithm}[h]
	\caption{MOEA/D with priority functions}\label{alg1}
	\begin{algorithmic}[1]
		
		\State Initialize the weight vectors $\lambda_i$, the neighborhood $B_i$, the priority value $u_i$ every subproblem $i=1,...,N$.
		
		\While{\textit{Termination criteria}}
		\For {1 to N}
		\If{$\textit{rand()} < u_i$}
		\State Generate an offspring $y$ for subproblem $i$.
		\State Update the population by $y$.
		\EndIf
		\EndFor
		\State  Evaluate and after $\Delta T$ iterations, keep updating \textit{\textbf{u}} by a priority function.
		\EndWhile
	\end{algorithmic}
	%\vspace{-1em}
\end{algorithm}

In this study we use the basic framework in algorithm~\ref{alg1} with priority functions of MOEA/D-GRA. In contrast to MOEA/D-GRA we only consider the basic algorithm and no other variant. The benefit of using MOEA/D-GRA is that it has a simple code structure and represents well the class of variants of MOEA/D with resource allocation without a population archive. In consequence any priority functions might be easily integrated to the MOEA/D framework. 

This basic algorithm is similar to the MOEA/D-DE~\cite{zhang2009performance} with exception of lines 4 and 7. Line 4 allocation computational resources to a solution based on their priority value $u_i$, while line 7 updates the priority function values. 

Line 4 deals with the selection of solutions given their priority function values, while the line 7 deals with the calculation of the priority function values. All other procedures and parameters are the same as in MOEA/D-DE~\cite{li2009multiobjective}. We highlight that the neighborhood is only calculated in the initialization period.

%\subsection{Priority Functions}

The selection of priority functions provides an important way to control MOEA/D. They allow ways of designing MOEA/D variants that might focus on desired characteristics, such as diversity, performance contribution, convergence to a specific region of the PF or others. This is possible because different methods can be used as priority functions to create the vector $u$ in algorithm~\ref{alg1}. 

In this work we chose to study diversity on the objective space, diversity in the decision space, and the Relative Improvement, from MOEA/D-DRA as our priority functions. Next we give a brief explanation of why we chose to consider these methods as well as a random (control) method and we describe how to calculate them in details.

Independently of the method used to calculate the priority function, we initialize the value of the vector $u=1$, as in MOEA/D-DRA. As in DRA and GRA we have a learning period of $\Delta T$ iterations. Here $\Delta T=20$ for artificial benchmarks, as in MOEA/D-GRA~\cite{zhou2016all}, while for the real-world problems, we chose $\Delta T=2$, by trial and error. A sensitivity analysis should be performed for deciding suitable initial values for $u$ and for $\Delta T$.

 It should also be noted that if the priority function values results in less than 3 subproblems being updated in one iteration, we reset the priority vector $u = 1$  and all subproblems will be chosen for offspring reproduction at the that iteration.

\subsection{Priority Function - Norm of the difference of current solutions and its parents} 

\begin{algorithm}[t]
	\caption{2-Norm}\label{alg3}
	\begin{algorithmic}[1]
		
		\State Input: $X^{t}$ decision vectors of solutions; $X^{t-1}$, decision vectors from the previous solutions; N, the population size.
		\For {i=1 to N} 
			\State u[i] = $||X^{t}_i$ - $X^{t-1}_i||$
		\EndFor
		\State u = scale (u) // between 0 and 1\\
	\Return u
	\end{algorithmic}
\end{algorithm}

To consider diversity on the decision space, we propose a priority function based on the norm of the difference between the current solution and its parent. Algorithm~\ref{alg3} gives the details on implementation.

The priority function proposed that considers diversity on the objective space is based on the (2-)Norm of the difference of the current solution to its parent.

\begin{equation}
 \text{Norm}_i = ||\text{current solution}_i - \text{parent solution}_i||.
\end{equation}

Then, we scale the values to be between 0 and 1, by using the next equation, based on the values from the entire solution set,

\begin{equation}\label{scaling}
 \text{Norm}_i = (\text{Norm}_i - \text{min  Norm}) / (\text{max Norm} - \text{min Norm})
\end{equation}

The idea of using the Norm as priority function is that by considering diversity as the priority function more resources are given to incumbent solutions that are similar to their parents, forcing them to update more often and leading to a higher exploration of the decision space.


\subsection{Priority Function - MRDL} 


\begin{algorithm}[t]
	\caption{MRDL}\label{alg2}
	\begin{algorithmic}[1]
		
		\State Input: old MRDL (initial value is 0); $Y^t$, objective function values from the incumbent solutions; $Y^{t-1}$, objective function values from the incumbent solutions of the previous iteraction; N, the population size.		
		\For {i=1 to N}
		\State find index $h$ where  ($Y^{t-1}_h \succeq Y^t_i$) and $||Y^{t-1}_h - Y^t_i  ||$ is minimal.
		\If {If none is found} 
		\State MRDL[i] = $-\infty$
		\Else
		\State $d.conv = Y^t_i - Y^{t-1}_h$.
		\For {j=1 to N}
		\State $p \prime = Y^{t-1}_j - Y^{t-1}_h$
		\State $c \prime = Y^t_j - Y^t_i$
		\State $proj_{d.conv}*p \prime = \dfrac{sum(conv \cdot p \prime)}{(p \prime \times p \prime)}*p \prime$
		
		\State $ proj_{d.conv}*c \prime = \dfrac{sum(conv \cdot c \prime)}{(c \prime \times c \prime)}*c \prime$
		
		\State $RDL_j = \dfrac{ ||p \prime - proj_{d.conv}p \prime|| }{||c \prime - proj_{d.conv}c \prime||}$\\
		
		\EndFor
		MRDL[i] = maximum $RDL$
		\EndIf
		\EndFor
		\State u = 1 - scale (MRDL - old MRDL) // between 0 and 1\\
	\Return u, MRDL
	\end{algorithmic}
\end{algorithm}

To consider diversity on the objective space, we propose a priority function based on the the Maximum Relative Diversity Loss, MRDL~\cite{gee2015online}.


The diversity on objective space as a priority function  is based on the Maximum Relative Diversity Loss, MRDL~\cite{gee2015online}.  The idea of using MRDL is that by measuring diversity on the objective space, more resources are given to incumbent solutions that have similar objective function values between two consecutive iteractions. Therefore, it is expected that this will lead to a higher exploration of the objective space. Algorithm~\ref{alg2} gives the details on implementation.

The calculation of MRDL depends on the concept of weak dominance~\cite{zitzler2003performance}. A solution $a$ weakly dominates $b$ if in all objectives $a \geq b$ (note that $a \succeq a$).
 
Let $N$ be the number of incumbent solutions and the objective values of iteraction $t$ be $Y^t$ and the objectives values of iteraction $t-1$ be $Y^{t-1}$. For each incumbent solution $i$, find index $h \in Y^{T-1}$. This index is the index of a parent that weak dominates the solution $i$. If $h$ is not found (no parent weak dominates the solution) the MRDL value for this solution is set to $-\infty$. Given $i$ and $h$, for each subproblem, the value of Relative Diversity Loss (RDL) is given by 

\begin{equation}
RDL = \dfrac{ ||p \prime - proj_{d.conv}p \prime|| }{||c \prime - proj_{d.conv}c \prime||}.
\end{equation}


%TODO: brief explaination of the idea of RDL
RDL is a diversity measurement quantity that indicates the amount of diversity loss of an individual solution between two consecutive iterations. High values of RDL imply a reduction of the solution spread, since the further an objective vector of a solution is from the convergence direction, the more it contributes in terms of diversity in the objective space~\cite{gee2015online}. The maximum value of RDL is the MRDL of the solution $i$.


% For every $c_i$, where $c_i$ is the objective function value from incumbent solution $i=1$ to $N$, find an index $h \IN Y^{T-1}$, that holds
%
%\begin{align}\label{first}
%	\begin{split}
%		P_h \succeq C_i\text{ and}\\
%		\text{min } ||P_h - c_i||.,
%	\end{split}
%\end{align}
%
%where $p_h$ weak dominates $c_i$ in the objective space. $p_h$ is the objective function value from a parent solution that holds the equation~\ref{first}. With the index \emph{h}, we can calculate the convergence direction for each incumbent solution as
%
%\begin{equation}\label{second}
%	d.conv = c_i - p_h.
%\end{equation}
%
%Then for each objective function value from incumbent solution $j$, $j=1$ to $N$, (\emph{loop (2)}), we calculate
%
%\begin{align}\label{third}
%	\begin{split}
%		p \prime = P_j - P_h\\
%		c \prime = c_j - c_i.
%	\end{split}
%\end{align}
%
%With those values, we can calculate two \emph{projection directions} related to each objective function value $j$ as
%
%
%\begin{align}\label{fourth}
%\begin{split}
%proj_{d.conv}*p \prime = \dfrac{sum(conv \cdot p \prime)}{(p \prime \times p \prime)}*p \prime\\
%proj_{d.conv}*c \prime = \dfrac{sum(conv \cdot c \prime)}{(c \prime \times c \prime)}*c \prime.
%\end{split}
%\end{align}
%
%With these two directions, we might calculate the Relative Diversity Loss (RDL) for each of convergence direction (see~\ref{second}), (this finishes (\emph{loop (1)})). We calculate RDL as
%
%\begin{equation}
%RDL_j = \dfrac{ ||p \prime - proj_{d.conv}p \prime|| }{||c \prime - proj_{d.conv}c \prime||}.
%\end{equation}
%
%We equal $MRDL_i$ to max $RDL_j$ (this finishes (\emph{loop (1)})).
%
%We subtract the MRDL values from interaction $g$ from the MRDL from iteraction $g-1$). We set the initial value of MRDL as zero (at the first time that MRDL is calculated there is no real subtraction).
%
%\begin{equation}
% MRDL_{final} = MRDL_g - MRDL_{g-1}.
%\end{equation}
%
% Finally, we scale $MRDL_{final}$ using the same principle used in the 2-Norm, see equation~\ref{scaling}. 
%\end{align} 
%Now, we move on how to calculate MRDL. Prior to scalarizing it between $0$ and $1$ to fit the algorithm~\ref{alg1} we calculate MRDL for every individual of the population. The following equation describes how to calculate the priority function given the MRDL, $\Gamma^{p \rightarrow c}$.
%
%
%\vspace{-1em}
%\begin{equation}
%\Gamma_{i}^{p \rightarrow c} = \underset{i=1,...,k}{\max} \Gamma_{d.conv_{y}}^{p \rightarrow c}.
%\end{equation}
%
%
%
%At every iteraction and for each incumbent solution, we need $k$ convergence directions (shown later) and we compute the Relative Diversity Loss (RDL) for each of these $k$ convergence directions. The maximum value among these $k$ convergence directions is chosen as the MRDL for that incumbent solution.
%
%\begin{equation}
%\label{rdl}
%\Gamma_{d.conv_{y}}^{p \rightarrow c} = \dfrac{ ||p \prime - proj_{d.conv_{y}}p \prime|| }{||c \prime - proj_{d.conv_{y}}c \prime||}.
%\end{equation}
%
%To calculate RDL of a solution to the whole population, the following equation is used. It considers every incumbent solution related to a subproblem $i$, from the whole population. This reduction is given by a division between the shortest distance of a parent, $p$,  and offspring, $c$, to the line of convergence direction
%
%%The numerator in~\ref{rdl} is the closest distance between the parent solution ($p$) to the convergence direction $(c_r - p_r)$. While, the denominator in~\ref{rdl} is the closest distance between the offspring solution ($c$) to the convergence direction $(c_r - p_r)$. %The projections of $p\prime$ and $c\prime$ objective vectors onto the convergence direction $d.conv_{y}$ are $proj_{d.conv_{y}}p \prime$ and $proj_{d.conv_{y}}c \prime$.
%
%$p\prime$ and $c\prime $ are given by:
%\vspace{-1em}
%\begin{equation}
%\begin{split}
%p\prime = p - p_r,\\
%c\prime = c - c_s,\\
%\end{split}
%\end{equation}
%
%with $p_r$ and $c_s$ being the parent, and offspring objective vectors used to calculate the convergence direction in equation~\ref{1}. Index $s$ is equal to index $j$ used to calculate $conv_{y}$. The same principle is valid for index $r$. The vector projection between two vectors is defined as next.
%
%\begin{align}
%\begin{split}
%proj_{d.conv_{y}}p \prime = \frac {{d.conv_{y}} \cdot {p \prime}} {(p \times p)^2}{{p \prime}},\\
%proj_{d.conv_{y}}c \prime = \frac {{d.conv_{y}} \cdot {c \prime}} {(c \times c)^2}{{c \prime}}.
%\end{split}
%\end{align}
%
%While the norm of $p \prime - proj_{d.conv_{y}}p \prime$ is calculated as follows using the 2-Norm of this difference. The same procedure is done for $c \prime - proj_{d.conv_{y}}c \prime$.
%
%To estimate the convergence direction, $d.conv_{y}$, we need to have an offspring, $c_j$, that dominates at least one parent. Select a parent, $p_h$, solution that is closest to this offspring in the objective space.  For every weakly dominated parent, one convergence direction is calculated as in the next equation. As in the study by Zitzler et al.~\cite{zitzler2003performance} study, weak dominance ($A \succeq B$) means that any solution in set B is weakly dominated by a solution in set A. However, this does not rule out equality, because $A \succeq A$ for all approximation sets $A$.
%\begin{equation}
%\label{1}
%d.conv_{y} = c_j - p_h
%\end{equation}
%
%Index $j$ (for indexing offsprings, $c_j$) is selected from the set $D_c$. Index $h$ is explained later.
%
%\vspace{-1em}
%\begin{equation}
%\label{D}
%D_c = \{d| \exists c_d \prec p_k, k \in {1,..., N}, d \in [1,..., |C|]\}
%z\end{equation}
%
%$N$ is the parent population size, $|C|$ is the size of the offspring population $C$. In equation~\ref{D}, the offspring $c_d$ must weakly dominate at least one parent solution. Index $h$ (for indexing parents, $p_h$) comes from the following two equations.
%\begin{equation}
%h = \underset{k \in D_p }{argmin} || p_k - c_j ||
%\end{equation}
%
%\begin{equation}
%\label{D_p}
%D_p = \{k| \exists c_j \prec p_k, k \in {1,..., N}\}
%\end{equation}
%
%$D_p$ in equation~\ref{D_p} denotes the index set of parent solutions which are weakly dominated by $c_j$ ($j$ index comes from equation~\ref{D}).
%


\subsection{Priority Function - Relative Improvement}  

\begin{algorithm}[t]
	\caption{Relative Improvement}\label{alg5}
	\begin{algorithmic}[1]
		
		\State Input:   $Y^t$, objective function values from the incumbent solutions; $Y^{t-\Delta T}$, objective function values from incumbent solution of iteration $t -\Delta T$, $u$ from the previous $\Delta T$ iteration;
		\For {i=1 to N}
		\State $\delta[i] = \frac{Y^t[i] - Y^{t-1}[i]}{Y^t[i]}$
		\If {$\delta[i] > 0.001$} 
		\State u[i] = $(0.95 + 0.05 \cdot \frac{\delta[i]}{0.001}) \cdot u[i]$
		\Else 
		\State u[i] = 1
		\EndIf
		
		
		\EndFor
		
		\State u / (max(u) + $1.0 x 10^{-50}$)\\
		\Return u
	\end{algorithmic}
\end{algorithm}

Here we give a brief description of the Relative Improvement (R.I.), the priority function used in MOEA/D-DRA, MOEA/D-GRA and others. This priority function aims to measure subproblem hardness and then it helps allocating more resource to subproblems that have  improved more over the next few iterations. Algorithm~\ref{alg5} gives the details on implementation of the equation~\ref{priority}. 

We highlight that R.I. was first introduced in the context of the unconstrained MOEA competition in the CEC 2009~\cite{zhang2009performance}, being the winner of that competition~\cite{zhang2008multiobjective}. Also, in this competition the UF benchmark functions were introduced.

\subsection{Priority Function - Random}

\begin{algorithm}[t]
	\caption{Random}\label{alg4}
	\begin{algorithmic}[1]
		
		\State Input:  N, the population size.
		\For {i=1 to N}
		\State u[i] = random value between 0 and 1
		\EndFor
		\Return u
	\end{algorithmic}
\end{algorithm}

The random priority function is used as basis for comparison. Given no information besides the size of the population, we define the vector of priority $u$ from a uniform distribution. Algorithm~\ref{alg4} gives the details on implementation. 
%
%
%
%Does is help? When we consider more sophisticated priority functions, are we making any improvement?
%
% Using no information. We select solutions at random.
% 
% if it is best
%
%bigger population might be worse, since if a subpopulation better results were found - to much effort for nothing
%smaller population might be better, since if a subpopulation better results were found - effort resource better spent
%subpopulation from a bigger population is better than smaller population with the subpopulation size?