From bba0fb22ae7e6acadbe4c58ef2e6ebea5535d1be Mon Sep 17 00:00:00 2001
From: Mikis Stasinopoulos
Date: Mon, 21 Oct 2024 12:00:21 +0100
Subject: [PATCH] selection more

---
 vignettes/selection.qmd | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/vignettes/selection.qmd b/vignettes/selection.qmd
index 7e11580..ce1adc1 100644
--- a/vignettes/selection.qmd
+++ b/vignettes/selection.qmd
@@ -32,7 +32,7 @@ g({\theta}_{ki}) &=& b_0 + s_1({x}_{1i}) + \ldots, s_p({x}_{pi})
 \end{split}
 $$ {#eq-GAMLSS}
 where ${D}( )$ is the assumed distribution which depends on parameters $\theta_{1i}, \ldots, \theta_{ki}$ and where all the parameters can be functions of the explanatory variables $({x}_{1i}, \ldots, {x}_{pi})$.
-In reality we do not know the distribution ${D}( )$ and also we do not know **which** and **how** the variables $({x}_{1i}, \ldots, {x}_{pi})$ effect the parameters $\theta_{1i}, \ldots, \theta_{ki}$. So the model selection in a distributional regression model takes the form of;
+In reality we do not know the distribution ${D}( )$, nor do we know **which** of the variables $({x}_{1i}, \ldots, {x}_{pi})$ affect the parameters $\theta_{1i}, \ldots, \theta_{ki}$, or **how** they do so. So model selection in a distributional regression model could take the form:
 
 * select the _best_ fitting distribution;
 
@@ -40,8 +40,25 @@ In reality we do not know the distribution ${D}( )$ and also we do not know **w
 
 * select the _relevant_ variables for the parameters and how they effect the parameters.
 
+So a **general algorithm** for searching for a _best_ model could be:
+
+- **START** by defining a set of appropriate distributions $D_j()$, $j=1,\ldots, J$, for the response.
+
+- **FOR** $j$ in $1,\ldots, J$:
+
+- **SELECT** appropriate variables $({x}_{1i}, \ldots, {x}_{pi})$.
+
+- **SELECT** the distribution $\hat{D}_j()$ and variables with the minimum value of a selected criterion.
+
+The selection criterion could be a criterion such as the AIC evaluated on the training data, or a criterion evaluated on the **out of bag** data. While the above algorithm can work reasonably well for data with a relatively small number of explanatory variables, it can be very slow for data with many explanatory variables. Cutting some corners can improve the speed of the algorithm.
+
+
+
+
 ## Select a distribution
 
+### The range of the response
+
 The first thing to take into the account in the selection of the distribution is that the distribution should be defined in the range of the response variable. @fig-responseType shows the different possibilities depending on whether the response is `continuous`, `discrete` of `factor` If the response is continuous and has negative values a distribution in the real line is appropriate. For positive responses a positive real line distribution is appropriate. For bounded continuous response we have the options to transform the response to values between 0 and 1 or to create an appropriate truncated distribution. For count response the consideration is whether the counts are finite or not. For infinity counts a distribution similar to the Poisson distribution can be used. For finite counts binomial type distributions can be used. The case in which the response is a categorical variable (factor) is called `classification` regression. If the factor is an `ordered` factor appropriate models exist but we will not deal with them here. For unordered factor responses a binomial distribution can be use if the classification is binary otherwise a multinomial distribution. Note that for classification problems, there is a vast literature in machine learning to deal with the problem.
@@ -66,6 +83,11 @@ flowchart LR
 K --> N[binary]
 ```
 
+
+### Select appropriate distribution
+
+
+
 ## Select appropriate variables
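
The START/FOR/SELECT search added by this patch could be sketched in R with the `gamlss` package. This is a minimal sketch, not part of the patch: `chooseDist()`, `getOrder()` and `stepGAICAll.A()` are existing gamlss helpers, but the choice of the `rent` data, the `GA` starting family and the GAIC criterion are assumptions made for the illustration.

```r
## Sketch of the general search algorithm above, assuming the gamlss
## package and its `rent` data set (floor space Fl, year A, heating H,
## location loc) are available.
library(gamlss)

## START: fit an initial model under a default positive-real-line family
m0 <- gamlss(R ~ Fl + A + H + loc, data = rent, family = GA)

## FOR each candidate distribution: chooseDist() refits the model under
## every distribution of the requested type and tabulates a GAIC per fit
crit <- chooseDist(m0, type = "realplus")

## SELECT the distribution: rank candidates by the first criterion column
getOrder(crit, 1)

## SELECT the relevant variables for all parameters by stepwise GAIC
m1 <- stepGAICAll.A(m0, scope = list(lower = ~1, upper = ~ Fl + A + H + loc))
```

Replacing the GAIC here with a criterion computed on held-out (out-of-bag) data would follow the same loop, at the cost of refitting each candidate on the training split only.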