Releases: UniprJRC/FSDA
V8.4.0 (R2020b release)
TRANSFORMATIONS IN REGRESSION
We have enriched the properties of the Yeo and Johnson (2000) data transformation for negative and positive responses, which we introduced in R2020a. More specifically, following Atkinson et al. (2019) and (2020), we relaxed the smoothness condition that the second derivative of zYJ(lambda) with respect to y be smooth at y = 0, to allow two values of the transformation parameter: lambdaN for negative observations and lambdaP for non-negative ones. Function ScoreYJall now computes:
1) a global t test associated with the constructed variable for lambda=lambdaP=lambdaN;
2) a t test for positive observations;
3) a t test for negative observations;
4) an F test for the joint presence of the two constructed variables described in points 2) and 3);
5) the F test based on the maximum likelihood estimate of lambdaP and lambdaN.
New function ScoreYJall which computes the score tests described in points 1)-5) above.
New function ScoreYJmle which computes, in the case of the extended Yeo and Johnson transformation, the likelihood ratio test of H0: lambdaP=lambdaP0 and lambdaN=lambdaNeg0.
Added option usefmin in function boxcoxR. This option uses a solver (fminsearch or fminunc) to find the MLE of the two transformation parameters for the extended Yeo and Johnson family (Atkinson et al. 2020).
New function fanBIC which takes as input the output of FSRfan and, using BIC and a smoothness index, automatically chooses the best value of the transformation parameter in an efficient and robust way.
New function fanBICpn which automatically chooses the best values of the transformation parameters for positive and negative observations.
New function normYJpn which extends the companion functions normBoxCox and normYJ to the case of the extended Yeo and Johnson transformation.
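To make the role of lambdaP and lambdaN concrete, the following is a minimal Python sketch of the (unnormalized) Yeo and Johnson transformation with one parameter for non-negative observations and another for negative ones. It is an illustration of the idea only, not the FSDA implementation: the function name yeo_johnson_extended and its argument names are hypothetical, and the Jacobian normalization used by the norm* functions is deliberately left out.

```python
import math


def yeo_johnson_extended(y, lambda_p, lambda_n):
    """Unnormalized Yeo-Johnson transform with separate parameters.

    lambda_p is applied to non-negative observations, lambda_n to
    negative ones. With lambda_p == lambda_n this reduces to the
    standard Yeo and Johnson (2000) transformation.
    """
    if y >= 0:
        if abs(lambda_p) < 1e-12:          # limiting case lambda -> 0
            return math.log1p(y)
        return ((y + 1.0) ** lambda_p - 1.0) / lambda_p
    if abs(lambda_n - 2.0) < 1e-12:        # limiting case lambda -> 2
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lambda_n) - 1.0) / (2.0 - lambda_n)


# lambda = 1 on both sides leaves the data unchanged
print([yeo_johnson_extended(v, 1.0, 1.0) for v in (-2.0, 0.0, 3.0)])
```

Choosing lambda_p different from lambda_n transforms the two tails of the response differently, which is the situation the tests in points 2) to 5) are meant to detect.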
TIME SERIES
New function SETARX, which implements threshold autoregressive models with two regimes.
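As an illustration of what a two-regime threshold autoregressive model looks like, here is a small Python sketch that simulates one. It assumes the simplest possible setting (an AR(1) in each regime, the lagged value itself as threshold variable, no exogenous regressors); the function simulate_setar and all its parameters are hypothetical and do not reflect the SETARX interface.

```python
import random


def simulate_setar(n, threshold, phi_low, phi_high, sigma=1.0, seed=0):
    """Simulate y[t] = phi * y[t-1] + e[t], where phi depends on the regime.

    Regime 1 (phi_low) applies when y[t-1] <= threshold,
    regime 2 (phi_high) otherwise. Returns the series and the regime
    active at each step.
    """
    rng = random.Random(seed)
    previous = 0.0                      # starting value (an assumption)
    series, regimes = [], []
    for _ in range(n):
        in_low_regime = previous <= threshold
        phi = phi_low if in_low_regime else phi_high
        current = phi * previous + rng.gauss(0.0, sigma)
        series.append(current)
        regimes.append(1 if in_low_regime else 2)
        previous = current
    return series, regimes


y, regime = simulate_setar(200, threshold=0.0, phi_low=0.8, phi_high=-0.4)
print(len(y), regime.count(1), regime.count(2))
```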
ROBUST CLUSTERING
New tools for dealing with the 14 Gaussian parsimonious clustering models (GPCM).
In function genSigmaGPCM, the new option pa.exactrestriction has been added. If pa.exactrestriction is true, the covariance matrices are generated with the exact values of the restrictions specified in pa.cdet, pa.shw and pa.shb. In function MixSim, the optional input structure sph can now be called with field sph.exactrestriction.
In function tclust, the fourth input restrfactor can be a structure containing the type of Gaussian Parsimonious Clustering Model, GPCM (restrfactor.pars), and the scalars in the interval [1 Inf) that specify the restrictions to be applied to the determinants (restrfactor.cdet), to the elements of the shape matrices inside each group (restrfactor.shw) and across groups (restrfactor.shb).
New functions tclustICgpcm, tclustICsolGPCM, tclustICplotGPCM, and carbikeplotGPCM which extend functions tclustIC, tclustICsol, tclustICplot and carbikeplot to the case of the 14 GPCM.
GRAPHICS
New function waterfallchart, which implements the waterfall chart (see https://en.wikipedia.org/wiki/Waterfall_chart), and new function funnelchart, which implements the funnel chart (see https://en.wikipedia.org/wiki/Funnel_chart).
New function scatterboxplot, which creates a scatter diagram with marginal boxplots.
Improvements to functions
Function spmplot now accepts a table as input. In this case the names of the table variables are automatically added at the margins. Similarly, in function corrNominal, when option datamatrix is true it is possible to supply a table as first argument.
DATASETS
New datasets balancesheets and facemasks added to the regression and clustering datasets sections, respectively.
2020a
ROBUST REGRESSION
New set of routines for minimum density power divergence estimators (mdpd, mdpdR, mdpdReda, PDrho, PDpsi, PDwei, PDpsider, PDpsix, PDbdp, PDeff, PDc), discussed in https://www.mdpi.com/1099-4300/22/4/399.
New function simulateLM to simulate linear regression data with prespecified value of R2, prespecified correlation among the explanatory variables and type of distribution.
New function VIOM which computes weight estimates under a Variance-Inflation Outlier Model using MLE or Restricted MLE (REMLE).
ROBUST CLUSTERING
New function tclustregeda which monitors the regression clustering classification for different levels of trimming.
Improved function tclustregIC which computes the BIC (and other information criteria) for different values of the restriction factors and different numbers of groups, for classification or mixture likelihood and regression clustering.
Function tclustICsol now accepts input from tclustregIC to show the yXplot of the best solutions.
New function ctlcurves to select the appropriate number of groups in robust clustering.
New function mdrrsplot which plots the random start trajectories and allows brushing them. The companion function mmdrsplot for multivariate analysis has been improved.
TRANSFORMATIONS IN REGRESSION
New function boxcoxR which computes the profile log likelihood for a range of values of the transformation parameter (lambda) and computes the MLE of lambda in the supplied range. Supported families: Box Cox, Yeo and Johnson and extended Yeo and Johnson (Atkinson et al. 2020).
DOCUMENTATION
Improved menu for the automatic installation of the FSDA html help files.
UTILITIES
New function exactcdf for finding the exact cdf of each element in a vector x with respect to the empirical distribution, represented by another vector.
New functions twdpdf and twdrnd to compute the pdf of the Tweedie distribution and generate random numbers from it.
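The idea behind evaluating each element of a vector against an empirical distribution can be sketched in a few lines of Python. This is not the exactcdf code: the function name empirical_cdf is hypothetical, and the convention used here (proportion of reference values less than or equal to each element) is an assumption that may differ from the one adopted in FSDA.

```python
from bisect import bisect_right


def empirical_cdf(x, reference):
    """Return, for each element of x, the proportion of values in
    `reference` that are less than or equal to it."""
    ordered = sorted(reference)
    n = len(ordered)
    # bisect_right counts how many sorted values are <= v
    return [bisect_right(ordered, v) / n for v in x]


print(empirical_cdf([0, 2, 5], [1, 2, 3, 4]))
```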
2019b
This is the first release distributed both from the MathWorks marketplace and from the GitHub platform.
TRANSFORMATION IN REGRESSION
New function tBothSides which transforms both sides of a (nonlinear) regression model.
New function boxcoxR which finds the MLE of lambda in linear regression (and its confidence interval) using the Box Cox or the Yeo and Johnson family.
ROBUST TIME SERIES ANALYSIS
New function LTStsVarSel.m which performs variable selection in the robust time series model LTSts.m. In functions LTSts.m, simulateTS.m and forecastTS.m it is now possible to add an autoregressive component.
UTILITIES
New function existFS which checks whether a file exists and stores the answer in a cached persistent variable.
DOCUMENTATION
Added file getting_started.mlx in subfolder doc of the main root of FSDA for packaging the FSDA toolbox.
FSDA ver. 2019a
Please download and run the setup file installFSDA.m with administrative privileges to automatically do the following:
- Copy all the .html files inside (docroot/FSDA)
- Run file addFSDA2path.m
- Launch builddocsearchdb
- Install the apps
New features in FSDA 2019a
CLUSTER ANALYSIS
Function tclustreg has been considerably enhanced. Now the function includes: (i) robust BIC, (ii) possibility of constraining the determinants of the covariance matrices of the explanatory variables, (iii) options for treating datasets with concentrated noise, making use of concentration steps appropriately modified using observation weighting and thinning methods.
New function tclustregIC which (if present) uses the Parallel Computing toolbox to compute robust BIC for mixture and classification likelihood for different values of k (number of groups) and different values of c (restriction factor for the variances of the residuals), for a prespecified level of trimming.
New function restrdeter for constraining the determinants. This function is of interest in its own right, but it is also called in every concentration step of function tclust when a determinant restriction is needed.
Routines for constraining the determinants (restrdeterGPCM), the shape matrices (restrshapeGPCM) and to impose common rotation matrices (common principal components) in presence of equal shape (cpcE.m) or varying shape (cpcV.m), and a general routine to impose constraints in the family of the 14 Gaussian Parsimonious Clustering Models (restrSigmaGPCM).
Routine to generate data based on the 14 Gaussian Parsimonious Clustering Models (genSigmaGPCM). This routine can be called directly from function MixSim in order to generate each of the 14 Gaussian Parsimonious Clustering Models with a prespecified level of overlap (see option sph inside MixSim).
Routine GowerIndex to compute the matrix of similarity indexes using the Gower metric.
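For readers unfamiliar with the Gower metric mentioned above, the following Python sketch computes the matrix of pairwise Gower similarities in the simplest case where all variables are quantitative: each variable contributes 1 minus the absolute difference divided by the range of that variable, and contributions are averaged. The function gower_similarity is hypothetical and is not the GowerIndex routine, which may also handle other variable types.

```python
def gower_similarity(rows):
    """Pairwise Gower similarity for purely quantitative variables.

    rows is a list of equally long lists (units by variables).
    """
    n, p = len(rows), len(rows[0])
    # Range of each variable, used to scale absolute differences
    ranges = [max(r[j] for r in rows) - min(r[j] for r in rows)
              for j in range(p)]
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            total = 0.0
            for j in range(p):
                if ranges[j] > 0:
                    total += 1.0 - abs(rows[i][j] - rows[k][j]) / ranges[j]
                else:
                    total += 1.0   # constant variable: all units agree
            sim[i][k] = sim[k][i] = total / p
    return sim


for row in gower_similarity([[0, 10], [5, 10], [10, 20]]):
    print(row)
```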
DATASETS
New datasets added to the collection: animals, P12119085, P17049075, fondi_large, JohnDraper data, gasoline data, ms212. See pages datasets_reg and datasets_mult for a description of these datasets.
GRAPHICS
Possibility of brushing using rownames. Rownames also appear in the associated scatter plot matrix, both for regression and multivariate analysis: see the new examples in resfwdplot and malfwdplot.
New function aceplot to visualize the output produced by functions ace and avas.
Option RowNamesLabels has been added to add2spm and to add2yX to label the units.
MULTIVARIATE
Function FSMeda is now much faster; the original function FSMeda has been kept, renamed FSMedaeasy, because the algorithm is much easier to follow.
REGRESSION
New functions: (i) ace which implements the alternating conditional expectations algorithm to find the transformations of y and X that maximise the proportion of variation in y explained by X and (ii) avas which uses a (nonparametric) variance-stabilizing transformation for the response variable.
New function smothr to smooth values imposing various constraints (e.g. monotonicity, circularity, etc.). This function calls the supersmoother routine of Friedman.
New function rlssmo to compute a running line smoother with global cross validation.
New function supsmu to smooth scatterplots using Friedman's supersmoother algorithm.
Function RobCov now includes the estimator covrobc (a corrected version of the covariance matrix of robust beta coefficients). A new motivating example shows why covrobc should always be used.
UTILITIES
New function repDupValWithMean that replaces the values of y corresponding to non-unique elements of vector x with local means.
UTILITIES HELP
Function publishFS has been further improved. This function automatically transforms structured .m files into files in pure MATLAB style. In the HTML help files, the right click of the mouse (as in pure MathWorks pages) now makes it possible to execute, select or find help (F1 key) for all the versions of MATLAB starting from 2017a.
STATISTICAL UTILITIES
New function genr8 to generate random numbers which are coherent across different software platforms.
New function exactcdf to find exact cdf values of each element of an input vector x with respect to an empirical distribution.
New function wthin which thins a uni/bi-dimensional dataset.
New function ctsub which computes numerical integration from x(1) to z(i) of y=f(x).
New functions (i) vervaatsim (to simulate precisely from a Vervaat perpetuity distribution), (ii) vervaatxdf (to obtain the pdf or the cdf of a Vervaat perpetuity distribution) and (iii) vervaatrnd (to simulate random variates from the Vervaat perpetuity distribution).
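The kind of computation described for ctsub (integrating y=f(x) from x(1) up to an arbitrary point z) can be sketched with the trapezoidal rule. The Python function below is a hypothetical illustration under stated assumptions: the curve is taken to be piecewise linear between the supplied points, x is assumed strictly increasing, and no extrapolation is done beyond the last x value. The actual ctsub routine may behave differently at the boundaries.

```python
def integrate_up_to(x, y, z):
    """Trapezoidal integral of the piecewise-linear curve through
    (x, y), from x[0] up to z. Values of z beyond x[-1] are treated
    as x[-1]; values below x[0] give 0."""
    total = 0.0
    for k in range(1, len(x)):
        width = x[k] - x[k - 1]
        if z >= x[k]:
            # whole interval lies to the left of z
            total += 0.5 * (y[k - 1] + y[k]) * width
        else:
            if z > x[k - 1]:
                # partial interval: interpolate the curve at z
                slope = (y[k] - y[k - 1]) / width
                y_at_z = y[k - 1] + slope * (z - x[k - 1])
                total += 0.5 * (y[k - 1] + y_at_z) * (z - x[k - 1])
            break
    return total


# Integral of y = x from 0 to 1.5 is 1.125
print(integrate_up_to([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 1.5))
```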
FSDA 2018b
(1) New function qqplotFS that creates a qqplot of residuals with confidence bands
(2) New function mtR which generates the same random numbers produced by R software with Mersenne Twister mt19937ar
(3) New functions associated with the Rocke biweight estimator. See for example RKrho, RKpsi, RKpsider, RKwei, RKbdp, RKeff.
(4) Routines FSR, FSRmdr, FSRbsb extended to time series (see new functions FSRts, FSRtsmdr, FSRtsbsb and regressts)
(5) New function verlessthanFS. It is a faster version of MATLAB function verlessthan.
(6) New datasets added to the collection.
(7) New routine publishBibliography to automatically create the bibliography from the citations present inside the .m files.
FSDA 2018a
(1) New function tclusteda that helps choose the best tclust model. It computes tclust for different values of the trimming factor and produces plots that help find the optimal level of trimming. This function uses the Parallel Computing toolbox, if available.
(2) Extension of the score test. New function ScoreYJpn that computes the score test for Yeo Johnson transformation separately for positive and negative observations. FSRfan now accepts the new option family "YJpn" and it is possible to monitor the score test for both positive and negative observations (output arguments out.Scorep and out.Scoren).
(3) New functions for time series analysis. simulateTS simulates a time series with trend (up to third order), seasonality (constant or of varying amplitude) with a different number of harmonics and a level shift. forecastTS produces forecasts with confidence bands for a time series estimated with function LTSts.
(4) CorAna has an improved display of results. New function CorAnaplot draws a rich Correspondence Analysis graph with different types of confidence ellipses for selected rows and columns.
(5) New function verlessthanExt. It is a faster version of MATLAB function verlessthan.
(6) Documentation of yXplot considerably improved. New options added (xlimx, ylimy, namey, nameX).
(7) MixSimreg extended to account for multiple parameter distribution (betadistrib option)
(8) histFS has a new optional argument (weights) for plotting a weighted histogram.
(9) Option labenv has been added to mmdrsplot.
(10) Option axesellipse added to ellipse.
(11) New output argument OldAndNewIndexes in function UnitsSameCluster, to track the index permutations used to reach a desired cluster labelling.
FSDA 2017b
Major statistical release. Highlights:
FSDA has introduced two new categories of tools, one for (robust) time series analysis; another for analyzing categorical data and contingency tables. More precisely:
(1) Function LTSts extends LTS estimator to time series. A related new graphical plot associated to a time series, wedgeplot, provides information on the presence of outliers and level shifts.
(2) CorAna performs correspondence analysis; SparseTableTest computes an independence test for large and sparse contingency tables; CressieRead computes the power divergence family of tests, to check the discrepancy/distance between observed and expected frequencies in a contingency table; rcontFS generates a random two-way table with given marginal totals; barnardtest computes the Barnard test; corrNominal measures the strength of association between two unordered (nominal) categorical variables, and corrOrdinal does the same for ordinal data; crosstab2datamatrix recreates the original data matrix X from contingency table N. This group of functions is complemented by file examples_categorical.m, in the usual FSDA style.
The two categories of functions will be progressively enriched.
Other new functions which are included are boxtest (test of equality of covariance matrices used for example in tkmeans), GYfilt (Gervini and Yohai, univariate outlier identifier), mmdrsplot (interactive plot of the trajectories of minimum Mahalanobis distances from different starting points), overlapmap to plot the ordered pairwise overlap values between components, dempk to perform a merging of components found by tkmeans, ncpci to compute a non centrality parameter confidence interval.
Finally, spmplot has been enriched to superimpose ellipses, density and contour functions to data and extract single panels from the scatter matrix.
FSDA 2017a
Major statistical release. Highlights:
CLUSTERING
Function tclustreg now includes trimmed Cluster Weighted Restricted Models.
New function tclustIC for the automatic selection of the best number of groups.
New function tclustICsol to extract a set of relevant solutions (and associated tclustICplot).
New function UnitsSameCluster to control the labels of the clusters which contain predefined units.
New function to compare two partitions (Fowlkes and Mallows index).
Updated routine simdatasetreg to generate new outlier patterns.
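As a reminder of what the Fowlkes and Mallows index measures when comparing two partitions, here is a minimal Python sketch based on counting pairs of units. It is an illustration only, with a hypothetical function name, and it uses the plain O(n^2) pair enumeration rather than a contingency table.

```python
import math
from itertools import combinations


def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index between two partitions of the same units.

    Counts pairs grouped together in both partitions (both), only in
    the first (only_a) or only in the second (only_b), and returns
    both / sqrt((both + only_a) * (both + only_b)).
    """
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    if both == 0:
        return 0.0
    return both / math.sqrt((both + only_a) * (both + only_b))


# Same partition with swapped labels: index equal to 1
print(fowlkes_mallows([1, 1, 2, 2], [2, 2, 1, 1]))
```

The index does not depend on how the clusters are labelled, only on which units are grouped together.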
STATISTICAL UTILITIES
New routines for density estimation and thinning, for univariate and bivariate data (used in tclustreg): bwe, rthin, wthin, WNChygepdf.
UTILITIES
New functions wraptextFS, removeextraspacesLF
MONITORING ROBUST ESTIMATORS
New functions mveeda, MMmulteda, Smulteda, Sregeda
TRANSFORMATIONS IN REGRESSION
New function ScoreYJ which implements the score test for the Yeo and Johnson transformation. This new transformation has also been embedded inside function FSRfan.
SAMPLING AND COMBINATORIAL
Updated functions randsampleFS and subsets.
New routines for thinning.
NEW mlx files
.mlx files introduced for examples_MixSim
GRAPHICS
New function to create the car-bike plot to find the most relevant solutions (carbikeplot).
Functions resfwdplot, malfwdplot generalized in order to take as input the output of procedures which monitor robust estimators.
The FSDA help folder now contains XML files associated with the function documentation. This is with a view to generating or updating the function documentation, automatically or through a GUI, both in HTML and in the function header.
FSDA 2016a
Major statistical release. Highlights:
New features added to the tclust function, including determinant restriction and new adjusted BIC criterion for the estimation of the number of groups.
-Added functions for reweighting FSR and FSRB (FSRr and FSRBr).
-Functions FSR, FSRB and FSRH redesigned; a routine implementing the core of the Forward Search algorithm (FSRcore) introduced to avoid code redundancies.
-New function, winsor, to winsorize data.
-New function FSMbsb, which will replace FSMbbm.
-New function randindexFS, to evaluate the quality of different clusterings.
-New routines poolClose and poolPrepare introduced to conveniently open and close a pool of parallel workers.
-Several new robust functions to generate, for example, the Tukey Biweight rho function (HUrho), the tuning constant associated to a certain efficiency (HUeff), the psi functions (HUpsi), its derivative (HUpsider), etc. For a full list, see functions under utilities_stats folder.
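Among the items listed above, winsorizing is easy to illustrate: values beyond chosen lower and upper percentiles are replaced by those percentiles instead of being removed. The Python sketch below is hypothetical and is not the winsor function; in particular it assumes the nearest-rank definition of a percentile, whereas other implementations interpolate.

```python
import math


def winsorize(values, lower_pct, upper_pct):
    """Clamp values to the [lower_pct, upper_pct] empirical percentiles,
    using the nearest-rank percentile definition."""
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(pct):
        # smallest rank r such that at least pct% of the data are <= ordered[r-1]
        rank = max(1, math.ceil(pct * n / 100))
        return ordered[rank - 1]

    low, high = nearest_rank(lower_pct), nearest_rank(upper_pct)
    return [min(max(v, low), high) for v in values]


# The extreme value 100 is pulled back to the 90th percentile
print(winsorize([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 10, 90))
```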
Major highlights in the documentations and examples:
-New function, publishFS, introduced to generate documentation pages directly from the .m files.
-New function, makecontentsfileFS, introduced to create the list of files present in a FSDA folder and/or subfolders. It extends MATLAB function makecontentsfile.
-.mlx files introduced for examples_multivariate and examples_regression
FSDA 2015b
New in the 2015b release of FSDA:
-Function simdataset.m modified to allow the user to simulate outliers from different distributions and contamination schemes and/or contaminate existing datasets.
-New Bayesian regression analysis routines: FSRB.m, FSRBeda.m, FSRBmdr.m, regressB.m.
-In FSReda.m: monitoring of confidence intervals of beta and sigma2.
-In FSRBeda.m: monitoring of HPD (highest posterior density regions) of beta and sigma2.
-New functions for inverse gamma computation: inversegampdf.m, inversegamcdf.m, inversegaminv.m.
-Added functions to monitor the units forming the subset in heteroskedastic and Bayesian regression: FSRHbsb.m, FSRBbsb.m.
-Added new datasets for Bayesian examples.
-Added option for the robust transformation in the Yeo-Johnson family.
-addFSDA2path.m: modified for compatibility with unix platforms and to address changes in the folder organization of FSDA functions.
-Added routines to compute and visualize robust bivariate boxplot (function boxplotb.m).
-New routine for automatic outlier detection in heteroskedastic regression (FSRH.m).
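For reference, the inverse gamma density mentioned in the list above can be written in a few lines. The Python sketch below is not the inversegampdf code: the function name and the shape/scale parameterization, with density scale^shape / Gamma(shape) * x^(-shape-1) * exp(-scale/x) for x > 0, are assumptions that may not match the FSDA argument conventions.

```python
import math


def inverse_gamma_pdf(x, shape, scale):
    """Density of the inverse gamma distribution at x (0 for x <= 0).

    Computed on the log scale to limit overflow for large shape values.
    """
    if x <= 0:
        return 0.0
    log_pdf = (shape * math.log(scale) - math.lgamma(shape)
               - (shape + 1.0) * math.log(x) - scale / x)
    return math.exp(log_pdf)


print(inverse_gamma_pdf(1.0, 1.0, 1.0))
```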