% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mts_sample.R
\name{mts_sample}
\alias{mts_sample}
\title{Sample timesteps from an \emph{mts} time series object}
\usage{
mts_sample(
mts = NULL,
sampleSize = 5000,
seed = NULL,
keepOutliers = FALSE,
width = 5,
thresholdMin = 3
)
}
\arguments{
\item{mts}{\emph{mts} object.}
\item{sampleSize}{Non-negative integer giving the number of rows to choose.}
\item{seed}{Integer passed to \code{\link[base]{set.seed}} for reproducible sampling.}
\item{keepOutliers}{Logical. If \code{TRUE}, a graphics-focused sampling
algorithm that retains outliers is used (see the Outlier Detection section).}
\item{width}{Integer width of the rolling window used for outlier detection.}
\item{thresholdMin}{Numeric threshold for outlier detection.}
}
\value{
An \emph{mts} time series object containing a subset of the timesteps in the
given \emph{mts} object.
(A list with \code{meta} and \code{data} dataframes.)
}
\description{
Reduce the number of records (timesteps) in the \code{data}
dataframe of the incoming \code{mts} through random sampling.
}
\details{
When \code{keepOutliers = FALSE}, records are chosen by simple random sampling
to provide a statistically representative subsample of the data.
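
As a minimal sketch of this kind of random row sampling, and not the
package's actual implementation, the helper below draws \code{sampleSize} rows
from a dataframe and keeps them in time order. The name \code{sampleRows} and
the choice to return the input unchanged when it already has
\code{sampleSize} or fewer rows are assumptions made for illustration.

\preformatted{
# Hypothetical helper: randomly sample rows of a 'data' dataframe.
sampleRows <- function(data, sampleSize = 5000, seed = NULL) {
  # Seed the random number generator for reproducible sampling
  if (!is.null(seed)) set.seed(seed)
  # Assumption: nothing to do if the data are already small enough
  if (nrow(data) <= sampleSize) return(data)
  # Sort the sampled row indices so records stay in time order
  index <- sort(sample(seq_len(nrow(data)), size = sampleSize))
  data[index, , drop = FALSE]
}

# Stand-in for the 'data' dataframe: a datetime column plus one series
data <- data.frame(
  datetime = seq(as.POSIXct("2024-01-01", tz = "UTC"),
                 by = "hour", length.out = 1000),
  value = seq_len(1000)
)
smallData <- sampleRows(data, sampleSize = 100, seed = 123)
}

Calling the helper again with the same \code{seed} should return the same rows.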
}
\section{Outlier Detection}{
When \code{keepOutliers = TRUE}, a customized sampling algorithm is used that
attempts to produce a subset whose plot is visually identical to a plot of
all the data. This is accomplished by preserving outliers and sampling only
in regions where overplotting is expected.
The process is as follows:
\enumerate{
\item{find outliers using \code{MazamaRollUtils::findOutliers()}}
\item{create a subset consisting of only outliers}
\item{sample the remaining data}
\item{merge the outliers and sampled data}
}
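
The steps above can be sketched in base R as follows. This is an illustration
under stated assumptions, not the package's code: \code{flagOutliers} is a
hypothetical stand-in detector based on a running median and does not
reproduce the algorithm in \code{MazamaRollUtils::findOutliers()}. The name
\code{sampleKeepingOutliers} and the choice to count outliers toward
\code{sampleSize} are also assumptions.

\preformatted{
# Stand-in detector (assumption): flag points far from a running median.
# Requires an odd 'width' and a series without missing values.
flagOutliers <- function(x, width = 5, thresholdMin = 3) {
  deviation <- abs(x - stats::runmed(x, k = width))
  which(deviation > thresholdMin * stats::mad(x))
}

# Hypothetical helper following the four steps listed above.
sampleKeepingOutliers <- function(data, column, sampleSize, seed = NULL,
                                  width = 5, thresholdMin = 3) {
  if (!is.null(seed)) set.seed(seed)
  # Steps 1-2: find the outliers and set them aside
  outlierIndex <- flagOutliers(data[[column]], width, thresholdMin)
  # Step 3: sample the remaining rows (assumption: outliers count
  # toward sampleSize)
  otherIndex <- setdiff(seq_len(nrow(data)), outlierIndex)
  n <- min(length(otherIndex), max(sampleSize - length(outlierIndex), 0))
  sampledIndex <- otherIndex[sample.int(length(otherIndex), n)]
  # Step 4: merge outliers and sampled rows, restoring the original order
  data[sort(c(outlierIndex, sampledIndex)), , drop = FALSE]
}

# A smooth series with two obvious spikes
x <- sin(seq(0, 20, length.out = 2000))
x[c(500, 1500)] <- c(10, -10)
data <- data.frame(index = seq_along(x), value = x)
plotData <- sampleKeepingOutliers(data, "value", sampleSize = 200, seed = 1)
}

In this sketch both spikes are kept in \code{plotData}, while the smooth
remainder of the series is thinned by random sampling.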
This algorithm works best when the \emph{mts} object has only one or two
time series.
The \code{width} and \code{thresholdMin} parameters determine the number of
outliers detected. For hourly data, a \code{width} of 5 and a \code{thresholdMin}
of 3 or 4 seem to find many visually obvious outliers.
Users attempting to optimize plotting speed for lengthy time series are
encouraged to experiment with these two parameters along with
\code{sampleSize} and review the results visually.
See \code{MazamaRollUtils::findOutliers()}.
}