Multisite daily Stochastic Weather Generator
This tutorial describes the Stochastic Weather Generator described in the Interpretable Seasonal Hidden Markov Model for spatio-temporal stochastic rain generation in France paper by Emmanuel Gobet (CMAP - École Polytechnique), David Métivier (MISTEA – INRAE) and Sylvie Parey (R&D – EDF). It provides a step-by-step construction of the Seasonal Hidden Markov Model (SHMM), the interpretation of the hidden states as Weather regimes over France and eventually the validation of the model with simulations.
Set up
Package and functions
There are several ways to add
a package before using
, one way is for this tutorial to copy-paste (it might take a while):
import Pkg
+Multisite daily Stochastic Weather Generator · StochasticWeatherGenerators Multisite daily Stochastic Weather Generator
This tutorial describes the Stochastic Weather Generator described in the paper Interpretable Seasonal Hidden Markov Model for spatio-temporal stochastic rain generation in France by Emmanuel Gobet (CMAP - École Polytechnique), David Métivier (MISTEA – INRAE) and Sylvie Parey (R&D – EDF). The paper is available on HAL at this link. It provides a step-by-step construction of the Seasonal Hidden Markov Model (SHMM), the interpretation of the hidden states as Weather regimes over France and eventually the validation of the model with simulations.
Set up
Package and functions
For Julia new user There are several ways to add
a package before using
, one way is for this tutorial to copy-paste (it might take a while):
import Pkg
Pkg.add(["CSV", "JLD", "DelimitedFiles", "DataFrames", "DataFramesMeta", "StatsBase", "Random", "Distributions", "StatsPlots", "LaTeXStrings"])
using CSV, JLD, DelimitedFiles # File Read/Load/Save
using DataFrames, DataFramesMeta # DataFrames
@@ -135,7 +135,7 @@
@time "FitMLE SHMM (Slice)" hmm_slice = fit_mle_all_slices(hmm_random, Y, Y_past; n2t=n2t, robust=true, rand_ini=true, Dirichlet_α=0.8, history=false, n_random_ini=1, Yₜ_extanted=[-12, -7, 0, 6, 13]);
-θᴬ_slice, θᴮ_slice = fit_θ!(hmm_slice, 𝐃𝐞𝐠);
FitMLE SHMM (Slice): 25.807252 seconds (195.00 M allocations: 30.020 GiB, 9.92% gc time, 24.03% compilation time)
+θᴬ_slice, θᴮ_slice = fit_θ!(hmm_slice, 𝐃𝐞𝐠);
FitMLE SHMM (Slice): 26.396966 seconds (194.89 M allocations: 30.013 GiB, 8.82% gc time, 23.07% compilation time)
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
@@ -143,7 +143,7 @@
For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
Fit with Baum Welch using the slice estimate as a starting point
With the Slice estimate as a good starting point for the full (seasonal) Baum Welch EM algorithm we fit the model!
@time "FitMLE SHMM (Baum Welch)" hmm_fit, θq_fit, θy_fit, hist, histo_A, histo_B = fit_mle(hmm_slice, θᴬ_slice, θᴮ_slice, Y, Y_past,
maxiter=10000, robust=true; display=:final, silence=true, tol=1e-3, θ_iters=true, n2t=n2t);
EM converged in 74 iterations, logtot = -116791.10200745627
-FitMLE SHMM (Baum Welch): 59.463831 seconds (188.68 M allocations: 33.083 GiB, 6.99% gc time, 12.66% compilation time)
Uncomment to load previously computed hmm
# hmm_infos = load("save_tuto_path/hmm_fit.jld")
+FitMLE SHMM (Baum Welch): 62.576305 seconds (188.68 M allocations: 33.083 GiB, 7.93% gc time, 12.63% compilation time)
Uncomment to load previously computed hmm
# hmm_infos = load("save_tuto_path/hmm_fit.jld")
# hmm_fit = hmm_infos["hmm"]
# hist = hmm_infos["hist"]
# θq_fit = hmm_infos["Q_param"]
@@ -155,7 +155,7 @@
xticks!(pA[k], vcat(dayofyear_Leap.(Date.(2000, 1:12)), 366), vcat(string.(monthabbr.(1:12)), ""), xlims=(0, 367), ylims=(0, 1))
end
pallA = plot(pA..., size=(1000, 500))
-end
Rain probabilities
begin
+end
Rain probabilities
begin
mm = 1
jt = D
pB = [plot(legendfont=14, title="$(station_name[j])", titlefontsize=17, tickfont=14, legendfontsize = 14) for j in 1:jt]
@@ -171,7 +171,7 @@
ylims!(pB[j], (0, 1))
end
pallB = plot(pB[staid_lat]..., size=(3000 / 2, 1000 / 1), layout=(2, 5))
-end
Spatial Rain probability
memory_past_cat = 1
1
h = 1 (day before dry) or 2 (day before wet) $\mathbb{P}(Y = \text{Rain}\mid Z = k, H = h)$ with h = memory_past_cat
Inference of the historical hidden states
Viterbi algorithm
ẑ = viterbi(hmm_fit, Y, Y_past; n2t=n2t)
+end
Spatial Rain probability
memory_past_cat = 1
1
h = 1 (day before dry) or 2 (day before wet) $\mathbb{P}(Y = \text{Rain}\mid Z = k, H = h)$ with h = memory_past_cat
Inference of the historical hidden states
Viterbi algorithm
ẑ = viterbi(hmm_fit, Y, Y_past; n2t=n2t)
data_stations_z = map(data_stations) do df
@transform(df, :z = [fill(missing, local_order); ẑ])
@@ -204,7 +204,7 @@
hline!((1:year_nb) .+ 0.5, c=:black, legend=:none, lw=4)
ylims!(0.5, year_nb + 0.5)
pviterbi = yticks!(1:year_nb, string.(year_range[select_year]), size=(1000, 600))
-end
Adding Rain amounts to the model
Marginal distribution
We fit the marginals of the rain amount $R>0$ at each station $s$ and for each hidden state $Z$ independently. We use a mixture of exponential functions
\[g(r) = w \dfrac{e^{-{\frac {r}{\vartheta_1}}}}{\vartheta_1} + (1-w) \dfrac{e^{-{\frac {r}{\vartheta_2}}}}{\vartheta_2}.\]
whose parameters $w(t)$, $\vartheta_1(t)$ and $\vartheta_2(t)$ are smooth periodic functions of the day of the year.
@time "FitMLE RR" mix_allE = fit_mle_RR.(data_stations_z, K, local_order, mix₀=StochasticWeatherGenerators.mix_ini(T));
FitMLE RR: 90.308760 seconds (335.87 M allocations: 47.641 GiB, 3.70% gc time, 4.75% compilation time)
Thanks to Distributions.jl PR #1389 (September 2nd, 2021) and Julia multiple dispatch, the quantile function of Mixtures can be very efficiently computed.
Rain correlations
We fit a Gaussian copula to each pair of stations for joint rainy days only.
!!! warning For some hidden states corresponding to dry weather, it might happen that for some pair of stations, there are no simultaneous rain occurrences in a rain category $Z = k$. In that case a missing
coefficient is returned.
begin
+end
Adding Rain amounts to the model
Marginal distribution
We fit the marginals of the rain amount $R>0$ at each station $s$ and for each hidden state $Z$ independently. We use a mixture of exponential functions
\[g(r) = w \dfrac{e^{-{\frac {r}{\vartheta_1}}}}{\vartheta_1} + (1-w) \dfrac{e^{-{\frac {r}{\vartheta_2}}}}{\vartheta_2}.\]
whose parameters $w(t)$, $\vartheta_1(t)$ and $\vartheta_2(t)$ are smooth periodic functions of the day of the year.
@time "FitMLE RR" mix_allE = fit_mle_RR.(data_stations_z, K, local_order, mix₀=StochasticWeatherGenerators.mix_ini(T));
FitMLE RR: 93.888089 seconds (335.87 M allocations: 47.641 GiB, 4.08% gc time, 4.89% compilation time)
Thanks to Distributions.jl PR #1389 (September 2nd, 2021) and Julia multiple dispatch, the quantile function of Mixtures can be very efficiently computed.
Rain correlations
We fit a Gaussian copula to each pair of stations for joint rainy days only.
!!! warning For some hidden states corresponding to dry weather, it might happen that for some pair of stations, there are no simultaneous rain occurrences in a rain category $Z = k$. In that case a missing
coefficient is returned.
begin
Σ²RR = cov_RR(data_stations_z, K)
if K == 4
Σ²RR[2][6, 3] = Σ²RR[4][6, 3]
@@ -218,18 +218,18 @@
end
┌ Warning: ΣS_k[2], CartesianIndex{2}[CartesianIndex(6, 3), CartesianIndex(3, 6)] are missing
└ @ StochasticWeatherGenerators ~/work/StochasticWeatherGenerators.jl/StochasticWeatherGenerators.jl/src/rain/correlations.jl:80
┌ Warning: For Embrun j=6 and Marignane j=3 the hidden state Z=2 and Z=4 are pretty similar (dry), so we replace the `missing` coefficient of Z=2 with the one of Z = 4
-└ @ Main tuto_paper.md:455
Simulation
Now we are ready to generate samples from the SWG model.
Nb
is the number of realization
Nb = 1000
1000
Sample the (seasonal) HMM model and output the sequence of hidden states and multi-site dry/wet.
begin
+└ @ Main tuto_paper.md:456
Simulation
Now we are ready to generate samples from the SWG model.
Nb
is the number of realization
Nb = 1000
1000
Sample the (seasonal) HMM model and output the sequence of hidden states and multi-site dry/wet.
begin
zs = zeros(Int, N, Nb)
ys = zeros(Bool, N, D, Nb)
@time "Simulations Z, Y" for i in 1:Nb
zs[:, i], ys[:, :, i] = rand(hmm_fit, n2t; y_ini=Yall[1:local_order, :], z_ini=1, seq=true)
end
-end
Simulations Z, Y: 60.768468 seconds (328.74 M allocations: 32.194 GiB, 7.36% gc time, 1.29% compilation time)
Given the hidden states and dry/wet, it generates the rain amounts at each station (correlated with a Gaussian Copula).
begin
+end
Simulations Z, Y: 61.258844 seconds (328.74 M allocations: 32.194 GiB, 7.77% gc time, 1.35% compilation time)
Given the hidden states and dry/wet, it generates the rain amounts at each station (correlated with a Gaussian Copula).
begin
rs = zeros(D, N, Nb)
@time "Simulations RR>0" for i in 1:Nb
rs[:, :, i] = rand_RR(mix_allE, n2t, zs[:, i], ys[:, :, i]', Σ²RR)
end
-end
Simulations RR>0: 227.581107 seconds (285.46 M allocations: 37.954 GiB, 1.33% gc time, 0.72% compilation time)
Results
Spell distribution
select_month
to choose the month where to compute the spell distributions (summer month, winter, etc.) select_month = 1:12
corresponds to all months.
select_month = 1:12
+end
Simulations RR>0: 237.014362 seconds (285.46 M allocations: 37.953 GiB, 1.24% gc time, 0.73% compilation time)
Results
Spell distribution
select_month
to choose the month where to compute the spell distributions (summer month, winter, etc.) select_month = 1:12
corresponds to all months.
select_month = 1:12
idx_months = [findall(x -> month.(x) == m, data_stations[1][1+local_order:end, :DATE]) for m in 1:12]
@@ -256,7 +256,7 @@
[ylabel!(p_spell_dry[j], "PMF", ylabelfontsize=12) for j in staid_lat[[1, 6]]]
[title!(p_spell_dry[j], station_name[j], titlefontsize=13) for j = 1:D]
pall_spell_dry = plot(p_spell_dry[staid_lat]..., size=(3000 / 2.5, 1000 / 1.5), layout=(2, 5), left_margin=0.5cm, bottom_margin=0.275cm)
-end
Wet spell
begin
+end
Wet spell
begin
dry_or_wet = 2 # wet
p_spell_wet = [plot(ylims=(1e-4, 1e-0), tickfont=11, legendfontsize=13) for j = 1:D]
for j = 1:D
@@ -275,7 +275,7 @@
[ylabel!(p_spell_wet[j], "PMF", ylabelfontsize=12) for j in staid_lat[[1, 6]]]
[title!(p_spell_wet[j], station_name[j], titlefontsize=13) for j = 1:D]
pall_spell_wet = plot(p_spell_wet[staid_lat]..., size=(3000 / 2.5, 1000 / 1.5), layout=(2, 5), left_margin=0.5cm, bottom_margin=0.275cm)
-end
Rain
Interpretation: Mean Rain per weather regime $R > 0 \mid Z = k$.
We plot the empirical (strictly) positive mean rain amounts per weather regime. The results are smoothed using a cyclic_moving_average
with a time window of $\pm 15$ days and the Epanechnikov kernel.
begin
+end
Rain
Interpretation: Mean Rain per weather regime $R > 0 \mid Z = k$.
We plot the empirical (strictly) positive mean rain amounts per weather regime. The results are smoothed using a cyclic_moving_average
with a time window of $\pm 15$ days and the Epanechnikov kernel.
begin
p_rainpercat = [plot(tickfont=12, ylabelfontsize=14, titlefontsize=14, legendfontsize=13) for j = 1:D]
for j = 1:D
df_j = @chain data_stations_z[j] begin
@@ -301,7 +301,7 @@
) for j in 1:D]
[title!(p_rainpercat[j], station_name[j]) for j = 1:D]
plt_rain_cat_mix = plot(p_rainpercat[staid_lat]..., size=(3000 / 2.2, 1000 / 1.5), layout=(2, 5), left_margin=25px)
-end
Univariate Rain distributions
Historical vs Nb simulations distribution
begin
+end
Univariate Rain distributions
Historical vs Nb simulations distribution
begin
p_uniR = [plot(yaxis=:log10, ylims=(1e-4, 1e-0), tickfont=11, legendfontsize=13, titlefontsize=13) for j = 1:D]
for j = 1:D
dists_RR_positive_j = conversion_factor * [filter(!iszero, rs[j, :, i]) for i in 1:Nb]
@@ -319,7 +319,7 @@
[title!(p_uniR[j], station_name[j]) for j = 1:D]
pall_R = plot(p_uniR[staid_lat]..., size=(3000 / 2.5, 1000 / 1.5), layout=(2, 5), bottom_margin=11px, left_margin=15px)
-end
Monthly quantile amount
rh = reduce(hcat, [df[1+local_order:end, :RR] for df in data_stations])
+end
Monthly quantile amount
rh = reduce(hcat, [df[1+local_order:end, :RR] for df in data_stations])
month_rain_simu = [cum_monthly(rs[j, :, i], idx_all) for j in 1:D, i in 1:Nb];
@@ -346,7 +346,7 @@
[title!(p_month_RR[j], station_name[j], titlefontsize=12) for j = 1:D]
pall_month_RR = plot(p_month_RR[staid_lat]..., size=(1190, 500), layout=(2, 5), left_margin=19px)
-end
Correlations
Rain event dry/wet
cor_bin_hist = cor(reduce(hcat, [df.bin for df in data_stations]));
+end
Correlations
Rain event dry/wet
cor_bin_hist = cor(reduce(hcat, [df.bin for df in data_stations]));
cor_bin_mean_simu = mean(cor(ys[:, :, i]) for i in 1:Nb);
@@ -358,7 +358,7 @@
[ylims!(plots_cor_bin[i], -0.1, 1) for i in 1:1]
annotate!(0.2, 0.7, "MSE ≃ $(round(mean(abs2, vec_triu(cor_bin_hist) - vec_triu(cor_bin_mean_simu)), digits = 4))")
plot_cor_bin = plot(plots_cor_bin...)
-end
The largest pair correlation error for rain occurence comes from the pair
println("$(station_name[findmax(cor_bin_hist - cor_bin_mean_simu)[2][1]]) and $(station_name[findmax(cor_bin_hist - cor_bin_mean_simu)[2][2]])")
CHASSIRON and LA HAGUE
Rain amount
cor_hist = cor(reduce(hcat, [df.RR for df in data_stations]));
+end
The largest pair correlation error for rain occurence comes from the pair
println("$(station_name[findmax(cor_bin_hist - cor_bin_mean_simu)[2][1]]) and $(station_name[findmax(cor_bin_hist - cor_bin_mean_simu)[2][2]])")
CHASSIRON and LA HAGUE
Rain amount
cor_hist = cor(reduce(hcat, [df.RR for df in data_stations]));
corT_hist = corTail(reduce(hcat, [df.RR for df in data_stations]));
@@ -377,7 +377,7 @@
[xlims!(plots_cor[i], -0.1, 1) for i in 1:2]
[ylims!(plots_cor[i], -0.1, 1) for i in 1:2]
plot(plots_cor..., size=(800, 400), left_margin=15px)
-end
The largest pair correlation error for rain (zero and non zero amounts) comes from the pair
println("$(station_name[findmax(cor_hist - cor_mean_simu)[2][1]]) and $(station_name[findmax(cor_hist - cor_mean_simu)[2][2]])")
EMBRUN and MARIGNANE
Gaussian copula hypothesis
For a pair of stations, we transform the marginal $R_s>0$ to $\mathcal{N}(0,1)$. We compare the obtained bi-variate Normal distribution with the Mahalanobis distance to the theoretical $\chi^2(2)$-distriubtion.
corΣ = cov2cor.(Σ²RR)
+end
The largest pair correlation error for rain (zero and non zero amounts) comes from the pair
println("$(station_name[findmax(cor_hist - cor_mean_simu)[2][1]]) and $(station_name[findmax(cor_hist - cor_mean_simu)[2][2]])")
EMBRUN and MARIGNANE
Gaussian copula hypothesis
For a pair of stations, we transform the marginal $R_s>0$ to $\mathcal{N}(0,1)$. We compare the obtained bi-variate Normal distribution with the Mahalanobis distance to the theoretical $\chi^2(2)$-distriubtion.
corΣ = cov2cor.(Σ²RR)
begin
j1 = 10
j2 = 8
@@ -403,4 +403,4 @@
title!("$(station_name[j1]) vs $(station_name[j2])")
xlims!(0, 22)
ylims!(0, 22)
-end
This page was generated using Literate.jl.
Settings
This document was generated with Documenter.jl version 1.4.1 on Monday 24 June 2024. Using Julia version 1.10.4.
+end
This page was generated using Literate.jl.
Settings
This document was generated with Documenter.jl version 1.5.0 on Tuesday 2 July 2024. Using Julia version 1.10.4.
StochasticWeatherGenerators.jl
Documentation for StochasticWeatherGenerators.jl
In construction! Note that the main functions to fit HMM, AR etc are currently in SmoothPeriodicStatsModels.jl. This will change when these packages are rebased.
API
Fit function
SmoothPeriodicStatsModels.fit_mle_stations
— Functionfit_mle_stations(df::DataFrame, K, T, degree, local_order)
Given a DataFrame df
with known hidden states column z ∈ 1:K
. The rain occurrences of the new station are fitted conditionally to the hidden state. For local_order>0
the model is also autoregressive with its past.
StochasticWeatherGenerators.fit_mle_RR
— Functionfit_mle_RR(df::DataFrame, K, local_order; maxiter=5000, tol=2e-4, robust=true, silence=true, warm_start=true, display=:none, mix₀=mix_ini(T))
mix_allE = fit_mle_RR.(data_stations, K, local_order)
StochasticWeatherGenerators.fit_TN
— Functionfit_TN(df_full::DataFrame, 𝐃𝐞𝐠, T; kwargs...)
Fit the variable TN
(daily minimum temperature). In fact it fits the difference ΔT = TX - TN
to ensure a positive difference between TX
and TN
StochasticWeatherGenerators.fit_AR1
— Functionfit_AR1(df_full::DataFrame, X, 𝐃𝐞𝐠, T, K)
Fit a Seasonal AR(1) model of period T
and with K
hidden states for the variable X
of the DataFrame df_full
. $X_{n+1} = \mu(t_n) + \phi(t_n) X_t + \sigma(t_n)\xi$
Climate indexes
StochasticWeatherGenerators.VCX3
— FunctionVCX3(df; y_col, nb = 3)
Yearly Max of nb = 3
days sliding mean for y
for every year. By default, y_col
is the first column not with a Date
type
using DataFrames, Dates, RollingFunctions
+StochasticWeatherGenerators.jl · StochasticWeatherGenerators StochasticWeatherGenerators.jl
Documentation for StochasticWeatherGenerators.jl
Warning In construction! Note that the main functions to fit HMM, AR etc are currently in SmoothPeriodicStatsModels.jl. This will change when these packages are rebased.
API
Fit function
SmoothPeriodicStatsModels.fit_mle_stations
— Functionfit_mle_stations(df::DataFrame, K, T, degree, local_order)
Given a DataFrame df
with known hidden states column z ∈ 1:K
. The rain occurrences of the new station are fitted conditionally to the hidden state. For local_order>0
the model is also autoregressive with its past.
sourceStochasticWeatherGenerators.fit_mle_RR
— Functionfit_mle_RR(df::DataFrame, K, local_order; maxiter=5000, tol=2e-4, robust=true, silence=true, warm_start=true, display=:none, mix₀=mix_ini(T))
mix_allE = fit_mle_RR.(data_stations, K, local_order)
sourceStochasticWeatherGenerators.fit_TN
— Functionfit_TN(df_full::DataFrame, 𝐃𝐞𝐠, T; kwargs...)
Fit the variable TN
(daily minimum temperature). In fact it fits the difference ΔT = TX - TN
to ensure a positive difference between TX
and TN
sourceStochasticWeatherGenerators.fit_AR1
— Functionfit_AR1(df_full::DataFrame, X, 𝐃𝐞𝐠, T, K)
Fit a Seasonal AR(1) model of period T
and with K
hidden states for the variable X
of the DataFrame df_full
. $X_{n+1} = \mu(t_n) + \phi(t_n) X_t + \sigma(t_n)\xi$
sourceClimate indexes
StochasticWeatherGenerators.VCX3
— FunctionVCX3(df; y_col, nb = 3)
Yearly Max of nb = 3
days sliding mean for y
for every year. By default, y_col
is the first column not with a Date
type
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
-VCX3(df)
sourceVCX3(y, idxs; nb = 3)
Yearly Max of nb = 3
days sliding mean for y
. Here idxs
can be a vector of vector (or range) corresponds to the index of every year.
using DataFrames, Dates, RollingFunctions
+VCX3(df)
sourceVCX3(y, idxs; nb = 3)
Yearly Max of nb = 3
days sliding mean for y
. Here idxs
can be a vector of vector (or range) corresponds to the index of every year.
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
-VCX3(df.Temperature, idx_year)
sourceStochasticWeatherGenerators.cum_monthly
— Functioncum_monthly(y::AbstractArray, idxs)
using DataFrames, Dates, RollingFunctions
+VCX3(df.Temperature, idx_year)
sourceStochasticWeatherGenerators.cum_monthly
— Functioncum_monthly(y::AbstractArray, idxs)
using DataFrames, Dates, RollingFunctions
time_range = Date(1956):Day(1):Date(2019,12,31)
year_range = unique(year.(time_range))
df = DataFrame(:DATE => time_range, :Temperature => 20 .+ 5*randn(length(time_range)))
idx_year = [findall(x-> year.(x) == m, df[:, :DATE]) for m in year_range]
idx_month = [findall(x-> month.(x) == m, df[:, :DATE]) for m in 1:12]
idx_all = [intersect(yea, mon) for yea in idx_year, mon in idx_month]
-cum_monthly(y, idx_all)
sourceStochasticWeatherGenerators.corTail
— FunctioncorTail(x::AbstractMatrix, q = 0.95)
Compute the (symmetric averaged) tail index matrix M
of a vector x, i.e. M[i, j] = (ℙ(x[:,j] > Fxⱼ(q) ∣ x[:,i] > Fxᵢ(q)) + ℙ(x[:,i] > Fxᵢ(q) ∣ x[:,j] > Fxⱼ(q)))/2 where Fx(q) is the CDF of x. Note it uses the same convention as cor
function i.e. observations in rows and features in colums.
sourceStochasticWeatherGenerators.longuest_spell
— Functionlonguest_spell(y::AbstractArray; value=0)
Compute the length of the longuest consecutive sequence of value
in y
sourceStochasticWeatherGenerators.pmf_spell
— Functionpmf_spell(y::AbstractVector, value)
Return the distribution of spells (consecutive sequence of with the same value) length of value
in y
sourceSimulations
StochasticWeatherGenerators.rand_RR
— Functionrand_RR(mixs::AbstractArray{<:MixtureModel}, n2t::AbstractVector, z::AbstractVector, y::AbstractMatrix, Σk::AbstractArray)
Generate a (nonhomegenous) sequence of length length(n2t)
of rain amounts conditionally to a given dry/wet matrix y
and (hidden) state sequence z
. Univariate distribution are given by mixs
while correlations are given by covariance matrix Σk.
sourceStochasticWeatherGenerators.rand_cond
— Functionrand_cond(ϵ, z, θ_uni, θ_cor, n2t, T)
Genererate a random variable conditionnaly to another one Using Copula
\[X_1 \mid X_2 = ϵ \sim \mathcal{N}\left(\mu_1 + \dfrac{\sigma_1}{\sigma_2}\rho (a - \mu_2), (1-\rho^2)\sigma_1^2 \right)\]
For two N(0,1)
\[X_1 \mid X_2 = ϵ \sim \mathcal{N}\left(\rho a , (1-\rho^2) \right)\]
sourceCorrelation utilities
For temperature
StochasticWeatherGenerators.cor_groupby
— FunctionCompute and fit the `cor` between two `var` with a smooth function for each `z`.
sourceStochasticWeatherGenerators.cor_groupbyTXTN
— FunctionCompute and fit the `cor` between `:TX` and `:TX-:TN` with a smooth function for each `z`.
sourceStochasticWeatherGenerators.cov_ar1
— FunctionFit residual to constant (in time) cov matrices for each weather regime Example:
cov_ar1(data_stations, ar1sTX, :TX, K)
sourceFor rain
StochasticWeatherGenerators.cov_RR
— Functioncov_RR(dfs::AbstractArray{<:DataFrame}, K)
Each df must have :DATE, :RR, :z (same :z for each df)
Σ²RR = cov_rain(data_stations, K)
sourceStochasticWeatherGenerators.Σ_Spearman2Pearson
— FunctionΣ_Spearman2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Spearman correlation #TODO Add ref
sourceStochasticWeatherGenerators.Σ_Kendall2Pearson
— FunctionΣ_Kendall2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Kendall correlation #TODO Add ref
sourceStochasticWeatherGenerators.joint_rain
— Functionjoint_rain(M::AbstractMatrix, j1::Integer, j2::Integer, r = 0)
Select all the rows of M
with values for (two) colums above some value r
.
sourceMap utilities
StochasticWeatherGenerators.distance_x_to_y
— Functiondistance_x_to_y
Distance in km between two stations.
sourceStochasticWeatherGenerators.dms_to_dd
— Functiondms_to_dd(l)
Convert Degrees Minutes Seconds to Decimal Degrees. Inputs are strings of the form
- LAT : Latitude in degrees:minutes:seconds (+: North, -: South)
- LON : Longitude in degrees:minutes:seconds (+: East, -: West)
sourceData manipulation
StochasticWeatherGenerators.collect_data_ECA
— Functioncollect_data_ECA(STAID::Integer, path::String, var::String="RR"; skipto=19, header = 18)
path
gives the path where all data files are stored in a vector
sourcecollect_data_ECA(STAID, date_start::Date, date_end::Date, path::String, var::String="RR"; portion_valid_data=1, skipto=19, header = 18, return_nothing = true)
path
gives the path where all data files are stored in a vector- Filter the
DataFrame
s.t. date_start ≤ :DATE ≤ date_end
- var = "RR", "TX" etc.
portion_valid_data
is the portion of valid data we are ok with. If we don't want any missing, fix it to 1
.skipto
and header
for csv
files with meta informations/comments at the beginning of files. See CSV.jl
.return_nothing
if true
it will return nothing
is the file does not exists or does not have enough valid data.
sourceStochasticWeatherGenerators.select_in_range_df
— Functionselect_in_range_df(datas, start_Date, interval_Date, [portion])
Select station with some data availability in dates and quality (portion of valid data). Input is a vector
(array) of DataFrame
(one for each station for example) or a Dict
of DataFrame
. If 0 < portion ≤ 1
is specified, it will authorize some portion of data to be missing.
sourceStochasticWeatherGenerators.shortname
— Functionshortname(name::String)
Experimental function that returns only the most relevant part of a station name.
long_name = "TOULOUSE-BLAGNAC"
-shortname(long_name) # "TOULOUSE"
sourceGeneric utilities
StochasticWeatherGenerators.my_color
— Functionmy_color(k, K)
Convenience for plot colors pattern and hidden states to blue for k=1 (∼wetter) and orange for k=K (∼driest)
sourceSettings
This document was generated with Documenter.jl version 1.4.1 on Monday 24 June 2024. Using Julia version 1.10.4.
+cum_monthly(y, idx_all)
StochasticWeatherGenerators.corTail
— FunctioncorTail(x::AbstractMatrix, q = 0.95)
Compute the (symmetric averaged) tail index matrix M
of a vector x, i.e. M[i, j] = (ℙ(x[:,j] > Fxⱼ(q) ∣ x[:,i] > Fxᵢ(q)) + ℙ(x[:,i] > Fxᵢ(q) ∣ x[:,j] > Fxⱼ(q)))/2 where Fx(q) is the CDF of x. Note it uses the same convention as cor
function i.e. observations in rows and features in colums.
StochasticWeatherGenerators.longuest_spell
— Functionlonguest_spell(y::AbstractArray; value=0)
Compute the length of the longuest consecutive sequence of value
in y
StochasticWeatherGenerators.pmf_spell
— Functionpmf_spell(y::AbstractVector, value)
Return the distribution of spells (consecutive sequence of with the same value) length of value
in y
Simulations
StochasticWeatherGenerators.rand_RR
— Functionrand_RR(mixs::AbstractArray{<:MixtureModel}, n2t::AbstractVector, z::AbstractVector, y::AbstractMatrix, Σk::AbstractArray)
Generate a (nonhomegenous) sequence of length length(n2t)
of rain amounts conditionally to a given dry/wet matrix y
and (hidden) state sequence z
. Univariate distribution are given by mixs
while correlations are given by covariance matrix Σk.
StochasticWeatherGenerators.rand_cond
— Functionrand_cond(ϵ, z, θ_uni, θ_cor, n2t, T)
Genererate a random variable conditionnaly to another one Using Copula
\[X_1 \mid X_2 = ϵ \sim \mathcal{N}\left(\mu_1 + \dfrac{\sigma_1}{\sigma_2}\rho (a - \mu_2), (1-\rho^2)\sigma_1^2 \right)\]
For two N(0,1)
\[X_1 \mid X_2 = ϵ \sim \mathcal{N}\left(\rho a , (1-\rho^2) \right)\]
Correlation utilities
For temperature
StochasticWeatherGenerators.cor_groupby
— FunctionCompute and fit the `cor` between two `var` with a smooth function for each `z`.
StochasticWeatherGenerators.cor_groupbyTXTN
— FunctionCompute and fit the `cor` between `:TX` and `:TX-:TN` with a smooth function for each `z`.
StochasticWeatherGenerators.cov_ar1
— FunctionFit residual to constant (in time) cov matrices for each weather regime Example:
cov_ar1(data_stations, ar1sTX, :TX, K)
For rain
StochasticWeatherGenerators.cov_RR
— Functioncov_RR(dfs::AbstractArray{<:DataFrame}, K)
Each df must have :DATE, :RR, :z (same :z for each df)
Σ²RR = cov_rain(data_stations, K)
StochasticWeatherGenerators.Σ_Spearman2Pearson
— FunctionΣ_Spearman2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Spearman correlation #TODO Add ref
StochasticWeatherGenerators.Σ_Kendall2Pearson
— FunctionΣ_Kendall2Pearson(M::AbstractMatrix)
Compute the Pearson correlation coefficient i.e. the classic one from the Kendall correlation #TODO Add ref
StochasticWeatherGenerators.joint_rain
— Functionjoint_rain(M::AbstractMatrix, j1::Integer, j2::Integer, r = 0)
Select all the rows of M
with values for (two) colums above some value r
.
Map utilities
StochasticWeatherGenerators.distance_x_to_y
— Functiondistance_x_to_y
Distance in km between two stations.
StochasticWeatherGenerators.dms_to_dd
— Functiondms_to_dd(l)
Convert Degrees Minutes Seconds to Decimal Degrees. Inputs are strings of the form
- LAT : Latitude in degrees:minutes:seconds (+: North, -: South)
- LON : Longitude in degrees:minutes:seconds (+: East, -: West)
Data manipulation
StochasticWeatherGenerators.collect_data_ECA
— Functioncollect_data_ECA(STAID::Integer, path::String, var::String="RR"; skipto=19, header = 18)
path
gives the path where all data files are stored in a vector
collect_data_ECA(STAID, date_start::Date, date_end::Date, path::String, var::String="RR"; portion_valid_data=1, skipto=19, header = 18, return_nothing = true)
path
gives the path where all data files are stored in a vector- Filter the
DataFrame
s.t.date_start ≤ :DATE ≤ date_end
- var = "RR", "TX" etc.
portion_valid_data
is the portion of valid data we are ok with. If we don't want any missing, fix it to1
.skipto
andheader
forcsv
files with meta informations/comments at the beginning of files. SeeCSV.jl
.return_nothing
iftrue
it will returnnothing
is the file does not exists or does not have enough valid data.
StochasticWeatherGenerators.select_in_range_df
— Functionselect_in_range_df(datas, start_Date, interval_Date, [portion])
Select station with some data availability in dates and quality (portion of valid data). Input is a vector
(array) of DataFrame
(one for each station for example) or a Dict
of DataFrame
. If 0 < portion ≤ 1
is specified, it will authorize some portion of data to be missing.
StochasticWeatherGenerators.shortname
— Functionshortname(name::String)
Experimental function that returns only the most relevant part of a station name.
long_name = "TOULOUSE-BLAGNAC"
+shortname(long_name) # "TOULOUSE"
Generic utilities
StochasticWeatherGenerators.my_color
— Functionmy_color(k, K)
Convenience for plot colors pattern and hidden states to blue for k=1 (∼wetter) and orange for k=K (∼driest)
Settings
This document was generated with Documenter.jl version 1.5.0 on Tuesday 2 July 2024. Using Julia version 1.10.4.