-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient Bayesian Calibration code #1
Comments
Test for chiller COP and capacity calibration, using the optimized Since it seems to never complete using 7-days hourly data, below I change it to 3-days 6-hourly data. library(eplusr)
library(epScan)
# use an example file from EnergyPlus v8.8 for demonstration here
path_idf <- file.path(eplus_config(8.8)$dir, "ExampleFiles", "RefBldgLargeOfficeNew2004_Chicago.idf")
path_epw <- file.path(eplus_config(8.8)$dir, "WeatherData", "USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw")
bc <- bayes_job(path_idf, path_epw)
# set parameters
bc$input("CoolSys1 Chiller 1", paste("chiller evaporator", c("inlet temperature", "outlet temperature", "mass flow rate")), "hourly")
bc$output("CoolSys1 Chiller 1", "chiller electric power", "hourly")
bc$param(
`CoolSys1 Chiller 1` = list(reference_cop = c(4, 6), reference_capacity = c(2.5e6, 3.0e6)),
.names = c("cop1", "cap1"), .num_sim = 5
)
# check sample values
bc$samples()
# run simulations
bc$eplus_run(dir = tempdir(), run_period = list("example", 7, 1, 7, 3), echo = FALSE)
# set simulation data
bc$data_sim("6 hour")
# use the seed model to get field data
## clone the seed model
seed <- bc$seed()$clone()
## remove existing RunPeriod objects
seed$RunPeriod <- NULL
## set run period as the same as in `$eplus_run()`
seed$add(RunPeriod = list("test", 7, 1, 7, 3))
seed$SimulationControl$set(
`Run Simulation for Sizing Periods` = "No",
`Run Simulation for Weather File Run Periods` = "Yes"
)
## save the model to tempdir
seed$save(tempfile(fileext = ".idf"))
## run
job <- seed$run(bc$weather(), echo = FALSE)
## get data
fan_power <- epScan:::report_dt_aggregate(job$report_data(name = bc$output()$variable_name, all = TRUE), "6 hour")
fan_power <- eplusr:::report_dt_to_wide(fan_power)
# set field data
bc$data_field(fan_power[, -"Date/Time"])
# check stan input
stan_data <- bc$data_bc()
system.time(res <- bc$stan_run(iter = 500, chains = 2)) |
It completes after more than half hour. Definitely not a good results. Warnings# Warning messages:
# 1: There were 500 transitions after warmup that exceeded the maximum tree
# depth. Increase max_treedepth above 10. See
# http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
# 2: There were 1 chains where the estimated Bayesian Fraction of Missing I
# nformation was low. See
# http://mc-stan.org/misc/warnings.html#bfmi-low
# 3: Examine the pairs() plot to diagnose sampling problems
# 4: The largest R-hat is 2.99, indicating chains have not mixed.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#r-hat
# 5: Bulk Effective Samples Size (ESS) is too low, indicating posterior mea
# ns and medians may be unreliable.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#bulk-ess
# 6: Tail Effective Samples Size (ESS) is too low, indicating posterior var
# iances and tail quantiles may be unreliable.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#tail-ess trace plotrstan::stan_trace(res$fit) histogramrstan::stan_hist(res$fit) |
rstan::check_hmc_diagnostics(res$fit)
# Divergences:
# 0 of 500 iterations ended with a divergence.
#
# Tree depth:
# 500 of 500 iterations saturated the maximum tree depth of 10 (100%).
# Try increasing 'max_treedepth' to avoid saturation.
#
# Energy:
# E-BFMI indicated possible pathological behavior:
# Chain 2: E-BFMI = 0.189
# E-BFMI below 0.2 indicates you may need to reparameterize your model. |
FYI: https://mc-stan.org/misc/warnings.html Maximum treedepth exceededWarnings about hitting the maximum treedepth are not as serious as warnings about divergent transitions. While divergent transitions are a validity concern, hitting the maximum treedepth is an efficiency concern. Configuring the No-U-Turn-Sampler (the variant of HMC used by Stan) involves putting a cap on the depth of the trees that it evaluates during each iteration (for details on this see the Hamiltonian Monte Carlo Sampling chapter in the Stan manual). This is controlled through a maximum depth parameter max_treedepth. When the maximum allowed tree depth is reached it indicates that NUTS is terminating prematurely to avoid excessively long execution time. Transitions that hit the maximum treedepth are plotted in yellow in the pairs() plot. Recommendations:Increase the maximum allowed treedepth. In RStan, max_treedepth is one of the parameters that you can include in the optional control list passed to the stan or sampling functions. For example, to set max_treedepth to 15 (the default is 10) you would do this: stan(..., control = list(max_treedepth = 15)) BFMI lowYou may see a warning that says some number of chains had an estimated Bayesian Fraction of Missing Information (BFMI) that was too low. This implies that the adaptation phase of the Markov Chains did not turn out well and those chains likely did not explore the posterior distribution efficiently. For more details on this diagnostic, see https://arxiv.org/abs/1604.00695. Recommendations:Look at the pairs plot to see which primitive parameters are correlated with the energy__ margin. There should be a negative relationship between lp__ and energy__ in the pairs plot, which is not a concern because lp__ is the logarithm of the posterior kernel rather than a primitive parameter. |
I was wondering whether it is due to the way the problem is formulated. The "field data used now" is the simulation output. However, in the Stan code, we model a discrepancy term and an observation error term that does not really exist in reality since we are using simulated data. Perhaps we need to add discrepancy and noise to the "field data" |
So it means that there is no "observation error" term in this case. What kind of noise you would suggest to add to the simulated data? |
I'm thinking to add Gaussian noise. Discrepancy error is the one that is difficult to simulate. Might want to consider using the test dataset in bc-stan instead as an illustration of Bayesian calibration |
OK. Will use the test dataset in bc-stan to see if things get better. At the mean time, I added a Gaussian noise with 5% mean and sd for testing. field_data <- fan_power[, -"Date/Time"][
, lapply(.SD, function (x) x + rnorm(length(x), mean = mean(x) * 0.05, sd = 0.05 * sd(x)))][
, lapply(.SD, function (x) {x[x < 0] <- 0; x})
] |
|
Oh yes. You are right. |
Actually, this wired trace plot was caused by a bug in epScan, which did not supply standardized library(eplusr)
library(epScan)
# use an example file from EnergyPlus v8.8 for demonstration here
path_idf <- file.path(eplus_config(8.8)$dir, "ExampleFiles", "RefBldgLargeOfficeNew2004_Chicago.idf")
path_epw <- file.path(eplus_config(8.8)$dir, "WeatherData", "USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw")
bc <- bayes_job(path_idf, path_epw)
# set parameters
bc$input("CoolSys1 Chiller 1", paste("chiller evaporator", c("inlet temperature", "outlet temperature", "mass flow rate")), "hourly")
bc$output("CoolSys1 Chiller 1", "chiller electric power", "hourly")
bc$param(
`CoolSys1 Chiller 1` = list(reference_cop = c(4, 6), reference_capacity = c(2.5e6, 3.0e6)),
.names = c("cop1", "cap1"), .num_sim = 5
)
# check sample values
bc$samples()
# run simulations
bc$eplus_run(dir = tempdir(), run_period = list("example", 7, 1, 7, 3), echo = FALSE)
# set simulation data
bc$data_sim("6 hour")
# use the seed model to get field data
## clone the seed model
seed <- bc$seed()$clone()
## remove existing RunPeriod objects
seed$RunPeriod <- NULL
## set run period as the same as in `$eplus_run()`
seed$add(RunPeriod = list("test", 7, 1, 7, 3))
seed$SimulationControl$set(
`Run Simulation for Sizing Periods` = "No",
`Run Simulation for Weather File Run Periods` = "Yes"
)
## save the model to tempdir
seed$save(tempfile(fileext = ".idf"))
## run
job <- seed$run(bc$weather(), echo = FALSE)
## get data
fan_power <- epScan:::report_dt_aggregate(job$report_data(name = bc$output()$variable_name, all = TRUE), "6 hour")
fan_power <- eplusr:::report_dt_to_wide(fan_power)
# add Gaussian noice
fan_power <- fan_power[, -"Date/Time"][
, lapply(.SD, function (x) x + rnorm(length(x), sd = 0.05 * sd(x)))][
, lapply(.SD, function (x) {x[x < 0] <- 0; x})
]
# set field data
data_field <- bc$data_field(fan_power)
# check stan input
stan_data <- bc$data_bc()
# run stan
res <- bc$stan_run(iter = 300, chains = 3) trace plotrstan::stan_trace(res$fit) histogramrstan::stan_trace(res$fit) prediction## get yf
yf <- data_field$output[, .SD, .SDcols = -c("case", "Date/Time")][, xf := .I]
data.table::setnames(yf, c("yf", "xf"))
## get predictive inference y_pred and convert back to original scale
y_pred <- data.table::melt.data.table(res$y_pred,
variable.name = "xf", value.name = "pred", variable.factor = FALSE
)
y_pred[, xf := as.integer(gsub("V", "", xf))]
## combine
plot_data <- yf[y_pred, on = "xf"]
## plot
ggplot(data = plot_data, aes(y = pred, x = xf, group = xf)) +
geom_boxplot(outlier.size = 0.2) +
geom_point(data = yf, aes(x = xf, y = yf), color = "#D55E00", size = 0.8) |
Evaluate improvement in code efficiency for the following
The text was updated successfully, but these errors were encountered: