Skip to content

Commit

Permalink
site updates with tutorial
Browse files Browse the repository at this point in the history
  • Loading branch information
kyle-messier committed Sep 26, 2024
1 parent 24a627c commit eaab0fd
Show file tree
Hide file tree
Showing 10 changed files with 1,672 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .quarto/idx/presentations.qmd.json

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion .quarto/preview/lock

This file was deleted.

3 changes: 3 additions & 0 deletions .quarto/xref/INDEX
Original file line number Diff line number Diff line change
Expand Up @@ -123,5 +123,8 @@
},
"publications/Clark_etal_2024.qmd": {
"Clark_etal_2024.html": "34efc6d8"
},
"presentations/tutorials/censored-MLE.Rmd": {
"censored-MLE.html": "f3f06cc0"
}
}
535 changes: 535 additions & 0 deletions images/GeoTox-v1-01.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
114 changes: 114 additions & 0 deletions images/GeoTox-v1-02.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
4 changes: 4 additions & 0 deletions presentations.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,10 @@ date: today
date-format: long
---

### Tutorials

1. [Maximum Likelihood Estimation (MLE) and Censored Data Analysis](presentations/tutorials/censored-MLE.html)

### 2023

1. [Ichan School of Medicine, Department of Environmental Medicine and Public Health Grand Rounds Seminar, 2023-11-08](presentations/MSSM_GrandRounds/KPM_MSSM.pdf)
Expand Down
140 changes: 140 additions & 0 deletions presentations/tutorials/censored-MLE.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,140 @@
---
title: "MLE Simulation and Censored Data Analysis"
output: html_document
author: "Kyle P Messier, PhD"
---

```{r, setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)
```
# Maximum Likelihood Estimation (MLE) and Censored Data

In this brief tutorial, we will explore how to estimate parameters for a normal distribution given observed and censored data. Censored data is encountered in many fields, including environmental science, where we often have data below the method detection limit (MDL). That is generally a good thing for public health as it means the concentration of the contaminant is below a level that we can detect; however, it is a challenge for statisticians and data analysts as more sophisticated methods are needed. Here, we show how to produce these estimates using base R code and functions.

# Simulate ~ N(0,1) data

```{r}
x <- rnorm(10^6)
ggplot(data.frame(x = x), aes(x = x)) + geom_histogram()
```

# Create a negative log-likelihood function for normal data

The negative Log-Likelihood function for a normal distribution is the negative of the sum of the log of the density function of the normal distribution.

```{r}
rMLE <- function(params, y){
mu = params[1]
sigma = params[2]
return(-1 * sum(dnorm(y, mean = mu, sd = sigma, log = TRUE)))
}
```

# 100 percent observed MLE

We use starting values kind of far away from true values to make sure the optimization converges to the right values for this easy case.

```{r}
mle0 <- optim(par = c(2,3), rMLE, y = x)
results <- mle0$par
print(results)
```

Try with a different mean and standard deviation. Also, we start with starting values quite different to make sure the optimization converges to the right values for this easy case

```{r}
x3 <- rnorm(10^6, mean = 3, sd = 6)
mle3 <- optim(par = c(0,1), rMLE, y = x3)
results3 <- mle3$par
print(results3)
```

## Censored MLE function


```{r}
cMLE <- function(params, y.obs, y.cens = NULL){
mu = params[1]
sigma = params[2]
obs_ll <- sum(dnorm(y.obs, mean = mu, sd = sigma, log = TRUE))
cens_ll <- sum(pnorm(y.cens, mean = mu, sd = sigma, log.p = TRUE, lower.tail = TRUE))
nll <- -1 * (obs_ll + cens_ll)
return(nll)
}
```

## Check that the censored ML is working.

We will do a loop by censoring more and more data below the MDL. The MLE estimates should deviate more and more from the observed mean and standard deviation as we censor more data.

```{r}
MDL <- -1.5
p <- seq(0.1, 1, by = 0.1)
results <- list()
for (i in 1:length(p)){
# random sample of p percent of the data
idx <- c(1:length(x)) %in% sample.int(length(x), length(x) * p[i])
cens_data <- rep(MDL, length(x) * (1-p[i]))
obs_data <- x[idx]
mle <- optim(par = c(0,1), cMLE, y.obs = obs_data, y.cens = cens_data)
results[[i]] <- mle$par
print(paste("mu = ",results[[i]][1], "sd = ", results[[i]][2], "censored percentag = ", (100-p[i]*100)))
}
```

The results seem to make sense; As the censoring decrease the MLE estimates get closer to the true values. Additionally, as the censoring increases, the mean moves farther towards -Inf, away from the MDL.


# MLE Trick for Strictly Positive Parameters

We often have to estimate parameters that are strictly positive. For example, the mean concentration on the normal scale should be non-negative or a variance parameter (e.g. Kriging variance/sill).

One trick is to estimate the log of the parameter. Our estimate is then $exp(x)$. This way, the parameter is always positive. We could do constrained optimization, but the algorithms for unconstrained optimization are more efficient and more robust.


## MLE for observed data and log parameters
```{r}
rMLE_log <- function(params, y){
mu = exp(params[1])
sigma = exp(params[2])
return(-1 * sum(dnorm(y, mean = mu, sd = sigma, log = TRUE)))
}
# Simulate strictly positive log-normal data
z <- rlnorm(10^3, meanlog = 3, sdlog = 1)
mle_log <- optim(par = c(5,1), rMLE_log, y = z)
results <- exp(mle_log$par)
print(paste("mu = ",results[1], "sd = ", results[2]))
print(paste("observed mean = ", mean(z), "observed sd = ", sd(z)))
```
Loading

0 comments on commit eaab0fd

Please sign in to comment.