From aea3306e3ad762e57246f3f14d5d3943f489183e Mon Sep 17 00:00:00 2001
From: BERENZ
Date: Thu, 20 Feb 2025 12:01:44 +0000
Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20ncn-fore?=
 =?UTF-8?q?igners/nonprobsvy@8df1bde5dcbe9fa7f2c1e93bc6853ccad10cf9b3=20?=
 =?UTF-8?q?=F0=9F=9A=80?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 news/index.html           |  12 +++
 pkgdown.yml               |   2 +-
 reference/index.html      |  24 ++++++
 reference/model_glm.html  | 169 ++++++++++++++++++++++++++++++++++++++
 reference/model_nn.html   | 169 ++++++++++++++++++++++++++++++++++++++
 reference/model_npar.html | 169 ++++++++++++++++++++++++++++++++++++++
 reference/model_pmm.html  | 169 ++++++++++++++++++++++++++++++++++++++
 reference/model_ps.html   |   4 +-
 search.json               |   2 +-
 sitemap.xml               |   4 +
 10 files changed, 720 insertions(+), 4 deletions(-)
 create mode 100644 reference/model_glm.html
 create mode 100644 reference/model_nn.html
 create mode 100644 reference/model_npar.html
 create mode 100644 reference/model_pmm.html

diff --git a/news/index.html b/news/index.html
index 8c70a29..8768252 100644
--- a/news/index.html
+++ b/news/index.html
@@ -62,6 +62,18 @@

Features

Bugfixes

diff --git a/pkgdown.yml b/pkgdown.yml
index a40dc88..16fe571 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -2,7 +2,7 @@ pandoc: 3.1.11
 pkgdown: 2.1.1
 pkgdown_sha: ~
 articles: {}
-last_built: 2025-02-19T12:50Z
+last_built: 2025-02-20T12:01Z
 urls:
   reference: https://ncn-foreigners.github.io/nonprobsvy/reference
   article: https://ncn-foreigners.github.io/nonprobsvy/articles
diff --git a/reference/index.html b/reference/index.html
index 99e4187..a03f1b7 100644
--- a/reference/index.html
+++ b/reference/index.html
@@ -85,6 +85,30 @@

All functions
+model_glm()
+Function for the mass imputation model using glm
+model_nn()
+Function for the mass imputation model using the nn method
+model_npar()
+Function for the mass imputation model using a nonparametric method
+model_pmm()
+Function for the mass imputation model using the pmm method
 model_ps()
diff --git a/reference/model_glm.html b/reference/model_glm.html
new file mode 100644
index 0000000..333dd9d
--- /dev/null
+++ b/reference/model_glm.html
@@ -0,0 +1,169 @@
+Function for the mass imputation model using glm — model_glm • nonprobsvy
+Skip to contents
+
+
+ +
+

Outcome model for the mass imputation estimator based on a generalized linear model (glm)

+
+ +
+

Usage

+
model_glm(
+  y_nons,
+  X_nons,
+  X_rand,
+  weights,
+  svydesign,
+  family_outcome,
+  start_outcome,
+  vars_selection,
+  pop_totals,
+  pop_size,
+  control_outcome,
+  verbose,
+  se
+)
+
+ +
+

Arguments

+ + +
y_nons
+

target variable from the non-probability sample

+ + +
X_nons
+

a model.matrix with auxiliary variables from the non-probability sample

+ + +
X_rand
+

a model.matrix with auxiliary variables from the probability sample

+ + +
weights
+

case/frequency weights for the non-probability sample

+ + +
svydesign
+

a svydesign object

+ + +
family_outcome
+

family for the glm model

+ + +
start_outcome
+

starting values for the outcome model parameters

+ + +
vars_selection
+

whether variable selection should be conducted

+ + +
pop_totals
+

population totals from the nonprob function

+ + +
pop_size
+

population size from the nonprob function

+ + +
control_outcome
+

controls passed by the control_out function

+ + +
verbose
+

logical; whether to print details of the fitting process (passed from the main nonprob function)

+ + +
se
+

whether standard errors should be calculated

+ +
+
+

Value

+

an object of class nonprob_model, which is a list with the following entries

+
model_fitted
+

the fitted model, either a glm.fit or a cv.ncvreg object

+ +
y_nons_pred
+

predicted values for the non-probability sample

+ +
y_rand_pred
+

predicted values for the probability sample or population totals

+ +
coefficients
+

coefficients for the model (if available)

+ +
svydesign
+

an updated survey.design2 object (a new column y_hat_MI is added)

+ +
y_mi_hat
+

estimated population mean for the target variable

+ +
vars_selection
+

whether variable selection was performed

+ +
var_prob
+

variance for the probability sample component (if available)

+ +
var_nonprob
+

variance for the non-probability sample component

+ +
model
+

model type (character "glm")

+ + +
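The entries y_rand_pred and y_mi_hat listed above correspond to the two steps of a GLM-based mass imputation estimator: fit the outcome model on the non-probability sample, predict the outcome for every unit of the probability sample, and take the design-weighted mean of those predictions. The base-R sketch below illustrates that idea only; it is not the package's internal code. The simulated population and the design-weight vector `d` (standing in for the weights carried by the svydesign object, here assumed to come from simple random sampling) are assumptions.

```r
# Sketch of GLM-based mass imputation (illustration only, not nonprobsvy internals).
# Assumption: the probability sample is a simple random sample, so every unit
# has design weight d = N / n; in nonprobsvy the weights come from svydesign.
set.seed(2025)
N <- 10000
x <- rnorm(N, mean = 1)
y <- 1 + 2 * x + rnorm(N)

# Non-probability sample: inclusion depends on x, so its raw mean is biased
in_nons <- rbinom(N, size = 1, prob = plogis(-2 + x)) == 1
nons <- data.frame(y = y[in_nons], x = x[in_nons])

# Probability sample: y is treated as unobserved, only x is used
n <- 500
rand <- data.frame(x = x[sample.int(N, n)])
d <- rep(N / n, n)

# 1. Fit the outcome model on the non-probability sample only
fit <- glm(y ~ x, family = gaussian(), data = nons)

# 2. Impute y for every probability-sample unit (compare y_rand_pred)
y_rand_pred <- predict(fit, newdata = rand, type = "response")

# 3. Design-weighted mean of the imputed values (compare y_mi_hat)
y_mi_hat <- sum(d * y_rand_pred) / sum(d)

round(c(naive = mean(nons$y), mi = y_mi_hat, true = mean(y)), 3)
```

Because the outcome model is correctly specified here, the mass imputation estimate should land close to the true mean, while the naive mean of the non-probability sample stays biased upwards.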
+ +
+ + +
+ + + +
diff --git a/reference/model_nn.html b/reference/model_nn.html
new file mode 100644
index 0000000..7386847
--- /dev/null
+++ b/reference/model_nn.html
@@ -0,0 +1,169 @@
+Function for the mass imputation model using the nn method — model_nn • nonprobsvy
+Skip to contents
+
+
+ +
+

Outcome model for the mass imputation estimator based on nearest neighbour (nn) matching

+
+ +
+

Usage

+
model_nn(
+  y_nons,
+  X_nons,
+  X_rand,
+  weights,
+  svydesign,
+  family_outcome,
+  start_outcome,
+  vars_selection,
+  pop_totals,
+  pop_size,
+  control_outcome,
+  verbose,
+  se
+)
+
+ +
+

Arguments

+ + +
y_nons
+

target variable from the non-probability sample

+ + +
X_nons
+

a model.matrix with auxiliary variables from the non-probability sample

+ + +
X_rand
+

a model.matrix with auxiliary variables from the probability sample

+ + +
weights
+

case/frequency weights for the non-probability sample

+ + +
svydesign
+

a svydesign object

+ + +
family_outcome
+

family for the glm model

+ + +
start_outcome
+

starting values for the model parameters

+ + +
vars_selection
+

whether variable selection should be conducted

+ + +
pop_totals
+

population totals from the nonprob function

+ + +
pop_size
+

population size from the nonprob function

+ + +
control_outcome
+

controls passed by the control_out function

+ + +
verbose
+

logical; whether to print details of the fitting process (passed from the main nonprob function)

+ + +
se
+

whether standard errors should be calculated

+ +
+
+

Value

+

an object of class nonprob_model, which is a list with the following entries

+
model_fitted
+

the fitted model, either a glm.fit or a cv.ncvreg object

+ +
y_nons_pred
+

predicted values for the non-probability sample

+ +
y_rand_pred
+

predicted values for the probability sample or population totals

+ +
coefficients
+

coefficients for the model (if available)

+ +
svydesign
+

an updated survey.design2 object (a new column y_hat_MI is added)

+ +
y_mi_hat
+

estimated population mean for the target variable

+ +
vars_selection
+

whether variable selection was performed

+ +
var_prob
+

variance for the probability sample component (if available)

+ +
var_nonprob
+

variance for the non-probability sample component

+ +
model
+

model type (character "nn")

+ + +
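With the nn method, each probability-sample unit is imputed from its nearest neighbour(s) in the non-probability sample, measured in the space of auxiliary variables. The minimal sketch below shows that matching step on toy data. The helper `knn_impute()` is hypothetical: it uses brute-force Euclidean distances rather than the tree-based search that nonprobsvy delegates to `RANN::nn2()`, and averaging the outcomes of the `k` donors, as well as the weights `d`, are assumptions of this illustration.

```r
# Hypothetical helper (not part of nonprobsvy): brute-force k-nearest-neighbour
# imputation with Euclidean distance between rows of the auxiliary matrices.
knn_impute <- function(X_nons, y_nons, X_rand, k = 1) {
  X_nons <- as.matrix(X_nons)
  X_rand <- as.matrix(X_rand)
  vapply(seq_len(nrow(X_rand)), function(i) {
    # squared distance from probability-sample unit i to every donor
    dist2 <- colSums((t(X_nons) - X_rand[i, ])^2)
    donors <- order(dist2)[seq_len(k)]
    # impute the average outcome of the k closest donors
    mean(y_nons[donors])
  }, numeric(1))
}

# Toy data: four donors (non-probability sample), two recipients (probability sample)
X_nons <- matrix(c(0, 1, 2, 10), ncol = 1)
y_nons <- c(5, 6, 7, 50)
X_rand <- matrix(c(0.9, 9), ncol = 1)
d <- c(10, 30)  # assumed design weights of the two recipients

y_rand_pred <- knn_impute(X_nons, y_nons, X_rand, k = 1)  # 6 50
y_mi_hat <- sum(d * y_rand_pred) / sum(d)                 # (60 + 1500) / 40 = 39
knn_impute(X_nons, y_nons, X_rand, k = 2)                 # 5.5 28.5
```

A larger `k` smooths the imputations at the price of matching on more distant donors, as the second recipient shows.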
+ +
+ + +
+ + + +
diff --git a/reference/model_npar.html b/reference/model_npar.html
new file mode 100644
index 0000000..f1786a0
--- /dev/null
+++ b/reference/model_npar.html
@@ -0,0 +1,169 @@
+Function for the mass imputation model using a nonparametric method — model_npar • nonprobsvy
+Skip to contents
+
+
+ +
+

Outcome model for the mass imputation estimator based on a nonparametric method

+
+ +
+

Usage

+
model_npar(
+  y_nons,
+  X_nons,
+  X_rand,
+  weights,
+  svydesign,
+  family_outcome,
+  start_outcome,
+  vars_selection,
+  pop_totals,
+  pop_size,
+  control_outcome,
+  verbose,
+  se
+)
+
+ +
+

Arguments

+ + +
y_nons
+

target variable from the non-probability sample

+ + +
X_nons
+

a model.matrix with auxiliary variables from the non-probability sample

+ + +
X_rand
+

a model.matrix with auxiliary variables from the probability sample

+ + +
weights
+

case/frequency weights for the non-probability sample

+ + +
svydesign
+

a svydesign object

+ + +
family_outcome
+

family for the glm model

+ + +
start_outcome
+

starting values for the model parameters

+ + +
vars_selection
+

whether variable selection should be conducted

+ + +
pop_totals
+

population totals from the nonprob function

+ + +
pop_size
+

population size from the nonprob function

+ + +
control_outcome
+

controls passed by the control_out function

+ + +
verbose
+

logical; whether to print details of the fitting process (passed from the main nonprob function)

+ + +
se
+

whether standard errors should be calculated

+ +
+
+

Value

+

an object of class nonprob_model, which is a list with the following entries

+
model_fitted
+

the fitted model, either a glm.fit or a cv.ncvreg object

+ +
y_nons_pred
+

predicted values for the non-probability sample

+ +
y_rand_pred
+

predicted values for the probability sample or population totals

+ +
coefficients
+

coefficients for the model (if available)

+ +
svydesign
+

an updated survey.design2 object (a new column y_hat_MI is added)

+ +
y_mi_hat
+

estimated population mean for the target variable

+ +
vars_selection
+

whether variable selection was performed

+ +
var_prob
+

variance for the probability sample component (if available)

+ +
var_nonprob
+

variance for the non-probability sample component

+ +
model
+

model type (character "npar")

+ + +
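A nonparametric outcome model replaces the parametric regression function with a smoother fitted on the non-probability sample. The imputation and weighting steps stay the same. The sketch below uses `stats::loess()` as the smoother. This page does not state which smoother model_npar uses, so the choice of loess, its `span`, the simulated data and the simple-random-sampling weights `d` are all assumptions of this illustration.

```r
# Sketch of nonparametric mass imputation with a local regression smoother.
# Assumption: stats::loess stands in for whatever smoother model_npar uses.
set.seed(7)
N <- 5000
x <- runif(N, min = 0, max = 3)
y <- sin(2 * x) + x + rnorm(N, sd = 0.3)   # non-linear regression function

# Non-probability sample with selection depending on x
in_nons <- rbinom(N, size = 1, prob = plogis(-1 + x)) == 1
nons <- data.frame(x = x[in_nons], y = y[in_nons])

# Probability sample with assumed SRS design weights
n <- 400
rand <- data.frame(x = x[sample.int(N, n)])
d <- rep(N / n, n)

# surface = "direct" lets loess predict slightly outside the donor range of x
fit <- loess(y ~ x, data = nons, span = 0.3, degree = 2,
             control = loess.control(surface = "direct"))

y_rand_pred <- predict(fit, newdata = rand)   # imputed values for the prob. sample
y_mi_hat <- sum(d * y_rand_pred) / sum(d)     # design-weighted mean

round(c(naive = mean(nons$y), mi = y_mi_hat, true = mean(y)), 3)
```

A linear glm would misfit the `sin(2 * x)` term here; the smoother adapts to it without a parametric form, which is the point of the nonparametric variant.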
+ +
+ + +
+ + + +
diff --git a/reference/model_pmm.html b/reference/model_pmm.html
new file mode 100644
index 0000000..65813f4
--- /dev/null
+++ b/reference/model_pmm.html
@@ -0,0 +1,169 @@
+Function for the mass imputation model using the pmm method — model_pmm • nonprobsvy
+Skip to contents
+
+
+ +
+

Outcome model for the mass imputation estimator based on predictive mean matching (pmm)

+
+ +
+

Usage

+
model_pmm(
+  y_nons,
+  X_nons,
+  X_rand,
+  weights,
+  svydesign,
+  family_outcome,
+  start_outcome,
+  vars_selection,
+  pop_totals,
+  pop_size,
+  control_outcome,
+  verbose,
+  se
+)
+
+ +
+

Arguments

+ + +
y_nons
+

target variable from the non-probability sample

+ + +
X_nons
+

a model.matrix with auxiliary variables from the non-probability sample

+ + +
X_rand
+

a model.matrix with auxiliary variables from the probability sample

+ + +
weights
+

case/frequency weights for the non-probability sample

+ + +
svydesign
+

a svydesign object

+ + +
family_outcome
+

family for the glm model

+ + +
start_outcome
+

starting values for the outcome model parameters

+ + +
vars_selection
+

whether variable selection should be conducted

+ + +
pop_totals
+

population totals from the nonprob function

+ + +
pop_size
+

population size from the nonprob function

+ + +
control_outcome
+

controls passed by the control_out function

+ + +
verbose
+

logical; whether to print details of the fitting process (passed from the main nonprob function)

+ + +
se
+

whether standard errors should be calculated

+ +
+
+

Value

+

an object of class nonprob_model, which is a list with the following entries

+
model_fitted
+

the fitted model, either a glm.fit or a cv.ncvreg object

+ +
y_nons_pred
+

predicted values for the non-probability sample

+ +
y_rand_pred
+

predicted values for the probability sample or population totals

+ +
coefficients
+

coefficients for the model (if available)

+ +
svydesign
+

an updated survey.design2 object (a new column y_hat_MI is added)

+ +
y_mi_hat
+

estimated population mean for the target variable

+ +
vars_selection
+

whether variable selection was performed

+ +
var_prob
+

variance for the probability sample component (if available)

+ +
var_nonprob
+

variance for the non-probability sample component

+ +
model
+

model type (character "pmm")

+ + +
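Predictive mean matching uses a regression model only to compute predicted values. The imputed value for a probability-sample unit is an observed outcome of the donor (or donors) in the non-probability sample whose prediction is closest to its own. The sketch below is an illustration, not the package's implementation. The helper `pmm_impute()` is hypothetical. It matches predicted values to predicted values and averages over `k` donors, which is only one of several possible matching variants. The simulated data and the weights `d` are assumptions.

```r
# Hypothetical helper (not part of nonprobsvy): predictive mean matching.
# Each recipient gets the mean observed y of the k donors whose predicted
# values are closest to the recipient's predicted value (yhat-to-yhat matching).
pmm_impute <- function(yhat_nons, y_nons, yhat_rand, k = 1) {
  vapply(yhat_rand, function(v) {
    donors <- order(abs(yhat_nons - v))[seq_len(k)]
    mean(y_nons[donors])
  }, numeric(1), USE.NAMES = FALSE)
}

set.seed(42)
N <- 10000
x <- rnorm(N)
y <- 2 + x + rnorm(N)
in_nons <- rbinom(N, size = 1, prob = plogis(x)) == 1
nons <- data.frame(x = x[in_nons], y = y[in_nons])
n <- 500
rand <- data.frame(x = x[sample.int(N, n)])
d <- rep(N / n, n)                               # assumed SRS design weights

fit <- glm(y ~ x, data = nons)                   # step 1: outcome model
yhat_nons <- fitted(fit)                         # predictions for donors
yhat_rand <- predict(fit, newdata = rand)        # predictions for recipients
y_rand_imp <- pmm_impute(yhat_nons, nons$y, yhat_rand, k = 5)  # step 2: matching
y_mi_hat <- sum(d * y_rand_imp) / sum(d)         # step 3: design-weighted mean

round(c(naive = mean(nons$y), mi = y_mi_hat, true = mean(y)), 3)
```

Unlike the glm variant, the imputed values are real observed outcomes, so they always stay within the range of `y` seen in the non-probability sample.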
+ +
+ + +
+ + + +
diff --git a/reference/model_ps.html b/reference/model_ps.html
index 6a742e8..b40afe1 100644
--- a/reference/model_ps.html
+++ b/reference/model_ps.html
@@ -144,8 +144,8 @@

Examples
 #> (weights_rand * exp(eta2)))
 #> }
 #> }
-#> <bytecode: 0x564de4e2eb78>
-#> <environment: 0x564deb6ba310>
+#> <bytecode: 0x55e9dd9d16c8>
+#> <environment: 0x55e9e63c5d70>
diff --git a/search.json b/search.json
index 0193b41..65a14fc 100644
--- a/search.json
+++ b/search.json
@@ -1 +1 @@
-[{"path":"https://ncn-foreigners.github.io/nonprobsvy/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2023 ncn-foreigners Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Łukasz Chrostowski. Author, contributor. Maciej Beręsewicz. Author, maintainer. Piotr Chlebicki. Author, contributor.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Chrostowski Ł, Beręsewicz M, Chlebicki P (2025). Inference Based Non-Probability Samples.
R package version 0.2, https://github.com/ncn-foreigners/nonprobsvy.","code":"@Manual{nonprobsy, title = {Inference Based on Non-Probability Samples}, author = {Łukasz Chrostowski and Maciej Beręsewicz and Piotr Chlebicki}, note = {R package version 0.2}, year = {2025}, url = {https://github.com/ncn-foreigners/nonprobsvy}, }"},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"basic-information","dir":"","previous_headings":"","what":"Basic information","title":"Inference Based on Non-Probability Samples","text":"goal package provide R users access modern methods non-probability samples auxiliary information population probability sample available: inverse probability weighting estimators possible calibration constraints (Chen, Li, Wu 2020), mass imputation estimators based nearest neighbours (Yang, Kim, Hwang 2021), predictive mean matching regression imputation (Kim et al. 2021), doubly robust estimators (Chen, Li, Wu 2020) bias minimization (Yang, Kim, Song 2020). package allows : variable section high-dimensional space using SCAD (Yang, Kim, Song 2020), Lasso MCP penalty (via ncvreg, Rcpp, RcppArmadillo packages), estimation variance using analytical bootstrap approach (see Wu (2023)), integration survey package probability sample available Lumley (2023), different links selection (logit, probit cloglog) outcome (gaussian, binomial poisson) variables. 
Details use package can found: draft (proofread) version book Modern inference methods non-probability samples R, example codes reproduce papers available github repository software tutorials.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Inference Based on Non-Probability Samples","text":"can install recent version nonprobsvy package main branch Github : install stable version CRAN development version dev branch","code":"remotes::install_github(\"ncn-foreigners/nonprobsvy\") install.packages(\"nonprobsvy\") remotes::install_github(\"ncn-foreigners/nonprobsvy@dev\")"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"basic-idea","dir":"","previous_headings":"","what":"Basic idea","title":"Inference Based on Non-Probability Samples","text":"Consider following setting two samples available: non-probability (denoted SAS_A ) probability (denoted SBS_B) set auxiliary variables (denoted 𝐗\\boldsymbol{X}) available sources YY 𝐝\\boldsymbol{d} (𝐰\\boldsymbol{w}) present probability sample.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"basic-functionalities","dir":"","previous_headings":"","what":"Basic functionalities","title":"Inference Based on Non-Probability Samples","text":"Suppose YY target variable, 𝐗\\boldsymbol{X} matrix auxiliary variables, RR inclusion indicator. , interested estimating mean τ‾Y\\bar{\\tau}_Y sum τY\\tau_Y target variable given observed data set (yk,𝐱k,Rk)(y_k, \\boldsymbol{x}_k, R_k), can approach problem possible scenarios: unit-level data available non-probability sample SAS_{}, .e. (yk,𝐱k)(y_{k}, \\boldsymbol{x}_{k}) available units k∈SAk \\S_{}, population-level data available 𝐱1,...,𝐱p\\boldsymbol{x}_{1}, ..., \\boldsymbol{x}_{p}, denoted τx1,τx2,...,τxp\\tau_{x_{1}}, \\tau_{x_{2}}, ..., \\tau_{x_{p}} population size NN known. 
can also consider situations population data estimated (e.g. basis survey access), unit-level data available non-probability sample SAS_A probability sample SBS_B, .e. (yk,𝐱k,Rk)(y_k, \\boldsymbol{x}_k, R_k) determined data. determined data: Rk=1R_k=1 k∈SAk \\S_A otherwise Rk=0R_k=0, yky_k observed sample SAS_A 𝐱k\\boldsymbol{x}_k observed SAS_A SBS_B,","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"when-unit-level-data-is-available-for-non-probability-survey-only","dir":"","previous_headings":"Basic functionalities","what":"When unit-level data is available for non-probability survey only","title":"Inference Based on Non-Probability Samples","text":"","code":"nonprob( outcome = y ~ x1 + x2 + ... + xk, data = nonprob, pop_totals = c(`(Intercept)`= N, x1 = tau_x1, x2 = tau_x2, ..., xk = tau_xk), method_outcome = \"glm\", family_outcome = \"gaussian\" ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, pop_totals = c(`(Intercept)` = N, x1 = tau_x1, x2 = tau_x2, ..., xk = tau_xk), method_selection = \"logit\" ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, pop_totals = c(`(Intercept)`= N, x1 = tau_x1, x2 = tau_x2, ..., xk = tau_xk), method_selection = \"logit\", control_selection = control_sel(est_method = \"gee\", gee_h_fun = 1) ) nonprob( selection = ~ x1 + x2 + ... + xk, outcome = y ~ x1 + x2 + …, + xk, pop_totals = c(`(Intercept)` = N, x1 = tau_x1, x2 = tau_x2, ..., xk = tau_xk), svydesign = prob, method_outcome = \"glm\", family_outcome = \"gaussian\" )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"when-unit-level-data-are-available-for-both-surveys","dir":"","previous_headings":"Basic functionalities","what":"When unit-level data are available for both surveys","title":"Inference Based on Non-Probability Samples","text":"","code":"nonprob( outcome = y ~ x1 + x2 + ... 
+ xk, data = nonprob, svydesign = prob, method_outcome = \"glm\", family_outcome = \"gaussian\" ) nonprob( outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"nn\", family_outcome = \"gaussian\", control_outcome = control_outcome(k = 2) ) nonprob( outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"pmm\", family_outcome = \"gaussian\" ) nonprob( outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"pmm\", family_outcome = \"gaussian\", control_outcome = control_out(penalty = \"lasso\"), control_inference = control_inf(vars_selection = TRUE) ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, svydesign = prob, method_selection = \"logit\" ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, svydesign = prob, method_selection = \"logit\", control_selection = control_sel(est_method = \"gee\", gee_h_fun = 1) ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, svydesign = prob, method_outcome = \"pmm\", family_outcome = \"gaussian\", control_inference = control_inf(vars_selection = TRUE) ) nonprob( selection = ~ x1 + x2 + ... + xk, outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"glm\", family_outcome = \"gaussian\" ) nonprob( selection = ~ x1 + x2 + ... + xk, outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"glm\", family_outcome = \"gaussian\", control_inference = control_inf( vars_selection = TRUE, bias_correction = TRUE ) )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Inference Based on Non-Probability Samples","text":"Simulate example data following paper: Kim, Jae Kwang, Zhonglei Wang. 
“Sampling techniques big data analysis.” International Statistical Review 87 (2019): S177-S191 [section 5.2] Declare svydesign object survey package Estimate population mean y1 based doubly robust estimator using IPW calibration constraints. Results Mass imputation estimator Results Inverse probability weighting estimator Results","code":"library(survey) library(nonprobsvy) set.seed(1234567890) N <- 1e6 ## 1000000 n <- 1000 x1 <- rnorm(n = N, mean = 1, sd = 1) x2 <- rexp(n = N, rate = 1) epsilon <- rnorm(n = N) # rnorm(N) y1 <- 1 + x1 + x2 + epsilon y2 <- 0.5*(x1 - 0.5)^2 + x2 + epsilon p1 <- exp(x2)/(1+exp(x2)) p2 <- exp(-0.5+0.5*(x2-2)^2)/(1+exp(-0.5+0.5*(x2-2)^2)) flag_bd1 <- rbinom(n = N, size = 1, prob = p1) flag_srs <- as.numeric(1:N %in% sample(1:N, size = n)) base_w_srs <- N/n population <- data.frame(x1,x2,y1,y2,p1,p2,base_w_srs, flag_bd1, flag_srs) base_w_bd <- N/sum(population$flag_bd1) sample_prob <- svydesign(ids= ~1, weights = ~ base_w_srs, data = subset(population, flag_srs == 1)) result_dr <- nonprob( selection = ~ x2, outcome = y1 ~ x1 + x2, data = subset(population, flag_bd1 == 1), svydesign = sample_prob ) summary(result_dr) #> #> Call: #> nonprob(data = subset(population, flag_bd1 == 1), selection = ~x2, #> outcome = y1 ~ x1 + x2, svydesign = sample_prob) #> #> ------------------------- #> Estimated population mean: 2.95 with overall std.err of: 0.04195 #> And std.err for nonprobability and probability samples being respectively: #> 0.000783 and 0.04195 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y1 2.867789 3.03224 #> #> #> Based on: Doubly-Robust method #> For a population of estimate size: 1025063 #> Obtained on a nonprobability sample of size: 693011 #> With an auxiliary probability sample of size: 1000 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on outcome variable: #> Estimate Std. 
Error z value P(>|z|) #> (Intercept) 0.996282 0.002139 465.8 <2e-16 *** #> x1 1.001931 0.001200 835.3 <2e-16 *** #> x2 0.999125 0.001098 910.2 <2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> ----------------------- #> For glm regression on selection variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) -0.498995 0.003702 -134.8 <2e-16 *** #> x2 1.885627 0.005303 355.6 <2e-16 *** #> ------------------------- #> #> Weights: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 1.000 1.071 1.313 1.479 1.798 2.647 #> ------------------------- #> #> Residuals: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> -0.99999 0.06603 0.23778 0.26046 0.44358 0.62222 #> #> AIC: 1010622 #> BIC: 1010645 #> Log-Likelihood: -505309 on 694009 Degrees of freedom result_mi <- nonprob( outcome = y1 ~ x1 + x2, data = subset(population, flag_bd1 == 1), svydesign = sample_prob ) summary(result_mi) #> #> Call: #> nonprob(data = subset(population, flag_bd1 == 1), outcome = y1 ~ #> x1 + x2, svydesign = sample_prob) #> #> ------------------------- #> Estimated population mean: 2.95 with overall std.err of: 0.04203 #> And std.err for nonprobability and probability samples being respectively: #> 0.001227 and 0.04201 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y1 2.867433 3.032186 #> #> #> Based on: Mass Imputation method #> For a population of estimate size: 1e+06 #> Obtained on a nonprobability sample of size: 693011 #> With an auxiliary probability sample of size: 1000 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on outcome variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) 0.996282 0.002139 465.8 <2e-16 *** #> x1 1.001931 0.001200 835.3 <2e-16 *** #> x2 0.999125 0.001098 910.2 <2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 #> ------------------------- result_ipw <- nonprob( selection = ~ x2, target = ~y1, data = subset(population, flag_bd1 == 1), svydesign = sample_prob) summary(result_ipw) #> #> Call: #> nonprob(data = subset(population, flag_bd1 == 1), selection = ~x2, #> target = ~y1, svydesign = sample_prob) #> #> ------------------------- #> Estimated population mean: 2.925 with overall std.err of: 0.04999 #> And std.err for nonprobability and probability samples being respectively: #> 0.001325 and 0.04997 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y1 2.826805 3.022761 #> #> #> Based on: Inverse probability weighted method #> For a population of estimate size: 1025063 #> Obtained on a nonprobability sample of size: 693011 #> With an auxiliary probability sample of size: 1000 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on selection variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) -0.498995 0.003702 -134.8 <2e-16 *** #> x2 1.885627 0.005303 355.6 <2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> ------------------------- #> #> Weights: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 1.000 1.071 1.313 1.479 1.798 2.647 #> ------------------------- #> #> Residuals: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> -0.99999 0.06603 0.23778 0.26046 0.44358 0.62222 #> #> AIC: 1010622 #> BIC: 1010645 #> Log-Likelihood: -505309 on 694009 Degrees of freedom"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"funding","dir":"","previous_headings":"","what":"Funding","title":"Inference Based on Non-Probability Samples","text":"Work package supported National Science Centre, OPUS 20 grant . 
2020/39/B/HS4/00941.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/admin.html","id":null,"dir":"Reference","previous_headings":"","what":"Admin data (non-probability survey) — admin","title":"Admin data (non-probability survey) — admin","text":"subset Central Job Offers Database, voluntary administrative data set (non-probability sample). data slightly manipulated ensure relationships preserved, aligned. information CBOP, please refer : https://oferty.praca.gov.pl/.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/admin.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Admin data (non-probability survey) — admin","text":"","code":"admin"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/admin.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Admin data (non-probability survey) — admin","text":"single data.frame 9,344 rows 6 columns id Identifier entity (company: legal local). private Whether company private (1) public (0) entity. size size entity: S – small (9 employees), M – medium (10-49) L – large (49). nace main NACE code given entity: C, D.E, F, G, H, , J, K.L, M, N, O, P, Q R.S (14 levels, 3 combined: D E, K L, R S). region region Poland (16 levels: 02, 04, ..., 32). 
single_shift Whether entity seeks employees single shift.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/admin.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Admin data (non-probability survey) — admin","text":"","code":"data(\"admin\") head(admin) #> id private size nace region single_shift #> 1 j_1 0 L P 30 FALSE #> 2 j_2 0 L O 14 TRUE #> 3 j_3 0 L O 04 TRUE #> 4 j_4 0 L O 24 TRUE #> 5 j_5 0 L O 04 TRUE #> 6 j_6 1 L C 28 FALSE"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":null,"dir":"Reference","previous_headings":"","what":"Check the variable balance between the probability and non-probability samples — check_balance","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"Check variable balance probability non-probability samples","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"","code":"check_balance(x, object, dig)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"x Formula specifying variables check object Object nonprobsvy class dig Number digits rounding (default = 2)","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"list containing nonprobability totals, probability totals, 
differences","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"","code":"data(admin) data(jvs) jvs_svy <- svydesign(ids = ~ 1, weights = ~ weight, strata = ~ size + nace + region, data = jvs) ipw_est1 <- nonprob(selection = ~ region + private + nace + size, target = ~ single_shift, svydesign = jvs_svy, data = admin, method_selection = \"logit\" ) ipw_est2 <- nonprob( selection = ~ region + private + nace + size, target = ~ single_shift, svydesign = jvs_svy, data = admin, method_selection = \"logit\", control_selection = control_sel(est_method = \"gee\", gee_h_fun = 1)) ## check the balance for the standard IPW check_balance(~size, ipw_est1) #> $nonprob_totals #> sizeL sizeM sizeS #> 8193.376 13529.550 31175.205 #> #> $prob_totals #> sizeL sizeM sizeS #> 8561 13758 29551 #> #> $balance #> sizeL sizeM sizeS #> -367.62 -228.45 1624.21 #> ## check the balance for the calibrated IPW check_balance(~size, ipw_est2) #> $nonprob_totals #> sizeL sizeM sizeS #> 8561 13758 29551 #> #> $prob_totals #> sizeL sizeM sizeS #> 8561 13758 29551 #> #> $balance #> sizeL sizeM sizeS #> 0 0 0 #>"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/confint.nonprob.html","id":null,"dir":"Reference","previous_headings":"","what":"Confidence Intervals for Model Parameters — confint.nonprob","title":"Confidence Intervals for Model Parameters — confint.nonprob","text":"function computes confidence intervals selection model coefficients.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/confint.nonprob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Confidence Intervals for Model Parameters — confint.nonprob","text":"","code":"# S3 method for class 'nonprob' confint(object, parm, level = 
0.95, ...)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/confint.nonprob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Confidence Intervals for Model Parameters — confint.nonprob","text":"object object nonprob class. parm names parameters confidence intervals computed, missing parameters considered. level confidence level intervals. ... additional arguments","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/confint.nonprob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Confidence Intervals for Model Parameters — confint.nonprob","text":"object named columns include upper lower limit confidence intervals.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_inf.html","id":null,"dir":"Reference","previous_headings":"","what":"Control parameters for inference — control_inf","title":"Control parameters for inference — control_inf","text":"control_inf constructs list necessary control parameters statistical inference.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_inf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Control parameters for inference — control_inf","text":"","code":"control_inf( vars_selection = FALSE, var_method = c(\"analytic\", \"bootstrap\"), rep_type = c(\"subbootstrap\", \"auto\", \"JK1\", \"JKn\", \"BRR\", \"bootstrap\", \"mrbbootstrap\", \"Fay\"), bias_correction = FALSE, bias_inf = c(\"union\", \"div\"), num_boot = 500, alpha = 0.05, cores = 1, keep_boot = TRUE, nn_exact_se = FALSE, pi_ij = NULL )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_inf.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Control parameters for inference — control_inf","text":"vars_selection default FALSE; TRUE, variables selection model used. 
var_method variance method (default \"analytic\"). rep_type replication type weights bootstrap method variance estimation passed survey::as.svrepdesign(). Default \"subbootstrap\". bias_correction default FALSE; TRUE, bias minimization estimation used model fitting. bias_inf inference method bias minimization. union, final model fitted union selected variables selection outcome models div, final model fitted separately division selected variables relevant ones selection outcome model. num_boot number iteration bootstrap algorithms. alpha significance level (default 0.05). cores number cores parallel computing (default 1). keep_boot logical value indicating whether statistics bootstrap kept (default TRUE) nn_exact_se logical value indicating whether compute exact standard error estimate nn pmm estimator. variance estimator estimation based nn pmm can decomposed three parts, third computed using covariance imputed values units probability sample using predictive matches non-probability sample. situations term negligible computationally expensive default set FALSE, recommended option set value TRUE submitting final results. 
pi_ij either matrix ppsmat class object (default NULL).","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_inf.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Control parameters for inference — control_inf","text":"list selected parameters.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_out.html","id":null,"dir":"Reference","previous_headings":"","what":"Control parameters for outcome model — control_out","title":"Control parameters for outcome model — control_out","text":"control_out constructs list necessary control parameters outcome model.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_out.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Control parameters for outcome model — control_out","text":"","code":"control_out( epsilon = 1e-04, maxit = 100, trace = FALSE, k = 1, penalty = c(\"SCAD\", \"lasso\", \"MCP\"), a_SCAD = 3.7, a_MCP = 3, lambda_min = 0.001, nlambda = 100, nfolds = 10, treetype = c(\"kd\", \"rp\", \"ball\"), searchtype = c(\"standard\", \"priority\"), pmm_match_type = 1, pmm_weights = c(\"none\", \"prop_dist\"), pmm_k_choice = c(\"none\", \"min_var\"), pmm_reg_engine = c(\"glm\", \"loess\") )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_out.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Control parameters for outcome model — control_out","text":"epsilon Tolerance fitting algorithms. Default 1e-4. maxit Maximum number iterations. trace logical value. TRUE trace steps fitting algorithms. Default FALSE. k k parameter RANN::nn2() function. Default 1. penalty penalty algorithm variable selection. Default SCAD. a_SCAD tuning parameter SCAD penalty outcome model. Default 3.7. a_MCP tuning parameter MCP penalty outcome model. Default 3. lambda_min smallest value lambda, fraction lambda.max. 
Default .001. nlambda number lambda values. Default 100. nfolds number folds cross-validation variables selection model. treetype Type tree nearest neighbour imputation (NN PMM estimator) passed RANN::nn2() function. searchtype Type search nearest neighbour imputation (NN PMM estimator) passed RANN::nn2() function. pmm_match_type (PMM Estimator) Indicates select 'closest' unit nonprobability sample unit probability sample. Either 1 (default) 2 2 matching minimizing distance y_i S_A y_j j S_B 1 matching minimizing distance y_i S_A y_i S_A. pmm_weights (PMM Estimator) Indicate weight k nearest neighbours S_B create imputed value units S_A. default value \"none\" indicates mean k nearest y's S_B used whereas \"prop_dist\" results weighted mean k values weights inversely proportional distance matched values. pmm_k_choice (PMM Estimator) Character value indicating k hyper-parameter chosen, default \"none\" meaning k provided control_outcome argument used. now option \"min_var\" means k chosen minimizing estimated variance estimator mean. Parameter k provided control list chosen starting point. pmm_reg_engine (PMM Estimator) whether use parametric (\"glm\") non-parametric (\"loess\") regression model outcome. 
default \"glm\".","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_out.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Control parameters for outcome model — control_out","text":"List selected parameters.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html","id":null,"dir":"Reference","previous_headings":"","what":"Control parameters for the selection model — control_sel","title":"Control parameters for the selection model — control_sel","text":"control_sel constructs list necessary control parameters selection model.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Control parameters for the selection model — control_sel","text":"","code":"control_sel( est_method = c(\"mle\", \"gee\"), gee_h_fun = 1, optimizer = c(\"maxLik\", \"optim\"), maxlik_method = c(\"NR\", \"BFGS\", \"NM\"), optim_method = c(\"BFGS\", \"Nelder-Mead\"), epsilon = 1e-04, maxit = 500, trace = FALSE, penalty = c(\"SCAD\", \"lasso\", \"MCP\"), a_SCAD = 3.7, a_MCP = 3, lambda = -1, lambda_min = 0.001, nlambda = 50, nfolds = 10, print_level = 0, start_type = c(\"zero\", \"mle\", \"naive\"), nleqslv_method = c(\"Broyden\", \"Newton\"), nleqslv_global = c(\"dbldog\", \"pwldog\", \"cline\", \"qline\", \"gline\", \"hook\", \"none\"), nleqslv_xscalm = c(\"fixed\", \"auto\"), dependence = FALSE, key = NULL )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Control parameters for the selection model — control_sel","text":"est_method Method estimation propensity score model (\"mle\" \"gee\"; default \"mle\"). 
gee_h_fun Smooth function generalized estimating equations (GEE) method taking following values 1 h(x, ) = (x, )x, 2 h(x, ) = x optimizer (est_method=\"mle\" ) optimization function maximum likelihood estimation. maxlik_method (est_method=\"mle\" ) maximisation method passed maxLik::maxLik() function. Default NR. optim_method (est_method=\"mle\" ) maximisation method passed stats::optim() function. Default BFGS. epsilon Tolerance fitting algorithms default 1e-4. maxit Maximum number iterations. trace logical value. TRUE trace steps fitting algorithms. Default FALSE. penalty penalization function used variables selection. a_SCAD tuning parameter SCAD penalty selection model. Default 3.7. a_MCP tuning parameter MCP penalty selection model. Default 3. lambda user-specified value variable selection model fitting. lambda_min smallest value lambda, fraction lambda.max. Default .001. nlambda number lambda values. Default 50. nfolds number folds cross validation. Default 10. print_level argument determines level printing done optimization (propensity score model) process. start_type Type method start points model fitting taking following values zero start vector zeros (default methods). mle (est_method=\"gee\" ) starting parameters taken result est_method=\"mle\" method. nleqslv_method (est_method=\"gee\" ) method passed nleqslv::nleqslv() function. nleqslv_global (est_method=\"gee\" ) global strategy passed nleqslv::nleqslv() function. nleqslv_xscalm (est_method=\"gee\" ) type x scaling passed nleqslv::nleqslv() function. dependence logical value (default FALSE) informing whether samples overlap (NOT YET IMPLEMENTED, FUTURE DEVELOPMENT). 
key binary key variable allowing identify overlap (NOT YET IMPLEMENTED, FUTURE DEVELOPMENT).","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Control parameters for the selection model — control_sel","text":"List selected parameters.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html","id":null,"dir":"Reference","previous_headings":"","what":"Job Vacancy Survey — jvs","title":"Job Vacancy Survey — jvs","text":"subset Job Vacancy Survey Poland (one quarter). data subject slight manipulation, relationships data preserved. details JVS, please refer following link: https://stat.gov.pl/obszary-tematyczne/rynek-pracy/popyt-na-prace/zeszyt-metodologiczny-popyt-na-prace,3,1.html.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Job Vacancy Survey — jvs","text":"","code":"jvs"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Job Vacancy Survey — jvs","text":"single data.frame 6,523 rows 6 columns id Identifier entity (company: legal local). private Whether company private (1) public (0) entity. size size entity: S – small (up to 9 employees), M – medium (10-49) L – large (over 49). nace main NACE code given entity: C, D.E, F, G, H, I, J, K.L, M, N, O, P, Q R.S (14 levels, 3 combined: D E, K L, R S). region region Poland (16 levels: 02, 04, ..., 32). weight final (calibrated) weight (w-weight). 
No access to design weights (d-weights).","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Job Vacancy Survey — jvs","text":"","code":"data(\"jvs\") head(jvs) #> id private size nace region weight #> 1 j_1 0 L O 14 1 #> 2 j_2 0 L O 24 6 #> 3 j_3 0 L R.S 14 1 #> 4 j_4 0 L R.S 14 1 #> 5 j_5 0 L R.S 22 1 #> 6 j_6 0 M R.S 26 1"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":null,"dir":"Reference","previous_headings":"","what":"Propensity score model — model_ps","title":"Propensity score model — model_ps","text":"Function specify propensity score (PS) model inverse probability weighting estimator. function provides basic functions logistic regression given link function (currently support logit, probit cloglog) additional information analytic variance estimator mean.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Propensity score model — model_ps","text":"","code":"model_ps(link = c(\"logit\", \"probit\", \"cloglog\"), ...)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Propensity score model — model_ps","text":"link link PS model ... 
Additional, optional arguments.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Propensity score model — model_ps","text":"list functions elements specific link function following entries: make_log_like log-likelihood function specific link function make_gradient gradient loglik make_hessian hessian loglik make_link link function make_link_inv inverse link function make_link_der first derivative link function make_link_inv_der first derivative inverse link function make_link_inv_rev TBA make_link_inv_rev_der TBA variance_covariance1 TBA variance_covariance2 TBA b_vec_ipw TBA b_vec_dr TBA t_vec TBA var_nonprob TBA link name selected link function PS model (character) model model type (character)","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Propensity score model — model_ps","text":"Łukasz Chrostowski, Maciej Beręsewicz","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Propensity score model — model_ps","text":"","code":"# Printing information on the model selected model_ps() #> [1] \"Propensity score model with logit link\" # extracting specific field model_ps(\"cloglog\")$make_gradient #> function (X_nons, X_rand, weights, weights_rand, ...) 
#> { #> function(theta) { #> eta1 <- as.matrix(X_nons) %*% theta #> eta2 <- as.matrix(X_rand) %*% theta #> invLink1 <- inv_link(eta1) #> invLink2 <- inv_link(eta2) #> t(t(X_nons) %*% (weights * exp(eta1)/invLink1) - t(X_rand) %*% #> (weights_rand * exp(eta2))) #> } #> } #> #> "},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":null,"dir":"Reference","previous_headings":"","what":"Inference with non-probability survey samples — nonprob","title":"Inference with non-probability survey samples — nonprob","text":"nonprob function provides access various methods inference based non-probability surveys (including big data). function allows estimate population mean based access reference probability sample (via survey package), well totals means covariates. package implements state-of-the-art approaches recently proposed literature: Chen et al. (2020), Yang et al. (2020), Wu (2022) uses Lumley 2004 survey package inference (reference probability sample provided). provides various propensity score weighting (e.g. calibration constraints), mass imputation (e.g. nearest neighbour, predictive mean matching) doubly robust estimators (e.g. take account minimisation asymptotic bias population mean estimators). package uses survey package functionality probability sample available. optional parameters set NULL. obligatory ones include data well one following three: selection, outcome, target – depending method selected. 
case outcome target multiple y variables can specified.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inference with non-probability survey samples — nonprob","text":"","code":"nonprob( data, selection = NULL, outcome = NULL, target = NULL, svydesign = NULL, pop_totals = NULL, pop_means = NULL, pop_size = NULL, method_selection = c(\"logit\", \"cloglog\", \"probit\"), method_outcome = c(\"glm\", \"nn\", \"pmm\"), family_outcome = c(\"gaussian\", \"binomial\", \"poisson\"), subset = NULL, strata = NULL, weights = NULL, na_action = NULL, control_selection = control_sel(), control_outcome = control_out(), control_inference = control_inf(), start_selection = NULL, start_outcome = NULL, verbose = FALSE, x = TRUE, y = TRUE, se = TRUE, ... ) nonprob_dr( selection, outcome, data, svydesign, pop_totals, pop_means, pop_size, method_selection, method_outcome, family_outcome = \"gaussian\", subset, strata, weights, na_action, control_selection, control_outcome, control_inference, start_outcome, start_selection, verbose, x, y, se, ... ) nonprob_mi( outcome, data, svydesign, pop_totals, pop_means, pop_size, method_outcome, family_outcome = \"gaussian\", subset, strata, weights, na_action, control_outcome, control_inference, start_outcome, verbose, x, y, se, ... )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inference with non-probability survey samples — nonprob","text":"data data.frame dataset containing non-probability sample. selection formula (default NULL) selection (propensity) score model. outcome formula (default NULL) outcome (target) model. target formula (default NULL) target variable(s). allow multiple target variables (e.g. ~y1 + y2 + y3). 
svydesign optional svydesign2 class object containing probability sample design weights. pop_totals optional named vector population totals covariates. pop_means optional named vector population means covariates. pop_size optional double value population size. method_selection character (default logit) indicating method propensity score link function. method_outcome character (default glm) indicating method outcome model. family_outcome character (default gaussian) describing error distribution link function used model. Currently supports: gaussian identity link, poisson binomial. subset optional vector specifying subset observations used fitting process - not yet supported. strata optional vector specifying strata (not yet supported, under development). weights optional vector prior weights used fitting process. assumed vector contains frequency analytic weights (i.e. rows data argument repeated according values weights argument), not probability/design weights. na_action function indicates happen data contain NAs (not yet supported, under development). control_selection list (default control_sel() result) indicating parameters used fitting selection model propensity scores. change parameters one use control_sel() function. control_outcome list (default control_out() result) indicating parameters used fitting model outcome variable. change parameters one use control_out() function. control_inference list (default control_inf() result) indicating parameters used inference based probability non-probability samples. change parameters one use control_inf() function. start_selection optional vector starting values parameters selection equation. start_outcome optional vector starting values parameters outcome equation. verbose logical value (default FALSE) whether detailed information fitting presented. x logical value (default TRUE) indicating whether return model matrix covariates part output. y logical value (default TRUE) indicating whether return vector outcome variable part output. 
se Logical value (default TRUE) indicating whether calculate return standard error estimated mean. ... Additional, optional arguments.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inference with non-probability survey samples — nonprob","text":"Returns object class c(\"nonprobsvy\", \"nonprobsvy_ipw\") case inverse probability weighting estimator, c(\"nonprobsvy\", \"nonprobsvy_mi\") case mass imputation estimator, c(\"nonprobsvy\", \"nonprobsvy_dr\") case doubly robust estimator, type list containing: X – model.matrix containing data probability non-probability samples specified function call. y – list vector outcome variables specified function call. R – numeric vector indicating whether unit belongs probability (0) non-probability (1) units matrix X. prob – numeric vector estimated propensity scores non-probability sample. weights – vector estimated weights non-probability sample. control – list control functions. output – output model information estimated population mean standard errors. SE – data.frame standard error estimator population mean, divided errors probability non-probability samples. confidence_interval – data.frame confidence interval population mean estimator. nonprob_size – scalar numeric vector denoting size non-probability sample. prob_size – scalar numeric vector denoting size probability sample. pop_size – scalar numeric vector estimated population size derived estimated weights (non-probability sample) known design weights (probability sample). pop_totals – numeric vector total values auxiliary variables derived probability sample numeric vector total/mean values. estimator – character vector information type estimator selected (one c(\"ipw\", \"mi\", \"dr\")). 
outcome – list containing information fitting mass imputation model, case regression model object containing list returned stats::glm(), case nearest neighbour imputation object containing list returned RANN::nn2(). bias_correction control_inf() set TRUE, estimation based joint estimating equations selection outcome model therefore, list different one returned stats::glm() function contains elements coefficients – numeric vector estimated coefficients regression model. std_err – numeric vector standard errors estimated coefficients. residuals – numeric vector response residuals. variance_covariance – matrix variance-covariance matrix coefficient estimates. df_residual – scalar vector degrees freedom residuals. family – character specifies error distribution link function used model. fitted.values – numeric vector predicted values response variable based fitted model. linear.predictors – numeric vector linear fit link scale. X – matrix design matrix (model.matrix) method – set glm, since regression method. model_frame – model.matrix data probability sample used mass imputation. cve – error value lambda, averaged across cross-validation folds. selection – list containing information fitting propensity score model, coefficients – numeric vector coefficients. std_err – numeric vector standard errors estimated model coefficients. residuals – numeric vector response residuals. variance – scalar numeric vector root mean square error. fitted_values – numeric vector fitted mean values, obtained transforming linear predictors inverse link function. link – link object used. linear_predictors – numeric vector linear fit link scale. aic –\tversion Akaike's Information Criterion, minus twice maximized log-likelihood plus twice number parameters. weights – numeric vector estimated weights non-probability sample. prior.weights – numeric vector frequency weights initially supplied, vector 1s none . 
est_totals – numeric vector estimated total values auxiliary variables derived non-probability sample. formula – formula supplied. df_residual – residual degrees freedom. log_likelihood – value log-likelihood function mle method, case NA. cve – error value lambda, averaged across cross-validation folds variable selection model propensity score model fitting. Returned selection variables model used. method_selection – Link function, e.g. logit, cloglog probit. hessian – Hessian Gradient log-likelihood function mle method. gradient – Gradient log-likelihood function mle method. method – estimation method selection model, e.g. mle gee. prob_der – Derivative inclusion probability function units non–probability sample. prob_rand – Inclusion probabilities unit probability sample svydesign object. prob_rand_est – Inclusion probabilities non-probability sample unit probability sample. prob_rand_est_der – Derivative inclusion probabilities non–probability sample unit probability sample. stat – matrix estimated population means bootstrap iteration. Returned bootstrap method used estimate variance keep_boot control_inf() set TRUE.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Inference with non-probability survey samples — nonprob","text":"Let y response variable want estimate population mean, given _y = 1N _i=1^N y_i. purpose consider data integration following structure. Let S_A non-probability sample design matrix covariates X_A = bmatrix x_11 & x_12 & & x_1p x_21 & x_22 & & x_2p & & & x_n_A1 & x_n_A2 & & x_n_Ap bmatrix vector outcome variable y = bmatrix y_1 y_2 y_n_A. bmatrix hand, let S_B probability sample design matrix covariates X_B = bmatrix x_11 & x_12 & & x_1p x_21 & x_22 & & x_2p & & & x_n_B1 & x_n_B2 & & x_n_Bp. 
bmatrix Instead sample units can consider vector population sums form _x = (_i Ux_i1, _i Ux_i2, ..., _i Ux_ip) means _xN, U refers finite population. Note assume access response variable S_B. general make following assumptions: selection indicator belonging non-probability sample R_i response variable y_i independent given set covariates x_i. units non-zero propensity score, .e., _i^> 0 . indicator variables R_i^R_j^independent given x_i x_j j. three possible approaches problem estimating population mean using non-probability samples: Inverse probability weighting – main drawback non-probability sampling unknown selection mechanism unit included sample. talk -called \"biased sample\" problem. inverse probability approach based assumption reference probability sample available therefore can estimate propensity score selection mechanism. estimator following form: _IPW = 1N^A_i S_A y_i_i^. purpose several estimation methods can considered. first approach maximum likelihood estimation corrected log-likelihood function, given following formula ^*() = _i S_A (x_i, )1 - (x_i,) + _i S_Bd_i^B 1 - (x_i,). literature, main approach modelling propensity scores based logit link function. However, extend propensity score model additional link functions cloglog probit. pseudo-score equations derived ML methods can replaced idea generalised estimating equations calibration constraints defined equations. U()=_i S_A h(x_i, )-_i S_B d_i^B (x_i, ) h(x_i, ). Notice h(x_i, ) = (x, )x need probability sample can use vector population totals/means. Mass imputation – method based framework imputed values outcome variables created entire probability sample. case, treat large sample training data set used build imputation model. Using imputed values probability sample (known) design weights, can build population mean estimator form: _MI = 1N^B_i S_B d_i^B y_i. opens door flexible method imputation models. 
package uses generalized linear models stats::glm(), nearest neighbour algorithm using RANN::nn2() predictive mean matching. Doubly robust estimation – IPW MI estimators sensitive misspecified models propensity score outcome variable, respectively. end, -called doubly robust methods presented take problems account. simple idea combine propensity score imputation models inference, leading following estimator _DR = 1N^A_i S_A d_i^(y_i - y_i) + 1N^B_i S_B d_i^B y_i. addition, approach based directly bias minimisation implemented. following formula aligned bias(_DR) = & E (_DR - ) = & E 1N _i=1^N (R_i^A_i^(x_i^T ) - 1 ) (y_i - m(x_i^T )) + & E 1N _i=1^N (R_i^B d_i^B - 1) m( x_i^T ) , aligned lead us system equations aligned J(, ) = arrayc J_1(, ) J_2(, ) array = arrayc _i=1^N R_i^\\ 1(x_i, )-1 y_i-m(x_i, ) x_i _i=1^N R_i^(x_i, ) m(x_i, ) - _i S_B d_i^B m(x_i, ) array , aligned m(x_i, ) mass imputation (regression) model outcome variable propensity scores _i^estimated using logit function model. MLE GEE approaches extended method cloglog probit links. straightforward calculate variances estimators, asymptotic equivalents variances derived using Taylor approximation proposed literature. Details can found . addition, bootstrap approach can used variance estimation. function also allows variables selection using known methods implemented handle integration probability non-probability sampling. presence high-dimensional data, variable selection important, can reduce variability estimate results using irrelevant variables build model. Let U( , ) joint estimating function ( , ). define penalized estimating functions U^p (, ) = U(, ) - arrayc q__(||) sgn() \\ q__(|\\boldsymbol|) sgn() array , _ q__ smooth functions. let q_ (x) = p_ x, p_ penalization function. Details penalization functions techniques solving type equation can found . use variable selection model, set vars_selection parameter control_inf() function TRUE. 
addition, control functions control_sel() control_out() can set parameters selection relevant variables, number folds cross-validation algorithm lambda value penalizations. Details can found documentation control functions nonprob.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Inference with non-probability survey samples — nonprob","text":"Kim JK, Park S, Chen Y, Wu C. Combining non-probability probability survey samples mass imputation. J R Stat Soc Series A. 2021;184:941–963. Shu Yang, Jae Kwang Kim, Rui Song. Doubly robust inference combining probability non-probability samples high dimensional data. J. R. Statist. Soc. B (2020) Yilin Chen, Pengfei Li & Changbao Wu (2020) Doubly Robust Inference Nonprobability Survey Samples, Journal American Statistical Association, 115:532, 2011-2021 Shu Yang, Jae Kwang Kim Youngdeok Hwang Integration data probability surveys big found data finite population inference using mass imputation. Survey Methodology, June 2021 29 Vol. 47, No. 1, pp. 
29-58","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Inference with non-probability survey samples — nonprob","text":"Łukasz Chrostowski, Maciej Beręsewicz, Piotr Chlebicki","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inference with non-probability survey samples — nonprob","text":"","code":"# \\donttest{ # generate data based on Doubly Robust Inference With Non-probability Survey Samples (2021) # Yilin Chen , Pengfei Li & Changbao Wu library(sampling) #> #> Attaching package: ‘sampling’ #> The following objects are masked from ‘package:survival’: #> #> cluster, strata set.seed(123) # sizes of population and probability sample N <- 20000 # population n_b <- 1000 # probability # data z1 <- rbinom(N, 1, 0.7) z2 <- runif(N, 0, 2) z3 <- rexp(N, 1) z4 <- rchisq(N, 4) # covariates x1 <- z1 x2 <- z2 + 0.3 * z2 x3 <- z3 + 0.2 * (z1 + z2) x4 <- z4 + 0.1 * (z1 + z2 + z3) epsilon <- rnorm(N) sigma_30 <- 10.4 sigma_50 <- 5.2 sigma_80 <- 2.4 # response variables y30 <- 2 + x1 + x2 + x3 + x4 + sigma_30 * epsilon y50 <- 2 + x1 + x2 + x3 + x4 + sigma_50 * epsilon y80 <- 2 + x1 + x2 + x3 + x4 + sigma_80 * epsilon # population sim_data <- data.frame(y30, y50, y80, x1, x2, x3, x4) ## propensity score model for non-probability sample (sum to 1000) eta <- -4.461 + 0.1 * x1 + 0.2 * x2 + 0.1 * x3 + 0.2 * x4 rho <- plogis(eta) # inclusion probabilities for probability sample z_prob <- x3 + 0.2051 sim_data$p_prob <- inclusionprobabilities(z_prob, n = n_b) # data sim_data$flag_nonprob <- UPpoisson(rho) ## sampling nonprob sim_data$flag_prob <- UPpoisson(sim_data$p_prob) ## sampling prob nonprob_df <- subset(sim_data, flag_nonprob == 1) ## non-probability sample svyprob <- svydesign( ids = ~1, probs = ~p_prob, data = 
subset(sim_data, flag_prob == 1), pps = \"brewer\" ) ## probability sample ## mass imputation estimator MI_res <- nonprob( outcome = y80 ~ x1 + x2 + x3 + x4, data = nonprob_df, svydesign = svyprob ) summary(MI_res) #> #> Call: #> nonprob(data = nonprob_df, outcome = y80 ~ x1 + x2 + x3 + x4, #> svydesign = svyprob) #> #> ------------------------- #> Estimated population mean: 9.518 with overall std.err of: 0.151 #> And std.err for nonprobability and probability samples being respectively: #> 0.08679 and 0.1236 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y80 9.222349 9.814346 #> #> #> Based on: Mass Imputation method #> For a population of estimate size: 21631.63 #> Obtained on a nonprobability sample of size: 1032 #> With an auxiliary probability sample of size: 1044 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on outcome variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) 1.93113 0.24859 7.768 7.95e-15 *** #> x1 1.06616 0.16954 6.289 3.20e-10 *** #> x2 1.04125 0.09731 10.700 < 2e-16 *** #> x3 0.98891 0.06927 14.277 < 2e-16 *** #> x4 0.98930 0.01904 51.946 < 2e-16 *** #> --- #> Signif. 
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> ------------------------- #> ## inverse probability weighted estimator IPW_res <- nonprob( selection = ~ x1 + x2 + x3 + x4, target = ~y80, data = nonprob_df, svydesign = svyprob ) summary(IPW_res) #> #> Call: #> nonprob(data = nonprob_df, selection = ~x1 + x2 + x3 + x4, target = ~y80, #> svydesign = svyprob) #> #> ------------------------- #> Estimated population mean: 9.718 with overall std.err of: 0.1962 #> And std.err for nonprobability and probability samples being respectively: #> 0.1331 and 0.1442 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y80 9.332946 10.10219 #> #> #> Based on: Inverse probability weighted method #> For a population of estimate size: 21127.42 #> Obtained on a nonprobability sample of size: 1032 #> With an auxiliary probability sample of size: 1044 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on selection variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) -4.582648 0.105508 -43.434 < 2e-16 *** #> x1 0.102633 0.074416 1.379 0.168 #> x2 0.234848 0.042871 5.478 4.30e-08 *** #> x3 0.181639 0.029253 6.209 5.33e-10 *** #> x4 0.184285 0.008568 21.508 < 2e-16 *** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> ------------------------- #> #> Weights: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 1.172 10.583 18.137 20.472 27.940 79.563 #> ------------------------- #> #> Residuals: #> Min. 1st Qu. Median Mean 3rd Qu. Max. 
#> -0.56121 -0.04204 -0.01457 0.43052 0.94475 0.98743 #> #> AIC: 7797.97 #> BIC: 7826.161 #> Log-Likelihood: -3893.985 on 2071 Degrees of freedom ## doubly robust estimator DR_res <- nonprob( outcome = y80 ~ x1 + x2 + x3 + x4, selection = ~ x1 + x2 + x3 + x4, data = nonprob_df, svydesign = svyprob ) summary(DR_res) #> #> Call: #> nonprob(data = nonprob_df, selection = ~x1 + x2 + x3 + x4, outcome = y80 ~ #> x1 + x2 + x3 + x4, svydesign = svyprob) #> #> ------------------------- #> Estimated population mean: 9.483 with overall std.err of: 0.1525 #> And std.err for nonprobability and probability samples being respectively: #> 0.08508 and 0.1265 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y80 9.183858 9.781461 #> #> #> Based on: Doubly-Robust method #> For a population of estimate size: 21127.42 #> Obtained on a nonprobability sample of size: 1032 #> With an auxiliary probability sample of size: 1044 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on outcome variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) 1.93113 0.24859 7.768 7.95e-15 *** #> x1 1.06616 0.16954 6.289 3.20e-10 *** #> x2 1.04125 0.09731 10.700 < 2e-16 *** #> x3 0.98891 0.06927 14.277 < 2e-16 *** #> x4 0.98930 0.01904 51.946 < 2e-16 *** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> #> ----------------------- #> For glm regression on selection variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) -4.582648 0.105508 -43.434 < 2e-16 *** #> x1 0.102633 0.074416 1.379 0.168 #> x2 0.234848 0.042871 5.478 4.30e-08 *** #> x3 0.181639 0.029253 6.209 5.33e-10 *** #> x4 0.184285 0.008568 21.508 < 2e-16 *** #> ------------------------- #> #> Weights: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 1.172 10.583 18.137 20.472 27.940 79.563 #> ------------------------- #> #> Residuals: #> Min. 1st Qu. Median Mean 3rd Qu. Max. 
#> -0.56121 -0.04204 -0.01457 0.43052 0.94475 0.98743 #> #> AIC: 7797.97 #> BIC: 7826.161 #> Log-Likelihood: -3893.985 on 2071 Degrees of freedom # }"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns population size (estimated or fixed) — pop_size","title":"Returns population size (estimated or fixed) — pop_size","text":"Returns population size assumed fixed – based pop_size argument, estimated – based probability survey specified svydesign based estimated propensity scores non-probability sample.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns population size (estimated or fixed) — pop_size","text":"","code":"pop_size(object)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns population size (estimated or fixed) — pop_size","text":"object object returned nonprob function.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns population size (estimated or fixed) — pop_size","text":"scalar returning value population size.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns population size (estimated or fixed) — pop_size","text":"","code":"data(admin) data(jvs) jvs_svy <- svydesign(ids = ~ 1, weights = ~ weight, strata = ~ size + nace + region, data = jvs) ipw_est1 <- nonprob(selection = ~ region + private + nace + size, target = ~ single_shift, svydesign = jvs_svy, data = admin, method_selection = \"logit\" ) ipw_est2 <- nonprob( selection = ~ region + private + nace + 
size, target = ~ single_shift, svydesign = jvs_svy, data = admin, method_selection = \"logit\", control_selection = control_sel(est_method = \"gee\", gee_h_fun = 1)) ## estimated population size based on the non-calibrated IPW (MLE) pop_size(ipw_est1) #> pop_size #> 52898.13 ## estimated population size based on the calibrated IPW (GEE) pop_size(ipw_est2) #> pop_size #> 51870"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/summary.nonprob.html","id":null,"dir":"Reference","previous_headings":"","what":"Summary statistics for model of the nonprob class. — summary.nonprob","title":"Summary statistics for model of the nonprob class. — summary.nonprob","text":"Summary statistics model nonprob class.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/summary.nonprob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Summary statistics for model of the nonprob class. — summary.nonprob","text":"","code":"# S3 method for class 'nonprob' summary(object, test = c(\"t\", \"z\"), correlation = FALSE, cov = NULL, ...)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/summary.nonprob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Summary statistics for model of the nonprob class. — summary.nonprob","text":"object object nonprob class test Type test significance parameters \"t\" t-test \"z\" normal approximation students t distribution, default \"z\" used 30 degrees freedom \"t\" used cases. correlation correlation Logical value indicating whether correlation matrix computed covariance matrix default FALSE. cov Covariance matrix corresponding regression parameters ... Additional optional arguments","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/summary.nonprob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Summary statistics for model of the nonprob class. 
— summary.nonprob","text":"object summary_nonprob class containing: call – call created object. pop_total – list containing information estimated population mean, standard error confidence interval. sample_size – size samples used model. population_size – estimated size population non–probability sample drawn. test – Type statistical test performed. control – List control parameters used fitting model. model – descriptive name model used, e.g., \"Doubly-Robust\", \"Inverse probability weighted\", \"Mass Imputation\". aic – Akaike's information criterion. bic – Bayesian (Schwarz's) information criterion. residuals – Residuals model, providing information difference observed predicted values. likelihood – Logarithm likelihood function evaluated coefficients. df_residual – Residual degrees freedom. weights – Distribution estimated weights obtained model. coef – Regression coefficients estimated model. std_err – Standard errors regression coefficients. w_val – Wald statistic values significance testing coefficients. p_values – P-values corresponding Wald statistic values, assessing significance coefficients. crr – correlation matrix model coefficients, requested. confidence_interval_coef – Confidence intervals model coefficients. names – Names fitted models.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":null,"dir":"Reference","previous_headings":"","what":"Obtain Covariance Matrix estimation. — vcov.nonprob","title":"Obtain Covariance Matrix estimation. — vcov.nonprob","text":"vcov method `nonprob` class.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Obtain Covariance Matrix estimation. 
— vcov.nonprob","text":"","code":"# S3 method for class 'nonprob' vcov(object, ...)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Obtain Covariance Matrix estimation. — vcov.nonprob","text":"object object nonprob class. ... additional arguments method functions","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Obtain Covariance Matrix estimation. — vcov.nonprob","text":"covariance matrix fitted coefficients","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Obtain Covariance Matrix estimation. — vcov.nonprob","text":"Returns estimated covariance matrix model coefficients calculated analytic hessian Fisher information matrix usually utilising asymptotic effectiveness maximum likelihood estimates.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"breaking-changes-0-2-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"nonprobsvy 0.2.0","text":"functions pop.size, controlSel, controlOut controlInf renamed pop_size, control_sel, control_out control_inf respectively. function genSimData removed completely used anywhere package. argument maxLik_method renamed maxlik_method control_sel function. predictive_match renamed pmm_match_type align PMM (Predictive Mean Matching) estimator naming convention, related parameters start pmm_ argument method removed used argument est_method_sel renamed est_method argument h renamed gee_h_fun make readable user start_type now accepts zero mle (gee models ). 
nonprobsvy class renamed nonprob related method adjusted change functions logit_model_nonprobsvy, probit_model_nonprobsvy cloglog_model_nonprobsvy removed favour readable model_ps function specifies propensity score model","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"features-0-2-0","dir":"Changelog","previous_headings":"","what":"Features","title":"nonprobsvy 0.2.0","text":"two additional datasets included: jvs (Job Vacancy Survey; probability sample survey) admin (Central Job Offers Database; non-probability sample survey). units auxiliary variables aligned way allows data integrated using methods implemented package. check_balance function added check balance totals variables based weighted weights non-probability probability samples. citation file added.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"bugfixes-0-2-0","dir":"Changelog","previous_headings":"","what":"Bugfixes","title":"nonprobsvy 0.2.0","text":"basic methods functions related variance estimation, weights probability linking methods rewritten optimal readable way.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"other-0-2-0","dir":"Changelog","previous_headings":"","what":"Other","title":"nonprobsvy 0.2.0","text":"informative error messages added. documentation improved. switching completely snake_case. extensive cleaning code. 
unit-tests added.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"documentation-0-2-0","dir":"Changelog","previous_headings":"","what":"Documentation","title":"nonprobsvy 0.2.0","text":"annotation added arguments strata, subset na_action supported time .","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"nonprobsvy-011","dir":"Changelog","previous_headings":"","what":"nonprobsvy 0.1.1","title":"nonprobsvy 0.1.1","text":"CRAN release: 2024-11-14","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"bugfixes-0-1-1","dir":"Changelog","previous_headings":"","what":"Bugfixes","title":"nonprobsvy 0.1.1","text":"bug Fix occurring estimation based auxiliary variable, led compression data frame vector. bug Fix related passing maxit argument controlSel function internally used nleqslv function bug Fix related storing vector model_frame predicting y_hat mass imputation glm model X based one auxiliary variable - fix provided converting data.frame object.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"features-0-1-1","dir":"Changelog","previous_headings":"","what":"Features","title":"nonprobsvy 0.1.1","text":"added information summary quality estimation basing difference estimated known total values auxiliary variables added estimation exact standard error k-nearest neighbor estimator. added breaking change controlOut function switching values predictive_match argument. now , predictive_match = 1 means ŷ−ŷ\\hat{y}-\\hat{y} predictive mean matching imputation predictive_match = 2 corresponds ŷ−y\\hat{y}-y matching. implemented div option variable selection (documentation) doubly robust estimation. added insights nonprob output gradient, hessian jacobian derived IPW estimation mle gee methods IPW DR model executed. 
added estimated inclusion probabilities derivatives probability non-probability samples nonprob output IPW DR model executed. added model_frame matrix data probability sample used mass imputation nonprob MI DR model executed.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"unit-tests-0-1-1","dir":"Changelog","previous_headings":"","what":"Unit tests","title":"nonprobsvy 0.1.1","text":"added unit tests variable selection models mi estimation vector population totals available","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"nonprobsvy-010","dir":"Changelog","previous_headings":"","what":"nonprobsvy 0.1.0","title":"nonprobsvy 0.1.0","text":"CRAN release: 2024-04-04","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"features-0-1-0","dir":"Changelog","previous_headings":"","what":"Features","title":"nonprobsvy 0.1.0","text":"implemented population mean estimation using doubly robust, inverse probability weighting mass imputation methods implemented inverse probability weighting models Maximum Likelihood Estimation Generalized Estimating Equations methods logit, complementary log-log probit link functions. 
implemented generalized linear models, nearest neighbours predictive mean matching methods Mass Imputation implemented bias correction estimators doubly-robust approach implemented estimation methods vector population means/totals available implemented variables selection SCAD, LASSO MCP penalization equations implemented analytic bootstrap (parallel computation - doSNOW package) variance described estimators added control parameters models nobs samples size pop.size population size estimation residuals residuals inverse probability weighting model cooks.distance identifying influential observations significant impact parameter estimates hatvalues measuring leverage individual observations logLik computing log-likelihood model, AIC (Akaike Information Criterion) evaluating model based trade-goodness fit complexity, helping model selection BIC (Bayesian Information Criterion) similar purpose AIC stronger penalty model complexity confint calculating confidence intervals around parameter estimates vcov obtaining variance-covariance matrix parameter estimates deviance assessing goodness fit model","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"unit-tests-0-1-0","dir":"Changelog","previous_headings":"","what":"Unit tests","title":"nonprobsvy 0.1.0","text":"added unit tests IPW estimators.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"github-repository-0-1-0","dir":"Changelog","previous_headings":"","what":"Github repository","title":"nonprobsvy 0.1.0","text":"added automated R-cmd check","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"documentation-0-1-0","dir":"Changelog","previous_headings":"","what":"Documentation","title":"nonprobsvy 0.1.0","text":"added documentation nonprob function.","code":""}] +[{"path":"https://ncn-foreigners.github.io/nonprobsvy/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT 
License","text":"Copyright (c) 2023 ncn-foreigners Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Łukasz Chrostowski. Author, contributor. Maciej Beręsewicz. Author, maintainer. Piotr Chlebicki. Author, contributor.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Chrostowski Ł, Beręsewicz M, Chlebicki P (2025). Inference Based Non-Probability Samples. 
R package version 0.2, https://github.com/ncn-foreigners/nonprobsvy.","code":"@Manual{nonprobsy, title = {Inference Based on Non-Probability Samples}, author = {Łukasz Chrostowski and Maciej Beręsewicz and Piotr Chlebicki}, note = {R package version 0.2}, year = {2025}, url = {https://github.com/ncn-foreigners/nonprobsvy}, }"},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"basic-information","dir":"","previous_headings":"","what":"Basic information","title":"Inference Based on Non-Probability Samples","text":"goal package provide R users access modern methods non-probability samples auxiliary information population probability sample available: inverse probability weighting estimators possible calibration constraints (Chen, Li, Wu 2020), mass imputation estimators based nearest neighbours (Yang, Kim, Hwang 2021), predictive mean matching regression imputation (Kim et al. 2021), doubly robust estimators (Chen, Li, Wu 2020) bias minimization (Yang, Kim, Song 2020). package allows : variable selection high-dimensional space using SCAD (Yang, Kim, Song 2020), Lasso MCP penalty (via ncvreg, Rcpp, RcppArmadillo packages), estimation variance using analytical bootstrap approach (see Wu (2023)), integration survey package probability sample available Lumley (2023), different links selection (logit, probit cloglog) outcome (gaussian, binomial poisson) variables. 
Details use package can found: draft (proofread) version book Modern inference methods non-probability samples R, example codes reproduce papers available github repository software tutorials.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Inference Based on Non-Probability Samples","text":"can install recent version nonprobsvy package main branch Github : install stable version CRAN development version dev branch","code":"remotes::install_github(\"ncn-foreigners/nonprobsvy\") install.packages(\"nonprobsvy\") remotes::install_github(\"ncn-foreigners/nonprobsvy@dev\")"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"basic-idea","dir":"","previous_headings":"","what":"Basic idea","title":"Inference Based on Non-Probability Samples","text":"Consider following setting two samples available: non-probability (denoted SAS_A ) probability (denoted SBS_B) set auxiliary variables (denoted 𝐗\\boldsymbol{X}) available sources YY 𝐝\\boldsymbol{d} (𝐰\\boldsymbol{w}) present probability sample.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"basic-functionalities","dir":"","previous_headings":"","what":"Basic functionalities","title":"Inference Based on Non-Probability Samples","text":"Suppose YY target variable, 𝐗\\boldsymbol{X} matrix auxiliary variables, RR inclusion indicator. , interested estimating mean τ‾Y\\bar{\\tau}_Y sum τY\\tau_Y target variable given observed data set (yk,𝐱k,Rk)(y_k, \\boldsymbol{x}_k, R_k), can approach problem possible scenarios: unit-level data available non-probability sample SAS_{}, .e. (yk,𝐱k)(y_{k}, \\boldsymbol{x}_{k}) available units k∈SAk \\S_{}, population-level data available 𝐱1,...,𝐱p\\boldsymbol{x}_{1}, ..., \\boldsymbol{x}_{p}, denoted τx1,τx2,...,τxp\\tau_{x_{1}}, \\tau_{x_{2}}, ..., \\tau_{x_{p}} population size NN known. 
can also consider situations population data estimated (e.g. basis survey access), unit-level data available non-probability sample SAS_A probability sample SBS_B, .e. (yk,𝐱k,Rk)(y_k, \\boldsymbol{x}_k, R_k) determined data. determined data: Rk=1R_k=1 k∈SAk \\S_A otherwise Rk=0R_k=0, yky_k observed sample SAS_A 𝐱k\\boldsymbol{x}_k observed SAS_A SBS_B,","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"when-unit-level-data-is-available-for-non-probability-survey-only","dir":"","previous_headings":"Basic functionalities","what":"When unit-level data is available for non-probability survey only","title":"Inference Based on Non-Probability Samples","text":"","code":"nonprob( outcome = y ~ x1 + x2 + ... + xk, data = nonprob, pop_totals = c(`(Intercept)` = N, x1 = tau_x1, x2 = tau_x2, ..., xk = tau_xk), method_outcome = \"glm\", family_outcome = \"gaussian\" ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, pop_totals = c(`(Intercept)` = N, x1 = tau_x1, x2 = tau_x2, ..., xk = tau_xk), method_selection = \"logit\" ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, pop_totals = c(`(Intercept)` = N, x1 = tau_x1, x2 = tau_x2, ..., xk = tau_xk), method_selection = \"logit\", control_selection = control_sel(est_method = \"gee\", gee_h_fun = 1) ) nonprob( selection = ~ x1 + x2 + ... + xk, outcome = y ~ x1 + x2 + ... + xk, data = nonprob, pop_totals = c(`(Intercept)` = N, x1 = tau_x1, x2 = tau_x2, ..., xk = tau_xk), method_outcome = \"glm\", family_outcome = \"gaussian\" )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"when-unit-level-data-are-available-for-both-surveys","dir":"","previous_headings":"Basic functionalities","what":"When unit-level data are available for both surveys","title":"Inference Based on Non-Probability Samples","text":"","code":"nonprob( outcome = y ~ x1 + x2 + ... 
+ xk, data = nonprob, svydesign = prob, method_outcome = \"glm\", family_outcome = \"gaussian\" ) nonprob( outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"nn\", family_outcome = \"gaussian\", control_outcome = control_out(k = 2) ) nonprob( outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"pmm\", family_outcome = \"gaussian\" ) nonprob( outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"pmm\", family_outcome = \"gaussian\", control_outcome = control_out(penalty = \"lasso\"), control_inference = control_inf(vars_selection = TRUE) ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, svydesign = prob, method_selection = \"logit\" ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, svydesign = prob, method_selection = \"logit\", control_selection = control_sel(est_method = \"gee\", gee_h_fun = 1) ) nonprob( selection = ~ x1 + x2 + ... + xk, target = ~ y, data = nonprob, svydesign = prob, method_outcome = \"pmm\", family_outcome = \"gaussian\", control_inference = control_inf(vars_selection = TRUE) ) nonprob( selection = ~ x1 + x2 + ... + xk, outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"glm\", family_outcome = \"gaussian\" ) nonprob( selection = ~ x1 + x2 + ... + xk, outcome = y ~ x1 + x2 + ... + xk, data = nonprob, svydesign = prob, method_outcome = \"glm\", family_outcome = \"gaussian\", control_inference = control_inf( vars_selection = TRUE, bias_correction = TRUE ) )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"examples","dir":"","previous_headings":"","what":"Examples","title":"Inference Based on Non-Probability Samples","text":"Simulate example data following paper: Kim, Jae Kwang, Zhonglei Wang. 
“Sampling techniques big data analysis.” International Statistical Review 87 (2019): S177-S191 [section 5.2] Declare svydesign object survey package Estimate population mean y1 based doubly robust estimator using IPW calibration constraints. Results Mass imputation estimator Results Inverse probability weighting estimator Results","code":"library(survey) library(nonprobsvy) set.seed(1234567890) N <- 1e6 ## 1000000 n <- 1000 x1 <- rnorm(n = N, mean = 1, sd = 1) x2 <- rexp(n = N, rate = 1) epsilon <- rnorm(n = N) # rnorm(N) y1 <- 1 + x1 + x2 + epsilon y2 <- 0.5*(x1 - 0.5)^2 + x2 + epsilon p1 <- exp(x2)/(1+exp(x2)) p2 <- exp(-0.5+0.5*(x2-2)^2)/(1+exp(-0.5+0.5*(x2-2)^2)) flag_bd1 <- rbinom(n = N, size = 1, prob = p1) flag_srs <- as.numeric(1:N %in% sample(1:N, size = n)) base_w_srs <- N/n population <- data.frame(x1,x2,y1,y2,p1,p2,base_w_srs, flag_bd1, flag_srs) base_w_bd <- N/sum(population$flag_bd1) sample_prob <- svydesign(ids= ~1, weights = ~ base_w_srs, data = subset(population, flag_srs == 1)) result_dr <- nonprob( selection = ~ x2, outcome = y1 ~ x1 + x2, data = subset(population, flag_bd1 == 1), svydesign = sample_prob ) summary(result_dr) #> #> Call: #> nonprob(data = subset(population, flag_bd1 == 1), selection = ~x2, #> outcome = y1 ~ x1 + x2, svydesign = sample_prob) #> #> ------------------------- #> Estimated population mean: 2.95 with overall std.err of: 0.04195 #> And std.err for nonprobability and probability samples being respectively: #> 0.000783 and 0.04195 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y1 2.867789 3.03224 #> #> #> Based on: Doubly-Robust method #> For a population of estimate size: 1025063 #> Obtained on a nonprobability sample of size: 693011 #> With an auxiliary probability sample of size: 1000 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on outcome variable: #> Estimate Std. 
Error z value P(>|z|) #> (Intercept) 0.996282 0.002139 465.8 <2e-16 *** #> x1 1.001931 0.001200 835.3 <2e-16 *** #> x2 0.999125 0.001098 910.2 <2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> ----------------------- #> For glm regression on selection variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) -0.498995 0.003702 -134.8 <2e-16 *** #> x2 1.885627 0.005303 355.6 <2e-16 *** #> ------------------------- #> #> Weights: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 1.000 1.071 1.313 1.479 1.798 2.647 #> ------------------------- #> #> Residuals: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> -0.99999 0.06603 0.23778 0.26046 0.44358 0.62222 #> #> AIC: 1010622 #> BIC: 1010645 #> Log-Likelihood: -505309 on 694009 Degrees of freedom result_mi <- nonprob( outcome = y1 ~ x1 + x2, data = subset(population, flag_bd1 == 1), svydesign = sample_prob ) summary(result_mi) #> #> Call: #> nonprob(data = subset(population, flag_bd1 == 1), outcome = y1 ~ #> x1 + x2, svydesign = sample_prob) #> #> ------------------------- #> Estimated population mean: 2.95 with overall std.err of: 0.04203 #> And std.err for nonprobability and probability samples being respectively: #> 0.001227 and 0.04201 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y1 2.867433 3.032186 #> #> #> Based on: Mass Imputation method #> For a population of estimate size: 1e+06 #> Obtained on a nonprobability sample of size: 693011 #> With an auxiliary probability sample of size: 1000 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on outcome variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) 0.996282 0.002139 465.8 <2e-16 *** #> x1 1.001931 0.001200 835.3 <2e-16 *** #> x2 0.999125 0.001098 910.2 <2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 #> ------------------------- result_ipw <- nonprob( selection = ~ x2, target = ~y1, data = subset(population, flag_bd1 == 1), svydesign = sample_prob) summary(result_ipw) #> #> Call: #> nonprob(data = subset(population, flag_bd1 == 1), selection = ~x2, #> target = ~y1, svydesign = sample_prob) #> #> ------------------------- #> Estimated population mean: 2.925 with overall std.err of: 0.04999 #> And std.err for nonprobability and probability samples being respectively: #> 0.001325 and 0.04997 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y1 2.826805 3.022761 #> #> #> Based on: Inverse probability weighted method #> For a population of estimate size: 1025063 #> Obtained on a nonprobability sample of size: 693011 #> With an auxiliary probability sample of size: 1000 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on selection variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) -0.498995 0.003702 -134.8 <2e-16 *** #> x2 1.885627 0.005303 355.6 <2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> ------------------------- #> #> Weights: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 1.000 1.071 1.313 1.479 1.798 2.647 #> ------------------------- #> #> Residuals: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> -0.99999 0.06603 0.23778 0.26046 0.44358 0.62222 #> #> AIC: 1010622 #> BIC: 1010645 #> Log-Likelihood: -505309 on 694009 Degrees of freedom"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/index.html","id":"funding","dir":"","previous_headings":"","what":"Funding","title":"Inference Based on Non-Probability Samples","text":"Work package supported National Science Centre, OPUS 20 grant . 
2020/39/B/HS4/00941.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/admin.html","id":null,"dir":"Reference","previous_headings":"","what":"Admin data (non-probability survey) — admin","title":"Admin data (non-probability survey) — admin","text":"subset Central Job Offers Database, voluntary administrative data set (non-probability sample). data slightly manipulated ensure relationships preserved, aligned. information CBOP, please refer : https://oferty.praca.gov.pl/.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/admin.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Admin data (non-probability survey) — admin","text":"","code":"admin"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/admin.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Admin data (non-probability survey) — admin","text":"single data.frame 9,344 rows 6 columns id Identifier entity (company: legal local). private Whether company private (1) public (0) entity. size size entity: S – small (9 employees), M – medium (10-49) L – large (49). nace main NACE code given entity: C, D.E, F, G, H, , J, K.L, M, N, O, P, Q R.S (14 levels, 3 combined: D E, K L, R S). region region Poland (16 levels: 02, 04, ..., 32). 
single_shift Whether entity seeks employees single shift.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/admin.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Admin data (non-probability survey) — admin","text":"","code":"data(\"admin\") head(admin) #> id private size nace region single_shift #> 1 j_1 0 L P 30 FALSE #> 2 j_2 0 L O 14 TRUE #> 3 j_3 0 L O 04 TRUE #> 4 j_4 0 L O 24 TRUE #> 5 j_5 0 L O 04 TRUE #> 6 j_6 1 L C 28 FALSE"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":null,"dir":"Reference","previous_headings":"","what":"Check the variable balance between the probability and non-probability samples — check_balance","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"Check variable balance probability non-probability samples","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"","code":"check_balance(x, object, dig)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"x Formula specifying variables check object Object nonprobsvy class dig Number digits rounding (default = 2)","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"list containing nonprobability totals, probability totals, 
differences","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/check_balance.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Check the variable balance between the probability and non-probability samples — check_balance","text":"","code":"data(admin) data(jvs) jvs_svy <- svydesign(ids = ~ 1, weights = ~ weight, strata = ~ size + nace + region, data = jvs) ipw_est1 <- nonprob(selection = ~ region + private + nace + size, target = ~ single_shift, svydesign = jvs_svy, data = admin, method_selection = \"logit\" ) ipw_est2 <- nonprob( selection = ~ region + private + nace + size, target = ~ single_shift, svydesign = jvs_svy, data = admin, method_selection = \"logit\", control_selection = control_sel(est_method = \"gee\", gee_h_fun = 1)) ## check the balance for the standard IPW check_balance(~size, ipw_est1) #> $nonprob_totals #> sizeL sizeM sizeS #> 8193.376 13529.550 31175.205 #> #> $prob_totals #> sizeL sizeM sizeS #> 8561 13758 29551 #> #> $balance #> sizeL sizeM sizeS #> -367.62 -228.45 1624.21 #> ## check the balance for the calibrated IPW check_balance(~size, ipw_est2) #> $nonprob_totals #> sizeL sizeM sizeS #> 8561 13758 29551 #> #> $prob_totals #> sizeL sizeM sizeS #> 8561 13758 29551 #> #> $balance #> sizeL sizeM sizeS #> 0 0 0 #>"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/confint.nonprob.html","id":null,"dir":"Reference","previous_headings":"","what":"Confidence Intervals for Model Parameters — confint.nonprob","title":"Confidence Intervals for Model Parameters — confint.nonprob","text":"function computes confidence intervals selection model coefficients.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/confint.nonprob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Confidence Intervals for Model Parameters — confint.nonprob","text":"","code":"# S3 method for class 'nonprob' confint(object, parm, level = 
0.95, ...)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/confint.nonprob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Confidence Intervals for Model Parameters — confint.nonprob","text":"object object nonprob class. parm names parameters confidence intervals computed, missing parameters considered. level confidence level intervals. ... additional arguments","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/confint.nonprob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Confidence Intervals for Model Parameters — confint.nonprob","text":"object named columns include upper lower limit confidence intervals.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_inf.html","id":null,"dir":"Reference","previous_headings":"","what":"Control parameters for inference — control_inf","title":"Control parameters for inference — control_inf","text":"control_inf constructs list necessary control parameters statistical inference.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_inf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Control parameters for inference — control_inf","text":"","code":"control_inf( vars_selection = FALSE, var_method = c(\"analytic\", \"bootstrap\"), rep_type = c(\"subbootstrap\", \"auto\", \"JK1\", \"JKn\", \"BRR\", \"bootstrap\", \"mrbbootstrap\", \"Fay\"), bias_correction = FALSE, bias_inf = c(\"union\", \"div\"), num_boot = 500, alpha = 0.05, cores = 1, keep_boot = TRUE, nn_exact_se = FALSE, pi_ij = NULL )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_inf.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Control parameters for inference — control_inf","text":"vars_selection default FALSE; TRUE, variables selection model used. 
var_method variance method (default \"analytic\"). rep_type replication type weights bootstrap method variance estimation passed survey::as.svrepdesign(). Default \"subbootstrap\". bias_correction default FALSE; TRUE, bias minimization estimation used model fitting. bias_inf inference method bias minimization. union, final model fitted union selected variables selection outcome models div, final model fitted separately division selected variables relevant ones selection outcome model. num_boot number iteration bootstrap algorithms (default 500). alpha significance level (default 0.05). cores number cores parallel computing (default 1). keep_boot logical value indicating whether statistics bootstrap kept (default TRUE). nn_exact_se logical value indicating whether compute exact standard error estimate nn pmm estimator. variance estimator estimation based nn pmm can decomposed three parts, third computed using covariance imputed values units probability sample using predictive matches non-probability sample. situations term negligible computationally expensive default set FALSE, recommended option set value TRUE submitting final results. 
pi_ij either matrix ppsmat class object (default NULL).","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_inf.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Control parameters for inference — control_inf","text":"list selected parameters.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_out.html","id":null,"dir":"Reference","previous_headings":"","what":"Control parameters for outcome model — control_out","title":"Control parameters for outcome model — control_out","text":"control_out constructs list necessary control parameters outcome model.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_out.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Control parameters for outcome model — control_out","text":"","code":"control_out( epsilon = 1e-04, maxit = 100, trace = FALSE, k = 1, penalty = c(\"SCAD\", \"lasso\", \"MCP\"), a_SCAD = 3.7, a_MCP = 3, lambda_min = 0.001, nlambda = 100, nfolds = 10, treetype = c(\"kd\", \"rp\", \"ball\"), searchtype = c(\"standard\", \"priority\"), pmm_match_type = 1, pmm_weights = c(\"none\", \"prop_dist\"), pmm_k_choice = c(\"none\", \"min_var\"), pmm_reg_engine = c(\"glm\", \"loess\") )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_out.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Control parameters for outcome model — control_out","text":"epsilon Tolerance fitting algorithms. Default 1e-4. maxit Maximum number iterations. Default 100. trace logical value. TRUE trace steps fitting algorithms. Default FALSE. k k parameter RANN::nn2() function. Default 1. penalty penalty algorithm variable selection. Default SCAD. a_SCAD tuning parameter SCAD penalty outcome model. Default 3.7. a_MCP tuning parameter MCP penalty outcome model. Default 3. lambda_min smallest value lambda, fraction lambda.max. 
Default .001. nlambda number lambda values. Default 100. nfolds number folds cross-validation variables selection model. treetype Type tree nearest neighbour imputation (NN PMM estimator) passed RANN::nn2() function. searchtype Type search nearest neighbour imputation (NN PMM estimator) passed RANN::nn2() function. pmm_match_type (PMM Estimator) Indicates select 'closest' unit nonprobability sample unit probability sample. Either 1 (default) 2 2 matching minimizing distance y_i S_A y_j j S_B 1 matching minimizing distance y_i S_A y_i S_A. pmm_weights (PMM Estimator) Indicate weight k nearest neighbours S_B create imputed value units S_A. default value \"none\" indicates mean k nearest y's S_B used whereas \"prop_dist\" results weighted mean k values weights inversely proportional distance matched values. pmm_k_choice (PMM Estimator) Character value indicating k hyper-parameter chosen, default \"none\" meaning k provided control_outcome argument used. now option \"min_var\" means k chosen minimizing estimated variance estimator mean. Parameter k provided control list chosen starting point. pmm_reg_engine (PMM Estimator) whether use parametric (\"glm\") non-parametric (\"loess\") regression model outcome. 
default \"glm\".","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_out.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Control parameters for outcome model — control_out","text":"List selected parameters.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html","id":null,"dir":"Reference","previous_headings":"","what":"Control parameters for the selection model — control_sel","title":"Control parameters for the selection model — control_sel","text":"control_sel constructs list necessary control parameters selection model.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Control parameters for the selection model — control_sel","text":"","code":"control_sel( est_method = c(\"mle\", \"gee\"), gee_h_fun = 1, optimizer = c(\"maxLik\", \"optim\"), maxlik_method = c(\"NR\", \"BFGS\", \"NM\"), optim_method = c(\"BFGS\", \"Nelder-Mead\"), epsilon = 1e-04, maxit = 500, trace = FALSE, penalty = c(\"SCAD\", \"lasso\", \"MCP\"), a_SCAD = 3.7, a_MCP = 3, lambda = -1, lambda_min = 0.001, nlambda = 50, nfolds = 10, print_level = 0, start_type = c(\"zero\", \"mle\", \"naive\"), nleqslv_method = c(\"Broyden\", \"Newton\"), nleqslv_global = c(\"dbldog\", \"pwldog\", \"cline\", \"qline\", \"gline\", \"hook\", \"none\"), nleqslv_xscalm = c(\"fixed\", \"auto\"), dependence = FALSE, key = NULL )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Control parameters for the selection model — control_sel","text":"est_method Method estimation propensity score model (\"mle\" \"gee\"; default \"mle\"). 
gee_h_fun Smooth function generalized estimating equations (GEE) method taking following values 1 h(x, ) = (x, )x, 2 h(x, ) = x optimizer (est_method=\"mle\" ) optimization function maximum likelihood estimation. maxlik_method (est_method=\"mle\" ) maximisation method passed maxLik::maxLik() function. Default NR. optim_method (est_method=\"mle\" ) maximisation method passed stats::optim() function. Default BFGS. epsilon Tolerance fitting algorithms. Default 1e-4. maxit Maximum number iterations. Default 500. trace logical value. TRUE trace steps fitting algorithms. Default FALSE. penalty penalization function used variables selection. a_SCAD tuning parameter SCAD penalty selection model. Default 3.7. a_MCP tuning parameter MCP penalty selection model. Default 3. lambda user-specified value variable selection model fitting. lambda_min smallest value lambda, fraction lambda.max. Default .001. nlambda number lambda values. Default 50. nfolds number folds cross validation. Default 10. print_level argument determines level printing done optimization (propensity score model) process. start_type Type method start points model fitting taking following values zero start vector zeros (default methods). mle (est_method=\"gee\" ) starting parameters taken result est_method=\"mle\" method. nleqslv_method (est_method=\"gee\" ) method passed nleqslv::nleqslv() function. nleqslv_global (est_method=\"gee\" ) global strategy passed nleqslv::nleqslv() function. nleqslv_xscalm (est_method=\"gee\" ) type x scaling passed nleqslv::nleqslv() function. dependence logical value (default FALSE) informing whether samples overlap (YET IMPLEMENTED, FUTURE DEVELOPMENT). 
key binary key variable allowing identify overlap (YET IMPLEMENTED, FUTURE DEVELOPMENT).","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Control parameters for the selection model — control_sel","text":"List selected parameters.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html","id":null,"dir":"Reference","previous_headings":"","what":"Job Vacancy Survey — jvs","title":"Job Vacancy Survey — jvs","text":"subset Job Vacancy Survey Poland (one quarter). data subject slight manipulation, relationships data preserved. details JVS, please refer following link: https://stat.gov.pl/obszary-tematyczne/rynek-pracy/popyt-na-prace/zeszyt-metodologiczny-popyt-na-prace,3,1.html.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Job Vacancy Survey — jvs","text":"","code":"jvs"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Job Vacancy Survey — jvs","text":"single data.frame 6,523 rows 6 columns id Identifier entity (company: legal local). private Whether company private (1) public (0) entity. size size entity: S – small (9 employees), M – medium (10-49) L – large (49). nace main NACE code given entity: C, D.E, F, G, H, , J, K.L, M, N, O, P, Q R.S (14 levels, 3 combined: D E, K L, R S). region region Poland (16 levels: 02, 04, ..., 32). weight final (calibrated) weight (w-weight). 
access design weights (d-weights).","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Job Vacancy Survey — jvs","text":"","code":"data(\"jvs\") head(jvs) #> id private size nace region weight #> 1 j_1 0 L O 14 1 #> 2 j_2 0 L O 24 6 #> 3 j_3 0 L R.S 14 1 #> 4 j_4 0 L R.S 14 1 #> 5 j_5 0 L R.S 22 1 #> 6 j_6 0 M R.S 26 1"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_glm.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for the mass imputation model using glm — model_glm","title":"Function for the mass imputation model using glm — model_glm","text":"Model outcome mass imputation estimator","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_glm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for the mass imputation model using glm — model_glm","text":"","code":"model_glm( y_nons, X_nons, X_rand, weights, svydesign, family_outcome, start_outcome, vars_selection, pop_totals, pop_size, control_outcome, verbose, se )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_glm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for the mass imputation model using glm — model_glm","text":"y_nons target variable non-probability sample X_nons model.matrix auxiliary variables non-probability sample X_rand model.matrix auxiliary variables probability sample weights case / frequency weights non-probability sample svydesign svydesign object family_outcome family glm model start_outcome start parameters vars_selection whether variable selection conducted pop_totals population totals nonprob function pop_size population size nonprob function control_outcome controls passed control_out function verbose parameter passed main nonprob function se whether standard errors 
calculated","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_glm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for the mass imputation model using glm — model_glm","text":"nonprob_model class list following entries model_fitted fitted model either glm.fit cv.ncvreg object y_nons_pred predicted values non-probability sample y_rand_pred predicted values probability sample population totals coefficients coefficients model (available) svydesign updated survey.design2 object (new column y_hat_MI added) y_mi_hat estimated population mean target variable vars_selection whether variable selection performed var_prob variance probability sample component (available) var_nonprob variance non-probability sample component model model type (character \"glm\")","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_nn.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for the mass imputation model using nn method — model_nn","title":"Function for the mass imputation model using nn method — model_nn","text":"Model outcome mass imputation estimator","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_nn.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for the mass imputation model using nn method — model_nn","text":"","code":"model_nn( y_nons, X_nons, X_rand, weights, svydesign, family_outcome, start_outcome, vars_selection, pop_totals, pop_size, control_outcome, verbose, se )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_nn.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for the mass imputation model using nn method — model_nn","text":"y_nons target variable non-probability sample X_nons model.matrix auxiliary variables non-probability sample X_rand model.matrix auxiliary variables probability sample 
weights case / frequency weights non-probability sample svydesign svydesign object family_outcome family glm model start_outcome start parameters vars_selection whether variable selection conducted pop_totals population totals nonprob function pop_size population size nonprob function control_outcome controls passed control_out function verbose parameter passed main nonprob function se whether standard errors calculated","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_nn.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for the mass imputation model using nn method — model_nn","text":"nonprob_model class list following entries model_fitted fitted model either glm.fit cv.ncvreg object y_nons_pred predicted values non-probability sample y_rand_pred predicted values probability sample population totals coefficients coefficients model (available) svydesign updated survey.design2 object (new column y_hat_MI added) y_mi_hat estimated population mean target variable vars_selection whether variable selection performed var_prob variance probability sample component (available) var_nonprob variance non-probability sample component model model type (character \"nn\")","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_npar.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for the mass imputation model using nonparametric method — model_npar","title":"Function for the mass imputation model using nonparametric method — model_npar","text":"Model outcome mass imputation estimator","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_npar.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for the mass imputation model using nonparametric method — model_npar","text":"","code":"model_npar( y_nons, X_nons, X_rand, weights, svydesign, family_outcome, start_outcome, vars_selection, 
pop_totals, pop_size, control_outcome, verbose, se )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_npar.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for the mass imputation model using nonparametric method — model_npar","text":"y_nons target variable non-probability sample X_nons model.matrix auxiliary variables non-probability sample X_rand model.matrix auxiliary variables probability sample weights case / frequency weights non-probability sample svydesign svydesign object family_outcome family glm model start_outcome start parameters vars_selection whether variable selection conducted pop_totals population totals nonprob function pop_size population size nonprob function control_outcome controls passed control_out function verbose parameter passed main nonprob function se whether standard errors calculated","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_npar.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for the mass imputation model using nonparametric method — model_npar","text":"nonprob_model class list following entries model_fitted fitted model either glm.fit cv.ncvreg object y_nons_pred predicted values non-probability sample y_rand_pred predicted values probability sample population totals coefficients coefficients model (available) svydesign updated survey.design2 object (new column y_hat_MI added) y_mi_hat estimated population mean target variable vars_selection whether variable selection performed var_prob variance probability sample component (available) var_nonprob variance non-probability sample component model model type (character \"npar\")","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_pmm.html","id":null,"dir":"Reference","previous_headings":"","what":"Function for the mass imputation model using pmm method — model_pmm","title":"Function for the mass 
imputation model using pmm method — model_pmm","text":"Model outcome mass imputation estimator","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_pmm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function for the mass imputation model using pmm method — model_pmm","text":"","code":"model_pmm( y_nons, X_nons, X_rand, weights, svydesign, family_outcome, start_outcome, vars_selection, pop_totals, pop_size, control_outcome, verbose, se )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_pmm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function for the mass imputation model using pmm method — model_pmm","text":"y_nons target variable non-probability sample X_nons model.matrix auxiliary variables non-probability sample X_rand model.matrix auxiliary variables probability sample weights case / frequency weights non-probability sample svydesign svydesign object family_outcome family glm model start_outcome start parameters vars_selection whether variable selection conducted pop_totals population totals nonprob function pop_size population size nonprob function control_outcome controls passed control_out function verbose parameter passed main nonprob function se whether standard errors calculated","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_pmm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function for the mass imputation model using pmm method — model_pmm","text":"nonprob_model class list following entries model_fitted fitted model either glm.fit cv.ncvreg object y_nons_pred predicted values non-probability sample y_rand_pred predicted values probability sample population totals coefficients coefficients model (available) svydesign updated survey.design2 object (new column y_hat_MI added) y_mi_hat estimated population mean target variable vars_selection whether 
variable selection performed var_prob variance probability sample component (available) var_nonprob variance non-probability sample component model model type (character \"pmm\")","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":null,"dir":"Reference","previous_headings":"","what":"Propensity score model — model_ps","title":"Propensity score model — model_ps","text":"Function specify propensity score (PS) model inverse probability weighting estimator. function provides basic functions logistic regression given link function (currently supports logit, probit cloglog) additional information analytic variance estimator mean.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Propensity score model — model_ps","text":"","code":"model_ps(link = c(\"logit\", \"probit\", \"cloglog\"), ...)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Propensity score model — model_ps","text":"link link PS model ... 
Additional, optional arguments.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Propensity score model — model_ps","text":"list functions elements specific link function following entries: make_log_like log-likelihood function specific link function make_gradient gradient loglik make_hessian hessian loglik make_link link function make_link_inv inverse link function make_link_der first derivative link function make_link_inv_der first derivative inverse link function make_link_inv_rev TBA make_link_inv_rev_der TBA variance_covariance1 TBA variance_covariance2 TBA b_vec_ipw TBA b_vec_dr TBA t_vec TBA var_nonprob TBA link name selected link function PS model (character) model model type (character)","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Propensity score model — model_ps","text":"Łukasz Chrostowski, Maciej Beręsewicz","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Propensity score model — model_ps","text":"","code":"# Printing information on the model selected model_ps() #> [1] \"Propensity score model with logit link\" # extracting specific field model_ps(\"cloglog\")$make_gradient #> function (X_nons, X_rand, weights, weights_rand, ...) 
#> { #> function(theta) { #> eta1 <- as.matrix(X_nons) %*% theta #> eta2 <- as.matrix(X_rand) %*% theta #> invLink1 <- inv_link(eta1) #> invLink2 <- inv_link(eta2) #> t(t(X_nons) %*% (weights * exp(eta1)/invLink1) - t(X_rand) %*% #> (weights_rand * exp(eta2))) #> } #> } #> #> "},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":null,"dir":"Reference","previous_headings":"","what":"Inference with non-probability survey samples — nonprob","title":"Inference with non-probability survey samples — nonprob","text":"nonprob function provides access various methods inference based non-probability surveys (including big data). function allows estimate population mean based access reference probability sample (via survey package), well totals means covariates. package implements state-of-the-art approaches recently proposed literature: Chen et al. (2020), Yang et al. (2020), Wu (2022) uses Lumley 2004 survey package inference (reference probability sample provided). provides various propensity score weighting (e.g. calibration constraints), mass imputation (e.g. nearest neighbour, predictive mean matching) doubly robust estimators (e.g. take account minimisation asymptotic bias population mean estimators). package uses survey package functionality probability sample available. optional parameters set NULL. obligatory ones include data well one following three: selection, outcome, target – depending method selected. 
case outcome target multiple y variables can specified.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inference with non-probability survey samples — nonprob","text":"","code":"nonprob( data, selection = NULL, outcome = NULL, target = NULL, svydesign = NULL, pop_totals = NULL, pop_means = NULL, pop_size = NULL, method_selection = c(\"logit\", \"cloglog\", \"probit\"), method_outcome = c(\"glm\", \"nn\", \"pmm\"), family_outcome = c(\"gaussian\", \"binomial\", \"poisson\"), subset = NULL, strata = NULL, weights = NULL, na_action = NULL, control_selection = control_sel(), control_outcome = control_out(), control_inference = control_inf(), start_selection = NULL, start_outcome = NULL, verbose = FALSE, x = TRUE, y = TRUE, se = TRUE, ... ) nonprob_dr( selection, outcome, data, svydesign, pop_totals, pop_means, pop_size, method_selection, method_outcome, family_outcome = \"gaussian\", subset, strata, weights, na_action, control_selection, control_outcome, control_inference, start_outcome, start_selection, verbose, x, y, se, ... ) nonprob_mi( outcome, data, svydesign, pop_totals, pop_means, pop_size, method_outcome, family_outcome = \"gaussian\", subset, strata, weights, na_action, control_outcome, control_inference, start_outcome, verbose, x, y, se, ... )"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inference with non-probability survey samples — nonprob","text":"data data.frame dataset containing non-probability sample. selection formula (default NULL) selection (propensity) score model. outcome formula (default NULL) outcome (target) model. target formula (default NULL) target variable(s). allow multiple target variables (e.g. ~y1 + y2 + y3). 
svydesign optional survey.design2 class object containing probability sample design weights. pop_totals optional named vector population totals covariates. pop_means optional named vector population means covariates. pop_size optional double value population size. method_selection character (default logit) indicating method propensity score link function. method_outcome character (default glm) indicating method outcome model. family_outcome character (default gaussian) describing error distribution link function used model. Currently supports: gaussian identity link, poisson binomial. subset optional vector specifying subset observations used fitting process - yet supported. strata optional vector specifying strata (yet supported, development). weights optional vector prior weights used fitting process. assumed vector contains frequency analytic weights (i.e. rows data argument repeated according values weights argument), probability/design weights. na_action function indicates happen data contain NAs (yet supported, development). control_selection list (default control_sel() result) indicating parameters used fitting selection model propensity scores. change parameters one use control_sel() function. control_outcome list (default control_out() result) indicating parameters used fitting model outcome variable. change parameters one use control_out() function. control_inference list (default control_inf() result) indicating parameters used inference based probability non-probability samples. change parameters one use control_inf() function. start_selection optional vector starting values parameters selection equation. start_outcome optional vector starting values parameters outcome equation. verbose logical value (default FALSE) whether detailed information fitting presented. x logical value (default TRUE) indicating whether return model matrix covariates part output. y logical value (default TRUE) indicating whether return vector outcome variable part output. 
se Logical value (default TRUE) indicating whether calculate return standard error estimated mean. ... Additional, optional arguments.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inference with non-probability survey samples — nonprob","text":"Returns object class c(\"nonprobsvy\", \"nonprobsvy_ipw\") case inverse probability weighting estimator, c(\"nonprobsvy\", \"nonprobsvy_mi\") case mass imputation estimator, c(\"nonprobsvy\", \"nonprobsvy_dr\") case doubly robust estimator, type list containing: X – model.matrix containing data probability non-probability samples specified function call. y – list vector outcome variables specified function call. R – numeric vector indicating whether unit belongs probability (0) non-probability (1) units matrix X. prob – numeric vector estimated propensity scores non-probability sample. weights – vector estimated weights non-probability sample. control – list control functions. output – output model information estimated population mean standard errors. SE – data.frame standard error estimator population mean, divided errors probability non-probability samples. confidence_interval – data.frame confidence interval population mean estimator. nonprob_size – scalar numeric vector denoting size non-probability sample. prob_size – scalar numeric vector denoting size probability sample. pop_size – scalar numeric vector estimated population size derived estimated weights (non-probability sample) known design weights (probability sample). pop_totals – numeric vector total values auxiliary variables derived probability sample numeric vector total/mean values. estimator – character vector information type estimator selected (one c(\"ipw\", \"mi\", \"dr\")). 
outcome – list containing information fitting mass imputation model, case regression model object containing list returned stats::glm(), case nearest neighbour imputation object containing list returned RANN::nn2(). bias_correction control_inf() set TRUE, estimation based joint estimating equations selection outcome model therefore, list different one returned stats::glm() function contains elements coefficients – numeric vector estimated coefficients regression model. std_err – numeric vector standard errors estimated coefficients. residuals – numeric vector response residuals. variance_covariance – matrix variance-covariance matrix coefficient estimates. df_residual – scalar vector degrees freedom residuals. family – character specifies error distribution link function used model. fitted.values – numeric vector predicted values response variable based fitted model. linear.predictors – numeric vector linear fit link scale. X – matrix design matrix (model.matrix). method – set glm, since regression method. model_frame – model.matrix data probability sample used mass imputation. cve – error value lambda, averaged across cross-validation folds. selection – list containing information fitting propensity score model, coefficients – numeric vector coefficients. std_err – numeric vector standard errors estimated model coefficients. residuals – numeric vector response residuals. variance – scalar numeric vector root mean square error. fitted_values – numeric vector fitted mean values, obtained transforming linear predictors inverse link function. link – link object used. linear_predictors – numeric vector linear fit link scale. aic – version Akaike's Information Criterion, minus twice maximized log-likelihood plus twice number parameters. weights – numeric vector estimated weights non-probability sample. prior.weights – numeric vector frequency weights initially supplied, vector 1s none. 
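est_totals – numeric vector estimated total values auxiliary variables derived non-probability sample. formula – formula supplied. df_residual – residual degrees freedom. log_likelihood – value log-likelihood function mle method, case NA. cve – error value lambda, averaged across cross-validation folds variable selection model propensity score model fitting. Returned selection variables model used. method_selection – Link function, e.g. logit, cloglog probit. hessian – Hessian matrix log-likelihood function mle method. gradient – Gradient log-likelihood function mle method. method – estimation method selection model, e.g. mle gee. prob_der – Derivative inclusion probability function units non-probability sample. prob_rand – Inclusion probabilities unit probability sample svydesign object. prob_rand_est – Inclusion probabilities non-probability sample unit probability sample. prob_rand_est_der – Derivative inclusion probabilities non-probability sample unit probability sample. stat – matrix estimated population means bootstrap iteration. Returned bootstrap method used estimate variance keep_boot control_inf() set TRUE.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Inference with non-probability survey samples — nonprob","text":"Let y response variable want estimate population mean, given \\mu_{y} = \\frac{1}{N} \\sum_{i=1}^{N} y_{i}. purpose consider data integration following structure. Let S_A non-probability sample design matrix covariates \\boldsymbol{X}_{A} = \\begin{bmatrix} x_{11} & x_{12} & \\cdots & x_{1p} \\\\ x_{21} & x_{22} & \\cdots & x_{2p} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ x_{n_{A}1} & x_{n_{A}2} & \\cdots & x_{n_{A}p} \\end{bmatrix} and vector outcome variable \\boldsymbol{y} = \\begin{bmatrix} y_{1} \\\\ y_{2} \\\\ \\vdots \\\\ y_{n_{A}} \\end{bmatrix}. On the other hand, let S_B probability sample design matrix covariates \\boldsymbol{X}_{B} = \\begin{bmatrix} x_{11} & x_{12} & \\cdots & x_{1p} \\\\ x_{21} & x_{22} & \\cdots & x_{2p} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ x_{n_{B}1} & x_{n_{B}2} & \\cdots & x_{n_{B}p} \\end{bmatrix}.
Instead sample units can consider vector population sums form \\tau_{x} = \\left( \\sum_{i \\in \\mathcal{U}} x_{i1}, \\sum_{i \\in \\mathcal{U}} x_{i2}, \\ldots, \\sum_{i \\in \\mathcal{U}} x_{ip} \\right) or means \\tau_{x} / N, where \\mathcal{U} refers finite population. Note we do not assume access response variable S_B. general make following assumptions: (1) selection indicator belonging non-probability sample R_{i}^{A} and response variable y_{i} independent given set covariates \\boldsymbol{x}_{i}; (2) all units have non-zero propensity score, i.e., \\pi_{i}^{A} > 0 for all i; (3) indicator variables R_{i}^{A} and R_{j}^{A} independent given \\boldsymbol{x}_{i} and \\boldsymbol{x}_{j} for i \\neq j. three possible approaches problem estimating population mean using non-probability samples: Inverse probability weighting – main drawback non-probability sampling unknown selection mechanism unit included sample. talk so-called \"biased sample\" problem. inverse probability approach based assumption reference probability sample available therefore can estimate propensity score selection mechanism. estimator following form: \\hat{\\mu}_{IPW} = \\frac{1}{\\hat{N}^{A}} \\sum_{i \\in S_{A}} \\frac{y_{i}}{\\hat{\\pi}_{i}^{A}}. purpose several estimation methods can considered. first approach maximum likelihood estimation corrected log-likelihood function, given following formula \\ell^{*}(\\boldsymbol{\\theta}) = \\sum_{i \\in S_{A}} \\log \\left\\{ \\frac{\\pi(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta})}{1 - \\pi(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta})} \\right\\} + \\sum_{i \\in S_{B}} d_{i}^{B} \\log \\left\\{ 1 - \\pi(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta}) \\right\\}. literature, main approach modelling propensity scores based logit link function. However, extend propensity score model additional link functions cloglog probit. pseudo-score equations derived ML methods can replaced idea generalised estimating equations calibration constraints defined equations \\mathbf{U}(\\boldsymbol{\\theta}) = \\sum_{i \\in S_{A}} \\mathbf{h}(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta}) - \\sum_{i \\in S_{B}} d_{i}^{B} \\pi(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta}) \\mathbf{h}(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta}). Notice for \\mathbf{h}(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta}) = \\boldsymbol{x}_{i} / \\pi(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta}) we do not need probability sample and can use vector population totals/means instead. Mass imputation – method based framework imputed values outcome variables created entire probability sample. case, treat large sample training data set used build imputation model. Using imputed values probability sample (known) design weights, can build population mean estimator form: \\hat{\\mu}_{MI} = \\frac{1}{\\hat{N}^{B}} \\sum_{i \\in S_{B}} d_{i}^{B} \\hat{y}_{i}. opens door flexible method imputation models.
package uses generalized linear models stats::glm(), nearest neighbour algorithm using RANN::nn2() predictive mean matching. Doubly robust estimation – IPW MI estimators sensitive misspecified models propensity score outcome variable, respectively. end, so-called doubly robust methods presented take problems account. simple idea combine propensity score imputation models inference, leading following estimator \\hat{\\mu}_{DR} = \\frac{1}{\\hat{N}^{A}} \\sum_{i \\in S_{A}} \\hat{d}_{i}^{A} (y_{i} - \\hat{y}_{i}) + \\frac{1}{\\hat{N}^{B}} \\sum_{i \\in S_{B}} d_{i}^{B} \\hat{y}_{i}. addition, approach based directly bias minimisation implemented. following formula \\begin{aligned} \\operatorname{bias}(\\hat{\\mu}_{DR}) = & \\mathbb{E}(\\hat{\\mu}_{DR} - \\mu_{y}) \\\\ = & \\mathbb{E} \\left\\{ \\frac{1}{N} \\sum_{i=1}^{N} \\left( \\frac{R_{i}^{A}}{\\pi_{i}^{A}(\\boldsymbol{x}_{i}^{\\mathrm{T}} \\boldsymbol{\\theta})} - 1 \\right) \\left( y_{i} - m(\\boldsymbol{x}_{i}^{\\mathrm{T}} \\boldsymbol{\\beta}) \\right) \\right\\} \\\\ + & \\mathbb{E} \\left\\{ \\frac{1}{N} \\sum_{i=1}^{N} \\left( R_{i}^{B} d_{i}^{B} - 1 \\right) m(\\boldsymbol{x}_{i}^{\\mathrm{T}} \\boldsymbol{\\beta}) \\right\\}, \\end{aligned} lead us system equations J(\\boldsymbol{\\theta}, \\boldsymbol{\\beta}) = \\begin{pmatrix} J_{1}(\\boldsymbol{\\theta}, \\boldsymbol{\\beta}) \\\\ J_{2}(\\boldsymbol{\\theta}, \\boldsymbol{\\beta}) \\end{pmatrix} = \\begin{pmatrix} \\sum_{i=1}^{N} R_{i}^{A} \\left\\{ \\frac{1}{\\pi(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta})} - 1 \\right\\} \\left\\{ y_{i} - m(\\boldsymbol{x}_{i}, \\boldsymbol{\\beta}) \\right\\} \\boldsymbol{x}_{i} \\\\ \\sum_{i=1}^{N} \\frac{R_{i}^{A}}{\\pi(\\boldsymbol{x}_{i}, \\boldsymbol{\\theta})} \\frac{\\partial m(\\boldsymbol{x}_{i}, \\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}} - \\sum_{i \\in S_{B}} d_{i}^{B} \\frac{\\partial m(\\boldsymbol{x}_{i}, \\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}} \\end{pmatrix}, where m(\\boldsymbol{x}_{i}, \\boldsymbol{\\beta}) mass imputation (regression) model outcome variable propensity scores \\pi_{i}^{A} estimated using logit function model. MLE GEE approaches extended method cloglog probit links. not straightforward calculate variances estimators, asymptotic equivalents variances derived using Taylor approximation proposed literature. Details can found . addition, bootstrap approach can used variance estimation. function also allows variables selection using known methods implemented handle integration probability non-probability sampling. presence high-dimensional data, variable selection important, can reduce variability estimate results using irrelevant variables build model. Let \\mathbf{U}(\\boldsymbol{\\theta}, \\boldsymbol{\\beta}) joint estimating function (\\boldsymbol{\\theta}, \\boldsymbol{\\beta}). define penalized estimating functions \\mathbf{U}^{p}(\\boldsymbol{\\theta}, \\boldsymbol{\\beta}) = \\mathbf{U}(\\boldsymbol{\\theta}, \\boldsymbol{\\beta}) - \\begin{pmatrix} q_{\\lambda_{\\theta}}(|\\boldsymbol{\\theta}|) \\operatorname{sgn}(\\boldsymbol{\\theta}) \\\\ q_{\\lambda_{\\beta}}(|\\boldsymbol{\\beta}|) \\operatorname{sgn}(\\boldsymbol{\\beta}) \\end{pmatrix}, where q_{\\lambda_{\\theta}} and q_{\\lambda_{\\beta}} smooth functions. let q_{\\lambda}(x) = \\partial p_{\\lambda} / \\partial x, where p_{\\lambda} penalization function. Details penalization functions techniques solving type equation can found . use variable selection model, set vars_selection parameter control_inf() function TRUE.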
addition, control functions control_sel() control_out() can set parameters selection relevant variables, number folds cross-validation algorithm lambda value penalizations. Details can found documentation control functions nonprob.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Inference with non-probability survey samples — nonprob","text":"Kim JK, Park S, Chen Y, Wu C. Combining non-probability and probability survey samples through mass imputation. Journal of the Royal Statistical Society Series A, 2021, Vol. 184, pp. 941-963. Yang S, Kim JK, Song R. Doubly robust inference when combining probability and non-probability samples with high dimensional data. Journal of the Royal Statistical Society Series B, 2020. Chen Y, Li P, Wu C. Doubly robust inference with nonprobability survey samples. Journal of the American Statistical Association, 2020, Vol. 115, No. 532, pp. 2011-2021. Yang S, Kim JK, Hwang Y. Integration of data from probability surveys and big found data for finite population inference using mass imputation. Survey Methodology, June 2021, Vol. 47, No. 1, pp. 
29-58","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"Inference with non-probability survey samples — nonprob","text":"Łukasz Chrostowski, Maciej Beręsewicz, Piotr Chlebicki","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Inference with non-probability survey samples — nonprob","text":"","code":"# \\donttest{ # generate data based on Doubly Robust Inference With Non-probability Survey Samples (2021) # Yilin Chen , Pengfei Li & Changbao Wu library(sampling) #> #> Attaching package: ‘sampling’ #> The following objects are masked from ‘package:survival’: #> #> cluster, strata set.seed(123) # sizes of population and probability sample N <- 20000 # population n_b <- 1000 # probability # data z1 <- rbinom(N, 1, 0.7) z2 <- runif(N, 0, 2) z3 <- rexp(N, 1) z4 <- rchisq(N, 4) # covariates x1 <- z1 x2 <- z2 + 0.3 * z2 x3 <- z3 + 0.2 * (z1 + z2) x4 <- z4 + 0.1 * (z1 + z2 + z3) epsilon <- rnorm(N) sigma_30 <- 10.4 sigma_50 <- 5.2 sigma_80 <- 2.4 # response variables y30 <- 2 + x1 + x2 + x3 + x4 + sigma_30 * epsilon y50 <- 2 + x1 + x2 + x3 + x4 + sigma_50 * epsilon y80 <- 2 + x1 + x2 + x3 + x4 + sigma_80 * epsilon # population sim_data <- data.frame(y30, y50, y80, x1, x2, x3, x4) ## propensity score model for non-probability sample (sum to 1000) eta <- -4.461 + 0.1 * x1 + 0.2 * x2 + 0.1 * x3 + 0.2 * x4 rho <- plogis(eta) # inclusion probabilities for probability sample z_prob <- x3 + 0.2051 sim_data$p_prob <- inclusionprobabilities(z_prob, n = n_b) # data sim_data$flag_nonprob <- UPpoisson(rho) ## sampling nonprob sim_data$flag_prob <- UPpoisson(sim_data$p_prob) ## sampling prob nonprob_df <- subset(sim_data, flag_nonprob == 1) ## non-probability sample svyprob <- svydesign( ids = ~1, probs = ~p_prob, data = 
subset(sim_data, flag_prob == 1), pps = \"brewer\" ) ## probability sample ## mass imputation estimator MI_res <- nonprob( outcome = y80 ~ x1 + x2 + x3 + x4, data = nonprob_df, svydesign = svyprob ) summary(MI_res) #> #> Call: #> nonprob(data = nonprob_df, outcome = y80 ~ x1 + x2 + x3 + x4, #> svydesign = svyprob) #> #> ------------------------- #> Estimated population mean: 9.518 with overall std.err of: 0.151 #> And std.err for nonprobability and probability samples being respectively: #> 0.08679 and 0.1236 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y80 9.222349 9.814346 #> #> #> Based on: Mass Imputation method #> For a population of estimate size: 21631.63 #> Obtained on a nonprobability sample of size: 1032 #> With an auxiliary probability sample of size: 1044 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on outcome variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) 1.93113 0.24859 7.768 7.95e-15 *** #> x1 1.06616 0.16954 6.289 3.20e-10 *** #> x2 1.04125 0.09731 10.700 < 2e-16 *** #> x3 0.98891 0.06927 14.277 < 2e-16 *** #> x4 0.98930 0.01904 51.946 < 2e-16 *** #> --- #> Signif. 
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> ------------------------- #> ## inverse probability weighted estimator IPW_res <- nonprob( selection = ~ x1 + x2 + x3 + x4, target = ~y80, data = nonprob_df, svydesign = svyprob ) summary(IPW_res) #> #> Call: #> nonprob(data = nonprob_df, selection = ~x1 + x2 + x3 + x4, target = ~y80, #> svydesign = svyprob) #> #> ------------------------- #> Estimated population mean: 9.718 with overall std.err of: 0.1962 #> And std.err for nonprobability and probability samples being respectively: #> 0.1331 and 0.1442 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y80 9.332946 10.10219 #> #> #> Based on: Inverse probability weighted method #> For a population of estimate size: 21127.42 #> Obtained on a nonprobability sample of size: 1032 #> With an auxiliary probability sample of size: 1044 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on selection variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) -4.582648 0.105508 -43.434 < 2e-16 *** #> x1 0.102633 0.074416 1.379 0.168 #> x2 0.234848 0.042871 5.478 4.30e-08 *** #> x3 0.181639 0.029253 6.209 5.33e-10 *** #> x4 0.184285 0.008568 21.508 < 2e-16 *** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> ------------------------- #> #> Weights: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 1.172 10.583 18.137 20.472 27.940 79.563 #> ------------------------- #> #> Residuals: #> Min. 1st Qu. Median Mean 3rd Qu. Max. 
#> -0.56121 -0.04204 -0.01457 0.43052 0.94475 0.98743 #> #> AIC: 7797.97 #> BIC: 7826.161 #> Log-Likelihood: -3893.985 on 2071 Degrees of freedom ## doubly robust estimator DR_res <- nonprob( outcome = y80 ~ x1 + x2 + x3 + x4, selection = ~ x1 + x2 + x3 + x4, data = nonprob_df, svydesign = svyprob ) summary(DR_res) #> #> Call: #> nonprob(data = nonprob_df, selection = ~x1 + x2 + x3 + x4, outcome = y80 ~ #> x1 + x2 + x3 + x4, svydesign = svyprob) #> #> ------------------------- #> Estimated population mean: 9.483 with overall std.err of: 0.1525 #> And std.err for nonprobability and probability samples being respectively: #> 0.08508 and 0.1265 #> #> 95% Confidence inverval for popualtion mean: #> lower_bound upper_bound #> y80 9.183858 9.781461 #> #> #> Based on: Doubly-Robust method #> For a population of estimate size: 21127.42 #> Obtained on a nonprobability sample of size: 1032 #> With an auxiliary probability sample of size: 1044 #> ------------------------- #> #> Regression coefficients: #> ----------------------- #> For glm regression on outcome variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) 1.93113 0.24859 7.768 7.95e-15 *** #> x1 1.06616 0.16954 6.289 3.20e-10 *** #> x2 1.04125 0.09731 10.700 < 2e-16 *** #> x3 0.98891 0.06927 14.277 < 2e-16 *** #> x4 0.98930 0.01904 51.946 < 2e-16 *** #> --- #> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 #> #> ----------------------- #> For glm regression on selection variable: #> Estimate Std. Error z value P(>|z|) #> (Intercept) -4.582648 0.105508 -43.434 < 2e-16 *** #> x1 0.102633 0.074416 1.379 0.168 #> x2 0.234848 0.042871 5.478 4.30e-08 *** #> x3 0.181639 0.029253 6.209 5.33e-10 *** #> x4 0.184285 0.008568 21.508 < 2e-16 *** #> ------------------------- #> #> Weights: #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 1.172 10.583 18.137 20.472 27.940 79.563 #> ------------------------- #> #> Residuals: #> Min. 1st Qu. Median Mean 3rd Qu. Max. 
#> -0.56121 -0.04204 -0.01457 0.43052 0.94475 0.98743 #> #> AIC: 7797.97 #> BIC: 7826.161 #> Log-Likelihood: -3893.985 on 2071 Degrees of freedom # }"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns population size (estimated or fixed) — pop_size","title":"Returns population size (estimated or fixed) — pop_size","text":"Returns population size assumed fixed – based pop_size argument, estimated – based probability survey specified svydesign based estimated propensity scores non-probability sample.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns population size (estimated or fixed) — pop_size","text":"","code":"pop_size(object)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns population size (estimated or fixed) — pop_size","text":"object object returned nonprob function.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns population size (estimated or fixed) — pop_size","text":"scalar returning value population size.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns population size (estimated or fixed) — pop_size","text":"","code":"data(admin) data(jvs) jvs_svy <- svydesign(ids = ~ 1, weights = ~ weight, strata = ~ size + nace + region, data = jvs) ipw_est1 <- nonprob(selection = ~ region + private + nace + size, target = ~ single_shift, svydesign = jvs_svy, data = admin, method_selection = \"logit\" ) ipw_est2 <- nonprob( selection = ~ region + private + nace + 
size, target = ~ single_shift, svydesign = jvs_svy, data = admin, method_selection = \"logit\", control_selection = control_sel(est_method = \"gee\", gee_h_fun = 1)) ## estimated population size based on the non-calibrated IPW (MLE) pop_size(ipw_est1) #> pop_size #> 52898.13 ## estimated population size based on the calibrated IPW (GEE) pop_size(ipw_est2) #> pop_size #> 51870"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/summary.nonprob.html","id":null,"dir":"Reference","previous_headings":"","what":"Summary statistics for model of the nonprob class. — summary.nonprob","title":"Summary statistics for model of the nonprob class. — summary.nonprob","text":"Summary statistics model nonprob class.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/summary.nonprob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Summary statistics for model of the nonprob class. — summary.nonprob","text":"","code":"# S3 method for class 'nonprob' summary(object, test = c(\"t\", \"z\"), correlation = FALSE, cov = NULL, ...)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/summary.nonprob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Summary statistics for model of the nonprob class. — summary.nonprob","text":"object object nonprob class test Type test significance parameters \"t\" t-test \"z\" normal approximation Student's t distribution; default \"z\" used more than 30 degrees freedom \"t\" used other cases. correlation Logical value indicating whether correlation matrix computed covariance matrix (default FALSE). cov Covariance matrix corresponding regression parameters ... Additional optional arguments","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/summary.nonprob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Summary statistics for model of the nonprob class. 
— summary.nonprob","text":"object summary_nonprob class containing: call – call created object. pop_total – list containing information estimated population mean, standard error confidence interval. sample_size – size samples used model. population_size – estimated size population non–probability sample drawn. test – Type statistical test performed. control – List control parameters used fitting model. model – descriptive name model used, e.g., \"Doubly-Robust\", \"Inverse probability weighted\", \"Mass Imputation\". aic – Akaike's information criterion. bic – Bayesian (Schwarz's) information criterion. residuals – Residuals model, providing information difference observed predicted values. likelihood – Logarithm likelihood function evaluated coefficients. df_residual – Residual degrees freedom. weights – Distribution estimated weights obtained model. coef – Regression coefficients estimated model. std_err – Standard errors regression coefficients. w_val – Wald statistic values significance testing coefficients. p_values – P-values corresponding Wald statistic values, assessing significance coefficients. crr – correlation matrix model coefficients, requested. confidence_interval_coef – Confidence intervals model coefficients. names – Names fitted models.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":null,"dir":"Reference","previous_headings":"","what":"Obtain Covariance Matrix estimation. — vcov.nonprob","title":"Obtain Covariance Matrix estimation. — vcov.nonprob","text":"vcov method `nonprob` class.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Obtain Covariance Matrix estimation. 
— vcov.nonprob","text":"","code":"# S3 method for class 'nonprob' vcov(object, ...)"},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Obtain Covariance Matrix estimation. — vcov.nonprob","text":"object object nonprob class. ... additional arguments method functions","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Obtain Covariance Matrix estimation. — vcov.nonprob","text":"covariance matrix fitted coefficients","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/reference/vcov.nonprob.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Obtain Covariance Matrix estimation. — vcov.nonprob","text":"Returns estimated covariance matrix model coefficients calculated analytic Hessian Fisher information matrix usually utilising asymptotic efficiency maximum likelihood estimates.","code":""},{"path":[]},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"breaking-changes-0-2-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"nonprobsvy 0.2.0","text":"functions pop.size, controlSel, controlOut and controlInf renamed pop_size, control_sel, control_out and control_inf respectively. function genSimData removed completely, not used anywhere package. argument maxLik_method renamed maxlik_method control_sel function. predictive_match renamed pmm_match_type align PMM (Predictive Mean Matching) estimator naming convention, related parameters start pmm_. argument method removed (not used). argument est_method_sel renamed est_method. argument h renamed gee_h_fun make readable user. start_type now accepts zero and mle (for gee models). 
nonprobsvy class renamed nonprob, related methods adjusted change. functions logit_model_nonprobsvy, probit_model_nonprobsvy and cloglog_model_nonprobsvy removed favour readable model_ps function specifies propensity score model.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"features-0-2-0","dir":"Changelog","previous_headings":"","what":"Features","title":"nonprobsvy 0.2.0","text":"two additional datasets included: jvs (Job Vacancy Survey; probability sample survey) and admin (Central Job Offers Database; non-probability sample survey). units auxiliary variables aligned way allows data integrated using methods implemented package. check_balance function added check balance totals variables based weighted weights non-probability probability samples. citation file added. model_ps – modelling propensity score. model_glm – modelling y using glm function. model_nn – NN method. model_pmm – PMM method. model_npar – non-parametric method.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"bugfixes-0-2-0","dir":"Changelog","previous_headings":"","what":"Bugfixes","title":"nonprobsvy 0.2.0","text":"basic methods functions related variance estimation, weights probability linking methods rewritten optimal readable way.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"other-0-2-0","dir":"Changelog","previous_headings":"","what":"Other","title":"nonprobsvy 0.2.0","text":"informative error messages added. documentation improved. switching completely snake_case. extensive cleaning code. 
unit-tests added.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"documentation-0-2-0","dir":"Changelog","previous_headings":"","what":"Documentation","title":"nonprobsvy 0.2.0","text":"annotation added: arguments strata, subset and na_action not supported at this time.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"nonprobsvy-011","dir":"Changelog","previous_headings":"","what":"nonprobsvy 0.1.1","title":"nonprobsvy 0.1.1","text":"CRAN release: 2024-11-14","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"bugfixes-0-1-1","dir":"Changelog","previous_headings":"","what":"Bugfixes","title":"nonprobsvy 0.1.1","text":"bug fix occurring estimation based auxiliary variable, led compression data frame vector. bug fix related passing maxit argument controlSel function internally used nleqslv function. bug fix related storing vector model_frame predicting y_hat mass imputation glm model X based one auxiliary variable - fix provided converting data.frame object.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"features-0-1-1","dir":"Changelog","previous_headings":"","what":"Features","title":"nonprobsvy 0.1.1","text":"added information summary quality estimation basing difference estimated known total values auxiliary variables. added estimation exact standard error k-nearest neighbor estimator. added breaking change controlOut function switching values predictive_match argument. now, predictive_match = 1 means ŷ−ŷ predictive mean matching imputation, predictive_match = 2 corresponds ŷ−y matching. implemented div option variable selection (documentation) doubly robust estimation. added insights nonprob output gradient, hessian jacobian derived IPW estimation mle gee methods IPW DR model executed. 
added estimated inclusion probabilities derivatives probability non-probability samples nonprob output IPW DR model executed. added model_frame matrix data probability sample used mass imputation nonprob MI DR model executed.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"unit-tests-0-1-1","dir":"Changelog","previous_headings":"","what":"Unit tests","title":"nonprobsvy 0.1.1","text":"added unit tests variable selection models mi estimation vector population totals available","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"nonprobsvy-010","dir":"Changelog","previous_headings":"","what":"nonprobsvy 0.1.0","title":"nonprobsvy 0.1.0","text":"CRAN release: 2024-04-04","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"features-0-1-0","dir":"Changelog","previous_headings":"","what":"Features","title":"nonprobsvy 0.1.0","text":"implemented population mean estimation using doubly robust, inverse probability weighting mass imputation methods implemented inverse probability weighting models Maximum Likelihood Estimation Generalized Estimating Equations methods logit, complementary log-log probit link functions. 
implemented generalized linear models, nearest neighbours predictive mean matching methods Mass Imputation. implemented bias correction estimators doubly-robust approach. implemented estimation methods vector population means/totals available. implemented variables selection SCAD, LASSO MCP penalization equations. implemented analytic bootstrap (parallel computation - doSNOW package) variance described estimators. added control parameters models. nobs samples size. pop.size population size estimation. residuals residuals inverse probability weighting model. cooks.distance identifying influential observations significant impact parameter estimates. hatvalues measuring leverage individual observations. logLik computing log-likelihood model. AIC (Akaike Information Criterion) evaluating model based trade-off goodness fit complexity, helping model selection. BIC (Bayesian Information Criterion) similar purpose AIC stronger penalty model complexity. confint calculating confidence intervals around parameter estimates. vcov obtaining variance-covariance matrix parameter estimates. deviance assessing goodness fit model.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"unit-tests-0-1-0","dir":"Changelog","previous_headings":"","what":"Unit tests","title":"nonprobsvy 0.1.0","text":"added unit tests IPW estimators.","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"github-repository-0-1-0","dir":"Changelog","previous_headings":"","what":"Github repository","title":"nonprobsvy 0.1.0","text":"added automated R-cmd check","code":""},{"path":"https://ncn-foreigners.github.io/nonprobsvy/news/index.html","id":"documentation-0-1-0","dir":"Changelog","previous_headings":"","what":"Documentation","title":"nonprobsvy 0.1.0","text":"added documentation nonprob function.","code":""}] diff --git a/sitemap.xml b/sitemap.xml index 431bdbe..4ee35ff 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -13,6 +13,10 @@ 
https://ncn-foreigners.github.io/nonprobsvy/reference/control_sel.html https://ncn-foreigners.github.io/nonprobsvy/reference/index.html https://ncn-foreigners.github.io/nonprobsvy/reference/jvs.html +https://ncn-foreigners.github.io/nonprobsvy/reference/model_glm.html +https://ncn-foreigners.github.io/nonprobsvy/reference/model_nn.html +https://ncn-foreigners.github.io/nonprobsvy/reference/model_npar.html +https://ncn-foreigners.github.io/nonprobsvy/reference/model_pmm.html https://ncn-foreigners.github.io/nonprobsvy/reference/model_ps.html https://ncn-foreigners.github.io/nonprobsvy/reference/nonprob.html https://ncn-foreigners.github.io/nonprobsvy/reference/pop_size.html