Add error message when exclusion causes recycling #575

Open
jamesmartherus opened this issue Jul 2, 2021 · 1 comment

Comments

@jamesmartherus

I was trying to create a variable using paste() and the dataset exclusion was causing unwanted recycling. Currently there is no error or warning message alerting the user that this has occurred. Below is a conversation that includes a reproducible example.

James Martherus 3:56 PM
Is there a simple way to concatenate character variables in crunch? I want something like
ds$newid <- paste0(as.vector(ds$wave_cat, mode="id"), as.vector(ds$country), as.vector(ds$identity))

beb 4:30 PM
Did you try it? That looks like it would work to me. The only reason it might not is if you have some kind of weird exclusions or you're using crunchLogical expressions in some other way.

James Martherus 4:32 PM
Hmm, I'll have to check the exclusion. The above runs without error, but I get missing values in the new variable despite there being none in the source variables.

[crunch] > head(table(ds$newid))
newid
1DE118365752 1DE123990096 1DE127875406 1DE143843854  1DE16211082  1DE22253890 
         173          173          173          173          173          173 

4:43
this should be a unique identifier

4:44

[crunch] > 
tmp <- paste0(as.vector(ds$wave_cat, mode="id"), as.vector(ds$country), as.vector(ds$identity))
head(table(tmp))
tmp
1DE100043566 1DE100075770 1DE100147516 1DE100238568 1DE100334674 1DE100588748 
           1            1            1            1            1            1 

markwhite 4:45 PM
ok one sec I have an idea
4:47
try this
4:47

the_exclusion <- exclusion(ds)
exclusion(ds) <- NULL
ds <- refresh(ds)
ds$newid <- paste0(
  as.vector(ds$wave_new, mode = "id"), 
  as.vector(ds$country), 
  as.vector(ds$identity)
)
ds <- refresh(ds)
exclusion(ds) <- the_exclusion
ds <- refresh(ds)

James Martherus 4:50 PM
🥳 it worked!

beb 4:52 PM
It's because of the exclusion and recycling. Example:

vecA <- 1:5
vecB <- 1:2  # rows 3:5 are excluded

paste(1:5, 1:2) would repeat the 1:2
4:53
(in crunch)
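
The recycling described above can be reproduced in plain R without a Crunch dataset. The sketch below only illustrates the base R behavior: `paste()` silently recycles the shorter vector. The helper `paste0_strict()` is a hypothetical name, not part of crunch, and shows one possible shape for the length check this issue asks for.

```r
# A full-length vector and a shorter one, standing in for a variable
# read with an exclusion applied (rows 3:5 dropped).
vecA <- 1:5
vecB <- 1:2

# paste() recycles vecB without any warning or error.
recycled <- paste(vecA, vecB)
print(recycled)
#> [1] "1 1" "2 2" "3 1" "4 2" "5 1"

# Hypothetical guard (not a crunch function): refuse to paste
# vectors whose lengths differ instead of recycling them.
paste0_strict <- function(...) {
  args <- list(...)
  lens <- lengths(args)
  if (length(unique(lens)) > 1L) {
    stop(
      "Arguments have different lengths (",
      paste(lens, collapse = ", "),
      "); refusing to recycle.",
      call. = FALSE
    )
  }
  do.call(paste0, args)
}

# Capture the error rather than aborting the script.
result <- tryCatch(paste0_strict(vecA, vecB), error = function(e) e)
print(conditionMessage(result))
```

With equal-length inputs, `paste0_strict()` behaves like `paste0()`, so it could stand in for the original call while checking an exclusion.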

@gergness
Contributor

Hi, sorry this took so long to get to. I think it may have been fixed in the backend, as I can't reproduce it. Is there something I'm missing, or do you agree it's behaving as expected now?

library(crunch)
login()
#> Logged into crunch.io as [email protected]
set.seed(2021-11-10)

ds <- newDataset(
    data.frame(
        wave_cat = factor(sample(c("wave 1", "wave 2", "wave 3"), 100, replace = TRUE), c("wave 1", "wave 2", "wave 3")),
        country = factor(sample(c("USA", "CAN", "GBR"), 100, replace = TRUE), c("USA", "CAN", "GBR")),
        identity = 1:100,
        exclusion_basis = runif(100) 
    ),
    "exclusion - issue 575"
)

values(categories(ds$wave_cat)) <- NA
dates(categories(ds$wave_cat)) <- c("2021-01", "2021-02", "2021-03", NA)
exclusion(ds) <- ds$exclusion_basis < 0.2


# Make a variable as described, with an exclusion filter set
ds$newid <- paste0(as.vector(ds$wave_cat, mode="id"), as.vector(ds$country), as.vector(ds$identity))


# Each value is unique
nrow(ds) == length(unique(as.vector(ds$newid)))
#> [1] TRUE

# Identical to what we expect
tmp <- paste0(as.vector(ds$wave_cat, mode="id"), as.vector(ds$country), as.vector(ds$identity))
identical(as.vector(ds$newid), tmp)
#> [1] TRUE

# And unsetting the exclusion filter reveals that the new variable is missing
# in the rows that used to be excluded
exclusion(ds) <- NULL
crtabs(~(ds$exclusion_basis < 0.2) + is.na(ds$newid), ds)
#>                      is.na(newid)
#> exclusion_basis < 0.2 TRUE FALSE
#>                 TRUE    19     0
#>                 FALSE    0    81

with_consent(delete(ds))
#> NULL
