
processLandsat: Check if output already exists when overwrite=FALSE #70

Open
GreatEmerald opened this issue Jun 30, 2017 · 3 comments

Comments

@GreatEmerald

At the moment, when overwrite=FALSE is used, processing runs all the way until the raster is about to be written, and only then is writing aborted because the destination file already exists. That is quite inefficient. The output filenames should be checked first, and processing should be done only for the output files that do not already exist.
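A minimal sketch of the proposed up-front check, assuming a hypothetical naming scheme in which each input archive `<scene>.tar.gz` maps to `<scene>_processed.tif` in the output directory. `expectedOutput()` and `pendingInputs()` are illustrative helpers, not part of the package; the real output names would have to come from whatever logic processLandsat uses.

```r
# Hypothetical helper: map input archives to their expected output paths.
# The "_processed.tif" suffix is an assumed naming scheme, not the package's.
expectedOutput <- function(inputs, outdir) {
    # compression = TRUE strips ".gz" and then ".tar", leaving the scene name
    base <- tools::file_path_sans_ext(basename(inputs), compression = TRUE)
    file.path(outdir, paste0(base, "_processed.tif"))
}

# Hypothetical helper: keep only the inputs whose output does not exist yet
pendingInputs <- function(inputs, outdir) {
    outputs <- expectedOutput(inputs, outdir)
    inputs[!file.exists(outputs)]
}

# Usage with a temporary directory where one output already exists
outdir <- tempfile("out")
dir.create(outdir)
inputs <- c("sceneA.tar.gz", "sceneB.tar.gz", "sceneC.tar.gz")
file.create(file.path(outdir, "sceneB_processed.tif"))
todo <- pendingInputs(inputs, outdir)
```

Only the inputs in `todo` would then be handed to processLandsat (for example through mclapply), so existing outputs are skipped before any processing starts instead of failing at the write step.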

@loicdtx
Owner

loicdtx commented Jun 30, 2017

Good point! It would make sense to check for file existence before starting any processing.
In fact, @bendv implemented similar logic in mc.calc (see here). The thing is that mc.calc is used to run functions that can literally run for days, so the extra checks before starting any processing really make sense there. For processLandsat it's more of a "nice to have", since it usually finishes in a few seconds.

@GreatEmerald
Author

Right, but on the devel branch, processLandsatBatch doesn't seem to use mc.calc; it just applies processLandsat with mclapply. I noticed this issue when running it in batch.

I'm also seeing another odd issue. When I run the batcher on a large number of tiles (330 in my case), some of which have already been processed (the rest weren't, because the disk got full), all 330 threads fail, stating that the files already exist, even though only ~100 files actually exist. The filenames listed are all from the years 1999-2001, even though I have images up to 2017, so it feels like the list somehow wraps around... If I use a pattern to filter the images by year, it works fine: if there are 12 images for that year and half of them are already processed, the other half does get processed correctly. That's very weird behaviour.

@loicdtx
Owner

loicdtx commented Jun 30, 2017

mclapply splits the overall job into n processes; if an error occurs in one of them, that entire process dies. If overwrite is set to FALSE and the file already exists, processLandsat throws an error. So if your 100 existing files were distributed across all your processes, it makes sense that all processes crashed and nothing got processed. Conclusion: processLandsatBatch needs an error catcher.

Maybe the example below better illustrates the problem.

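```r
library(parallel)

# Toy worker: fails on element 5, prints every other element
fun <- function(x) {
    if (x == 5) {
        stop('Nope !!!')
    }
    print(x)
}

# Without an error catcher: the error in element 5 aborts the rest
# of the work scheduled on that core
mclapply(seq(20), fun, mc.cores = 4)

# With error catcher: try() turns the error into a "try-error" value,
# so only element 5 fails and every other element is processed
funSafe <- function(...) {
    try(fun(...))
}

mclapply(seq(20), funSafe, mc.cores = 4)
```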
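A variant of the error catcher above, sketched with tryCatch() instead of try(): each failing element returns its condition object, which makes it easy to list the failed inputs afterwards (for a batch, these would be the tiles to inspect or rerun). Here `fun` returns `x` instead of printing it, and `mc.cores = 2` assumes a Unix-alike system, since mclapply does not fork on Windows.

```r
library(parallel)

# Same toy worker as above, but returning x instead of printing it
fun <- function(x) {
    if (x == 5) stop("Nope !!!")
    x
}

# Catch errors per element and return the condition object instead
funCaught <- function(x) {
    tryCatch(fun(x), error = function(e) e)
}

res <- mclapply(seq(20), funCaught, mc.cores = 2)

# Flag which elements failed; the successful ones are plain values
failed <- vapply(res, inherits, logical(1), what = "error")
which(failed)
```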

2 participants