diff --git a/docs/slides_with_notes.Rmd b/docs/slides_with_notes.Rmd
index 651afb6..5806d3b 100644
--- a/docs/slides_with_notes.Rmd
+++ b/docs/slides_with_notes.Rmd
@@ -696,68 +696,54 @@ instead of all 200,000 of them.)
 > Instructor notes: This is another good opportunity to demonstrate
 > use of `htop` to monitor CPU usage.
 
-41. Set up R for multithreading
-===============================
+41. Split computation
+=====================
 
-Set up R to use all 8 CPUs you requested, and distribute the
-computation (columns of the matrix) across the 8 "threads":
+First, split the columns of the data frame into smaller subsets:
 
 ```{r init-cluster}
 library("parallel")
-cl <- makeCluster(8)
-cols <- clusterSplit(cl,1:10000)
-```
-
-Next, tell R which functions we will need to use:
-
-```{r cluster-register-functions}
-clusterExport(cl,c("get.assoc.pvalue",
-                   "get.assoc.pvalues"))
+cols <- splitIndices(10000,8)
 ```
 
-42. Compute the *p*-values inside "parLapply"
-==============================================
+42. Compute the *p*-values inside "mclapply"
+============================================
 
 Now we are ready to run the multithreaded computation of association
-*p*-values using "parLapply":
+*p*-values using "mclapply". Let's try first with 2 CPUs:
 
-```{r run-parlapply}
-f <- function (i, geno, pheno)
+```{r run-mclapply}
+f <- function (i)
   get.assoc.pvalues(geno[,i],pheno)
 t0 <- proc.time()
-out <- parLapply(cl,cols,f,geno,pheno)
+out <- mclapply(cols,f,mc.cores = 2)
 t1 <- proc.time()
 print(t1 - t0)
 ```
 
+43. Combine mclapply outputs
+============================
+
 Not done yet---you need to combine the individual outputs into a
 single vector of *p*-values.
 
-```{r process-parlapply-output}
-pvalues <- rep(0,10000)
-pvalues[unlist(cols)] <- unlist(out)
+```{r process-mclapply-output}
+pvalues2 <- rep(0,10000)
+pvalues2[unlist(cols)] <- unlist(out)
 ```
 
 Check that the result is the same as before:
 
-```{r check-parlapply-output}
-min(pvalues)
+```{r check-mclapply-output}
+range(pvalues - pvalues2)
 ```
 
-*Did parLapply speed up the p-value computation?*
+*Did mclapply speed up the p-value computation? Do you get further
+speedups with 4 or 8 (or even 160) CPUs?*
 
 > Instructor notes: This is another good opportunity to demonstrate
 > use of `htop` to monitor CPU usage.
 
-43. Halt the multithreaded computation
-======================================
-
-When you are done using parLapply, run "stopCluster":
-
-```{r stop-cluster}
-stopCluster(cl)
-```
-
 
 44. Outline of workshop
 =======================
diff --git a/map_temp_assoc.R b/map_temp_assoc.R
index 9b39bed..911f419 100644
--- a/map_temp_assoc.R
+++ b/map_temp_assoc.R
@@ -54,5 +54,5 @@ print(t1 - t0)
 
 # SUMMARIZE ASSOCIATION RESULTS
 # -----------------------------
-cat(sprintf("The smallest association p-value is %0.1e.\n",min(pvalues)))
+cat(sprintf("The smallest association p-value is %0.3e.\n",min(pvalues)))
 
diff --git a/slides.Rmd b/slides.Rmd
index 91f5215..76d44ab 100644
--- a/slides.Rmd
+++ b/slides.Rmd
@@ -643,65 +643,51 @@ It applies `get.assoc.pvalue` to each column of the `geno` data frame.
 instead of all 200,000 of them.)
 
 
-41. Set up R for multithreading
-===============================
+41. Split computation
+=====================
 
-Set up R to use all 8 CPUs you requested, and distribute the
-computation (columns of the matrix) across the 8 "threads":
+First, split the columns of the data frame into smaller subsets:
 
 ```{r init-cluster}
 library("parallel")
-cl <- makeCluster(8)
-cols <- clusterSplit(cl,1:10000)
-```
-
-Next, tell R which functions we will need to use:
-
-```{r cluster-register-functions}
-clusterExport(cl,c("get.assoc.pvalue",
-                   "get.assoc.pvalues"))
+cols <- splitIndices(10000,8)
 ```
 
-42. Compute the *p*-values inside "parLapply"
-==============================================
+42. Compute the *p*-values inside "mclapply"
+============================================
 
 Now we are ready to run the multithreaded computation of association
-*p*-values using "parLapply":
+*p*-values using "mclapply". Let's try first with 2 CPUs:
 
-```{r run-parlapply}
-f <- function (i, geno, pheno)
+```{r run-mclapply}
+f <- function (i)
   get.assoc.pvalues(geno[,i],pheno)
 t0 <- proc.time()
-out <- parLapply(cl,cols,f,geno,pheno)
+out <- mclapply(cols,f,mc.cores = 2)
 t1 <- proc.time()
 print(t1 - t0)
 ```
 
+43. Combine mclapply outputs
+============================
+
 Not done yet---you need to combine the individual outputs into a
 single vector of *p*-values.
 
-```{r process-parlapply-output}
-pvalues <- rep(0,10000)
-pvalues[unlist(cols)] <- unlist(out)
+```{r process-mclapply-output}
+pvalues2 <- rep(0,10000)
+pvalues2[unlist(cols)] <- unlist(out)
 ```
 
 Check that the result is the same as before:
 
-```{r check-parlapply-output}
-min(pvalues)
+```{r check-mclapply-output}
+range(pvalues - pvalues2)
 ```
 
-*Did parLapply speed up the p-value computation?*
-
-
-43. Halt the multithreaded computation
-======================================
-
-When you are done using parLapply, run "stopCluster":
+*Did mclapply speed up the p-value computation? Do you get further
+speedups with 4 or 8 (or even 160) CPUs?*
 
-```{r stop-cluster}
-stopCluster(cl)
-```
 
 44. Outline of workshop
 =======================
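
The patch above assumes the reader already has the `geno` data frame, the `pheno` vector, and `get.assoc.pvalues()` from earlier slides. As a rough, self-contained illustration of the same `splitIndices` + `mclapply` pattern, here is a minimal sketch that substitutes small simulated stand-ins for those objects (the stand-in association test is hypothetical and only there to make the snippet runnable); only the split, parallel-map, and reassembly steps mirror the slides.

```r
# Minimal sketch of the splitIndices + mclapply workflow, using simulated
# stand-ins for the workshop's geno, pheno and get.assoc.pvalues().
library(parallel)

set.seed(1)
n <- 100     # number of samples
p <- 1000    # number of markers (columns)
geno  <- matrix(rbinom(n * p, 2, 0.3), n, p)
pheno <- rnorm(n)

# Stand-in association test: one linear-regression p-value per column.
get.assoc.pvalues <- function (X, y)
  apply(X, 2, function (x) summary(lm(y ~ x))$coefficients[2, 4])

# Serial computation, for comparison.
pvalues <- get.assoc.pvalues(geno, pheno)

# Split the columns into 8 subsets, then process the subsets in parallel.
# (mc.cores > 1 relies on forking, which is not available on Windows.)
cols <- splitIndices(p, 8)
out  <- mclapply(cols,
                 function (i) get.assoc.pvalues(geno[, i, drop = FALSE], pheno),
                 mc.cores = 2)

# Reassemble the per-subset results into a single vector of p-values.
pvalues2 <- rep(0, p)
pvalues2[unlist(cols)] <- unlist(out)

# The parallel result should match the serial one.
range(pvalues - pvalues2)
```

The design point the patch leans on is that `mclapply` forks the running R session, so `geno`, `pheno`, and any helper functions are already visible to the workers; that is why the `clusterExport` step required by the old `parLapply` version can be dropped. The trade-off is that forking (and hence `mc.cores > 1`) is not available on Windows.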