
Commit

Revised mclapply challenge.
pcarbo committed Nov 15, 2022
1 parent 65cdf7c commit 5c0ced2
Showing 3 changed files with 41 additions and 69 deletions.
54 changes: 20 additions & 34 deletions docs/slides_with_notes.Rmd
@@ -696,68 +696,54 @@ instead of all 200,000 of them.)
> Instructor notes: This is another good opportunity to demonstrate
> use of `htop` to monitor CPU usage.
-41. Set up R for multithreading
-===============================
+41. Split computation
+=====================

-Set up R to use all 8 CPUs you requested, and distribute the
-computation (columns of the matrix) across the 8 "threads":
+First, split the columns of the data frame into smaller subsets:

```{r init-cluster}
library("parallel")
-cl <- makeCluster(8)
-cols <- clusterSplit(cl,1:10000)
-```
-
-Next, tell R which functions we will need to use:
-
-```{r cluster-register-functions}
-clusterExport(cl,c("get.assoc.pvalue",
-                   "get.assoc.pvalues"))
+cols <- splitIndices(10000,8)
```
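
For a sense of what `splitIndices` returns, here is a minimal sketch (assuming, as above, 10,000 columns split across 8 chunks): it produces a list of contiguous index vectors, one per chunk, which is all `mclapply` needs to divide up the work.

```r
library(parallel)

# Split 10,000 column indices into 8 roughly equal, contiguous chunks.
cols <- splitIndices(10000, 8)
length(cols)     # 8 chunks
lengths(cols)    # 1250 indices in each chunk
head(cols[[1]])  # 1 2 3 4 5 6
```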

-42. Compute the *p*-values inside "parLapply"
-==============================================
+42. Compute the *p*-values inside "mclapply"
+============================================

Now we are ready to run the multithreaded computation of association
*p*-values using "parLapply":
*p*-values using "mclapply". Let's try first with 2 CPUs:

-```{r run-parlapply}
-f <- function (i, geno, pheno)
+```{r run-mclapply}
+f <- function (i)
  get.assoc.pvalues(geno[,i],pheno)
t0 <- proc.time()
-out <- parLapply(cl,cols,f,geno,pheno)
+out <- mclapply(cols,f,mc.cores = 2)
t1 <- proc.time()
print(t1 - t0)
```
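
`mclapply` (also from the `parallel` package) forks the running R session, so the workers already see `geno`, `pheno`, and the helper functions, and no `clusterExport` or `stopCluster` step is needed; the trade-off is that forking is only available on Linux and macOS (on Windows `mclapply` only supports `mc.cores = 1`). A self-contained sketch of how you might compare core counts, with a made-up `slow_task` standing in for the association computation (timings are illustrative only):

```r
library(parallel)

# Toy stand-in for the per-chunk p-value computation.
slow_task <- function(idx) {
  Sys.sleep(0.25)    # pretend each chunk takes a while
  length(idx)
}

cols <- splitIndices(10000, 8)

# Run the same job with an increasing number of cores.
for (n in c(1, 2, 4, 8)) {
  t <- system.time(out <- mclapply(cols, slow_task, mc.cores = n))
  cat(sprintf("mc.cores = %d: elapsed %.2f s\n", n, t["elapsed"]))
}
```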

+43. Combine mclapply outputs
+============================
+
Not done yet---you need to combine the individual outputs into a
single vector of *p*-values.

-```{r process-parlapply-output}
-pvalues <- rep(0,10000)
-pvalues[unlist(cols)] <- unlist(out)
+```{r process-mclapply-output}
+pvalues2 <- rep(0,10000)
+pvalues2[unlist(cols)] <- unlist(out)
```

Check that the result is the same as before:

-```{r check-parlapply-output}
-min(pvalues)
+```{r check-mclapply-output}
+range(pvalues - pvalues2)
```
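
The indexing trick above works because `unlist(cols)` lists the column indices in the same order as the chunks were processed, so assigning into those positions puts each *p*-value back in its original column. A small self-contained illustration, with `sqrt` standing in for the association computation:

```r
library(parallel)

x    <- sqrt(1:10)            # result computed in one go
cols <- splitIndices(10, 3)   # split the same work into 3 chunks
out  <- mclapply(cols, sqrt, mc.cores = 2)

# Reassemble the chunked results in the original order.
x2 <- rep(0, 10)
x2[unlist(cols)] <- unlist(out)
range(x - x2)                 # 0 0, so the two results agree
```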

-*Did parLapply speed up the p-value computation?*
+*Did mclapply speed up the p-value computation? Do you get further
+speedups with 4 or 8 (or even 160) CPUs?*

> Instructor notes: This is another good opportunity to demonstrate
> use of `htop` to monitor CPU usage.
-43. Halt the multithreaded computation
-======================================
-
-When you are done using parLapply, run "stopCluster":
-
-```{r stop-cluster}
-stopCluster(cl)
-```
-
44. Outline of workshop
=======================

2 changes: 1 addition & 1 deletion map_temp_assoc.R
@@ -54,5 +54,5 @@ print(t1 - t0)

# SUMMARIZE ASSOCIATION RESULTS
# -----------------------------
cat(sprintf("The smallest association p-value is %0.1e.\n",min(pvalues)))
cat(sprintf("The smallest association p-value is %0.3e.\n",min(pvalues)))

54 changes: 20 additions & 34 deletions slides.Rmd
@@ -643,65 +643,51 @@ It applies `get.assoc.pvalue` to each column of the `geno` data frame.
instead of all 200,000 of them.)


-41. Set up R for multithreading
-===============================
+41. Split computation
+=====================

-Set up R to use all 8 CPUs you requested, and distribute the
-computation (columns of the matrix) across the 8 "threads":
+First, split the columns of the data frame into smaller subsets:

```{r init-cluster}
library("parallel")
-cl <- makeCluster(8)
-cols <- clusterSplit(cl,1:10000)
-```
-
-Next, tell R which functions we will need to use:
-
-```{r cluster-register-functions}
-clusterExport(cl,c("get.assoc.pvalue",
-                   "get.assoc.pvalues"))
+cols <- splitIndices(10000,8)
```

-42. Compute the *p*-values inside "parLapply"
-==============================================
+42. Compute the *p*-values inside "mclapply"
+============================================

Now we are ready to run the multithreaded computation of association
*p*-values using "parLapply":
*p*-values using "mclapply". Let's try first with 2 CPUs:

-```{r run-parlapply}
-f <- function (i, geno, pheno)
+```{r run-mclapply}
+f <- function (i)
  get.assoc.pvalues(geno[,i],pheno)
t0 <- proc.time()
-out <- parLapply(cl,cols,f,geno,pheno)
+out <- mclapply(cols,f,mc.cores = 2)
t1 <- proc.time()
print(t1 - t0)
```

+43. Combine mclapply outputs
+============================
+
Not done yet---you need to combine the individual outputs into a
single vector of *p*-values.

-```{r process-parlapply-output}
-pvalues <- rep(0,10000)
-pvalues[unlist(cols)] <- unlist(out)
+```{r process-mclapply-output}
+pvalues2 <- rep(0,10000)
+pvalues2[unlist(cols)] <- unlist(out)
```

Check that the result is the same as before:

-```{r check-parlapply-output}
-min(pvalues)
+```{r check-mclapply-output}
+range(pvalues - pvalues2)
```

-*Did parLapply speed up the p-value computation?*
-
-
-43. Halt the multithreaded computation
-======================================
-
-When you are done using parLapply, run "stopCluster":
+*Did mclapply speed up the p-value computation? Do you get further
+speedups with 4 or 8 (or even 160) CPUs?*

-```{r stop-cluster}
-stopCluster(cl)
-```

44. Outline of workshop
=======================

