R
To find out the number of workers on your machine:
parallel::detectCores()
Parallel computing involves distributing a set of tasks to workers.
In a Monte Carlo simulation, one task is one replication of one condition:
10 conditions * 100 reps = 1000 tasks
Each worker performs one task at a time.
# The parallel version (for your computer)
library(foreach)
library(doParallel)
n_cores <- detectCores() - 2 # Reserve a few workers for other tasks
cl <- makeCluster(n_cores)
registerDoParallel(cl)
n_tasks <- 100
results <-
foreach (task = 1:n_tasks, .combine = rbind) %dopar% {
set.seed(task)
tmp <- runif(1, 1, 2) # pretend each task takes 1-2 seconds
Sys.sleep(tmp)
c(task, tmp)
}
stopCluster(cl) # shut down the cluster and free the workers when done
Organize your simulation like this:
conditions <- expand.grid(
IV1 = c(100, 300, 500),
IV2 = c(1, 2, 3)
)
task_list <- expand.grid(
idx_condition = 1:dim(conditions)[1],
idx_replication = 1:100
)
n_tasks <- dim(task_list)[1]
tasks <- sample(1:n_tasks) # shuffle the task order so slow and fast conditions get mixed
foreach (task = tasks, .combine = rbind) %dopar% {
idx_condition <- task_list$idx_condition[task]
idx_replication <- task_list$idx_replication[task]
set.seed(idx_replication)
# do your thing here
}
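For example, the filled-in loop might look like this (run_one_replication() is a hypothetical placeholder for your own analysis code, and results.csv is just an example file name):
results <-
  foreach (task = tasks, .combine = rbind) %dopar% {
    idx_condition   <- task_list$idx_condition[task]
    idx_replication <- task_list$idx_replication[task]
    set.seed(idx_replication)
    # look up the condition values for this task
    IV1 <- conditions$IV1[idx_condition]
    IV2 <- conditions$IV2[idx_condition]
    # run_one_replication() is a hypothetical stand-in for your own simulation code
    out <- run_one_replication(IV1, IV2)
    c(idx_condition, idx_replication, out)
  }
write.csv(results, "results.csv", row.names = FALSE)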
The key is to have one script run everything!
Connecting to TACC needs ssh.
Windows users need some setup for this: install Ubuntu via the Windows Subsystem for Linux (WSL).
Also install Windows Terminal from the Microsoft Store.
Use Windows Terminal to open Ubuntu.
Mac users can use Terminal from Launchpad.
Now we have the console ready.
Run this to get in:
ssh <<your tacc username>>@stampede2.tacc.utexas.edu
example:
ssh sangdonlim@stampede2.tacc.utexas.edu
Type in your TACC password and the 2FA code.
If your TACC account does not belong to a faculty project, you won’t be able to log in yet.
If you get error messages about locale after login, it is okay to ignore them.
But if you want to fix it:
exit
sudo update-locale LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
Run this:
module load Rstats/3.5.1
Note: you must use R 3.5.1!
R 4.0.3 is on TACC but it does not run in parallel yet (as of March 2022)
Run this:
R
and install the packages you need with install.packages()
Installed packages stay on your TACC account.
To exit from R, run q()
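For example, to install the packages used in this guide (a minimal example; your simulation may need others):
install.packages(c("foreach", "doParallel", "devtools"))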
Because R 3.5.1 is old, install.packages() may not work for some packages (the latest versions may require a newer R).
To install old versions manually:
library(devtools)
install_version("mirt", version = "1.31", repos = "http://cran.us.r-project.org")
or raise a support ticket on TACC to let them know we need R 4.0.3!
Open another console. Run this:
sftp <<your tacc username>>@stampede2.tacc.utexas.edu
example:
sftp sangdonlim@stampede2.tacc.utexas.edu
Use these Linux commands:
pwd ## show the current working directory on remote (Stampede2)
lpwd ## show the current working directory on local (your computer)
ls ## list all files in remote working directory
lls ## list all files in local working directory
cd mydir ## open a folder named mydir on remote
lcd mydir ## open a folder named mydir on local (your computer)
mkdir mydir ## create a folder named mydir on remote
lmkdir mydir ## create a folder named mydir on local
put * ## upload all files from local wd to remote wd
get * ## download all files from remote wd to local wd
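For example, to upload the R script and the SLURM script used later in this guide from your local working directory:
put main.R ## upload main.R from local wd to remote wd
put run.sh ## upload run.sh from local wd to remote wd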
~ is a shortcut to your home directory.
To change your remote (TACC) working directory to your remote home directory:
cd ~
If you are using Windows (WSL Ubuntu), your C drive is located at /mnt/c/.
To change your local working directory to your local C drive:
lcd /mnt/c/
Running a job on TACC needs a SLURM script.
Create a file named run.sh
with the following contents:
#!/bin/bash
#SBATCH -J sim # Job name
#SBATCH -o sim.o # Name of stdout output log file
#SBATCH -e sim.e # Name of stderr output log file
#SBATCH -N 4 # Total number of nodes to request
#SBATCH -n 32 # Total number of workers to request (distributed over nodes)
#SBATCH -p development # The type of queue to submit to
#SBATCH -t 0:10:00 # Time limit to request (hh:mm:ss)
#SBATCH -A YOUR_PROJECT_ID # Your project name
#SBATCH --mail-user=YOUR@EMAIL.EDU # TACC will send emails with status updates
#SBATCH --mail-type=all # Get all status updates
# load R module
module reset
module load Rstats/3.5.1
# call R code from RMPISNOW
ibrun RMPISNOW < main.R
This job will run main.R.
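The foreach example earlier was the version written for your own computer. Under RMPISNOW, the MPI launcher creates the workers, so main.R retrieves the cluster instead of calling makeCluster(). Here is a minimal sketch of what main.R could look like, assuming the snow and doSNOW packages are installed on your account (the exact setup may differ):
# main.R
library(foreach)
library(doSNOW)
# RMPISNOW has already started the workers requested in run.sh;
# retrieve that cluster instead of creating one with makeCluster()
cl <- snow::getMPIcluster()
registerDoSNOW(cl)
n_tasks <- 100
results <-
  foreach (task = 1:n_tasks, .combine = rbind) %dopar% {
    set.seed(task)
    tmp <- runif(1, 1, 2)
    Sys.sleep(tmp)
    c(task, tmp)
  }
write.csv(results, "results.csv", row.names = FALSE)
snow::stopCluster(cl)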
Now submit the job:
sbatch run.sh
The job is now on the waiting line.
In the SLURM script, we requested 4 nodes and 32 workers in total.
This does not mean we will get 8 workers on each node!
The job will run for at most 10 minutes, as specified in the SLURM script.
If your R code does not complete within the time limit, the job is terminated and any unsaved results are lost.
To see the status of the job:
showq -u
To cancel the job:
scancel <JOB_ID>
example:
scancel 123456
Examine log files:
cat sim.o
cat sim.e
Filenames are specified in the SLURM script.
TACC has multiple storage areas:
$HOME: 10 GB, auto backup, permanent
$WORK: 1 TB, no backup, permanent
$SCRATCH: unlimited (~30 PB), no backup, deleted after 10 days
Linux commands to move to each folder:
cd $HOME
cd $WORK
cd $SCRATCH
R functions to retrieve file paths:
Sys.getenv("HOME")
Sys.getenv("WORK")
Sys.getenv("SCRATCH")
When you are done with testing on the development queue, submit your jobs to the normal queue.
https://portal.tacc.utexas.edu/user-guides/stampede2#table5
Each queue has limits on what each job can request.
development: 16 nodes, 2 hours
normal: 256 nodes, 48 hours
large: 2048 nodes, 48 hours
long: 32 nodes, 120 hours
Your project has SU credits assigned to it.
Using one node for one hour costs 0.8 SU.
Other types of processors have their own queue type. See the link above.
If you request 48 hours but the job completes in 1 hour, you are only charged for the 1 hour actually used.
Reasons not to request 99999 nodes * 999 hours anyway:
You will have to wait longer in the queue.
TACC will “hold” the full requested amount of SUs on your project until it knows how much it actually needs to charge.
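For example, requesting 4 nodes for 48 hours places a hold of 4 * 48 * 0.8 = 153.6 SUs, even if the job ends up using only a small fraction of that.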
tidyverse
tidyverse is a big package, and it may make your code take a lot more time to run on TACC.
These slides were made with revealjs in R Markdown.