Usage

Installation

To use Cholla, first clone the repository:

git clone https://github.com/cholla-hydro/cholla.git

The existing Cholla wiki (https://github.com/cholla-hydro/cholla/wiki) can be used to help set up Cholla.

How to build Cholla on Lux

Log into Lux

Use ssh to log into lux.ucsc.edu:

ssh [username]@lux.ucsc.edu

Updates

After cloning Cholla, we will need to update a few things (as of 6/20/2024).

In cholla/builds/make.type.hydro, we need to make sure that the spatial reconstruction method is PPMP (not PLMC, which fails on Lux with this version of the code). This is the third flag down.
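A quick way to see which reconstruction flags the file currently mentions is to search it for the two method names. This is only a sketch: the helper name show_reconstruction is hypothetical, and it assumes the methods appear in the file as the literal strings PPMP and PLMC (for example as -DPPMP), which may not match your checkout exactly.

```shell
# Hypothetical helper: print every line (with its line number) of a
# make.type file that mentions the PLMC or PPMP reconstruction methods.
show_reconstruction() {
  grep -n -E 'PLMC|PPMP' "$1"
}

# Usage, from the cholla/ directory:
#   show_reconstruction builds/make.type.hydro
```

If the output shows PLMC active and PPMP commented out, edit the file by hand so that only PPMP is enabled.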

We also need to add some setup and host files. These can all be found in the repo for this website.

Add setup.lux.sh to the cholla/ directory:

#!/bin/bash

###module load hdf5/1.10.6 cuda10.2/10.2 openmpi/4.0.1
module load hdf5/1.10.6 cuda11.2 openmpi/4.1.5 devtoolset-9

export MACHINE=lux
export CHOLLA_ENVSET=1

Add make_cholla.sh to the cholla/ directory:

make TYPE=cosmology HOST=lux -j 20
#make TYPE=cosmology+rt HOST=lux -j 20
#make TYPE=rt HOST=lux -j 20

Add make.host.lux to the cholla/builds/ directory:

#-- make.inc for the Lux Server

#-- Compiler and flags for different build type
CXX               = mpicxx
CXXFLAGS_DEBUG    = -g -O0 -std=c++17
CXXFLAGS_OPTIMIZE = -Ofast -std=c++17
GPUFLAGS         = -std=c++17

OMP_NUM_THREADS = 10

#-- Library
CUDA_ROOT    = /cm/shared/apps/cuda11.2/toolkit/current
HDF5_ROOT    = /cm/shared/apps/hdf5/1.10.6
FFTW_ROOT    = /home/brvillas/code/fftw-3.3.8
PFFT_ROOT    = /data/groups/comp-astro/bruno/code_mpi_local/pfft
GRACKLE_ROOT = /home/brvillas/code/grackle

#Paris does not do GPU_MPI transfers
PARIS_MPI_GPU = -DPARIS_NO_GPU_MPI

In the cholla/ directory, run

source setup.lux.sh

sh make_cholla.sh

You should now have cholla.cosmology.lux in cholla/bin/.
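To confirm that the build produced the binary, a small check like the following can be run from the cholla/ directory. The helper name check_binary is hypothetical; the path comes from the text above.

```shell
# Hypothetical helper: report whether a file exists and is executable.
check_binary() {
  if [ -x "$1" ]; then
    echo "found $1"
  else
    echo "missing $1"
    return 1
  fi
}

# Usage, from the cholla/ directory:
#   check_binary bin/cholla.cosmology.lux
```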

An example run can be found in "Running a test calculation on Lux".

How to build Cholla on Frontier

Log into Frontier

Use ssh to log into Frontier:

$ ssh [username]@frontier.olcf.ornl.gov

which will place you in your user home directory /ccs/home/[username] on a randomly assigned login node.

Clone Cholla

It helps to create a github directory in your user home directory to keep track of all cloned repositories.

After cloning Cholla, you can list all remote branches with git branch -r:

[cholla]$ git branch -r

You can also switch to any of these branches with git switch. To switch to the dev branch, run

[cholla]$ git switch dev

Note: the non-standard-cosmologies branch has not yet been merged into dev, so it remains a separate branch.

Building Cholla

To build Cholla on Frontier, we first load the required modules and export some environment variables. This is done by sourcing the setup.frontier.cce.sh script in the cholla/builds directory:

[cholla]$ source builds/setup.frontier.cce.sh

Apart from loading modules, this bash script sets MPICH_GPU_SUPPORT_ENABLED to True and prepends the Cray link editor library path to the default link editor library path. Lastly, it sets the read-write user-level cache location of AMD’s rocFFT RunTimeCompiler to /dev/null.

Next we compile the program with make using the TYPE flag of cosmology and the HOST flag of frontier:

[cholla]$ make HOST=frontier TYPE=cosmology -j 20

where the -j flag lets make run up to 20 compilation jobs in parallel. This command produces a binary executable in the bin directory. The HOST flag tells the Makefile to include builds/make.host.frontier, and the TYPE flag tells it to include builds/make.type.cosmology. The builds/make.host.frontier file defines the compiler and its flags, along with the root paths to MPI, FFTW, and GOOGLETEST. The builds/make.type.cosmology file provides the additional macro flags that tell the source code to build the program for cosmology.
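The mapping from the HOST and TYPE flags to the included files can be sketched as below. The helper name build_files is hypothetical, and the binary name pattern bin/cholla.[TYPE].[HOST] is inferred from the two binary names that appear on this page (cholla.cosmology.lux and cholla.cosmology.frontier), not taken from the Makefile itself.

```shell
# Hypothetical helper: show the files a HOST/TYPE pair corresponds to.
build_files() {
  host=$1; type=$2
  echo "host file: builds/make.host.${host}"
  echo "type file: builds/make.type.${type}"
  # Binary name pattern inferred from the names used on this page
  echo "binary:    bin/cholla.${type}.${host}"
}

build_files frontier cosmology
```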

Cosmology Flags

TODO: PROVIDE DESCRIPTION FOR ALL MACRO FLAGS INCLUDED IN MAKE.TYPE.COSMOLOGY

Running Cholla

The project directory for the cosmological simulations study is /lustre/orion/ast206/. Simulation runs are computed and saved within the project directory in /lustre/orion/ast206/proj-shared/runs. Within this directory, subdirectories should be created with the naming scheme [dims3]_[boxsize]_[uvb-rate]_[descr1]_[descr2]. Here dims3 is the number of cells along one dimension, boxsize is the physical size of the box along one dimension, uvb-rate identifies the UV-background rate, and descrX is any extra descriptor of the specific simulation run.

For example, the subdirectory 2048_50Mpc_v22_dmo holds a simulation with 2048^3 total cells in a box 50 Mpc on a side, using the v22 UV-background rate and run with dark matter only.
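The naming scheme amounts to joining the fields with underscores. A minimal sketch, using a hypothetical helper name run_dir_name:

```shell
# Hypothetical helper: join the naming-scheme fields with underscores.
# Arguments: dims3 boxsize uvb-rate [descr1 [descr2 ...]]
run_dir_name() {
  local IFS=_          # "$*" joins arguments with the first character of IFS
  printf '%s\n' "$*"
}

run_dir_name 2048 50Mpc v22 dmo   # prints 2048_50Mpc_v22_dmo
```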

After creating a directory to hold information for a specific simulation run, we have to prepare some input files in this directory before running a batch script using Slurm.

ics: this is a symbolic link to the initial conditions for the simulation (several sets of initial conditions are currently stored in /lustre/orion/ast206/proj-shared-ics/)
param.txt: this is a parameter text file that holds the input information required for Cholla to run a simulation box
data: this is a directory to hold the output snapshots
scale_outputs.txt: this is a text file that holds the scale factor at which to save snapshots
uvb_rates_V22.txt: this is an hdf5 file that contains details for the UV-background rate
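Creating the data directory and the ics link could look like the sketch below. The helper name prepare_run_dir and both arguments are placeholders; param.txt, scale_outputs.txt, and the UV-background file still have to be copied or written into the run directory separately.

```shell
# Hypothetical helper: create a run directory containing an empty data/
# directory and an "ics" symbolic link to the initial conditions.
prepare_run_dir() {
  run_dir=$1   # e.g. /lustre/orion/ast206/proj-shared/runs/2048_50Mpc_v22_dmo
  ics_src=$2   # path to the initial conditions to link (placeholder)
  mkdir -p "$run_dir/data" || return 1
  # -n replaces an existing ics link instead of descending into it
  ln -sfn "$ics_src" "$run_dir/ics"
}

# Usage (paths are placeholders):
#   prepare_run_dir /path/to/runs/2048_50Mpc_v22_dmo /path/to/initial/conditions
```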

With these files in place, we can write the batch script, starting from this template Slurm file:

#!/bin/bash -l
#SBATCH -J CS_2048_50Mpc
#SBATCH -N 64
#SBATCH -t 2:00:00
#SBATCH -A AST206
#SBATCH -o CS_2048_50Mpc.o%j


#-- set CHOLLA location
CHOLLA_HOME=/ccs/home/[username]/github/cholla

EXECUTABLE=${CHOLLA_HOME}/bin/cholla.cosmology.frontier
source ${CHOLLA_HOME}/builds/setup.frontier.cce.sh

export MPICH_ALLTOALL_SYNC_FREQ=2
export MPICH_OFI_CXI_COUNTER_REPORT=2
export OMP_NUM_THREADS=7

env &> job.environ
scontrol show hostnames > job.nodes
ldd $EXECUTABLE > job.exec.ldd

srun -u -N 64 -n 512 -c 7 --gpu-bind=closest --gpus-per-task=1 \
$EXECUTABLE param.txt |& tee STDOUT

#srun -u -N 64 -n 512 -c 7 --gpu-bind=closest --gpus-per-task=1 \
#$EXECUTABLE param.txt init=Read_Grid indir=./data/167/ outdir=./data/ nfile=167 |& tee STDOUT

The Slurm directives specify:

-J: the job name
-N: number of compute nodes requested
-t: walltime requested
-A: OLCF project to charge
-o: standard output file for the job (%j is a placeholder for the job ID)

After setting the location of the Cholla executable and sourcing the Frontier setup file, the script exports several environment variables. It sets both MPICH_ALLTOALL_SYNC_FREQ and MPICH_OFI_CXI_COUNTER_REPORT to 2, and OMP_NUM_THREADS to 7.

Next, the script writes the current environment variables to a job.environ file. It also writes the host names of the allocated nodes, expanded from the SLURM_JOB_NODELIST environment variable and listed one per line, to a job.nodes file. Finally, it prints all shared object dependencies of the Cholla executable to a job.exec.ldd file.

The script then calls srun on the Cholla executable and the parameter text file with the following flags:

-u: executable is run with a pseudo terminal such that the output is not buffered
-N: number of nodes
-n: total number of MPI tasks
-c: CPU cores per MPI task
--gpu-bind=closest: binds each task to the closest GPU, in the same NUMA domain as the MPI rank’s CPU core
--gpus-per-task: number of GPUs to use on each task

The last part of the srun command uses |& to pipe both the standard output and the standard error of the executable to tee, which copies what it reads to standard output and to a file called STDOUT.
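The node, task, and core counts in the srun line are tied together by simple arithmetic, which is worth rechecking whenever the job size changes. The sketch below uses a hypothetical helper name srun_layout. With one GPU per task, the number of tasks per node also equals the number of GPUs needed per node; this assumes each Frontier node exposes 8 GPUs to Slurm.

```shell
# Hypothetical helper: derive the per-node layout from the srun flags.
# Arguments: nodes (-N), total MPI tasks (-n), CPU cores per task (-c)
srun_layout() {
  nodes=$1; tasks=$2; cores_per_task=$3
  tasks_per_node=$(( tasks / nodes ))
  cores_per_node=$(( tasks_per_node * cores_per_task ))
  echo "tasks_per_node=$tasks_per_node cores_per_node=$cores_per_node"
}

srun_layout 64 512 7   # prints tasks_per_node=8 cores_per_node=56
```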

The last two commented-out lines in the script show how to restart a simulation run from a snapshot (here, snapshot 167).
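To restart from a different snapshot, the same number appears in both the indir and nfile arguments. A small sketch that builds the argument string, following the pattern of the commented-out line in the template (the helper name restart_args is hypothetical):

```shell
# Hypothetical helper: build the restart arguments for a snapshot number,
# following the pattern of the commented-out srun line in the template.
restart_args() {
  snap=$1
  echo "init=Read_Grid indir=./data/${snap}/ outdir=./data/ nfile=${snap}"
}

restart_args 167   # prints init=Read_Grid indir=./data/167/ outdir=./data/ nfile=167
```

The resulting string is appended after param.txt on the srun line.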

[in development]