How can I attach to a running Slurm job?
To attach to a running Slurm job, run:
srun --pty --overlap --jobid YOUR-JOBID bash
This gives you a shell on the first node of your job, where you can run nvidia-smi and similar tools to check on your job.
This is an alternative to SSH-ing into your node.
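As a convenience, the attach command can be wrapped in a small shell function. The sketch below is only an illustration: the name slurm_attach and the DRY_RUN switch are hypothetical and not part of Slurm; only the srun flags come from the command above.

```shell
# Hypothetical helper (not part of Slurm): attach to a running job.
# With only a job ID it opens an interactive bash shell; with extra
# arguments it runs that single command inside the job instead.
# Set DRY_RUN=1 to print the srun command rather than execute it.
slurm_attach() {
    jobid="${1:-}"
    if [ -z "$jobid" ]; then
        echo "usage: slurm_attach JOBID [COMMAND...]" >&2
        return 2
    fi
    shift
    if [ "$#" -eq 0 ]; then
        # No command given: default to an interactive shell with a pseudo-terminal.
        set -- bash
        pty=--pty
    else
        pty=
    fi
    # $pty and ${DRY_RUN:+echo} are intentionally unquoted so they vanish when empty.
    ${DRY_RUN:+echo} srun $pty --overlap --jobid "$jobid" "$@"
}
```

For example, slurm_attach 12345 would open a shell inside job 12345, and slurm_attach 12345 nvidia-smi would run nvidia-smi once and return.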
Attaching with srun is the only way to see the correct GPUs if you have multiple GPU jobs running on a single node: SSH always places you in the most recently modified cgroup, which might not belong to the job or GPUs you are looking for.
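If you have several jobs running, one way to check which GPUs each job sees is to run nvidia-smi inside every job in turn. This is a rough sketch, assuming your cluster restricts GPU visibility per job and that overlapping job steps can see the job's GPUs; the function name gpus_per_job is hypothetical.

```shell
# Hypothetical helper: show the GPUs visible inside each of your running jobs.
gpus_per_job() {
    # squeue: -h drops the header, -u selects your user, -t RUNNING keeps
    # only running jobs, and -o "%i" prints just the job ID.
    for jobid in $(squeue -h -u "$(id -un)" -t RUNNING -o "%i"); do
        echo "== job $jobid =="
        # nvidia-smi -L lists the GPUs visible to this job step.
        srun --overlap --jobid "$jobid" nvidia-smi -L
    done
}
```

Calling gpus_per_job prints one header line per job, followed by whatever nvidia-smi -L reports inside that job.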