VERSION 4.5
For a given number of processors or threads (set with -np or -nt), this program systematically times mdrun with various numbers of PME-only nodes and determines which setting is fastest. It also tests whether performance can be enhanced by shifting load from the reciprocal to the real space part of the Ewald sum. Simply pass your .tpr file to g_tune_pme together with other options for mdrun as needed.
Which executables are used can be set in the environment variables MPIRUN and MDRUN. If these are not present, 'mpirun' and 'mdrun' will be used as defaults. Note that for certain MPI frameworks you need to provide a machine file or hostfile; this can also be passed via the MPIRUN variable, e.g. 'export MPIRUN="/usr/local/mpirun -machinefile hosts"'.
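For example, to point g_tune_pme at a specific mdrun binary and an MPI launcher with a hostfile (the paths and the hostfile name below are only illustrative, not defaults):
export MDRUN=/usr/local/gromacs/bin/mdrun_mpi
export MPIRUN="/usr/local/bin/mpirun -machinefile hosts"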
Please call g_tune_pme with the normal options you would pass to mdrun and add -np for the number of processors to perform the tests on, or -nt for the number of threads. You can also add -r to repeat each test several times to get better statistics.
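Example calls, using topol.tpr as a placeholder for your own run input file: the first benchmarks on 32 processors with each test repeated 3 times, the second uses 8 threads instead:
g_tune_pme -np 32 -r 3 -s topol.tpr
g_tune_pme -nt 8 -s topol.tpr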
g_tune_pme can test various real space / reciprocal space workloads for you. With -ntpr you control how many extra .tpr files will be written with enlarged cutoffs and correspondingly smaller Fourier grids. Typically, the first test (no. 0) will use the settings from the input .tpr file; the last test will have the cutoffs multiplied by (and at the same time the Fourier grid dimensions divided by) the scaling factor -upfac (default 1.2). The remaining .tpr files will have equally spaced values in between these extremes. Note that you can set -ntpr to 1 if you just want to find the optimal number of PME-only nodes; in that case your input .tpr file will remain unchanged.
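For example, to write and benchmark 4 .tpr files with scaling factors spanning the range from 1.0 up to the default upper limit of 1.2 (the input filename is again a placeholder):
g_tune_pme -np 64 -s topol.tpr -ntpr 4 -upfac 1.2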
For the benchmark runs, the default of 1000 time steps should suffice for most MD systems. The dynamic load balancing needs about 100 time steps to adapt to local load imbalances; therefore the time step counters are by default reset after 100 steps. For large systems (>1M atoms) you may have to set -resetstep to a higher value. From the 'DD' load imbalance entries in the md.log output file you can tell after how many steps the load is sufficiently balanced.
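For a large system you might, for instance, lengthen the benchmarks and give the load balancing more time to settle before the counters are reset (the step counts and filename here are purely illustrative):
g_tune_pme -np 128 -s bigsystem.tpr -steps 3000 -resetstep 500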
Example call: g_tune_pme -np 64 -s protein.tpr -launch
After calling mdrun several times, detailed performance information is available in the output file perf.out. Note that during the benchmarks a couple of temporary files are written (options -b*); these will be automatically deleted after each test.
If you want the simulation to be started automatically with the optimized parameters, use the command line option -launch.
option | filename | type | description |
---|---|---|---|
-p | perf.out | Output | Generic output file |
-err | errors.log | Output | Log file |
-so | tuned.tpr | Output | Run input file: tpr tpb tpa |
-s | topol.tpr | Input | Run input file: tpr tpb tpa |
-o | traj.trr | Output | Full precision trajectory: trr trj cpt |
-x | traj.xtc | Output, Opt. | Compressed trajectory (portable xdr format) |
-cpi | state.cpt | Input, Opt. | Checkpoint file |
-cpo | state.cpt | Output, Opt. | Checkpoint file |
-c | confout.gro | Output | Structure file: gro g96 pdb etc. |
-e | ener.edr | Output | Energy file |
-g | md.log | Output | Log file |
-dhdl | dhdl.xvg | Output, Opt. | xvgr/xmgr file |
-field | field.xvg | Output, Opt. | xvgr/xmgr file |
-table | table.xvg | Input, Opt. | xvgr/xmgr file |
-tablep | tablep.xvg | Input, Opt. | xvgr/xmgr file |
-tableb | table.xvg | Input, Opt. | xvgr/xmgr file |
-rerun | rerun.xtc | Input, Opt. | Trajectory: xtc trr trj gro g96 pdb cpt |
-tpi | tpi.xvg | Output, Opt. | xvgr/xmgr file |
-tpid | tpidist.xvg | Output, Opt. | xvgr/xmgr file |
-ei | sam.edi | Input, Opt. | ED sampling input |
-eo | sam.edo | Output, Opt. | ED sampling output |
-j | wham.gct | Input, Opt. | General coupling stuff |
-jo | bam.gct | Output, Opt. | General coupling stuff |
-ffout | gct.xvg | Output, Opt. | xvgr/xmgr file |
-devout | deviatie.xvg | Output, Opt. | xvgr/xmgr file |
-runav | runaver.xvg | Output, Opt. | xvgr/xmgr file |
-px | pullx.xvg | Output, Opt. | xvgr/xmgr file |
-pf | pullf.xvg | Output, Opt. | xvgr/xmgr file |
-mtx | nm.mtx | Output, Opt. | Hessian matrix |
-dn | dipole.ndx | Output, Opt. | Index file |
-bo | bench.trr | Output | Full precision trajectory: trr trj cpt |
-bx | bench.xtc | Output | Compressed trajectory (portable xdr format) |
-bcpo | bench.cpt | Output | Checkpoint file |
-bc | bench.gro | Output | Structure file: gro g96 pdb etc. |
-be | bench.edr | Output | Energy file |
-bg | bench.log | Output | Log file |
-beo | bench.edo | Output, Opt. | ED sampling output |
-bdhdl | benchdhdl.xvg | Output, Opt. | xvgr/xmgr file |
-bfield | benchfld.xvg | Output, Opt. | xvgr/xmgr file |
-btpi | benchtpi.xvg | Output, Opt. | xvgr/xmgr file |
-btpid | benchtpid.xvg | Output, Opt. | xvgr/xmgr file |
-bjo | bench.gct | Output, Opt. | General coupling stuff |
-bffout | benchgct.xvg | Output, Opt. | xvgr/xmgr file |
-bdevout | benchdev.xvg | Output, Opt. | xvgr/xmgr file |
-brunav | benchrnav.xvg | Output, Opt. | xvgr/xmgr file |
-bpx | benchpx.xvg | Output, Opt. | xvgr/xmgr file |
-bpf | benchpf.xvg | Output, Opt. | xvgr/xmgr file |
-bmtx | benchn.mtx | Output, Opt. | Hessian matrix |
-bdn | bench.ndx | Output, Opt. | Index file |
option | type | default | description |
---|---|---|---|
-[no]h | gmx_bool | no | Print help info and quit |
-[no]version | gmx_bool | no | Print version info and quit |
-nice | int | 0 | Set the nicelevel |
-xvg | enum | xmgrace | xvg plot formatting: xmgrace, xmgr or none |
-np | int | 1 | Number of nodes to run the tests on (must be > 2 for separate PME nodes) |
-npstring | enum | -np | Specify the number of processors to $MPIRUN using this string: -np, -n or none |
-nt | int | 1 | Number of threads to run the tests on (turns MPI & mpirun off) |
-r | int | 2 | Repeat each test this many times
-max | real | 0.5 | Max fraction of PME nodes to test with |
-min | real | 0.25 | Min fraction of PME nodes to test with |
-npme | enum | auto | Benchmark all possible values for -npme or just the subset that is expected to perform well: auto, all or subset |
-upfac | real | 1.2 | Upper limit for rcoulomb scaling factor (note that rcoulomb upscaling results in Fourier grid downscaling)
-downfac | real | 1 | Lower limit for rcoulomb scaling factor |
-ntpr | int | 0 | Number of .tpr files to benchmark. Create this many files with scaling factors ranging from -downfac to -upfac. If < 1, automatically choose the number of .tpr files to test
-four | real | 0 | Use this fourierspacing value instead of the grid found in the tpr input file. (Spacing applies to a scaling factor of 1.0 if multiple tpr files are written) |
-steps | step | 1000 | Take timings for this many steps in the benchmark runs
-resetstep | int | 100 | Let dlb equilibrate for this many steps before timings are taken (reset cycle counters after this many steps)
-simsteps | step | -1 | If non-negative, perform this many steps in the real run (overwrites nsteps from the .tpr file, adds cpt steps)
-[no]launch | gmx_bool | no | Launch the real simulation after optimization
-deffnm | string | | Set the default filename for all file options at launch time
-ddorder | enum | interleave | DD node order: interleave, pp_pme or cartesian |
-[no]ddcheck | gmx_bool | yes | Check for all bonded interactions with DD |
-rdd | real | 0 | The maximum distance for bonded interactions with DD (nm), 0 means determine from initial coordinates
-rcon | real | 0 | Maximum distance for P-LINCS (nm), 0 means estimate
-dlb | enum | auto | Dynamic load balancing (with DD): auto, no or yes |
-dds | real | 0.8 | Minimum allowed dlb scaling of the DD cell size |
-gcom | int | -1 | Global communication frequency |
-[no]v | gmx_bool | no | Be loud and noisy |
-[no]compact | gmx_bool | yes | Write a compact log file |
-[no]seppot | gmx_bool | no | Write separate V and dVdl terms for each interaction type and node to the log file(s) |
-pforce | real | -1 | Print all forces larger than this (kJ/mol nm) |
-[no]reprod | gmx_bool | no | Try to avoid optimizations that affect binary reproducibility |
-cpt | real | 15 | Checkpoint interval (minutes) |
-[no]cpnum | gmx_bool | no | Keep and number checkpoint files |
-[no]append | gmx_bool | yes | Append to previous output files when continuing from checkpoint instead of adding the simulation part number to all file names (for launch only) |
-maxh | real | -1 | Terminate after 0.99 times this time (hours) |
-multi | int | 0 | Do multiple simulations in parallel |
-replex | int | 0 | Attempt replica exchange every # steps |
-reseed | int | -1 | Seed for replica exchange, -1 means generate a seed
-[no]ionize | gmx_bool | no | Do a simulation including the effect of an X-Ray bombardment on your system |