RECOMMENDED FORTRAN COMPILER OPTIONS


Compilers: Sun f77 version 5.0 and f90 version 2.0

Recommended flags for production:
-fast
is shorthand for -native (select current hardware target), -O4 (the default optimization level unless -g is specified), -libmil (inline certain math library routines), -fsimple=1 (optimize floating-point operations), -dalign (align data for faster memory operations), -xlibmopt (use the optimized math libraries), -depend (to help optimize DO loops), -fns (for possibly faster underflow handling), and -ftrap=%none (to disable floating-point traps).
-xarch=v8plusa
indicates details of the sunbert hardware and is necessary if the code uses MPI. (Otherwise you will get the following error: "tmrun: texecve /var/tmp/TEST/xtanh: Exec format error.") Note that this option should follow any -fast or -native option.
-fnonstd
allows non-IEEE handling of floating point exceptions, which can be done in hardware rather than in software. In particular, divide by zero is an error with -fnonstd.
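As a rough illustration of the ordering rule above, the production flags can be collected in a shell variable. The names prog.f and prog are hypothetical, and the script only prints the command line so that it can be reviewed before it is run on a machine that has the Sun compilers.

```shell
#!/bin/sh
# Sketch only: prog.f and prog are hypothetical file names.
# -xarch=v8plusa is placed after -fast, as the description of -xarch requires.
FFLAGS="-fast -xarch=v8plusa -fnonstd"
SUN_CMD="f77 $FFLAGS -o prog prog.f"
# Print the command rather than running it.
echo "$SUN_CMD"
```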
Other options to consider:
-O4
adds inlining of routines that are within the same source file. If many small routines are compiled together, this optimization level is worth trying. This is the default with -fast.
-O5
adds additional optimizations that may or may not help any given code. Be sure to test execution speed for improvements if you use this option.
-xtypemap=real:64,double:64,integer:mixed
increases the default real size to 8 bytes while leaving integers at 4 bytes. Each integer is still allocated 8 bytes of storage, so integers are not packed in memory. This option is only available in f77, not in f90.
-stackvar
is needed for recursion or threaded codes. Local variables are allocated on the stack with this option--which is not the default.
-xildoff
prevents the compiler from using the incremental linker. The incremental linker is the default if you use -g and not -G and link in a separate step from compilation. The incremental linker can speed up the link, but at the cost of a considerable increase in binary file size.
Recommended flags for debugging:
-g
causes debugger symbol tables to be generated and lowers the default optimization level (if -On or -fast is not specified) so that you can step a line at a time within the debugger. Loop inlining and parallelization are always eliminated if -g is specified.
-g -O3
combination (in the order shown) allows some symbolic debugging (although only very limited breakpointing) while still allowing the code to optimize fairly well. High level optimizations (such as routine inlining) are ruled out.
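The two debugging builds described above might be written as follows. This is a sketch with a hypothetical source file prog.f; the commands are printed, not run.

```shell
#!/bin/sh
# Sketch only: prog.f and prog are hypothetical file names.
# Full debugging: default optimization is lowered so you can step line by line.
DEBUG_FULL="f77 -g -o prog prog.f"
# Partial debugging with optimization: -g comes first, then -O3.
DEBUG_OPT="f77 -g -O3 -o prog prog.f"
echo "$DEBUG_FULL"
echo "$DEBUG_OPT"
```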

Compilers: DEC f77 and f90 version 5.1

Recommended flags for production:
-fast
is shorthand for -assume noaccuracy_sensitive (the type of expression rearrangement that most compilers do by default), -align dcommons (align data on double word boundaries, a potentially big win), -math_library fast (use faster math libraries which may give slightly less accurate answers and less exception checking), -assume nozsize (assume no zero-sized array sections in Fortran 90 array syntax), -assume bigarrays (may help f90 with -wsf) and -O4 (the default optimization level anyway unless -g is specified). Note that the lesser accuracy implied by "-math_library fast" has been a problem for some codes and in those cases the default "-math_library accurate" is a better choice.
-fpe
allows non-IEEE handling of floating point exceptions, the same as -fnonstd in the Sun compilers. This is the default, but for clarity it is a good idea to specify it.
-tune host
option indicates that the code is to be tuned for the hardware on which the compiler is running. Note that as far as the compilers are concerned, the DEC 8400 and 4100 are the same since they use the same type of processor. To cross compile you can specify a specific architecture, for example -tune ev6.
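A possible production command line for the DEC compilers, using only the flags listed above. The file names are hypothetical. The second variant assumes that a -math_library option given after -fast overrides the setting that -fast implies; check that assumption against the compiler documentation before relying on it.

```shell
#!/bin/sh
# Sketch only: prog.f90 and prog are hypothetical file names.
FFLAGS="-fast -fpe -tune host"
# Assumption: an explicit -math_library after -fast takes precedence.
FFLAGS_ACCURATE="$FFLAGS -math_library accurate"
echo "f90 $FFLAGS -o prog prog.f90"
echo "f90 $FFLAGS_ACCURATE -o prog prog.f90"
```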
Other options to consider:
-O5
adds additional optimizations that may or may not help any given code. If you use this option, be sure to test the code for improved execution speed and for correctness. Some of the optimizations performed at this level are very aggressive and can cause the code to execute incorrectly.
-om -non_shared
causes statically linked code to be optimized further after linking.
-r8
is recommended to increase the default real size to 8 bytes. This option does not change the default size for any other types (such as DOUBLE PRECISION).
-recursive
is needed for recursion. Local variables are allocated on the stack with this option--which is not the default. -automatic may also be used to allocate variables on the stack.
-omp
is used to parallelize the code as specified by OpenMP directives. In most cases -recursive or -automatic should also be specified when this option is used.
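For an OpenMP code, the advice above suggests pairing -omp with stack allocation of local variables. A minimal sketch, with hypothetical file names and the command printed, not run:

```shell
#!/bin/sh
# Sketch only: prog.f90 and prog are hypothetical file names.
# -recursive puts local variables on the stack, which threaded code generally needs.
OMP_CMD="f90 -fast -omp -recursive -o prog prog.f90"
echo "$OMP_CMD"
```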
Recommended flags for debugging:
-g
causes debugger symbol tables to be generated and lowers the optimization level so that you can step a line at a time within the debugger.
-g3
allows some symbolic debugging (although only very limited breakpointing) while still allowing the code to optimize fairly well.
Notes:
There is an advantage to compiling several (or all) source files at the same time, since that allows the compiler to perform interprocedural optimizations. All Cray style pointers are 64-bit items with this compiler. The value used for .true. is -1 rather than the 1 that is used by many compilers.
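To give the compiler the chance to optimize across routines, all sources can be named on one command line instead of being compiled one at a time. The three file names below are hypothetical and the command is only printed.

```shell
#!/bin/sh
# Sketch only: the source file names are hypothetical.
SOURCES="main.f90 solver.f90 io.f90"
# One compiler invocation sees every source file at once.
ALL_AT_ONCE="f90 -fast -o prog $SOURCES"
echo "$ALL_AT_ONCE"
```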

Compilers: IBM xlf, xlf90, xlhpf, mpxlf, and related compilers, version 5.1

Flags requiring an early decision:
-qautodbl=dbl4 or -qrealsize=8
tell the compiler to make default REALs 8 bytes (64 bits). Which option you use depends on how things are declared in your code. A table showing the effect of the various combinations is available.
If you are moving a code from the Cray machines and want REAL to continue to be 8 bytes and DOUBLE PRECISION to be 16 bytes, then -qrealsize=8 is the best choice. This option leaves variables declared with REAL*4, REAL*8, REAL*16, COMPLEX*8, etc. at the size that you would expect. It has the same effect as either the "-r8 -i4" combination or the "-dbl -i4" combination with the Sun compilers. Note that 16-byte real arithmetic is very slow on the current machines.
If you are moving a code from a machine (such as the Meiko) with default 4-byte REALs, then -qautodbl=dbl4 may be a better choice. This option makes both default REAL and DOUBLE PRECISION 8-byte quantities. Beware, however, that it also makes anything declared REAL*4 occupy 8 bytes and COMPLEX*8 occupy 16 bytes, which may or may not be what is desired. If a code contains REAL*4 declarations that are to stay 4 bytes or COMPLEX*8 declarations that are to stay 8 bytes, along with DOUBLE PRECISION declarations that are to stay 8 bytes, then some code modification may be necessary in order to get default REAL to be 8 bytes.
-qsave or -qnosave
tell the compiler that unsaved local variables are to be put either in static storage (-qsave) or on the run-time stack (-qnosave). The current default is -qsave for xlf, f77, mpxlf, and xlhpf, but the default is -qnosave for xlf90 and xlhpf90. Furthermore, the defaults may change in the future, so it is wise to consider the issues and then explicitly state one of these options. The equivalent of the -qsave option is the default on the Meiko and DEC machines, and this option means that local variables retain their values between calls to a routine. The equivalent of -qnosave is the default on the Cray machines, and this option can lead to smaller code and help open the way for recursive calls. The -qnosave option is also generally needed if the code is multi-threaded.
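The two porting scenarios described above could translate into flag sets like these. This is a sketch, not a recommendation for any particular code; the file names are hypothetical, and both choices are stated explicitly so that the build does not depend on a default that may change.

```shell
#!/bin/sh
# Sketch only: prog.f and prog are hypothetical file names.
# Code coming from a Cray: 8-byte default REAL, locals on the stack.
FROM_CRAY="xlf -qrealsize=8 -qnosave -o prog prog.f"
# Code coming from a machine with 4-byte default REAL and saved locals.
FROM_REAL4="xlf -qautodbl=dbl4 -qsave -o prog prog.f"
echo "$FROM_CRAY"
echo "$FROM_REAL4"
```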
Recommended flags for debugging:
-g
causes debugger symbol tables to be generated. Any use of -O with the -g option seems to make the symbol tables more confusing than useful to TotalView or other debuggers.
-C
causes bounds checking to be activated.
-qextchk
causes the compiler to check for procedure interface and common block mismatches.
-qflttrap
causes the compiler to insert code to detect and trap floating-point exceptions. (Be sure to specify enable and the particular types of exceptions to trap.) You may wish to use the -qsigtrap option to control how the exception is handled.
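Put together, a debugging build might look like the sketch below. The choice of exceptions to trap (overflow, zerodivide, invalid) is only an illustration, and prog.f is a hypothetical file name. Note that no -O option is given, in line with the remark on -g above.

```shell
#!/bin/sh
# Sketch only: prog.f and prog are hypothetical file names.
# "enable" must be listed along with the exception types to trap.
TRAPS="-qflttrap=overflow:zerodivide:invalid:enable"
DEBUG_FLAGS="-g -C -qextchk $TRAPS -qsigtrap"
echo "xlf $DEBUG_FLAGS -o prog prog.f"
```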
Optimization level options to consider:
-qarch=auto
tells the compiler to compile for the kind of machine on which the compile is being done. If the code is to run on processors that differ from those on the current node, then an explicit specification of the architecture may be necessary. The explicit specification for the TR machine is -qtune=ppcgr. The (future) power3 machines can be specified with -qarch=pwr3. Note that specifying the wrong architecture may cause load or compile-time errors, usually "illegal instructions".
The default is to compile for just any POWER or PowerPC processor and ignore the specific unique features available on each specific type.
Note that with the xlc, xlC, and KCC compilers the best options to use are -qarch=ppcgr -qtune=604 as most of the other options are not available.
-O2
performs many basic optimizations. This is a definite step above the default optimization level.
-O3 -qstrict
performs additional optimizations, but with the restriction that the numerical results should not change at all.
-O3
performs fairly aggressive optimizations, possibly introducing low order differences in results.
-O3 -qhot
performs aggressive loop transformations.
-Q
has the compiler inline short subroutines that are called.
-qipa
performs interprocedural analysis to aid with other optimizations.
-qsmp
causes multi-threaded (SMP parallel) code to be generated by both automatic and directive based parallelization. Available with xlf_r and xlf90_r.
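One way to work through the optimization levels above is to build the same code at each level, from least to most aggressive, and compare both timing and results. The loop below only prints the candidate command lines; prog.f and the prog_N output names are hypothetical.

```shell
#!/bin/sh
# Sketch only: prog.f and the prog_N executable names are hypothetical.
COUNT=0
for OPT in "-O2" "-O3 -qstrict" "-O3" "-O3 -qhot"; do
  # Each build gets its own executable so results can be compared.
  echo "xlf -qarch=auto $OPT -o prog_$COUNT prog.f"
  COUNT=$((COUNT + 1))
done
```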
Other assorted options to consider:
-qtune=604
is implied by -qarch=604.
-qintlog
allows integers to be used as logicals and logicals to be used as integers--a common practice in some codes.
-qddim
allows dynamic dimensioning of pointered arrays where the Livermore/Cray/Integer style of pointers are used. This is needed if the variables used to declare array dimensions might change after the routine has started execution.
-qextname
attaches an underscore character to the end of all Fortran external names (as is done by many other compilers, including the Sun compilers). This may eliminate the possibility of inadvertently calling a C routine from a Fortran routine (with the confused methods of argument passing that that implies), but it puts in place a method of naming externals that is non-standard on the system and will probably lead to even greater confusion.
-qnullterm
attacks one part of the problem of passing character strings from Fortran routines to C routines. In particular, this option places a NULL character after any character constant that is passed as an argument. While this may help for character constants passed as arguments, it does nothing for character variables.
-qrecur
lets all routines be called recursively.
-bnso -bI:/lib/syscalls.exp
asks the linker to load all routines statically (rather than the default dynamic linking).
-bmaxdata:nnn
is needed to allow a data area greater than 256 megabytes. nnn may be a C style decimal, octal, or hex constant up to 2 GB.
-bmaxstack:nnn
is needed to allow a larger area for the program stack. nnn may be a C style decimal, octal, or hex constant up to 256 MB.
-bmap:filename
writes a load map to the indicated filename.
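Since nnn is a C style constant, a size is often easiest to write in hex. The sketch below requests a 1 GB data area (an arbitrary example value) and a load map; prog.f and prog.map are hypothetical names, and the link command is printed, not run.

```shell
#!/bin/sh
# Sketch only: prog.f, prog, and prog.map are hypothetical file names.
MAXDATA_HEX=0x40000000            # 1 GB written as a C style hex constant
MAXDATA_DEC=$(($MAXDATA_HEX))     # the same size in decimal bytes
echo "requested data area: $MAXDATA_DEC bytes"
echo "xlf -bmaxdata:$MAXDATA_HEX -bmap:prog.map -o prog prog.f"
```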
Default options (which can be overridden):
-qfullpath
keeps the full pathname of the source files in the executable file so that the sources can be found if needed for debugging. Making this the default is a local modification.
-bnoobjreorder
is a loader option that blocks the loader from reordering the objects within the executable file (which the loader normally does in order to optimize paging behavior). This option must be given or TotalView will not be able to find objects within the program. Making this the default is a local modification.
-qsave
static storage for local variables (default with xlf, f77, mpxlf, and xlhpf).
-qnosave
stack storage for local variables (default with xlf90, f90, and xlhpf90).
-qfree
free form input (default with xlf90, f90, and xlhpf90).
-qfixed
fixed form input (default with xlf, f77, mpxlf, and xlhpf).
-qalias=intptr
indicates program may contain integer (or Cray) pointers (default with xlf, f77, and mpxlf).
-qnozerosize
assumes character strings and arrays cannot be zero sized (default with xlf, f77, and mpxlf).
-qposition=appendold
positions the file pointer at the end of the file when data is written after an OPEN statement with no POSITION= but with STATUS="OLD" (default with xlf, f77, mpxlf, and xlhpf).
-qflag=1:1
specifies severity level of diagnostic messages to be reported (default for xlhpf and xlhpf90).
-qxlf77=intarg:intxor:persistent:noleadzero:gedit77:noblankpad:oldboz:softeof
uses the semantics and I/O data formatting of the previous version of the XL Fortran compiler. See the man pages for details (default with xlf, f77, mpxlf, and xlhpf).