Determine callable status of loci
A very common question about an NGS set of reads is which areas of the genome are considered callable. This tool considers the coverage at each locus and emits either a per-base state or a summary interval BED file that partitions the genomic intervals into the following callable states: REF_N, PASS, NO_COVERAGE, LOW_COVERAGE, EXCESSIVE_COVERAGE, and POOR_MAPPING_QUALITY.
Input: a BAM file containing exactly one sample.
Output: a file with the callable status of each base, plus a summary table of callable status x count over all examined bases.
gatk CallableLoci \
-I myreads.bam \
-R myreference.fasta \
-O callable_status.bed \
--summary table.txt
Running this command would produce a BED file that looks like:
20 10000000 10000864 PASS
20 10000865 10000985 POOR_MAPPING_QUALITY
20 10000986 10001138 PASS
20 10001139 10001254 POOR_MAPPING_QUALITY
20 10001255 10012255 PASS
20 10012256 10012259 POOR_MAPPING_QUALITY
20 10012260 10012263 PASS
20 10012264 10012328 POOR_MAPPING_QUALITY
20 10012329 10012550 PASS
20 10012551 10012551 LOW_COVERAGE
20 10012552 10012554 PASS
20 10012555 10012557 LOW_COVERAGE
20 10012558 10012558 PASS
as well as a summary table that looks like:
state nBases
REF_N 0
PASS 996046
NO_COVERAGE 121
LOW_COVERAGE 928
EXCESSIVE_COVERAGE 0
POOR_MAPPING_QUALITY 2906
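The two example outputs above can be post-processed with a few lines of Python. The following is a minimal sketch, not part of GATK: the function names `bases_per_state`, `parse_summary`, and `callable_fraction` are hypothetical, and the interval arithmetic assumes inclusive start and end coordinates, which is what the example appears to show (adjacent intervals do not share an endpoint, and a single-base interval has start equal to end). Check the coordinates of your own output before relying on that assumption.

```python
from collections import Counter

# Excerpts copied from the example outputs above (whitespace-separated).
EXAMPLE_BED = """\
20 10012551 10012551 LOW_COVERAGE
20 10012552 10012554 PASS
20 10012555 10012557 LOW_COVERAGE
20 10012558 10012558 PASS
"""

EXAMPLE_SUMMARY = """\
state nBases
REF_N 0
PASS 996046
NO_COVERAGE 121
LOW_COVERAGE 928
EXCESSIVE_COVERAGE 0
POOR_MAPPING_QUALITY 2906
"""


def bases_per_state(bed_text, inclusive_end=True):
    """Sum interval lengths per callable state.

    inclusive_end=True is an assumption read off the example output.
    """
    totals = Counter()
    for line in bed_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        start, end, state = int(fields[1]), int(fields[2]), fields[3]
        totals[state] += end - start + (1 if inclusive_end else 0)
    return dict(totals)


def parse_summary(summary_text):
    """Parse the 'state nBases' table into a {state: count} dict."""
    counts = {}
    for line in summary_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) == 2:
            counts[parts[0]] = int(parts[1])
    return counts


def callable_fraction(counts, callable_state="PASS"):
    """Fraction of examined bases in the given state (0.0 if no bases)."""
    total = sum(counts.values())
    return counts.get(callable_state, 0) / total if total else 0.0


print(bases_per_state(EXAMPLE_BED))
summary = parse_summary(EXAMPLE_SUMMARY)
print(sum(summary.values()), callable_fraction(summary))
```

In the example summary the six counts add up to 1,000,001 examined bases, so the PASS fraction is 996046 / 1000001.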
@author Mark DePristo / Jonn Smith
@since May 7, 2010 / Nov 1, 2024
These Read Filters are automatically applied to the data by the Engine before processing by CallableLoci.
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.
| Argument name(s) | Default value | Summary |
|---|---|---|
| Required Arguments | | |
| --input -I | | BAM/SAM/CRAM file containing reads |
| --output -O | | Output file (BED or per-base format) |
| --reference -R | | Reference sequence file |
| --summary | | Name of file for output summary |
| Optional Tool Arguments | | |
| --arguments_file | | read one or more arguments files and add them to the command line |
| --cloud-index-prefetch-buffer -CIPB | -1 | Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to cloudPrefetchBuffer if unset. |
| --cloud-prefetch-buffer -CPB | 40 | Size of the cloud-only prefetch buffer (in MB; 0 to disable). |
| --disable-bam-index-caching -DBIC | false | If true, don't cache BAM indexes; this will reduce memory requirements but may harm performance if many intervals are specified. Caching is automatically disabled if there are no intervals specified. |
| --disable-sequence-dictionary-validation | false | If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk! |
| --gcs-max-retries -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection |
| --gcs-project-for-requester-pays | | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed. |
| --help -h | false | display the help message |
| --interval-merging-rule -imr | ALL | Interval merging rule for abutting intervals |
| --intervals -L | | One or more genomic intervals over which to operate |
| --max-depth | | Maximum read depth before a locus is considered poorly mapped |
| --max-depth-per-sample | 0 | Maximum number of reads to retain per sample per locus. Reads above this threshold will be downsampled. Set to 0 to disable. |
| --max-fraction-of-reads-with-low-mapq -frlmq | 0.1 | If the fraction of reads at a base with low mapping quality exceeds this value, the site may be poorly mapped |
| --max-low-mapq -mlmq | 1 | Maximum value for MAPQ to be considered a problematic mapped read |
| --min-base-quality -mbq | 20 | Minimum quality of bases to count towards depth |
| --min-mapping-quality -mmq | 10 | Minimum mapping quality of reads to count towards depth |
| --sites-only-vcf-output | false | If true, don't emit genotype fields when writing vcf file output. |
| --version | false | display the version number for this tool |
| Optional Common Arguments | | |
| --add-output-sam-program-record | true | If true, adds a PG tag to created SAM/BAM/CRAM files. |
| --add-output-vcf-command-line | true | If true, adds a command line header line to created VCF files. |
| --create-output-bam-index -OBI | true | If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. |
| --create-output-bam-md5 -OBM | false | If true, create an MD5 digest for any BAM/SAM/CRAM file created |
| --create-output-variant-index -OVI | true | If true, create a VCF index when writing a coordinate-sorted VCF file. |
| --create-output-variant-md5 -OVM | false | If true, create an MD5 digest for any VCF file created. |
| --disable-read-filter -DF | | Read filters to be disabled before analysis |
| --disable-tool-default-read-filters | false | Disable all tool default read filters (WARNING: many tools will not function correctly without their default read filters on) |
| --exclude-intervals -XL | | One or more genomic intervals to exclude from processing |
| --gatk-config-file | | A configuration file to use with the GATK. |
| --interval-exclusion-padding -ixp | 0 | Amount of padding (in bp) to add to each interval you are excluding. |
| --interval-padding -ip | 0 | Amount of padding (in bp) to add to each interval you are including. |
| --interval-set-rule -isr | UNION | Set merging approach to use for combining interval inputs |
| --inverted-read-filter -XRF | | Inverted (with flipped acceptance/failure conditions) read filters applied before analysis (after regular read filters). |
| --lenient -LE | false | Lenient processing of VCF files |
| --max-variants-per-shard | 0 | If non-zero, partitions VCF output into shards, each containing up to the given number of records. |
| --QUIET | false | Whether to suppress job-summary info on System.err. |
| --read-filter -RF | | Read filters to be applied before analysis |
| --read-index | | Indices to use for the read inputs. If specified, an index must be provided for every read input and in the same order as the read inputs. If this argument is not specified, the path to the index for each input will be inferred automatically. |
| --read-validation-stringency -VS | SILENT | Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. |
| --seconds-between-progress-updates | 10.0 | Output traversal statistics every time this many seconds elapse |
| --sequence-dictionary | | Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a .dict file. |
| --tmp-dir | | Temp directory to use. |
| --use-jdk-deflater -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater) |
| --use-jdk-inflater -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater) |
| --verbosity | INFO | Control verbosity of logging. |
| Advanced Arguments | | |
| --format | BED | Output format |
| --min-depth | 4 | Minimum QC+ read depth before a locus is considered callable |
| --min-depth-for-low-mapq -mdflmq | 10 | Minimum read depth before a locus is considered a potential candidate for poorly mapped |
| --showHidden | false | display hidden arguments |
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
If true, adds a PG tag to created SAM/BAM/CRAM files.
boolean true
If true, adds a command line header line to created VCF files.
boolean true
read one or more arguments files and add them to the command line
List[File] []
Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to cloudPrefetchBuffer if unset.
int -1 [ [ -∞ ∞ ] ]
Size of the cloud-only prefetch buffer (in MB; 0 to disable).
int 40 [ [ -∞ ∞ ] ]
If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.
boolean true
If true, create an MD5 digest for any BAM/SAM/CRAM file created
boolean false
If true, create a VCF index when writing a coordinate-sorted VCF file.
boolean true
If true, create an MD5 digest for any VCF file created.
boolean false
If true, don't cache BAM indexes; this will reduce memory requirements but may harm performance if many intervals are specified. Caching is automatically disabled if there are no intervals specified.
boolean false
Read filters to be disabled before analysis
List[String] []
If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk!
boolean false
Disable all tool default read filters (WARNING: many tools will not function correctly without their default read filters on)
boolean false
One or more genomic intervals to exclude from processing
Use this argument to exclude certain parts of the genome from the analysis (like -L, but the opposite). This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the
command line (e.g. -XL 1 or -XL 1:100-200) or by loading in a file containing a list of intervals
(e.g. -XL myFile.intervals).
List[String] []
Output format
The output of this tool will be written in this format. The recommended option is BED.
The --format argument is an enumerated type (OutputFormat), which can have one of the following values:
OutputFormat BED
A configuration file to use with the GATK.
String null
If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
int 20 [ [ -∞ ∞ ] ]
Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed.
String ""
display the help message
boolean false
BAM/SAM/CRAM file containing reads
R List[GATKPath] []
Amount of padding (in bp) to add to each interval you are excluding.
Use this to add padding to the intervals specified using -XL. For example, '-XL 1:100' with a
padding value of 20 would turn into '-XL 1:80-120'. This is typically used to add padding around targets when
analyzing exomes.
int 0 [ [ -∞ ∞ ] ]
Interval merging rule for abutting intervals
By default, the program merges abutting intervals (i.e. intervals that are directly side-by-side but do not
actually overlap) into a single continuous interval. However you can change this behavior if you want them to be
treated as separate intervals instead.
The --interval-merging-rule argument is an enumerated type (IntervalMergingRule), which can have one of the following values:
IntervalMergingRule ALL
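To make the distinction between abutting and overlapping intervals concrete, here is a minimal sketch of the merging behavior described above. It is an illustration rather than GATK's implementation: `merge_intervals` and its `merge_abutting` flag are hypothetical names, and intervals are modeled as inclusive `(start, end)` pairs on a single contig.

```python
def merge_intervals(intervals, merge_abutting=True):
    """Merge inclusive (start, end) intervals on one contig.

    merge_abutting=True also joins intervals that sit directly
    side by side (e.g. 1-10 and 11-20); False joins only intervals
    that actually overlap.
    """
    merged = []
    # Abutting intervals are exactly one position apart.
    gap = 1 if merge_abutting else 0
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + gap:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals([(1, 10), (11, 20), (30, 40)]))
print(merge_intervals([(1, 10), (11, 20), (30, 40)], merge_abutting=False))
```

With `merge_abutting=True` the first two intervals collapse into a single interval covering 1 through 20; with `merge_abutting=False` they stay separate because they do not share any position.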
Amount of padding (in bp) to add to each interval you are including.
Use this to add padding to the intervals specified using -L. For example, '-L 1:100' with a
padding value of 20 would turn into '-L 1:80-120'. This is typically used to add padding around targets when
analyzing exomes.
int 0 [ [ -∞ ∞ ] ]
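The padding arithmetic in the example above can be sketched as follows. This is an illustration only: `pad_interval` is a hypothetical helper, and clamping the result to position 1 and to an optional contig length is an assumption about sensible behavior at contig edges, not something stated in this document.

```python
def pad_interval(contig, start, end, padding, contig_length=None):
    """Return a samtools-style interval string widened by `padding` bp
    on each side (inclusive, 1-based coordinates)."""
    new_start = max(1, start - padding)  # assumed: never go below position 1
    new_end = end + padding
    if contig_length is not None:
        new_end = min(new_end, contig_length)  # assumed: clamp to contig end
    return f"{contig}:{new_start}-{new_end}"


# '-L 1:100' with a padding value of 20
print(pad_interval("1", 100, 100, 20))
```

For the single-position interval 1:100 and a padding of 20, the helper returns `1:80-120`, matching the example in the text.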
Set merging approach to use for combining interval inputs
By default, the program will take the UNION of all intervals specified using -L and/or -XL. However, you can
change this setting for -L, for example if you want to take the INTERSECTION of the sets instead. E.g. to
perform the analysis only on chromosome 1 exomes, you could specify -L exomes.intervals -L 1 --interval-set-rule
INTERSECTION. However, it is not possible to modify the merging approach for intervals passed using -XL (they will
always be merged using UNION).
Note that if you specify both -L and -XL, the -XL interval set will be subtracted from the -L interval set.
The --interval-set-rule argument is an enumerated type (IntervalSetRule), which can have one of the following values:
IntervalSetRule UNION
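The combination logic described above (combine the -L sets with the chosen rule, then subtract the -XL sets, which are always unioned) can be illustrated with plain Python sets. This is a toy sketch under stated assumptions: `positions` and `combine` are hypothetical names, intervals are inclusive `(start, end)` pairs on one contig, and expanding intervals into individual positions is only practical for small examples.

```python
def positions(intervals):
    """Expand inclusive (start, end) intervals into a set of positions."""
    return {p for start, end in intervals for p in range(start, end + 1)}


def combine(include_sets, exclude_sets, rule="UNION"):
    """Combine interval inputs the way the text describes.

    include_sets: one list of intervals per -L argument.
    exclude_sets: one list of intervals per -XL argument (always unioned).
    rule: "UNION" or "INTERSECTION", applied to the -L sets only.
    """
    expanded = [positions(s) for s in include_sets]
    if not expanded:
        included = set()
    elif rule == "INTERSECTION":
        included = set.intersection(*expanded)
    else:
        included = set.union(*expanded)
    excluded = set().union(*(positions(s) for s in exclude_sets))
    # -XL is subtracted from the combined -L set.
    return sorted(included - excluded)


a = [(1, 10)]   # stands in for the first -L input
b = [(5, 15)]   # stands in for the second -L input
print(combine([a, b], [], rule="UNION"))
print(combine([a, b], [[(7, 8)]], rule="INTERSECTION"))
```

In the second call, the intersection of 1-10 and 5-15 is 5-10, and subtracting the excluded 7-8 leaves positions 5, 6, 9, and 10.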
One or more genomic intervals over which to operate
List[String] []
Inverted (with flipped acceptance/failure conditions) read filters applied before analysis (after regular read filters).
List[String] []
Lenient processing of VCF files
boolean false
Maximum read depth before a locus is considered poorly mapped
If the QC+ depth exceeds this value, the site is considered to have EXCESSIVE_COVERAGE.
Integer null
Maximum number of reads to retain per sample per locus. Reads above this threshold will be downsampled. Set to 0 to disable.
int 0 [ [ -∞ ∞ ] ]
If the fraction of reads at a base with low mapping quality exceeds this value, the site may be poorly mapped
If the number of reads at this site is greater than minDepthForLowMAPQ and the fraction of reads with low mapping quality
exceeds this fraction then the site has POOR_MAPPING_QUALITY.
double 0.1 [ [ -∞ ∞ ] ]
Maximum value for MAPQ to be considered a problematic mapped read
Reads whose MAPQ falls in the gap between this value and mmq are not sufficiently well mapped for calling but
aren't indicative of mapping problems. For example, if maxLowMAPQ = 1 and mmq = 20, then reads with
MAPQ == 0 are poorly mapped and reads with MAPQ >= 20 are considered as contributing to calling, whereas
reads with MAPQ >= 1 and < 20 are not bad in and of themselves but aren't sufficiently good to contribute to
calling. In effect these reads are invisible, driving the base to the NO_ or LOW_COVERAGE states.
int 1 [ [ 0 255 ] ]
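The three classes of reads described above can be sketched as a small function. This is an illustration, not the tool's code: `mapq_class` and its return labels are hypothetical, and the boundary comparisons follow the worked example in the text (with maxLowMAPQ = 1 and mmq = 20, only MAPQ 0 counts as poorly mapped). How the tool treats a MAPQ exactly equal to either threshold is an assumption here and should be checked against the tool's source.

```python
def mapq_class(mapq, max_low_mapq=1, min_mapping_quality=10):
    """Classify a read by mapping quality.

    LOW_MAPQ  : counts as evidence of a mapping problem
    USABLE    : counts towards callable depth
    INVISIBLE : neither problematic nor good enough for calling
    Boundary handling follows the worked example in the text (assumed).
    """
    if mapq < max_low_mapq:
        return "LOW_MAPQ"
    if mapq >= min_mapping_quality:
        return "USABLE"
    return "INVISIBLE"


# The example from the text: maxLowMAPQ = 1 and mmq = 20.
for q in (0, 1, 19, 20, 60):
    print(q, mapq_class(q, max_low_mapq=1, min_mapping_quality=20))
```

Widening the gap between the two thresholds makes more reads "invisible", which pushes more loci towards the NO_COVERAGE or LOW_COVERAGE states rather than POOR_MAPPING_QUALITY.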
If non-zero, partitions VCF output into shards, each containing up to the given number of records.
int 0 [ [ 0 ∞ ] ]
Minimum quality of bases to count towards depth
Bases with quality below minBaseQuality are viewed as not sufficiently high quality to contribute to the PASS state
int 20 [ [ 0 255 ] ]
Minimum QC+ read depth before a locus is considered callable
If the number of QC+ bases (on reads with MAPQ > minMappingQuality and with base quality > minBaseQuality) exceeds this
value and is less than maxDepth the site is considered PASS.
int 4 [ [ 0 ∞ ] ]
Minimum read depth before a locus is considered a potential candidate for poorly mapped
We don't want to consider a site as POOR_MAPPING_QUALITY just because it has two reads, and one has low MAPQ. We
won't assign a site to the POOR_MAPPING_QUALITY state unless there are at least minDepthForLowMAPQ reads
covering the site.
int 10 [ [ -∞ ∞ ] ]
Minimum mapping quality of reads to count towards depth
Reads with MAPQ > minMappingQuality are treated as usable for variation detection, contributing to the PASS
state.
int 10 [ [ 0 255 ] ]
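Taken together, the depth and mapping-quality thresholds described in the entries above suggest a per-locus decision along the following lines. This is a hedged sketch, not the tool's implementation: `classify_locus` and its parameter names are hypothetical, the order in which the conditions are checked is an assumption, and so is the handling of values exactly equal to a threshold (the descriptions use both "exceeds" and "at least"). The inputs are assumed to be precomputed: the raw read depth, the number of low-MAPQ reads, and the QC+ depth (bases passing the mapping-quality and base-quality minimums).

```python
def classify_locus(ref_base, raw_depth, low_mapq_depth, qc_depth,
                   min_depth=4, max_depth=None,
                   min_depth_for_low_mapq=10,
                   max_fraction_low_mapq=0.1):
    """Assign one of the callable states to a single locus (sketch)."""
    if ref_base.upper() == "N":
        return "REF_N"
    if raw_depth == 0:
        return "NO_COVERAGE"
    # Only consider poor mapping when enough reads cover the site.
    if (raw_depth >= min_depth_for_low_mapq
            and low_mapq_depth / raw_depth > max_fraction_low_mapq):
        return "POOR_MAPPING_QUALITY"
    if qc_depth < min_depth:
        return "LOW_COVERAGE"
    # max_depth=None models the unset default of --max-depth.
    if max_depth is not None and qc_depth > max_depth:
        return "EXCESSIVE_COVERAGE"
    return "PASS"


print(classify_locus("A", raw_depth=30, low_mapq_depth=0, qc_depth=30))
print(classify_locus("A", raw_depth=20, low_mapq_depth=5, qc_depth=15))
print(classify_locus("A", raw_depth=2, low_mapq_depth=1, qc_depth=1))
```

The third call shows the point made in the text: a site with two reads, one of them low MAPQ, is not flagged as POOR_MAPPING_QUALITY because it has fewer than `min_depth_for_low_mapq` reads; it falls through to LOW_COVERAGE instead.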
Output file (BED or per-base format)
R GATKPath null
Whether to suppress job-summary info on System.err.
Boolean false
Read filters to be applied before analysis
List[String] []
Indices to use for the read inputs. If specified, an index must be provided for every read input and in the same order as the read inputs. If this argument is not specified, the path to the index for each input will be inferred automatically.
List[GATKPath] []
Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
The --read-validation-stringency argument is an enumerated type (ValidationStringency), which can have one of the following values:
ValidationStringency SILENT
Reference sequence file
R GATKPath null
Output traversal statistics every time this many seconds elapse
double 10.0 [ [ -∞ ∞ ] ]
Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a .dict file.
GATKPath null
display hidden arguments
boolean false
If true, don't emit genotype fields when writing vcf file output.
boolean false
Name of file for output summary
Callable loci summary counts will be written to this file.
R GATKPath null
Temp directory to use.
GATKPath null
Whether to use the JdkDeflater (as opposed to IntelDeflater)
boolean false
Whether to use the JdkInflater (as opposed to IntelInflater)
boolean false
Control verbosity of logging.
The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:
LogLevel INFO
display the version number for this tool
boolean false
GATK version 4.6.2.0 built at Sun, 13 Apr 2025 13:21:43 -0400.