Showing tool doc from version 4.6.2.0 | The latest version is 4.6.2.0

FilterIntervals

Filters intervals based on annotations and/or count statistics

Category: Copy Number Variant Discovery


Overview

Given specified intervals, annotated intervals output by AnnotateIntervals, and/or counts output by CollectReadCounts, outputs a filtered Picard interval list. The initial set of intervals to filter is the intersection of the specified intervals, the annotated intervals, and the intervals in the first count file. Parameters for filtering based on the annotations and counts can be adjusted. Annotation-based filters are applied first, followed by count-based filters. Finally, any singleton intervals (i.e., intervals that are the only ones remaining on their contig) left after the other filters are also filtered out. The result may be passed via -L to other tools (e.g., DetermineGermlineContigPloidy and GermlineCNVCaller) to mask intervals from analysis.
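The filtering order described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the GATK implementation: intervals are modeled as hypothetical (contig, start, end) tuples, all three interval sources are assumed to be present, and annotation_ok and counts_ok are hypothetical stand-ins for the annotation-based and count-based filters.

```python
from collections import Counter

def drop_singletons(intervals):
    """Remove intervals that are the only ones left on their contig."""
    per_contig = Counter(contig for contig, _, _ in intervals)
    return [iv for iv in intervals if per_contig[iv[0]] > 1]

def filter_intervals(specified, annotated, first_counts, annotation_ok, counts_ok):
    """Apply the documented order: intersect, annotation filters, count filters, singletons."""
    # Initial set: intervals present in all three sources (order of `specified` is kept).
    shared = set(annotated) & set(first_counts)
    kept = [iv for iv in specified if iv in shared]
    kept = [iv for iv in kept if annotation_ok(iv)]  # annotation-based filters first
    kept = [iv for iv in kept if counts_ok(iv)]      # then count-based filters
    return drop_singletons(kept)                     # singleton removal last
```

Singleton removal runs last because earlier filters can leave a contig with a single surviving interval.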

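Inputs

Intervals to be filtered (-L). Intervals to exclude may additionally be provided via -XL.
(Optional) Annotated-intervals file from AnnotateIntervals. Must be provided if no counts files are provided.
(Optional) Counts files (TSV or HDF5) from CollectReadCounts. Must be provided if no annotated-intervals file is provided.

Outputs

Filtered Picard interval-list file.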

Usage examples

     gatk FilterIntervals \
          -L preprocessed_intervals.interval_list \
          -XL blacklist_intervals.interval_list \
          -I sample_1.counts.hdf5 \
          -I sample_2.counts.hdf5 \
          ... \
          --annotated-intervals annotated_intervals.tsv \
          -O filtered_intervals.interval_list
 
     gatk FilterIntervals \
          -L preprocessed_intervals.interval_list \
          --annotated-intervals annotated_intervals.tsv \
          -O filtered_intervals.interval_list
 
     gatk FilterIntervals \
          -L preprocessed_intervals.interval_list \
          -I sample_1.counts.hdf5 \
          -I sample_2.counts.hdf5 \
          ... \
          -O filtered_intervals.interval_list
 

Caveats

Note that setting a minimum mappability greater than zero and/or a maximum segmental-duplication content less than one can exclude real variant calls, because the intervals containing them may be filtered out by these criteria.

Author: Samuel Lee <slee@broadinstitute.org>

FilterIntervals specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

Argument name(s) | Default value | Summary

Required Arguments
--intervals / -L | | One or more genomic intervals over which to operate
--output / -O | | Output Picard interval-list file containing the filtered intervals.

Optional Tool Arguments
--annotated-intervals | | Input file containing annotations for genomic intervals (output of AnnotateIntervals). Must be provided if no counts files are provided.
--arguments_file | | read one or more arguments files and add them to the command line
--extreme-count-filter-maximum-percentile | 99.0 | Maximum-percentile parameter for the extreme-count filter. Intervals with a count that has a percentile strictly greater than this in a percentage of samples strictly greater than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second count-based filter applied.)
--extreme-count-filter-minimum-percentile | 1.0 | Minimum-percentile parameter for the extreme-count filter. Intervals with a count that has a percentile strictly less than this in a percentage of samples strictly greater than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second count-based filter applied.)
--extreme-count-filter-percentage-of-samples | 90.0 | Percentage-of-samples parameter for the extreme-count filter. Intervals with a count that has a percentile outside of [extreme-count-filter-minimum-percentile, extreme-count-filter-maximum-percentile] in a percentage of samples strictly greater than this will be filtered out. (This is the second count-based filter applied.)
--gcs-max-retries / -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays | | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed.
--help / -h | false | display the help message
--input / -I | | Input TSV or HDF5 files containing integer read counts in genomic intervals (output of CollectReadCounts). Must be provided if no annotated-intervals file is provided.
--interval-merging-rule / -imr | ALL | Interval merging rule for abutting intervals
--low-count-filter-count-threshold | 10 | Count-threshold parameter for the low-count filter. Intervals with a count strictly less than this threshold in a percentage of samples strictly greater than low-count-filter-percentage-of-samples will be filtered out. (This is the first count-based filter applied.)
--low-count-filter-percentage-of-samples | 50.0 | Percentage-of-samples parameter for the low-count filter. Intervals with a count strictly less than low-count-filter-count-threshold in a percentage of samples strictly greater than this will be filtered out. (This is the first count-based filter applied.)
--maximum-gc-content | 0.9 | Maximum allowed value for GC-content annotation (inclusive).
--maximum-mappability | 1.0 | Maximum allowed value for mappability annotation (inclusive).
--maximum-segmental-duplication-content | 0.5 | Maximum allowed value for segmental-duplication-content annotation (inclusive).
--minimum-gc-content | 0.1 | Minimum allowed value for GC-content annotation (inclusive).
--minimum-mappability | 0.9 | Minimum allowed value for mappability annotation (inclusive).
--minimum-segmental-duplication-content | 0.0 | Minimum allowed value for segmental-duplication-content annotation (inclusive).
--version | false | display the version number for this tool

Optional Common Arguments
--exclude-intervals / -XL | | One or more genomic intervals to exclude from processing
--gatk-config-file | | A configuration file to use with the GATK.
--interval-exclusion-padding / -ixp | 0 | Amount of padding (in bp) to add to each interval you are excluding.
--interval-padding / -ip | 0 | Amount of padding (in bp) to add to each interval you are including.
--interval-set-rule / -isr | UNION | Set merging approach to use for combining interval inputs
--QUIET | false | Whether to suppress job-summary info on System.err.
--tmp-dir | | Temp directory to use.
--use-jdk-deflater / -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater / -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.

Advanced Arguments
--showHidden | false | display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments).


--annotated-intervals

Input file containing annotations for genomic intervals (output of AnnotateIntervals). Must be provided if no counts files are provided.

File  null


--arguments_file

read one or more arguments files and add them to the command line

List[File]  []


--exclude-intervals / -XL

One or more genomic intervals to exclude from processing
Use this argument to exclude certain parts of the genome from the analysis (like -L, but the opposite). This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the command line (e.g. -XL 1 or -XL 1:100-200) or by loading in a file containing a list of intervals (e.g. -XL myFile.intervals).

List[String]  []


--extreme-count-filter-maximum-percentile

Maximum-percentile parameter for the extreme-count filter. Intervals with a count that has a percentile strictly greater than this in a percentage of samples strictly greater than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second count-based filter applied.)

double  99.0  [ [ 0  100 ] ]


--extreme-count-filter-minimum-percentile

Minimum-percentile parameter for the extreme-count filter. Intervals with a count that has a percentile strictly less than this in a percentage of samples strictly greater than extreme-count-filter-percentage-of-samples will be filtered out. (This is the second count-based filter applied.)

double  1.0  [ [ 0  100 ] ]


--extreme-count-filter-percentage-of-samples

Percentage-of-samples parameter for the extreme-count filter. Intervals with a count that has a percentile outside of [extreme-count-filter-minimum-percentile, extreme-count-filter-maximum-percentile] in a percentage of samples strictly greater than this will be filtered out. (This is the second count-based filter applied.)

double  90.0  [ [ 0  100 ] ]
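The three extreme-count parameters interact as sketched below. This is a rough illustration, not the GATK implementation: the function names are hypothetical, counts is assumed to be a list of per-sample count lists, and the percentile cutoffs are computed per sample with simple linear interpolation, which is an assumption since this page does not specify how percentiles are calculated.

```python
def percentile_value(sorted_values, pct):
    """Value at percentile `pct` (0-100) using linear interpolation between ranks."""
    if len(sorted_values) == 1:
        return float(sorted_values[0])
    rank = (pct / 100.0) * (len(sorted_values) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (rank - lo) * (sorted_values[hi] - sorted_values[lo])

def extreme_count_filter(counts, min_pct=1.0, max_pct=99.0, pct_of_samples=90.0):
    """counts[s][i] is the read count of interval i in sample s.
    Returns the indices of the intervals that are kept."""
    num_samples = len(counts)
    num_intervals = len(counts[0])
    flagged = [0] * num_intervals  # samples in which each interval is extreme
    for sample in counts:
        ordered = sorted(sample)
        low = percentile_value(ordered, min_pct)
        high = percentile_value(ordered, max_pct)
        for i, c in enumerate(sample):
            if c < low or c > high:  # strictly outside [low, high]
                flagged[i] += 1
    # An interval is filtered only if it is extreme in strictly more than
    # pct_of_samples percent of the samples.
    return [i for i in range(num_intervals)
            if 100.0 * flagged[i] / num_samples <= pct_of_samples]
```

Because both comparisons are strict, an interval that is extreme in exactly pct_of_samples percent of the samples is kept.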


--gatk-config-file

A configuration file to use with the GATK.

String  null


--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int  20  [ [ -∞  ∞ ] ]


--gcs-project-for-requester-pays

Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed.

String  ""


--help / -h

display the help message

boolean  false


--input / -I

Input TSV or HDF5 files containing integer read counts in genomic intervals (output of CollectReadCounts). Must be provided if no annotated-intervals file is provided.

List[File]  []


--interval-exclusion-padding / -ixp

Amount of padding (in bp) to add to each interval you are excluding.
Use this to add padding to the intervals specified using -XL. For example, '-XL 1:100' with a padding value of 20 would turn into '-XL 1:80-120'. This is typically used to add padding around targets when analyzing exomes.

int  0  [ [ -∞  ∞ ] ]


--interval-merging-rule / -imr

Interval merging rule for abutting intervals
By default, the program merges abutting intervals (i.e. intervals that are directly side-by-side but do not actually overlap) into a single continuous interval. However you can change this behavior if you want them to be treated as separate intervals instead.

The --interval-merging-rule argument is an enumerated type (IntervalMergingRule), which can have one of the following values:

ALL
OVERLAPPING_ONLY

IntervalMergingRule  ALL
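The difference between the two values can be sketched as follows. This is an illustrative Python sketch only, assuming 1-based inclusive (start, end) pairs on a single contig; merge_intervals is a hypothetical helper, not a GATK function.

```python
def merge_intervals(intervals, rule="ALL"):
    """Merge intervals on one contig; coordinates are 1-based and inclusive."""
    # Under ALL, an interval starting right after the previous end (abutting)
    # is merged as well; under OVERLAPPING_ONLY it stays separate.
    gap_allowed = 1 if rule == "ALL" else 0
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + gap_allowed:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, (1, 100) and (101, 200) abut without overlapping, so they become one interval under ALL but remain two under OVERLAPPING_ONLY.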


--interval-padding / -ip

Amount of padding (in bp) to add to each interval you are including.
Use this to add padding to the intervals specified using -L. For example, '-L 1:100' with a padding value of 20 would turn into '-L 1:80-120'. This is typically used to add padding around targets when analyzing exomes.

int  0  [ [ -∞  ∞ ] ]
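The padding arithmetic in the example above can be written out as a short Python sketch. The helper name pad_interval is hypothetical, and clamping to the contig bounds is an assumption of this sketch rather than something stated on this page.

```python
def pad_interval(contig, start, end, padding, contig_length=None):
    """Extend an interval by `padding` bp on each side (1-based, inclusive)."""
    new_start = max(1, start - padding)  # assumed: never extend before position 1
    new_end = end + padding
    if contig_length is not None:
        new_end = min(contig_length, new_end)  # assumed: never extend past the contig end
    return (contig, new_start, new_end)
```

With a padding of 20, the single-base interval 1:100 becomes 1:80-120, matching the example in the argument description.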


--interval-set-rule / -isr

Set merging approach to use for combining interval inputs
By default, the program will take the UNION of all intervals specified using -L and/or -XL. However, you can change this setting for -L, for example if you want to take the INTERSECTION of the sets instead. E.g. to perform the analysis only on chromosome 1 exomes, you could specify -L exomes.intervals -L 1 --interval-set-rule INTERSECTION. However, it is not possible to modify the merging approach for intervals passed using -XL (they will always be merged using UNION). Note that if you specify both -L and -XL, the -XL interval set will be subtracted from the -L interval set.

The --interval-set-rule argument is an enumerated type (IntervalSetRule), which can have one of the following values:

UNION
Take the union of all intervals
INTERSECTION
Take the intersection of intervals (the subset that overlaps all intervals specified)

IntervalSetRule  UNION
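The set semantics described above can be illustrated with a small Python sketch. It is deliberately naive and not how GATK represents intervals: each interval is expanded into individual (contig, position) pairs, which is only practical for toy inputs, and expand and combine are hypothetical helper names.

```python
def expand(intervals):
    """Expand (contig, start, end) intervals into a set of (contig, position) pairs."""
    return {(c, p) for c, s, e in intervals for p in range(s, e + 1)}

def combine(include_args, exclude_args=(), rule="UNION"):
    """Combine several -L inputs under `rule`, then subtract the union of all -XL inputs.
    Assumes at least one -L input, since -L is required."""
    include_sets = [expand(arg) for arg in include_args]
    if rule == "INTERSECTION":
        included = set.intersection(*include_sets)
    else:
        included = set.union(*include_sets)
    # -XL inputs are always merged with UNION, regardless of `rule`.
    excluded = set().union(*(expand(arg) for arg in exclude_args))
    return included - excluded
```

Passing an exome interval list and a whole-chromosome interval as two -L inputs with INTERSECTION keeps only the exome intervals on that chromosome, as in the example in the description.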


--intervals / -L

One or more genomic intervals over which to operate

R List[String]  []


--low-count-filter-count-threshold

Count-threshold parameter for the low-count filter. Intervals with a count strictly less than this threshold in a percentage of samples strictly greater than low-count-filter-percentage-of-samples will be filtered out. (This is the first count-based filter applied.)

int  10  [ [ 0  ∞ ] ]


--low-count-filter-percentage-of-samples

Percentage-of-samples parameter for the low-count filter. Intervals with a count strictly less than low-count-filter-count-threshold in a percentage of samples strictly greater than this will be filtered out. (This is the first count-based filter applied.)

double  50.0  [ [ 0  100 ] ]
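The two low-count parameters combine as in the following Python sketch. It is an illustration under assumptions, not the GATK implementation: low_count_filter is a hypothetical name and counts is assumed to be a list of per-sample count lists.

```python
def low_count_filter(counts, count_threshold=10, pct_of_samples=50.0):
    """counts[s][i] is the read count of interval i in sample s.
    Returns the indices of the intervals that are kept."""
    num_samples = len(counts)
    kept = []
    for i in range(len(counts[0])):
        # Number of samples whose count is strictly below the threshold.
        low = sum(1 for sample in counts if sample[i] < count_threshold)
        # Filtered only if that share of samples is strictly greater than pct_of_samples.
        if 100.0 * low / num_samples <= pct_of_samples:
            kept.append(i)
    return kept
```

With the defaults, an interval whose count is below 10 in exactly half of the samples is kept, because both comparisons are strict.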


--maximum-gc-content

Maximum allowed value for GC-content annotation (inclusive).

double  0.9  [ [ 0  1 ] ]


--maximum-mappability

Maximum allowed value for mappability annotation (inclusive).

double  1.0  [ [ 0  1 ] ]


--maximum-segmental-duplication-content

Maximum allowed value for segmental-duplication-content annotation (inclusive).

double  0.5  [ [ 0  1 ] ]


--minimum-gc-content

Minimum allowed value for GC-content annotation (inclusive).

double  0.1  [ [ 0  1 ] ]


--minimum-mappability

Minimum allowed value for mappability annotation (inclusive).

double  0.9  [ [ 0  1 ] ]


--minimum-segmental-duplication-content

Minimum allowed value for segmental-duplication-content annotation (inclusive).

double  0.0  [ [ 0  1 ] ]
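The six annotation thresholds define three inclusive ranges, which can be sketched in Python as below. The dictionary keys and the function name are assumptions made for illustration, as is the choice to skip a range check when an annotation is absent.

```python
# Default inclusive [minimum, maximum] ranges taken from the argument defaults above.
DEFAULT_RANGES = {
    "GC_CONTENT": (0.1, 0.9),
    "MAPPABILITY": (0.9, 1.0),
    "SEGMENTAL_DUPLICATION_CONTENT": (0.0, 0.5),
}

def passes_annotation_filters(annotations, ranges=DEFAULT_RANGES):
    """True if every annotation that is present lies within its inclusive range."""
    for name, (lo, hi) in ranges.items():
        value = annotations.get(name)
        # Assumed: an annotation missing from the input is simply not checked.
        if value is not None and not (lo <= value <= hi):
            return False
    return True
```

Because the bounds are inclusive, an interval with a GC content of exactly 0.9 passes, while one with 0.95 is filtered.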


--output / -O

Output Picard interval-list file containing the filtered intervals.

R File  null


--QUIET

Whether to suppress job-summary info on System.err.

Boolean  false


--showHidden / -showHidden

display hidden arguments

boolean  false


--tmp-dir

Temp directory to use.

GATKPath  null


--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean  false


--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean  false


--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version

display the version number for this tool

boolean  false



GATK version 4.6.2.0 built at Sun, 13 Apr 2025 13:21:43 -0400.