Showing tool doc from version 4.6.2.0 | The latest version is 4.6.2.0

IntervalListTools (Picard)

A tool for performing various IntervalList manipulations

Summary

This tool offers multiple interval list file manipulation capabilities, including: sorting, merging, subtracting, padding, and other set-theoretic operations. The default action is to merge and sort the intervals provided in the INPUTs. Other options, e.g. interval subtraction, are controlled by the arguments.
Both IntervalList and VCF files are accepted as input. An IntervalList should have the extension .interval_list, while a VCF must have one of the extensions .vcf, .vcf.gz, or .bcf. When a VCF file is used as input, each variant is translated into an interval, using its reference allele or the END INFO annotation (if present) to determine the extent of the interval.

IntervalListTools can also "scatter" the resulting interval list into many interval files. This is useful for creating multiple interval lists over which to scatter an analysis.
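A minimal Python sketch of the VCF-to-interval translation described above. It is an illustration, not Picard's implementation: the function name vcf_record_to_interval is hypothetical, it assumes the interval runs from POS to POS + len(REF) - 1 unless an END INFO value is present, and it ignores the INCLUDE_FILTERED behavior.

```python
from typing import Optional, Tuple


def vcf_record_to_interval(chrom: str, pos: int, ref: str,
                           info_end: Optional[int] = None) -> Tuple[str, int, int]:
    """Map one VCF record to a 1-based, closed (chrom, start, end) interval.

    Assumption: the extent comes from INFO/END when present, otherwise from
    the number of bases in the REF allele.
    """
    end = info_end if info_end is not None else pos + len(ref) - 1
    return (chrom, pos, end)


# A SNV spans one base; a deletion with REF=ACG spans three.
print(vcf_record_to_interval("chr1", 100, "A"))       # ('chr1', 100, 100)
print(vcf_record_to_interval("chr1", 200, "ACG"))     # ('chr1', 200, 202)
# A symbolic structural variant whose extent comes from INFO/END.
print(vcf_record_to_interval("chr2", 50, "N", 1000))  # ('chr2', 50, 1000)
```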

Details

The IntervalList file format is designed to help users avoid mixing references when supplying intervals and other genomic data to a single tool. A SAM-style header must be present at the top of the file. After the header, the file contains records, one per line in text format, with the following values tab-separated:

  • Sequence name (SN)
  • Start position (1-based)
  • End position (1-based, inclusive)
  • Strand (either + or -)
  • Interval name (ideally unique names for intervals)

The coordinate system is 1-based and closed-ended, so the first base in a sequence has position 1, and both the start and the end positions are included in an interval.

Example interval list file:
@HD	VN:1.0
@SQ	SN:chr1	LN:501
@SQ	SN:chr2	LN:401
chr1	1	100	+	starts at the first base of the contig and covers 100 bases
chr2	100	100	+	interval with exactly one base
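The following Python sketch reads the example above and computes each interval's length. It is a hypothetical helper, not part of Picard or htsjdk: it skips every line starting with @ (the SAM-style header) and assumes five tab-separated fields per record. Because both ends are included, an interval's length is end - start + 1.

```python
def parse_interval_list(text):
    """Parse interval_list text into (contig, start, end, strand, name) tuples."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM-style header lines (@HD, @SQ, ...)
        contig, start, end, strand, name = line.split("\t", 4)
        records.append((contig, int(start), int(end), strand, name))
    return records


def interval_length(record):
    """Number of bases in a 1-based, closed interval: both ends are included."""
    _, start, end, _, _ = record
    return end - start + 1


EXAMPLE = (
    "@HD\tVN:1.0\n"
    "@SQ\tSN:chr1\tLN:501\n"
    "@SQ\tSN:chr2\tLN:401\n"
    "chr1\t1\t100\t+\tstarts at the first base of the contig and covers 100 bases\n"
    "chr2\t100\t100\t+\tinterval with exactly one base\n"
)

for rec in parse_interval_list(EXAMPLE):
    print(rec[0], rec[1], rec[2], interval_length(rec))
# chr1 1 100 100
# chr2 100 100 1
```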

Usage Examples

1. Combine the intervals from two interval lists:

java -jar picard.jar IntervalListTools \
      ACTION=CONCAT \
      I=input.interval_list \
      I=input_2.interval_list \
      O=new.interval_list

2. Combine the intervals from two interval lists, sorting the resulting list and merging overlapping and abutting intervals:

 java -jar picard.jar IntervalListTools \
       ACTION=CONCAT \
       SORT=true \
       UNIQUE=true \
       I=input.interval_list \
       I=input_2.interval_list \
       O=new.interval_list 
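To illustrate what SORT=true and UNIQUE=true add on top of CONCAT, here is a Python sketch of sorting and merging 1-based, closed intervals. It is an illustration under simplifying assumptions, not Picard's code: intervals are plain (contig, start, end) tuples, and contigs are ordered by name, whereas Picard orders them by the sequence dictionary in the header. The hypothetical merge_abutting flag shows the effect of DONT_MERGE_ABUTTING=true.

```python
def sort_and_merge(intervals, merge_abutting=True):
    """Sort (contig, start, end) tuples and merge overlapping intervals.

    When merge_abutting is True, intervals that touch end-to-start
    (e.g. 1-10 and 11-20) are fused as well. Contigs are sorted by name
    here, a simplification of sequence dictionary order.
    """
    merged = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig:
            _, last_start, last_end = merged[-1]
            # Abutting means start == last_end + 1 in a closed coordinate system.
            limit = last_end + 1 if merge_abutting else last_end
            if start <= limit:
                merged[-1] = (contig, last_start, max(last_end, end))
                continue
        merged.append((contig, start, end))
    return merged


list_a = [("chr1", 50, 80), ("chr1", 1, 10)]
list_b = [("chr1", 11, 20), ("chr1", 70, 120), ("chr2", 5, 9)]
concatenated = list_a + list_b  # CONCAT alone keeps input order, nothing merged

print(sort_and_merge(concatenated))
# [('chr1', 1, 20), ('chr1', 50, 120), ('chr2', 5, 9)]
print(sort_and_merge(concatenated, merge_abutting=False))
# [('chr1', 1, 10), ('chr1', 11, 20), ('chr1', 50, 120), ('chr2', 5, 9)]
```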

3. Subtract the intervals in SECOND_INPUT from those in INPUT:

 java -jar picard.jar IntervalListTools \
       ACTION=SUBTRACT \
       I=input.interval_list \
       SI=input_2.interval_list \
       O=new.interval_list 
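A Python sketch of the SUBTRACT semantics (loci in INPUT that are not in SECOND_INPUT), operating on 1-based, closed (contig, start, end) tuples. It is a simplified illustration rather than Picard's implementation; the helpers merge and subtract are hypothetical, and contigs are ordered by name instead of by the sequence dictionary.

```python
def merge(intervals):
    """Sort (contig, start, end) tuples and fuse overlapping or abutting ones."""
    out = []
    for contig, start, end in sorted(intervals):
        if out and out[-1][0] == contig and start <= out[-1][2] + 1:
            out[-1] = (contig, out[-1][1], max(out[-1][2], end))
        else:
            out.append((contig, start, end))
    return out


def subtract(first, second):
    """Loci in `first` that are not covered by `second` (1-based, closed)."""
    removed = merge(second)
    result = []
    for contig, start, end in merge(first):
        cursor = start  # first base of this interval not yet emitted or removed
        for r_contig, r_start, r_end in removed:
            if r_contig != contig or r_end < cursor or r_start > end:
                continue  # no overlap with the remaining part of this interval
            if r_start > cursor:
                result.append((contig, cursor, r_start - 1))
            cursor = r_end + 1
            if cursor > end:
                break
        if cursor <= end:
            result.append((contig, cursor, end))
    return result


print(subtract([("chr1", 1, 100)], [("chr1", 20, 29), ("chr1", 50, 59)]))
# [('chr1', 1, 19), ('chr1', 30, 49), ('chr1', 60, 100)]
```

The nested loop is quadratic, which is acceptable for an illustration; a production version would sweep both sorted lists in a single pass.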

4. Find bases that are in either input1.interval_list or input2.interval_list, and also in input3.interval_list:

 java -jar picard.jar IntervalListTools \
       ACTION=INTERSECT \
       I=input1.interval_list \
       I=input2.interval_list \
       SI=input3.interval_list \
       O=new.interval_list 
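The set operation that this example describes in words, bases in input1 or input2 that are also in input3, can be sketched in Python as follows. The sketch follows the example's wording rather than Picard's code; merge and intersect are hypothetical helpers on 1-based, closed (contig, start, end) tuples, with contigs ordered by name.

```python
def merge(intervals):
    """Sort (contig, start, end) tuples and fuse overlapping or abutting ones."""
    out = []
    for contig, start, end in sorted(intervals):
        if out and out[-1][0] == contig and start <= out[-1][2] + 1:
            out[-1] = (contig, out[-1][1], max(out[-1][2], end))
        else:
            out.append((contig, start, end))
    return out


def intersect(first, second):
    """Loci covered by both lists, as merged 1-based, closed intervals."""
    a, b = merge(first), merge(second)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        a_contig, a_start, a_end = a[i]
        b_contig, b_start, b_end = b[j]
        if a_contig != b_contig:
            # Both lists are sorted by contig name, so advance the one that lags.
            if a_contig < b_contig:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(a_start, b_start), min(a_end, b_end)
        if lo <= hi:
            out.append((a_contig, lo, hi))
        # Advance whichever interval ends first; the other may overlap more.
        if a_end < b_end:
            i += 1
        else:
            j += 1
    return out


input1 = [("chr1", 1, 50)]
input2 = [("chr1", 40, 120)]
input3 = [("chr1", 30, 45), ("chr1", 100, 200)]
print(intersect(merge(input1 + input2), input3))
# [('chr1', 30, 45), ('chr1', 100, 120)]
```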

5. Combine overlapping intervals but NOT abutting intervals:

 java -jar picard.jar IntervalListTools \
       ACTION=UNION \
       DONT_MERGE_ABUTTING=true \
       I=input1.interval_list \
       O=new.interval_list 

Category Intervals Manipulation


Author: Tim Fennell

IntervalListTools (Picard) specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.

Argument name(s) Default value Summary
Required Arguments
--INPUT
 -I
One or more interval lists. If multiple interval lists are provided, the output is the result of merging the inputs. Supported formats are interval_list and VCF. If the file extension is unrecognized, the file is assumed to be an interval_list. For standard input (stdin), write /dev/stdin as the input file.
Optional Tool Arguments
--ACTION
CONCAT Action to take on inputs.
--arguments_file
read one or more arguments files and add them to the command line
--BREAK_BANDS_AT_MULTIPLES_OF
 -BRK
0 If set to a positive value will create a new interval list with the original intervals broken up at integer multiples of this value. Set to 0 to NOT break up intervals.
--COMMENT
One or more lines of comment to add to the header of the output file (as @CO lines in the SAM header).
--COUNT_OUTPUT
File to which to print count of bases or intervals in final output interval list. When not set, value indicated by OUTPUT_VALUE will be printed to stdout. If this parameter is set, OUTPUT_VALUE must not be NONE.
--DONT_MERGE_ABUTTING
false If true, do not merge abutting intervals (keep them separate). Note: abutting intervals are combined by default with the UNION action.
--help
 -h
false display the help message
--INCLUDE_FILTERED
false Whether to include filtered variants in the vcf when generating an interval list from vcf.
--INVERT
false Produce the inverse list of intervals, that is, the regions in the genome that are not covered by any of the input intervals. Will merge abutting intervals first. Output will be sorted.
--OUTPUT
 -O
The output interval list file to write (if SCATTER_COUNT == 1) or the directory into which to write the scattered interval sub-directories (if SCATTER_COUNT > 1).
--OUTPUT_VALUE
NONE What value to output to COUNT_OUTPUT file or stdout (for scripting). If COUNT_OUTPUT is provided, this parameter must not be NONE.
--PADDING
0 The amount to pad each end of the intervals by before other operations are undertaken. Negative numbers are allowed and indicate intervals should be shrunk. Resulting intervals < 0 bases long will be removed. Padding is applied to the interval lists (both INPUT and SECOND_INPUT, if provided) before the ACTION is performed.
--SCATTER_CONTENT
When scattering with this argument, each of the resultant files will (ideally) have this amount of 'content', which means either base-counts or interval-counts depending on SUBDIVISION_MODE. When provided, overrides SCATTER_COUNT.
--SCATTER_COUNT
1 The number of files into which to scatter the resulting list by locus; in some situations, fewer files may be emitted.
--SECOND_INPUT
 -SI
Second set of intervals for operations that compare two sets, such as SUBTRACT, SYMDIFF, and OVERLAPS.
--SORT
true If true, sort the resulting interval list by coordinate.
--SUBDIVISION_MODE
 -M
INTERVAL_SUBDIVISION The mode used to scatter the interval list.
--UNIQUE
false If true, merge overlapping and adjacent intervals to create a list of unique intervals. Implies SORT=true.
--version
false display the version number for this tool
Optional Common Arguments
--COMPRESSION_LEVEL
5 Compression level for all compressed files created (e.g. BAM and VCF).
--CREATE_INDEX
false Whether to create an index when writing VCF or coordinate sorted BAM output.
--CREATE_MD5_FILE
false Whether to create an MD5 digest for any BAM or FASTQ files created.
--MAX_RECORDS_IN_RAM
500000 When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
--QUIET
false Whether to suppress job-summary info on System.err.
--REFERENCE_SEQUENCE
 -R
Reference sequence file.
--TMP_DIR
One or more directories with space available to be used by this program for temporary storage of working files
--USE_JDK_DEFLATER
 -use_jdk_deflater
false Use the JDK Deflater instead of the Intel Deflater for writing compressed output
--USE_JDK_INFLATER
 -use_jdk_inflater
false Use the JDK Inflater instead of the Intel Inflater for reading compressed input
--VALIDATION_STRINGENCY
STRICT Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--VERBOSITY
INFO Control verbosity of logging.
Advanced Arguments
--showHidden
false display hidden arguments

Argument details

This list covers both the arguments specific to this tool and the common arguments it shares with other Picard tools (e.g. COMPRESSION_LEVEL, TMP_DIR, VERBOSITY); the summary table above lists the common arguments separately under Optional Common Arguments.


--ACTION

Action to take on inputs.

The --ACTION argument is an enumerated type (Action), which can have one of the following values:

CONCAT
The concatenation of all the intervals in all the INPUTs, no sorting or merging of overlapping/abutting intervals implied. Will result in a possibly unsorted list unless requested otherwise.
UNION
Like CONCAT but with UNIQUE and SORT implied, the result being the set-wise union of all INPUTs, with overlapping and abutting intervals merged into one.
INTERSECT
The sorted and merged set of all loci that are contained in all of the INPUTs.
SUBTRACT
Subtracts the intervals in SECOND_INPUT from those in INPUT. The resulting loci are those in INPUT that are not in SECOND_INPUT.
SYMDIFF
Results in loci that are in INPUT or SECOND_INPUT but are not in both.
OVERLAPS
Outputs the entire intervals from INPUT that have bases which overlap any interval from SECOND_INPUT. Note that this is different from INTERSECT in that each original interval is either emitted in its entirety, or not at all.

Action  CONCAT
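The difference between a plain intersection, SYMDIFF, and OVERLAPS is easiest to see on a small example. The Python sketch below expands intervals into sets of (contig, position) bases, which is only practical for toy inputs; all function names are hypothetical and none of this is Picard's implementation.

```python
def to_bases(intervals):
    """Expand 1-based, closed (contig, start, end) intervals into single bases."""
    return {(c, p) for c, s, e in intervals for p in range(s, e + 1)}


def to_intervals(bases):
    """Collapse a set of bases back into sorted, merged intervals."""
    out = []
    for contig, pos in sorted(bases):
        if out and out[-1][0] == contig and out[-1][2] + 1 == pos:
            out[-1] = (contig, out[-1][1], pos)
        else:
            out.append((contig, pos, pos))
    return out


def intersect(first, second):
    # Bases present in both lists.
    return to_intervals(to_bases(first) & to_bases(second))


def symdiff(first, second):
    # Bases present in exactly one of the two lists.
    return to_intervals(to_bases(first) ^ to_bases(second))


def overlaps(first, second):
    # Whole, unmodified intervals from `first` sharing at least one base with `second`.
    return [(c, s, e) for c, s, e in first
            if any(sc == c and ss <= e and s <= se for sc, ss, se in second)]


inputs = [("chr1", 1, 10), ("chr1", 30, 40)]
second_inputs = [("chr1", 5, 15)]
print(intersect(inputs, second_inputs))  # [('chr1', 5, 10)]
print(symdiff(inputs, second_inputs))    # [('chr1', 1, 4), ('chr1', 11, 15), ('chr1', 30, 40)]
print(overlaps(inputs, second_inputs))   # [('chr1', 1, 10)]
```

Note how OVERLAPS returns the whole interval 1-10, while the intersection keeps only the shared bases 5-10.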


--arguments_file

read one or more arguments files and add them to the command line

List[File]  []


--BREAK_BANDS_AT_MULTIPLES_OF / -BRK

If set to a positive value will create a new interval list with the original intervals broken up at integer multiples of this value. Set to 0 to NOT break up intervals.

int  0  [ [ -∞  ∞ ] ]
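A Python sketch of breaking an interval at multiples of BREAK_BANDS_AT_MULTIPLES_OF. The exact band boundaries are an assumption: this sketch starts each new piece at an integer multiple of the band size, so pieces cover k * band through (k + 1) * band - 1. Check the real tool's output before relying on the boundaries; the function name is hypothetical.

```python
def break_at_multiples(contig, start, end, band):
    """Split a 1-based, closed interval so no piece crosses a band boundary.

    Assumption: each new piece starts at an integer multiple of `band`.
    A band of 0 (the default) means the interval is not broken up.
    """
    if band <= 0:
        return [(contig, start, end)]
    pieces = []
    piece_start = start
    while piece_start <= end:
        band_end = (piece_start // band + 1) * band - 1  # last base of this band
        piece_end = min(band_end, end)
        pieces.append((contig, piece_start, piece_end))
        piece_start = piece_end + 1
    return pieces


print(break_at_multiples("chr1", 1, 250, 100))
# [('chr1', 1, 99), ('chr1', 100, 199), ('chr1', 200, 250)]
```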


--COMMENT

One or more lines of comment to add to the header of the output file (as @CO lines in the SAM header).

List[String]  []


--COMPRESSION_LEVEL

Compression level for all compressed files created (e.g. BAM and VCF).

int  5  [ [ -∞  ∞ ] ]


--COUNT_OUTPUT

File to which to print count of bases or intervals in final output interval list. When not set, value indicated by OUTPUT_VALUE will be printed to stdout. If this parameter is set, OUTPUT_VALUE must not be NONE.

PicardHtsPath  null


--CREATE_INDEX

Whether to create an index when writing VCF or coordinate sorted BAM output.

Boolean  false


--CREATE_MD5_FILE

Whether to create an MD5 digest for any BAM or FASTQ files created.

boolean  false


--DONT_MERGE_ABUTTING

If true, do not merge abutting intervals (keep them separate). Note: abutting intervals are combined by default with the UNION action.

boolean  false


--help / -h

display the help message

boolean  false


--INCLUDE_FILTERED

Whether to include filtered variants in the vcf when generating an interval list from vcf.

boolean  false


--INPUT / -I

One or more interval lists. If multiple interval lists are provided, the output is the result of merging the inputs. Supported formats are interval_list and VCF. If the file extension is unrecognized, the file is assumed to be an interval_list. For standard input (stdin), write /dev/stdin as the input file.

List[PicardHtsPath]  []  (required)


--INVERT

Produce the inverse list of intervals, that is, the regions in the genome that are not covered by any of the input intervals. Will merge abutting intervals first. Output will be sorted.

boolean  false
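A Python sketch of INVERT, using contig lengths taken from the @SQ LN values of the header (here those of the example file in the Details section). It assumes 1-based, closed intervals; the function name is hypothetical and this is not Picard's implementation. Walking each contig in dictionary order and emitting the gaps between covered regions also handles overlapping and abutting intervals, so this sketch needs no separate merge step.

```python
def invert(intervals, contig_lengths):
    """Return the regions of each contig not covered by any interval.

    contig_lengths maps contig name to length and is iterated in insertion
    order, standing in for the sequence dictionary order.
    """
    by_contig = {}
    for contig, start, end in sorted(intervals):
        by_contig.setdefault(contig, []).append((start, end))
    gaps = []
    for contig, length in contig_lengths.items():
        next_free = 1  # first position not yet known to be covered
        for start, end in by_contig.get(contig, []):
            if start > next_free:
                gaps.append((contig, next_free, start - 1))
            next_free = max(next_free, end + 1)
        if next_free <= length:
            gaps.append((contig, next_free, length))
    return gaps


header_lengths = {"chr1": 501, "chr2": 401}
covered = [("chr1", 1, 100), ("chr2", 100, 100)]
print(invert(covered, header_lengths))
# [('chr1', 101, 501), ('chr2', 1, 99), ('chr2', 101, 401)]
```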


--MAX_RECORDS_IN_RAM

When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.

Integer  500000  [ [ -∞  ∞ ] ]


--OUTPUT / -O

The output interval list file to write (if SCATTER_COUNT == 1) or the directory into which to write the scattered interval sub-directories (if SCATTER_COUNT > 1).

PicardHtsPath  null


--OUTPUT_VALUE

What value to output to COUNT_OUTPUT file or stdout (for scripting). If COUNT_OUTPUT is provided, this parameter must not be NONE.

The --OUTPUT_VALUE argument is an enumerated type (Output), which can have one of the following values:

NONE
BASES
INTERVALS

Output  NONE
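A Python sketch of the two counts, applied to the intervals from the example file in the Details section. It is hypothetical, not Picard's code, and assumes BASES is the sum of interval lengths (end - start + 1); the example intervals do not overlap, so double-counting of overlaps does not arise here.

```python
def count_output(intervals, output_value):
    """Value reported for OUTPUT_VALUE on a list of 1-based, closed intervals.

    Assumption: BASES sums interval lengths, so overlapping intervals would be
    counted more than once; NONE reports nothing, represented here as None.
    """
    if output_value == "BASES":
        return sum(end - start + 1 for _, start, end in intervals)
    if output_value == "INTERVALS":
        return len(intervals)
    if output_value == "NONE":
        return None
    raise ValueError(f"unknown OUTPUT_VALUE: {output_value}")


final_list = [("chr1", 1, 100), ("chr2", 100, 100)]
print(count_output(final_list, "BASES"))      # 101
print(count_output(final_list, "INTERVALS"))  # 2
```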


--PADDING

The amount to pad each end of the intervals by before other operations are undertaken. Negative numbers are allowed and indicate intervals should be shrunk. Resulting intervals < 0 bases long will be removed. Padding is applied to the interval lists (both INPUT and SECOND_INPUT, if provided) before the ACTION is performed.

int  0  [ [ -∞  ∞ ] ]
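A Python sketch of PADDING, following the description above: each end moves outward by the padding (inward when negative), and intervals that end up less than 0 bases long are dropped. The function name is hypothetical, and the sketch ignores contig boundaries, which a real tool has to respect.

```python
def pad(intervals, padding):
    """Pad (or, with a negative value, shrink) each 1-based, closed interval.

    Contig boundaries are ignored in this sketch.
    """
    padded = []
    for contig, start, end in intervals:
        new_start, new_end = start - padding, end + padding
        # Per the PADDING description, drop intervals shorter than 0 bases.
        if new_end - new_start + 1 >= 0:
            padded.append((contig, new_start, new_end))
    return padded


targets = [("chr1", 100, 109), ("chr1", 200, 201)]
print(pad(targets, 10))  # [('chr1', 90, 119), ('chr1', 190, 211)]
print(pad(targets, -3))  # [('chr1', 103, 106)]
```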


--QUIET

Whether to suppress job-summary info on System.err.

Boolean  false


--REFERENCE_SEQUENCE / -R

Reference sequence file.

PicardHtsPath  null


--SCATTER_CONTENT

When scattering with this argument, each of the resultant files will (ideally) have this amount of 'content', which means either base-counts or interval-counts depending on SUBDIVISION_MODE. When provided, overrides SCATTER_COUNT.

Integer  null


--SCATTER_COUNT

The number of files into which to scatter the resulting list by locus; in some situations, fewer files may be emitted.

int  1  [ [ -∞  ∞ ] ]


--SECOND_INPUT / -SI

Second set of intervals for operations that compare two sets, such as SUBTRACT, SYMDIFF, and OVERLAPS.

List[PicardHtsPath]  []


--showHidden / -showHidden

display hidden arguments

boolean  false


--SORT

If true, sort the resulting interval list by coordinate.

boolean  true


--SUBDIVISION_MODE / -M

The mode used to scatter the interval list.

The --SUBDIVISION_MODE argument is an enumerated type (IntervalListScatterMode), which can have one of the following values:

INTERVAL_SUBDIVISION
Scatter the interval list into similarly sized interval lists (by base count), breaking up intervals as needed.
BALANCING_WITHOUT_INTERVAL_SUBDIVISION
Scatter the interval list into similarly sized interval lists (by base count), but without breaking up intervals.
BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW
Scatter the interval list into similarly sized interval lists (by base count), but without breaking up intervals. Will overflow current interval list so that the remaining lists will not have too many bases to deal with.
INTERVAL_COUNT
Scatter the interval list into similarly sized interval lists (by interval count, not by base count). Resulting interval lists will contain the same number of intervals except for the last, which contains the remainder.
INTERVAL_COUNT_WITH_DISTRIBUTED_REMAINDER
Scatter the interval list into similarly sized interval lists (by interval count, not by base count). Resulting interval lists will contain a similar number of intervals.

IntervalListScatterMode  INTERVAL_SUBDIVISION
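A Python sketch of INTERVAL_COUNT_WITH_DISTRIBUTED_REMAINDER. It assumes the intervals are split in their existing order and that the remainder is handed out one extra interval at a time to the first lists; which lists receive the extras, and capping the number of lists at the number of intervals, are assumptions of this sketch. The function name is hypothetical. By contrast, the INTERVAL_COUNT mode places the remainder in the last list.

```python
def scatter_by_count(intervals, scatter_count):
    """Split intervals, in order, into at most scatter_count lists whose sizes
    differ by at most one interval (the remainder goes to the first lists)."""
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    if not intervals:
        return []
    n_lists = min(scatter_count, len(intervals))  # avoid emitting empty lists
    per_list, remainder = divmod(len(intervals), n_lists)
    lists, offset = [], 0
    for index in range(n_lists):
        size = per_list + (1 if index < remainder else 0)
        lists.append(intervals[offset:offset + size])
        offset += size
    return lists


intervals = [("chr1", 10 * i + 1, 10 * i + 5) for i in range(10)]
print([len(chunk) for chunk in scatter_by_count(intervals, 3)])  # [4, 3, 3]
```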


--TMP_DIR

One or more directories with space available to be used by this program for temporary storage of working files

List[File]  []


--UNIQUE

If true, merge overlapping and adjacent intervals to create a list of unique intervals. Implies SORT=true.

boolean  false


--USE_JDK_DEFLATER / -use_jdk_deflater

Use the JDK Deflater instead of the Intel Deflater for writing compressed output

Boolean  false


--USE_JDK_INFLATER / -use_jdk_inflater

Use the JDK Inflater instead of the Intel Inflater for reading compressed input

Boolean  false


--VALIDATION_STRINGENCY

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values:

STRICT
LENIENT
SILENT

ValidationStringency  STRICT


--VERBOSITY

Control verbosity of logging.

The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version

display the version number for this tool

boolean  false



GATK version 4.6.2.0 built at Sun, 13 Apr 2025 13:21:43 -0400.