Showing tool doc from version 4.6.2.0 | The latest version is 4.6.2.0

MergeBamAlignment (Picard)

Merge alignment data from a SAM or BAM with data in an unmapped BAM file.


Category: Read Data Manipulation


Overview

Summary

A command-line tool for merging BAM/SAM alignment info from a third-party aligner with the data in an unmapped BAM file, producing a third BAM file that has alignment data (from the aligner) and all the remaining data from the unmapped BAM. Quick note: this is not a tool for taking multiple SAM files and merging them into one bigger file. For that use case, see MergeSamFiles.

Details

Many alignment tools (still!) require FASTQ input. The unmapped BAM may contain useful information that would be lost in the conversion to FASTQ: metadata such as sample alias, library, and barcodes, as well as read-level tags. This tool takes an unaligned BAM with metadata and the aligned BAM produced by running SamToFastq and then passing the result to an aligner/mapper. It produces a new SAM or BAM file that includes all aligned and unaligned reads and also carries forward additional read attributes from the unmapped BAM (attributes that are otherwise lost when converting to FASTQ). The resulting file is valid for use by Picard and GATK tools. The output may be coordinate-sorted, in which case the NM, MD, and UQ tags are calculated and populated, or queryname-sorted, in which case these tags are not calculated or populated.
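To see why the metadata is lost, consider a naive conversion of one unmapped SAM record to FASTQ. This is an illustration only, not how SamToFastq is implemented; the record values and the `sam_record_to_fastq` helper are hypothetical. Only the read name (column 1), sequence (column 10), and qualities (column 11) survive, while the RG and BC tags are dropped. Those are the kinds of attributes MergeBamAlignment restores from the unmapped BAM.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical unmapped read 1 of a pair (FLAG 77 = paired, unmapped, mate unmapped, first of pair).
# Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL, then optional tags.
record=$'read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\tRG:Z:lane1\tBC:Z:ACGTAC'

# Naive SAM-to-FASTQ conversion: keeps name, sequence and qualities; every tag is discarded.
sam_record_to_fastq() {
  awk -F'\t' '{ print "@" $1; print $10; print "+"; print $11 }'
}

fastq=$(printf '%s\n' "$record" | sam_record_to_fastq)
printf '%s\n' "$fastq"
```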

Usage example:

java -jar picard.jar MergeBamAlignment \
      ALIGNED=aligned.bam \
      UNMAPPED=unmapped.bam \
      O=merge_alignments.bam \
      R=reference_sequence.fasta
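When the tool is run through the GATK launcher, arguments take the `--NAME value` form listed in the argument table rather than the legacy `NAME=value` form used in the usage example. Below is a sketch of the same merge in that form, assuming `gatk` is on your PATH. The file names are placeholders, and if the launcher is not found the script only prints the command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Long argument names from the argument table; file names are placeholders.
args=(
  --ALIGNED_BAM aligned.bam
  --UNMAPPED_BAM unmapped.bam
  --OUTPUT merge_alignments.bam
  --REFERENCE_SEQUENCE reference_sequence.fasta
)

if command -v gatk >/dev/null 2>&1; then
  gatk MergeBamAlignment "${args[@]}"
else
  # Launcher not found: print the command instead of running it.
  echo "gatk MergeBamAlignment ${args[*]}"
fi
```

The short names from the table (`-ALIGNED`, `-UNMAPPED`, `-O`, `-R`) can be used in place of the long ones.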
 

Note about required arguments

The aligned reads must be specified using either ALIGNED_BAM or both READ1_ALIGNED_BAM and READ2_ALIGNED_BAM. If aligned reads are not specified in one of these ways, the tool will not run.
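A sketch of the second form, with read 1 and read 2 alignments in separate files. It assumes picard.jar is at `$PICARD_JAR` (defaulting to `picard.jar`), and `read1_aligned.bam`/`read2_aligned.bam` are placeholder names. If the jar is missing, the script prints the command instead of running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: location of picard.jar; file names are placeholders.
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# READ1_ALIGNED_BAM/READ2_ALIGNED_BAM cannot be combined with ALIGNED_BAM.
args=(
  READ1_ALIGNED_BAM=read1_aligned.bam
  READ2_ALIGNED_BAM=read2_aligned.bam
  UNMAPPED_BAM=unmapped.bam
  OUTPUT=merge_alignments.bam
  REFERENCE_SEQUENCE=reference_sequence.fasta
)

if [[ -f "$PICARD_JAR" ]]; then
  java -jar "$PICARD_JAR" MergeBamAlignment "${args[@]}"
else
  # Jar not found: print the command instead of running it.
  echo "java -jar $PICARD_JAR MergeBamAlignment ${args[*]}"
fi
```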

Caveats

This tool has been in development for a while, and many arguments have been added to it over the years. You may be particularly interested in the following (partial) list:

Author: ktibbett@broadinstitute.org

MergeBamAlignment (Picard) specific arguments

This table summarizes the command-line arguments for this tool, including the optional common arguments shared with other tools. For more details on each argument, see the argument details list below the table.

Argument name(s) (default value): Summary
Required Arguments
--OUTPUT / -O: Merged SAM or BAM file to write to.
--REFERENCE_SEQUENCE / -R: Reference sequence file.
--UNMAPPED_BAM / -UNMAPPED: Original SAM or BAM file of unmapped reads, which must be in queryname order. Reads MUST be unmapped.
Optional Tool Arguments
--ADD_MATE_CIGAR / -MC (default: true): Whether to add the mate CIGAR (MC) tag.
--ALIGNED_BAM / -ALIGNED: SAM or BAM file(s) with alignment data.
--ALIGNED_READS_ONLY (default: false): Whether to output only aligned reads.
--ALIGNER_PROPER_PAIR_FLAGS (default: false): Use the aligner's determination of which reads form proper pairs instead of computing it in this program.
--arguments_file: Read one or more argument files and add them to the command line.
--ATTRIBUTES_TO_REMOVE: Attributes from the alignment record that should be removed when merging. This takes precedence over ATTRIBUTES_TO_RETAIN for any tag listed in both.
--ATTRIBUTES_TO_RETAIN: Reserved alignment attributes (tags starting with X, Y, or Z) that should be brought over from the alignment data when merging.
--ATTRIBUTES_TO_REVERSE / -RV (default: [OQ, U2]): Attributes on negative-strand reads that need to be reversed.
--ATTRIBUTES_TO_REVERSE_COMPLEMENT / -RC (default: [E2, SQ]): Attributes on negative-strand reads that need to be reverse complemented.
--CLIP_ADAPTERS (default: true): Whether to clip adapters where identified.
--CLIP_OVERLAPPING_READS (default: true): For paired reads, clip the 3' end of each read if necessary so that it does not extend past the 5' end of its mate. Reads are first soft clipped so that the 3' aligned end of each read does not extend past the 5' aligned end of its mate. If HARD_CLIP_OVERLAPPING_READS is also true, reads are additionally hard clipped so that the 3' unclipped end of each read does not extend past the 5' unclipped end of its mate. Hard-clipped bases and their qualities are stored in the XB and XQ tags, respectively.
--EXPECTED_ORIENTATIONS / -ORIENTATIONS: The expected orientation(s) of proper read pairs. Replaces JUMP_SIZE.
--HARD_CLIP_OVERLAPPING_READS (default: false): If true, hard clipping is applied to overlapping reads. By default, soft clipping is used.
--help / -h (default: false): Display the help message.
--INCLUDE_SECONDARY_ALIGNMENTS (default: true): If false, do not write secondary alignments to output.
--IS_BISULFITE_SEQUENCE (default: false): Whether the lane is bisulfite sequenced (used when calculating the NM tag).
--JUMP_SIZE / -JUMP: The expected jump size (required if this is a jumping library). Deprecated; use EXPECTED_ORIENTATIONS instead.
--MATCHING_DICTIONARY_TAGS (default: [M5, LN]): List of sequence record (@SQ) tags that must be equal (if present) in the reference dictionary and in the aligned file. A mismatch in a tag on this list causes an error; a mismatch in any other tag only produces a warning.
--MAX_INSERTIONS_OR_DELETIONS / -MAX_GAPS (default: 1): The maximum number of insertions or deletions permitted for an alignment to be included. Alignments with more insertions or deletions are ignored. Set to -1 to allow any number of insertions or deletions.
--MIN_UNCLIPPED_BASES (default: 32): If UNMAP_CONTAMINANT_READS is set, require this many unclipped bases; otherwise the read is marked as a contaminant.
--PAIRED_RUN / -PE (default: true): DEPRECATED. This argument is ignored and will be removed.
--PRIMARY_ALIGNMENT_STRATEGY (default: BestMapq): Strategy for selecting the primary alignment when the aligner has provided more than one alignment for a pair or fragment and none is marked as primary, more than one is marked as primary, or the primary alignment is filtered out for some reason. For all strategies, ties are resolved arbitrarily.
--PROGRAM_GROUP_COMMAND_LINE / -PG_COMMAND: The command line of the program group (if not supplied by the aligned file).
--PROGRAM_GROUP_NAME / -PG_NAME: The name of the program group (if not supplied by the aligned file).
--PROGRAM_GROUP_VERSION / -PG_VERSION: The version of the program group (if not supplied by the aligned file).
--PROGRAM_RECORD_ID / -PG: The program group ID of the aligner (if not supplied by the aligned file).
--READ1_ALIGNED_BAM / -R1_ALIGNED: SAM or BAM file(s) with alignment data from the first read of a pair.
--READ1_TRIM / -R1_TRIM (default: 0): The number of bases trimmed from the beginning of read 1 prior to alignment.
--READ2_ALIGNED_BAM / -R2_ALIGNED: SAM or BAM file(s) with alignment data from the second read of a pair.
--READ2_TRIM / -R2_TRIM (default: 0): The number of bases trimmed from the beginning of read 2 prior to alignment.
--SORT_ORDER / -SO (default: coordinate): The order in which the merged reads should be output.
--UNMAP_CONTAMINANT_READS / -UNMAP_CONTAM (default: false): Detect reads originating from foreign organisms (e.g., bacterial DNA in a non-bacterial sample), and unmap and label those reads accordingly.
--UNMAPPED_READ_STRATEGY (default: DO_NOT_CHANGE): How to deal with alignment information in reads that are being unmapped (e.g., due to cross-species contamination). Currently ignored unless UNMAP_CONTAMINANT_READS is true. Note that the DO_NOT_CHANGE strategy actually resets the CIGAR and sets the mapping quality on unmapped reads, since otherwise the result would be an invalid record. To force no change, use the DO_NOT_CHANGE_INVALID strategy.
--version (default: false): Display the version number for this tool.
Optional Common Arguments
--ADD_PG_TAG_TO_READS (default: true): Add a PG tag to each read in a SAM or BAM.
--COMPRESSION_LEVEL (default: 5): Compression level for all compressed files created (e.g., BAM and VCF).
--CREATE_INDEX (default: false): Whether to create an index when writing VCF or coordinate-sorted BAM output.
--CREATE_MD5_FILE (default: false): Whether to create an MD5 digest for any BAM or FASTQ files created.
--MAX_RECORDS_IN_RAM (default: 500000): When writing files that need to be sorted, the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file and increases the amount of RAM needed.
--QUIET (default: false): Whether to suppress job-summary info on System.err.
--TMP_DIR: One or more directories with space available to be used by this program for temporary storage of working files.
--USE_JDK_DEFLATER / -use_jdk_deflater (default: false): Use the JDK Deflater instead of the Intel Deflater for writing compressed output.
--USE_JDK_INFLATER / -use_jdk_inflater (default: false): Use the JDK Inflater instead of the Intel Inflater for reading compressed input.
--VALIDATION_STRINGENCY (default: STRICT): Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--VERBOSITY (default: INFO): Control verbosity of logging.
Advanced Arguments
--showHidden (default: false): Display hidden arguments.

Argument details

This list describes every argument in the table above, in alphabetical order, including the optional common arguments (such as COMPRESSION_LEVEL and VALIDATION_STRINGENCY) that are shared with other tools.


--ADD_MATE_CIGAR / -MC

Whether to add the mate CIGAR (MC) tag.

Boolean  true


--ADD_PG_TAG_TO_READS

Add a PG tag to each read in a SAM or BAM.

boolean  true


--ALIGNED_BAM / -ALIGNED

SAM or BAM file(s) with alignment data.

Exclusion: This argument cannot be used at the same time as READ1_ALIGNED_BAM, READ2_ALIGNED_BAM.

List[File]  []


--ALIGNED_READS_ONLY

Whether to output only aligned reads.

boolean  false


--ALIGNER_PROPER_PAIR_FLAGS

Use the aligner's determination of which reads form proper pairs instead of computing it in this program.

boolean  false


--arguments_file

Read one or more argument files and add them to the command line.

List[File]  []


--ATTRIBUTES_TO_REMOVE

Attributes from the alignment record that should be removed when merging. This takes precedence over ATTRIBUTES_TO_RETAIN for any tag listed in both.

List[String]  []


--ATTRIBUTES_TO_RETAIN

Reserved alignment attributes (tags starting with X, Y, or Z) that should be brought over from the alignment data when merging.

List[String]  []
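Because ATTRIBUTES_TO_REMOVE takes precedence over ATTRIBUTES_TO_RETAIN, the tags actually retained are the retain list minus the remove list. The hypothetical `effective_retained` helper below just computes that set difference on space-separated tag lists, as an aid for reasoning about a configuration; it is not Picard code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: tags from RETAIN that are not also listed in REMOVE (REMOVE wins).
# Usage: effective_retained "RETAIN TAGS" "REMOVE TAGS"
effective_retained() {
  local retain=$1 remove=$2 tag result=''
  for tag in $retain; do
    if [[ " $remove " != *" $tag "* ]]; then
      result+="${result:+ }$tag"
    fi
  done
  printf '%s\n' "$result"
}

effective_retained 'XS XA X0' 'XA'   # prints: XS X0
```

With the legacy syntax, a list argument is given by repeating it, e.g. `ATTRIBUTES_TO_RETAIN=XS ATTRIBUTES_TO_RETAIN=X0`.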


--ATTRIBUTES_TO_REVERSE / -RV

Attributes on negative strand reads that need to be reversed.

Set[String]  [OQ, U2]


--ATTRIBUTES_TO_REVERSE_COMPLEMENT / -RC

Attributes on negative strand reads that need to be reverse complemented.

Set[String]  [E2, SQ]
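For reads aligned to the negative strand, SEQ is stored reverse-complemented, so per-base tags carried over in the original read orientation must be transformed to stay in register with it. Quality-like tags such as OQ and U2 are only reversed, while base-like tags such as E2 are reverse complemented. The hypothetical helpers `reverse_attr` and `revcomp_attr` below only illustrate the two transformations on plain strings; they are not Picard's implementation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Reverse a string in pure bash (avoids depending on the `rev` utility).
reverse_string() {
  local s=$1 out='' i
  for (( i = ${#s} - 1; i >= 0; i-- )); do
    out+=${s:i:1}
  done
  printf '%s\n' "$out"
}

# Quality-like tags (e.g. OQ): reverse only.
reverse_attr() {
  reverse_string "$1"
}

# Base-like tags (e.g. E2): reverse, then complement each base.
revcomp_attr() {
  reverse_string "$1" | tr 'ACGTNacgtn' 'TGCANtgcan'
}

reverse_attr 'IIII#'   # prints: #IIII
revcomp_attr 'ACGTN'   # prints: NACGT
```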


--CLIP_ADAPTERS

Whether to clip adapters where identified.

boolean  true


--CLIP_OVERLAPPING_READS

For paired reads, clip the 3' end of each read if necessary so that it does not extend past the 5' end of its mate. Reads are first soft clipped so that the 3' aligned end of each read does not extend past the 5' aligned end of its mate. If HARD_CLIP_OVERLAPPING_READS is also true, then reads are additionally hard clipped so that the 3' unclipped end of each read does not extend past the 5' unclipped end of its mate. Hard clipped bases and their qualities are stored in the XB and XQ tags, respectively.

boolean  true
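The arithmetic behind CLIP_OVERLAPPING_READS can be sketched for an FR pair using 1-based, inclusive aligned coordinates. The forward read's 3' end is its alignment end, and the reverse read's 5' end is also its alignment end. Likewise, the reverse read's 3' end is its alignment start, and the forward read's 5' end is its alignment start. The `overlap_clip` function is a simplified, hypothetical illustration. It ignores existing clipping, indels, and the hard-clipping variant, and it is not Picard's implementation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simplified FR-pair overlap clipping (hypothetical helper).
# Args: forward read start/end, reverse read start/end (1-based, inclusive).
# Prints: "<bases to clip from forward 3' end> <bases to clip from reverse 3' end>".
overlap_clip() {
  local fwd_start=$1 fwd_end=$2 rev_start=$3 rev_end=$4
  local clip_fwd=$(( fwd_end - rev_end ))      # forward 3' end past the mate's 5' end
  local clip_rev=$(( fwd_start - rev_start ))  # reverse 3' end past the mate's 5' end
  if (( clip_fwd < 0 )); then clip_fwd=0; fi
  if (( clip_rev < 0 )); then clip_rev=0; fi
  echo "$clip_fwd $clip_rev"
}

overlap_clip 100 250 90 240    # short insert: prints "10 10"
overlap_clip 100 250 300 450   # no read-through: prints "0 0"
```

In the first call, the 151-base reads overhang each other's 5' ends by 10 bases (typically adapter read-through on a short insert), so each would lose 10 bases from its 3' end.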


--COMPRESSION_LEVEL

Compression level for all compressed files created (e.g. BAM and VCF).

int  5


--CREATE_INDEX

Whether to create an index when writing VCF or coordinate sorted BAM output.

Boolean  false


--CREATE_MD5_FILE

Whether to create an MD5 digest for any BAM or FASTQ files created.

boolean  false


--EXPECTED_ORIENTATIONS / -ORIENTATIONS

The expected orientation(s) of proper read pairs. Replaces JUMP_SIZE.

Exclusion: This argument cannot be used at the same time as JUMP_SIZE.

The --EXPECTED_ORIENTATIONS argument takes a list of PairOrientation values (List[PairOrientation]); each value can be one of the following:

FR
RF
TANDEM

List[PairOrientation]  []
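The three PairOrientation values describe how the two reads of a pair are arranged. The hypothetical `pair_orientation` function below classifies a pair under a common convention. If both reads are on the same strand, the pair is TANDEM. Otherwise, it is FR if the forward-strand read's 5' end (its alignment start) lies left of the reverse-strand read's 5' end (its alignment end), and RF otherwise. It is a sketch for intuition only; Picard may handle edge cases, such as equal positions, differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical classifier.
# Args: strand of read A (+/-), strand of read B (+/-),
#       5' position of the forward-strand read, 5' position of the reverse-strand read.
pair_orientation() {
  local strand_a=$1 strand_b=$2 fwd_five_prime=${3:-0} rev_five_prime=${4:-0}
  if [[ $strand_a == "$strand_b" ]]; then
    echo TANDEM            # both reads on the same strand
  elif (( fwd_five_prime < rev_five_prime )); then
    echo FR                # reads point toward each other
  else
    echo RF                # reads point away from each other
  fi
}

pair_orientation + - 100 400   # prints: FR
pair_orientation + - 400 100   # prints: RF
pair_orientation - -           # prints: TANDEM
```

A standard paired-end library is FR-oriented.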


--HARD_CLIP_OVERLAPPING_READS

If true, hard clipping will be applied to overlapping reads. By default, soft clipping is used.

boolean  false


--help / -h

Display the help message.

boolean  false


--INCLUDE_SECONDARY_ALIGNMENTS

If false, do not write secondary alignments to output.

boolean  true


--IS_BISULFITE_SEQUENCE

Whether the lane is bisulfite sequenced (used when calculating the NM tag).

boolean  false


--JUMP_SIZE / -JUMP

The expected jump size (required if this is a jumping library). Deprecated; use EXPECTED_ORIENTATIONS instead.

Exclusion: This argument cannot be used at the same time as EXPECTED_ORIENTATIONS.

Integer  null


--MATCHING_DICTIONARY_TAGS

List of sequence record (@SQ) tags that must be equal (if present) in the reference dictionary and in the aligned file. A mismatch in a tag on this list causes an error; a mismatch in any other tag only produces a warning.

List[String]  [M5, LN]


--MAX_INSERTIONS_OR_DELETIONS / -MAX_GAPS

The maximum number of insertions or deletions permitted for an alignment to be included. Alignments with more than this many insertions or deletions will be ignored. Set to -1 to allow any number of insertions or deletions.

int  1
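A sketch of how the MAX_INSERTIONS_OR_DELETIONS limit can be read. It assumes that each I or D element in the CIGAR string counts as one insertion or deletion, regardless of its length. The `count_indel_ops` and `gap_filter` helpers are hypothetical and are not Picard code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count I and D elements in a CIGAR string (assumption: one per element, not per base).
count_indel_ops() {
  local cigar=$1 n=0
  local re='^([0-9]+)([MIDNSHP=X])(.*)$'
  while [[ $cigar =~ $re ]]; do
    case ${BASH_REMATCH[2]} in
      I|D) n=$(( n + 1 )) ;;
    esac
    cigar=${BASH_REMATCH[3]}
  done
  echo "$n"
}

# Usage: gap_filter CIGAR MAX_GAPS -> prints "keep" or "ignore" (-1 allows any number).
gap_filter() {
  local cigar=$1 max=$2
  if (( max == -1 )) || (( $(count_indel_ops "$cigar") <= max )); then
    echo keep
  else
    echo ignore
  fi
}

gap_filter 50M2I30M1D20M 1    # two indel elements with the default limit of 1: prints "ignore"
gap_filter 50M2I30M1D20M -1   # unlimited: prints "keep"
```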


--MAX_RECORDS_IN_RAM

When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.

Integer  500000


--MIN_UNCLIPPED_BASES

If UNMAP_CONTAMINANT_READS is set, require this many unclipped bases; otherwise the read is marked as a contaminant.

int  32


--OUTPUT / -O

Merged SAM or BAM file to write to.

File  null  (required)


--PAIRED_RUN / -PE

DEPRECATED. This argument is ignored and will be removed.

Boolean  true


--PRIMARY_ALIGNMENT_STRATEGY

Strategy for selecting the primary alignment when the aligner has provided more than one alignment for a pair or fragment and none is marked as primary, more than one is marked as primary, or the primary alignment is filtered out for some reason. For all strategies, ties are resolved arbitrarily.

The --PRIMARY_ALIGNMENT_STRATEGY argument is an enumerated type (PrimaryAlignmentStrategy), which can have one of the following values:

BestMapq
Expects multiple alignments to be correlated via the HI tag and, in the absence of a primary selected by the aligner, prefers the pair of alignments with the largest MAPQ.
EarliestFragment
Prefers the alignment which maps the earliest base in the read. Note that EarliestFragment may not be used for paired reads.
BestEndMapq
Appropriate for cases in which the aligner is not pair-aware, and does not output the HI tag. It simply picks the alignment for each end with the highest MAPQ, and makes those alignments primary, regardless of whether the two alignments make sense together.
MostDistant
Appropriate for a non-pair-aware aligner. Picks the alignment pair with the largest insert size. If all alignments would be chimeric, it picks the alignments for each end with the best MAPQ.

PrimaryAlignmentStrategy  BestMapq
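BestEndMapq can be pictured as a per-end maximum over MAPQ. The sketch below uses a made-up tab-separated input (read name, end, MAPQ, position) rather than SAM records, and keeps the first candidate on a tie. The `best_end_mapq` helper is hypothetical and ignores everything else Picard considers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical BestEndMapq-style selection.
#   stdin:  QNAME <TAB> END (1 or 2) <TAB> MAPQ <TAB> POS, one line per candidate alignment
#   stdout: the highest-MAPQ candidate for each QNAME/END pair, sorted
best_end_mapq() {
  awk -F'\t' -v OFS='\t' '
    {
      key = $1 OFS $2
      if (!(key in best) || $3 + 0 > best[key]) {
        best[key] = $3 + 0
        line[key] = $0
      }
    }
    END { for (k in line) print line[k] }
  ' | sort
}

printf 'r1\t1\t10\t100\nr1\t1\t60\t5000\nr1\t2\t30\t300\n' | best_end_mapq
```

Here end 1 of r1 resolves to the MAPQ 60 alignment at position 5000 and end 2 to its only candidate at position 300, whether or not those two positions make sense as a pair. That is the caveat noted for BestEndMapq.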


--PROGRAM_GROUP_COMMAND_LINE / -PG_COMMAND

The command line of the program group (if not supplied by the aligned file).

String  null


--PROGRAM_GROUP_NAME / -PG_NAME

The name of the program group (if not supplied by the aligned file).

String  null


--PROGRAM_GROUP_VERSION / -PG_VERSION

The version of the program group (if not supplied by the aligned file).

String  null


--PROGRAM_RECORD_ID / -PG

The program group ID of the aligner (if not supplied by the aligned file).

String  null


--QUIET

Whether to suppress job-summary info on System.err.

Boolean  false


--READ1_ALIGNED_BAM / -R1_ALIGNED

SAM or BAM file(s) with alignment data from the first read of a pair.

Exclusion: This argument cannot be used at the same time as ALIGNED_BAM.

List[File]  []


--READ1_TRIM / -R1_TRIM

The number of bases trimmed from the beginning of read 1 prior to alignment.

int  0
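READ1_TRIM and READ2_TRIM tell the tool how many bases were removed from the start of each read before alignment. As a hypothetical example, the awk-based `trim_fastq_5prime` helper below drops the first N bases and their qualities from a FASTQ with exactly four lines per record. After trimming read 1 this way with N=3, the merge would be run with READ1_TRIM=3.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: remove the first N bases (and matching qualities) from every
# record of a 4-line-per-record FASTQ on stdin.
trim_fastq_5prime() {
  local n=$1
  awk -v n="$n" '
    NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, n + 1) }   # sequence and quality lines
    { print }
  '
}

printf '@r1\nACGTACGT\n+\nIIIIJJJJ\n' | trim_fastq_5prime 3
```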


--READ2_ALIGNED_BAM / -R2_ALIGNED

SAM or BAM file(s) with alignment data from the second read of a pair.

Exclusion: This argument cannot be used at the same time as ALIGNED_BAM.

List[File]  []


--READ2_TRIM / -R2_TRIM

The number of bases trimmed from the beginning of read 2 prior to alignment.

int  0


--REFERENCE_SEQUENCE / -R

Reference sequence file.

PicardHtsPath  null  (required)


--showHidden / -showHidden

Display hidden arguments.

boolean  false


--SORT_ORDER / -SO

The order in which the merged reads should be output.

The --SORT_ORDER argument is an enumerated type (SortOrder), which can have one of the following values:

unsorted
queryname
coordinate
duplicate
unknown

SortOrder  coordinate
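A sketch of a coordinate-sorted merge (so NM, MD, and UQ are calculated, as noted in the Details section) that also writes an index and directs sort spill files to a scratch directory. The scratch path and the MAX_RECORDS_IN_RAM value are arbitrary examples; a larger value needs more Java heap. The `$PICARD_JAR` location and file names are assumptions, and the script prints the command if the jar is missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions: location of picard.jar, placeholder file names, example scratch directory.
PICARD_JAR="${PICARD_JAR:-picard.jar}"
tmp_dir="${TMPDIR:-/tmp}/mergebamalignment_tmp"

args=(
  ALIGNED_BAM=aligned.bam
  UNMAPPED_BAM=unmapped.bam
  OUTPUT=merge_alignments.bam
  REFERENCE_SEQUENCE=reference_sequence.fasta
  SORT_ORDER=coordinate
  CREATE_INDEX=true
  TMP_DIR="$tmp_dir"
  MAX_RECORDS_IN_RAM=1000000
)

if [[ -f "$PICARD_JAR" ]]; then
  mkdir -p "$tmp_dir"
  java -jar "$PICARD_JAR" MergeBamAlignment "${args[@]}"
else
  # Jar not found: print the command instead of running it.
  echo "java -jar $PICARD_JAR MergeBamAlignment ${args[*]}"
fi
```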


--TMP_DIR

One or more directories with space available to be used by this program for temporary storage of working files

List[File]  []


--UNMAP_CONTAMINANT_READS / -UNMAP_CONTAM

Detect reads originating from foreign organisms (e.g., bacterial DNA in a non-bacterial sample), and unmap and label those reads accordingly.

boolean  false


--UNMAPPED_BAM / -UNMAPPED

Original SAM or BAM file of unmapped reads, which must be in queryname order. Reads MUST be unmapped.

File  null  (required)
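Because UNMAPPED_BAM must be in queryname order, it can help to check the header before merging. The hypothetical `is_queryname_sorted` helper reads SAM header text on stdin (for example, from `samtools view -H unmapped.bam`) and succeeds only if the @HD line declares SO:queryname. It checks the declared order only, not the actual record order. If the check fails, the file can be sorted first, for example with Picard SortSam and SORT_ORDER=queryname.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: exit status 0 if the @HD header line declares SO:queryname.
is_queryname_sorted() {
  awk -F'\t' '
    $1 == "@HD" { for (i = 2; i <= NF; i++) if ($i == "SO:queryname") found = 1 }
    END { exit !found }
  '
}

if printf '@HD\tVN:1.6\tSO:queryname\n' | is_queryname_sorted; then
  echo "declared queryname order"
fi
```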


--UNMAPPED_READ_STRATEGY

How to deal with alignment information in reads that are being unmapped (e.g., due to cross-species contamination). Currently ignored unless UNMAP_CONTAMINANT_READS is true. Note that the DO_NOT_CHANGE strategy actually resets the CIGAR and sets the mapping quality on unmapped reads, since otherwise the result would be an invalid record. To force no change, use the DO_NOT_CHANGE_INVALID strategy.

The --UNMAPPED_READ_STRATEGY argument is an enumerated type (UnmappingReadStrategy), which can have one of the following values:

COPY_TO_TAG
DO_NOT_CHANGE
DO_NOT_CHANGE_INVALID
MOVE_TO_TAG

UnmappingReadStrategy  DO_NOT_CHANGE
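A sketch of enabling contaminant unmapping. UNMAPPED_READ_STRATEGY is ignored unless UNMAP_CONTAMINANT_READS is true, so both are set. COPY_TO_TAG is just one of the listed values, chosen here as an example. The `$PICARD_JAR` location and file names are assumptions, and the script prints the command if the jar is not found.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions: location of picard.jar and placeholder file names.
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# UNMAP_CONTAMINANT_READS=true is required for UNMAPPED_READ_STRATEGY to take effect;
# MIN_UNCLIPPED_BASES=32 is the default, shown explicitly.
args=(
  ALIGNED_BAM=aligned.bam
  UNMAPPED_BAM=unmapped.bam
  OUTPUT=merge_alignments.bam
  REFERENCE_SEQUENCE=reference_sequence.fasta
  UNMAP_CONTAMINANT_READS=true
  MIN_UNCLIPPED_BASES=32
  UNMAPPED_READ_STRATEGY=COPY_TO_TAG
)

if [[ -f "$PICARD_JAR" ]]; then
  java -jar "$PICARD_JAR" MergeBamAlignment "${args[@]}"
else
  # Jar not found: print the command instead of running it.
  echo "java -jar $PICARD_JAR MergeBamAlignment ${args[*]}"
fi
```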


--USE_JDK_DEFLATER / -use_jdk_deflater

Use the JDK Deflater instead of the Intel Deflater for writing compressed output

Boolean  false


--USE_JDK_INFLATER / -use_jdk_inflater

Use the JDK Inflater instead of the Intel Inflater for reading compressed input

Boolean  false


--VALIDATION_STRINGENCY

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values:

STRICT
LENIENT
SILENT

ValidationStringency  STRICT


--VERBOSITY

Control verbosity of logging.

The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version

Display the version number for this tool.

boolean  false





GATK version 4.6.2.0 built at Sun, 13 Apr 2025 13:21:43 -0400.