BaitDesigner (Picard)

Designs oligonucleotide baits for hybrid selection reactions.

This tool designs custom bait sets for hybrid selection experiments. BaitDesigner takes three inputs: an interval list (TARGETS) indicating the sequences of interest, e.g. exons with their respective coordinates; a reference sequence (REFERENCE_SEQUENCE); and a unique identifier string (DESIGN_NAME).

The tool outputs interval_list files of both bait and target sequences, as well as the actual bait sequences in FASTA format. By default, at least two baits are output for each target sequence (MINIMUM_BAITS_PER_TARGET), with more for larger intervals. Although the default values for bait size (120 bases) and offset (80 bases) are suitable for most applications, these values can be customized. The offset is the distance between the start of one bait and the start of the next on a contiguous stretch of target DNA sequence.
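
The relationship between bait size, offset, and the number of baits per target can be illustrated with a small Python sketch. It is not Picard's implementation: the function name `tile_baits`, the centering rule, and the 1-based inclusive coordinates are assumptions made purely for illustration.

```python
import math

def tile_baits(target_start, target_end, bait_size=120, bait_offset=80, min_baits=2):
    """Return (start, end) pairs, 1-based inclusive, for baits tiled over a target.

    Illustrative only: the run of baits is centered on the target and may
    hang off either end; Picard's own placement rules may differ.
    """
    target_len = target_end - target_start + 1
    # Number of baits needed so that the run of baits spans the whole target.
    needed = 1 + math.ceil(max(target_len - bait_size, 0) / bait_offset)
    count = max(needed, min_baits)
    # Total span covered by `count` baits whose starts are `bait_offset` apart.
    span = bait_size + bait_offset * (count - 1)
    # Center the span on the target, never starting before position 1.
    first = max(target_start - (span - target_len) // 2, 1)
    return [(first + i * bait_offset, first + i * bait_offset + bait_size - 1)
            for i in range(count)]

baits = tile_baits(1000, 1299)  # a 300-base target
# -> [(970, 1089), (1050, 1169), (1130, 1249), (1210, 1329)]
```

With the defaults, consecutive baits overlap by 40 bases (120 minus 80), and a target shorter than one bait still receives two baits because of the minimum.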

The tool also outputs a pooled set of 55,000 (default) oligonucleotides representing all of the baits redundantly. This redundancy achieves a uniform concentration of oligonucleotides for synthesis by a vendor, as well as equal numbers of each bait to prevent bias during the hybrid selection reaction.
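
As a rough illustration of redundant pooling, the sketch below fills a pool with whole rounds of every bait, alternating forward and reverse-complement rounds as described for FILL_POOLS. The function names are hypothetical, and how Picard treats leftover pool slots or designs larger than one pool is not modeled here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def fill_pool(baits, pool_size=55000):
    """Fill a pool with whole rounds of all baits, alternating fwd and rc rounds.

    Assumption: only complete rounds are added, so every bait ends up with
    the same number of copies and some pool slots may stay unused.
    """
    if not baits or pool_size <= 0:
        return []
    rounds = pool_size // len(baits)
    pool = []
    for r in range(rounds):
        for bait in baits:
            # Even rounds use the forward sequence, odd rounds the reverse complement.
            pool.append(bait if r % 2 == 0 else reverse_complement(bait))
    return pool

pool = fill_pool(["ACGTT", "GGCAT", "TTTAC"], pool_size=10)
print(len(pool))  # 9: three rounds of three baits, one slot left unused
```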

Usage example:

java -jar picard.jar BaitDesigner \
TARGETS=targets.interval_list \
DESIGN_NAME=new_baits \
R=reference_sequence.fasta
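
For scripted runs, the command can be assembled programmatically. The following Python sketch builds the argument list in the same KEY=VALUE style, using only argument names listed in the table below; the helper name `bait_designer_command` and the location of `picard.jar` are assumptions. It only constructs and prints the command and does not run Picard.

```python
import shlex

def bait_designer_command(targets, design_name, reference, jar="picard.jar", **options):
    """Build a BaitDesigner argument list in KEY=VALUE style (hypothetical helper)."""
    cmd = ["java", "-jar", jar, "BaitDesigner",
           f"TARGETS={targets}",
           f"DESIGN_NAME={design_name}",
           f"R={reference}"]
    # Optional arguments, e.g. BAIT_SIZE=100; Python booleans become true/false.
    for key, value in options.items():
        if isinstance(value, bool):
            value = str(value).lower()
        cmd.append(f"{key}={value}")
    return cmd

cmd = bait_designer_command("targets.interval_list", "new_baits",
                            "reference_sequence.fasta",
                            BAIT_SIZE=100, BAIT_OFFSET=50, FILL_POOLS=False)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.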

Category Reference


Overview

Designs baits for hybrid selection. Author: Tim Fennell

BaitDesigner (Picard) specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

Argument name(s) | Default value | Summary

Required Arguments
--DESIGN_NAME | | The name of the bait design
--REFERENCE_SEQUENCE, -R | | Reference sequence file.
--TARGETS, -T | | The file with design parameters and targets

Optional Tool Arguments
--arguments_file | | read one or more arguments files and add them to the command line
--BAIT_OFFSET | 80 | The desired offset between the start of one bait and the start of another bait for the same target.
--BAIT_SIZE | 120 | The length of each individual bait to design
--DESIGN_ON_TARGET_STRAND | false | If true, design baits on the strand of the target feature; if false, always design on the + strand of the genome.
--DESIGN_STRATEGY | FixedOffset | The design strategy to use to lay out baits across each target
--FILL_POOLS | true | If true, fill up the pools with alternating fwd and rc copies of all baits. Equal copies of all baits will always be maintained
--help, -h | false | display the help message
--LEFT_PRIMER | ATCGCACCAGCGTGT | The left amplification primer to prepend to all baits for synthesis
--MERGE_NEARBY_TARGETS | true | If true, merge targets that are 'close enough' that designing against a merged target would be more efficient.
--MINIMUM_BAITS_PER_TARGET | 2 | The minimum number of baits to design per target.
--OUTPUT_AGILENT_FILES | true | If true, also output .design.txt files per pool with one line per bait sequence
--OUTPUT_DIRECTORY, -O | | The output directory. If not provided then the DESIGN_NAME will be used as the output directory
--PADDING | 0 | Pad the input targets by this amount when designing baits. Padding is applied on both sides in this amount.
--POOL_SIZE | 55000 | The size of pools or arrays for synthesis. If no pool files are desired, can be set to 0.
--REPEAT_TOLERANCE | 50 | Baits that have more than REPEAT_TOLERANCE soft or hard masked bases will not be allowed
--RIGHT_PRIMER | CACTGCGGCTCCTCA | The right amplification primer to append to all baits for synthesis
--version | false | display the version number for this tool

Optional Common Arguments
--COMPRESSION_LEVEL | 5 | Compression level for all compressed files created (e.g. BAM and VCF).
--CREATE_INDEX | false | Whether to create an index when writing VCF or coordinate sorted BAM output.
--CREATE_MD5_FILE | false | Whether to create an MD5 digest for any BAM or FASTQ files created.
--MAX_RECORDS_IN_RAM | 500000 | When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
--QUIET | false | Whether to suppress job-summary info on System.err.
--TMP_DIR | | One or more directories with space available to be used by this program for temporary storage of working files
--USE_JDK_DEFLATER, -use_jdk_deflater | false | Use the JDK Deflater instead of the Intel Deflater for writing compressed output
--USE_JDK_INFLATER, -use_jdk_inflater | false | Use the JDK Inflater instead of the Intel Inflater for reading compressed input
--VALIDATION_STRINGENCY | STRICT | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--VERBOSITY | INFO | Control verbosity of logging.

Advanced Arguments
--showHidden | false | display hidden arguments

Argument details

The arguments in this list are described in alphabetical order. Keep in mind that some of them are shared with other tools rather than specific to BaitDesigner; these are grouped under Optional Common Arguments in the table above.


--arguments_file

read one or more arguments files and add them to the command line

List[File]  []


--BAIT_OFFSET

The desired offset between the start of one bait and the start of another bait for the same target.

int  80  [ [ -∞  ∞ ] ]


--BAIT_SIZE

The length of each individual bait to design

int  120  [ [ -∞  ∞ ] ]


--COMPRESSION_LEVEL

Compression level for all compressed files created (e.g. BAM and VCF).

int  5  [ [ -∞  ∞ ] ]


--CREATE_INDEX

Whether to create an index when writing VCF or coordinate sorted BAM output.

Boolean  false


--CREATE_MD5_FILE

Whether to create an MD5 digest for any BAM or FASTQ files created.

boolean  false


--DESIGN_NAME

The name of the bait design

R String  null


--DESIGN_ON_TARGET_STRAND

If true, design baits on the strand of the target feature; if false, always design on the + strand of the genome.

boolean  false


--DESIGN_STRATEGY

The design strategy to use to lay out baits across each target

The --DESIGN_STRATEGY argument is an enumerated type (DesignStrategy), which can have one of the following values:

CenteredConstrained
Implementation that "constrains" baits to be within the target region when possible.
FixedOffset
Design that places baits at fixed offsets over targets, allowing them to hang off the ends as dictated by the target size and offset.
Simple
Ultra simple bait design algorithm that just lays down baits starting at the target start position until either the bait start runs off the end of the target or the bait would run off the sequence

DesignStrategy  FixedOffset
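
To make the Simple strategy concrete, here is a minimal Python sketch that follows its one-sentence description literally. It is not Picard's code: the function name, the use of BAIT_OFFSET as the step between bait starts, and the 1-based inclusive coordinates are assumptions.

```python
def simple_design(target_start, target_end, sequence_length,
                  bait_size=120, bait_offset=80):
    """Lay baits from the target start until the bait start passes the target
    end or the bait would extend past the end of the reference sequence."""
    baits = []
    start = target_start
    while start <= target_end and start + bait_size - 1 <= sequence_length:
        baits.append((start, start + bait_size - 1))
        start += bait_offset
    return baits

print(simple_design(1000, 1299, sequence_length=1_000_000))
# -> [(1000, 1119), (1080, 1199), (1160, 1279), (1240, 1359)]
```

Unlike a centered layout, the first bait here always starts exactly at the target start, so any overhang falls entirely on the downstream side.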


--FILL_POOLS

If true, fill up the pools with alternating fwd and rc copies of all baits. Equal copies of all baits will always be maintained

boolean  true


--help / -h

display the help message

boolean  false


--LEFT_PRIMER

The left amplification primer to prepend to all baits for synthesis

String  ATCGCACCAGCGTGT


--MAX_RECORDS_IN_RAM

When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.

Integer  500000  [ [ -∞  ∞ ] ]


--MERGE_NEARBY_TARGETS

If true, merge targets that are 'close enough' that designing against a merged target would be more efficient.

boolean  true


--MINIMUM_BAITS_PER_TARGET

The minimum number of baits to design per target.

int  2  [ [ -∞  ∞ ] ]


--OUTPUT_AGILENT_FILES

If true, also output .design.txt files per pool with one line per bait sequence

boolean  true


--OUTPUT_DIRECTORY / -O

The output directory. If not provided then the DESIGN_NAME will be used as the output directory

File  null


--PADDING

Pad the input targets by this amount when designing baits. Padding is applied on both sides in this amount.

int  0  [ [ -∞  ∞ ] ]
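
The effect of padding on a target interval can be sketched as follows. The function name is hypothetical, coordinates are taken to be 1-based inclusive, and clamping to the bounds of the reference sequence is an assumption rather than documented behavior.

```python
def pad_target(start, end, padding=0, sequence_length=None):
    """Extend a 1-based inclusive interval by `padding` bases on each side."""
    padded_start = max(start - padding, 1)  # assumed: never before position 1
    padded_end = end + padding
    if sequence_length is not None:
        # Assumed: never past the end of the reference sequence.
        padded_end = min(padded_end, sequence_length)
    return padded_start, padded_end

print(pad_target(1000, 1299, padding=50))  # -> (950, 1349)
```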


--POOL_SIZE

The size of pools or arrays for synthesis. If no pool files are desired, can be set to 0.

int  55000  [ [ -∞  ∞ ] ]


--QUIET

Whether to suppress job-summary info on System.err.

Boolean  false


--REFERENCE_SEQUENCE / -R

Reference sequence file.

R PicardHtsPath  null


--REPEAT_TOLERANCE

Baits that have more than REPEAT_TOLERANCE soft or hard masked bases will not be allowed

int  50  [ [ -∞  ∞ ] ]
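
A minimal sketch of such a filter, assuming the common FASTA conventions that soft-masked bases are lowercase and hard-masked bases are N. The function names are hypothetical and Picard's exact counting rules may differ.

```python
def masked_base_count(bait):
    """Count soft-masked (lowercase) and hard-masked (N) bases in a bait."""
    return sum(1 for base in bait if base.islower() or base == "N")

def passes_repeat_filter(bait, repeat_tolerance=50):
    """A bait is kept only if it has at most `repeat_tolerance` masked bases."""
    return masked_base_count(bait) <= repeat_tolerance

# 60 unmasked + 40 soft-masked + 20 hard-masked bases = 120 bases
bait = "ACGT" * 15 + "acgt" * 10 + "N" * 20
print(masked_base_count(bait), passes_repeat_filter(bait))  # 60 False
```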


--RIGHT_PRIMER

The right amplification primer to append to all baits for synthesis

String  CACTGCGGCTCCTCA
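
The sketch below shows how the two primers would flank a bait in the synthesized oligonucleotide, assuming the left primer goes at the start and the right primer at the end. The function name and the placeholder bait sequence are illustrative only.

```python
LEFT_PRIMER = "ATCGCACCAGCGTGT"    # default --LEFT_PRIMER
RIGHT_PRIMER = "CACTGCGGCTCCTCA"   # default --RIGHT_PRIMER

def synthesis_oligo(bait, left=LEFT_PRIMER, right=RIGHT_PRIMER):
    """Flank a bait with the amplification primers: left + bait + right."""
    return left + bait + right

oligo = synthesis_oligo("ACGT" * 30)  # a 120-base placeholder bait
print(len(oligo))  # 150 = 15 + 120 + 15
```

With the default 120-base baits and the default 15-base primers, each oligonucleotide ordered for synthesis would be 150 bases long.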


--showHidden / -showHidden

display hidden arguments

boolean  false


--TARGETS / -T

The file with design parameters and targets

R File  null


--TMP_DIR

One or more directories with space available to be used by this program for temporary storage of working files

List[File]  []


--USE_JDK_DEFLATER / -use_jdk_deflater

Use the JDK Deflater instead of the Intel Deflater for writing compressed output

Boolean  false


--USE_JDK_INFLATER / -use_jdk_inflater

Use the JDK Inflater instead of the Intel Inflater for reading compressed input

Boolean  false


--VALIDATION_STRINGENCY

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values:

STRICT
LENIENT
SILENT

ValidationStringency  STRICT


--VERBOSITY

Control verbosity of logging.

The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version

display the version number for this tool

boolean  false



GATK version 4.6.2.0 built at Sun, 13 Apr 2025 13:21:43 -0400.