DenoiseReadCounts

Denoises read counts to produce denoised copy ratios

Category Copy Number Variant Discovery


Overview

Denoises read counts to produce denoised copy ratios.

Typically, a panel of normals produced by CreateReadCountPanelOfNormals is provided as input. The input counts are then standardized by 1) transforming to fractional coverage, 2) performing optional explicit GC-bias correction (if the panel contains GC-content annotated intervals), 3) filtering intervals to those contained in the panel, 4) dividing by interval medians contained in the panel, 5) dividing by the sample median, and 6) transforming to log2 copy ratio. The result is then denoised by subtracting the projection onto the specified number of principal components from the panel.
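
The steps above can be written compactly. This is a sketch derived only from the description in this overview, not from the tool's source code, and every symbol is introduced here for illustration. Let c_i be the read count in interval i, m_i the panel median for interval i, and U_K the matrix whose columns are the first K principal components of the panel, assumed here to be orthonormal. The sum in the first line runs over all input intervals; the remaining lines apply only to intervals contained in the panel. GC-bias correction is omitted.

```latex
\begin{align*}
f_i &= \frac{c_i}{\sum_j c_j}
    && \text{(fractional coverage)} \\
g_i &= \frac{f_i}{m_i}
    && \text{(divide by panel interval median)} \\
s_i &= \log_2 \frac{g_i}{\operatorname{median}_j(g_j)}
    && \text{(divide by sample median, log2 copy ratio)} \\
\mathbf{d} &= \mathbf{s} - U_K U_K^{\top} \mathbf{s}
    && \text{(subtract projection onto $K$ principal components)}
\end{align*}
```

Here s is the vector of standardized copy ratios and d the vector of denoised copy ratios, which correspond to the two output files.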

If no panel is provided, then the input counts are instead standardized by 1) transforming to fractional coverage, 2) performing optional explicit GC-bias correction (if GC-content annotated intervals are provided), 3) dividing by the sample median, and 4) transforming to log2 copy ratio. No denoising is performed, so the denoised result is simply taken to be identical to the standardized result.
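
As a worked example of the no-panel path, the following sketch standardizes four toy read counts with standard shell tools. The count values are hypothetical, GC-bias correction is skipped, and the script only mirrors the steps listed above; it is not the tool's implementation.

```shell
#!/bin/sh
# Toy read counts for four intervals (hypothetical values, one per line).
counts='100
200
300
400'

# Median count. Fractional coverage divides every count by the same total,
# so count / median(count) equals fractional coverage / its median.
median=$(printf '%s\n' "$counts" | sort -n | awk '
    { v[NR] = $1 }
    END {
        if (NR % 2) { m = v[(NR + 1) / 2] }
        else { m = (v[NR / 2] + v[NR / 2 + 1]) / 2 }
        print m
    }')

# log2 copy ratio per interval, rounded to four decimals.
log2cr=$(printf '%s\n' "$counts" | awk -v m="$median" '
    { printf "%.4f\n", log($1 / m) / log(2) }')

printf '%s\n' "$log2cr"
```

With these counts the sample median is 250, so the ratios are 0.4, 0.8, 1.2, and 1.6, and the script prints -1.3219, -0.3219, 0.2630, and 0.6781. Because no panel is involved, these standardized values would also be the denoised values.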

If performed, explicit GC-bias correction is done by GCBiasCorrector.

Note that the number of principal components used for denoising is set by --number-of-eigensamples; if fewer are available in the input panel, all of them are used. This parameter thus controls the amount of denoising, which ultimately affects the sensitivity of the analysis.
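
One way to explore this trade-off is to denoise the same sample with several settings and compare the results. The sketch below is a dry run that only prints the commands; the values 5, 10, and 20 and the per-run output file names are hypothetical, while the flags are those documented on this page.

```shell
#!/bin/sh
# Dry run: print one DenoiseReadCounts command per eigensample setting.
# The values 5, 10, 20 and the per-run output names are hypothetical.
sweep_commands() {
    for n in 5 10 20; do
        echo gatk DenoiseReadCounts \
            -I sample.counts.hdf5 \
            --count-panel-of-normals panel_of_normals.pon.hdf5 \
            --number-of-eigensamples "$n" \
            --standardized-copy-ratios "sample.n${n}.standardizedCR.tsv" \
            --denoised-copy-ratios "sample.n${n}.denoisedCR.tsv"
    done
}

sweep_commands
```

Each printed line is a complete command, so once the commands look right they can be executed with `sweep_commands | sh`. If the panel holds fewer eigensamples than a requested value, that run uses all of them, as described above.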

See the comments for CreateReadCountPanelOfNormals regarding coverage on sex chromosomes. If sex chromosomes are not excluded from coverage collection, it is strongly recommended that each case sample be denoised only with a panel consisting solely of individuals of the same sex as that sample.

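Inputs

Read counts in genomic intervals for a single case sample, as a TSV or HDF5 file (output of CollectReadCounts).
Panel of normals, as an HDF5 file (output of CreateReadCountPanelOfNormals). Optional.
GC-content annotated intervals (output of AnnotateIntervals). Optional; ignored if a panel of normals is provided.

Outputs

Standardized copy ratios file.
Denoised copy ratios file.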

Usage examples

     gatk DenoiseReadCounts \
          -I sample.counts.hdf5 \
          --count-panel-of-normals panel_of_normals.pon.hdf5 \
          --standardized-copy-ratios sample.standardizedCR.tsv \
          --denoised-copy-ratios sample.denoisedCR.tsv
 
     gatk DenoiseReadCounts \
          -I sample.counts.hdf5 \
          --annotated-intervals annotated_intervals.tsv \
          --standardized-copy-ratios sample.standardizedCR.tsv \
          --denoised-copy-ratios sample.denoisedCR.tsv
 
     gatk DenoiseReadCounts \
          -I sample.counts.hdf5 \
          --standardized-copy-ratios sample.standardizedCR.tsv \
          --denoised-copy-ratios sample.denoisedCR.tsv
 
@author Samuel Lee <slee@broadinstitute.org>

DenoiseReadCounts specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.

| Argument name(s) | Default value | Summary |
| --- | --- | --- |
| Required Arguments | | |
| --denoised-copy-ratios | | Output file for denoised copy ratios. |
| --input / -I | | Input TSV or HDF5 file containing integer read counts in genomic intervals for a single case sample (output of CollectReadCounts). |
| --standardized-copy-ratios | | Output file for standardized copy ratios. GC-bias correction will be performed if annotations for GC content are provided. |
| Optional Tool Arguments | | |
| --annotated-intervals | | Input file containing annotations for GC content in genomic intervals (output of AnnotateIntervals). Intervals must be identical to and in the same order as those in the input read-counts file. If a panel of normals is provided, this input will be ignored. |
| --arguments_file | | read one or more arguments files and add them to the command line |
| --count-panel-of-normals | | Input HDF5 file containing the panel of normals (output of CreateReadCountPanelOfNormals). |
| --gcs-max-retries / -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection |
| --gcs-project-for-requester-pays | | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed. |
| --help / -h | false | display the help message |
| --number-of-eigensamples | | Number of eigensamples to use for denoising. If not specified or if the number of eigensamples available in the panel of normals is smaller than this, all eigensamples will be used. |
| --version | false | display the version number for this tool |
| Optional Common Arguments | | |
| --gatk-config-file | | A configuration file to use with the GATK. |
| --QUIET | false | Whether to suppress job-summary info on System.err. |
| --tmp-dir | | Temp directory to use. |
| --use-jdk-deflater / -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater) |
| --use-jdk-inflater / -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater) |
| --verbosity | INFO | Control verbosity of logging. |
| Advanced Arguments | | |
| --showHidden | false | display hidden arguments |

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Optional Common Arguments in the table above.


--annotated-intervals

Input file containing annotations for GC content in genomic intervals (output of AnnotateIntervals). Intervals must be identical to and in the same order as those in the input read-counts file. If a panel of normals is provided, this input will be ignored.

File  null
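
The requirement that intervals be identical and in the same order can be checked before running the tool when both inputs are TSV files. The sketch below assumes a layout that this page does not specify: optional header lines starting with `@`, followed by a tab-separated table whose first three columns are the interval coordinates. The function names are hypothetical, and the check does not apply to HDF5 input.

```shell
#!/bin/sh
# Print the first three (interval) columns of a TSV, skipping '@' header lines.
# Assumption: '@'-prefixed header lines, then a tab-separated table whose
# first three columns hold the interval coordinates.
interval_columns() {
    grep -v '^@' "$1" | cut -f 1-3
}

# Succeeds only if both files list identical intervals in the same order.
same_intervals() {
    a=$(interval_columns "$1")
    b=$(interval_columns "$2")
    [ "$a" = "$b" ]
}
```

For example, `same_intervals sample.counts.tsv annotated_intervals.tsv && echo "intervals match"` prints the message only when the two interval lists agree line by line.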


--arguments_file

read one or more arguments files and add them to the command line

List[File]  []


--count-panel-of-normals

Input HDF5 file containing the panel of normals (output of CreateReadCountPanelOfNormals).

File  null


--denoised-copy-ratios

Output file for denoised copy ratios.

Required  File  null


--gatk-config-file

A configuration file to use with the GATK.

String  null


--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int  20  [ -∞, ∞ ]


--gcs-project-for-requester-pays

Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed.

String  ""


--help / -h

display the help message

boolean  false


--input / -I

Input TSV or HDF5 file containing integer read counts in genomic intervals for a single case sample (output of CollectReadCounts).

Required  File  null


--number-of-eigensamples

Number of eigensamples to use for denoising. If not specified or if the number of eigensamples available in the panel of normals is smaller than this, all eigensamples will be used.

Integer  null


--QUIET

Whether to suppress job-summary info on System.err.

Boolean  false


--showHidden / -showHidden

display hidden arguments

boolean  false


--standardized-copy-ratios

Output file for standardized copy ratios. GC-bias correction will be performed if annotations for GC content are provided.

Required  File  null


--tmp-dir

Temp directory to use.

GATKPath  null


--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean  false


--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean  false


--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version

display the version number for this tool

boolean  false


GATK version 4.6.2.0 built at Sun, 13 Apr 2025 13:21:43 -0400.