Software used in the reports module

general_json_report

Convert input files to JSON for the general report.

🐍 Rule

rule general_json_report:
    input:
        files=[f'{filedef["input"]}' for filedef in general_report["files"]],
        output_files=config.get("general_report", {}),
    output:
        json="reports/general_json_report/{sample}_{type}.general.json",
    params:
        sample="{sample}_{type}",
        pipeline_version=pipeline_version,
        pipeline_name=pipeline_name,
        tc=get_tc_general_report,
        units=units,
        reference_genome=config.get("reference", {}).get("fasta", ""),
    log:
        "reports/general_json_report/{sample}_{type}.general_report.log",
    benchmark:
        repeat(
            "reports/general_json_report/{sample}_{type}.output.benchmark.tsv",
            config.get("general_json_report", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("general_json_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("general_json_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("general_json_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("general_json_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("general_json_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("general_json_report", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("general_json_report", {}).get("container", config["default_container"])
    message:
        "{rule}: convert input files to json for the general report"
    script:
        "../scripts/general_json_report.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | files | [f'{filedef["input"]}' for filedef in general_report["files"]] | Files that should be compiled into JSON. |
| | output_files | config.get("general_report", {}) | Path to YAML file with definitions of the input files. |
| output | json | "reports/general_json_report/{sample}_{type}.general.json" | JSON file with data to be presented in the general report. |
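
The files input is built by a list comprehension over the files entries of the general report definition. A minimal sketch of that expansion, assuming a hypothetical definition in which each entry carries an input path (the entry names and paths below are made up for illustration and are not taken from the module):

```python
# Hypothetical general report definition. In the pipeline this structure is
# loaded from the YAML file referenced by config["general_report"].
general_report = {
    "files": [
        {"name": "MultiQC", "input": "qc/multiqc/multiqc_DNA.html"},
        {"name": "Coverage", "input": "qc/coverage/{sample}_{type}.tsv"},
    ]
}

# Same comprehension as in the rule: one input path per file entry.
# Wildcard placeholders such as {sample} are left untouched at this stage.
files = [f'{filedef["input"]}' for filedef in general_report["files"]]

print(files)
```

Entries without an input key would raise a KeyError here, so each file definition presumably needs one.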

🔧 Configuration

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
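
Every resource in the rule is looked up with the same pattern: the rule-specific section of the config is consulted first, and default_resources is used when the key is missing. A minimal sketch of that lookup, with a hypothetical helper name (resolve_resource) and made-up values:

```python
# Hypothetical, minimal config. Real values come from config.yaml and resources.yaml.
config = {
    "default_resources": {
        "threads": 1,
        "mem_mb": 6144,
        "mem_per_cpu": 6144,
        "partition": "core",
        "time": "4:00:00",
    },
    "general_json_report": {"mem_mb": 2048},
}


def resolve_resource(config, rule_name, key):
    """Return the rule-specific value for key, else the default_resources value."""
    return config.get(rule_name, {}).get(key, config["default_resources"][key])


mem_mb = resolve_resource(config, "general_json_report", "mem_mb")  # overridden
threads = resolve_resource(config, "general_json_report", "threads")  # default
```

Note that the default is read with config["default_resources"][key], so a missing default_resources section fails even when the rule-specific value exists.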

general_html_report

Generate a general HTML report for a sample.

🐍 Rule

rule general_html_report:
    input:
        config_schema=workflow.source_path("../schemas/general_html_report_json.schema.yaml"),
        html_template=workflow.source_path("../templates/general_html_report/index.html"),
        json="reports/general_json_report/{sample}_{type}.general.json",
        css_files=[
            workflow.source_path("../templates/general_html_report/style.css"),
            workflow.source_path("../templates/assets/css/datatables.min.css"),
        ],
        js_files=[workflow.source_path("../templates/assets/js/datatables.min.js")],
        additional_json={},
    output:
        html="reports/general_html_report/{sample}_{type}.general_report.html",
    params:
        final_directory_depth=config.get("general_html_report", {}).get("final_directory_depth", 1),
        multiqc_config=config.get("general_html_report", {}).get("multiqc_config", ""),
        units=units,
        extra=config.get("general_html_report", {}).get("extra", ""),
    log:
        "reports/general_html_report/{sample}_{type}.general_report.log",
    benchmark:
        repeat(
            "reports/general_html_report/{sample}_{type}.output.benchmark.tsv",
            config.get("general_html_report", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("general_html_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("general_html_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("general_html_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("general_html_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("general_html_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("general_html_report", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("general_html_report", {}).get("container", config["default_container"])
    message:
        "{rule}: generate general html report from json config {input.json}"
    script:
        "../scripts/general_html_report.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | config_schema | workflow.source_path("../schemas/general_html_report_json.schema.yaml") | Validation schema for the JSON input. |
| | html_template | workflow.source_path("../templates/general_html_report/index.html") | HTML template that the report should be based on. |
| | json | "reports/general_json_report/{sample}_{type}.general.json" | JSON that should be rendered in the report as produced by general_json_report. |
| | css_files | [workflow.source_path("../templates/general_html_report/style.css"), workflow.source_path("../templates/assets/css/datatables.min.css")] | CSS files that should be included in the template. |
| | additional_json | {} | Additional JSON that should be included in the report. |
| output | html | "reports/general_html_report/{sample}_{type}.general_report.html" | Interactive HTML report for a sample. |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| final_directory_depth | integer | How deep in the final results directory the report will be. This is used to correctly resolve relative paths in the JSON config. For example, if the report is located in the directory results/reports, the depth would be 2. |
| multiqc_config | string | Path to a MultiQC config file. Needed if the MultiQC report has custom content, or if general statistics columns that are hidden in the MultiQC report should also be hidden in the general report. |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
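
The final_directory_depth setting tells the report how many directory levels lie between it and the base of the results directory. A minimal sketch of the arithmetic, assuming that relative paths are resolved by climbing that many levels with "../" (the helper names and the linked file are hypothetical, and the actual script may resolve paths differently):

```python
from pathlib import PurePosixPath


def directory_depth(report_dir):
    """Count the directory levels between the base directory and the report."""
    return len(PurePosixPath(report_dir).parts)


def relative_prefix(depth):
    """Build the prefix that climbs `depth` levels up from the report."""
    return "../" * depth


# The example from the description: a report stored in results/reports.
depth = directory_depth("results/reports")
link = relative_prefix(depth) + "qc/multiqc/multiqc_DNA.html"
```

With the default depth of 1, the same link would only climb one level.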

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |

cnv_html_report

Generate an HTML report for CNVs.

🐍 Rule

rule cnv_html_report:
    input:
        json="reports/cnv_html_report/{sample}_{type}.{tc_method}.merged.json",
        html_template=workflow.source_path("../templates/cnv_html_report/index.html"),
        js_files=[
            workflow.source_path("../templates/assets/js/d3.v7.min.js"),
            workflow.source_path("../templates/cnv_html_report/01-chromosome-plot.js"),
            workflow.source_path("../templates/cnv_html_report/02-genome-plot.js"),
            workflow.source_path("../templates/cnv_html_report/03-results-table.js"),
            workflow.source_path("../templates/cnv_html_report/04-window-summary.js"),
            workflow.source_path("../templates/cnv_html_report/05-main.js"),
        ],
        css_files=[
            workflow.source_path("../templates/assets/css/icons.css"),
            workflow.source_path("../templates/cnv_html_report/style.css"),
        ],
        tc_file=get_tc_file,
        extra_table_files=[t["path"] for t in config.get("cnv_html_report", {}).get("extra_tables", [])],
    output:
        html=temp("reports/cnv_html_report/{sample}_{type}.{tc_method}.cnv_report.html"),
    params:
        include_table=config.get("cnv_html_report", {}).get("show_table", True),
        extra_tables=config.get("cnv_html_report", {}).get("extra_tables", []),
        tc=get_tc,
        tc_method=lambda wildcards: wildcards.tc_method,
        include_cytobands=config.get("cnv_html_report", {}).get("cytobands", False),
    log:
        "reports/cnv_html_report/{sample}_{type}.{tc_method}.cnv_report.html.log",
    benchmark:
        repeat(
            "reports/cnv_html_report/{sample}_{type}.{tc_method}.cnv_report.html.benchmark.tsv",
            config.get("cnv_html_report", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("cnv_html_report", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("cnv_html_report", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("cnv_html_report", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("cnv_html_report", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("cnv_html_report", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("cnv_html_report", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("cnv_html_report", {}).get("container", config["default_container"])
    message:
        "{rule}: Compile a CNV HTML report for {wildcards.sample}_{wildcards.type}"
    script:
        "../scripts/cnv_html_report.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | json | "reports/cnv_html_report/{sample}_{type}.{tc_method}.merged.json" | Merged JSON file with CNV calls and other sample information. |
| | html_template | workflow.source_path("../templates/cnv_html_report/index.html") | Path to the HTML template. |
| | js_files | [workflow.source_path("../templates/assets/js/d3.v7.min.js"), workflow.source_path("../templates/cnv_html_report/01-chromosome-plot.js"), workflow.source_path("../templates/cnv_html_report/02-genome-plot.js"), workflow.source_path("../templates/cnv_html_report/03-results-table.js"), workflow.source_path("../templates/cnv_html_report/04-window-summary.js"), workflow.source_path("../templates/cnv_html_report/05-main.js")] | List of JavaScript files that should be included in the report. The order of the files is significant: if one file depends on another, the dependency must be listed before the script(s) that depend on it. |
| | css_files | [workflow.source_path("../templates/assets/css/icons.css"), workflow.source_path("../templates/cnv_html_report/style.css")] | List of CSS files that should be included in the report. Files are included in the order given. |
| | tc_file | get_tc_file | Path to a text file containing the tumor cell content estimated by the method tc_method. |
| output | html | "reports/cnv_html_report/{sample}_{type}.{tc_method}.cnv_report.html" | Interactive HTML report for CNVs. |
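
The order of js_files matters because a script can only use what earlier scripts have already defined. A minimal sketch of order-preserving inclusion, assuming the scripts are inlined into the page as script tags (the function name and the script contents are hypothetical stand-ins, not the module's actual implementation):

```python
def inline_scripts(js_sources):
    """Wrap each script in a <script> tag, preserving the order given."""
    return "\n".join(f"<script>\n{source}\n</script>" for source in js_sources)


# Hypothetical stand-ins for file contents: the second script calls a
# function defined by the first, so it has to come after it.
js_sources = [
    "function drawPlot() { return 'plot'; }",
    "const result = drawPlot();",
]

html = inline_scripts(js_sources)
```

Swapping the two entries would produce a page where drawPlot is called before it is defined in an earlier script block.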

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| cytobands | boolean | Whether or not to add cytoband information to the plots and the CNV table. |
| show_table | boolean | Whether or not to display a table of called CNVs in the report. If this is true, then the attributes filtered_cnv_vcfs and unfiltered_cnv_vcfs under merge_cnv_json are required. |
| extra_tables | array | Additional tables that should be added to the report. Each table is built from the columns of the listed TSV file, which must contain a header row with column names. |
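
Since extra tables are built from TSV columns and rely on a header row, reading one can be sketched with the standard csv module. This is an illustration under assumptions, not the module's parser: the function name and the example columns are hypothetical.

```python
import csv
import io


def read_table(handle):
    """Read a TSV with a header row into column names and row dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    if reader.fieldnames is None:
        # An empty file has no header row to take the column names from.
        raise ValueError("extra table is empty: a header row is required")
    return list(reader.fieldnames), rows


# Hypothetical table content, held in memory instead of a file on disk.
tsv = io.StringIO("gene\tcopy_number\nEGFR\t6\nCDKN2A\t0\n")
columns, rows = read_table(tsv)
```

All values are read as strings, so any numeric formatting would have to happen later.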

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |

cnv_json

Convert CNV results from a particular CNV caller to JSON that is compatible with the final report.

🐍 Rule

rule cnv_json:
    input:
        ratios=get_cnv_ratios,
        segments=get_cnv_segments,
    output:
        json=temp("reports/cnv_html_report/{sample}_{type}.{caller}.{tc_method}.json"),
    params:
        skip_chromosomes=config.get("reference", {}).get("skip_chrs", []),
    log:
        "reports/cnv_html_report/{sample}_{type}.{caller}.{tc_method}.json.log",
    benchmark:
        repeat(
            "reports/cnv_html_report/{sample}_{type}.{caller}.{tc_method}.json.benchmark.tsv",
            config.get("cnv_json", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("cnv_json", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("cnv_json", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("cnv_json", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("cnv_json", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("cnv_json", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("cnv_json", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("cnv_json", {}).get("container", config["default_container"])
    message:
        "{rule}: Create JSON representation for CNV results from {wildcards.caller} "
        "for {wildcards.sample}_{wildcards.type}"
    script:
        "../scripts/cnv_json.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | ratios | get_cnv_ratios | Path to a file with log2 ratios for a specific caller. Determined by an input function that returns a path to the correct file based on which caller was used. |
| | segments | get_cnv_segments | Path to a file with CNV segments for a specific caller. Determined by an input function that returns a path to the correct file based on which caller was used. |
| output | json | "reports/cnv_html_report/{sample}_{type}.{caller}.{tc_method}.json" | A JSON representation of the CNV results from a specific caller. |
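
Both inputs are resolved by input functions that choose a file based on the caller wildcard. A minimal sketch of such a function, assuming a simple lookup table: the function name, the callers listed, and the path templates are hypothetical, and the real get_cnv_ratios may be implemented differently.

```python
from types import SimpleNamespace

# Hypothetical path templates per caller. The real paths are defined by the
# upstream CNV calling rules and are not shown in this section.
RATIO_PATHS = {
    "cnvkit": "cnv_sv/cnvkit_batch/{sample}/{sample}_{type}.cnr",
    "gatk": "cnv_sv/gatk_denoise_read_counts/{sample}_{type}.clean.denoisedCR.tsv",
}


def get_ratios_sketch(wildcards):
    """Return the log2 ratio file for the caller named in the wildcards."""
    try:
        template = RATIO_PATHS[wildcards.caller]
    except KeyError:
        raise NotImplementedError(f"no ratio file known for caller {wildcards.caller!r}") from None
    return template.format(sample=wildcards.sample, type=wildcards.type)


# Stand-in for the wildcards object that Snakemake passes to input functions.
wildcards = SimpleNamespace(sample="S1", type="T", caller="cnvkit", tc_method="pathology")
path = get_ratios_sketch(wildcards)
```

Failing loudly on an unknown caller keeps a misspelled caller name from silently producing a report without data.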

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |

merge_cnv_json

Merge JSON files from multiple CNV callers and add annotations and other sample specific data.

🐍 Rule

rule merge_cnv_json:
    input:
        json=get_json_for_merge_cnv_json,
        fai=config.get("reference", {}).get("fai", ""),
        annotation_bed=config.get("merge_cnv_json", {}).get("annotations", []),
        germline_vcf=get_germline_vcf,
        filtered_cnv_vcfs=get_filtered_cnv_vcf,
        cnv_vcfs=get_unfiltered_cnv_vcf,
        cytobands=config.get("merge_cnv_json", {}).get("cytobands", []),
    output:
        json=temp("reports/cnv_html_report/{sample}_{type}.{tc_method}.merged.json"),
    params:
        skip_chromosomes=config.get("reference", {}).get("skip_chrs", []),
        cytobands=config.get("cnv_html_report", {}).get("cytobands", False),
    log:
        "reports/cnv_html_report/{sample}_{type}.{tc_method}.merged.json.log",
    benchmark:
        repeat(
            "reports/cnv_html_report/{sample}_{type}.{tc_method}.merged.json.benchmark.tsv",
            config.get("merge_cnv_json", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("merge_cnv_json", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("merge_cnv_json", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("merge_cnv_json", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("merge_cnv_json", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("merge_cnv_json", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("merge_cnv_json", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("merge_cnv_json", {}).get("container", config["default_container"])
    message:
        "{rule}: Merge CNV JSON data for {wildcards.sample}_{wildcards.type}"
    script:
        "../scripts/merge_cnv_json.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | json | get_json_for_merge_cnv_json | One or more JSON files containing CNV information, one file per caller. Supplied by an input function that determines the paths from the included callers; which callers are included is determined by the svdb_merge config. The paths to the files are based on the cnv_sv module. |
| | fai | config.get("reference", {}).get("fai", "") | Reference genome FASTA index. |
| | annotation_bed | config.get("merge_cnv_json", {}).get("annotations", []) | Zero or more BED files with regions that should be annotated in the chromosome plot. |
| | germline_vcf | get_germline_vcf | Optional VCF file with germline variants. Supplied by an input function that takes it from the config. |
| | filtered_cnv_vcfs | get_filtered_cnv_vcf | Zero or more VCF files containing filtered CNV calls that should be displayed in the results table. Supplied by an input function that takes these from the config. |
| | cnv_vcfs | get_unfiltered_cnv_vcf | Zero or more VCF files containing unfiltered CNV calls that should be displayed in the results table. Supplied by an input function that takes these from the config. |
| | cytobands | config.get("merge_cnv_json", {}).get("cytobands", []) | Optional path to a file with cytoband definitions. This file should conform to the UCSC schema for cytobands. See https://www.genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=map&hgta_track=cytoBand&hgta_table=cytoBand&hgta_doSchema=describe+table+schema |
| output | json | "reports/cnv_html_report/{sample}_{type}.{tc_method}.merged.json" | Merged JSON file with CNV calls and other sample information that is ready to be included in the final report. |
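
The fai input and the skip_chromosomes parameter can be pictured together: a FASTA index lists one sequence per line with its name and length in the first two tab-separated columns, and skipped chromosomes are left out. A minimal sketch under that reading (the function name is hypothetical, and how the merge script actually uses the index is an assumption):

```python
import io


def chromosome_lengths(fai_handle, skip_chromosomes=None):
    """Map chromosome name to length from a .fai file, omitting skipped ones."""
    skip = set(skip_chromosomes or [])
    lengths = {}
    for line in fai_handle:
        if not line.strip():
            continue
        # A .fai line holds: name, length, offset, line bases, line width.
        name, length = line.rstrip("\n").split("\t")[:2]
        if name not in skip:
            lengths[name] = int(length)
    return lengths


# Illustrative index content, held in memory instead of a file on disk.
fai = io.StringIO("chr1\t248956422\t6\t60\t61\nchrM\t16569\t253105708\t60\t61\n")
lengths = chromosome_lengths(fai, skip_chromosomes=["chrM"])
```

Treating a missing skip list the same as an empty one keeps the function usable when skip_chrs is not configured.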

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| annotations | array | List of BED files with custom annotations. |
| cytoband_config | object | Cytoband visualisation configuration. |
| cytobands | string | Tab-separated file containing cytoband information. The file should have five columns: chromosome name, start position (0-based), end position (exclusive), name, and Giemsa stain result. The Giemsa stain results should match the config under cnv_html_report.cytoband_config. |
| filtered_cnv_vcfs | array | VCF files containing filtered CNV calls that should be displayed in the results table. Only has an effect if show_table is true, in which case this attribute is required. |
| germline_vcf | string | Path to a germline VCF file that will be used to display VAFs in the plots of the report. The path supports the wildcards sample and type. |
| unfiltered_cnv_vcfs | array | VCF files containing unfiltered CNV calls that should be displayed in the results table. Only has an effect if show_table is true, in which case this attribute is required. |
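
The five-column cytoband layout described for cytobands can be read with a few lines of Python. This is a sketch under the stated column order, not the parser used by the module; the class and function names are hypothetical and the two rows are example data.

```python
import io
from typing import NamedTuple


class Cytoband(NamedTuple):
    chrom: str
    start: int  # 0-based start position
    end: int  # exclusive end position
    name: str
    giemsa: str  # Giemsa stain result, e.g. gneg or gpos25


def parse_cytobands(handle):
    """Parse tab-separated cytoband lines, skipping blank and comment lines."""
    bands = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, name, giemsa = line.rstrip("\n").split("\t")
        bands.append(Cytoband(chrom, int(start), int(end), name, giemsa))
    return bands


# Two example rows, held in memory instead of a file on disk.
text = "chr1\t0\t2300000\tp36.33\tgneg\nchr1\t2300000\t5400000\tp36.32\tgpos25\n"
bands = parse_cytobands(io.StringIO(text))
```

Because start is 0-based and end is exclusive, adjacent bands share a coordinate and the length of a band is simply end minus start.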

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |