molamola auto-detects the plot type from the VCF header, so the same command shape works for both SV and compound-het VCFs:
molamola --vcf path/to/sample.vcf --out reports/
open reports/sample.report.html # SV mode
open reports/sample.compound_het.report.html # compound-het mode
--out DIR is required — molamola refuses rather than silently writing the report next to the input file. The directory is created if it does not exist.
molamola --vcf sample.sniffles.vcf --out reports/
molamola --vcf sample.t2t.sniffles.vcf --reference t2t --out reports/
The bundled T2T cytoband (molamola/data/cytoBand.t2t.txt.gz) is selected automatically. The acrocentric noise flag flips off for T2T (those p-arms are real sequence on T2T-CHM13v2.0).
molamola --vcf sample.sniffles.vcf --out reports/ \
--focus chr7:57716411 --focus-window 10000
Output filenames get a .focus_<chr>_<pos> tag so focused renders don’t overwrite genome-wide ones. ISCN labels (e.g. t(7;17)(q11.23;q12)) print to stdout for each matched event.
molamola --vcf sample.sniffles.vcf --focus chr7:q11.23 --out reports/
The second part of --focus can be either a position (chr7:57716411) or an ISCN cytoband (chr7:q11.23). Bands can also be specified as a prefix (chr1:p36) to match every sub-band on the chromosome whose name starts with that prefix.
--cov-ratio defaults to auto, computed as max(2.0, p99 of the in-sample max-coverage / median-coverage distribution). Pass a number (int or float) to fix the threshold instead — useful for cross-sample consistency:
molamola --vcf sample.sniffles.vcf --cov-ratio 2.5 --cov-vaf-max 0.45 \
--out reports/
molamola --vcf sample.sniffles.vcf --bin-size 5000000 --out reports/
5 Mb bins instead of 1 Mb — smoother visual, less detail.
--caller auto (default) detects the caller via INFO-field fingerprinting. Override when the fingerprint is ambiguous (bcftools-merged or re-headered VCFs):
molamola --vcf sample.cutesv.vcf --caller cutesv --out reports/
Compound-het mode needs both phasing (PS FORMAT field) and VEP annotation (CSQ INFO field). A raw phased small-variant VCF — e.g. straight Clair3 output, even with --use_whatshap_for_final_output_phasing — has the first but not the second, and molamola will refuse it. Annotate with Ensembl VEP first.
1. Download the matching VEP cache once (one-time, ~20 GB), on a machine with internet:
wget https://ftp.ensembl.org/pub/release-105/variation/indexed_vep_cache/homo_sapiens_vep_105_GRCh38.tar.gz
If your VEP isn’t 105, swap 105 for your release number (it appears twice in the URL); the cache release must match the VEP release exactly. Transfer to wherever you run VEP if that’s a different machine.
2. Unpack into a stable cache directory:
mkdir -p VEP_cache && cd VEP_cache
tar -xzf ../homo_sapiens_vep_105_GRCh38.tar.gz
# creates VEP_cache/homo_sapiens/105_GRCh38/
3. Run VEP fully offline, with --canonical --symbol --pick so the CSQ shape matches what molamola consumes:
vep --input_file sample.phased.vcf \
--output_file sample.phased.vep.vcf \
--vcf --offline \
--cache --dir_cache /path/to/VEP_cache \
--assembly GRCh38 \
--fasta /path/to/hg38.fa \
--canonical --symbol --pick \
--force_overwrite
Then molamola --vcf sample.phased.vep.vcf --out reports/ picks it up as compound-het mode.
Notes on VEP. VEP is third-party software (Ensembl); molamola does not bundle or wrap it. The cache release and VEP binary release must match exactly — a mismatch leads to silent mis-annotation rather than a clean error. Compound-het mode reads VEP’s Consequence, SYMBOL, and CANONICAL fields as-is; any quirks of a particular VEP build are inherited. --pick reduces multi-transcript CSQ entries to one per variant; without it molamola will pick CANONICAL == YES itself, which is fine but slower on large VCFs.
molamola --vcf sample.phased.vep.vcf.gz --gene NEB --out reports/
The panel always plots regardless of variant count. If --gene FOO lands in a gene with zero phased het variants, the report still gets a placeholder section so multi-gene runs (--gene A --gene B --gene C) don’t silently lose entries.
molamola --vcf sample.phased.vep.vcf.gz \
--gene NEB --gene LDLR --gene CFTR --out reports/
Drop --gene to let molamola surface candidate genes:
molamola --vcf sample.phased.vep.vcf.gz --out reports/
The HTML splits the result into two clearly-labelled sections:
The strict heading is shown even when its subset is empty so the dichotomy is always visible.
Tune via --min-pair-count N (raise for stricter sweeps) and --max-genes N (default 50). Conflicting+conflicting and no-ClinVar+no-ClinVar pairs are excluded from the sweep — use explicit --gene to plot those one-off.
The bundled ClinVar (molamola/data/clinvar.hg38.tsv.xz, ~13 MB) is dated; pass a fresher copy with --clinvar:
# Reduced TSV (smallest)
molamola --vcf sample.phased.vep.vcf.gz --out reports/ \
--clinvar /path/to/clinvar.hg38.tsv.xz
# Or NCBI's raw VCF (190+ MB) — auto-detected by extension
molamola --vcf sample.phased.vep.vcf.gz --out reports/ \
--clinvar /path/to/clinvar.vcf.gz
The release date of whichever file is loaded is logged in the HTML report’s run-metadata.
molamola --vcf sample.phased.vep.vcf.gz --out reports/ \
--canonical-exons /path/to/custom_exons.tsv.gz
Schema: gene_symbol\tchrom\tstart\tend\tstrand\ttranscript_id\texon_starts\texon_ends (gzipped TSV; comma-separated exon coords; 0-based half-open).
./summarize_bnds.sh path/to/sample.sniffles.vcf
Reports SV-type counts, BND filter distribution, intra/inter-chromosomal split, and the top 10 BND chromosome pairs.