
Benchmark data

Luca Santuari edited this page Mar 11, 2020 · 20 revisions

For cell line data, sv-callers already includes the GiaB sample NA12878/HG001, as described on the IGSR portal.

We want to include two additional samples:

  1. NA24385/HG002 from GiaB

  2. The synthetic diploid sample CHM1_CHM13, derived from two complete hydatidiform mole (CHM) cell lines, CHM1 and CHM13.


NA24385/HG002

BAM files

  • HG002 2x250 bp paired-end reads mapped to the hs37d5 reference sequence. This BAM file was used in the HiFi (PacBio CCS reads) publication.
  • HG002 2x148 bp paired-end reads (README) mapped to the hs37d5 reference sequence. This BAM file was used in the Cameron2019 benchmark (Methods, section "Cell line evaluation").
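The BAM files and truth sets on this page are tied to different reference builds (hs37d5, GRCh37, GRCh38), and calls can only be compared directly against a truth set on the same build. As a minimal sketch, the build of a BAM file can be guessed from its header. The helper `guess_reference` is hypothetical: it assumes the header text comes from `samtools view -H`, treats a contig named `hs37d5` as the hs37d5 decoy, and otherwise uses the length of chromosome 1 (249,250,621 bp in GRCh37, 248,956,422 bp in GRCh38).

```python
# Length of chromosome 1 (bp) in each build; used to tell GRCh37 from GRCh38.
CHR1_LENGTH = {249250621: "GRCh37", 248956422: "GRCh38"}


def guess_reference(header_text: str) -> str:
    """Guess the reference build from the @SQ lines of a SAM/BAM header.

    Hypothetical helper: header_text is assumed to be the output of
    `samtools view -H file.bam`.
    """
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:1 and LN:249250621
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            contigs[fields["SN"]] = int(fields["LN"])
    # hs37d5 is GRCh37 plus a decoy contig literally named "hs37d5"
    if "hs37d5" in contigs:
        return "hs37d5"
    # Chromosome 1 is named "1" (1000 Genomes style) or "chr1" (UCSC style)
    length = contigs.get("1", contigs.get("chr1"))
    return CHR1_LENGTH.get(length, "unknown")
```

For example, a header containing `@SQ SN:1 LN:249250621` and an `@SQ SN:hs37d5` line would be reported as `hs37d5`, matching the HG002 BAM files listed above.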

Truth sets


CHM1_CHM13

BAM files

Two BAM files from the study are available from ENA project PRJEB13208. Both were mapped to the GRCh37 reference genome (see the section "Calling SNPs and short indels from Illumina data" of the publication). They correspond to two sequencing libraries: CHM1_CHM13_2 (ERR1341796) and CHM1_CHM13_3 (ERR1341793).
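The files behind the two runs ERR1341796 and ERR1341793 can also be listed programmatically. The sketch below assumes the ENA Portal API `filereport` endpoint, which returns tab-separated text by default, and the field names `run_accession` and `submitted_ftp` (submitted BAM files are expected under `submitted_ftp`, with multiple files separated by `;`). The helper names are hypothetical and this lookup is not part of the original workflow.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumed ENA Portal API endpoint for per-run file reports.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a query listing the read runs of a project and their submitted files."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,submitted_ftp",
    }
    return ENA_FILEREPORT + "?" + urllib.parse.urlencode(params)


def parse_filereport(tsv_text: str) -> dict:
    """Map each run accession to the list of its submitted file URLs."""
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Several files for one run are separated by ";" in a single column
        files = [f for f in (row.get("submitted_ftp") or "").split(";") if f]
        runs[row["run_accession"]] = files
    return runs


def fetch_runs(accession: str) -> dict:
    """Download and parse the file report (requires network access)."""
    with urllib.request.urlopen(filereport_url(accession)) as resp:
        return parse_filereport(resp.read().decode("utf-8"))
```

If the project lists them as read runs, `fetch_runs("PRJEB13208")` should return a mapping whose keys include ERR1341796 and ERR1341793.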

Truth sets

  • nstd137, available relative to both GRCh37 and GRCh38 (published here). The GRCh38 version is the CHM1_CHM13 truth set used in the Cameron2019 benchmark.