Caleb Lareau edited this page Oct 30, 2019 · 5 revisions

Frequently asked questions

Does bap work with fastq files?

Generally no. We wrote the bap-barcode module to de-barcode data from the dscATAC-seq and dsciATAC-seq platforms, which can be explored here. Otherwise, the input to bap should be BAM files in which a SAM tag specifies the cell / barcode and in which all reads are present (i.e. duplicates should ideally not have been removed). To produce such files, we recommend running the pre-processing tools from other workflows (e.g. CellRanger-ATAC).
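As a quick illustration of what "a SAM tag that specifies the barcode" means, here is a minimal sketch that pulls an optional tag out of a single SAM text record. It relies only on the SAM layout (11 mandatory tab-separated columns followed by optional TAG:TYPE:VALUE fields). The function name get_sam_tag and the example record are hypothetical and not part of bap.

```python
def get_sam_tag(sam_line, tag):
    """Return the value of an optional SAM tag (e.g. 'CB'), or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are the mandatory SAM fields; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None


# Hypothetical record, made up for illustration only.
record = "\t".join([
    "read1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "FFFF",
    "CB:Z:AAACGAAAGTCGTACT-1",
])

print(get_sam_tag(record, "CB"))  # the barcode stored in the CB tag
print(get_sam_tag(record, "XB"))  # None: this record carries no XB tag
```

For a real BAM file, one could feed this function the lines printed by samtools view to confirm which tag holds the barcode before running bap.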

What's going on with bap versus bap2?

In terms of user experience, very little is different, but we absolutely recommend running bap2 once this module is installed. Both modules perform the same essential steps: they nominate abundant barcodes, identify barcode multiplets, and then merge the corresponding barcodes. The majority of the differences are in internal data structures. These updates enable bap2 to 1) run significantly faster, 2) handle 10X scATAC data as input, and 3) automatically produce fragment files for the resulting datasets.

Unless you have a very specific use case of needing to reproduce results from this paper, use bap2.

In other words, when you execute bap, an older version of the software (which we kept around for legacy reasons) is run, and it is sub-optimal in terms of time, memory requirements, and output files.

When you execute bap2, the most recent CLI of the software is used, which provides the best user experience. We are also actively maintaining bap2 and will respond to user issues in this module.

What is bap-atac?

The actual Python package name is bap-atac, which was chosen for hosting on PyPI because bap is a common name and was already taken.

Why isn't there a CB tag in the default workflow?

This is due to philosophical differences over what an oligonucleotide barcode represents. In the 10X standard, CB is generally equated with a cell barcode. Since a major finding behind bap is that 1 barcode =/= 1 cell, we adopt a different tag logic. Namely, XB = 1 observed oligonucleotide barcode and DB = 1 inferred droplet barcode. In other words, DB represents the set of barcodes that were merged after being identified as barcode multiplets.
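To make the XB versus DB distinction concrete, here is a minimal sketch of the relabeling step only. It assumes the multiplet pairs have already been identified by some upstream step (how bap does this is not covered here) and groups linked barcodes with a small union-find. The function name, the toy barcodes, and the droplet_<i>_N<size> label format are all invented for illustration and are not bap's actual implementation or naming scheme.

```python
from collections import defaultdict


def assign_droplet_barcodes(barcodes, multiplet_pairs):
    """Map each observed barcode (XB) to an inferred droplet label (DB)."""
    parent = {bc: bc for bc in barcodes}

    def find(bc):
        # Follow parent links to the group representative, shortening paths as we go.
        while parent[bc] != bc:
            parent[bc] = parent[parent[bc]]
            bc = parent[bc]
        return bc

    # Barcodes flagged as a multiplet pair end up in the same group.
    for a, b in multiplet_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for bc in barcodes:
        groups[find(bc)].append(bc)

    # Hypothetical label format: droplet index plus the number of merged barcodes.
    xb_to_db = {}
    for i, root in enumerate(sorted(groups), start=1):
        members = groups[root]
        label = f"droplet_{i}_N{len(members)}"
        for bc in members:
            xb_to_db[bc] = label
    return xb_to_db


# Toy input: four observed barcodes, two of which were flagged as one multiplet.
observed = ["AAAC", "CCGT", "GGTA", "TTAG"]
pairs = [("AAAC", "GGTA")]
mapping = assign_droplet_barcodes(observed, pairs)
for xb in observed:
    print(xb, "->", mapping[xb])
```

In this toy run, AAAC and GGTA share one DB label while CCGT and TTAG each keep their own, which is the sense in which several XB values can correspond to a single DB.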

To make the software work with 10X scATAC-seq data (see here), one must minimally specify -bt CB, which indicates that the oligonucleotide barcode is stored in the CB SAM tag (assuming a default execution of the CellRanger pipeline).