FAQ
Generally no. We wrote the bap-barcode module to de-barcode data from the dscATAC and dsciATAC-seq platforms, which can be explored here. Otherwise, the input to bap should generally be BAM files that contain a SAM tag specifying the cell / barcode, with all reads present (i.e. duplicates should ideally be retained rather than removed). For this, we recommend running the pre-processing tools from other workflows (e.g. CellRanger-ATAC).
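As a rough illustration of what "a SAM tag that specifies the cell / barcode" looks like in practice, the sketch below pulls an optional tag out of SAM-formatted text records and counts reads per barcode. It is plain Python over text lines, not anything bap itself runs; the helper names and the example reads are made up for illustration.

```python
from collections import Counter


def get_sam_tag(sam_line, tag):
    """Return the value of an optional SAM tag (e.g. 'CB'), or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3 and parts[0] == tag:
            return parts[2]
    return None


def count_reads_per_barcode(sam_lines, tag):
    """Count alignment records per barcode value; untagged reads are counted under None."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        counts[get_sam_tag(line, tag)] += 1
    return counts


# Hypothetical records: r1 and r2 are duplicates that were kept in the file,
# and r3 carries no barcode tag at all.
example_reads = [
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tCB:Z:AAAC-1",
    "r2\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tCB:Z:AAAC-1",
    "r3\t99\tchr1\t300\t60\t4M\t=\t400\t104\tACGT\tFFFF\tNM:i:0",
]
barcode_counts = count_reads_per_barcode(example_reads, "CB")
```

A large share of reads landing under `None` would suggest that the file lacks the expected barcode tag, or that the barcode is stored under a different tag name.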
In terms of user experience, very little is different, but we absolutely recommend running bap2 once this module is installed. Both modules perform the same essential steps: nominate abundant barcodes, identify barcode multiplets, and then merge the corresponding barcodes. The majority of the differences are in internal data structures. These updates enable bap2 to 1) run significantly faster, 2) handle 10X scATAC data as input, and 3) automatically produce fragment files for the resulting datasets.
Unless you have a very specific use case of needing to reproduce results from this paper, use bap2.
In other words, when you execute bap, an older version of the software (which we kept around for legacy reasons) will be executed, and it will be sub-optimal in terms of time, memory requirements, and output files. When you execute bap2, the most recent CLI of the software will be used and will provide the best user experience. We are also actively maintaining bap2 and will respond to user issues in this module.
The actual Python package name is bap-atac, which was established for hosting on PyPI since bap is a common name and was already taken.
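Because the distribution name (bap-atac) differs from the command names (bap, bap2), it can be handy to check which distribution is actually installed. The snippet below is a small sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Look the package up under its PyPI distribution name, not the command name.
print(installed_version("bap-atac"))
```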
This is due to philosophical differences in terms of what an oligonucleotide barcode represents. In the 10X standard, CB is generally equated with a cell barcode. Since a major component of bap is that we've now shown 1 barcode =/= 1 cell, we adopt a different tag logic. Namely, XB = 1 observed oligonucleotide barcode and DB = 1 inferred droplet barcode. In other words, DB represents the set of barcodes merged after they've been identified as barcode multiplets.
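To make the XB / DB distinction concrete, here is a toy sketch of the bookkeeping: given observed barcodes and pairs already judged to be multiplets, it groups them so that every observed barcode (XB) maps to one droplet label (DB). This is not bap's implementation. How multiplet pairs are detected is not shown, and labelling each droplet by its alphabetically smallest member is an assumption made purely for illustration.

```python
def assign_droplet_barcodes(observed, multiplet_pairs):
    """Map each observed barcode (XB) to a droplet label (DB) via union-find.

    Assumes every barcode named in multiplet_pairs also appears in observed.
    """
    parent = {xb: xb for xb in observed}

    def find(x):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in multiplet_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root to the smaller one so labels are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {xb: find(xb) for xb in observed}


# Hypothetical barcodes: AAA, CCC and GGG were flagged as one multiplet; TTT stands alone.
xb_to_db = assign_droplet_barcodes(
    ["AAA", "CCC", "GGG", "TTT"],
    [("AAA", "CCC"), ("CCC", "GGG")],
)
# xb_to_db == {"AAA": "AAA", "CCC": "AAA", "GGG": "AAA", "TTT": "TTT"}
```

Four observed barcodes collapse to two droplet labels here, which is the sense in which 1 barcode =/= 1 cell.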
In order to make the software work with 10X scATAC-seq data (see here), one must minimally specify -bt CB, which indicates that the oligonucleotide barcode is stored in the CB SAM tag (assuming a default execution of the CellRanger pipeline).
Please raise an issue here