Update README.md
alexomics authored Feb 25, 2021
1 parent be9c448 commit 97fddaf
Showing 1 changed file with 32 additions and 21 deletions.
53 changes: 32 additions & 21 deletions README.md
@@ -1,40 +1,49 @@
readpaf
=======
Scripts for reading minimap2 PAF files
[![Build](https://github.com/alexomics/read-paf/actions/workflows/main.yml/badge.svg)](https://github.com/alexomics/read-paf/actions/workflows/main.yml)
[![PyPI](https://img.shields.io/pypi/v/readpaf)](https://pypi.org/p/readpaf)

Installation
===
readpaf is a fast parser for [minimap2](https://github.com/lh3/minimap2) PAF (**P**airwise m**A**pping **F**ormat) files. It is
pure Python with no dependencies (unless you want a DataFrame).

readpaf is contained in a single module so can be installed via PyPI:

Installation
===
```bash
pip install readpaf
```

including pandas:

```bash
pip install readpaf[pandas]
```

Or [readpaf.py](readpaf.py) can be directly downloaded like so
<details>
<summary>Other install methods</summary>

### Install with pandas:
This is only needed if you want to manipulate the PAF file as a `pandas.DataFrame`

using cURL
```bash
pip install readpaf[pandas]
```

```bash
curl -O https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py
```
### Direct download:
using cURL

or wget
```bash
curl -O https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py
```

```bash
wget https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py
```
or wget

```bash
wget https://raw.githubusercontent.com/alexomics/read-paf/main/readpaf.py
```
</details>

Usage
===

readpaf has a single user-facing function, `parse_paf`, which accepts a file-like object; this
is any object in Python that has a file-oriented API (`sys.stdin`, `stdout` from subprocess,
`io.StringIO`, open files from `gzip` or `open`).

The following script demonstrates how minimap2 output can be piped into read-paf

```python
from readpaf import parse_paf
from sys import stdin

for record in parse_paf(stdin):
    print(record.query_name, record.target_name)
```
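
The file-like sources listed above all share one property: iterating over them yields lines of text. The sketch below illustrates this with the standard library only. The helper `names` is a hypothetical stand-in (not part of readpaf) that reads just the query and target columns, and the PAF line is made-up example data. With readpaf installed, the same handles could be passed to `parse_paf` instead.

```python
import gzip
import io
import os
import tempfile

# One made-up PAF line: 12 mandatory tab-separated columns plus one tag.
PAF_TEXT = "read1\t1000\t10\t990\t+\tchr1\t50000\t100\t1080\t900\t980\t60\ttp:A:P\n"


def names(handle):
    """Yield (query_name, target_name) from any iterable of PAF text lines.

    Hypothetical helper for illustration; it is not readpaf's parser.
    """
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[5]


# 1. An in-memory text buffer.
from_memory = list(names(io.StringIO(PAF_TEXT)))

# 2. A gzip-compressed file opened in text mode ("rt").
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.paf.gz")
    with gzip.open(path, "wt") as out:
        out.write(PAF_TEXT)
    with gzip.open(path, "rt") as handle:
        from_gzip = list(names(handle))

print(from_memory)
print(from_gzip)
```

Note that `gzip.open` must be given a text mode (`"rt"`) here; its default binary mode yields `bytes` rather than `str` lines.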

read-paf can also generate a pandas DataFrame:
@@ -81,7 +89,10 @@ Parameters:

If used as an iterator, then each object returned is a named tuple representing a single line in the PAF file.
Each named tuple has field names as specified by the `fields` parameter. The SAM-like tags are converted into
their correct types and stored in a dictionary. When `print` or `str` is called on a `PAF` record (named tuple),
a formatted PAF string is returned, which is useful for writing records to a file. The `PAF` record also has a
method, `blast_identity`, which calculates the [BLAST identity](https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity) for
that record.
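
As a rough illustration of the record shape described above, here is a minimal standard-library sketch. It is not readpaf's implementation: `PafRecord` and `parse_line` are hypothetical names, the field names other than `query_name` and `target_name` are assumptions, and only the `i`, `f`, `A` and `Z` tag types are handled. The identity is computed as residue matches (column 10) divided by alignment block length (column 11), which is how the minimap2 manual describes BLAST-like identity.

```python
from typing import Dict, NamedTuple, Union

# SAM type characters mapped to Python types (assumption: this sketch only
# covers i, f, A and Z; anything else is left as a string).
SAM_TYPES = {"i": int, "f": float, "A": str, "Z": str}

TagValue = Union[int, float, str]


class PafRecord(NamedTuple):
    """Hypothetical stand-in for a PAF record; one instance per PAF line."""

    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    residue_matches: int
    alignment_block_length: int
    mapping_quality: int
    tags: Dict[str, TagValue]

    def blast_identity(self) -> float:
        # Matching bases divided by alignment columns (column 10 / column 11).
        return self.residue_matches / self.alignment_block_length


def parse_line(line: str) -> PafRecord:
    """Split one PAF line and convert each SAM-like tag to its Python type."""
    cols = line.rstrip("\n").split("\t")
    tags: Dict[str, TagValue] = {}
    for raw in cols[12:]:
        name, type_char, value = raw.split(":", 2)
        tags[name] = SAM_TYPES.get(type_char, str)(value)
    return PafRecord(
        cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4],
        cols[5], int(cols[6]), int(cols[7]), int(cols[8]),
        int(cols[9]), int(cols[10]), int(cols[11]), tags,
    )


# Made-up example line with three tags of different types.
LINE = (
    "read1\t1000\t10\t990\t+\tchr1\t50000\t100\t1080\t900\t980\t60"
    "\ttp:A:P\tNM:i:80\tdv:f:0.0816\n"
)
record = parse_line(LINE)
print(record.query_name, record.tags, record.blast_identity())
```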

If used to generate a pandas DataFrame, then each row represents a line in the PAF file and the SAM-like tags
are expanded into individual series.
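
The tag expansion can be pictured as flattening each record's tag dictionary into its own columns. The following is a standard-library sketch of that idea only, under the assumption that tags missing from a record are filled with a placeholder value; `expand_tags` and the example records are hypothetical and do not reflect how readpaf builds its DataFrame.

```python
from typing import Any, Dict, List


def expand_tags(records: List[Dict[str, Any]], fill: Any = None) -> List[Dict[str, Any]]:
    """Return one flat dict per record, with every tag as its own column.

    Hypothetical helper: tags absent from a record are set to `fill`.
    """
    # Collect tag names in order of first appearance across all records.
    tag_names: List[str] = []
    for rec in records:
        for name in rec["tags"]:
            if name not in tag_names:
                tag_names.append(name)

    rows = []
    for rec in records:
        row = {key: value for key, value in rec.items() if key != "tags"}
        for name in tag_names:
            row[name] = rec["tags"].get(name, fill)
        rows.append(row)
    return rows


# Made-up records; the second one lacks the NM tag and has an extra dv tag.
records = [
    {"query_name": "read1", "target_name": "chr1", "tags": {"tp": "P", "NM": 80}},
    {"query_name": "read2", "target_name": "chr2", "tags": {"tp": "S", "dv": 0.05}},
]
rows = expand_tags(records)
for row in rows:
    print(row)
```

A list of flat dictionaries like `rows` is one of the inputs that the `pandas.DataFrame` constructor accepts, so each tag would end up as a separate column.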
