Subsequence counting #722

Open
terrycojones opened this issue Mar 10, 2020 · 1 comment

@terrycojones
Member

  1. Add a utils function to extract subsequence counts from a string. It will take a sequence string plus start, stop, window, and step (offset) options. All of these could be optional, or window could have a default of 15 or whatever. It will return a Counter (https://docs.python.org/2/library/collections.html).
  2. You know number 2, right? :-)
  3. Add a script in bin called fasta-extract-subsequences.py (or something like that). Model the script on fasta-ids.py (in terms of reading its input, allowing us to pipe FASTA into it on stdin). It can just print the subsequences and their counts for now. It could optionally sort them (reversed) by count, or not print the counts. But those things can also just be done by external tools like sort, cut, etc. Hint: Counters can be added!
  4. Update the version number in dark/__init__.py, the CHANGELOG, and add the script's name to setup.py.
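
To make items 1 and 3 concrete, here is a minimal sketch of how the counting function and the script's main loop might look. Everything named here is an assumption for illustration, not existing dark-matter code: the function names `subsequence_counts`, `read_fasta` and `main`, the argument names, and the choice to count only full-length windows. The tiny FASTA parser is a stand-in for whatever fasta-ids.py uses to read its input.

```python
import sys
from collections import Counter


def subsequence_counts(sequence, window=15, step=1, start=0, stop=None):
    """Count the length-`window` subsequences of `sequence`.

    Hypothetical signature: names and defaults are assumptions taken from
    the issue text. `start` and `stop` are 0-based, with `stop` exclusive.
    """
    if window < 1 or step < 1 or start < 0:
        raise ValueError('window and step must be >= 1 and start >= 0')
    stop = len(sequence) if stop is None else min(stop, len(sequence))
    counts = Counter()
    # Only full-length windows that end at or before `stop` are counted.
    for offset in range(start, stop - window + 1, step):
        counts[sequence[offset:offset + window]] += 1
    return counts


def read_fasta(fp):
    """Yield (id, sequence) pairs from a FASTA file-like object.

    A simple stand-in parser, not the reader used by fasta-ids.py.
    """
    seq_id, chunks = None, []
    for line in fp:
        line = line.strip()
        if line.startswith('>'):
            if seq_id is not None:
                yield seq_id, ''.join(chunks)
            seq_id, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, ''.join(chunks)


def main(fp=None, out=None, window=15):
    """Print subsequences and their summed counts, most frequent first."""
    fp = sys.stdin if fp is None else fp
    out = sys.stdout if out is None else out
    total = Counter()
    for _, sequence in read_fasta(fp):
        # Counters can be added: counts for equal keys are summed.
        total += subsequence_counts(sequence, window=window)
    for subsequence, count in total.most_common():
        print(subsequence, count, sep='\t', file=out)
```

A script in bin would call `main()` after parsing its options, so that FASTA can be piped in on stdin. Printing tab-separated output keeps it easy to post-process with sort and cut.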

This might be of use if we want to provide that woman who Victor brought by yesterday with a list of all peptides from all Wuhan sequences.

@terrycojones
Member Author

The script could have a --translate option, to convert DNA to AA. Note the translations function on dark/DNARead (but you only want the first translation that comes back).
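
As a rough illustration of what --translate would do before counting, here is a self-contained sketch that translates DNA in a single forward frame using the standard genetic code. It is a stand-in, not the translations function on dark/DNARead, and it assumes the first translation that function returns corresponds to the forward strand starting at the first base. The name `translate_first_frame` and the use of 'X' for unrecognised codons are hypothetical choices.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = 'TCAG'
AMINO_ACIDS = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
CODON_TABLE = {
    ''.join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_frame(dna):
    """Translate `dna` from its first base, dropping any trailing partial codon.

    Hypothetical helper: codons containing anything other than A, C, G, T
    are translated as 'X'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return ''.join(CODON_TABLE.get(dna[i:i + 3], 'X')
                   for i in range(0, usable, 3))
```

The resulting amino acid string could then be passed to the counting function in place of the DNA sequence.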
