- Sort and cap reference sequences by confidence (and other seq_info attributes) in blast db(s) (GL: 38)
- Redownload records that have an updated modified_date to pick up changes in pubmed_id (GL: 17)
- Do not re-download vsearch filtered sequences if training set has not been updated (GL: 44)
- Move data/ignore.txt to /molmicro/lists dir and settings.conf, and rename it to do_not_download.txt (GL: 47)
- Include copy of /molmicro/common/uw/taxonomy/taxdmp.zip in dated output dir (GL: 48)
- Update record tax_ids using accession2taxid from the NCBI FTP site (GL: 54)
- Create a "Trusted" list to complement the "Do Not Trust" list (GL: 57)
- New Dash-based filter outlier plots (GL: 37)
- Inclusion of TM7 Candidatus Saccharibacteria records (GL: 36)
- Updated medirect to fix "Too Many Requests" issue with NCBI (GL: 42)
- New api-key feature to increase the number of NCBI reqs/sec to 10 (GL: 43)
- all paths (binaries, image locations, database credentials, etc.) will be defined in a user-defined file, paths.conf (GH: 8, 9)
- species-only records to be partitioned (GH: 13)
- new filtered/trusted/types directory (GH: 14)
- new filtered/details_out.feather, [named, named/trusted, types, types/trusted]/lineages.[csv, txt] outputs (GH: 5)
- new folder structure dedup -> 1200bp -> named -> filtered -> trusted
- deduplicated refseq and original sequences, preferring the refseq sequence (GH: 20)
- sequences determined obsolete by NCBI via esearch query are removed from database (GH: 13)
- type strain sequences in filter_outliers plot will not align to themselves as nearest type strain (but might if another allele of itself is the closest match) (GH: 29)
- using only the NCBI sequence_from_type filter to mark type strain sequences (GH: 14)
- imported mkrefpkg SConstruct-ncbi
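
The deduplication entry above (GH: 20) can be illustrated with a minimal sketch. This is not the pipeline's actual code: the `Record` type, its field names, and the `dedup_prefer_refseq` helper are hypothetical, and it assumes "duplicate" means an identical sequence string.

```python
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    # Hypothetical record layout, for illustration only.
    accession: str
    sequence: str
    is_refseq: bool


def dedup_prefer_refseq(records: Iterable[Record]) -> List[Record]:
    """Keep one record per identical sequence, preferring RefSeq copies."""
    best: Dict[str, Record] = {}
    for rec in records:
        kept = best.get(rec.sequence)
        # Replace the kept record only when the new one is RefSeq
        # and the one already kept is not.
        if kept is None or (rec.is_refseq and not kept.is_refseq):
            best[rec.sequence] = rec
    return list(best.values())


records = [
    Record("AB000001.1", "ACGT", False),
    Record("NR_000001.1", "ACGT", True),
    Record("AB000002.1", "GGCC", False),
]
deduped = dedup_prefer_refseq(records)
```

Here the two records sharing the sequence `ACGT` collapse to the RefSeq accession, while the unique `GGCC` record is kept unchanged.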