Merge pull request #112 from sanger-tol/dev
Release 0.6
muffato authored Sep 13, 2024
2 parents 0a5b28b + ca10b9b commit 86ee868
Showing 27 changed files with 733 additions and 234 deletions.
1 change: 1 addition & 0 deletions .github/workflows/sanger_test.yml
@@ -23,6 +23,7 @@ jobs:
parameters: |
{
"outdir": "${{ secrets.TOWER_WORKDIR_PARENT }}/results/${{ github.repository }}/results-${{ env.REVISION }}",
"use_work_dir_as_temp": true,
}
profiles: test,sanger,singularity,cleanup
- uses: actions/upload-artifact@v3
35 changes: 35 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,41 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[0.6.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.6.0)] – Bellsprout – [2024-09-13]

The pipeline has now been validated for draft (unpublished) assemblies.

- The pipeline now queries the NCBI database instead of GoaT to establish the
taxonomic classification of the species and the relevant Busco lineages.
In case the taxon_id is not found, the pipeline falls back to GoaT, which
is aware of upcoming taxon_ids in ENA.
- New `--busco_lineages` parameter to choose specific Busco lineages instead of
automatically selecting based on the taxonomy.
- All parameters are now passed the regular Nextflow way. There is no support
for the original Yaml configuration files of the Snakemake version.
- New option `--skip_taxon_filtering` to skip the taxon filtering in blast searches.
Mostly relevant for draft assemblies.
- Introduced the `--use_work_dir_as_temp` parameter to avoid leaving files in `/tmp`.

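The lookup order described in the first item above (NCBI first, GoaT only when the taxon_id is not found) can be sketched as a simple chain of sources. This is a minimal illustration, not the pipeline's implementation: `resolve_taxon`, the stub dictionaries, and the record fields are all hypothetical, and no real NCBI or GoaT query is made.

```python
from typing import Callable, Dict, Optional, Sequence

# A lookup takes a taxon_id and returns a record, or None when the ID is unknown.
Lookup = Callable[[int], Optional[Dict[str, str]]]


def resolve_taxon(taxon_id: int, sources: Sequence[Lookup]) -> Dict[str, str]:
    """Return the first record found, trying each source in the given order."""
    for lookup in sources:
        record = lookup(taxon_id)
        if record is not None:
            return record
    raise LookupError(f"taxon_id {taxon_id} not found in any source")


# Stand-in data (hypothetical): plain dictionaries instead of real queries.
ncbi_stub = {7088: {"name": "Lepidoptera", "source": "ncbi"}}
# 999999999 is a made-up placeholder for a taxon_id not yet known to NCBI.
goat_stub = {999999999: {"name": "placeholder upcoming taxon", "source": "goat"}}

sources = [ncbi_stub.get, goat_stub.get]
print(resolve_taxon(7088, sources)["source"])       # answered by the first source
print(resolve_taxon(999999999, sources)["source"])  # falls back to the second source
```

Keeping the sources in an ordered list makes the priority explicit, and an ID missing from every source raises an error instead of silently returning nothing.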
### Parameters

| Old parameter | New parameter |
| ------------- | ---------------------- |
| --yaml | |
| | --busco_lineages |
| | --skip_taxon_filtering |
| | --use_work_dir_as_temp |

> **NB:** Parameter has been **updated** if both old and new parameter information is present. </br> **NB:** Parameter has been **added** if just the new parameter information is present. </br> **NB:** Parameter has been **removed** if new parameter information isn't present.
### Software dependencies

Note, since the pipeline is using Nextflow DSL2, each process will be run with its own [Biocontainer](https://biocontainers.pro/#/registry). This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference. Only `Docker` or `Singularity` containers are supported, `conda` is not supported.

| Dependency | Old version | New version |
| ---------- | ----------- | ----------- |
| goat | 0.2.5 | |

## [[0.5.1](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.5.1)] – Snorlax (patch 1) – [2024-08-22]

### Enhancements & fixes
2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@ It takes a samplesheet of BAM/CRAM/FASTQ/FASTA files as input, calculates genome

1. Calculate genome statistics in windows ([`fastawindows`](https://github.com/tolkit/fasta_windows))
2. Calculate Coverage ([`blobtk/depth`](https://github.com/blobtoolkit/blobtk))
-3. Fetch associated BUSCO lineages ([`goat/taxonsearch`](https://github.com/genomehubs/goat-cli))
+3. Determine the appropriate BUSCO lineages from the taxonomy.
4. Run BUSCO ([`busco`](https://busco.ezlab.org/))
5. Extract BUSCO genes ([`blobtoolkit/extractbuscos`](https://github.com/blobtoolkit/blobtoolkit))
6. Run Diamond BLASTp against extracted BUSCO genes ([`diamond/blastp`](https://github.com/bbuchfink/diamond))
166 changes: 166 additions & 0 deletions assets/mapping_taxids-busco_dataset_name.2019-12-16.txt
@@ -0,0 +1,166 @@
422676 aconoidasida
7898 actinopterygii
5338 agaricales
155619 agaricomycetes
33630 alveolata
5794 apicomplexa
6854 arachnida
6656 arthropoda
4890 ascomycota
8782 aves
5204 basidiomycota
68889 boletales
3699 brassicales
134362 capnodiales
33554 carnivora
91561 cetartiodactyla
34395 chaetothyriales
3041 chlorophyta
5796 coccidia
28738 cyprinodontiformes
7147 diptera
147541 dothideomycetes
3193 embryophyta
33392 endopterygota
314146 euarchontoglires
33682 euglenozoa
2759 eukaryota
5042 eurotiales
147545 eurotiomycetes
9347 eutheria
72025 fabales
4751 fungi
314147 glires
1028384 glomerellales
5178 helotiales
7524 hemiptera
7399 hymenoptera
5125 hypocreales
50557 insecta
314145 laurasiatheria
147548 leotiomycetes
7088 lepidoptera
4447 liliopsida
40674 mammalia
33208 metazoa
6029 microsporidia
6447 mollusca
4827 mucorales
1913637 mucoromycota
6231 nematoda
33183 onygenales
9126 passeriformes
5820 plasmodium
92860 pleosporales
38820 poales
5303 polyporales
9443 primates
4891 saccharomycetes
8457 sauropsida
4069 solanales
147550 sordariomycetes
33634 stramenopiles
32523 tetrapoda
155616 tremellomycetes
7742 vertebrata
33090 viridiplantae
71240 eudicots
57723 acidobacteria
201174 actinobacteria_phylum
1760 actinobacteria_class
28211 alphaproteobacteria
135622 alteromonadales
200783 aquificae
1385 bacillales
91061 bacilli
2 bacteria
171549 bacteroidales
976 bacteroidetes
68336 bacteroidetes-chlorobi_group
200643 bacteroidia
28216 betaproteobacteria
80840 burkholderiales
213849 campylobacterales
1706369 cellvibrionales
204428 chlamydiae
1090 chlorobi
200795 chloroflexi
135613 chromatiales
1118 chroococcales
186801 clostridia
186802 clostridiales
84999 coriobacteriales
84998 coriobacteriia
85007 corynebacteriales
1117 cyanobacteria
768507 cytophagales
768503 cytophagia
68525 delta-epsilon-subdivisions
28221 deltaproteobacteria
213118 desulfobacterales
213115 desulfovibrionales
69541 desulfuromonadales
91347 enterobacterales
186328 entomoplasmatales
29547 epsilonproteobacteria
1239 firmicutes
200644 flavobacteriales
117743 flavobacteriia
32066 fusobacteria
203491 fusobacteriales
1236 gammaproteobacteria
186826 lactobacillales
118969 legionellales
85006 micrococcales
31969 mollicutes
2085 mycoplasmatales
206351 neisseriales
32003 nitrosomonadales
1161 nostocales
135619 oceanospirillales
1150 oscillatoriales
135625 pasteurellales
203682 planctomycetes
85009 propionibacteriales
1224 proteobacteria
72274 pseudomonadales
356 rhizobiales
227290 rhizobium-agrobacterium_group
204455 rhodobacterales
204441 rhodospirillales
766 rickettsiales
909929 selenomonadales
117747 sphingobacteriia
204457 sphingomonadales
136 spirochaetales
203691 spirochaetes
203692 spirochaetia
85011 streptomycetales
85012 streptosporangiales
1890424 synechococcales
508458 synergistetes
544448 tenericutes
68295 thermoanaerobacterales
200918 thermotogae
72273 thiotrichales
1737405 tissierellales
1737404 tissierellia
74201 verrucomicrobia
135623 vibrionales
135614 xanthomonadales
2157 archaea
2266 thermoproteales
2281 sulfolobales
114380 desulfurococcales
183967 thermoplasmata
651137 thaumarchaeota
2182 methanococcales
2191 methanomicrobiales
183925 methanobacteria
183924 thermoprotei
2235 halobacteriales
1644060 natrialbales
224756 methanomicrobia
1644055 haloferacales
183963 halobacteria
28890 euryarchaeota
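
A mapping file like the one above can be used to turn a species' list of ancestor taxon IDs into BUSCO lineage names. The following is a minimal sketch under stated assumptions, not the pipeline's code: the function names are hypothetical, the `_odb10` suffix is assumed from the usual BUSCO dataset naming, and the ancestor list is abbreviated and illustrative. Only the six sample rows are copied from the file.

```python
from typing import Dict, Iterable, List

# A few rows copied from the mapping file; the full file has 166 rows.
SAMPLE_MAPPING = """\
2759 eukaryota
33208 metazoa
6656 arthropoda
50557 insecta
33392 endopterygota
7088 lepidoptera
"""


def parse_mapping(text: str) -> Dict[int, str]:
    """Parse whitespace-separated 'taxon_id name' rows into a dictionary."""
    mapping: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        taxon_id, name = line.split(None, 1)
        mapping[int(taxon_id)] = name
    return mapping


def select_busco_lineages(
    ancestor_taxon_ids: Iterable[int],
    mapping: Dict[int, str],
    suffix: str = "_odb10",  # assumption: dataset names carry this suffix
) -> List[str]:
    """Keep the ancestors that have a BUSCO dataset, preserving input order."""
    return [mapping[t] + suffix for t in ancestor_taxon_ids if t in mapping]


mapping = parse_mapping(SAMPLE_MAPPING)
# Illustrative ancestors, most specific first; 33213 and 131567 have no
# entry in the mapping and are therefore skipped.
ancestors = [7088, 33392, 50557, 6656, 33213, 33208, 2759, 131567]
print(select_busco_lineages(ancestors, mapping))
```

Passing the ancestors from most specific to most general means the resulting list is ordered the same way, so the first entry is the closest lineage to the species.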
Binary file not shown.
Binary file not shown.