Merge pull request #239 from BU-ISCIII/develop

Release 2.1.0 from develop.

saramonzon authored Apr 19, 2024
2 parents d95b738 + 44e0bad commit 0fdd4a1
Showing 30 changed files with 299 additions and 55 deletions.
48 changes: 46 additions & 2 deletions CHANGELOG.md
@@ -4,11 +4,11 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [2.1.0dev] - 2024-0X-0X : https://github.com/BU-ISCIII/buisciii-tools/releases/tag/2.1.X
+## [2.2.Xdev] - 2024-0X-XX : https://github.com/BU-ISCIII/buisciii-tools/releases/tag/2.2.X

### Credits

-Code contributions to the hotfix:
+Code contributions to the new version:

### Template fixes and updates

@@ -44,6 +44,50 @@ Code contributions to the hotfix:

### Requirements

## [2.1.0] - 2024-04-19 : https://github.com/BU-ISCIII/buisciii-tools/releases/tag/2.1.0

### Credits

Code contributions to the new version:
- [Sarai Varona](https://github.com/svarona)
- [Pablo Mata](https://github.com/Shettland)
- [Daniel Valle](https://github.com/Daniel-VM)

### Template fixes and updates

- Added blast_nt template to services.json [#208](https://github.com/BU-ISCIII/buisciii-tools/pull/208)
- Included new user to sftp_user.json
- Included a missing sed inside IRMA's 04-irma/lablog [#213](https://github.com/BU-ISCIII/buisciii-tools/pull/213)
- Changed singularity mount options in Viralrecon template to fix errors with Nextflow v23.10.0
- excel_generator.py reverted to last state, now lineage tables are merged when argument -l is given
- Adapted viralrecon_results lablog to new excel_generator.py argument
- IRMA/RESULTS now creates a summary of the different types of flu found in irma_stats.txt
- Updated IRMA to v1.1.4 date 02-2024 and reduced threads to 16
- IRMA 04-irma/lablog now creates B and C dirs only if those flu-types are present
- Fixed characterization template [#220](https://github.com/BU-ISCIII/buisciii-tools/pull/220)
- Created Chewbbaca template [#230](https://github.com/BU-ISCIII/buisciii-tools/pull/230)

### Modules

#### Added enhancements

- [#207](https://github.com/BU-ISCIII/buisciii-tools/pull/207) - Bioinfo-doc updates: email password can be given in buisciii_config.yml and delivery notes in a text file

#### Fixes

- Added missing url for service assembly_annotation in module list
- Refined autoclean-sftp folder-name parsing by adjusting the regex label
- Fixed autoclean-sftp crash: the 'dflt' argument, required by 'utils.prompt_yn_question()' since v2.0.0, was missing
- Bioinfo-doc now sends email correctly to multiple CCs

#### Changed

#### Removed

- Removed empty strings from services.json

### Requirements

## [2.0.0] - 2024-03-01 : https://github.com/BU-ISCIII/buisciii-tools/releases/tag/2.0.0

### Credits
2 changes: 1 addition & 1 deletion README.md
@@ -110,7 +110,7 @@ Output:
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Service name ┃ Description ┃ Github ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
-│ assembly_annotation │ Nextflow assembly pipeline to assemble │
+│ assembly_annotation │ Nextflow assembly pipeline to assemble │ https://github.com/Daniel-VM/bacass/...
│ │ bacterial genomes │ │
│ mtbseq_assembly │ Mycobacterium tuberculosis mapping, │ https://github.com/ngs-fzb/MTBseq_source │
│ │ variant calling and detection of │ │
Expand Down
6 changes: 4 additions & 2 deletions bu_isciii/__main__.py
100644 → 100755
@@ -55,7 +55,7 @@ def run_bu_isciii():
)

# stderr.print("[green] `._,._,'\n", highlight=False)
-__version__ = "1.0.1"
+__version__ = "2.0.0"
stderr.print(
"[grey39] BU-ISCIII-tools version {}".format(__version__), highlight=False
)
@@ -507,6 +507,7 @@ def bioinfo_doc(
"""
Create the folder documentation structure in bioinfo_doc server
"""
email_pass = email_psswd if email_psswd else ctx.obj.get("email_password")
new_doc = bu_isciii.bioinfo_doc.BioinfoDoc(
type,
resolution,
@@ -517,7 +518,7 @@
results_md,
ctx.obj["api_user"],
ctx.obj["api_password"],
-email_psswd,
+email_pass,
)
new_doc.create_documentation()

@@ -564,6 +565,7 @@ def bioinfo_doc(
default=None,
help="Tsv output path + filename with archive stats and info",
)
@click.pass_context
def archive(
ctx,
service_id,
10 changes: 7 additions & 3 deletions bu_isciii/autoclean_sftp.py
100644 → 100755
@@ -68,7 +68,9 @@ class AutoremoveSftpService:
def __init__(self, path=None, days=14):
# Parse input path
if path is None:
-use_default = bu_isciii.utils.prompt_yn_question("Use default path?: ")
+use_default = bu_isciii.utils.prompt_yn_question(
+    "Use default path?: ", dflt=False
+)
if use_default:
data_path = bu_isciii.config_json.ConfigJson().get_configuration(
"global"
@@ -107,7 +109,7 @@ def check_path_exists(self):
def get_sftp_services(self):
self.sftp_services = {} # {sftp-service_path : last_update}
service_pattern = (
-r"^[SRV][A-Z]+[0-9]+_\d{8}_[A-Z0-9]+_[a-zA-Z]+(?:\.[a-zA-Z]+)?_[a-zA-Z]$"
+r"^[SRV][A-Z]+[0-9]+_\d{8}_[A-Z0-9.-]+_[a-zA-Z]+(?:\.[a-zA-Z]+)?_[a-zA-Z]$"
)
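The loosened middle segment (`[A-Z0-9.-]+` instead of `[A-Z0-9]+`) is what lets run IDs containing dots or hyphens match. A quick standalone check of the new pattern — the folder names below are invented for illustration:

```python
import re

# New pattern from get_sftp_services(): the run-ID segment now accepts
# dots and hyphens ([A-Z0-9.-]+ instead of [A-Z0-9]+).
service_pattern = (
    r"^[SRV][A-Z]+[0-9]+_\d{8}_[A-Z0-9.-]+_[a-zA-Z]+(?:\.[a-zA-Z]+)?_[a-zA-Z]$"
)

# Hypothetical sftp folder names, invented for this example.
for name in [
    "SRVCNM123_20240419_NVA-134.2_jdoe_S",  # dots/hyphens now allowed
    "SRVCNM123_20240419_NVA134_j.doe_S",    # matched before the change too
]:
    print(name, "->", bool(re.match(service_pattern, name)))
```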

stderr.print("[blue]Scanning " + self.path + "...")
@@ -149,7 +151,9 @@ def remove_oldservice(self):
"The following services are going to be deleted from the sftp:\n"
+ service_elements
)
-confirm_sftp_delete = bu_isciii.utils.prompt_yn_question("Are you sure?: ")
+confirm_sftp_delete = bu_isciii.utils.prompt_yn_question(
+    "Are you sure?: ", dflt=False
+)
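Both call sites now pass the `dflt` argument that became mandatory in v2.0.0. A minimal sketch of what such a helper might look like — this is an assumption for illustration, not the actual `bu_isciii.utils` implementation:

```python
def prompt_yn_question(msg, dflt=False):
    """Yes/no prompt that falls back to a default answer.

    Sketch only: the real bu_isciii.utils.prompt_yn_question() may differ;
    this just illustrates why callers must pass 'dflt' explicitly.
    """
    suffix = " [Y/n] " if dflt else " [y/N] "
    answer = input(msg + suffix).strip().lower()
    if not answer:  # empty input -> take the default
        return dflt
    return answer in ("y", "yes")
```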
if confirm_sftp_delete:
for service in self.marked_services:
try:
61 changes: 45 additions & 16 deletions bu_isciii/bioinfo_doc.py
100644 → 100755
@@ -262,13 +262,34 @@ def create_structure(self):
return

def post_delivery_info(self):
-delivery_notes = bu_isciii.utils.ask_for_some_text(
-    msg="Write some delivery notes:"
-)
+if bu_isciii.utils.prompt_yn_question(
+    msg="Do you wish to provide a text file for delivery notes?", dflt=False
+):
+    for i in range(3, -1, -1):
+        self.provided_txt = bu_isciii.utils.prompt_path(
+            msg="Write the path to the file with RAW text as delivery notes"
+        )
+        if not os.path.isfile(os.path.expanduser(self.provided_txt)):
+            stderr.print(f"Provided file doesn't exist. Attempts left: {i}")
+        else:
+            stderr.print(f"File selected: {self.provided_txt}")
+            break
+    else:
+        stderr.print("No more attempts. Delivery notes will be given by prompt")
+        self.provided_txt = None
+else:
+    self.provided_txt = None
+
+if self.provided_txt:
+    with open(os.path.expanduser(self.provided_txt)) as f:
+        self.delivery_notes = " ".join([x.strip() for x in f.readlines()])
+else:
+    self.delivery_notes = bu_isciii.utils.ask_for_some_text(
+        msg="Write some delivery notes:"
+    )
delivery_dict = {
"resolution_number": self.resolution_id,
-"delivery_notes": delivery_notes,
+"delivery_notes": self.delivery_notes,
}

# How json should be fully formatted:
@@ -568,9 +589,15 @@ def email_creation(self):
if bu_isciii.utils.prompt_yn_question(
"Do you want to add some delivery notes to the e-mail?", dflt=False
):
-email_data["email_notes"] = bu_isciii.utils.ask_for_some_text(
-    msg="Write email notes"
-)
+if self.provided_txt:
+    if bu_isciii.utils.prompt_yn_question(
+        f"Do you want to use notes from {self.provided_txt}?", dflt=False
+    ):
+        email_data["email_notes"] = self.delivery_notes
+else:
+    email_data["email_notes"] = bu_isciii.utils.ask_for_some_text(
+        msg="Write email notes"
+    )

email_data["user_data"] = self.resolution_info["service_user_id"]
email_data["service_id"] = self.service_name.split("_", 5)[0]
@@ -604,7 +631,7 @@ def send_email(self, html_text, results_pdf_file):
server.login(user=email_host_user, password=email_host_password)
except Exception as e:
stderr.print("[red] Unable to send e-mail" + e)

default_cc = "bioinformatica@isciii.es"
msg = MIMEMultipart("alternative")
msg["To"] = self.resolution_info["service_user_id"]["email"]
msg["From"] = email_host_user
+ self.service_name.split("_", 5)[2]
)
if bu_isciii.utils.prompt_yn_question(
-"Do you want to add any other sender? appart from "
-+ self.resolution_info["service_user_id"]["email"],
+"Do you want to add any other sender? apart from %s. Note: %s is the default CC."
+% (self.resolution_info["service_user_id"]["email"], default_cc),
dflt=False,
):
stderr.print(
-"[red] Write emails to be added in semicolon separated format: bioinformatica@isciii.es;icuesta@isciii.es"
+"[red] Write emails to be added in semicolon separated format: icuesta@isciii.es;user2@isciii.es"
)
-    msg["CC"] = bu_isciii.utils.ask_for_some_text(msg="E-mails:")
-    rcpt = msg["CC"].split(";") + [msg["To"]]
+    cc_address = bu_isciii.utils.ask_for_some_text(msg="E-mails:")
else:
-    rcpt = self.resolution_info["service_user_id"]["email"]
+    cc_address = str()
+if cc_address:
+    msg["CC"] = str(default_cc + ";" + str(cc_address))
+else:
+    msg["CC"] = default_cc
+rcpt = msg["CC"].split(";") + [msg["To"]]
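The rewritten block always sets a default CC and appends any user-supplied addresses before building the recipient list. A sketch of that logic with the stdlib `email` package — all addresses below are placeholders, not the real ones:

```python
from email.mime.multipart import MIMEMultipart

# Sketch of the CC handling above: the default CC is always present, and
# extra semicolon-separated addresses are appended when the user supplies
# them. Placeholder addresses only.
default_cc = "team@example.org"
cc_address = "user2@example.org"  # what ask_for_some_text() might return; "" if none

msg = MIMEMultipart("alternative")
msg["To"] = "requester@example.org"
msg["CC"] = default_cc + ";" + cc_address if cc_address else default_cc

# Recipient list for server.sendmail(): every CC plus the main recipient.
rcpt = msg["CC"].split(";") + [msg["To"]]
print(rcpt)
```

Note the distinction this relies on: the `CC` header is only what recipients see, while the list passed to `sendmail()` is what actually controls delivery, so every CC must be repeated there.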
html = MIMEText(html_text, "html")
msg.attach(html)
with open(results_pdf_file, "rb") as f:
Expand All @@ -639,7 +669,6 @@ def send_email(self, html_text, results_pdf_file):
filename=str(os.path.basename(results_pdf_file)),
)
msg.attach(attach)

server.sendmail(
email_host_user,
rcpt,
@@ -5,17 +5,17 @@ mkdir logs

scratch_dir=$(echo $PWD | sed "s/\/data\/bi\/scratch_tmp/\/scratch/g")

-cat ../samples_id.txt | while read in; do echo "srun --partition short_idx --cpus-per-task 32 --mem 35000M --chdir $scratch_dir --time 01:00:00 --output logs/IRMA.${in}.%j.log /data/bi/pipelines/flu-amd/IRMA FLU_AD ../02-preprocessing/${in}/${in}_R1_filtered.fastq.gz ../02-preprocessing/${in}/${in}_R2_filtered.fastq.gz ${in} &"; done > _01_irma.sh
+cat ../samples_id.txt | while read in; do echo "srun --partition short_idx --cpus-per-task 16 --mem 35000M --chdir $scratch_dir --time 01:00:00 --output logs/IRMA.${in}.%j.log /data/bi/pipelines/flu-amd-202402/IRMA FLU_AD ../02-preprocessing/${in}/${in}_R1_filtered.fastq.gz ../02-preprocessing/${in}/${in}_R2_filtered.fastq.gz ${in} &"; done > _01_irma.sh

echo 'bash create_irma_stats.sh' > _02_create_stats.sh

echo "ls */*HA*.fasta | cut -d '/' -f2 | cut -d '.' -f1 | sort -u | cut -d '_' -f3 | sed '/^\$/d' | sed 's/^/A_/g' > HA_types.txt" > _03_post_processing.sh

echo "cat HA_types.txt | while read in; do mkdir \${in}; done" >> _03_post_processing.sh

-echo "mkdir B" >> _03_post_processing.sh
+echo "if grep -qw 'B__' irma_stats.txt; then mkdir B; fi" >> _03_post_processing.sh

-echo "mkdir C" >> _03_post_processing.sh
+echo "if grep -qw 'C__' irma_stats.txt; then mkdir C; fi" >> _03_post_processing.sh

echo "ls */*.fasta | cut -d '/' -f2 | cut -d '.' -f1 | cut -d '_' -f1,2 | sort -u | grep 'A_' > A_fragment_list.txt" >> _03_post_processing.sh

@@ -29,7 +29,7 @@ echo 'grep -w 'B__' irma_stats.txt | cut -f1 | while read sample; do cat B_fragm

echo 'grep -w 'C__' irma_stats.txt | cut -f1 | while read sample; do cat C_fragment_list.txt | while read fragment; do if test -f ${sample}/${fragment}*.fasta; then cat ${sample}/${fragment}*.fasta | sed "s/^>/\>${sample}_/g" | sed s/_H1//g | sed s/_H3//g | sed s/_N1//g | sed s/_N2//g | sed s@-@/@g | sed s/_C_/_/g ; fi >> C/${fragment}.txt; done; done' >> _03_post_processing.sh

-echo 'cat ../samples_id.txt | while read in; do cat ${in}/*.fasta | sed "s/^>/\>${in}_/g" | sed 's/_H1//g' | sed 's/_H3//g' | sed 's/_N1//g' | sed 's/_N2//g' | sed 's@-@/@g' | 's/_A_/_/g' | sed 's/_B_/_/g' | sed 's/_C_/_/g' >> all_samples_completo.txt; done' >> _03_post_processing.sh
+echo 'cat ../samples_id.txt | while read in; do cat ${in}/*.fasta | sed "s/^>/\>${in}_/g" | sed 's/_H1//g' | sed 's/_H3//g' | sed 's/_N1//g' | sed 's/_N2//g' | sed 's@-@/@g' | sed 's/_A_/_/g' | sed 's/_B_/_/g' | sed 's/_C_/_/g' >> all_samples_completo.txt; done' >> _03_post_processing.sh

-echo 'sed -i "s/__//g" irma_stats.txt' >> _03_post_processing.sh
-echo 'sed -i "s/_\t/\t/g" irma_stats.txt' >> _03_post_processing.sh
+echo 'sed "s/__//g" irma_stats.txt > clean_irma_stats.txt' >> _03_post_processing.sh
+echo 'sed "s/_\t/\t/g" irma_stats.txt > clean_irma_stats.txt' >> _03_post_processing.sh
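The two sed expressions strip the padding underscores IRMA leaves in the stats table, now writing a cleaned copy instead of editing in place. The same substitutions sketched in Python — the input line is a made-up example of an underscore-padded row, not real IRMA output:

```python
import re

# Python equivalent of the two sed substitutions above, producing a cleaned
# copy rather than modifying irma_stats.txt in place.
line = "sample1_\tA__H3N2_\t1000"          # invented example row
cleaned = re.sub(r"__", "", line)          # sed "s/__//g"
cleaned = re.sub(r"_\t", "\t", cleaned)    # sed "s/_\t/\t/g"
print(cleaned)  # sample1<TAB>AH3N2<TAB>1000
```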
3 changes: 2 additions & 1 deletion bu_isciii/templates/IRMA/RESULTS/irma_results
@@ -7,4 +7,5 @@ ln -s ../../ANALYSIS/*_MET/99-stats/multiqc_report.html ./krona_results.html
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/all_samples_completo.txt .
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/A_H* .
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/B .
ln -s ../../ANALYSIS/*FLU_IRMA/04-irma/C .
tail -n +2 ../../ANALYSIS/*_FLU_IRMA/04-irma/clean_irma_stats.txt | cut -f4 | sort | uniq -c > flu_type_summary.txt
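The new summary line tallies how often each flu type appears in column 4 of clean_irma_stats.txt. The same count sketched in Python — the rows and the four-column layout are invented for illustration:

```python
from collections import Counter

# Equivalent of: tail -n +2 clean_irma_stats.txt | cut -f4 | sort | uniq -c
# Rows and column names are made up for this example.
rows = [
    "sample\treads\tfragments\tflu_type",  # header (skipped, like tail -n +2)
    "s1\t52100\t8\tA_H1N1",
    "s2\t48000\t8\tA_H3N2",
    "s3\t61000\t8\tA_H1N1",
    "s4\t39000\t8\tB",
]
counts = Counter(line.split("\t")[3] for line in rows[1:])
for flu_type in sorted(counts):
    print(f"{counts[flu_type]:7d} {flu_type}")
```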
@@ -1,6 +1,6 @@
# module load fastp
# if assembly pipeline was performed first and the trimmed sequences were saved, this should work:
-# cat ../samples_id | xargs -I mkdir @@; cd $_; ln -s ../../*/01-preprocessing/trimmed_sequences/@@*.gz @@; cd -
+# cat ../samples_id.txt | xargs -I @@ mkdir @@; cd @@; ln -s ../../../*/01-processing/fastp/@@_1.fastp.fastq.gz ./@@_R1_filtered.fastq.gz; ln -s ../../../*/01-processing/fastp/@@_2.fastp.fastq.gz ./@@_R2_filtered.fastq.gz ; cd -
# else:
mkdir logs
scratch_dir=$(echo $(pwd) | sed 's@/data/bi/scratch_tmp/@/scratch/@g')
Expand Down
@@ -1,13 +1,16 @@
# conda activate ariba
# ARIBA runs local assembli/processing_Data/bioinformatics/services_and_colaborations/CNM/bacteriologia/20190821_QCASSEMBLT_s.gonzalez_T/RAW/fastqc_2/.

mkdir logs
scratch_dir=$(echo $PWD | sed 's/\/data\/bi\/scratch_tmp/\/scratch/g')
downloaded_ref=$(find ../../../../REFERENCES/ -type d -name 'ref_db')

# Cartesian product of the two files to avoid double looping
join -j 2 ../../samples_id.txt ../databases.txt | sed 's/^ //g' > sample_database.txt

# col 1 (arr[0]): sample
# col 2 (arr[1]): database
-cat sample_database.txt | while read in; do arr=($in); echo "mkdir -p ${arr[0]}; srun --chdir $scratch_dir --output logs/ARIBA${arr[0]}_${arr[1]}.%j.log --job-name ARIBA_${arr[0]}_${arr[1]} --cpus-per-task 5 --mem 5G --partition short_idx --time 02:00:00 ariba run /data/bi/references/ariba/20211216/${arr[1]}/out.${arr[1]}.prepareref ../../../*ASSEMBLY/01-preprocessing/trimmed_sequences/${arr[0]}_1.trim.fastq.gz ../../../*ASSEMBLY/01-preprocessing/trimmed_sequences/${arr[0]}_2.trim.fastq.gz ${arr[0]}/out_${arr[1]}_${arr[0]}_run &"; done > _01_ariba.sh
+cat sample_database.txt | grep -v 'pubmlst' | while read in; do arr=($in); echo "mkdir -p ${arr[0]}; srun --chdir $scratch_dir --output logs/ARIBA_${arr[0]}_${arr[1]}.%j.log --job-name ARIBA_${arr[0]}_${arr[1]} --cpus-per-task 5 --mem 5G --partition short_idx --time 02:00:00 ariba run /data/bi/references/ariba/20211216/${arr[1]}/out.${arr[1]}.prepareref ../../01-preprocessing/${arr[0]}/${arr[0]}_R1_filtered.fastq.gz ../../01-preprocessing/${arr[0]}/${arr[0]}_R2_filtered.fastq.gz ${arr[0]}/out_${arr[1]}_${arr[0]}_run &"; done > _01_ariba.sh

+cat ../samples_id.txt | while read in; do echo "mkdir -p ${in}; srun --chdir $scratch_dir --output logs/ARIBA_${in}_pubmlst.%j.log --job-name ARIBA_${in}_pubmlst --cpus-per-task 5 --mem 5G --partition short_idx --time 02:00:00 ariba run ${downloaded_ref} ../../01-preprocessing/${in}/${in}_R1_filtered.fastq.gz ../../01-preprocessing/${in}/${in}_R2_filtered.fastq.gz ${in}/out_pubmlst_${in}_run &"; done >> _01_ariba.sh

cat sample_database.txt | while read in; do arr=($in); echo "mv ${arr[0]}/out_${arr[1]}_${arr[0]}_run/report.tsv ${arr[0]}/out_${arr[1]}_${arr[0]}_run/${arr[0]}_${arr[1]}_report.tsv"; done > _02_fix_tsvreport.sh
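The `join -j 2` trick above joins two one-column files on their (empty) second field, which pairs every sample with every database, i.e. a cartesian product without a double loop. The same pairing in Python — sample and database names are placeholders:

```python
from itertools import product

# Cartesian product of samples x databases, as built by the lablog's
# `join -j 2 samples_id.txt databases.txt`. Names are placeholders.
samples = ["sample1", "sample2"]
databases = ["card", "plasmidfinder", "vfdb_full"]

pairs = list(product(samples, databases))
for sample, database in pairs:
    print(sample, database)  # one ariba run per pair
```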
@@ -8,4 +8,4 @@ scratch_dir=$(echo $PWD | sed 's/\/data\/bi\/scratch_tmp/\/scratch/g')
# 1 - Use the ls in parenthesis to find the reports for a certain db, and xargs to make it into a single line
# 2 - Integrate this into the ariba summary command

-cat ../databases.txt | while read in; do echo "srun --chdir $scratch_dir --output logs/ARIBA_SUMMARY_${in}.log --job-name ARIBA_${in} --cpus-per-task 5 --mem 5G --partition short_idx --time 00:30:00 ariba summary --cluster_cols ref_seq,match out_summary_${in} $(ls ../run/*/out*_${in}*/*${in}*_report.tsv | xargs)"; done > _01_ariba_summary_prueba.sh
+cat ../databases.txt | while read in; do echo "srun --chdir $scratch_dir --output logs/ARIBA_SUMMARY_${in}.log --job-name ARIBA_${in} --cpus-per-task 5 --mem 5G --partition short_idx --time 00:30:00 ariba summary --cluster_cols ref_seq,match out_summary_${in} $(ls ../run/*/out*_${in}*/*${in}*_report.tsv | xargs) &"; done > _01_ariba_summary_prueba.sh
@@ -2,3 +2,5 @@
python3 /data/bi/pipelines/bacterial_qc/parse_ariba.py --path ../02-ariba/summary/out_summary_card.csv --database card --output_bn ariba_card.bn --output_csv ariba_card.csv
python3 /data/bi/pipelines/bacterial_qc/parse_ariba.py --path ../02-ariba/summary/out_summary_plasmidfinder.csv --database plasmidfinder --output_bn ariba_plasmidfinder.bn --output_csv ariba_plasmidfinder.csv
python3 /data/bi/pipelines/bacterial_qc/parse_ariba.py --path ../02-ariba/summary/out_summary_vfdb_full.csv --database vfdb_full --output_bn ariba_vfdb_full.bn --output_csv ariba_vfdb_full.csv

paste <(echo "sample_id") <(cat ../02-ariba/run/*/out_pubmlst_*_run/mlst_report.tsv | head -n1) > ariba_mlst_full.tsv; cat ../samples_id.txt | while read in; do paste <(echo ${in}) <(tail -n1 ../02-ariba/run/${in}/out_pubmlst_${in}_run/mlst_report.tsv); done >> ariba_mlst_full.tsv
@@ -1,6 +1,2 @@
ln -s ../samples_id.txt .
ln -s ../00-reads .
-mkdir 01-preprocessing
-mkdir 02-srst2
-mkdir 03-ariba
-mkdir 99-stats