Bug: smaller BAM files #80

Open
euweiss opened this issue Jan 15, 2025 · 15 comments

@euweiss

euweiss commented Jan 15, 2025

I have a BAM file for a positive control sample, which I am trying to confirm with VNtyper. I have previously run the program with other BAM files and it worked without problems producing positive and negative results as expected. These BAM files were > 10GB in size. The current file in question is only ~ 7 GB and doesn't produce all the same output. One indication that something is not all right is given at run time:

[bam_sort_core] merging from 0 files and 8 in-memory blocks...

Usually it would say
[bam_sort_core] merging from 8 files and 8 in-memory blocks...

I have tested the program with other BAM files from the same sequencing run (of similar size) with the same result.
When I used BAM files from another sequencing run, it ran normally for larger files (>10 GB) but produced the same issue in smaller ones (~ 7GB) despite being from the same run and quality parameters being fine.

Can you explain where this problem may stem from? Is there a minimum for sequencing depth that may not be reached for these samples?

@berntpopp berntpopp self-assigned this Jan 15, 2025
@berntpopp berntpopp added the bug Something isn't working label Jan 15, 2025
@berntpopp
Collaborator

Please provide the command line you are using.
Have you tried vntyper.org as an alternative to running locally?

@berntpopp berntpopp changed the title smaller BAM files Bug: smaller BAM files Jan 15, 2025
@euweiss
Author

euweiss commented Jan 16, 2025

I am using the following command:
sudo docker run --rm -it -v [local path to shared]:/SOFT/shared saei/vntyper:1.0.0 -t 8 --bam -p /SOFT/VNtyper/ -ref /SOFT/VNtyper/Files/chr1.fa -ref_VNTR /SOFT/VNtyper/Files/MUC1-VNTR.fa -m /SOFT/VNtyper/Files/hg19_genic_VNTRs.db -a /SOFT/shared/[sampleID].bam -t 8 -w /SOFT/shared/ -o [sampleID]

The link is being blocked by FortiGuard for being in violation of company internet policy.

@berntpopp
Collaborator

I believe you are running an older version of the tool in docker. Have you tried using the current version 2.0.0? It is way faster.
Besides Docker, we now offer instructions for installing it with pip and conda/mamba.

Could you provide me with more information on why https://vntyper.org/ is blocked in your environment? What is the message you get?

@euweiss
Author

euweiss commented Jan 16, 2025

I have now pulled the current Docker image and tried to run it as described in your README, but it now fails completely:

sudo docker run -w /opt/vntyper --rm \
    -v [local path to shared]:/opt/vntyper/input \
    -v [local path to shared]:/opt/vntyper/output \
    saei/vntyper:main \
    vntyper pipeline --bam [local path to shared]/[sampleID].bam \
    -o [local path to shared]/[sampleID]/

error:
ERROR conda.cli.main_run:execute(41): `conda run vntyper pipeline --bam [local path to shared]/[sampleID]/.bam -o [local path to shared]/[sampleID]/` failed. (See above for error)
Traceback (most recent call last):
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1323, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '[local path to shared]/[sampleID]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1323, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '[local path to shared]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1323, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '[local path to directory containing shared in home]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/vntyper/bin/vntyper", line 8, in <module>
sys.exit(main())
File "/opt/conda/envs/vntyper/lib/python3.9/site-packages/vntyper/cli.py", line 403, in main
log_file_path.parent.mkdir(parents=True, exist_ok=True)
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1327, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1327, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1327, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1323, in mkdir
self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '[path to home]'

Since I am running with sudo and it worked with v1.0.0, I cannot explain the sudden permission issues.

Regarding the website:
FortiGuard Intrusion Prevention - Access Blocked
Web Page Blocked
You have tried to access a web page that is in violation of your Internet usage policy.
Category | Unrated
URL | http://vntyper.org/

@berntpopp
Collaborator

I just updated the README file for the Docker instructions, which were out of date.
This works now:

# pull the docker image
docker pull saei/vntyper:main

# run the pipeline using the docker image
docker run -w /opt/vntyper --rm \
    -v /local/input/folder/:/opt/vntyper/input \
    -v /local/output/folder/:/opt/vntyper/output \
    saei/vntyper:main \
    vntyper pipeline \
    --bam /opt/vntyper/input/filename.bam \
    -o /opt/vntyper/output/filename/

You should just have to replace the "/local/input/folder/" with the path to your BAM file and "/local/output/folder/" with the path where you want your results saved.
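
The earlier failed attempt passed host paths to `--bam` and `-o`, whereas a process inside the container only sees the mount targets given with `-v`. A minimal Python sketch of that translation follows, assuming the mount layout from the command above. The helper name `to_container_path` is hypothetical and not part of VNtyper.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, mounts):
    """Translate a host path into the path visible inside the container.

    `mounts` maps host directories to container directories, mirroring
    the `-v host:container` options of `docker run`.
    """
    host = PurePosixPath(host_path)
    for host_dir, container_dir in mounts.items():
        try:
            relative = host.relative_to(PurePosixPath(host_dir))
        except ValueError:
            continue  # host_path is not under this mount
        return str(PurePosixPath(container_dir) / relative)
    raise ValueError(f"{host_path} is not under any mounted directory")


# Mount layout taken from the docker command above.
mounts = {"/local/input/folder/": "/opt/vntyper/input"}
print(to_container_path("/local/input/folder/filename.bam", mounts))
```

A path that is not covered by any mount raises `ValueError`, which corresponds to the situation in the traceback: the container tried to create a directory tree that only exists on the host.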

@berntpopp
Collaborator

Regarding FortiGuard:
Your company's security settings seem very restricted here, blocking unrated sites.
I have applied to get a rating for the site, though I think most users will not be behind a FortiGuard filter.

@berntpopp
Collaborator

Update for FortiGuard filtering:

Image

It should work now hopefully.

@euweiss
Author

euweiss commented Jan 17, 2025

I needed to specify my user and group to be able to run the program in docker due to some permission problems that were not present in v1. However, it does run now and I do not see any marked difference between the samples as before.
The problem is that my positive sample still gets a negative result. I have looked at the log and found that in true negative samples the rows are filtered out at "is_frameshift" or, at the latest, at "is_valid_frameshift":

2025-01-17 08:40:10,668 - root - INFO - Filter column 'is_frameshift' exists; 420 -> 0 rows remain after requiring True.
2025-01-17 08:40:10,668 - root - INFO - Filter column 'is_valid_frameshift' exists; 0 -> 0 rows remain after requiring True.
2025-01-17 08:40:10,669 - root - INFO - Filter column 'depth_confidence_pass' exists; 0 -> 0 rows remain after requiring True.
2025-01-17 08:40:10,669 - root - INFO - Filter column 'alt_filter_pass' exists; 0 -> 0 rows remain after requiring True.
2025-01-17 08:40:10,670 - root - INFO - Filter column 'motif_filter_pass' exists; 0 -> 0 rows remain after requiring True.

2025-01-16 13:40:59,337 - root - INFO - Filter column 'is_frameshift' exists; 788 -> 156 rows remain after requiring True.
2025-01-16 13:40:59,337 - root - INFO - Filter column 'is_valid_frameshift' exists; 156 -> 0 rows remain after requiring True.
2025-01-16 13:40:59,338 - root - INFO - Filter column 'depth_confidence_pass' exists; 0 -> 0 rows remain after requiring True.
2025-01-16 13:40:59,338 - root - INFO - Filter column 'alt_filter_pass' exists; 0 -> 0 rows remain after requiring True.
2025-01-16 13:40:59,339 - root - INFO - Filter column 'motif_filter_pass' exists; 0 -> 0 rows remain after requiring True.

whereas for my positive control, the data is filtered out at "alt_filter_pass".

2025-01-16 12:58:25,981 - root - INFO - Filter column 'is_frameshift' exists; 614 -> 39 rows remain after requiring True.
2025-01-16 12:58:25,982 - root - INFO - Filter column 'is_valid_frameshift' exists; 39 -> 39 rows remain after requiring True.
2025-01-16 12:58:25,982 - root - INFO - Filter column 'depth_confidence_pass' exists; 39 -> 39 rows remain after requiring True.
2025-01-16 12:58:25,983 - root - INFO - Filter column 'alt_filter_pass' exists; 39 -> 0 rows remain after requiring True.
2025-01-16 12:58:25,983 - root - INFO - Filter column 'motif_filter_pass' exists; 0 -> 0 rows remain after requiring True.

What is being removed in this filter?
The sample is confirmed positive by SNaPshot and a different bioinformatics tool has detected the MUC1 mutation on the same data that is being used for VNtyper.
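
The comparison above was done by reading the logs by eye. A small Python sketch can report the first filter that empties the table, assuming the log line format shown in the excerpts above. The function name `first_emptying_filter` is hypothetical and not part of VNtyper.

```python
import re

# Matches lines such as:
# "... Filter column 'alt_filter_pass' exists; 39 -> 0 rows remain after requiring True."
FILTER_LINE = re.compile(
    r"Filter column '(?P<name>\w+)' exists; "
    r"(?P<before>\d+) -> (?P<after>\d+) rows remain"
)


def first_emptying_filter(log_text):
    """Return the first filter that reduces a non-empty table to zero rows.

    Returns None if no filter line in `log_text` does so.
    """
    for match in FILTER_LINE.finditer(log_text):
        before, after = int(match["before"]), int(match["after"])
        if before > 0 and after == 0:
            return match["name"]
    return None


# Two lines copied from the positive-control log above.
positive_control_log = """\
2025-01-16 12:58:25,982 - root - INFO - Filter column 'depth_confidence_pass' exists; 39 -> 39 rows remain after requiring True.
2025-01-16 12:58:25,983 - root - INFO - Filter column 'alt_filter_pass' exists; 39 -> 0 rows remain after requiring True.
"""
print(first_emptying_filter(positive_control_log))
```

This only locates the filter step; it says nothing about why the rows failed it.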

@euweiss
Author

euweiss commented Jan 17, 2025

The website can be accessed now and generates the same results as the pipeline version.

@hassansaei
Owner

Could you please send us a zipped copy of the output folder for the positive sample generated with the latest version? Please make sure that the intermediate files are included so we can look into the issue more closely.
You can send it to [email protected]. Thank you!

@berntpopp
Collaborator

Hi euweiss,

Thank you for helping us debug. Glad that the web service works for you.

Could you please specify some things:

  • What did you have to do in Docker to make it work? Also, please state your specific Docker version and environment (operating system).
  • Is the case you are working on a typical dupC variant or an alternative variant?
  • What other tool have you used to confirm? adVNTR?
  • Can you send us your logs with the original data and version of vntyper where the case was positive?

@euweiss
Author

euweiss commented Jan 17, 2025

  • What did you have to do in Docker to make it work? Also, please state your specific Docker version and environment (operating system).

I had to specify the user:group in the docker command using the --user flag.
The version I am running is 24.0.7 on Linux.
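
For readers hitting the same permission problem, here is a hedged Python sketch that builds such a command. It assumes the current host user's numeric UID and GID are passed to `--user` (the exact value used here was not stated) and reuses the mount layout from the README command. The function name `docker_run_command` is hypothetical.

```python
import os
import shlex


def docker_run_command(input_dir, output_dir, bam_name, out_name):
    """Build a `docker run` argument list that runs VNtyper as the host user."""
    # ASSUMPTION: the numeric UID:GID of the calling user is what --user needs.
    user = f"{os.getuid()}:{os.getgid()}"
    return [
        "docker", "run", "-w", "/opt/vntyper", "--rm",
        "--user", user,
        "-v", f"{input_dir}:/opt/vntyper/input",
        "-v", f"{output_dir}:/opt/vntyper/output",
        "saei/vntyper:main",
        "vntyper", "pipeline",
        "--bam", f"/opt/vntyper/input/{bam_name}",
        "-o", f"/opt/vntyper/output/{out_name}/",
    ]


command = docker_run_command(
    "/local/input/folder", "/local/output/folder", "filename.bam", "filename"
)
print(shlex.join(command))  # paste into a shell, or pass `command` to subprocess.run
```

Files written to the output mount are then owned by the host user, provided that user can write to the output directory.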

  • Is the case you are working on a typical dupC variant or an alternative variant?

Yes, a typical dupC.

  • What other tool have you used to confirm? adVNTR?

I have used a modified version of HotCount
https://github.com/mafouille/HotCount

  • Can you send us your logs with the original data and version of vntyper where the case was positive?

Since I am working with patient data, I will need to check whether I am allowed to do that.

@berntpopp
Collaborator

Thank you @euweiss,

vntyper 2.0 sets a non-root user in the Docker container, which is recommended for security reasons. This might explain your problems with the new image. I will look into it and document this better. It will be a new issue.

Regarding your core problem, I would like to unravel the case a little bit and summarize. Please correct:

  1. You have a dupC SNaPshot-positive (i.e. confirmed) case that you had identified with vntyper 1.0.0 running in Docker from exome sequencing data.
  2. You tried to re-run that same original data in the old Docker container, which now does not work anymore. Or was this new sequencing data?
  3. Using the new Docker container, you get a negative result on the data. Or was this also new sequencing data?

For debugging, it would be great if you could send us just the MUC1 subset of the BAM files for both of your NGS data sets. Because MUC1 is so small, it barely holds any genetic information that could be identifying. You can also remove the header information from the BAM (I have a script for that here: https://github.com/hassansaei/VNtyper/blob/main/reference/pseudonymize.py).
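
One possible way to produce such a subset is with `samtools view` on an indexed BAM. The Python sketch below only builds the commands. The function name `muc1_subset_commands` is hypothetical, and the region string is an assumed hg19 window around MUC1 that must be checked against your own reference build and contig naming (`chr1` versus `1`).

```python
import shlex


def muc1_subset_commands(bam_path, out_path, region):
    """Build samtools commands that copy one region of an indexed BAM."""
    return [
        # Extract reads overlapping `region` into a new BAM file.
        ["samtools", "view", "-b", "-o", out_path, bam_path, region],
        # Index the subset so it can be opened like the original.
        ["samtools", "index", out_path],
    ]


# ASSUMPTION: approximate hg19 coordinates for the MUC1 locus; verify before use.
ASSUMED_MUC1_REGION = "chr1:155158000-155163000"

for command in muc1_subset_commands(
    "sample.bam", "sample.muc1.bam", ASSUMED_MUC1_REGION
):
    print(shlex.join(command))
```

The subset still carries the original header and read names, so it is separate from the header removal done by the pseudonymization script linked above.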

@euweiss
Author

euweiss commented Jan 17, 2025

vntyper 2.0 sets a non-root user in the Docker container, which is recommended for security reasons. This might explain your problems with the new image. I will look into it and document this better. It will be a new issue.

That makes sense. When I tried changing some directory permissions, the container generated files as user Administrator in the Administrator group.

Regarding your core problem, I would like to unravel the case a little bit and summarize. Please correct:

  1. You have a dupC SNaPshot-positive (i.e. confirmed) case that you had identified with vntyper 1.0.0 running in Docker from exome sequencing data.
  2. You tried to re-run that same original data in the old Docker container, which now does not work anymore. Or was this new sequencing data?
  3. Using the new Docker container, you get a negative result on the data. Or was this also new sequencing data?

No, the SNaPshot-positive case has never produced a positive result with VNtyper for me. I apologise if that was unclear. In the old version it gave me the strange log that prompted me to open this issue, as did every other (presumably negative) case where the BAM file was around 7 GB as opposed to >10 GB. I was able to confirm a random positive call from the other software (with a BAM >10 GB), which has not been SNaPshot tested. Therefore, I suspect coverage to be the issue in some way.

For debugging, it would be great if you could send us just the MUC1 subset of the BAM files for both of your NGS data sets. Because MUC1 is so small, it barely holds any genetic information that could be identifying. You can also remove the header information from the BAM (I have a script for that here: https://github.com/hassansaei/VNtyper/blob/main/reference/pseudonymize.py).

I would gladly provide you this but I need to check with my manager if we are legally allowed to share this. The laws regarding patient data are very strict.

@hassansaei
Owner

I needed to specify my user and group to be able to run the program in docker due to some permission problems that were not present in v1. However, it does run now and I do not see any marked difference between the samples as before. The problem is that my positive sample still gets a negative result. I have looked at the log and found that in true negative samples the rows are filtered out at "is_frameshift" or, at the latest, at "is_valid_frameshift"

The user permission issue should be resolved in the latest update. Please pull the newest version and let us know the results.

What is being removed in this filter? The sample is confirmed positive by SNaPshot and a different bioinformatics tool has detected the MUC1 mutation on the same data that is being used for VNtyper.

Have you tested both methods (--extra-modules advntr) to check the output? It’s quite unusual for two independent methods to miss the dupC variant. Have you sequenced the SNaPshot product to rule out any SNVs or indels in the MwoI site? Could you confirm if you’re using the Twist capturing protocol? Thank you.
