Bug: smaller BAM files #80

euweiss · 2025-01-15T13:57:30Z

I have a BAM file for a positive control sample, which I am trying to confirm with VNtyper. I have previously run the program with other BAM files and it worked without problems producing positive and negative results as expected. These BAM files were > 10GB in size. The current file in question is only ~ 7 GB and doesn't produce all the same output. One indication that something is not all right is given at run time:

[bam_sort_core] merging from 0 files and 8 in-memory blocks...

Usually it would say
[bam_sort_core] merging from 8 files and 8 in-memory blocks...

I have tested the program with other BAM files from the same sequencing run (of similar size) with the same result.
When I used BAM files from another sequencing run, it ran normally for larger files (>10 GB) but produced the same issue in smaller ones (~ 7GB) despite being from the same run and quality parameters being fine.

Can you explain where this problem may stem from? Is there a minimum for sequencing depth that may not be reached for these samples?
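A note on reading the samtools message quoted above: as far as I know, `samtools sort` prints this line just before its final merge, and the first number is the count of temporary files it had to write to disk because the data did not fit into the sort memory. A count of 0 would then only mean that the whole input was sorted in memory, which is expected for a smaller BAM and is not an error by itself. A minimal sketch for checking this in a captured log (the function names are illustrative and not part of VNtyper or samtools):

```python
import re

# Progress line printed by `samtools sort` before its final merge step.
MERGE_RE = re.compile(
    r"\[bam_sort_core\] merging from (\d+) files and (\d+) in-memory blocks"
)


def parse_merge_line(line):
    """Return (temp_files, memory_blocks), or None if the line does not match."""
    match = MERGE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def spilled_to_disk(line):
    """True if the sort wrote temporary files, False if it stayed in memory."""
    parsed = parse_merge_line(line)
    if parsed is None:
        raise ValueError("not a bam_sort_core merge line")
    return parsed[0] > 0


if __name__ == "__main__":
    small = "[bam_sort_core] merging from 0 files and 8 in-memory blocks..."
    large = "[bam_sort_core] merging from 8 files and 8 in-memory blocks..."
    print(spilled_to_disk(small), spilled_to_disk(large))
```

Whether a sample has enough coverage for a call is a separate question that this message does not answer.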


berntpopp · 2025-01-15T14:21:16Z

Please provide the command line you are using.
Have you tried vntyper.org as alternative to running locally?

euweiss · 2025-01-16T07:54:47Z

I am using the following command:
sudo docker run --rm -it -v [local path to shared]:/SOFT/shared saei/vntyper:1.0.0 -t 8 --bam -p /SOFT/VNtyper/ -ref /SOFT/VNtyper/Files/chr1.fa -ref_VNTR /SOFT/VNtyper/Files/MUC1-VNTR.fa -m /SOFT/VNtyper/Files/hg19_genic_VNTRs.db -a /SOFT/shared/[sampleID].bam -t 8 -w /SOFT/shared/ -o [sampleID]

The link is being blocked by FortiGuard for being in violation of company internet policy.

berntpopp · 2025-01-16T09:29:08Z

I believe you are running an older version of the tool in docker. Have you tried using the current version 2.0.0? It is way faster.
Besides docker we offer instructions to install it using pip and conda/mamba now.

Could you provide me with more information why https://vntyper.org/ is blocked in your environment? What is the message you get?

euweiss · 2025-01-16T10:02:02Z

I have now pulled the current docker image and tried to run it as described in your README, but it now fails completely:

sudo docker run -w /opt/vntyper --rm \
    -v [local path to shared]:/opt/vntyper/input \
    -v [local path to shared]:/opt/vntyper/output \
    saei/vntyper:main \
    vntyper pipeline --bam [local path to shared]/[sampleID].bam \
    -o [local path to shared]/[sampleID]/

error:
ERROR conda.cli.main_run:execute(41): conda run vntyper pipeline --bam [local path to shared]/[sampleID]/.bam -o [local path to shared]/[sampleID]/` failed. (See above for error)
Traceback (most recent call last):
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1323, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '[local path to shared]/[sampleID]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1323, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '[local path to shared]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1323, in mkdir
self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '[local path to directory containing shared in home]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/vntyper/bin/vntyper", line 8, in
sys.exit(main())
File "/opt/conda/envs/vntyper/lib/python3.9/site-packages/vntyper/cli.py", line 403, in main
log_file_path.parent.mkdir(parents=True, exist_ok=True)
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1327, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1327, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1327, in mkdir
self.parent.mkdir(parents=True, exist_ok=True)
File "/opt/conda/envs/vntyper/lib/python3.9/pathlib.py", line 1323, in mkdir
self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '[path to home]'`

Since I am running with sudo and it worked with v1.0.0 I cannot explain the sudden issues with permissions.

Regarding the website:
FortiGuard Intrusion Prevention - Access Blocked
Web Page Blocked
You have tried to access a web page that is in violation of your Internet usage policy.
Category | Unrated
URL | http://vntyper.org/

berntpopp · 2025-01-16T10:15:55Z

I just updated the README file for the Docker instructions, which were out of date.
This works now:

# pull the docker image
docker pull saei/vntyper:main

# run the pipeline using the docker image
docker run -w /opt/vntyper --rm \
    -v /local/input/folder/:/opt/vntyper/input \
    -v /local/output/folder/:/opt/vntyper/output \
    saei/vntyper:main \
    vntyper pipeline \
    --bam /opt/vntyper/input/filename.bam \
    -o /opt/vntyper/output/filename/

You should just have to replace the "/local/input/folder/" with the path to your BAM file and "/local/output/folder/" with the path where you want your results saved.
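The key point in the command above is that `--bam` and `-o` are resolved inside the container, so they must use the mounted container paths rather than the host paths. A minimal sketch of that translation, assuming a hypothetical host directory `/data/shared` (the function name and example paths are illustrative only):

```python
from pathlib import PurePosixPath


def to_container_path(host_path, mounts):
    """Translate a host path to its in-container path.

    `mounts` maps host directories to container directories, mirroring
    the `-v host_dir:container_dir` options passed to `docker run`.
    """
    host = PurePosixPath(host_path)
    for host_dir, container_dir in mounts.items():
        try:
            relative = host.relative_to(host_dir)
        except ValueError:
            continue  # not under this mount, try the next one
        return str(PurePosixPath(container_dir) / relative)
    raise ValueError(f"{host_path} is not under any mounted directory")


if __name__ == "__main__":
    # Hypothetical host directory; replace with your own.
    mounts = {"/data/shared": "/opt/vntyper/input"}
    print(to_container_path("/data/shared/filename.bam", mounts))
```

A path that is not under any mounted directory does not exist from the container's point of view, which matches the FileNotFoundError in the earlier traceback.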

berntpopp · 2025-01-16T10:17:24Z

Regarding FortiGuard:
Your company's security settings seem very restricted here, blocking unrated sites.
I have applied to get a rating for the site, though I think most users will not be behind a FortiGuard filter.

berntpopp · 2025-01-16T10:40:26Z

Update for FortiGuard filtering:

It should work now hopefully.

euweiss · 2025-01-17T08:54:30Z

I needed to specify my user and group to be able to run the program in docker due to some permission problems that were not present in v1. However, it does run now and I do not see any marked difference between the samples as before.
The problem is that my positive sample still gets a negative result. I have looked at the log and found that in true negative samples the rows are filtered out at "is_frameshift" or, at the latest, at "is_valid_frameshift":

2025-01-17 08:40:10,668 - root - INFO - Filter column 'is_frameshift' exists; 420 -> 0 rows remain after requiring True.
2025-01-17 08:40:10,668 - root - INFO - Filter column 'is_valid_frameshift' exists; 0 -> 0 rows remain after requiring True.
2025-01-17 08:40:10,669 - root - INFO - Filter column 'depth_confidence_pass' exists; 0 -> 0 rows remain after requiring True.
2025-01-17 08:40:10,669 - root - INFO - Filter column 'alt_filter_pass' exists; 0 -> 0 rows remain after requiring True.
2025-01-17 08:40:10,670 - root - INFO - Filter column 'motif_filter_pass' exists; 0 -> 0 rows remain after requiring True.

2025-01-16 13:40:59,337 - root - INFO - Filter column 'is_frameshift' exists; 788 -> 156 rows remain after requiring True.
2025-01-16 13:40:59,337 - root - INFO - Filter column 'is_valid_frameshift' exists; 156 -> 0 rows remain after requiring True.
2025-01-16 13:40:59,338 - root - INFO - Filter column 'depth_confidence_pass' exists; 0 -> 0 rows remain after requiring True.
2025-01-16 13:40:59,338 - root - INFO - Filter column 'alt_filter_pass' exists; 0 -> 0 rows remain after requiring True.
2025-01-16 13:40:59,339 - root - INFO - Filter column 'motif_filter_pass' exists; 0 -> 0 rows remain after requiring True.

whereas for my positive control, the data is filtered out at "alt_filter_pass".

2025-01-16 12:58:25,981 - root - INFO - Filter column 'is_frameshift' exists; 614 -> 39 rows remain after requiring True.
2025-01-16 12:58:25,982 - root - INFO - Filter column 'is_valid_frameshift' exists; 39 -> 39 rows remain after requiring True.
2025-01-16 12:58:25,982 - root - INFO - Filter column 'depth_confidence_pass' exists; 39 -> 39 rows remain after requiring True.
2025-01-16 12:58:25,983 - root - INFO - Filter column 'alt_filter_pass' exists; 39 -> 0 rows remain after requiring True.
2025-01-16 12:58:25,983 - root - INFO - Filter column 'motif_filter_pass' exists; 0 -> 0 rows remain after requiring True.

What is being removed in this filter?
The sample is confirmed positive by SNaPshot and a different bioinformatics tool has detected the MUC1 mutation on the same data that is being used for VNtyper.
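For comparing many samples, the stage at which each one loses its last candidate rows can be pulled out of the log automatically. A minimal sketch, assuming the log lines keep the format shown above (the function name is illustrative and not part of VNtyper):

```python
import re

# Matches lines such as:
# "... Filter column 'alt_filter_pass' exists; 39 -> 0 rows remain after requiring True."
FILTER_RE = re.compile(
    r"Filter column '(?P<name>\w+)' exists; "
    r"(?P<before>\d+) -> (?P<after>\d+) rows remain"
)


def first_emptying_filter(log_lines):
    """Return the first filter that reduces a non-zero row count to zero."""
    for line in log_lines:
        match = FILTER_RE.search(line)
        if match and int(match["before"]) > 0 and int(match["after"]) == 0:
            return match["name"]
    return None  # no filter emptied the table


POSITIVE_CONTROL_LOG = [
    "2025-01-16 12:58:25,981 - root - INFO - Filter column 'is_frameshift' exists; 614 -> 39 rows remain after requiring True.",
    "2025-01-16 12:58:25,982 - root - INFO - Filter column 'is_valid_frameshift' exists; 39 -> 39 rows remain after requiring True.",
    "2025-01-16 12:58:25,982 - root - INFO - Filter column 'depth_confidence_pass' exists; 39 -> 39 rows remain after requiring True.",
    "2025-01-16 12:58:25,983 - root - INFO - Filter column 'alt_filter_pass' exists; 39 -> 0 rows remain after requiring True.",
    "2025-01-16 12:58:25,983 - root - INFO - Filter column 'motif_filter_pass' exists; 0 -> 0 rows remain after requiring True.",
]

if __name__ == "__main__":
    print(first_emptying_filter(POSITIVE_CONTROL_LOG))  # alt_filter_pass
```

This only reports where the rows disappear; it says nothing about what the filter itself checks.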

euweiss · 2025-01-17T08:55:04Z

The website can be accessed now and generates the same results as the pipeline version.

hassansaei · 2025-01-17T09:14:03Z

Could you please send us the zipped version of the output folder for the positive sample generated with the latest version? Please make sure that the intermediate files are included so we can better look over the issue.
You can send it to [email protected]. Thank you!

berntpopp · 2025-01-17T11:09:13Z

Hi euweiss,

Thank you for helping us debug. Glad that the webservice works for you.

Could you please specify some things:

What did you have to do in Docker to get it to work? Also, please state your specific docker version and environment (operating system).
Is the case you are working on a typical dupC variant or an alternative variant?
What other tool have you used to confirm? adVNTR?
Can you send us your logs with the original data and version of vntyper where the case was positive?

euweiss · 2025-01-17T12:50:03Z

What did you have to do in Docker to get it to work? Also, please state your specific docker version and environment (operating system).

I had to specify the user:group in the docker command using the --user flag
The version I am running is 24.0.7 on linux
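For others hitting the same permission problem, a minimal sketch of assembling such a command on Linux. The exact values passed in this thread are not stated, so using the current user's numeric uid:gid is an assumption, and the function name and the example host path are hypothetical:

```python
import os


def docker_run_as_current_user(image, command, mounts):
    """Build a `docker run` argument list that runs as the calling user.

    Assumption: `--user` is given the caller's numeric uid:gid so that
    files written to the mounted folders are owned by that user.
    """
    uid_gid = f"{os.getuid()}:{os.getgid()}"
    argv = ["docker", "run", "--rm", "--user", uid_gid]
    for host_dir, container_dir in mounts.items():
        argv += ["-v", f"{host_dir}:{container_dir}"]
    argv.append(image)
    argv += list(command)
    return argv


if __name__ == "__main__":
    argv = docker_run_as_current_user(
        "saei/vntyper:main",
        ["vntyper", "pipeline", "--bam", "/opt/vntyper/input/filename.bam",
         "-o", "/opt/vntyper/output/filename/"],
        {"/local/folder": "/opt/vntyper/input"},  # hypothetical host path
    )
    print(" ".join(argv))
```

The list can be passed to `subprocess.run(argv, check=True)` once the paths are adjusted.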

Is the case you are working on a typical dupC variant or an alternative variant?

yes a typical dupC

What other tool have you used to confirm? adVNTR?

I have used a modified version of HotCount
https://github.com/mafouille/HotCount

Can you send us your logs with the original data and version of vntyper where the case was positive?

Since I am working with patient data I will need to check I am allowed to do that.

berntpopp · 2025-01-17T13:46:31Z

Thank you @euweiss,

vntyper 2.0 sets a non-root user in the Docker container, which is recommended for security reasons. This might explain your problems with the new image. I will look into it and document this better. It will be a new issue.

Regarding your core problem, I would like to unravel the case a little bit and summarize. Please correct:

You have a dupC SNaPshot-positive (i.e. confirmed) case that you had identified with vntyper 1.0.0 running in Docker from exome sequencing data.
You tried to re-run that same original data in the old Docker container, which now does not work anymore. Or was this new sequencing data?
Using the new Docker container, you get a negative result on the data. Or was this also new sequencing data?

For debugging, it would be great if you could send us just the MUC1 subset of the BAM files for both of your NGS datasets. Because MUC1 is so small, it barely holds any genetic information that can be identifiable. You can also remove the header information from the BAM (I have a script for that here: https://github.com/hassansaei/VNtyper/blob/main/reference/pseudonymize.py).
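A minimal sketch of how such a subset could be extracted with `samtools view`, assuming samtools is installed and the BAM is coordinate-sorted and indexed. The region string must be supplied for your reference build; the value in the example is a placeholder and not the MUC1 coordinates, and the function name is illustrative:

```python
def region_subset_command(bam_in, bam_out, region):
    """Build a `samtools view` argument list that keeps one region.

    -b writes BAM output, -o names the output file, and the trailing
    argument is the region to keep (requires an index for `bam_in`).
    """
    return ["samtools", "view", "-b", "-o", bam_out, bam_in, region]


if __name__ == "__main__":
    # "chr1:1-1000" is a placeholder region, NOT the MUC1 locus.
    argv = region_subset_command("sample.bam", "sample.subset.bam", "chr1:1-1000")
    print(" ".join(argv))
    # To execute: import subprocess; subprocess.run(argv, check=True)
```

The subset still carries the original header and read names, so the pseudonymization step mentioned above remains relevant.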

euweiss · 2025-01-17T14:07:50Z

vntyper 2.0 sets a non-root user in the Docker container, which is recommended for security reasons. This might explain your problems with the new image. I will look into it and document this better. It will be a new issue.

That makes sense. When I tried changing some directory permissions the container generated files as user Administrator in the Administrator group

Regarding your core problem, I would like to unravel the case a little bit and summarize. Please correct:

You have a dupC SNaPshot-positive (i.e. confirmed) case that you had identified with vntyper 1.0.0 running in Docker from exome sequencing data.

You tried to re-run that same original data in the old Docker container, which now does not work anymore. Or was this new sequencing data?

Using the new Docker container, you get a negative result on the data. Or was this also new sequencing data?

No, the SNaPshot-positive case has never produced a positive result with VNtyper for me. I apologise if that was unclear. In the old version it gave me the strange log message that prompted me to open this issue, as did any other (presumably) negative case if the BAM file was around 7 GB as opposed to >10 GB. I was able to confirm a random positive call (with a BAM >10 GB) from the other software, which has not been SNaPshot tested. Therefore, I suspect coverage to be the issue in some way.

For debugging, it would be great if you could send us just the MUC1 subset of the BAM files for both of your NGS datasets. Because MUC1 is so small, it barely holds any genetic information that can be identifiable. You can also remove the header information from the BAM (I have a script for that here: https://github.com/hassansaei/VNtyper/blob/main/reference/pseudonymize.py).

I would gladly provide you this but I need to check with my manager if we are legally allowed to share this. The laws regarding patient data are very strict.

hassansaei · 2025-01-20T08:30:29Z

I needed to specify my user and group to be able to run the program in docker due to some permission problems that were not present in v1. However, it does run now and I do not see any marked difference between the samples as before. The problem is that my positive sample still gets a negative result. I have looked at the log and found that in true negative samples the rows are filtered out at "is_frameshift" or, at the latest, at "is_valid_frameshift":

The user permission issue should be resolved in the latest update. Please pull the newest version and let us know the results.

What is being removed in this filter? The sample is confirmed positive by SNaPshot and a different bioinformatics tool has detected the MUC1 mutation on the same data that is being used for VNtyper.

Have you tested both methods (--extra-modules advntr) to check the output? It’s quite unusual for two independent methods to miss the dupC variant. Have you sequenced the SNaPshot product to rule out any SNVs or indels in the MwoI site? Could you confirm if you’re using the Twist capturing protocol? Thank you.

berntpopp self-assigned this Jan 15, 2025

berntpopp added the bug Something isn't working label Jan 15, 2025

berntpopp changed the title ~~smaller BAM files~~ Bug: smaller BAM files Jan 15, 2025

berntpopp assigned hassansaei Jan 16, 2025
