When running ConFindr on a specific sample we encountered a KeyError where the key could not be found in the fastq records. However, the key is present in the fastq records.
After some digging, we traced the cause to the load_fastq_records method in methods.py. Our read names contain :1: (the lane number; see https://help.basespace.illumina.com/files-used-by-basespace/fastq-files) but also end in /1. Because the :1: check runs first, a second /1 is appended to the record id. As a result, the key it is looking for does not match the fastq read names (it is looking for read_name/1).
Is there a reason why you first check if :1: is present in the record before checking if the record already contains /1? Can this be swapped and can you check if the read ends with /1 instead of containing it? Also, it is documented above the first condition (if ':1:' in record.id) that you change a :1: to /1 in the record id, but you just add /1. Is this a mistake in the documentation or in the code?
Current code:
    if forward:
        # Change a :1: to /1 in the record.id
        if ':1:' in record.id:
            record.id = record.id + '/1'
        # Don't worry if the record.id already has a /1
        elif '/1' in record.id:
            pass
        # If the record.id doesn't have a read direction, add /1
        else:
            record.id = record.id + '/1'
    # Process reverse reads in a similar fashion to forward reads
    else:
        if ':2:' in record.id:
            record.id = record.id + '/2'
        elif '/2' in record.id:
            pass
        else:
            record.id = record.id + '/2'
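The effect described above can be reproduced outside ConFindr with a small standalone sketch. The function below only mirrors the forward-read branch quoted in this issue; its name and the read name are hypothetical, chosen so that the name contains :1: as a lane field and also ends in /1.

```python
def normalise_forward_id(record_id):
    """Mirror of the forward-read branch quoted in this issue (not ConFindr's API)."""
    if ':1:' in record_id:
        # Checked first, so it also fires for names that already end in /1
        return record_id + '/1'
    elif '/1' in record_id:
        return record_id
    else:
        return record_id + '/1'


# Hypothetical read name: lane 1 (':1:') plus an explicit '/1' suffix
read_name = 'M00001:12:000000000-ABCDE:1:1101:15589:1332/1'
stored_id = normalise_forward_id(read_name)
print(stored_id)
```

With this input the stored id ends in /1/1, so a lookup that expects the name with a single /1 suffix would miss it.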
Suggested code:
    if forward:
        # Don't worry if the record.id already has a /1
        if record.id.endswith('/1'):
            pass
        # Change a :1: to /1 in the record.id
        elif ':1:' in record.id:
            record.id = record.id + '/1'
        # If the record.id doesn't have a read direction, add /1
        else:
            record.id = record.id + '/1'
    # Process reverse reads in a similar fashion to forward reads
    else:
        if record.id.endswith('/2'):
            pass
        elif ':2:' in record.id:
            record.id = record.id + '/2'
        else:
            record.id = record.id + '/2'
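For testing outside ConFindr, the suggested logic can be written as a small pure function. This is only a sketch: the function name and the example read names are hypothetical. Since the ':1:' branch and the final else branch in the suggestion both append the suffix, they are collapsed into one here, which should not change behaviour.

```python
def normalise_id(record_id, forward=True):
    """Append /1 or /2 unless the id already ends with that suffix."""
    suffix = '/1' if forward else '/2'
    if record_id.endswith(suffix):
        # Already carries the read direction, leave untouched
        return record_id
    return record_id + suffix


# Hypothetical read names for illustration
with_suffix = 'M00001:12:000000000-ABCDE:1:1101:15589:1332/1'
without_suffix = 'M00001:12:000000000-ABCDE:1:1101:15589:1332'

print(normalise_id(with_suffix))
print(normalise_id(without_suffix))
print(normalise_id(without_suffix, forward=False))
```

Using endswith rather than a substring test means a /1 or /2 that occurs elsewhere in the name is not mistaken for a read-direction suffix.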
Thanks in advance for your reply!
Hi @miliskato, I'm sorry for the slow response to your issue, and thank you for including those suggested code changes.
It seems that your FASTQ headers may be in an unconventional format. In the format described at the link you provided, a space character separates fields such as <lane> from the <read> field (read direction, 1 or 2).
When ConFindr reads the paired-end FASTQ files, it only uses the first contiguous string (no whitespace) in the FASTQ header line as the record.id, and it assumes that this is the same for both forward and reverse reads, except for instances of ":1:" and "/1" which are handled according to the code provided above in load_fastq_records().
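As an illustration of the "first contiguous string" behaviour, the sketch below splits two conventional Illumina-style headers on whitespace. It uses plain string handling rather than ConFindr's own parsing, and the header values are made up.

```python
def first_token(header_line):
    """Return the text between the leading '@' and the first whitespace."""
    return header_line[1:].split()[0]


# Hypothetical mate-pair headers in the space-separated Illumina layout
forward_header = '@M00001:12:000000000-ABCDE:1:1101:15589:1332 1:N:0:1'
reverse_header = '@M00001:12:000000000-ABCDE:1:1101:15589:1332 2:N:0:1'

forward_id = first_token(forward_header)
reverse_id = first_token(reverse_header)
print(forward_id)
print(forward_id == reverse_id)
```

In this layout the read direction sits after the space, so both mates yield the same id and the direction has to be added separately.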
When a pair of forward and reverse FASTQ files is provided where the headers do not match between mate pairs, a KeyError is raised by characterise_read() for the reverse read, as this function assumes that the key for the reverse read in the fastq_records dictionary is the record.id of the forward read + '/1'. I believe this error has the same root cause as #52, and I plan to modify #54 accordingly to track this.
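One way to spot mismatched mate headers before running the pipeline is a small pre-flight check. The helper names below are hypothetical and not part of ConFindr; the sketch assumes that mates should share the same first header token once a trailing /1 or /2 is removed.

```python
def core_name(header_line, direction):
    """First whitespace-delimited token, minus '@' and a trailing /<direction>."""
    token = header_line.lstrip('@').split()[0]
    suffix = '/' + direction
    if token.endswith(suffix):
        token = token[:-len(suffix)]
    return token


def mates_match(forward_header, reverse_header):
    """True when both headers reduce to the same core read name."""
    return core_name(forward_header, '1') == core_name(reverse_header, '2')


# Hypothetical headers for illustration
print(mates_match('@read_7:1:1101/1', '@read_7:1:1101/2'))
print(mates_match('@read_7:1:1101/1', '@read_8:1:1101/2'))
```

Running such a check over the first few records of each file would flag pairs whose names diverge, which is the situation described above.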
Would you be able to provide an example of a pair of FASTQ files which are causing this issue? I would like to test the proposed changes to the code on these files before submitting a pull request to the ConFindr repository. I attempted to reproduce the issue using some modified FASTQ headers, but your suggested changes didn't prevent the KeyError on these, unfortunately.