GFF.parse modifies input order of base_dict #144

Open
percyfal opened this issue Mar 28, 2024 · 0 comments
@percyfal

`GFF.parse` accepts a `base_dict` argument: a dictionary of `SeqRecord` objects to which GFF entries are added during parsing. Even when `base_dict` is ordered (an `OrderedDict`, or a plain `dict`, which preserves insertion order as of Python 3.7), the records are not yielded in that order, because of this code in `parse_in_parts`:

```python
# Excerpt: a parser method in BCBio.GFF; relies on `import copy` at module level.
def parse_in_parts(self, gff_files, base_dict=None, limit_info=None,
                   target_lines=None):
    """Parse a region of a GFF file specified, returning info as generated.

    target_lines -- The number of lines in the file which should be used
    for each partial parse. This should be determined based on available
    memory.
    """
    for results in self.parse_simple(gff_files, limit_info, target_lines):
        if base_dict is None:
            cur_dict = dict()
        else:
            cur_dict = copy.deepcopy(base_dict)
        cur_dict = self._results_to_features(cur_dict, results)
        all_ids = list(cur_dict.keys())
        all_ids.sort()  # discards the insertion order of base_dict
        for cur_id in all_ids:
            yield cur_dict[cur_id]
```
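
A minimal, library-free illustration of what that loop does to the order of `base_dict`. The record IDs are hypothetical, and plain strings stand in for `SeqRecord` objects:

```python
import copy

# Hypothetical record IDs, inserted in assembly order (not alphabetical).
# Strings stand in for SeqRecord objects.
base_dict = {"scaffold_10": "rec10", "scaffold_2": "rec2", "scaffold_1": "rec1"}

# Mirrors the quoted loop: copy base_dict, then yield values in sorted-key order.
cur_dict = copy.deepcopy(base_dict)
all_ids = list(cur_dict.keys())
all_ids.sort()
yielded = [cur_dict[cur_id] for cur_id in all_ids]

print(list(base_dict))  # ['scaffold_10', 'scaffold_2', 'scaffold_1']
print(all_ids)          # ['scaffold_1', 'scaffold_10', 'scaffold_2']
```

Note that the sort is lexicographic on the ID strings, so `scaffold_10` also lands before `scaffold_2`.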

The statement `all_ids.sort()` sorts the record IDs lexicographically, so records are yielded in ID order rather than in the order of `base_dict`. Is this sort necessary? If it is, would it be possible to add an option to `GFF.parse`, e.g. `preserve_order`, that skips it?
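
A minimal sketch of how such an option might behave, written as a standalone function rather than a patch to the library. The function name `parse_in_parts_order`, its `new_ids` argument, and `preserve_order` itself are hypothetical. It assumes `_results_to_features` adds records for previously unseen IDs by ordinary key assignment, so they land after the keys already in `base_dict`:

```python
import copy


def parse_in_parts_order(base_dict, new_ids, preserve_order=False):
    """Return the ID order a parse_in_parts-style loop would yield (sketch).

    `new_ids` stands in for IDs that _results_to_features would add from the
    GFF file. `preserve_order` is the proposed option; it is NOT an existing
    GFF.parse parameter.
    """
    cur_dict = copy.deepcopy(base_dict) if base_dict is not None else dict()
    for rec_id in new_ids:
        # Stand-in for adding a SeqRecord; existing keys keep their position.
        cur_dict.setdefault(rec_id, None)
    all_ids = list(cur_dict.keys())
    if not preserve_order:
        all_ids.sort()  # current behaviour: lexicographic ID order
    return all_ids


base = {"scaffold_10": None, "scaffold_2": None}
print(parse_in_parts_order(base, ["scaffold_2", "contig_A"]))
# ['contig_A', 'scaffold_10', 'scaffold_2']
print(parse_in_parts_order(base, ["scaffold_2", "contig_A"], preserve_order=True))
# ['scaffold_10', 'scaffold_2', 'contig_A']
```

Defaulting `preserve_order` to `False` would keep the current sorted output for existing callers.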

We are using this function in a package for generating annotated genome assembly files; see NBISweden/EMBLmyGFF3#83.
