add a csv file with alignment information #47

Open
roblanf opened this issue Aug 30, 2023 · 2 comments

Comments

@roblanf
Owner

roblanf commented Aug 30, 2023

At the moment all the useful information is in nexus format, which can be annoying to work with.

E.g. we have this:

begin SETS;

	[partitions]
	CHARSET	COI_1stpos = 1-1592\3;
	CHARSET	COI_2ndpos = 2-1592\3;
	CHARSET	COI_3rdpos = 3-1592\3;
	CHARSET	16S = 1593-3037;

	[loci]
	CHARPARTITION COI = 1:COI_1stpos, 2:COI_2ndpos, 3:COI_3rdpos;
	CHARPARTITION 16S = 1:16S;

	CHARPARTITION loci = 1:COI, 2:16S;

	[genomes]
	CHARPARTITION	mitochondrial_genome = 1:COI, 2:16S;

	CHARPARTITION genomes = 1:mitochondrial_genome;

But this could be represented as a csv file with the following columns:

  • alignment_name (e.g. "Anderson_2012")
  • partition (e.g. "COI_1stpos")
  • partition_sites (e.g. "1-1592\3")
  • locus (e.g. "COI")
  • genome (e.g. "mitochondrial")

We could then use the csv file when entering the data, and build the nexus block directly from the csv file.
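
As a rough illustration of that last step, here is a minimal Python sketch that rebuilds the SETS block above from CSV rows with the five proposed columns. The function name `build_sets_block` and the sample rows are hypothetical, and the `_genome` suffix on genome partitions is an assumption inferred from the example block (the CSV holds "mitochondrial", the nexus block holds "mitochondrial_genome").

```python
import csv
import io

# Sample rows using the proposed columns; values are taken from the example block.
CSV_TEXT = r"""alignment_name,partition,partition_sites,locus,genome
Anderson_2012,COI_1stpos,1-1592\3,COI,mitochondrial
Anderson_2012,COI_2ndpos,2-1592\3,COI,mitochondrial
Anderson_2012,COI_3rdpos,3-1592\3,COI,mitochondrial
Anderson_2012,16S,1593-3037,16S,mitochondrial
"""


def numbered(names):
    """Format names as '1:a, 2:b, ...' for a CHARPARTITION line."""
    return ", ".join(f"{i}:{name}" for i, name in enumerate(names, 1))


def build_sets_block(rows):
    """Build a nexus SETS block (hypothetical helper) from CSV rows."""
    lines = ["begin SETS;", "", "\t[partitions]"]
    loci = {}     # locus -> partition names, in order of first appearance
    genomes = {}  # genome -> locus names, in order of first appearance
    for row in rows:
        lines.append(f"\tCHARSET\t{row['partition']} = {row['partition_sites']};")
        loci.setdefault(row["locus"], []).append(row["partition"])
        genome_loci = genomes.setdefault(row["genome"], [])
        if row["locus"] not in genome_loci:
            genome_loci.append(row["locus"])

    lines += ["", "\t[loci]"]
    for locus, partitions in loci.items():
        lines.append(f"\tCHARPARTITION {locus} = {numbered(partitions)};")
    lines += ["", f"\tCHARPARTITION loci = {numbered(loci)};"]

    lines += ["", "\t[genomes]"]
    for genome, locus_names in genomes.items():
        # Assumption: genome partitions are named '<genome>_genome'.
        lines.append(f"\tCHARPARTITION\t{genome}_genome = {numbered(locus_names)};")
    genome_names = [f"{genome}_genome" for genome in genomes]
    lines += ["", f"\tCHARPARTITION genomes = {numbered(genome_names)};", "end;"]
    return "\n".join(lines)


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
print(build_sets_block(rows))
```

The sketch assumes all rows belong to a single alignment; a file covering several alignments would need to be grouped by alignment_name first.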

@roblanf
Owner Author

roblanf commented Aug 30, 2023

Also include a 'datatype' column (e.g. DNA, AA). This value comes from the top of the nexus alignment file.

@DS4B-ANU

Include a column for codon position too (NA if the partition is not a codon position), so the columns are now:

  • alignment_name (e.g. "Anderson_2012")
  • partition_name (e.g. "COI_1stpos")
  • partition_start (e.g. 1)
  • partition_end (e.g. 100)
  • partition_skip (e.g. 3; so if start is 1, end is 100, and skip is 3, the nexus format would be 1-100\3)
  • locus_name (e.g. "COI")
  • genome (e.g. "mitochondrial")
  • data_type (e.g. "DNA", "AA", "RNA")
  • codon_position (e.g. 1, 2, 3, or NA when it's not protein-coding)
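
To make the start/end/skip split concrete, here is a small Python sketch that converts between a nexus site range such as `1-1592\3` and the three proposed columns. The function names `split_sites` and `join_sites` are hypothetical, and representing a missing skip as `None` is an assumption (the list above does not say what partition_skip holds for a contiguous range). It only handles the simple `start-end` and `start-end\skip` forms shown in this issue, not the more general nexus character-set syntax.

```python
import re

# Matches 'start-end' with an optional '\skip' suffix, e.g. '1-1592\3'.
RANGE_RE = re.compile(r"^(\d+)-(\d+)(?:\\(\d+))?$")


def split_sites(sites):
    """Split a nexus range into (partition_start, partition_end, partition_skip)."""
    match = RANGE_RE.match(sites.strip())
    if match is None:
        raise ValueError(f"unsupported site range: {sites!r}")
    start, end, skip = match.groups()
    # Assumption: a range without '\skip' is stored with skip = None.
    return int(start), int(end), int(skip) if skip else None


def join_sites(start, end, skip=None):
    """Rebuild the nexus range from the three columns."""
    if skip is None:
        return f"{start}-{end}"
    return f"{start}-{end}\\{skip}"


print(split_sites(r"1-1592\3"))
print(join_sites(1, 100, 3))
```

Keeping the three values as separate integer columns means the range can be validated (for example, start not greater than end) before the nexus block is written.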

HuaiyanRen added a commit that referenced this issue Dec 4, 2023
Write correct csv according to #47