I want to get a table like the one below, which lists the non-reference heterozygous and homozygous variants (`vid`) by gene (`gene_symbol`) for each sample (`s`):
| s | gene_symbol | vid | consequence |
|---|---|---|---|
| 1 | TESK2 | ['1-3414321-G-A', '1-3414321-T-A'] | ['missense_variant', 'missense_variant'] |
| 1 | PEX26 | ['22-18561317-T-A', '22-18561317-TG-A'] | ['missense_variant', 'frameshift_variant'] |
| 2 | TESK2 | ['1-3414321-G-A'] | ['missense_variant'] |
| 3 | PEX26 | ['22-18561317-C-A'] | ['missense_variant'] |
| … | … | … | … |
where `vid` is `locus.contig`, `locus.position`, the reference allele and the alternate allele joined by hyphens (e.g. `1-3414321-G-A`).
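As a small illustration of that `vid` format, here is a plain-Python sketch (not Hail code) that joins the four components with hyphens. The helper name `make_vid` is hypothetical, and it assumes biallelic variants, i.e. exactly one alternate allele per row.

```python
def make_vid(contig: str, position: int, ref: str, alt: str) -> str:
    """Build a variant ID of the form contig-position-ref-alt (hypothetical helper).

    Assumes a biallelic variant: one reference and one alternate allele.
    """
    # Join the four components with hyphens, e.g. "1-3414321-G-A".
    return "-".join([contig, str(position), ref, alt])


print(make_vid("1", 3414321, "G", "A"))
print(make_vid("22", 18561317, "TG", "A"))
```

In Hail itself, an analogous row field could presumably be built with `hl.delimit([mt.locus.contig, hl.str(mt.locus.position), mt.alleles[0], mt.alleles[1]], '-')`, again assuming multi-allelic sites have already been split so that `mt.alleles` has exactly two entries.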
I tried to get this table by doing the following, but when I show the table, it is empty:
```python
import hail as hl

# mt is my existing MatrixTable

# filter for het_non_ref and hom_var
mt = mt.filter_entries(mt.GT.is_het_non_ref() | mt.GT.is_hom_var())

# get entries: one row per remaining (variant, sample) entry
pv_table = mt.entries()

# select columns; the key fields of the entries table (the row key plus the
# column key, typically locus, alleles and s) are kept automatically
candidate_genes = pv_table.select(
    pv_table.ancestry_pred,
    pv_table.vid,
    pv_table.gene_symbol,
    pv_table.gene_id,
    pv_table.consequence,
)

# group by sample id and gene symbol
candidate_genes = candidate_genes.group_by("s", "gene_symbol").aggregate(
    vid=hl.agg.collect(candidate_genes.vid),
    consequence=hl.agg.collect(candidate_genes.consequence),
)
candidate_genes.show()
```
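To pin down what the filter and group-by above are meant to produce, here is a plain-Python model of the same logic on a few made-up entries. It is not Hail code: the records, the field values and the encoding of a diploid `GT` as a pair of allele indices (0 = reference) are illustrative assumptions. The two predicates mirror the documented meaning of Hail's `is_het_non_ref()` (two different alternate alleles, e.g. 1/2) and `is_hom_var()` (two copies of the same alternate allele, e.g. 1/1). Note that with this filter a 0/1 call, like sample 2's made-up entry, is dropped; Hail's `is_het()` (any two different alleles) would keep it.

```python
from collections import defaultdict

# Made-up entry records, one per (sample, variant), like rows of mt.entries().
# GT is modelled as a pair of allele indices for a diploid call (0 = reference).
entries = [
    {"s": "1", "gene_symbol": "TESK2", "vid": "1-3414321-G-A", "consequence": "missense_variant", "GT": (1, 1)},
    {"s": "1", "gene_symbol": "PEX26", "vid": "22-18561317-T-A", "consequence": "missense_variant", "GT": (1, 2)},
    {"s": "2", "gene_symbol": "TESK2", "vid": "1-3414321-G-A", "consequence": "missense_variant", "GT": (0, 1)},
    {"s": "3", "gene_symbol": "PEX26", "vid": "22-18561317-C-A", "consequence": "missense_variant", "GT": (0, 0)},
]


def is_het_non_ref(gt):
    # Two different alleles, neither of which is the reference (e.g. 1/2).
    a, b = gt
    return a != b and a != 0 and b != 0


def is_hom_var(gt):
    # Two copies of the same alternate allele (e.g. 1/1).
    a, b = gt
    return a == b and a != 0


def group_variants(records):
    """Keep het-non-ref or hom-var entries, then collect vid/consequence per (s, gene_symbol)."""
    grouped = defaultdict(lambda: {"vid": [], "consequence": []})
    for r in records:
        if is_het_non_ref(r["GT"]) or is_hom_var(r["GT"]):
            key = (r["s"], r["gene_symbol"])
            grouped[key]["vid"].append(r["vid"])
            grouped[key]["consequence"].append(r["consequence"])
    return dict(grouped)


for (s, gene), agg in group_variants(entries).items():
    print(s, gene, agg["vid"], agg["consequence"])
```

Each `(s, gene_symbol)` key plays the role of one row of the grouped Hail table, and the two lists correspond to the `hl.agg.collect` aggregations.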
I thought about using `make_table`, but I have over 1000 samples.
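For intuition about why `make_table` scales poorly here: it produces one row per variant with a separate field for every sample's entry fields (named roughly `<sample>.<entry field>`, using its `separator` argument, `.` by default), whereas `entries()` produces one row per non-filtered (variant, sample) entry. The plain-Python sketch below is not Hail code; the small genotype dictionary is made up and only contrasts the two layouts.

```python
# Made-up genotypes keyed by (variant ID, sample ID); None marks a filtered-out entry.
genotypes = {
    ("1-3414321-G-A", "1"): "1/1",
    ("1-3414321-G-A", "2"): None,
    ("22-18561317-C-A", "1"): None,
    ("22-18561317-C-A", "2"): "1/2",
}
samples = ["1", "2"]
variants = ["1-3414321-G-A", "22-18561317-C-A"]

# Wide layout (like make_table): one row per variant and one GT field per sample,
# so every row carries a slot for every sample, even where the entry is missing.
wide = [
    {"vid": v, **{f"{s}.GT": genotypes[(v, s)] for s in samples}}
    for v in variants
]

# Long layout (like entries()): one row per (variant, sample), skipping filtered entries.
long_rows = [
    {"vid": v, "s": s, "GT": gt}
    for (v, s), gt in genotypes.items()
    if gt is not None
]

print(len(wide[0]) - 1, "per-sample fields in each wide row")
print(len(long_rows), "rows in the long table")
```

With 1000+ samples the wide layout gains 1000+ fields per entry field, while the long layout grows only with the number of retained entries, which suggests the `entries()` plus `group_by` route is the better fit for the target table.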
What is the best strategy to get a table like the one above?
Thanks!
Note
The following post was exported from discuss.hail.is, a forum for asking questions about Hail which has since been deprecated.
(Jan 08, 2024 at 12:59) nistha said:
I’m pretty new to working with Hail, so I’m a little confused about how to work with the MatrixTables. I would appreciate any advice!
I have a MatrixTable (`mt`) that looks like this: