Added testing functionality #5

Open
wants to merge 4 commits into
base: main
Changes from 2 commits
7 changes: 6 additions & 1 deletion README.md
@@ -50,6 +50,11 @@ cd goldfinder
python -m pip install -r requirements.txt
```

If you run into an error message like `error: command 'gcc' failed: No such file or directory`, gcc might not be installed. Check this with `gcc --version`.

On Windows, if pip fails with the error "Cannot open include file: 'io.h': No such file or directory", you might need to install the Microsoft C++ compiler. Get it here: https://visualstudio.microsoft.com/visual-cpp-build-tools/
During installation, also tick the SDK for desktop C++. For details, refer to: https://stackoverflow.com/questions/40018405/cannot-open-include-file-io-h-no-such-file-or-directory

##### Dependencies:
`Bio`==1.6.0 \
`DendroPy`==4.6.1 \
@@ -174,7 +179,7 @@ Miscellaneous:
#### `association_clusters.txt`
This file defines gene clusters as found by Markov clustering based on association scores. Each cluster starts with `>` followed by the cluster ID and its size. The following lines list all genes contained in the cluster.
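
As a rough illustration of the layout described above, the following sketch reads such a file into a dictionary. The exact header layout is an assumption (cluster ID and size separated by whitespace after `>`), and `parse_clusters` and the sample text are hypothetical, not part of goldfinder.

```python
from typing import Dict, List, Optional


def parse_clusters(text: str) -> Dict[str, List[str]]:
    """Map each cluster ID to the genes listed under its header line."""
    clusters: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Assumed header layout: ">" + cluster ID, then the size.
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"Header without cluster ID: {raw!r}")
            current = fields[0]
            clusters[current] = []
        elif current is not None:
            # Every non-header line is taken to be one gene name.
            clusters[current].append(line)
    return clusters


# Made-up sample content, for illustration only.
example = ">1 3\na\nb\nc\n>2 2\nq\nr\n"
clusters = parse_clusters(example)
print(clusters)
```

If the real header uses a different separator, only the `split()` call needs to change.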

#### `{score}_{association/dissociation}_significant_pairs.txt`
#### `{score}_{association/dissociation}_significant_pairs.csv`
This comma-separated file lists all gene pairs that are significantly associated/dissociated according to the chosen score. If appropriate, it also contains a `Cluster` column with the 1-based number of the cluster, or a `-` if the genes do not belong to the same cluster.
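
A minimal sketch of how the `Cluster` column could be consumed downstream, using only the standard library. Only the `Cluster` column name comes from the description above; the `Gene_1` and `Gene_2` headers and the sample rows are assumptions made for illustration.

```python
import csv
import io
from typing import Optional


def parse_cluster_cell(cell: str) -> Optional[int]:
    """Return the 1-based cluster number, or None for the '-' placeholder."""
    cell = cell.strip()
    return None if cell == "-" else int(cell)


# Made-up sample; the real file has additional columns.
sample = "Gene_1,Gene_2,Cluster\na,b,1\nr,q,-\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# Keep only the pairs whose genes share a cluster.
same_cluster = [
    (row["Gene_1"], row["Gene_2"])
    for row in rows
    if parse_cluster_cell(row["Cluster"]) is not None
]
print(same_cluster)
```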

#### `cytoscape_input.csv`
2 changes: 2 additions & 0 deletions example_files/known_assocs.tsv
@@ -0,0 +1,2 @@
a b
r q
21 changes: 21 additions & 0 deletions example_files/metadata.csv
@@ -0,0 +1,21 @@
Gene Meta1 Meta2
a a 2
b d 4
c cx 3
d l 7
e x 5
f a 2
g l 1
h e 4
I sd 6
j d 7
k sd 8
l 6
m d 4
n 4
o hf 3
p 2
q 1
r d 2
s c 4
t c 5
2 changes: 1 addition & 1 deletion example_files/roary_mini_example.csv
@@ -1,5 +1,5 @@
Gene,Non-unique Gene name,Annotation,No. isolates,No. sequences,Avg sequences per isolate,Genome Fragment,Order within Fragment,Accessory Fragment,Accessory Order with Fragment,QC,Min group size nuc,Max group size nuc,Avg group size nuc,s1,s2,s3,s4,s5
a,,,,,,,,,,,,,,x,,x,x,
a,abc,hypothetical protein,,,,,,,,,,,,x,,x,x,
Owner

Can we use something other than "abc" for the gene name? It might be confused with the genes a, b, c. How about "gene_a"?

Owner

or "geneA"

b,,,,,,,,,,,,,,x,x,x,x,
c,,,,,,,,,,,,,,x,,x,x,x
d,,,,,,,,,,,,,,x,,,x,x
4 changes: 2 additions & 2 deletions goldfinder/data_import.py
@@ -33,8 +33,8 @@ def load_input(pinput, filetype, pmetadata, pknown_associations):
exit("Error: input matrix has zero columns or rows. Maybe the matrix is not properly "
"formatted?")
if not df.isin([0, 1]).all().all():
exit("Error: Values other than 0 and 1 encountered in input matrix.")

exit("Error: Values other than 0 and 1 encountered in input matrix. If you provide input"
" in a format other than a binary matrix, please use the -f parameter.")
df = df.astype(int)

"""
6 changes: 3 additions & 3 deletions goldfinder/output.py
@@ -21,7 +21,7 @@ def result_procedure(p_values_adj, p_values_unadj, significant_score_indices, cl
cluster_size_viz(clusters, hist_file)

print("Writing significant gene pairs to output")
gene_pair_file = f'{poutput}/{pscore}_{mode}_significant_pairs.txt'
gene_pair_file = f'{poutput}/{pscore}_{mode}_significant_pairs.csv'
write_significant_gp(p_values_adj, p_values_unadj, significant_score_indices, cluster_dict,
locus_dict, gene_pair_file, pfile_type, perform_clustering, metadata,
known_assoc)
@@ -238,9 +238,9 @@ def assemble_gp_line(gene_1, gene_2, file_type, p_unadj, p_adj, locus_dict, perf

if metadata is not None:
s += ","
s += ",".join(metadata.loc[gene_1, :])
s += ",".join(metadata.loc[gene_1, :].astype(str))
s += ","
s += ",".join(metadata.loc[gene_2, :])
s += ",".join(metadata.loc[gene_2, :].astype(str))

if known_assoc_to_write is not None:
s += ","
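
The `.astype(str)` change in `assemble_gp_line` is likely needed because `str.join` only accepts strings, while metadata rows such as `b d 4` mix text and numbers. The sketch below reproduces the same failure without pandas; the list `row` is a hypothetical stand-in for one metadata row, not the project's actual data structure.

```python
# Stand-in for one metadata row holding a string and a number.
row = ["d", 4]

# Joining the raw values fails: str.join rejects non-string items.
try:
    ",".join(row)
    joined_raw = True
except TypeError:
    joined_raw = False

# Converting every value to str first, as .astype(str) does for a
# pandas Series, makes the join succeed.
joined = ",".join(str(value) for value in row)
print(joined_raw, joined)
```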