Usage of command option -size #2

Open
YoonjinTKim opened this issue Nov 21, 2017 · 13 comments

@YoonjinTKim

I have a question about the usage of the -size command option. In the bin/tiedie code, it says that opts.size sets the desired relative size of the linker set of genes to be found by the algorithm. opts.size is passed along as the size_control global variable, and it is only used in findLinkerCutoff, at line 133 and line 402.

extractSubnetwork(up_heats, down_heats, up_heats_diffused, down_heats_diffused, size_control, set_alpha), defined at line 106 of bin/tiedie, calls findLinkerCutoff when the alpha value is None. findLinkerCutoff in turn calls findLinkerCutoffMulti when there are two sets (a source set and a target set).

In findLinkerCutoffMulti, the size parameter is only used to check whether the size is 0; it has no other use in the function. The filterLinkers call at line 411 of tiedie_util.py is made as (up_heat_diffused, down_heat_diffused, 1), with a fixed set size of 1 for filtering by the minimum score.
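
For readers following along, here is a minimal sketch of what a "desired relative size" could mean in practice: pick the heat cutoff so that roughly size × (number of input nodes) linker candidates pass it. This is an illustration under stated assumptions, not TieDIE's implementation; the function name cutoff_for_relative_size, the n_inputs parameter, and the choice to score each node by the minimum of its two diffused heats are all hypothetical.

```python
def cutoff_for_relative_size(source_heats, target_heats, n_inputs, size):
    """Pick a heat cutoff so that about size * n_inputs nodes pass it.

    Hypothetical helper, not part of TieDIE. Each node is scored by the
    minimum of its two diffused heats, so it must receive heat from both
    the source side and the target side to rank highly.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Only nodes that received heat from both input sets are candidates.
    shared = set(source_heats) & set(target_heats)
    min_heats = sorted(
        (min(source_heats[n], target_heats[n]) for n in shared),
        reverse=True,
    )
    if not min_heats:
        raise ValueError("no node has heat from both input sets")
    # Number of nodes to keep: at least one, at most all candidates.
    wanted = max(1, min(len(min_heats), round(size * n_inputs)))
    # The cutoff is the score of the last node we want to keep.
    return min_heats[wanted - 1]


# Toy diffused heats (made-up values for illustration only).
up = {"A": 0.9, "B": 0.5, "C": 0.2, "D": 0.05}
down = {"A": 0.8, "B": 0.6, "C": 0.1, "D": 0.01}

cutoff = cutoff_for_relative_size(up, down, n_inputs=2, size=1.0)
kept = sorted(n for n in up if min(up[n], down[n]) >= cutoff)
print(cutoff, kept)
```

With a rule like this, changing size changes how many nodes pass the cutoff, which is the behavior the option's description suggests.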

I am having a hard time understanding the proper cutoff size for a large network. Would you explain what an appropriate way to find the cutoff would be? I have tried setting different sizes while using pagerank, but they all result in the entire graph appearing in the tiedie.sif file.

Please let me know if you have any questions, and thanks in advance.

@epaull
Owner

epaull commented Nov 22, 2017 via email

@YoonjinTKim
Author

Thanks for your prompt response! For my input, the network contains 512 nodes and 158776 edges, with 14 upstream source nodes and 14 downstream target nodes. Since the background network is relatively large, I was not able to run the command with the pagerank option due to a memory issue. Would you elaborate on what an informal 'prior' means?

Thanks!

@epaull
Owner

epaull commented Nov 22, 2017 via email

@YoonjinTKim
Author

Sorry for the misunderstanding; the network graph actually contains 11470 nodes, not 512. Also, since I intended to run this graph as an unweighted graph, I manually assigned each heat a value of 1. Would this make a difference in the result?

I ran the command as: python ./TieDIE/bin/tiedie --network (network file in .sif) --size 1.0 --up_heats (source file in .input) --down_heats (target file in .input) --output_folder (output directory) --pagerank true

Thank you so much for your feedback; I appreciate it. Please let me know if I made any mistake here. This command produces the entire network in the tiedie.sif file.

Regards,
Yoonjin Kim

@epaull
Owner

epaull commented Nov 27, 2017 via email

@YoonjinTKim
Author

YoonjinTKim commented Nov 28, 2017

Yes. Since the only raw data I am using are the edge set, source set, and target set, I assigned a weight of 1 in the upheat.input and downheat.input files, and I created a .sif file where each row has the format "(source) '>' (target)". I just used a placeholder character for the second column. Since I did not assign 'inhibits' or 'activates' in the second column, it makes sense that it produces an empty tiedie.cn.sif file, but it still prints all the edges in the tiedie.sif file rather than the 56-node solution that you mentioned above.
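
To make the file layout described above concrete, here is a small sketch that turns a plain edge list into three-column SIF text with '>' as the placeholder interaction. The helper name edges_to_sif is hypothetical, and the tab delimiter is an assumption (SIF files are commonly space- or tab-delimited); whether TieDIE treats '>' as a meaningful interaction type is not something this thread confirms.

```python
def edges_to_sif(edges, interaction=">"):
    """Return SIF text: one 'source<TAB>interaction<TAB>target' line per edge.

    Hypothetical helper for illustration. The tab delimiter and the '>'
    placeholder are assumptions taken from the description in this thread.
    """
    lines = []
    for source, target in edges:
        lines.append(f"{source}\t{interaction}\t{target}")
    return "\n".join(lines) + "\n"


# Toy edge list (made-up node names).
edges = [("A", "B"), ("B", "C")]
sif_text = edges_to_sif(edges)
print(sif_text, end="")
```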

@epaull
Owner

epaull commented Nov 29, 2017 via email

@YoonjinTKim
Author

When I tried to run a small subset of the network file, such as the first 5000 edges, it usually fails with a zero-division error:

Traceback (most recent call last):
  File "../../bin/tiedie", line 391, in <module>
    score = scoreSubnet(subnet_soln_nodes, up_heats, down_heats, report_fh)
  File "../../bin/tiedie", line 239, in scoreSubnet
    score = float(len(Sr))/(len(S)*2) + float(len(Tr))/(len(T)*2) - penalty
ZeroDivisionError: float division by zero
make: *** [result] Error 1

This happens because, in that particular subset, no edge is connected to the target set.
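
As a sketch of how the scoring expression in the traceback could be guarded, the snippet below reproduces the same formula but fails with a clear message when either input set is empty. It assumes that S and T are the source and target sets and that Sr and Tr are the members of each that appear in the solution subnetwork; the thread does not confirm those meanings, and coverage_score is a hypothetical name, not a TieDIE function.

```python
def coverage_score(S, T, subnet_nodes, penalty=0.0):
    """Compute len(Sr)/(2*len(S)) + len(Tr)/(2*len(T)) - penalty safely.

    Assumption: S and T are the source and target sets, and Sr and Tr are
    the subsets of each that were recovered in the solution subnetwork.
    """
    S, T, subnet_nodes = set(S), set(T), set(subnet_nodes)
    if not S or not T:
        # The unguarded formula divides by len(S) * 2 and len(T) * 2.
        raise ValueError(
            "source or target set is empty; check that the input nodes "
            "are present in the network file"
        )
    Sr = S & subnet_nodes  # sources recovered in the subnetwork
    Tr = T & subnet_nodes  # targets recovered in the subnetwork
    return len(Sr) / (len(S) * 2) + len(Tr) / (len(T) * 2) - penalty


# Toy example: one of two sources and the only target are recovered.
score = coverage_score({"a", "b"}, {"c"}, {"a", "c"})
print(score)
```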

If you could check the inputs in your environment, that would be great! I can send them to the email address listed in README.md.

Regards,
Yoonjin Kim

@YoonjinTKim
Author

So, here's my theory. The minimum heat diffused across a large network is very small, as shown here:

[image: min_heat]

So the EPSILON value (initially 0.0001) used for the cutoff in the findLinkerCutoffMulti method might be too large. For example, if we set EPSILON to 0.1 and consider the set of values {0.01, 0.001, -0.01, -0.001}, the cutoff derived from the first element, 0.01, would be 0.01 - 0.1 = -0.09, which includes every element in the given set.
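
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the reasoning above: it assumes the cutoff is formed as (value - EPSILON) and that an element is kept when it is strictly greater than the cutoff; kept_at_cutoff is a hypothetical helper, not a TieDIE function.

```python
def kept_at_cutoff(heats, anchor, epsilon):
    """Return (cutoff, kept) where cutoff = anchor - epsilon.

    Assumption for illustration: a value is kept when it is strictly
    greater than the cutoff.
    """
    cutoff = anchor - epsilon
    kept = [h for h in heats if h > cutoff]
    return cutoff, kept


heats = [0.01, 0.001, -0.01, -0.001]
for epsilon in (0.1, 0.0001):
    cutoff, kept = kept_at_cutoff(heats, heats[0], epsilon)
    print(f"EPSILON={epsilon}: cutoff={cutoff:.4f}, kept {len(kept)} of {len(heats)}")
```

Under these assumptions, EPSILON = 0.1 keeps all four values, while EPSILON = 0.0001 keeps only the first, so an EPSILON that is large relative to the heats makes the cutoff too permissive.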

I will investigate this on my end.

@epaull
Owner

epaull commented Nov 30, 2017 via email

@Januka-K

Hi Evan,
I am facing trouble similar to what is described in this thread. I have a very large PPI network (15747 nodes, 3527164 edges). I have two inputs, one with 17 nodes and one with 86 nodes. The solution I get is almost the size of the PPI network. I am wondering whether the problem mentioned here has been solved, or whether you can help me figure it out. Thank you in advance for your help.

@epaull
Owner

epaull commented Sep 13, 2021 via email

@Januka-K

Januka-K commented Sep 14, 2021

Thank you for your response. I have tried running it with several size settings (1, 0.75, 0.25). No matter the size factor, I get a network solution of the same size (around 2300 nodes and over a million edges). Note that I don't have the interaction type information (the inhibits/activates column) in the search network, so there is no filter for logical pathway consistency. I am only looking at the 'tiedie.sif' file, not the 'tiedie.cn.sif' file.
