Usage of command option -size #2
Yoonjin,
Hi, yes I think I understand your question, but let me make sure I
understand your data. Can you tell me about how many nodes and edges are in
the input network you're using, and also how many nodes are in each of the
input sets? The size is always specified relative to the input set size,
and it represents a kind of informal 'prior'. Let me know, and we can
figure it out from there, thanks!
-Evan
On Tue, Nov 21, 2017 at 6:29 PM, Yoonjin Kim wrote:
I have a question about the usage of the --size command option. The bin/tiedie
code says that opts.size sets the desired relative size of the linker set of
genes to be found by the algorithm. opts.size is passed as the size_control
global variable, and it is only used in findLinkerCutoff (at lines 133 and 402).
In def extractSubnetwork(up_heats, down_heats, up_heats_diffused,
down_heats_diffused, size_control, set_alpha) in bin/tiedie (line 106),
findLinkerCutoff is called when the alpha value is None. findLinkerCutoff
calls findLinkerCutoffMulti when there are two sets (source set and target
set); see findLinkerCutoffMulti at line 411 of tiedie_util.py.
The size parameter in findLinkerCutoffMulti is only used to check whether it
is 0 and has no other use in the function, and filterLinkers is called as
(up_heat_diffused, down_heat_diffused, 1), i.e. with a set size of 1, to
filter by the minimum score at line 411 of tiedie_util.py.
I am having a hard time understanding the proper cutoff size for a large
network. Could you explain the appropriate way to find the cutoff? I have
tried setting different sizes while using pagerank, but they all result in
the entire graph in the tiedie.sif file.
Please let me know if you have any questions, and thanks in advance.
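To make the question concrete, here is a minimal Python sketch of what a size-controlled linker cutoff could look like. It is a hypothetical illustration, not TieDIE's findLinkerCutoffMulti: the helper names are invented here, and it assumes that a node's linker score is the minimum of its up- and down-diffused heats and that the target number of linkers is size × (number of input nodes). It uses math.nextafter, so it needs Python 3.9 or later.

```python
import math
from typing import Dict, Set


def linker_scores(up: Dict[str, float], down: Dict[str, float],
                  inputs: Set[str]) -> Dict[str, float]:
    """Assumed linker score: min(up heat, down heat) for non-input nodes."""
    shared = (up.keys() & down.keys()) - inputs
    return {n: min(up[n], down[n]) for n in shared}


def size_controlled_cutoff(up: Dict[str, float], down: Dict[str, float],
                           inputs: Set[str], size: float) -> float:
    """Hypothetical: cutoff that keeps roughly size * len(inputs) linkers."""
    scores = sorted(linker_scores(up, down, inputs).values(), reverse=True)
    if size <= 0 or not scores:
        return math.inf  # no score can satisfy "score > inf"
    k = min(len(scores), max(1, round(size * len(inputs))))
    # Largest float strictly below the k-th best score, so "score > cutoff"
    # keeps every node scoring >= that value (ties may add extra nodes).
    return math.nextafter(scores[k - 1], -math.inf)


def linkers_above(up: Dict[str, float], down: Dict[str, float],
                  inputs: Set[str], cutoff: float) -> Set[str]:
    """Non-input nodes whose assumed linker score exceeds the cutoff."""
    return {n for n, s in linker_scores(up, down, inputs).items() if s > cutoff}


# Toy example: A and B are the inputs; L1..L3 are candidate linkers.
up = {"A": 1.0, "B": 0.2, "L1": 0.5, "L2": 0.3, "L3": 0.02}
down = {"A": 0.1, "B": 1.0, "L1": 0.4, "L2": 0.6, "L3": 0.9}
inputs = {"A", "B"}
for size in (1.0, 0.5, 0.0):
    cut = size_controlled_cutoff(up, down, inputs, size)
    print(size, sorted(linkers_above(up, down, inputs, cut)))
```

Here the number of selected linkers shrinks as size shrinks. If, as described above, size is only checked for zero and the cutoff comes from the minimum heats plus a fixed margin, the output would not change with size at all.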
Thanks for your prompt response! For my input, the network contains 512 nodes and 158776 edges, with 14 upstream source nodes and 14 downstream target nodes. Since the background network is relatively large, I was not able to run the command with the pagerank option because of memory limits. Could you elaborate on what an informal 'prior' means? Thanks!
Interesting, sounds like a fully connected network as input: 512 choose 2
is ~131 thousand, so you must have multiple edges in there as well.
Normally, if you supply a size factor of 1.0 to the algorithm, and assuming
the 2 input gene sets don't overlap, you should get a new subnetwork with
14 + 14 + 1.0 * (14 + 14) = 56 nodes. The 'prior' is simply the assumption
that it should take about 28 "linker" nodes to connect your two input sets.
In this case, though, the size is more like zero: if everything is connected
to everything, you'd expect no linker nodes to be needed to connect your
sets. I think that must be where the issue is. Does that make sense, or am
I missing some part of this?
Thanks!
-Evan
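The arithmetic in this reply can be checked in a few lines of Python. The helper names are illustrative. The edge comparison assumes undirected edges without self-loops; if the .sif edges are treated as directed, the ceiling doubles to 512 × 511 = 261632, and the edge count alone would no longer imply duplicate edges.

```python
import math


def max_undirected_edges(n_nodes: int) -> int:
    """Distinct undirected node pairs without self-loops: n choose 2."""
    return math.comb(n_nodes, 2)


def expected_subnetwork_nodes(n_up: int, n_down: int, size: float) -> float:
    """Inputs plus size * inputs linkers, assuming the input sets are disjoint."""
    n_inputs = n_up + n_down
    return n_inputs + size * n_inputs


print(max_undirected_edges(512))               # 130816
print(158776 > max_undirected_edges(512))      # True
print(expected_subnetwork_nodes(14, 14, 1.0))  # 56.0
```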
(In reply to Yoonjin Kim, Wed, Nov 22, 2017 at 3:18 PM)
Sorry for the misunderstanding: the network graph actually contains 11470 nodes, not 512. Also, since I intended to run this as an unweighted graph, I manually assigned each heat as 1. Would this make a difference in the result? I ran the command as:
python ./TieDIE/bin/tiedie --network (network file in .sif) --size 1.0 --up_heats (source file in .input) --down_heats (target file in .input) --output_folder (output directory) --pagerank true
Thank you so much for your feedback; I appreciate it. Please let me know if I made any mistakes here. This command writes the entire network to the tiedie.sif file.
Regards,
Ok, that makes more sense, thanks. There's no way to set graph/edge
weights. When you say you assigned each heat to '1', do you mean the input
nodes? That would be fine, let me know.
(In reply to Yoonjin Kim, Mon, Nov 27, 2017 at 1:20 PM)
Yes. Since my raw data consist only of an edge set, a source set, and a target set, I assigned a weight (heat) of 1 to each node in the upheat.input and downheat.input files, and I created a .sif file in which each row has the format (source) '>' (target); I just used a placeholder character for the second column. Since I did not assign "inhibits" or "activates" in the second column, it makes sense that the tiedie.cn.sif file is empty, but tiedie.sif still contains all the edges rather than the 56-node solution you mentioned above.
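A quick sanity check for an edge file built this way is to compare the number of rows with the number of distinct edges and nodes. The sketch below is a hypothetical helper (sif_stats is not part of TieDIE); it assumes each non-blank line has exactly three whitespace-separated columns (source, interaction, target), as in the format described above.

```python
from collections import Counter
from typing import Dict, Iterable


def sif_stats(lines: Iterable[str]) -> Dict[str, object]:
    """Summarize three-column SIF text: source, interaction, target."""
    rows = 0
    nodes = set()
    directed = set()
    undirected = set()
    labels: Counter = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3:
            raise ValueError(f"expected 3 columns, got {len(parts)}: {line!r}")
        src, label, dst = parts
        rows += 1
        nodes.update((src, dst))
        directed.add((src, dst))
        undirected.add(frozenset((src, dst)))  # A-B and B-A count once
        labels[label] += 1
    return {
        "rows": rows,
        "nodes": len(nodes),
        "distinct_directed": len(directed),
        "distinct_undirected": len(undirected),
        "labels": dict(labels),
    }
```

For example, with open("network.sif") as fh: print(sif_stats(fh)) (the file name is a placeholder). A rows count well above distinct_undirected points to duplicate or reciprocal edges.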
Really strange, I haven't seen that before. If you want to send me your
input files, I can try running it myself. Otherwise, there isn't much I can
do other than suggest running smaller inputs until you get it working on a
smaller example, and then scaling up until you see exactly what breaks it
(or looking at the python code and debugging).
-Evan
(In reply to Yoonjin Kim, Tue, Nov 28, 2017 at 3:28 PM)
When I tried running a small subset of the network file, such as the first 5000 edges, it usually failed with a zero-division error, because in that subset no edge is connected to the target set. If you could check the inputs in your environment, that would be great! I can send them to the email address in README.md.
Regards,
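The zero-division failure suggests checking a subset before running it. This hypothetical helper (not part of TieDIE) finds the shortest prefix of an edge list, grown in steps of step edges, that touches at least one source node and one target node. Touching both sets is only a necessary condition; it does not guarantee that a path connects them.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[str, str]


def prefix_touches_both(edges: List[Edge], n: int,
                        sources: Set[str], targets: Set[str]) -> bool:
    """True if the first n edges include a source node and a target node."""
    touched: Set[str] = set()
    for src, dst in edges[:n]:
        touched.update((src, dst))
    return bool(touched & sources) and bool(touched & targets)


def smallest_usable_prefix(edges: List[Edge], sources: Set[str],
                           targets: Set[str], step: int = 1000) -> Optional[int]:
    """Smallest prefix length (a multiple of step, capped at len(edges))
    that touches both sets, or None if even the full list does not."""
    if step <= 0:
        raise ValueError("step must be positive")
    n = 0
    while n < len(edges):
        n = min(n + step, len(edges))
        if prefix_touches_both(edges, n, sources, targets):
            return n
    return None
```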
So, here's my theory. The minimum heat diffused over the large network is very small, as shown here: [image: min_heat] <https://user-images.githubusercontent.com/9061404/33441630-21e48008-d5c1-11e7-8e65-1eca0734e96f.png>. The EPSILON value (initially 0.0001) used for the cutoff in the findLinkerCutoffMulti method might therefore be too large relative to these heats. For example, if we set EPSILON to 0.1 and take the set of minimum heats {0.01, 0.001, -0.01, -0.001}, the cutoff derived from the first element (0.01) would be 0.01 - 0.1 = -0.09, which includes every element in the set. I will investigate this on my end.
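The arithmetic in this hypothesis can be reproduced directly. The sketch assumes, as in the comment, that the cutoff is the first (largest) min-heat value minus a fixed EPSILON and that a node passes when its min heat exceeds the cutoff; nodes_passing is an illustrative helper, not TieDIE code. It also shows that multiplying every heat by a constant while EPSILON stays fixed makes the selection narrow again.

```python
from typing import Dict


def nodes_passing(min_heats: Dict[str, float], epsilon: float) -> int:
    """Count nodes whose min heat exceeds (largest min heat - epsilon)."""
    cutoff = max(min_heats.values()) - epsilon
    return sum(1 for h in min_heats.values() if h > cutoff)


heats = {"n1": 0.01, "n2": 0.001, "n3": -0.01, "n4": -0.001}
print(nodes_passing(heats, 0.1))    # 4: cutoff is -0.09, below every value
print(nodes_passing(heats, 0.001))  # 1: cutoff is about 0.009

# Scaling every heat by 1000 with EPSILON fixed at 0.1 gives a cutoff of 9.9.
scaled = {n: h * 1000 for n, h in heats.items()}
print(nodes_passing(scaled, 0.1))   # 1
```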
Yoonjin,
Yes, I think you may be right; that's a good hypothesis, and this may be an
interesting bug. I've never seen the minimum heat get that low. This is a
much larger and more diffuse network, and the use of pagerank rather than
diffusion is probably contributing to this (I'd recommend heat diffusion
instead, if you can get it working within memory constraints). I can think
of a few solutions, but what if you scale up the input heats from '1'
toward '1e20' until you get min heats in the > 0.1 range? If that fixes it,
we'll know that's the problem. (And if it is, I'll credit you for finding
the bug on GitHub, if you like: either you can fix it in your branch and
send me a pull request, or I can add your GitHub handle or name in the
comments.)
thanks!
-Evan
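The suggested experiment, finding how far the input heats must be scaled before the smallest diffused heat clears 0.1, can be sketched as a search over powers of ten up to 1e20. This assumes the diffused heats scale linearly with the input heats, which holds for a linear diffusion operator applied to unnormalized inputs; if the tool rescales input heats internally, the factor would have no effect, and the experiment would reveal that. The function names and the example value 3e-7 are illustrative only.

```python
from typing import Dict


def scale_exponent(min_heat: float, threshold: float = 0.1,
                   max_exp: int = 20) -> int:
    """Smallest k in [0, max_exp] with min_heat * 10**k > threshold."""
    if min_heat <= 0:
        raise ValueError("min_heat must be positive to be scaled above threshold")
    for k in range(max_exp + 1):
        if min_heat * 10**k > threshold:
            return k
    raise ValueError(f"scaling by 1e{max_exp} is not enough")


def scale_heats(heats: Dict[str, float], k: int) -> Dict[str, float]:
    """Multiply every input heat by 10**k (e.g. 1 -> 1e6 for k = 6)."""
    return {gene: h * 10**k for gene, h in heats.items()}


k = scale_exponent(3e-7)  # hypothetical observed minimum diffused heat
print(k)                                # 6
print(scale_heats({"GENE_A": 1.0}, k))  # {'GENE_A': 1000000.0}
```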
(In reply to Yoonjin Kim, Thu, Nov 30, 2017 at 11:30 AM)
Yes, that must be a very highly connected network: with over 3.5 million
edges, it's not surprising you get so many connected nodes from the input
set. However, did you set the size factor (--size option) when running it?
The default is 1, meaning you should get a number of linker genes equal to
your input set size (86+17=103 genes). That's still a large subnetwork, but
only about 200 nodes. Can you try running it with size settings of 1, 0.5,
and 0.25?
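The estimate in this reply can be tabulated for each suggested size setting. The sketch assumes, as in the earlier reply about the 56-node solution, that the input sets are disjoint and that the expected number of linkers is size × (number of input nodes); the average degree puts a number on "very highly connected". Helper names are illustrative.

```python
from typing import Dict, Iterable, Tuple


def expected_sizes(n_up: int, n_down: int,
                   sizes: Iterable[float]) -> Dict[float, Tuple[float, float]]:
    """Map size -> (expected linkers, expected total nodes), disjoint inputs."""
    n_inputs = n_up + n_down
    return {s: (s * n_inputs, n_inputs + s * n_inputs) for s in sizes}


def average_degree(n_nodes: int, n_edges: int) -> float:
    """Mean number of edge endpoints per node: 2E / N."""
    return 2 * n_edges / n_nodes


for size, (linkers, total) in expected_sizes(17, 86, [1.0, 0.5, 0.25]).items():
    print(f"size={size}: ~{linkers:g} linkers, ~{total:g} nodes")
print(round(average_degree(15747, 3527164)))  # 448
```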
On Mon, Sep 13, 2021 at 4:56 PM Januka-K wrote:
Hi Evan,
I am facing a similar problem to the one described in this thread. I have a
very large PPI network (15747 nodes, 3527164 edges) and two inputs, one
with 17 nodes and one with 86 nodes. The solution I get is almost the size
of the whole PPI network. Has the problem mentioned here been solved, or
can you help me figure it out? Thank you in advance for your help.
Thank you for your response. I have tried running it with several size settings (1, 0.75, 0.25). No matter the size factor, I get a solution network of the same size (around 2300 nodes and over a million edges). Note that my search network has no interaction-type information (no inhibits/activates column), so no filtering for logical pathway consistency is applied. I am only looking at 'tiedie.sif', not at 'tiedie.cn.sif'.
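To make this comparison reproducible, the size sweep can be scripted. This is a sketch under stated assumptions: it reuses only the flags from the command quoted earlier in this thread (--network, --size, --up_heats, --down_heats, --output_folder), leaves out --pagerank (presumably falling back to heat diffusion), assumes the script lives at ./TieDIE/bin/tiedie and writes tiedie.sif into the output folder, and treats each tiedie.sif row as source, interaction, target. The helper names and output folder names are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, Iterable, List, Sequence, Tuple


def tiedie_command(network: str, up: str, down: str, out_dir: str,
                   size: float) -> List[str]:
    """argv for one run, using the flags from the command in this thread."""
    return [sys.executable, "./TieDIE/bin/tiedie", "--network", network,
            "--size", str(size), "--up_heats", up, "--down_heats", down,
            "--output_folder", out_dir]


def count_sif(lines: Iterable[str]) -> Tuple[int, int]:
    """(distinct nodes, distinct rows) in three-column SIF text."""
    nodes, rows = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) == 3:  # ignore blank or malformed lines
            nodes.update((parts[0], parts[2]))
            rows.add(tuple(parts))
    return len(nodes), len(rows)


def run_sweep(network: str, up: str, down: str,
              sizes: Sequence[float] = (1.0, 0.5, 0.25)
              ) -> Dict[float, Tuple[int, int]]:
    """Run TieDIE once per size and count nodes/edges in each tiedie.sif."""
    results = {}
    for size in sizes:
        out_dir = f"tiedie_size_{size}"
        os.makedirs(out_dir, exist_ok=True)
        subprocess.run(tiedie_command(network, up, down, out_dir, size),
                       check=True)
        with open(os.path.join(out_dir, "tiedie.sif")) as fh:
            results[size] = count_sif(fh)
    return results
```

For example, run_sweep("network.sif", "upheat.input", "downheat.input") returns a mapping from size to (nodes, edges); identical tuples across sizes would reproduce the behavior reported here.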