Improve performance for vertex attribute 'author.role' & fix default aggregation levels #102
Comments
I've already fixed Issue 1 in commit 136c604. Issue 2 and Issue 3 still need to be addressed somehow.
Issue 2

That is easily possible, I guess. We just need to do the check here and then clone the object as many times as needed. We may need to think about whether we need a deep clone or not. And the fix will likely only contain three to five lines, including documentation.

Issue 3

That is definitely a problem. How about just adding something like a "recommended aggregation level" to the documentation?
Let me just document our decisions here:

Issue 2

As we already checked, a shallow clone is enough (it is ok that the cloned data objects hold the same project configuration).

Issue 3

I've implemented a new function.
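The "shallow clone is enough" decision could look roughly like the following sketch. The class, fields, and variable names are made up for illustration and do not reflect the actual coronet data classes:

```r
## Illustration only: a minimal reference class standing in for the project
## data class; all names here are hypothetical.
Data = setRefClass("Data", fields = list(
    project.conf = "list",  ## shared project configuration (sharing it is fine)
    commits = "data.frame"  ## per-range data
))

original = Data$new(project.conf = list(project = "busybox"),
                    commits = data.frame())

## 'copy(shallow = TRUE)' copies the fields without recursively copying
## nested reference objects, so all clones may share the same configuration
## object instead of duplicating it.
clones = replicate(3, original$copy(shallow = TRUE), simplify = FALSE)
```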
As all three issues in here are fixed now, I will close this issue.
When adding vertex attributes, we sometimes split the project data several times even if we do not need the project data. Therefore, we should further improve the outcome of PR #93. This is also related to #92.
In the following, I list three issues that occurred when using the add-vertex-attribute functions, and I provide possible approaches. Excuse me for putting so much information into one issue, but the three issues are closely related to each other.
Issue 1: Add already existing classification results as author role
When using the function that adds already existing classification results as network attributes, we do not need to split the project data.
In this specific function, we do not need the `project.data`, and the `aggregation.level` does not make sense here either, as we already have the classification results.

Suggestion: Remove the `project.data` and `aggregation.level` parameters here (as they are confusing) and, instead of calling `split.and.add.vertex.attribute`, directly call `add.vertex.attribute` (with empty `data`).

Issue 2: Repeated splitting for several aggregation levels (`complete`, `all.ranges`, `project.all.ranges`)
In addition, is it somehow possible to improve the performance of the `split.data.time.based.by.ranges` function when passing identical `ranges`?

Example: The function `split.data.time.based.by.ranges` is called when adding vertex attributes to a list of networks. Let `aggregation.level = "complete"` and the length of the list be 60, for instance. Then the function `split.data.time.based.by.ranges` splits the data 60 times according to exactly the same ranges, and the log output contains the very same statements 60 times.

To reduce the log output (log the statements only once instead of 60 times) and also to reduce the computation time (compute the split only once instead of 60 times), it would be helpful to check whether all elements in `ranges` are identical, and if so, just split the data once and replicate the splitting result.

Nevertheless, that is just an idea to improve the log output and possibly the computation time -- but it will mess up the code...
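The check described above could be sketched as follows. This is a rough sketch, not the actual implementation: `split.once` stands in for the existing single-range splitting logic and is not a real coronet function.

```r
## Hypothetical sketch of the proposed optimization: if all elements of
## 'ranges' are identical, split (and log) only once and replicate the
## result; otherwise, fall back to splitting per range as before.
split.by.ranges.sketch = function(project.data, ranges, split.once) {
    if (length(ranges) > 1 && all(sapply(ranges, identical, ranges[[1]]))) {
        ## all ranges are the same: compute the split a single time ...
        single.result = split.once(project.data, ranges[[1]])
        ## ... and replicate the result once per requested range
        return(replicate(length(ranges), single.result, simplify = FALSE))
    }
    ## distinct ranges: split for each range individually
    return(lapply(ranges, function(range) split.once(project.data, range)))
}
```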
Issue 3: Undocumented and unexpected default aggregation levels
In addition, we should reconsider the default values of the `aggregation.level` parameters of all the functions. In the documentation of those functions, we did not specify a default value, but the default value is `range` in all cases. So, we should add the default value to the documentation.

Furthermore, there are some functions where the default value should not be `range`, but `complete`, as this would make more sense (or is anyone interested in the first activity within the current range?):

- `add.vertex.attribute.first.activity`
- `add.vertex.attribute.artifact.first.occurrence`

However, if we changed the default value here, the vector of possible aggregation levels would differ between functions, and therefore you could not just copy and paste it any more. Hm...
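If the defaults are made explicit, R's `match.arg` idiom would keep the code and the documented default in sync. The signature below is a sketch, not the actual function, and it lists only the aggregation levels mentioned in this issue:

```r
## Hypothetical signature sketch: 'match.arg' uses the *first* element of the
## candidate vector as the default, so a per-function default only requires
## reordering the vector -- which is exactly why the vector can no longer be
## copied verbatim between functions.
add.vertex.attribute.first.activity = function(networks, project.data,
        aggregation.level = c("complete", "range",
                              "all.ranges", "project.all.ranges")) {
    aggregation.level = match.arg(aggregation.level)
    ## ... compute the first activity and add it as a vertex attribute ...
}
```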
I just wanted to bring those issues to mind. Issue 1, Issue 2, and Issue 3 should be addressed somehow; that is something we should discuss. @clhunsen What's your opinion here?