Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vertex attributes (updated) #93

Merged
merged 34 commits into from
Feb 19, 2018
Merged

Add vertex attributes (updated) #93

merged 34 commits into from
Feb 19, 2018

Conversation

clhunsen
Copy link
Collaborator

@clhunsen clhunsen commented Feb 16, 2018

This is the updated version of PR #67, bringing some important improvements (diff:
https://github.com/flx5/codeface-extraction-r/compare/vertex-attributes...clhunsen:vertex-attributes). Basically, this tackles #34.

Thanks to @flx5 for this high effort! 😃

The open issues related to vertex attributes, all originating from the discussion in PR #67, are collected in issue #92.

Felix Prasse and others added 30 commits December 21, 2017 17:10
Add methods required to add vertex attributes to the graph.
Add method to calculate commit count as vertex attribute

Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
- Rename aggregation to aggregation.level and all to project
- Rename add.vertex.attributes.* methods to add.vertex.attribute.*
- Rename compute.vertex.attribute.with.n2r to add.vertex.attribute
- Rename compute.vertex.attribute to split.and.add.vertex.attribute
- Move networks.to.ranges to util-split and
  rename it to split.data.by.networks
- Remove explicit return statement where possible
- Add missing documentation

Signed-off-by: Felix Prasse <[email protected]>
Signed-off-by: Felix Prasse <[email protected]>
Add test for add.vertex.attribute
Add test for split.and.add.vertex.attribute
Add test for add.vertex.attribute.commit.count
Add test for add.vertex.attribute.author.email
Add test for add.vertex.attribute.artifact.count
Add test for add.vertex.attribute.first.activity

Signed-off-by: Felix Prasse <[email protected]>
The tests were referencing to a wrong project data variable

Signed-off-by: Felix Prasse <[email protected]>
Add test for add.vertex.attribute.author.role.simple
Add test for add.vertex.attribute.active.ranges

Signed-off-by: Felix Prasse <[email protected]>
Add artifact attributes
Add commit count based on committer

Signed-off-by: Felix Prasse <[email protected]>
In this patch, we mainly streamline the tests on network attributes, by
applying proper line breaks, introducing global constants, and fixing
the accidental introduction of factors for the developer roles.

This tackles some comments in PR #67.

Signed-off-by: Claus Hunsen <[email protected]>
With respect to PR #67, logging statements, line breaks, and
capitalization in sqldf-statements are updated.

Signed-off-by: Claus Hunsen <[email protected]>
With respect to PR #67, indentation, line breaks, coding flaws, and
anything else is tackled.

Adds a FIXME item regarding the first activity of an author, which will
be tackled in an upcoming commit.

Signed-off-by: Claus Hunsen <[email protected]>
The section dividers always end at column 77. ;)

Signed-off-by: Claus Hunsen <[email protected]>
To get the very first and the very last activity across all data
sources, i.e., the earliest/latest timestamp of all data sources, for a
ProjectData object, we add a parameter 'outermost' to
'ProjectData$get.data.timestamps' that makes this possible.

Props to @bockthom for his support.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
Formerly, we provided three different aggregation levels in the function
'split.data.by.networks': namely, "range", "cumulative", and "project".
We now extend and adapt this to the levels "range", "cumulative",
"all.ranges", "project.cumulative", "project.all.ranges", and
"complete".

The various aggregation levels work as follows:
- range: The project data will be split exactly to the time ranges
         specified by the networks' names.
- cumulative: The project data will be split exactly to the time ranges
         specified by the networks' names, but in a cumulative manner.
- all.ranges"}: The project data will be split exactly to the time range
         specified by the start of the first network and end of the last
         network. All data instances will contain the same data.
- project.cumulative: The same splitting as for \code{"cumulative"}, but
         all data will start at the beginning of the project data and
         *not* at the beginning of the first network.
- project.all.ranges: The same splitting as for \code{"all.ranges"}, but
         all data will start at the beginning of the project data and
         *not* at the beginning of the first network. All data instances
         will contain the same data.
- complete: The same splitting as for \code{"all.ranges"}, but all data
         will start at the beginning of the project data and end at the
         end of the project data. All data instances will contain the
         same data.

In this patch, we properly extend all functions and tests.

Props to @bockthom for his support in parts of this patch.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
The function 'add.vertex.attribute.author.role' now adds author-
classification attributes based on classification results. The function
'add.vertex.attribute.author.role.function' is now the one using a
classification function as a parameter.

For this to work, we add the range identifier as a parameter to the
anonymous calculation functions for adding the vertex attributes.

Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
Signed-off-by: Claus Hunsen <[email protected]>
Signed-off-by: Thomas Bock <[email protected]>
For better modularity, the function 'get.commit.data' is moved to the
data module, right where it obviously belongs.

Signed-off-by: Claus Hunsen <[email protected]>
This code originates from @flx5's PR on vertex attributes (#67).

Signed-off-by: Claus Hunsen <[email protected]>
@clhunsen clhunsen added this to the v3.1 milestone Feb 16, 2018
@clhunsen clhunsen mentioned this pull request Feb 16, 2018
@clhunsen clhunsen changed the title Add vertex attributes Add vertex attributes (updated) Feb 16, 2018
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After finally reviewing this pull request, let me conclude:

This pull request looks pretty good, but there is still room for improvement. Basically, there are two different types of issues:

  • Wrong or insufficient documentation (needs to be fixed, some of my comments may be too meticulous -- @clhunsen it is up to you whether you address all of them or just the important ones)
  • General consistency problems regarding copyright headers (discussion needed, see individual comments)

Please don't be angry about me for spotting all the small stuff...

util-data.R Outdated
#' @param split A list of numerics, indicating numbers of weeks into which the selected data
#' is to be split
#'
#' @return A data.frame indicating the slected \code{columns}, split into the given numbers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in "slected"

## [email protected]
## (c) Felix Prasse, 2017
## [email protected]
## (c) Thomas Bock, 2018
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why you have added my name in all of the copyright headers in those files. Maybe it is due to the Signed-Off-Tags. Currently, in the released version, my name is not part of any copyright header (except for the network-metrics script). If you would like to make this consistent, we would have to search for all Signed-Off-Tags containing my name and adapt the copyright headers of the corresponding files accordingly...

Long story short, I don't like inconsistencies. Either we remove my name from all copyright headers in this pull request or I will search for all Signed-Off-Tags containing my name to make all copyright headers consistent (not any more in this pull request, but possibly in one of the next ones...)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, this affects also commit 4810ac5, in which the function get.commit.data is moved from util-core-peripheral.R to util-data.R: The function get.commit.data originally was implemented by Ferdinand. Ferdinand is part of the copyright header of util-core-peripheral.R (which still contains functions implemented by him), but Ferdinand is not part of the copyright header of util-data.R. As the moved function was implemented by him, he should also get part of the copyright header there...

My suggestion: Remove commit 536965e from this pull request (just in order to merge consistent things and not block other subsequent pull requests from being merged) and postpone updating the copyright headers to a later pull request. Then we should carefully fix all the copyright headers... I guess, we should discuss this personally...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the commit for now. We need to discuss this especially with respect to the addition of a license in #90.

net.conf$update.values(list(author.relation = "cochange", simplify = FALSE))

## retrieve project data and network builder
project.data = ProjectData$new(proj.conf)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems not to be up to date as the network builder is not retrieved here...

#' @param aggregation.level Determines the data to use for the attribute calculation.
#' One of \code{"range"}, \code{"cumulative"}, \code{"all.ranges"},
#' \code{"project.cumulative"}, \code{"project.all.ranges"}, and
#' \code{"complete"}. See a\code{split.data.by.networks} for
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in "a\code{split.data.by.networks}".
Also affects all the functions below...

return(nets.with.attr)
}

#' Add author role attribute
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe start a new subsection here for a better a structure of the file?
## * Role --------------------------------------------------------------

#'
#' @param list.of.networks The network list
#' @param project.data The project data
#' @param name The attribute name to add [default: "commit.count"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong default value here. It should be commit.count.author.not.committer.

#'
#' @param list.of.networks The network list
#' @param project.data The project data
#' @param name The attribute name to add [default: "commit.count"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wrong default value here. It should be commit.count.committer.not.author.

#' @param name.column The name of the author or committer column
#'
#' @return A list of networks with the added attribute
add.vertex.attribute.commit.count = function(list.of.networks, project.data, name = "commit.count",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a short comment here regarding the name of the helper function: Currently, a non-expert user of the network library may use this function as the default function for commit count attributes, as the name of this functions sounds reasonable, instead of add.vertex.attribute.commit.count.author as auto-completion may also suggest add.vertex.attribute.commit.count in the first place. Are there any suggestions for a better name of the helper function to not be confused with the actual function to use?
(I would like to have this function as a private function if we would have dealt with classes here... - just to make my intention clear.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just renamed it to add.vertex.attribute.commit.count.helper. This should be enough for now.

Would we have a proper R package, this would be easy to hide. 😉

#' @param default.value The default value to add if a vertex has no matching value [default: 0]
#'
#' @return A list of networks with the added attribute
add.vertex.attribute.commit.count.committer = function(list.of.networks, project.data, name = "commit.count",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about changing the default name here (and also in the documentation) to commit.count.committer?
In the end, only the usual commit count -- which is the author commit count -- should have the default name commit.count -- or would you also prefer to rename that to commit.count.author then for consistency reasons?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we talk mainly about authors for now, we should withhold from renaming until we tackle #84.

return(nets.with.attr)
}


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reviewing this pull request, I hit on the (crazy) idea to perform author-role classification not based on author networks or author commit count, but on committer networks or committer commit count.
I know, it's getting freaky now, but I will add this idea to issues #84 or #92 soon to just not forget the idea.

This patch introduces only some minor fixes and inprovements to the
documentation for the changes in PR #93.

The most important change is the renaming of the function
'add.vertex.attribute.commit.count' to
'add.vertex.attribute.commit.count.helper', mainly for unambiguity and
consistency reasons.

Props to @bockthom for being that precise.

Signed-off-by: Claus Hunsen <[email protected]>
@clhunsen clhunsen merged commit 5e677d2 into se-sic:dev Feb 19, 2018
@clhunsen clhunsen deleted the vertex-attributes branch February 27, 2018 08:13
@clhunsen clhunsen mentioned this pull request Mar 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants