This repository has been archived by the owner on Aug 24, 2022. It is now read-only.
Referring to issue #34, the behavior and abilities of Codeface need to be documented. What kinds of From-line formats are supported when supplying mbox files to the mailing-list analysis of Codeface?
With the patch from issue #34, the following "abominations" are supported in addition to the standard format Hans Huber <[email protected]> (according to @wolfgangmauerer on the mailing list):
Hans Huber [email protected]
Hans Huber huber at hubercorp.com
Hans Huber ("AT" instead of "at" also works) [email protected] Hans Huber
hans huber @ hubercorp.com Hans Huber
hans huber @ hubercorp.com (Hans Huber)
Furthermore, we have the via pattern (such as Hans Huber via corp-dev <[email protected]>) and likely others. Documentation on the treatment would help users (e.g., "The via pattern gets treated as follows: Remove the 'via ...' part and use the mail address as is." [I am not sure that this is actually the way it is handled, hence, this ticket...]).
Things to do
Document the various formats (abominations or not) that are supported by Codeface.
Factor out the processing routines and make them independent of document processing.
Implement a unit test case for all possibilities.
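As a starting point for the factored-out routine and its unit tests, here is a minimal Python sketch. It is not Codeface's actual implementation: `normalize_from` is a hypothetical helper, the sample address is reconstructed from the "huber at hubercorp.com" form above, the list address in the via example is made up, and the via handling simply follows the guess stated in this ticket (drop the "via ..." part and keep the address as is).

```python
import re


def normalize_from(value):
    """Return (name, email) for a From-line value, or None if no address is found.

    Hypothetical sketch; the real Codeface behaviour may differ.
    """
    text = value.strip()
    # ASSUMPTION: the "via <list>" part is dropped and the address is kept as is.
    text = re.sub(r"\s+via\s+[^<]*(?=<)", " ", text)
    # Turn a spelled-out " at " / " AT " into "@".
    # Limitation: a display name that itself contains the word "at" would be mangled.
    text = re.sub(r"\s+(?:at|AT)\s+", "@", text)
    # Remove whitespace around "@", as in "huber @ hubercorp.com".
    text = re.sub(r"\s*@\s*", "@", text)
    # Locate the address token, with or without angle brackets.
    match = re.search(r"<?([^\s<>()]+@[^\s<>()]+)>?", text)
    if match is None:
        return None
    email = match.group(1)
    # Whatever remains before or after the address is treated as the name.
    name = text[:match.start()] + " " + text[match.end():]
    name = re.sub(r"\s+", " ", name.strip(" \t\"'()<>"))
    return name, email


samples = [
    "Hans Huber <huber@hubercorp.com>",
    "Hans Huber huber@hubercorp.com",
    "Hans Huber huber at hubercorp.com",
    "Hans Huber huber AT hubercorp.com",
    "huber@hubercorp.com Hans Huber",
    "huber @ hubercorp.com (Hans Huber)",
]
for sample in samples:
    print(sample, "->", normalize_from(sample))
```

Because the function only takes a string and returns a tuple, it can be tested without any mbox or document-processing machinery, which is the point of the second to-do item.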
On 17/11/2015 at 17:01, Andreas Ringlstetter wrote:
> Which of these edge cases are specific to transforming incompatible mbox
> formats, which are specific to the ML analysis, and which possibly also
> affect the parsing of Sign-Off patterns in the VCS analysis?
none of them is specific to anything -- it's just that the amount of
creativity that goes into coming up with bogus formats for email
addresses in mails considerably exceeds the amount found in
tags.
As I suggested in the corresponding thread, it is surely useful to
separate the cleanup operations from document processing and make
the routines generically available.
> There is also the "Huber, Hans" variation of names for all patterns.
> This is already handled in idManager.py, but not in the ML analysis.
thanks for catching this -- I was discussing this with Mitchell in this
thread, and he's currently looking into what the majority of bogus
use-cases for this pattern is.
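For illustration only, the "Huber, Hans" variation could be handled by a small name-reordering step like the one below. This is a hedged sketch, not the logic in idManager.py: `reorder_name` is a hypothetical helper, and it assumes that exactly one comma means "Last, First".

```python
def reorder_name(name):
    """Turn "Last, First" into "First Last"; leave other names unchanged.

    Hypothetical sketch: assumes exactly one comma separates last and first name.
    """
    parts = [part.strip() for part in name.split(",")]
    if len(parts) == 2 and all(parts):
        return "{} {}".format(parts[1], parts[0])
    return name.strip()


print(reorder_name("Huber, Hans"))
print(reorder_name("Hans Huber"))
```

A name with a comma-separated suffix or affiliation would be reordered incorrectly by this rule, so the bogus use cases being investigated in the thread would have to be settled before adopting anything like it.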