Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve English subtypes refs finder, add tests #1155

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

goose-life
Copy link
Contributor

No description provided.

self.subtypes_string = '|'.join([re.escape(s) for s in self.subtype_names + self.subtype_abbreviations])
self.full_subtypes = Subtype.objects.all()
self.subtypes = [s.name for s in self.full_subtypes] + \
[s.abbreviation for s in self.full_subtypes]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what we've learnt from term matching and italics, does it make sense to sort these longest first?

(No\.?\s*)?
(?P<num>\d+)
(\s+of\s+|/)
(?P<year>\d{{4}})
.{{0,60}}(\s|/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me a bit nervous, it's pretty broad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neil had an interesting idea which maybe we need to start exploring: start with a broad regex, and then filter down edge cases with supplemental regexes. His argument is it helps to keep the regexes simple and understandable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eg. this would be a false positive: as published in GN No 102. In section 4378 of the Act...

This stuff is tricky.

<paragraph eId="sec_1.paragraph-0">
<content>
<p>Something to do with <ref href="/akn/za/act/gn/2012/102">GN no 102 published on 5 March 2012</ref>.</p>
<p>And another thing about <ref href="/akn/za/act/si/1998/4">SI 4, published in Government Gazette 12345 of 31 January 1998</ref>.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should only markup the GN no 102 and SI 4?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants