Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge "sibling" RSS item that belong to a multi-entry docket item #217

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

johnhawkinson
Copy link
Contributor

Fixes freelawproject/recap#248.

Sorry this history is a little messier than it should be, but it's all going to be squashed so there's probably not much point in cleaning it up. I am really sad that we don't get to keep 49cbaf2 which the non-Python progarmmer in me really likes as a solution to avoid if expressions that don't line up:

+        def twin_entries(a, b):
+            fields = ['title', 'link', 'id', 'published']
+            matching_fields = (a[f] == b[f] for f in fields)
+            return all(matching_fields)
@@ -122,10 +127,7 @@ def data(self):
             if (
                     preventry and
                     prevdata[u'docket_entries'] and
-                    entry.title == preventry.title and
-                    entry.link == preventry.link and
-                    entry.id == preventry.id and
-                    entry.published == preventry.published and
+                    twin_entries(entry, preventry) and
                     len(de) > 0  # xxx
             ):

I'm also not 100% the explanation in a978447 is accurate; and if it is, why was that the right way to go?:

-        """Override this to create a list of docket-like objects instead of the
-         usual dict that is usually provided by the docket report.
+        """Return a list of docket-like objects instead of the usual dict that
+         is usually provided by the BaseDocketReport superclass.

Attempt to explain why we override this function.
I...actually don't understand why we do, so this explanation may be wrong.
It seems like we could have fit within the BaseDocketReport framework
of returning our information as `docket_entries` rather than `data` but
maybe I'm missing something.
Snazzier than setting an old= var at the end of the loop.
prevdata -> lastdata (well, not really)

preventry -> previous_entry

Elminate `de` and just use data['docket_entries'] which we moved up
Because otherwise tests break. I think there may be a platform issue
here, but this fix is obviously correct, so I'm not worried about
masking it.
# the previous entry's and continue the loop.
if (
data_list and data_list[-1][u'docket_entries']
and data[u'docket_entries']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like these two lines are of a different nature than the rest of the equality lines. Is it worth pulling them out of the if statement here? For example, maintain the logic, but do these types of tests in an earlier if statement? Something like:

if data_list and data_list[-1][u'docket_entries'] and data[u'docket_entries']:
    if all([entry.title == previous_entry.title, ...]):

It might allow you to put the comments about equality closer to where the equality is tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err...I think it is advantageous to not materially increase the nesting level of logic with gratuitous nested ifs.
The comments didn't seem to me to be far removed from equality — to me they made sense before the word if.

Perhaps your reaction is to the fact that the comments don't discuss the first 2 lines of the if expressions…that's because I was focusing on what was maybe a little specialized. But certainly the comment could be broadened to discus the first 2 lines, i.e. "If there are docket entries associated with the previous entry and the current entry, and if those two entries match in metdata…"

But also, there's nothing stopping us from putting comments in the midst of the if expressions, so if there's really a desire for "Instructions at the point of need" then that can just happen, there's no need to change the structure of the if statement.

So if you want to change the structure of the if, probably a better reason is required. I actually did try the all() style and concluded erroneously that it didn't work here (I think I omitted the []), but that said it's not terribly familiar to me so I wouldn't vote for it though I don't oppose it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the extraneous nesting bugged me too. If you pull all this logic into a function, you could reverse the logic here and do:

if not all([this, that, the other thing]):
    return  # Or what have you

That'd avoid the deep logic and keep the two things apart?

and entry.id == previous_entry.id
and entry.published == previous_entry.published
):
data_list[-1][u'docket_entries'][0][u'short_description'] += (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like a line like this deserves a comment, if even a short one. Or maybe break the bits into variables for readability?

previous_entry_desc = data_list[-1][u'docket_entries'][0][u'short_description']
current_entry_desc = data[u'docket_entries'][0][u'short_description']
combined_desc = ' AND '.join([previous_entry_desc, current_entry_desc])
data_list[-1][u'docket_entries'][0][u'short_description'] = combined_desc

That's a bit wordy, but clearer. This multiline thing is pretty bad.

Copy link
Contributor Author

@johnhawkinson johnhawkinson May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't think that is clearer.
Forcing the repetition of data_list[-1][u'docket_entries'][0][u'short_description'] is something I would like to avoid, because it requires the reader to parse the 5 levels of hierarchy in their brain twice, and it's not obvious until inspection that they are the same. Edit: esp. when they're not consecutive.

Of course, the flipside is that I suspect you have an unfamiliarity with the += operator and do not love it the way you should, and no amount of cajoling will instill that love in you until you have worked with it for a long time. To my eye the avoidance of complex repetition is the strongest case for the so-called "addition assignment" operator. (The real problem is you can't do many of the usual fixes because the complexity is in an lvalue not just an rvalue.

And I definitely don't think that ' AND '.join([a, b])
is somehow clearer than a+' AND '+b.

I'll think about how to write this better.

At one point I had prevdata (or previous_data) instead of data_list[-1] but I axed it because I felt it wasn't helping. (d59e007)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm familiar with the +=. It's more about how you have three lines of code to do one small thing. That's a smell, but I'll await further approaches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me:

I suspect A and B

Mike:

Not A.

So B then. As I feared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I suspect you won't go for this one, but it kind of jumped out at me so I thought I'd offer it here:

    def entry(d):
        return d[u'docket_entries'][0]

    def get_append_shorttext(de, text=''):
        de[u'short_description'] += text
        return de[u'short_description']

    previous_entry = entry(data_list[-1])
    current_text = get_append_shorttext(entry(data))
    get_append_shorttext(previous_entry, ' AND ' + current_text)

Think of it as a conversation-starter :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's a lot of indirection in there, and get_append_shorttext is either getting or setting depending on the text argument....not great.

Hm. What about something like:

previous_entry = data_list[-1][u'docket_entries'][0]
current_entry = data[u'docket_entries'][0]
previous_entry[u'short_description'] += ' AND ' + current_entry[u'short_description']

That gets us back down to three lines, avoids repetition of the 5 levels of hierarchy, and uses += instead of a join? No additional helper functions needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, while I'm functionally happen with that, it fails PEP-8. And it's actually kind of tricky to fix, and it's why I introduced the helper function, which I agree is totally ridiculous and I'm not sure I could dignify by calling it a strawman:

#        1         2         3         4         5         6         7         8
#2345678901234567890123456789012345678901234567890123456789012345678901234567890
                previous_entry[u'short_description'] += ' AND ' +current_entry[u'short_description']  #1
                previous_entry[u'short_description'] += (  #2
                    ' AND ' + current_entry[u'short_description'])
                previous_entry[u'short_description'] += (  #3
                    ' AND ' +
                    current_entry[u'short_description'])
                previous_entry[u'short_description'] += (  #4
                    ' AND ' +
                    current_entry[u'short_description']
                )
                previous_entry[u'short_description'] += \ 
                    ' AND ' + current_entry[u'short_description']  #5

Although I find 5 tempting, I think that backslash terminates are disfavored (although not against PEP8?).
And of 2,3,4 I dunno. I had the sense you wanted to avoid multiple lines for this expression, and yet there's a certain additional clarity. What's your preference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 5 has my vote followed closely by 1 (a minor PEP8 violation isn't the end of the world in my book). 2, 3, and 4 just don't seem worth it. I leave the rest of this to you. Let's land this.

Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One other thing that doesn't work in a line comment: What about pulling all this logic, comments, etc into its own little method/function so that the data method is still small? This feels like a lot for it.

@johnhawkinson
Copy link
Contributor Author

What about pulling all this logic, comments, etc into its own little method/function so that the data method is still small?

Well, doing that would basically pull 90% of the bulk out of data() into newmethod() and not make newmethod() any smaller. It didn't seem like the complexity really affected one's ability to understand what data() is doing, but I can see the structural argument for it.

I don't think there's much benefit but I'll give it a whirl.

It's super-confusing to talk about "docket entries" as well as "RSS
entries" and variables called "entry" don't help disambiguate this.

Unfortunately we can't do much about the fact that feedparser's dict
of items is `.entries` but hopefully this is still a readability
improvement.
@johnhawkinson
Copy link
Contributor Author

What about pulling all this logic, comments, etc into its own little method/function so that the data method is still small?

Well, doing that would basically pull 90% of the bulk out of data() into newmethod() and not make newmethod() any smaller. It didn't seem like the complexity really affected one's ability to understand what data() is doing, but I can see the structural argument for it.

I don't think there's much benefit but I'll give it a whirl.

Err, so...maybe you had some better idea, but doing this really doesn't seem [to me] to work: johnhawkinson@ad5be23 (specifically these lines).

I mean, the function needs access to a lot of internal state from the principle for loop of data(), which already hurts, so we have to pass a lot of args (unless we make it sub-function of data(), which I guess we could, but...)

And it also has to tell that for loop whether to iterate normally or to continue to avoid adding the item to data_list.

So I don't think this is a great way to go. But I'm a little tired, so maybe I'm missing something important.

On the other hand, we could also duplicate the for loop (thus reverting data()'s for loop to one that doesn't use complexity like previous_and_next(). It's not like performance is critical, right? But...

Hoping for a better suggestion...

Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hoping for a better suggestion...

I can't figure out Github's reply mechanism for your comment, but what about doing this as a separate step after we loop through and create all the dockets. So, instead of injecting doing the merging in the currently loop, we create a separate function after the loop terminated that does the merging?

and entry.id == previous_entry.id
and entry.published == previous_entry.published
):
data_list[-1][u'docket_entries'][0][u'short_description'] += (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's a lot of indirection in there, and get_append_shorttext is either getting or setting depending on the text argument....not great.

Hm. What about something like:

previous_entry = data_list[-1][u'docket_entries'][0]
current_entry = data[u'docket_entries'][0]
previous_entry[u'short_description'] += ' AND ' + current_entry[u'short_description']

That gets us back down to three lines, avoids repetition of the 5 levels of hierarchy, and uses += instead of a join? No additional helper functions needed?

@johnhawkinson
Copy link
Contributor Author

I can't figure out Github's reply mechanism for your comment,

Ha! It was just a regular "reply" at the bottom of the page.

but what about doing this as a separate step after we loop through and create all the dockets. So, instead of injecting doing the merging in the currently loop, we create a separate function after the loop terminated that does the merging?

Umm, sure. It's what I suggested above:

On the other hand, we could also duplicate the for loop (thus reverting data()'s for loop to one that doesn't use complexity like previous_and_next(). It's not like performance is critical, right? But...

I figured the ugliness of looping twice would doom this idea, but if it's your preference I can do that.

@mlissner
Copy link
Member

I figured the ugliness of looping twice would doom this idea, but if it's your preference I can do that.

I actually don't mind processing data multiple times like that. Reminds me of map/reduce/filter kinds of processing.

@CLAassistant
Copy link

CLAassistant commented Dec 18, 2021

CLA assistant check
All committers have signed the CLA.

@flooie
Copy link
Contributor

flooie commented Jan 15, 2023

@johnhawkinson hi 👋

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants