Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

embed sixified jsondate into jurisraper/lib #249

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

umeboshi2
Copy link
Contributor

In another discussion, I took note that jsondate wasn't compatible with py3.

I decided to fork the repo and fix it, but since the package consists of a single small __init__.py file and a single test file, I decided it would be just as easy to embed it into the project. Porting the package to use six was pretty straightforward, and it seems that the upstream repo is effectively dead.

It seems that jsondate is only being used i pacerdocket and tests/test_pacer, so I don't think that this code will be too difficult to maintain in the future. The test module adds 9 tests, which are the same tests as upstream, though slightly modified to use six for future capability. The tests should execute at the very end of the test run.

@mlissner
Copy link
Member

mlissner commented Jan 4, 2019

Neat! I just wrote this issue: rconradharris/jsondate#8 in hopes that they'll transfer the code to us, but I'm doubtful they get Github notifications. Let's give it a few days.

If they don't respond, I think I'd rather fork jsondate and release freelawproject/jsondate2 or something instead of vendoring it. Seems better for the community and for separation of concerns.

@umeboshi2
Copy link
Contributor Author

Good luck getting a response! There seems to be no response to any issue or pr in the repo. The repo is four commits deep, with the latest being a few years ago. If it comes to a fork, you can use the one I made yesterday, https://github.com/umeboshi2/jsondate. I agree that forking is better than embedding. I just did this since the package is so small and simple, and upstream is very, very quiet.

@arderyp
Copy link
Collaborator

arderyp commented Jan 4, 2019

looks like this fork has six implementation and this fork has py3 support

While I commend to effor to fork yourself @umeboshi2, I'm wondering if an institutional fork might be more robust. If, for example, you ever go silent on GH, then we are in the same position with your fork. But, if the fork is owned by an institution that we know will remain active, and which has multiple developers, then I think there's less likelihood of the project being abandoned again--and I think FLP checks both of these boxes.

@umeboshi2
Copy link
Contributor Author

Yes, I created the fork because I only saw nitz's fork on pypi, but that fork is not py2 compatible. Ilya's fork seems to be better since it uses six and should work for each python version. However, with respect to an institutional fork, it seems that rharris is the better option, since he's listed as part of HP. Nevertheless, on the overview pages, it seems that Ilya is the most active on github of all of these candidates. Interestingly, it seems that the amount of ascii characters in just this conversation exceeds the amount of characters in the actual code being discussed.

@arderyp
Copy link
Collaborator

arderyp commented Jan 4, 2019

"Interestingly, it seems that the amount of ascii characters in just this conversation exceeds the amount of characters in the actual code being discussed."

This is a hilarious observation

@umeboshi2
Copy link
Contributor Author

Feel glad that we aren't using nodejs, where many important packages are essentially one-liners. :)

Example: https://github.com/then/is-promise

umeboshi@bard:~/tmp/is-promise$ wc  *
  13   32  301 component.json
   5   24  165 index.js
  18  167 1058 LICENSE
  22   47  453 package.json
  32   55  829 readme.md
  56  162 1483 test.js
 146  487 4289 total

@mlissner
Copy link
Member

mlissner commented Jan 8, 2019

I think we have a decision from email to create freelawproject/jsondate3 or 6 or something. @umeboshi2 can you tell me what repo to make? I'll give you permissions to set it up, and then we can do the pypi rigamarole.

@umeboshi2
Copy link
Contributor Author

I sent a PR to nitz at nitz14/jsondate#1 late last night. He's already uploading to pypi using "jsondate3". If he accepts the PR and uploads a new version, the problem will be solved. The only change I made between his fork and my fork, excepting the javascript date handling, is to replade u'' with six.text_type(), since I don't know the py3 versions he may have to support.

If he doesn't respond, a new repo can be made under freelaw and filled with my fork. My fork didn't include the javascript format feature. If the new repo is to upload to pypi, we can't use "jsondate3", though "jsondate6" should be available.

I'm not sure how you want to do the pypi rigamarole, but I have been using the travis deploy feature. I started using the audreyr/cookiecutter-pypackage repo to start new python packages, which makes pypi uploads a bit easier. I actually use my own fork for yaml config, instead of json config.

@umeboshi2
Copy link
Contributor Author

Oops, I forgot to mention that it would be nice to wait a little bit for nitz to see if he's agreeable before adding yet another "jsondate" package to pypi.

@mlissner
Copy link
Member

mlissner commented Jan 8, 2019

Well, nitz does not look like a very active Github user, but we can hope.

@umeboshi2
Copy link
Contributor Author

True, but I feel we can wait a little bit. The jsondate package isn't really holding things back from juriscraper being py3 capable. There are also json comparison issues when testing in py3 that are more important. Since that "futureproofing" PR has been merged, I was hoping one of y'all would run the tests in py3 and provide some suggestions on what to do about the json compare problems. The jsondate problem will be easy to solve, regardless of which direction we take, but even so, the json compare problem will still persist.

@umeboshi2
Copy link
Contributor Author

Also forgot to mention that opinion scrapers don't load when testing in py3, but produce a failure. I submitted the "futureproofing" PR because the code changes were needed, and the tests passed in py2, but more work will need to be done to get juriscraper working with py3. It's been a while since I ran the tests in py3, so I forgot the details. I ran them again a minute ago, so here are some snippets of the results:

test_parsing_rss_parsing (tests.test_pacer.PacerRssFeedTest) ... /freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/juriscraper/lib/html_utils.py:73: DeprecationWarning: The unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.
  return h.unescape(s)
FAIL
======================================================================
FAIL: test_scrape_all_example_files (tests.test_everything.ScraperExampleTest)
Finds all the $module_example* files and tests them with the sample
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 109, in __call__
    function, args, kwargs
  File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 124, in _execute_function
    return self._handle_function(fn=handle_function)
  File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 148, in _handle_function
    return fn(cassette)
  File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 117, in handle_function
    return function(*args, **kwargs)
  File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_everything.py", line 201, in test_scrape_all_example_files
    json_data[i],
AssertionError: {'download_urls': 'tests/examples/opinions/u[329 chars]alse} != {'date_filed_is_approximate': False, 'case_d[331 chars]alse}
  {'blocked_statuses': False,
   'case_dates': '2014-08-13',
   'case_names': 'Bizhan Niazi Logistic Services Company',
   'date_filed_is_approximate': False,
   'docket_numbers': 'ASBCA No. 59205',
   'download_urls': 'tests/examples/opinions/united_states/2014/59205%20Bizhan%20Niazi%20Logistic%20Services%20Company%208.13.14.pdf',
-  'judges': "O'Sullivan",
+  'judges': "O\\'Sullivan",
?              ++

   'precedential_statuses': 'Published'}

======================================================================
FAIL: test_parsing_rss_parsing (tests.test_pacer.PacerRssFeedTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_pacer.py", line 645, in test_parsing_rss_parsing
    self.parse_files(path_root, '*.xml', PacerRssFeed)
  File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_pacer.py", line 566, in parse_files
    self.assertEqual(j, data)
AssertionError: Lists differ: [{'da[38 chars]ne, 'jurisdiction': '', 'date_converted': None[1047 chars]12'}] != [{'da[38 chars]ne, 'docket_entries': [{'date_filed': datetime[1037 chars]12'}]

First differing element 1:
{'dat[37 chars]ne, 'jurisdiction': '', 'date_converted': None[467 chars]212'}
{'dat[37 chars]ne, 'docket_entries': [{'date_filed': datetime[457 chars]212'}

  [{'assigned_to_str': '',
    'case_name': 'Dozier v. CN of Dundalk, Inc. d/b/a CNAC of D',
    'cause': '',
    'court_id': 'mdb',
    'date_converted': None,
    'date_discharged': None,
    'date_filed': None,
    'date_terminated': None,
    'demand': '',
    'docket_entries': [{'date_filed': datetime.date(2018, 4, 24),
                        'description': '',
                        'document_number': None,
                        'pacer_doc_id': '',
                        'pacer_seq_no': None,
                        'short_description': 'Close Adversary Case'}],
    'docket_number': '18-00079',
    'jurisdiction': '',
    'jury_demand': '',
    'nature_of_suit': '',
    'pacer_case_id': '711557',
    'parties': None,
    'referred_to_str': ''},
   {'assigned_to_str': '',
    'case_name': 'Unknown Case Title',
    'cause': '',
    'court_id': 'mdb',
    'date_converted': None,
    'date_discharged': None,
    'date_filed': None,
    'date_terminated': None,
    'demand': '',
    'docket_entries': [{'date_filed': datetime.date(2018, 4, 24),
                        'description': '',
-                       'document_number': '1',
?                                          ^^^

+                       'document_number': None,
?                                          ^^^^

-                       'pacer_doc_id': '092041355320',
?                                        ------------

+                       'pacer_doc_id': '',
-                       'pacer_seq_no': '2',
?                                       ^^^

+                       'pacer_seq_no': None,
?                                       ^^^^

                        'short_description': 'Miscellaneous Motion'}],
    'docket_number': '18-90003',
    'jurisdiction': '',
    'jury_demand': '',
    'nature_of_suit': '',
    'pacer_case_id': '713212',
    'parties': None,
    'referred_to_str': ''}]

----------------------------------------------------------------------
Ran 70 tests in 188.509s

FAILED (failures=2, skipped=13)
Test failed: <unittest.runner.TextTestResult run=70 errors=0 failures=2>
0. Doing /freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/examples/pacer/rss_feeds/mdb_1.xml   error: Test failed: <unittest.runner.TextTestResult run=70 errors=0 failures=2>

@mlissner
Copy link
Member

Small note here that I forked and released a new tool called jsondate3-aware: https://pypi.org/project/jsondate3-aware/. I'll land it as part of Juriscraper in a bit, but it should help here once it's ready.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants