Create DATA_DIR constant that is pulled from env var #395
This file was deleted.
@@ -4,13 +4,12 @@
 formatting details (paragraph level, font, etc), but is surprisingly consistent. It is
 infrequently updated by a research group at Columbia University.
 """
-from pathlib import Path
 from typing import Dict, List, Optional

 import docx
 import pandas as pd

-from dbcp.constants import US_STATES
+from dbcp.constants import DATA_DIR, US_STATES


 class ColumbiaDocxParser(object):
@@ -63,7 +62,7 @@ def __init__(self) -> None:
 }

 def load_docx(
-self, source_path=Path("/app/data/raw/RELDI report updated 9.10.21 (1).docx")
+self, source_path=DATA_DIR / "raw/RELDI report updated 9.10.21 (1).docx"
 ) -> None:
 """Read the .docx file with python-docx.
@@ -91,7 +90,7 @@ def _remove_intro(
 return paragraphs[idx:]
 raise ValueError("Could not find starting state")

-def _parse_values(self, text: str) -> None:
+def _parse_values(self, text: str) -> None: # noqa: C901

non-blocking: it might be worth pulling each section handler out into its own function, then using

Are the sections always in a specific order? Or can they be a random jumble?

Hmm, I'm not sure tbh. Trenton did this integration way back when. I think this is something to revisit when we update the data.

 """Parse and assign values to the correct dataset based on the current hierarchical headings.

 Args:
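
To make the refactoring suggestion above concrete, here is a minimal sketch of one way per-section handlers could be wired together: a dictionary that maps the current heading to a small handler method. This is an assumption about the approach, not the actual `ColumbiaDocxParser` code. The class `SectionParser`, the section names `"laws"` and `"projects"`, and every attribute and method name are hypothetical.

```python
from typing import Callable, Dict, List


class SectionParser:
    """Hypothetical stand-in for a parser that routes text by the current heading."""

    def __init__(self) -> None:
        self.current_section = ""
        self.laws: List[str] = []
        self.projects: List[str] = []
        # One small handler per section instead of one long if/elif chain.
        # The section names here are made up for illustration.
        self._handlers: Dict[str, Callable[[str], None]] = {
            "laws": self._handle_law,
            "projects": self._handle_project,
        }

    def _handle_law(self, text: str) -> None:
        self.laws.append(text)

    def _handle_project(self, text: str) -> None:
        self.projects.append(text)

    def parse_values(self, text: str) -> None:
        """Send text to the handler registered for the current section."""
        handler = self._handlers.get(self.current_section)
        if handler is None:
            raise ValueError(f"Unexpected section: {self.current_section!r}")
        handler(text)
```

A lookup like this does not depend on the order in which sections appear, and each handler stays small, which is the kind of change that could make the `# noqa: C901` suppression unnecessary.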

non-blocking: If I wanted to change `DATA_DIR` or somesuch, could I put that in `local.env`?

In the `.env` file!
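
For context, a minimal sketch of what a `DATA_DIR` constant read from the environment could look like. The actual definition in `dbcp.constants` is not shown in this diff, so the environment variable name `DATA_DIR`, the `/app/data` fallback (taken from the hard-coded path removed above), and the helper `get_data_dir` are all assumptions.

```python
import os
from pathlib import Path
from typing import Mapping

# Assumption: fall back to the prefix of the hard-coded path this PR removes.
DEFAULT_DATA_DIR = "/app/data"


def get_data_dir(environ: Mapping[str, str] = os.environ) -> Path:
    """Return the data directory named by DATA_DIR, or the default if unset."""
    return Path(environ.get("DATA_DIR", DEFAULT_DATA_DIR))


# Evaluated once at import time, like a module-level constant.
DATA_DIR = get_data_dir()

# Mirrors the new default argument of load_docx in the diff.
source_path = DATA_DIR / "raw/RELDI report updated 9.10.21 (1).docx"
```

With a constant like this, a `DATA_DIR=...` line in the `.env` file would change where `load_docx` looks by default, provided the file is loaded into the process environment before the constant is evaluated.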