
MAJOR Code Refactoring #5

Open
njfritter opened this issue Jun 11, 2018 · 6 comments

@njfritter
Owner

njfritter commented Jun 11, 2018

I had Raul review my project and he absolutely grilled me. So here's what I need to do:

  • Redo the formatting of the scripts so indentation is consistent; some use 2 spaces
  • Remove all extraneous packages (e.g. using both the pandas and csv modules where one would do)
  • Reformat helper_functions: remove unnecessary functions and document each one's purpose and inputs
  • Redo the tokenization section to handle Twitter-specific content (hashtags, @-mentions, URLs, etc.)
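A minimal sketch of a Twitter-aware tokenizer, assuming plain regular expressions are acceptable. The pattern and the `tokenize_tweet` name are hypothetical, not the project's code. URLs, @-mentions and hashtags are tried first so each survives as a single token; emoticons such as `:)` still get split into characters, which is one reason NLTK's `TweetTokenizer` may be worth comparing against.

```python
import re

# Hypothetical pattern. Alternatives are tried left to right, so URLs,
# @-mentions and #hashtags match as whole tokens before the generic
# word rule can break them apart. Note the URL rule also swallows
# trailing punctuation (e.g. a final period).
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+          # URLs
    | @\w+                # @-mentions
    | \#\w+               # hashtags (# must be escaped in VERBOSE mode)
    | \w+(?:'\w+)?        # words, optionally with an apostrophe
    | [^\w\s]             # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize_tweet(text):
    """Split a tweet into tokens, keeping URLs, mentions and hashtags intact."""
    return TOKEN_PATTERN.findall(text)


print(tokenize_tweet("Loving #NLP with @raul! See https://example.com/x :)"))
# → ['Loving', '#NLP', 'with', '@raul', '!', 'See', 'https://example.com/x', ':', ')']
```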

Will add more items as I find them.

@njfritter
Owner Author

njfritter commented Jun 11, 2018

Preliminary Checklist:

  • Redo the formatting of the scripts so indentation is consistent; some use 2 spaces
  • Remove all extraneous packages (e.g. using both the pandas and csv modules where one would do)
  • Reformat helper_functions: remove unnecessary functions and document each one's purpose and inputs
  • Redo the tokenization section to handle Twitter-specific content (hashtags, @-mentions, URLs, etc.)
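One possible way to record each helper's purpose and inputs is a NumPy-style docstring (summary line, then `Parameters` and `Returns` sections). The helper below, `normalize_whitespace`, is hypothetical and only illustrates the layout; it uses nothing beyond built-in string methods.

```python
def normalize_whitespace(text, lowercase=True):
    """Collapse runs of whitespace in a tweet and optionally lowercase it.

    Hypothetical helper, included only to show a docstring that states
    the function's purpose, its inputs and its output.

    Parameters
    ----------
    text : str
        Raw tweet text; may contain newlines, tabs or repeated spaces.
    lowercase : bool, optional
        If True (the default), the result is lowercased.

    Returns
    -------
    str
        The text with single spaces between words and no leading or
        trailing whitespace.
    """
    # str.split() with no argument splits on any run of whitespace and
    # drops empty strings, so joining with " " normalizes the spacing.
    cleaned = " ".join(text.split())
    return cleaned.lower() if lowercase else cleaned


print(normalize_whitespace("  Big   NEWS\n\ttoday  "))  # → big news today
```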

@njfritter njfritter self-assigned this Jun 11, 2018
@njfritter
Owner Author

Did everything above. I still haven't gotten the tokenization down pat, but things are looking good.

Here is a pull request detailing the work so far.

@njfritter
Owner Author

njfritter commented Jun 11, 2018

In order to continue, I need a successful run that generates a tokenized dataset. From there, I will look into removing stopwords.

Issue here

Moving to "On Hold"

Update: That issue is complete; moving back to "In Progress".
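Once a tokenized dataset exists, stopword removal could be as simple as the sketch below. The `STOPWORDS` set is a tiny hypothetical placeholder (a real run would use a fuller list, e.g. NLTK's stopwords corpus), and `remove_stopwords` is an illustrative name. Tokens are compared case-insensitively, so "The" and "the" are both dropped while hashtags and mentions pass through untouched.

```python
# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = frozenset({"a", "an", "and", "the", "is", "to", "of", "in", "with"})


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop stopwords (case-insensitively) from one tweet's token list."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


# Each inner list is one already-tokenized tweet.
tokenized = [
    ["The", "model", "is", "training", "#NLP"],
    ["Loving", "the", "results", "with", "@raul"],
]
filtered = [remove_stopwords(tweet) for tweet in tokenized]
print(filtered)
# → [['model', 'training', '#NLP'], ['Loving', 'results', '@raul']]
```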

@njfritter
Owner Author

njfritter commented Jun 12, 2018

Next steps:

  • Make sure the code can run on a different machine (or from a different directory)
  • Write code to generate actionable insights (exploratory analysis):
    • Hashtag frequency
    • URL frequency
    • N-gram frequency
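For the exploratory analysis items, all three counts could be gathered in one pass with `collections.Counter`. This is a sketch assuming each tweet is already a list of tokens with hashtags and URLs kept whole; `frequencies` and `ngrams` are hypothetical names, and lowercasing hashtags (so `#NLP` and `#nlp` merge) is a design choice, not something decided in this thread.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the consecutive n-token tuples in a token list."""
    # If len(tokens) < n the range is empty, so no n-grams are produced.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequencies(tokenized_tweets, n=2):
    """Count hashtags, URLs and n-grams across a list of tokenized tweets."""
    hashtags, urls, grams = Counter(), Counter(), Counter()
    for tokens in tokenized_tweets:
        hashtags.update(t.lower() for t in tokens if t.startswith("#"))
        urls.update(t for t in tokens if t.startswith(("http://", "https://")))
        grams.update(ngrams(tokens, n))
    return hashtags, urls, grams


tweets = [
    ["#nlp", "is", "fun", "#NLP"],
    ["nlp", "is", "fun", "https://example.com"],
]
hashtag_counts, url_counts, bigram_counts = frequencies(tweets)
print(hashtag_counts.most_common(1))  # → [('#nlp', 2)]
print(bigram_counts.most_common(1))   # → [(('is', 'fun'), 2)]
```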

Update: In this issue I will only make sure the code can run on a different machine or from a different directory. I am moving the exploratory analysis points to a separate issue.
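A common way to make scripts independent of the machine and the working directory is to build paths from the script's own location instead of hardcoding absolute paths. Minimal sketch; `project_root` and its `levels_up` convention are hypothetical and assume each script sits a fixed number of directories below the repository root.

```python
from pathlib import Path


def project_root(script_file, levels_up=1):
    """Return the project root as an absolute path.

    script_file: path of the calling script (normally __file__).
    levels_up:   how many directories the script sits below the root
                 (0 means the script is in the root itself).
    """
    # parents[0] is the script's own directory, parents[1] its parent, etc.
    return Path(script_file).resolve().parents[levels_up]


# Typical use inside a script one level below the root (hypothetical paths):
#   ROOT = project_root(__file__)
#   RAW_DIR = ROOT / "data" / "raw"
```

Because the root is derived from `__file__` rather than the current working directory, `ROOT / "data" / "raw"` resolves the same way whether the script is launched from the repository root, its own folder, or anywhere else.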

@njfritter
Owner Author

Working on exploratory analysis for the time being (issue here); moving this to "On Hold".

@njfritter
Owner Author

Update:

  • Changed the directory structure so that raw, untouched data and processed data live in separate directories
  • Added a directory for model objects
  • Updated READMEs
  • Still revising the functions in the helper_functions.py script (will likely split it up by category)
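The raw/processed/models split described above could be captured in one place so every script agrees on it. Sketch only: the `data/raw`, `data/processed` and `models` paths, the `_tokenized` suffix, and the names `LAYOUT`, `ensure_layout` and `processed_path` are assumptions, not the repository's actual layout.

```python
from pathlib import Path

# Hypothetical layout: raw data is never overwritten, while derived data
# and trained model objects each get their own directory.
LAYOUT = {
    "raw": Path("data") / "raw",
    "processed": Path("data") / "processed",
    "models": Path("models"),
}


def ensure_layout(root):
    """Create the data/model directories under `root` and return their paths."""
    root = Path(root)
    dirs = {name: root / rel for name, rel in LAYOUT.items()}
    for path in dirs.values():
        # parents=True creates data/ as needed; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
    return dirs


def processed_path(raw_file, dirs, suffix="_tokenized"):
    """Map a raw data file to its counterpart in the processed directory."""
    raw_file = Path(raw_file)
    return dirs["processed"] / f"{raw_file.stem}{suffix}{raw_file.suffix}"
```

Keeping the raw directory read-only by convention means a processing step can always be rerun from scratch, and `processed_path` keeps output names predictable (e.g. `tweets.csv` becomes `tweets_tokenized.csv`).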
