add scripts to automatically grow datasets #81

Merged: 20 commits, Oct 14, 2016

@wallnerryan wallnerryan commented Oct 12, 2016

  • These scripts expect data to already exist in the app, probably loaded from the initial dataimport snapshot.
  • They use fake factory for names and the vehicle inventory to add more vehicles.

Fixes #32
Fixes #49
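
As a rough illustration of the second bullet, here is a minimal sketch of how one vehicle record might be assembled. It is not the code from this PR: `SAMPLE_INVENTORY`, `SAMPLE_DEALERSHIP_IDS`, `random_vin`, and `make_vehicle` are hypothetical names. The field names are taken from the log output further down, and the 17-character identifier is a random string shaped like the logged values, not a VIN with a valid check digit.

```python
import random
import string

# Hypothetical stand-ins: the real scripts use the app's vehicle inventory
# and the dealership ids that already exist in the app.
SAMPLE_INVENTORY = [
    {"make": "HONDA", "model": "ACCORD", "year": "2013"},
    {"make": "FORD", "model": "F150 2WD FFV", "year": "2010"},
]
SAMPLE_DEALERSHIP_IDS = ["a2c1f596-d1fe-4096-849f-b4b4b2f85b96"]

VIN_ALPHABET = string.ascii_uppercase + string.digits


def random_vin(rng):
    """Return a random 17-character string shaped like the VINs in the logs."""
    return "".join(rng.choice(VIN_ALPHABET) for _ in range(17))


def make_vehicle(inventory, dealership_ids, rng=random):
    """Build one vehicle record from an inventory entry and an existing dealership."""
    entry = rng.choice(inventory)
    return {
        "make": entry["make"],
        "model": entry["model"],
        "year": entry["year"],
        "dealership": rng.choice(dealership_ids),
        "vin": random_vin(rng),
    }
```

Passing a seeded `random.Random` as `rng` makes the output repeatable, which can help when comparing runs.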

@wallnerryan

Example Usage

$ ./dataimport/grow-datasets/dealers/Dockerbuild.sh
682a6c45a4b2365ec2d6237ea3f7167037d2cfc8f89e0936444698fe06fe2917

$ ./dataimport/grow-datasets/vehicles/Dockerbuild.sh
fcc7b2304c67417fea962a0af3314d635d8081b469ad2e1a404d721947a9774e

$ docker ps
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS              PORTS               NAMES
f22cb68e6bcf        clusterhq/add-dealers-loop    "python add_dealers.p"   2 seconds ago       Up 2 seconds                            sleepy_euler
f3fd3823635f        clusterhq/add-vehicles-loop   "python add_vehicles."   17 seconds ago      Up 17 seconds                           agitated_leakey


$ docker logs f22cb68e6bcf
Added Dealer: {"phone": "555-555-5555", "name": "Lauren Thompson's, Gregory Smith DDS's & Gwendolyn Mcfarland's Auto Dealership", "addr": "62097 Cox Mission Suite 281 East Brett, NY 57755"}


$ docker logs f3fd3823635f
19825
Added Vehicle: {"model": "ACCORD", "year": "2013", "dealership": "a2c1f596-d1fe-4096-849f-b4b4b2f85b96", "make": "HONDA", "vin": "SY2RPPUI8I4X4HIDV"}
17449
Added Vehicle: {"model": "NISSAN FRONTIER 4WD K/C PRO-4X", "year": "2013", "dealership": "2cc315a0-0984-4438-962f-babe40d7bdb4", "make": "NISSAN", "vin": "IK5IU7VLXJZH53RIB"}
24762
Added Vehicle: {"model": "F150 2WD FFV", "year": "2010", "dealership": "3d4cc8e0-10e2-41d3-94ed-f1b5ad91006b", "make": "FORD", "vin": "8VN2LKTN9J3IMOSVB"}
18306
Added Vehicle: {"model": "ES 350", "year": "2011", "dealership": "41a9c461-76e8-4fe0-91d6-480e5fcf55b2", "make": "LEXUS", "vin": "12X1LRJYUXR8Q434H"}
18201
Added Vehicle: {"model": "911 Carrera", "year": "2016", "dealership": "7e2ba903-318c-4072-b832-c5cc6a0e8271", "make": "Porsche", "vin": "ZR15NIG1DB8WAV703"}
6274
Added Vehicle: {"model": "XV CROSSTREK AWD", "year": "2013", "dealership": "13ee2e0b-1f6b-49fd-8c27-05e2084b3a90", "make": "SUBARU", "vin": "9OSRVNSCCIGFSD8SU"}

Vehicles get added every second, and dealers get added every 5 seconds.
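
A loop with a fixed interval like the one described above could look roughly like this. It is a sketch under assumptions, not the scripts from this PR: `post_json` and `add_in_loop` are hypothetical names, and it assumes the app accepts a JSON body via POST. The `post` and `sleep` parameters are injectable so the loop can be exercised without a running app.

```python
import json
import time
import urllib.request


def post_json(url, record):
    """POST one record as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def add_in_loop(url, make_record, interval_seconds, count=None,
                post=post_json, sleep=time.sleep):
    """Post make_record() to url repeatedly; count=None runs until interrupted."""
    added = 0
    while count is None or added < count:
        record = make_record()
        post(url, record)
        print("Added: " + json.dumps(record))
        added += 1
        # An interval of 0 skips the sleep entirely.
        if interval_seconds > 0:
            sleep(interval_seconds)
    return added
```

With this shape, dealers would use `interval_seconds=5` and vehicles `interval_seconds=1`, and setting the interval to 0 removes the pause between inserts.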

REST API URLs are set within the script.

@wallnerryan

cc @pcgeek86

wallnerryan commented Oct 12, 2016

We can easily see how many records are returned by monitoring the API with jq.

https://stedolan.github.io/jq/download/

curl 'http://ec2-54-237-204-239.compute-1.amazonaws.com:32787/dealerships' | jq length

curl 'http://ec2-54-237-204-239.compute-1.amazonaws.com:32787/vehicles' | jq length
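
For anyone without jq installed, the same count can be sketched in Python. This assumes the endpoints return a JSON array, which is what `jq length` counts element by element; `count_records` and `fetch_count` are hypothetical helper names. Like the curl pipeline, it downloads the whole listing before counting.

```python
import json
import urllib.request


def count_records(payload):
    """Return the number of top-level elements in a JSON array payload."""
    return len(json.loads(payload))


def fetch_count(url):
    """Download the full listing from url and count it locally."""
    with urllib.request.urlopen(url) as response:
        return count_records(response.read().decode("utf-8"))
```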



def main():
    app_url="http://ec2-54-237-204-239.compute-1.amazonaws.com:32787/dealerships"

Should this be abstracted out and accepted as input?

Done

print(r.status_code)

def main():
    dealer_url="http://ec2-54-237-204-239.compute-1.amazonaws.com:32787/dealerships"

Should this be abstracted out and accepted as input?

Done


def main():
    dealer_url="http://ec2-54-237-204-239.compute-1.amazonaws.com:32787/dealerships"
    vehicle_url="http://ec2-54-237-204-239.compute-1.amazonaws.com:32787/vehicles"

Should these be abstracted out and accepted as input?

Done
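
The thread does not show how the URLs ended up being passed in. One possible shape, purely as an assumption, is command-line flags with environment-variable fallbacks; the flag names `--dealer-url` and `--vehicle-url` and the variables `DEALER_URL` and `VEHICLE_URL` are hypothetical and not taken from the merged scripts.

```python
import argparse
import os


def parse_args(argv=None):
    """Read endpoint URLs from flags, falling back to environment variables."""
    parser = argparse.ArgumentParser(
        description="Grow the dealer and vehicle datasets."
    )
    parser.add_argument(
        "--dealer-url",
        default=os.environ.get("DEALER_URL"),  # hypothetical variable name
        help="full URL of the dealerships endpoint",
    )
    parser.add_argument(
        "--vehicle-url",
        default=os.environ.get("VEHICLE_URL"),  # hypothetical variable name
        help="full URL of the vehicles endpoint",
    )
    args = parser.parse_args(argv)
    if not args.dealer_url or not args.vehicle_url:
        parser.error("--dealer-url and --vehicle-url (or the env vars) are required")
    return args
```

Environment variables are convenient here because the scripts run in containers, where values can be supplied with `docker run -e`.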

Ryan Wallner added 2 commits October 13, 2016 10:37
…this, eliminate the time sleeps as it was not adding data fast enough

wallnerryan commented Oct 13, 2016

rethink_db starts at about 50MB with our base snapshot; it has been running for about 20 min and it's at 70MB.

Slow road, but eventually we can grow to GB size.

$ du -csh --block-size=1M /chq/1c030aa0-f301-4a42-a19a-8c63c8977f88/9924c5c7-1cb0-45ad-bd03-a2d7b818bca9/*
70  /chq/1c030aa0-f301-4a42-a19a-8c63c8977f88/9924c5c7-1cb0-45ad-bd03-a2d7b818bca9/rethinkdb_data
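
To put a number on "slow road", here is a back-of-the-envelope estimate. It assumes growth stays linear at the rate observed above (50 MB to 70 MB in about 20 minutes) and uses 1024 MB as an example target; both are assumptions, and `minutes_to_reach` is a hypothetical helper name.

```python
def minutes_to_reach(target_mb, start_mb, current_mb, elapsed_minutes):
    """Estimate the remaining minutes to reach target_mb at the observed average rate."""
    rate = (current_mb - start_mb) / elapsed_minutes  # MB per minute
    if rate <= 0:
        raise ValueError("dataset is not growing")
    return (target_mb - current_mb) / rate


remaining = minutes_to_reach(target_mb=1024, start_mb=50, current_mb=70, elapsed_minutes=20)
print(f"{remaining:.0f} minutes (~{remaining / 60:.1f} hours)")
# prints: 954 minutes (~15.9 hours)
```

At roughly 1 MB per minute, reaching 1 GB takes most of a day unless the insert rate goes up.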

wallnerryan commented Oct 14, 2016

cc @pcgeek86 You can untar this into /chq/<uuid-of-volset>/<uuid-of-volume>/rethinkdb_data, then either take a snapshot of it and push it to FH if we don't already have a snap of it, or use it to test. I found a lot of issues with our app once we have this much data :)

Also, I'm still working on growing the dataset so it's at least 1G.

@wallnerryan

It was also hard to get the size using jq, because it downloads the full response and then counts locally. Instead, I added APIs that do this faster.

curl http://ec2-52-90-32-15.compute-1.amazonaws.com:32799/dealershipssize
{"status":"Size: 762311"}

curl http://ec2-52-90-32-15.compute-1.amazonaws.com:32799/vehiclessize
{"status":"Size: 747310"}
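
A script that polls these endpoints needs to pull the number out of the `status` string. A minimal sketch, assuming the response body always has the shape shown above; `parse_size` is a hypothetical helper, and what the number measures is whatever the endpoint reports.

```python
import json
import re

SIZE_PATTERN = re.compile(r"^Size:\s*(\d+)$")


def parse_size(body):
    """Extract the integer from a body like {"status":"Size: 762311"}."""
    status = json.loads(body)["status"]
    match = SIZE_PATTERN.match(status)
    if match is None:
        raise ValueError("unexpected status: " + repr(status))
    return int(match.group(1))
```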

@wallnerryan wallnerryan merged commit b7c8c24 into master Oct 14, 2016