From 0d354f0d5a8e0ca91e90e8ad04eef4cbb97acba8 Mon Sep 17 00:00:00 2001
From: Daniel Kristiyanto <danielkr@uw.edu>
Date: Thu, 6 Oct 2016 01:30:49 -0700
Subject: [PATCH] =?UTF-8?q?=F0=9F=93=96?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 50 +++++++++++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/README.md b/README.md
index dcefc11e..ab48dc7c 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 "Swipe right to your next adventurous ride"
 
 
-Venturo is a ridesharing platform with common destinations, e.g. to complement city passes. Users select multiple destinations; and matched with a driver and other users. Currently serving 2 cities: Chicago and New York City.
+Venturo is a ridesharing platform with common destinations, e.g. to complement city passes. Users select multiple destinations and matched with a driver and other users. Currently serving 2 cities: Chicago and New York City.
 
 ## Demo
 Demo available at:
@@ -11,25 +11,25 @@ Demo available at:
 
 
 ## Data
-The plaform takes 2 data stream, both in JSON format:
+The platform takes 2 data stream, both in JSON format:
 
-1. 🙋 From passenger : {id, current location, status, destinations ..}
+1.  From passenger : {id, current location, status, destinations ..}
 
-2. 🚕 From the driver: {id, current location, status, destination, passenger, ..}
+2.  From the driver: {id, current location, status, destination, passenger, ..}
 
-click here for a complete schema. The data is generated by placing both drivers and passenger on a map, and simulate a moving location toward destinations within 1-2 hops (euclidian).
+click here for a complete schema. The data is generated by placing both drivers and passenger on a map and simulate a moving location toward destinations within 1-2 hops (euclidean).
 
 ## Matching
-The matching is heuristic, and driver-centric:
+The matching is heuristic and driver-centric:
 
-1. if a driver available / idle, scan for passenger nearby
-2. Dispatch 
-3. On trip to next destination. Along the way, unless cab is full, driver continuously scan other passenger nearby with common destinations, and re-route if necesary.
-4. Once arrived, drivers's status are set back to idle to pick up other passengers. Passengers removed after 2 hours later.
+1. if a driver available / idle, scan for passenger nearby.
+2. Dispatch to pick up the passenger.
+3. Switch status to 'On trip' to the destination. Along the way, unless cab is full, driver continuously scan another passenger nearby with common destinations, and re-route if necessary.
+4. Once arrived, drivers' status are set back to idle to pick up other passengers. Passengers removed after 2 hours later.
 
 
 ## Dashboard and Queries
-The dashboard put the drivers and passengers on the map in realtime fashion within 1 hour window. Since the data is simulated, some passengers and drivers may appear on unlikely places (e.g. river/lake).
+The dashboard put the drivers and passengers on the map in real-time fashion within 1-hour window. Since the data is simulated, some passengers and drivers may appear in unlikely places (e.g. river/lake).
 
 
 <img src="media/screenshot.png" width=500px />
@@ -37,32 +37,40 @@ The dashboard put the drivers and passengers on the map in realtime fashion with
 Green dots represent passengers, Blue represents drivers. Green inside blue represents passenger(s) on a trip.
 The stats:
 - Total cabs: total active drivers (including idle, on trip, and dispatched drivers)
-- Idle driver: cabs without pasenger
-- Total passenger: total passengers (including passengers on trip, waiting passenger, or passengers being pickup, etc)
+- Idle driver: cabs without passenger
+- Total passenger: total passengers (including passengers on trip, waiting for passenger, or passengers being pickup, etc)
 - On going trips: total cabs with passengers on going to destination
 - Passenger waiting: total passengers not assigned to any drivers
-- Average waiting time: in seconds, average time between waiting passengers to the time passengers hop on into the car.
+- Average waiting time: in seconds, the average time between waiting for passengers to the time passengers hop on into the car.
 
 Other queries that performed internally:
-- Check nearby passangers
+- Check nearby passengers
 
 ## Architecture
 __Ingestion layer__
-Data streams for passengers and drivers generated separately in python (kafka producers), and ingested in Kafka as different topics. Kafka is set with 4 hours retention policy.
 
-Secor is used to save all raw streams into Amazon S3 for later purposes (batch, re-play, forensics, or analytics).
+Data streams for passengers and drivers generated separately in python (Kafka producers) and ingested in Kafka as different topics. Kafka is set with 4 hours retention policy.
+
+Secor is used to saving all raw streams into Amazon S3 for later purposes (batch, re-play, forensics, or analytics).
 
 __Stream processing__
-Stream processing is performed in Spark Streaming with window 3 seconds, consuming data streams from both drivers and passangers. Every incoming messages are subject of sanity check: e.g. driver's reported status is matched with previous status, etc. to anticipate latency.
+ed in Spark Streaming with window 3 seconds, consuming data streams from both drivers and passengers. Every incoming message is subject to sanity check: e.g. driver's reported status is matched with the previous status, etc. to anticipate latency.
 
 __Sink__
-Elasticsearch is used as the buffer/transactional interface for the resulted messages. Elasticsearch is also called by Spark to enable assigments. Elasticsearch geo location and boolean queries are leveraged. 
+
+Elasticsearch is used as the buffer/transactional interface for the resulted messages. Elasticsearch is also called by Spark to enable assignments. Elasticsearch geo-location and boolean queries are leveraged. 
 
 __Output__
-Output is served as API by using Flask. Bootstrap2, jquery and leaflet is used to prettify the output.
 
+The output is served as API by using Flask. Bootstrap2, jquery, and leaflet are used to prettify the output.
 
+# Infrastructure
+Hosted in Amazon S3 with 3 m3.large instances for Spark Processing and 4 m3. medium instances  other services (multitenant).
 
-# Contact
+# Performance
+The platform was tested to handle 3000 active drivers and 5000 passangers in a continues datastream on the forementioned infrastructure. 
+
+# About
 An Insight Data Science project by Daniel Kristiyanto.
+
 Palo Alto, California -- Autumn, 2016.