kristiyanto authored Oct 6, 2016
1 parent 0d354f0 commit c347b57
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions README.md
@@ -22,7 +22,7 @@ click here for a complete schema. The data is generated by placing both drivers
## Matching
The matching is heuristic and driver-centric:

1. If a driver is available (idle), scan for nearby passengers.
2. Dispatch the driver to pick up the passenger.
3. Switch the driver's status to 'On trip' and head to the destination. Along the way, unless the cab is full, the driver continuously scans for other nearby passengers with a common destination and re-routes if necessary.
4. On arrival, the driver's status is set back to idle so the driver can pick up other passengers. Passengers are removed after 2 hours.
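
The steps above can be sketched in plain Python. This is a minimal illustration and not the project's actual code: the `Driver` and `Passenger` classes, the 2 km search radius, and the cab capacity of 4 are all assumptions made for the example, and the dispatch step is folded into the switch to 'On trip' for brevity.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SEARCH_RADIUS_KM = 2.0  # hypothetical value, not taken from the project
CAB_CAPACITY = 4        # hypothetical value, not taken from the project


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


@dataclass
class Passenger:
    id: str
    location: Tuple[float, float]
    destination: str
    assigned: bool = False


@dataclass
class Driver:
    id: str
    location: Tuple[float, float]
    status: str = "idle"
    destination: Optional[str] = None
    riders: List[str] = field(default_factory=list)


def match(driver: Driver, passengers: List[Passenger]) -> Optional[Passenger]:
    """Pick the nearest eligible passenger for a driver, or None."""
    if len(driver.riders) >= CAB_CAPACITY:
        return None  # cab is full, stop scanning
    candidates = [
        p for p in passengers
        if not p.assigned
        and haversine_km(driver.location, p.location) <= SEARCH_RADIUS_KM
        # An idle driver takes anyone nearby; a driver on a trip only
        # takes passengers heading to the same destination.
        and (driver.status == "idle" or p.destination == driver.destination)
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: haversine_km(driver.location, p.location))
    best.assigned = True
    driver.riders.append(best.id)
    driver.destination = best.destination
    driver.status = "On trip"
    return best


def arrive(driver: Driver) -> None:
    """Reset the driver to idle once the destination is reached."""
    driver.status, driver.destination, driver.riders = "idle", None, []
```

Calling `match` repeatedly for the same driver mimics the continuous scan in step 3: the first call picks any nearby passenger, and later calls only return passengers who share the destination.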
@@ -54,15 +54,17 @@ Data streams for passengers and drivers generated separately in python (Kafka pr
Secor is used to save all raw streams to Amazon S3 for later use (batch processing, replay, forensics, or analytics).

__Stream processing__

Received JSON streams are processed and transformed in Spark Streaming with a 3-second window, consuming data streams from both drivers and passengers. Every incoming message is subject to a sanity check (e.g., the driver's reported status is compared with the previous status) to account for latency.
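
As an illustration of the kind of sanity check described above, the sketch below validates a driver message against the previously stored one. It is only an assumption of how such a check might look: the field names (`status`, `timestamp`), the `dispatched` status, and the transition table are hypothetical and not taken from the project.

```python
from typing import Optional

# Hypothetical table of allowed status transitions (assumption for this sketch).
ALLOWED_TRANSITIONS = {
    "idle": {"idle", "dispatched"},
    "dispatched": {"dispatched", "on trip"},
    "on trip": {"on trip", "idle"},
}


def is_sane(previous: Optional[dict], message: dict) -> bool:
    """Return True if a driver message is consistent with the previous one."""
    status = message.get("status")
    if status not in ALLOWED_TRANSITIONS:
        return False  # unknown or missing status
    if previous is None:
        return True  # first message seen for this driver
    if message.get("timestamp", 0) < previous.get("timestamp", 0):
        return False  # late or out-of-order message
    return status in ALLOWED_TRANSITIONS[previous["status"]]
```

A predicate like this could be applied to each message in a micro-batch before any matching is attempted, so that stale or contradictory updates are dropped early.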

__Sink__

Elasticsearch is used as the buffer/transactional interface for the resulting messages. Spark also queries Elasticsearch to make assignments, leveraging its geo-location and boolean queries.
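
To make the combination of geo-location and boolean queries concrete, here is a sketch of a query body that looks for the nearest waiting passenger around a driver. The field names (`status`, `location`), the `waiting` value, and the default radius are assumptions for illustration; the project's actual mapping and queries may differ.

```python
import json


def nearby_passenger_query(lat: float, lon: float, radius: str = "2km") -> dict:
    """Build an Elasticsearch query body for the closest waiting passenger.

    Field names and the status value are hypothetical; `location` is
    assumed to be mapped as a geo_point.
    """
    point = {"lat": lat, "lon": lon}
    return {
        "size": 1,
        "query": {
            "bool": {
                # Boolean part: only passengers still waiting for a ride.
                "must": [{"term": {"status": "waiting"}}],
                # Geo part: restrict to a radius around the driver.
                "filter": {
                    "geo_distance": {"distance": radius, "location": point}
                },
            }
        },
        # Closest passenger first.
        "sort": [
            {"_geo_distance": {"location": point, "order": "asc", "unit": "km"}}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(nearby_passenger_query(40.75, -73.99), indent=2))
```

The function only builds the request body as a dictionary, so it can be inspected or passed to whichever Elasticsearch client is in use.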

__Output__

The output is served as an API using Flask, intended mainly for the dashboard and for sending results back to passengers and drivers.
For the dashboard, Bootstrap 2, jQuery, and Leaflet are used to prettify the output.

# Infrastructure
Hosted on Amazon EC2 with 3 m3.large instances for Spark processing and 4 m3.medium instances for other services (multitenant).