
Scribe multi threading

rajubairishetti edited this page Aug 28, 2014 · 1 revision

This feature was added in the 0.5.1 collector release. Before this release, Scribe had only one writer thread per category, and that single thread pulled all of the category's data and wrote it to the corresponding HDFS location.

This feature lets the user configure multiple writer threads per category, which is intended to improve HDFS write throughput.

Property name: num_store_threads. The value must be a positive integer; system behavior is unpredictable if a negative number is provided. The recommended range is 1 to 10.
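Because the collector's behavior is unpredictable for bad values, it can help to validate num_store_threads before deploying a config. The following is a minimal Python sketch of such a check, assuming a hypothetical `parse_num_store_threads` helper that is not part of Scribe: it rejects anything that is not a positive integer and warns when the value falls outside the recommended 1 to 10 range.

```python
import warnings

# Upper end of the recommended range documented for num_store_threads.
RECOMMENDED_MAX_THREADS = 10


def parse_num_store_threads(raw: str) -> int:
    """Hypothetical pre-deployment check for the num_store_threads value."""
    value = int(raw)  # raises ValueError for non-integer input such as "4.5"
    if value < 1:
        # Zero and negative values are not valid thread counts.
        raise ValueError(f"num_store_threads must be positive, got {value}")
    if value > RECOMMENDED_MAX_THREADS:
        # Allowed, but outside the recommended range.
        warnings.warn(
            f"num_store_threads={value} exceeds the recommended maximum "
            f"of {RECOMMENDED_MAX_THREADS}"
        )
    return value
```

For example, `parse_num_store_threads("4")` returns `4`, while `parse_num_store_threads("-2")` raises `ValueError` instead of passing the value on.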

Design approach:

Scribe creates one or more store queues for each category, based on the num_store_threads config property. The spawned writer threads periodically pull data from their respective queues and write it to their primary/secondary store.
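The queue-per-thread design can be modeled in a few lines. This is an illustrative Python sketch, not Scribe's C++ implementation: the `CategoryStore` class, the `write_fn` callback (standing in for the primary/secondary store write), and the `poll_interval` parameter are all hypothetical. Each writer thread owns exactly one queue and periodically drains it in batches.

```python
import queue
import threading


class CategoryStore:
    """Hypothetical model: one category, num_store_threads queues, one writer per queue."""

    def __init__(self, category, num_store_threads, write_fn, poll_interval=0.1):
        self.category = category
        self.queues = [queue.Queue() for _ in range(num_store_threads)]
        self.write_fn = write_fn  # stands in for the primary/secondary store write
        self.poll_interval = poll_interval
        self._stop = threading.Event()
        self.threads = [
            threading.Thread(target=self._writer_loop, args=(i,), daemon=True)
            for i in range(num_store_threads)
        ]

    def start(self):
        for t in self.threads:
            t.start()

    def _writer_loop(self, index):
        q = self.queues[index]  # each writer only ever reads its own queue
        # Keep running until stopped, then finish draining what is left.
        while not self._stop.is_set() or not q.empty():
            batch = []
            while True:
                try:
                    batch.append(q.get_nowait())
                except queue.Empty:
                    break
            if batch:
                self.write_fn(index, batch)
            else:
                # Nothing pending: sleep until the next poll (or until stop()).
                self._stop.wait(self.poll_interval)

    def stop(self):
        self._stop.set()
        for t in self.threads:
            t.join()
```

Since no two writers share a queue, the writers never contend with each other for messages; the only shared step is the producer side deciding which queue receives each message.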

Before the 0.5.1 release, data for a category was written under hdfs://<namenode>:<port>/databus/data/<category>/<hostname>/ and spooled under <file_path>/<category>/.

In the 0.5.1 release, when the configured number of threads is more than one, each thread writes to hdfs://<namenode>:<port>/databus/data/<category>/<hostname>_<threadname>/ and spools its data to <file_path>/<category>/<threadname>/.
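The directory layout rules can be expressed as two small functions. This Python sketch uses hypothetical names (`hdfs_dir`, `spool_dir`, `thread_name`) and assumes, as the wording above suggests but does not state outright, that a single-threaded category in 0.5.1 keeps the pre-0.5.1 layout. How Scribe actually names its threads is not covered here, so the caller supplies `thread_name`.

```python
from typing import Optional


def hdfs_dir(namenode: str, port: int, category: str, hostname: str,
             num_store_threads: int, thread_name: Optional[str] = None) -> str:
    """Return the HDFS directory a writer thread for `category` writes to."""
    base = f"hdfs://{namenode}:{port}/databus/data/{category}"
    if num_store_threads > 1:
        # Multi-threaded layout: one directory per (host, thread) pair.
        if thread_name is None:
            raise ValueError("thread_name is required when num_store_threads > 1")
        return f"{base}/{hostname}_{thread_name}/"
    # Assumed: a single thread keeps the per-host layout.
    return f"{base}/{hostname}/"


def spool_dir(file_path: str, category: str, num_store_threads: int,
              thread_name: Optional[str] = None) -> str:
    """Return the local spool directory for `category` (per thread if several)."""
    base = f"{file_path.rstrip('/')}/{category}"
    if num_store_threads > 1:
        if thread_name is None:
            raise ValueError("thread_name is required when num_store_threads > 1")
        return f"{base}/{thread_name}/"
    return f"{base}/"
```

For instance, `hdfs_dir("nn1", 8020, "clicks", "web01", 4, "t2")` yields `hdfs://nn1:8020/databus/data/clicks/web01_t2/`. A likely reason for the per-thread suffix is that HDFS allows only one writer per file at a time, so each thread needs its own files.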

Each Thrift server thread listens for incoming messages and writes each message to exactly one queue: the store queue with the smallest current size.
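The queue selection rule amounts to a minimum over queue sizes. In this Python sketch the helper `enqueue_to_least_loaded` is hypothetical, and "size" is assumed to mean the number of pending messages as reported by `queue.Queue.qsize()`; Scribe may instead measure queue size in bytes.

```python
import queue
from typing import Sequence


def enqueue_to_least_loaded(queues: Sequence[queue.Queue], message) -> int:
    """Put `message` on the queue with the fewest pending items; return its index."""
    # qsize() is only approximate while writer threads drain concurrently,
    # which is acceptable for load balancing. Ties go to the lowest index.
    target = min(range(len(queues)), key=lambda i: queues[i].qsize())
    queues[target].put(message)
    return target
```

With queue sizes of 2, 1 and 0, the next message goes to the third queue (index 2). Sending to the shortest queue lets a slow writer thread fall behind without stalling the others, since new traffic shifts toward the threads that are keeping up.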
