Multiple sinks #128

szareiangm · 2019-07-02T00:41:54Z

The goal is to split the sink based on a field, e.g. app_name or event_name. This way, we would read the stream once and write each batch to multiple clusters from a single read.

This also requires a small refactor of the configuration file to support the feature. It could look something like this:

{
  "splitField": "app_name",
  "sinks": [
    {
      "type": "elasticsearch",
      "mapping": {
        "app_name": ["theglobeandmail-website", "theglobeandmail-amp"]
      },
      "settings": {
        // current details under elasticsearch
      }
    },
    {
      "type": "elasticsearch",
      "mapping": {
        "app_name": ["globeadvisor-website", "globeinvestor-amp"]
      },
      "settings": {
        // current details under elasticsearch
      }
    }
  ]
}
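To make the intended routing concrete, here is a minimal Python sketch of how one batch could be partitioned using a config shaped like the one above. It is an illustration only, not the loader's implementation: `build_routes` and `split_batch` are hypothetical names, events are assumed to be flat dicts, and the choices to reject a value mapped to two sinks and to collect unmatched events under `None` are assumptions the proposal does not specify.

```python
from collections import defaultdict

# Hypothetical config mirroring the proposed layout ("settings" left empty here).
CONFIG = {
    "splitField": "app_name",
    "sinks": [
        {
            "type": "elasticsearch",
            "mapping": {"app_name": ["theglobeandmail-website", "theglobeandmail-amp"]},
            "settings": {},
        },
        {
            "type": "elasticsearch",
            "mapping": {"app_name": ["globeadvisor-website", "globeinvestor-amp"]},
            "settings": {},
        },
    ],
}


def build_routes(config):
    """Map each value of the split field to the index of the sink that accepts it."""
    field = config["splitField"]
    routes = {}
    for index, sink in enumerate(config["sinks"]):
        for value in sink["mapping"].get(field, []):
            if value in routes:
                # Assumption: a value may belong to at most one sink.
                raise ValueError(f"{value!r} is mapped to more than one sink")
            routes[value] = index
    return routes


def split_batch(events, config):
    """Partition one batch of events by sink index; unmatched events go under None."""
    field = config["splitField"]
    routes = build_routes(config)
    batches = defaultdict(list)
    for event in events:
        batches[routes.get(event.get(field))].append(event)
    return dict(batches)
```

Each key of the returned dict is an index into `sinks`, so a single read of the stream yields one sub-batch per cluster. What to do with the `None` bucket (drop, default sink, or bad rows) would still need to be decided.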

This might pave the way towards a single repository, either by merging the snowplow-s3-loader functionality into this repo or the other way around, with "type":"s3".
