Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup for analytics data storage #185

Open
martinspaeth opened this issue Apr 28, 2018 · 1 comment
Open

Setup for analytics data storage #185

martinspaeth opened this issue Apr 28, 2018 · 1 comment

Comments

@martinspaeth
Copy link

Hi thank you for your great work at citusdata! We just setup a server with an cstore_fdw installation.
The initial copy of our table (around 80 milion data rows) into the foreign table took about 20 minutes. The query performance boost with the given approach is amazing!
The data in the origin table changes very rapidly (inserts and updates). While for our day to day analysis (which we plan to use the cstore for) 24 hour old data would be fresh enough i am not sure how the best practice for setting up a analytics only table with cstore should look like.

My current thougts are:

  1. setup a replica while the master has only a cstore installed (with an empty table) the replica shall fill the data in batches every 24 hours.
    1.1. cause the initial fill of the table takes more than the max_standby_streaming_delay this seems to be kind of problematic
  2. like 1 but with triggers on the slave that would insert/update/delete records in the cstore
    2.2. is not possible since cstore does only support insert

The setup with the trigger looks like the most up to date version and would take much less resources (assuming upserting/deleting would be not that expansive as a setup from scratch.

Is there any input you can give me about best practices here or can you give me an update about the upsert/deletion functionality of cstore_fdw (discussed also in #2 #175 and last but not least in #118)

@mtuncer
Copy link
Member

mtuncer commented Apr 30, 2018

hello @martinspaeth

Once the data gets into cstore_fdw table it becomes immutable. Although there are issues calling to support update/delete, they are not planned to be done anytime soon.

I have been recommending partitioned table approach who need this functionality. It works for use cases when you only need to modify RECENT data.

  • create a master partitioned table (with no partitions)
  • insert your data into a regular staging table where you can make update/deletes.
  • when data in your staging table gets old enough, create cstore_fdw table as a partition of the master table, move old data from staging table to cstore table.

Notice that, this is strictly for use cases when you only modify recent data, and you move data to cstore_fdw table only if you are sure you won't be modifying it anymore.

This approach also let's you selectively drop some cstore_fdw partition tables to retire old data.

Keep in mind that, due to how postgres handles partition tables (as of 10), you should insert data directly into cstore_fdw partition, not to the master partitioned table. It won't work otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants