Commit

Update data storage rule to mention read only replicas as an option (#8576)

* Update data storage rule to mention read only replicas as an option

* Auto-fix Markdown files

* Update rule.md

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tiago Araújo [SSW] <[email protected]>
3 people authored May 23, 2024
1 parent 6918c20 commit eba4bba
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions rules/use-the-right-data-storage/rule.md
@@ -32,13 +32,16 @@ A database is a structured collection of data organized in a specific format, us

Data warehouses, on the other hand, are repositories that consolidate data from multiple sources into a centralized, structured format for reporting and analysis. They typically follow a dimensional model and provide a historical view of data, allowing organizations to analyze trends and make informed decisions. Data warehouses are optimized for complex queries and aggregations across large datasets. They provide a single source of truth and maintain data integrity through data cleansing and transformation processes. However, data warehouses are often designed with a predefined schema, which can make accommodating new data sources or changing business requirements more challenging.

A read-only replica of the original database is often a simple, cheap alternative to a data warehouse for small reporting applications.

## Data Lakes

Data lakes are vast repositories that store data in its raw and unprocessed form, without a predefined structure. They can store structured, semi-structured, and unstructured data, such as text files, images, or social media posts. Data lakes offer flexibility and scalability, allowing organizations to store massive amounts of data from various sources. They are suitable for exploratory analysis and data discovery since data can be transformed and processed as needed. However, data lakes can become a "data swamp" without proper governance and metadata management. The lack of structure and schema can lead to data quality issues and make it harder to extract meaningful insights without additional processing.

![](data-lake-infographic.jpg)

## Things to consider

* When building reporting solutions that read from the original data store, take care when deciding on the refresh schedule. You'll need to balance timeliness against the cost imposed on the system being read from. If the refresh is very expensive, try to run it out of hours to avoid affecting the application's users.

* For small applications, adding a read-only replica of the main database is a much simpler and more cost-effective alternative to a data warehouse. It keeps reporting queries from affecting the live database. It is typically cheaper than a data warehouse, although a data lake is often the cheapest solution infrastructure-wise.
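
The two considerations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the rule: the connection strings, the 22:00 to 06:00 off-hours window, and the function names `pick_dsn`, `in_off_hours`, and `should_refresh` are all hypothetical. It also assumes that a refresh reading from a replica puts no meaningful load on the live database, so it can run at any time.

```python
from datetime import datetime, time

# Hypothetical connection strings, used only for illustration.
PRIMARY_DSN = "postgresql://app@primary.example.internal/app"
REPLICA_DSN = "postgresql://report@replica.example.internal/app"


def pick_dsn(workload: str, has_replica: bool = True) -> str:
    """Send reporting reads to the replica when one exists; everything else to the primary."""
    if workload == "reporting" and has_replica:
        return REPLICA_DSN
    return PRIMARY_DSN


def in_off_hours(now: datetime, start: time = time(22, 0), end: time = time(6, 0)) -> bool:
    """Return True when `now` falls inside the off-hours window.

    The window may wrap past midnight (assumed default: 22:00 to 06:00).
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_refresh(now: datetime, has_replica: bool) -> bool:
    """Decide whether an expensive reporting refresh may run right now.

    Assumption: with a replica the refresh does not touch the live database,
    so it may run at any time; without one, wait for the off-hours window.
    """
    return has_replica or in_off_hours(now)
```

With no replica, `should_refresh(datetime(2024, 5, 23, 12, 0), has_replica=False)` is `False`, so the refresh would wait until the evening window; with a replica the same call returns `True`. The actual window and routing rules would depend on when the application's users are active.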
