Architecture Decision: choice of local Database (Postgres)
So far, this project has persisted data by writing directly to files: data is managed as JSON objects, and those objects are flushed to files. The project has no database yet. This issue researches the choice of database, based on the existing and upcoming feature requirements.
Requirements:
- Lightweight (simple metric: startup time when you spawn the DB Docker container)
- Atomicity (for the applied and skipped jobs use cases)
- ORM support (required as modules get more complex)
- Local RAG requirements, vector support
- High query flexibility over horizontal scaling (it's a local database)
- Data modeling flexibility (research whether we need this; the current data model fits an RDBMS, as it is structured data)
- Do we have, or are we going to get, use cases where nested JSON is required? (This is where Mongo excels.)
- Support for analytical queries
- Storage constraints (NoSQL data must be denormalized) (low priority, as storage cost is always low)
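The atomicity and analytical-query requirements above can be sketched as follows. This is only an illustration under stated assumptions: the `jobs` table, its columns, and the helper names are hypothetical, and the standard-library `sqlite3` module stands in for Postgres purely so the sketch runs without a database server. With Postgres the same transaction pattern would go through a driver or an ORM.

```python
import sqlite3

# Hypothetical schema; sqlite3 is a stand-in for Postgres so this runs locally.
SCHEMA = """
CREATE TABLE jobs (
    id      INTEGER PRIMARY KEY,
    company TEXT NOT NULL,
    title   TEXT NOT NULL,
    status  TEXT NOT NULL CHECK (status IN ('applied', 'skipped'))
);
"""


def record_jobs(conn, rows):
    """Insert all rows in one transaction; nothing is kept if any row fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO jobs (company, title, status) VALUES (?, ?, ?)",
                rows,
            )
        return True
    except sqlite3.IntegrityError:
        return False


def status_counts(conn):
    """A simple analytical query: number of jobs per status."""
    cur = conn.execute(
        "SELECT status, COUNT(*) FROM jobs GROUP BY status ORDER BY status"
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# A valid batch is committed as a whole.
ok = record_jobs(conn, [
    ("Acme", "Backend Engineer", "applied"),
    ("Globex", "Data Engineer", "skipped"),
])

# The second row violates the CHECK constraint, so the first row is rolled back too.
bad = record_jobs(conn, [
    ("Initech", "SRE", "applied"),
    ("Hooli", "ML Engineer", "unknown"),
])

counts = status_counts(conn)
```

With plain JSON files, a crash between two writes can leave the applied and skipped lists inconsistent; a transaction avoids that partial state.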
For now, the most popular NoSQL database is MongoDB, and it has pretty good support for Python (https://github.com/mongodb/mongo-python-driver).
TBD

# Alternatives Considered
TBD

# Reference
TBD
from @Surapuramakhil
MongoDB would be a good choice at least if we had nested JSON use cases (we don't even have that).
It lacks ORM support, so it is not a good choice for the data requirements of complex software modules.
Going with Postgres (to cover all RDBMS use cases).
Mongo satisfying these use cases: currently we are using JSON files, and we want to present the data in NoSQL and vector databases for better LLM responses and a RAG model.
A NoSQL database would store the raw data we collect from the internet before we process it and push it into the vector database. As we work with unstructured text data, the flexibility of a NoSQL database fits. It would integrate with the different job board APIs as a unified data warehouse. Reference: the FTI design and the LLM Twin architecture.
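The flow described in this comment (raw store first, then processing, then the vector database) can be sketched roughly as below. Everything here is an assumption made for illustration: the record fields, the `board_a`/`board_b` sources, and the in-memory lists standing in for the raw store and the vector index. The bag-of-words vector is a toy substitute for a real embedding model.

```python
import json
import math
import re
from collections import Counter


def to_vector(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Stage 1: raw store. Records are kept as the unmodified JSON a job board returned.
# (Hypothetical records; field names are assumptions.)
raw_store = [
    json.dumps({"source": "board_a", "title": "Python Backend Engineer",
                "body": "Build APIs with Postgres"}),
    json.dumps({"source": "board_b", "title": "Frontend Developer",
                "body": "React and TypeScript"}),
]

# Stage 2: process each raw record and push it into the vector index.
vector_index = []
for raw in raw_store:
    doc = json.loads(raw)
    text = f"{doc['title']} {doc['body']}"
    vector_index.append((doc["title"], to_vector(text)))


def search(query, k=1):
    """Stage 3: retrieve the k most similar job titles for a RAG prompt."""
    q = to_vector(query)
    ranked = sorted(vector_index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [title for title, _ in ranked[:k]]


top = search("python postgres")
```

Keeping the raw records untouched means the processing step can be rerun later, for example with a different embedding model, without scraping the job boards again.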