Migration to NoSQL database #383

Open
MaxFrax opened this issue Mar 3, 2020 · 0 comments
Labels: discussion, low priority, story


MaxFrax commented Mar 3, 2020

Currently, we import all the targets into a MariaDB database.
From a technical viewpoint, switching to a document database such as MongoDB would bring several advantages.
Here they are, from most to least important (in my opinion):

  1. In workflow.py we perform several joins to gather all the information for an entity in a target. With a document database, each entity could be stored as a single document, so those joins wouldn't be necessary.
  2. In workflow.py, 'extract_features' already checks whether the columns are there. The same kind of check would carry over to a document database.
  3. We tried to find a common schema among all the data sources and we failed. A document database would save a lot of the space currently spent on null fields and short words.
  4. A document database would let us store strings of variable length, fixing all the import-phase errors caused by fields that are too small.
  5. There would be more flexibility in adding data that is available only in a single data source.
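
To make points 1, 2 and 5 a bit more concrete, here is a rough sketch in plain Python, with no database involved. All table, field and function names below are made up for illustration and are not taken from workflow.py. It only shows the idea: rows that currently need a join get folded into one nested document, null fields are simply left out, and the feature step keeps whatever fields are present.

```python
# Hypothetical rows, standing in for three relational tables of one target.
entity_rows = [{"catalog_id": "a1", "born": "1950", "died": None}]
name_rows = [
    {"catalog_id": "a1", "name": "Jane Doe"},
    {"catalog_id": "a1", "name": "J. Doe"},
]
link_rows = [{"catalog_id": "a1", "url": "https://example.org/a1"}]


def to_document(entity, names, links):
    """Fold the rows a join would gather into one nested document."""
    cid = entity["catalog_id"]
    # Drop null fields instead of storing them.
    doc = {key: value for key, value in entity.items() if value is not None}
    doc["names"] = [row["name"] for row in names if row["catalog_id"] == cid]
    doc["links"] = [row["url"] for row in links if row["catalog_id"] == cid]
    return doc


def extract_available_fields(doc, wanted=("names", "links", "born", "died")):
    """Keep only the wanted fields that this document actually has."""
    return {field: doc[field] for field in wanted if field in doc}


document = to_document(entity_rows[0], name_rows, link_rows)
features = extract_available_fields(document)
```

A field that exists in only one data source would just be one more key on the documents coming from that source, and the presence check would skip it everywhere else.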

We would still need to stay consistent with the ontology mapping while importing the data, but that is nothing new.

However, the only obstacle I see is on the infrastructure side.
I wasn't able to find anything about document database hosting on Wikitech. We should probably ask them. Maybe @marfox is aware of something.

MaxFrax added the discussion and story labels on Mar 3, 2020
MaxFrax self-assigned this on Mar 3, 2020