The project uses Django REST framework to handle HTTP requests and process data, SQLite as the intermediate database, and MinIO as the permanent data storage. The app is structured as 3 Docker containers:
- Minio server.
- Django app.
- Utils.
- The app is started with `docker-compose up`:
  - The `minio` container starts the MinIO server at `localhost:9001`.
  - The `django-app` container starts the Django app at `localhost:8000`.
  - The `my-utils` container runs the `create_bucket_and_upload_data.py` file, which:
    - creates the `source` bucket in MinIO, which is used to store the data to be processed;
    - uploads the `.zip` file from the `source_data` directory inside the project. This is the initial data upload to MinIO;
    - sends a POST request to the Django app at `/data/` so that the file uploaded in the previous step is processed. POST requests are sent to the app every 55 minutes, so newly uploaded data is automatically processed within an hour.
- All data to be processed has to be uploaded to the `source` bucket in `.zip` format.
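
The periodic trigger described above can be sketched with the standard library alone. This is a minimal illustration, not the contents of `create_bucket_and_upload_data.py`: the function names are hypothetical, and the URL assumes the app is reachable at `localhost:8000` (from inside another container the host would more likely be the compose service name).

```python
import time
import urllib.request

# The /data/ path and the 55-minute interval come from the README;
# the host and all function names are assumptions for illustration.
DATA_URL = "http://localhost:8000/data/"
INTERVAL_SECONDS = 55 * 60


def build_trigger_request(url: str = DATA_URL) -> urllib.request.Request:
    """Build an empty-bodied POST request that asks the app to reprocess."""
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_processing(url: str = DATA_URL, timeout: float = 30.0) -> str:
    """Send the POST and return the response body as text."""
    with urllib.request.urlopen(build_trigger_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def run_forever(url: str = DATA_URL, interval: int = INTERVAL_SECONDS) -> None:
    """Trigger processing, then sleep, repeating until the process is stopped."""
    while True:
        try:
            trigger_processing(url)
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            print(f"POST to {url} failed: {exc}")
        time.sleep(interval)
```

Catching the error inside the loop keeps the trigger alive if the Django container is not ready yet; the next attempt simply happens one interval later.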
The Django app replies to the following requests:
- `GET` request to `localhost:8000/data/`: returns the processed data in `.json` format. It can be viewed in a human-readable format in the browser as well.
- `POST` request to `localhost:8000/data/`: manually triggers reprocessing of the data file stored in the `source` bucket. Returns the string 'updated'.
- `GET` request to `localhost:8000/stats/`: returns the average value of the births field over all objects in the queryset, the length of the queryset, and the filters used, in string format. Filters can be added as URL arguments, for example: `http://localhost:8000/stats/?is_image_exists=False&min_age=1&max_age=2`. If the filter matches no objects, the string 'Filter returned no objects' is returned.

Logins:
- The admin panel of the Django app is available at `localhost:8000/admin/`. Username: 'admin', password: 'admin'.
- MinIO can be logged in to at `localhost:9001`. Username: 'minioadmin', password: 'minioadmin'.
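
As an illustration of how the `/stats/` filters are passed, the sketch below builds the query string from keyword arguments using only the standard library. The helper names `build_stats_url` and `fetch_stats` are hypothetical, and the response is returned as raw text because the exact response format is not assumed here.

```python
import urllib.parse
import urllib.request

STATS_URL = "http://localhost:8000/stats/"


def build_stats_url(base_url: str = STATS_URL, **filters) -> str:
    """Append the given filters to the URL as query arguments."""
    if not filters:
        return base_url
    return f"{base_url}?{urllib.parse.urlencode(filters)}"


def fetch_stats(**filters) -> str:
    """GET the stats endpoint and return the response body as text."""
    with urllib.request.urlopen(build_stats_url(**filters), timeout=30) as resp:
        return resp.read().decode("utf-8")


url = build_stats_url(is_image_exists=False, min_age=1, max_age=2)
print(url)
# http://localhost:8000/stats/?is_image_exists=False&min_age=1&max_age=2
```

With the containers running, `fetch_stats(min_age=1, max_age=2)` would send the corresponding request to the app.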
1. The app stores the datetime of the last data processing.
2. The Django app connects to MinIO and checks the file in the `source` bucket. If the upload datetime of the file is later than the last processing datetime, the file is downloaded for the further steps. If not, the following steps until step 9 are skipped.
3. `minio_data/minio_template.zip` is copied to create a new `.zip` file, into which the `.zip` from the `source` bucket is downloaded.
4. All data is first stored in the `sqlite` db. This is done so that the Django ORM can be used for convenient filtering of the data and for returning it in `.json` format as the response to an HTTP request.
5. The ids of all existing instances in the `sqlite` db are collected into a list, to check whether each instance has to be updated with the data from the new file or removed because it is absent from the new file.
6. For each file in the `.zip` file, an instance of the `UserData` model is created and saved to the `sqlite` db, and its id is removed from the list from the previous step.
7. All instances whose ids are left in the list from step 5 are removed, as they are not present in the new data file.
8. All instances of `UserData` are iterated over and their data is written to `minio_data/output.csv`, which is then uploaded to the `processed-data` bucket.
9. The processing datetime is updated in the `LastUpdate` model.
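
The freshness check and the id reconciliation in the steps above can be sketched in plain Python. This is a simplified model under stated assumptions, not the app's code: a dict stands in for the `sqlite` db, records arrive already parsed as `(id, data)` pairs because the layout of the files inside the `.zip` is not described here, a set replaces the list of ids, treating a missing last-processing datetime as "process now" is an assumption, and the `age` field is only illustrative.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


def needs_processing(uploaded_at: datetime, last_processed: Optional[datetime]) -> bool:
    """Step 2: process only if the file is newer than the last run."""
    # Assumption: with no recorded run yet, the file is always processed.
    return last_processed is None or uploaded_at > last_processed


def sync_records(db: Dict[int, dict], records: Iterable[Tuple[int, dict]]) -> None:
    """Steps 5-7: create or update incoming records, then delete unseen ones."""
    stale_ids = set(db)              # step 5: ids currently stored
    for record_id, data in records:  # step 6: save each incoming record
        db[record_id] = data
        stale_ids.discard(record_id)
    for record_id in stale_ids:      # step 7: drop records absent from the new file
        del db[record_id]


def to_csv(db: Dict[int, dict], fieldnames: List[str]) -> str:
    """Step 8: serialise all stored records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", *fieldnames], lineterminator="\n")
    writer.writeheader()
    for record_id in sorted(db):
        writer.writerow({"id": record_id, **db[record_id]})
    return buffer.getvalue()


# Usage: record 1 is removed, record 2 is updated, record 3 is created.
example_db = {1: {"age": 30}, 2: {"age": 41}}
sync_records(example_db, [(2, {"age": 42}), (3, {"age": 25})])
print(sorted(example_db))  # [2, 3]
```

Starting from the full set of stored ids and discarding each one as it is seen means a single pass over the new file is enough to know which records to delete.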
HTTP responses are implemented with APIViews, models and serializers. Data for the HTTP responses is processed with the Django ORM.