Feature/vapaaehtoistyö.fi importer #456

Cap-JaTu · 2021-04-29T13:03:42Z

Adding new importer for Vapaaehtoistyö.fi volunteering tasks.

Note: Setting-variable VAPAAEHTOISTYOFI_API_KEY is required for successful access to the resource.

The first one is based on memcached and regex. The columns of interest are collected in a separate memcached store which is being populated via cron job. The query string is then parsed and transformed into a regex which is applied to the cache and retrieves the event ids. Second search is based on the Postgres full-text search functionality and hence is language specific. Separate tsvector columns are created in the db for Finnish, Swedish, and English. They are populated and kept up-to-date on the db side, see migration 0080. urls.py has a small correction of the imports, not related to the main topic of the commit.

KeywordMatcher takes as input the strings of words, mostly without language specification, and has to transform them into a list of Keywords. This commit suggests the following logic: first an exactly matching KeywordLabel is searched for irrespective of the language. If no exact matches found, Postgres full-text search is used to find a label matched by lexeme. The results are ranked with TrigramSimilarity as ft SearchRank is not suitable for ranking matched individual words. If no language is passed we cycle through all the options as specified in FULLTEXT_SEARCH_LANGUAGES and select the best match according to similarity. If no match is found the string is checked for the possibility that it could be split and search for matches is repeated.

…to make migration succeed on Travis

codecov-commenter · 2021-04-30T06:21:39Z

Codecov Report

Merging #456 (57a31f2) into master (307369c) will decrease coverage by 0.34%.
The diff coverage is 13.40%.

❗ Current head 57a31f2 differs from pull request most recent head 6bd3a28. Consider uploading reports for the commit 6bd3a28 to get more accurate results

@@            Coverage Diff             @@
##           master     #456      +/-   ##
==========================================
- Coverage   22.48%   22.13%   -0.35%     
==========================================
  Files         169      177       +8     
  Lines       11986    12480     +494     
==========================================
+ Hits         2695     2763      +68     
- Misses       9291     9717     +426

Impacted Files	Coverage Δ
...rter/helper/importers/vapaaehtoistyofi/__init__.py	`0.00% <0.00%> (ø)`
...porter/helper/importers/vapaaehtoistyofi/reader.py	`0.00% <0.00%> (ø)`
...porter/helper/importers/vapaaehtoistyofi/record.py	`0.00% <0.00%> (ø)`
events/importer/vapaaehtoistyofi.py	`0.00% <0.00%> (ø)`
events/keywords.py	`0.00% <0.00%> (ø)`
.../management/commands/populate_local_event_cache.py	`0.00% <0.00%> (ø)`
linkedevents/urls.py	`81.25% <0.00%> (-6.25%)`	⬇️
events/tests/test_event_get.py	`10.51% <8.00%> (-0.11%)`	⬇️
events/api.py	`51.62% <23.43%> (-1.92%)`	⬇️
events/tests/conftest.py	`66.66% <76.47%> (-0.59%)`	⬇️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 307369c...6bd3a28. Read the comment docs.

nikomakela

I would like to request unit tests, because they would have already been for use while reviewing. They would explain the purpose, demo the usage and proof the working.

Some comments added already to #455 which seems to be included in this PR also.

nikomakela · 2021-04-30T13:25:05Z

events/importer/helper/importers/vapaaehtoistyofi/record.py

+    def to_dict(self):
+        newrecord = {
+            "id": self.id,
+            "organization_id": self.organization_id,
+            "organization_name": self.organization_name,
+            "title": self.title,
+            "address": self.address,
+            "address_coordinates": self.address_coordinates,
+
+            "no_time": self.no_time,
+            "timestamp_start": self.timestamp_start,
+            "timestamp_end": self.timestamp_end,
+
+            "description": self.description,
+            "contact_details": self.contact_details,
+            "themes": self.themes,
+            "timestamp_publish": self.timestamp_publish,
+
+            "status": self.status,
+            "creator_id": self.creator_id,
+            "timestamp_inserted": self.timestamp_inserted,
+            "timestamp_updated": self.timestamp_updated
+        }


Is this the only place this to_dict() -method is used?

def _import_event(self, event_obj): event = event_obj.to_dict()

Doesn't __dict__ do the same?

In [50]: r1.__class__ Out[50]: events.importer.helper.importers.vapaaehtoistyofi.record.Record In [51]: r1.__dict__ Out[51]: {'id': 'id', 'organization_id': 'organization_id', 'organization_name': 'organization_name', 'title': 'title', 'address': 'address', 'address_coordinates': 'address_coordinates', 'tags': [], 'no_time': None, 'timestamp_start': None, 'timestamp_end': None, 'description': None, 'contact_details': None, 'themes': None, 'timestamp_publish': None, 'status': None, 'creator_id': None, 'timestamp_inserted': None, 'timestamp_updated': None} In [52]: len(r1.__dict__) Out[52]: 18 In [53]: len(r1.to_dict()) Out[53]: 17

Tags is missing from __dict__, but is it missing in purpose and if it is, it would be better to just delete it from the new instance done with __dict__.

nikomakela · 2021-04-30T13:35:29Z

events/importer/helper/importers/vapaaehtoistyofi/record.py

+    def __copy__(self) -> object:
+        newrecord = Record(None)
+        newrecord.id = self.id
+        newrecord.organization_id = self.organization_id
+        newrecord.organization_name = self.organization_name
+        newrecord.title = self.title
+        newrecord.address = self.address
+        newrecord.address_coordinates = self.address_coordinates
+        newrecord.tags = self.tags
+
+        newrecord.no_time = self.no_time
+        newrecord.timestamp_start = self.timestamp_start
+        newrecord.timestamp_end = self.timestamp_end
+
+        newrecord.description = self.description
+        newrecord.contact_details = self.contact_details
+        newrecord.themes = self.themes
+        newrecord.timestamp_publish = self.timestamp_publish
+
+        newrecord.status = self.status
+        newrecord.creator_id = self.creator_id
+        newrecord.timestamp_inserted = self.timestamp_inserted
+        newrecord.timestamp_updated = self.timestamp_updated


Why is this needed? Doesn't self.__copy__() already do the same?

In [50]: r1.__class__ Out[50]: events.importer.helper.importers.vapaaehtoistyofi.record.Record In [51]: r1.__dict__ Out[51]: {'id': 'id', 'organization_id': 'organization_id', 'organization_name': 'organization_name', 'title': 'title', 'address': 'address', 'address_coordinates': 'address_coordinates', 'tags': [], 'no_time': None, 'timestamp_start': None, 'timestamp_end': None, 'description': None, 'contact_details': None, 'themes': None, 'timestamp_publish': None, 'status': None, 'creator_id': None, 'timestamp_inserted': None, 'timestamp_updated': None} In [57]: r2 = r1.__copy__() In [58]: r2.__dict__ == r1.__dict__ Out[58]: True

I could have tried if there would be unit tests for this. ;)

My Python:

AttributeError: 'Record' object has no attribute '__copy__'

This one solved by dropping __copy().

nikomakela · 2021-04-30T13:50:26Z

events/importer/helper/importers/vapaaehtoistyofi/record.py

+    def __init__(self, json_dict):
+        if json_dict:


I find it better to destruct the json before initializing a new class instance or at least use another constructor method like from_json(json_object). So I would map the json values to the class instance with a json serializer. Then the class instance creation would be as simple as Record(**record_dict).

Cap-JaTu · 2021-05-03T06:53:30Z

I'll start working with the requested changes.

quyenlq

There are some changes in this PR that belongs to another feature so it's quite hard to understand the whole thing without looking at #443

quyenlq · 2021-05-07T11:43:02Z

events/importer/helper/importers/vapaaehtoistyofi/reader.py

+        return self.entries[id]
+
+    def _load_entry_api(self, id):
+        http_client = self._setup_client()


IMO It'd be better to add the requests.Session() client to the Reader class property, then set it up in the constructor function init(), that way you don't have to call the _setup_client() before making every HTTP call, just use self.client_session. Isn't it the goal of using Session 😉

Added in 6bd3a28

quyenlq · 2021-05-07T12:13:47Z

events/importer/helper/importers/vapaaehtoistyofi/reader.py

+        http_client = self._setup_client()
+        page = 1
+        total_records = None
+        batch_size = 1


I'd get rid of batch_size here, change the while loop to True, then break the loop if len(data['data']['records']) == 0. This could make this shorter and a bit easier to understand

This suggestion is heavily opinionated.
I typically tend not to use while True -stucture as it can yield a forever-loop.

quyenlq · 2021-05-07T12:15:06Z

events/importer/helper/importers/vapaaehtoistyofi/reader.py

+                               response.status_code)
+
+        data = response.json()
+        if 'status' not in data or data['status'] != "ok":


This part repeats quite many times, maybe consider moving it to a separated private function

Added check-function in 6bd3a28

quyenlq · 2021-05-07T12:19:36Z

events/importer/helper/importers/vapaaehtoistyofi/record.py

+            raise ValueError("Unknown locale '%s'!" % locale)
+        return self.TASK_URL_FORMAT % (locale, self.id)
+
+    def __copy__(self) -> object:


What is the use of this function in the PR?
For copying objects, there are a copy.copy() or copy.deepcopy() already available in python copy module

__copy__() is already dropped in 65c526d.

…port into HTTP-client.

Cap-JaTu · 2021-05-10T06:27:13Z

There are some changes in this PR that belongs to another feature so it's quite hard to understand the whole thing without looking at #443

The reason #443 files have been copied here is to get database migration to work at all.

Exactly the same is happening with unit tests, they do not work. At all. Assumption is for manage.py test to run unit tests, but instead an error from module helevents is spat out. Running manage.py test events isn't much better:

psycopg2.errors.UndefinedColumn: column munigeo_administrativedivision.name_zh_hans does not exist
LINE 1: ..._sv", "munigeo_administrativedivision"."name_en", "munigeo_a...

My only conclusion is, that codebase isn't in a shape for unit testing to work for this branch.

dankiki and others added 6 commits April 13, 2021 12:20

Initial version of Vapaaehtoistyö.fi API importer

64e71ae

Bugfix: If Vapaaehtoistyö.fi API-key is missing, don't fail.

7ad2299

Added missing migration 0079 from feature/improve_search_performance …

09461db

…to make migration succeed on Travis

Minor fixes in code to make flake8 pass

57a31f2

nikomakela reviewed Apr 30, 2021

View reviewed changes

nikomakela requested a review from dankiki April 30, 2021 13:55

quyenlq self-requested a review May 7, 2021 11:32

Simplified Record structure

76a49a2

quyenlq reviewed May 7, 2021

View reviewed changes

Cap-JaTu added 2 commits May 7, 2021 15:27

Merge branch 'master' into feature/vapaaehtoistyö.fi-importer

65c526d

Started using new endpoint for single tasks. Added proper session sup…

6bd3a28

…port into HTTP-client.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/vapaaehtoistyö.fi importer #456

Feature/vapaaehtoistyö.fi importer #456

Cap-JaTu commented Apr 29, 2021

codecov-commenter commented Apr 30, 2021 •

edited

Loading

nikomakela left a comment •

edited

Loading

nikomakela Apr 30, 2021

nikomakela Apr 30, 2021

Cap-JaTu May 7, 2021

Cap-JaTu May 7, 2021

nikomakela Apr 30, 2021

Cap-JaTu commented May 3, 2021

quyenlq left a comment

quyenlq May 7, 2021

Cap-JaTu May 7, 2021

quyenlq May 7, 2021

Cap-JaTu May 7, 2021

quyenlq May 7, 2021

Cap-JaTu May 7, 2021

quyenlq May 7, 2021

Cap-JaTu May 7, 2021

Cap-JaTu commented May 10, 2021

Feature/vapaaehtoistyö.fi importer #456

Are you sure you want to change the base?

Feature/vapaaehtoistyö.fi importer #456

Conversation

Cap-JaTu commented Apr 29, 2021

codecov-commenter commented Apr 30, 2021 • edited Loading

Codecov Report

nikomakela left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Cap-JaTu commented May 3, 2021

quyenlq left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Cap-JaTu commented May 10, 2021

codecov-commenter commented Apr 30, 2021 •

edited

Loading

nikomakela left a comment •

edited

Loading