feat: populate the table of the last known event for an email from past events #896

@vincentporte vincentporte commented Jan 28, 2025

Description

🎸 Collect User.last_login, Event, DSP, UpVote, ForumRating, anonymous and authenticated Post, and visited Notification records (#891)
🎸 Deduplicate the emails, keeping the most recent event for each
🎸 Skip emails already recorded in EmailLastSeen, on the assumption that the EmailLastSeen record is the most recent one
🎸 Insert the resulting set into EmailLastSeen
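
The four steps above can be sketched in plain Python. This is a minimal illustration, not the command's actual code: the `(email, last_seen, kind)` tuple shape, the `kind` labels, and the helper names `deduplicate_sketch` and `remove_known_sketch` are assumptions made for the example, and the final database insert is left out.

```python
from datetime import datetime


def deduplicate_sketch(events):
    """Keep only the most recent (email, last_seen, kind) tuple per email."""
    latest = {}
    for email, seen_at, kind in events:
        if email not in latest or seen_at > latest[email][1]:
            latest[email] = (email, seen_at, kind)
    return latest


def remove_known_sketch(latest, known_emails):
    """Drop emails that already have a record (assumed to be more recent)."""
    return {email: tup for email, tup in latest.items() if email not in known_emails}


# Hypothetical sample data standing in for the output of the collectors.
events = [
    ("a@example.com", datetime(2024, 1, 1), "LOGGED"),
    ("a@example.com", datetime(2024, 6, 1), "POST"),
    ("b@example.com", datetime(2023, 3, 1), "UPVOTE"),
]
known = {"b@example.com"}  # stands in for emails already in EmailLastSeen

to_insert = remove_known_sketch(deduplicate_sketch(events), known)
print(to_insert)
```

Only `a@example.com` remains to be inserted, with its most recent event.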

Type of change

🚧 technical

Points of attention

🦺 test_collect_clicked_notifs fails by design until #891 lands
🦺 prerequisites: #892 & #894

Simulation on data from January 27, 2025

$ python manage.py populate_emaillastseen
INFO 2025-02-11 11:20:09,215 ./lacommunaute/users/management/commands/populate_emaillastseen.py : starting to populate EmailLastSeen table
INFO 2025-02-11 11:20:09,215 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing users
INFO 2025-02-11 11:20:09,493 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 19353 emails
INFO 2025-02-11 11:20:09,577 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 18353
INFO 2025-02-11 11:20:09,693 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 17353
INFO 2025-02-11 11:20:09,777 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 16353
INFO 2025-02-11 11:20:09,862 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 15353
INFO 2025-02-11 11:20:09,947 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 14353
INFO 2025-02-11 11:20:10,095 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 13353
INFO 2025-02-11 11:20:10,454 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 12353
INFO 2025-02-11 11:20:10,739 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 11353
INFO 2025-02-11 11:20:10,838 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 10353
INFO 2025-02-11 11:20:10,975 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 9353
INFO 2025-02-11 11:20:11,075 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 8353
INFO 2025-02-11 11:20:11,172 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 7353
INFO 2025-02-11 11:20:11,271 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 6353
INFO 2025-02-11 11:20:11,369 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 5353
INFO 2025-02-11 11:20:11,464 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 4353
INFO 2025-02-11 11:20:11,553 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 3353
INFO 2025-02-11 11:20:11,647 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 2353
INFO 2025-02-11 11:20:11,743 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 1353
INFO 2025-02-11 11:20:11,873 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 353
INFO 2025-02-11 11:20:11,912 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:11,912 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing events
INFO 2025-02-11 11:20:11,921 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 309 emails
INFO 2025-02-11 11:20:11,937 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:11,937 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing DSP
INFO 2025-02-11 11:20:11,994 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 3087 emails
INFO 2025-02-11 11:20:12,085 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 2087
INFO 2025-02-11 11:20:12,179 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 1087
INFO 2025-02-11 11:20:12,278 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 87
INFO 2025-02-11 11:20:12,298 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:12,298 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing upvotes
INFO 2025-02-11 11:20:12,319 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 837 emails
INFO 2025-02-11 11:20:12,367 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:12,367 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing forum ratings
INFO 2025-02-11 11:20:12,380 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 236 emails
INFO 2025-02-11 11:20:12,409 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:12,409 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing posts
INFO 2025-02-11 11:20:12,562 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 4777 emails
INFO 2025-02-11 11:20:12,654 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 3777
INFO 2025-02-11 11:20:12,741 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 2777
INFO 2025-02-11 11:20:12,880 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 1777
INFO 2025-02-11 11:20:13,019 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 777
INFO 2025-02-11 11:20:13,077 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:13,077 ./lacommunaute/users/management/commands/populate_emaillastseen.py : processing clicked notifications
INFO 2025-02-11 11:20:13,082 ./lacommunaute/users/management/commands/populate_emaillastseen.py : will process 82 emails
INFO 2025-02-11 11:20:13,100 ./lacommunaute/users/management/commands/populate_emaillastseen.py : remaining: 0
INFO 2025-02-11 11:20:13,100 ./lacommunaute/users/management/commands/populate_emaillastseen.py : that's all folks!

@vincentporte vincentporte added the python Pull requests that update Python code label Jan 28, 2025
@vincentporte vincentporte self-assigned this Jan 28, 2025
Comment on lines 94 to 125
last_seen = collect_users_logged_in()
sys.stdout.write(f"users logged in: collected {len(last_seen)}\n")

last_seen += collect_event()
sys.stdout.write(f"events: collected {len(last_seen)}\n")

last_seen += collect_DSP()
sys.stdout.write(f"DSP: collected {len(last_seen)}\n")

last_seen += collect_upvote()
sys.stdout.write(f"UpVotes: collected {len(last_seen)}\n")

last_seen += collect_forum_rating()
sys.stdout.write(f"forum ratings: collected {len(last_seen)}\n")

last_seen += collect_post()
sys.stdout.write(f"posts: collected {len(last_seen)}\n")

last_seen += collect_clicked_notifs()
sys.stdout.write(f"clicked notifications: collected {len(last_seen)}\n")

dedup_last_seen_dict = deduplicate(last_seen)
sys.stdout.write(f"deduplication: {len(dedup_last_seen_dict)}\n")

dedup_last_seen_dict = remove_known_last_seen(dedup_last_seen_dict)
sys.stdout.write(f"remove known last seen: {len(dedup_last_seen_dict)}\n")

res = insert_last_seen(dedup_last_seen_dict)
sys.stdout.write(f"insert last seen: {len(res)}\n")

sys.stdout.write("that's all folks!\n")
sys.stdout.flush()

Loading everything into memory at once is heavy, but given the size of the community 🤷

I would rather have used the user identifier as the key of a dict[username, namedtuple(last_seen, kind)] and iterated over the items a few at a time.
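
One possible reading of this suggestion, as a sketch only: the `LastSeen` namedtuple, the `merge_sources` helper, and the generator-based sources below are hypothetical names chosen for illustration. Each source is consumed lazily, and only one entry per user is kept in memory.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical record type: the value stored for each user key.
LastSeen = namedtuple("LastSeen", ["last_seen", "kind"])


def merge_sources(*sources):
    """Consume each source lazily, keeping only the newest entry per key."""
    latest = {}
    for source in sources:
        for key, seen_at, kind in source:
            current = latest.get(key)
            if current is None or seen_at > current.last_seen:
                latest[key] = LastSeen(seen_at, kind)
    return latest


# Generators stand in for database iterators (assumption for the example).
def logins():
    yield ("alice", datetime(2024, 5, 1), "LOGGED")
    yield ("bob", datetime(2024, 2, 1), "LOGGED")


def posts():
    yield ("alice", datetime(2024, 3, 1), "POST")
    yield ("bob", datetime(2024, 9, 1), "POST")


merged = merge_sources(logins(), posts())
print(merged)
```

Compared with concatenating every list before deduplicating, the dictionary never holds more than one record per user, at the cost of one comparison per collected item.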


There is also the approach I used for creating the GPS follow-up groups -> I annotate the users and process them in small batches as well:
https://github.com/gip-inclusion/les-emplois/blob/master/itou/gps/management/commands/sync_follow_up_groups_and_members.py
(I mention this more for general knowledge than as a recommendation to do the same)
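
The batching idea can be illustrated with a generic helper. This is a sketch under assumptions, not the code of the linked command: the `batched` function and the batch size of 1000 (chosen to match the steps visible in the log above) are illustrative.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


# Hypothetical input: a lazy stream of 2500 emails.
emails = (f"user{i}@example.com" for i in range(2500))
sizes = [len(batch) for batch in batched(emails, 1000)]
print(sizes)  # [1000, 1000, 500]
```

Each batch could then be handed to a bulk insert, so that only one batch of objects is materialised at a time.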

@vincentporte vincentporte force-pushed the 893-enregistrer-le-dernier-evenement-pour-un-email-dans-emaillastseen branch 4 times, most recently from c3cbf23 to 5705efa Compare February 10, 2025 14:30
@vincentporte vincentporte force-pushed the 895-archivage---hydratation-de-la-table-emaillastseen-avec-les-données-existantes branch 2 times, most recently from 86b57b0 to 6766ac0 Compare February 11, 2025 09:40
@vincentporte vincentporte force-pushed the 893-enregistrer-le-dernier-evenement-pour-un-email-dans-emaillastseen branch from 5705efa to 37aa773 Compare February 11, 2025 10:16
@vincentporte vincentporte force-pushed the 895-archivage---hydratation-de-la-table-emaillastseen-avec-les-données-existantes branch from 6766ac0 to 481f8b7 Compare February 11, 2025 10:17
@vincentporte vincentporte force-pushed the 893-enregistrer-le-dernier-evenement-pour-un-email-dans-emaillastseen branch from 37aa773 to 9a3efca Compare February 11, 2025 14:35
@vincentporte vincentporte force-pushed the 893-enregistrer-le-dernier-evenement-pour-un-email-dans-emaillastseen branch from 9a3efca to 7c972f3 Compare February 11, 2025 16:39
@vincentporte vincentporte force-pushed the 893-enregistrer-le-dernier-evenement-pour-un-email-dans-emaillastseen branch from 7c972f3 to 6bd0e7d Compare February 11, 2025 17:07
@vincentporte vincentporte force-pushed the 895-archivage---hydratation-de-la-table-emaillastseen-avec-les-données-existantes branch from c7beaf8 to 399bb6c Compare February 11, 2025 17:13
tonial
tonial previously approved these changes Feb 11, 2025
lacommunaute/users/enums.py (outdated, resolved)


def keep_most_recent_tuple(last_seen):
    # Sorted by (email, date): for each email, the last tuple written to the dict is the most recent.
    return {tup[0]: tup for tup in sorted(last_seen, key=lambda tup: (tup[0], tup[1]))}

Nice way of keeping the last element of sorted 👍
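
A small worked example of why this works: in a dict comprehension, a later item with the same key overwrites the earlier one, so after sorting by (email, date) the surviving tuple is the newest. The sample rows and `kind` labels below are made up for illustration.

```python
from datetime import date


def keep_most_recent_tuple(last_seen):
    # Later tuples with the same key overwrite earlier ones in the dict.
    return {tup[0]: tup for tup in sorted(last_seen, key=lambda tup: (tup[0], tup[1]))}


# Hypothetical (email, date, kind) rows, deliberately out of order.
rows = [
    ("a@example.com", date(2024, 6, 1), "POST"),
    ("b@example.com", date(2023, 3, 1), "UPVOTE"),
    ("a@example.com", date(2024, 1, 1), "LOGGED"),
]

result = keep_most_recent_tuple(rows)
print(result["a@example.com"])
```

For `a@example.com`, the entry dated 2024-06-01 is kept and the one dated 2024-01-01 is discarded.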

Base automatically changed from 893-enregistrer-le-dernier-evenement-pour-un-email-dans-emaillastseen to master February 12, 2025 10:50
@vincentporte vincentporte dismissed tonial’s stale review February 12, 2025 10:50

The base branch was changed.

@vincentporte vincentporte merged commit 28c4cb2 into master Feb 12, 2025
6 checks passed
@vincentporte vincentporte deleted the 895-archivage---hydratation-de-la-table-emaillastseen-avec-les-données-existantes branch February 12, 2025 11:04
vincentporte pushed a commit that referenced this pull request Feb 13, 2025
🤖 I have created a release *beep* *boop*
---


## [2.21.0](v2.20.0...v2.21.0) (2025-02-13)


### Features

* create the GDPR tracking table ([#892](#892)) ([dfc378a](dfc378a))
* record the last known event for an email address ([#894](#894)) ([9aa316c](9aa316c))
* extend the permissions of team users - Documentation and Partners ([#914](#914)) ([95bea05](95bea05))
* extend the permissions of team users ([#913](#913)) ([d5add12](d5add12))
* populate the table of the last known event for an email from past events ([#896](#896)) ([28c4cb2](28c4cb2))
* hide the filter on certified answers ([a648647](a648647))
* hide the filter on certified answers ([#908](#908)) ([84e9a42](84e9a42))
* **notification:** record feedback on email notifications ([#891](#891)) ([ec67205](ec67205))
* replace an expired domain name in user data ([#907](#907)) ([fb886e1](fb886e1))
* **search:** add search links to the emplois site ([#879](#879)) ([c6ac2a3](c6ac2a3))
* delete email sending records older than 90 days ([#902](#902)) ([8abf9ef](8abf9ef))


### Bug Fixes

* type removal in `EmailLastSeen` ([#905](#905)) ([a21b502](a21b502))
* file uploads ending in error ([#899](#899)) ([d9951e1](d9951e1))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Labels
python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Archiving - populating the EmailLastSeen table with existing data
3 participants