Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5120 – Create Tags in the background #2005

Merged
merged 26 commits into from
Sep 23, 2024

Conversation

vasconsaurus
Copy link
Contributor

@vasconsaurus vasconsaurus commented Aug 27, 2024

Description

Context

While working on API/Zapier Error: The app did not respond in-time. It may or may not have completed successfully we thought tags creation could be causing the slow response.

We couldn’t pinpoint it as the cause, but it’s still a good performance improvement and should help improve the response time. So we are moving forward with it.

Implementation Notes

Sidekiq’s delayed extensions were deprecated in version 5 and removed in 7. So we are taking this opportunity to create a GenericJob, and eventually substitute the instances of .delay/.delay_for we have.

References: 5120, 4959

How has this been tested?

rails test test/models/tag_test.rb:288
rails test test/workers/generic_worker_test.rb

Things to pay attention to during code review

Please describe parts of the change that require extra attention during code review, for example:

  • File FFFF, line LL: This refactoring does this and this. Is it consistent with how it’s implemented elsewhere?
  • Etc.

Checklist

  • I have performed a self-review of my own code
  • I have added unit and feature tests, if the PR implements a new feature or otherwise would benefit from additional testing
  • I have added regression tests, if the PR fixes a bug
  • I have added logging, exception reporting, and custom tracing with any additional information required for debugging
  • I considered secure coding practices when writing this code. Any security concerns are noted above.
  • I have commented my code in hard-to-understand areas, if any
  • I have made needed changes to the README
  • My changes generate no new warnings
  • If I added a third party module, I included a rationale for doing so and followed our current guidelines

@vasconsaurus vasconsaurus changed the title WIP – 4959 – Create Tags in the background WIP – 5120 – Create Tags in the background Aug 27, 2024
@vasconsaurus
Copy link
Contributor Author

@caiosba I tried to summarize what I've been thinking and some of the issues I went through.
Could you take a look and see if this makes any sense at all?

On where I think this work could go

This is a work in progress. I thought it would be nice if we could maybe have a wrapper or something, so the usage of this worker would be similar to delay. So instead of GenericWorker.perform_in... we could call something like Tag.run_in...

Tests

I wrote two tests to try to guide this work, I think they make sense?

rails test test/models/tag_test.rb:288
rails test test/workers/generic_worker_test.rb

On Tag creation:

Contexts

  • We have different contexts: Creating tags when creating a project media, creating tags by themselves.
  • Since we are talking about this from the Zapier/Check Integration context, that means creating a tag when we create a project media.

How it works

  • If a project media has a set_tags array, it runs an after_create callback create_tags: I thought it made sense to have that callback happen in the background.

Annotations

  • This part of the codebase touches annotations.
  • I had an issue because originally create_tags sent the entire project_media object as annotator. Here:
    self.set_tags.each { |tag| Tag.create!(annotated: self, tag: tag.strip, skip_check_ability: true) } if self.set_tags.is_a?(Array)
  • But I looked through our graphql queries wiki and saw that just sending the annotator_type and annotator_id should be enough to create a Tag. Here: https://github.com/meedan/check/wiki/GraphQL-query-examples#add-a-tag-to-an-item
  • I updated create_tags to send only those two values, and it seems to work.
  • So I didn't have to touch annotations, but I do wonder if we are getting rid of them, if this change here makes sense. Or if we want to go another way about this.

Copy link
Contributor

@caiosba caiosba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vasconsaurus , the direction is correct, I left a suggestion for better encapsulation and more clear code. Please let me know if it makes sense to you.

@@ -263,6 +263,6 @@ def create_claim_description_and_fact_check
end

def create_tags
self.set_tags.each { |tag| Tag.create!(annotated: self, tag: tag.strip, skip_check_ability: true) } if self.set_tags.is_a?(Array)
self.set_tags.each { |tag| GenericWorker.perform_in(1.second, 'Tag', 'create!', annotated_type: 'ProjectMedia' , annotated_id: self.id, tag: tag.strip, skip_check_ability: true) } if self.set_tags.is_a?(Array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: I think it's going to be easier to encapsulate and to unit-test if we wrap the whole execution in a single worker run. Something like:

def create_tags
  if self.set_tags.is_a?(Array)
    GenericWorker.perform_in(1.second, 'ProjectMedia', 'create_tags_in_background', self.id, self.set_tags.to_json)
  end
end

def self.create_tags_in_background(project_media_id, tags_array_as_json)
  project_media = ProjectMedia.find(project_media_id)
  tags = JSON.parse(tags_array_as_json)
  tags.each { |tag| Tag.create! annotated: project_media, tag: tag.strip, skip_check_ability: true) }
end

Rationale:

  • For clarity, I think it's important for the signature of GenericWorker to be explicitly class/method/arguments (I see it's implicit now) - more on that in the next comment
  • Each tag in a separate background job run means background job overhead for a small task, and it can also lead to inconsistencies

Comment on lines 5 to 7
def perform(klass_name, *args)
klass = klass_name.constantize
options = args.extract_options!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for clarity: The signature could be explicitly klass_name, klass_method, method_args.

@caiosba
Copy link
Contributor

caiosba commented Aug 27, 2024

@jayjay-w @melsawy , as we talked, please support/follow @vasconsaurus here too.

@melsawy
Copy link
Contributor

melsawy commented Aug 28, 2024

@jayjay-w @melsawy , as we talked, please support/follow @vasconsaurus here too.

Sure Caio

@vasconsaurus
Copy link
Contributor Author

@melsawy @jayjay-w could you please review? (it is still a work in progress)
Please take into consideration @caiosba requests, and I added some more context in this comment

app/models/project_media.rb Outdated Show resolved Hide resolved
Copy link
Contributor

@melsawy melsawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left comments related to User.current and tests

@vasconsaurus
Copy link
Contributor Author

@melsawy @caiosba

I have made some updates based on Sawy's review, and started to try to write a wrapper method.
Currently it works, and I added a couple of new tests, but it doesn't work as a drop-in replacement.

I think we could try to implement something like this

ProjectMedia.run_later_in(time).create_tags(project_media_id: self.id, tags_json: self.set_tags.to_json, user_id: self.user_id)

But, even then, we might not or want a drop in replacement? We could pass an object to delay_for, but that is, I think, one of the reasons they removed it. So we might not want to replicate that behavior, and update our code accordingly, what to you think?

Besides that, I have other questions:

  • What do you think about this so far?
  • Are there any things you think are missing or anything to change?
  • How should I move forward?
  • On errors: My tests pass and the tags get greated but I'm seeing at least a couple errors on Sidekiq:
Screenshot 2024-09-13 at 16 15 32

Let me know if you have any questions 🙂

@@ -528,6 +528,19 @@ def add_nested_objects(ms)
ms.attributes[:requests] = requests
end

def self.create_tags(**params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to make the code more clear and more predictable, it's better to explicitly list all arguments that the method expects, instead of **params (**params is more common in meta-programming).


class ProjectMedia8Test < ActiveSupport::TestCase
def setup
super
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests will run faster if you don't run everything that super runs. I suggest you start with empty setup and teardown methods and then add only the steps you actually need:

def setup
end

def teardown
end

test/models/project_media_8_test.rb Show resolved Hide resolved
test/models/project_media_8_test.rb Show resolved Hide resolved

class GenericWorkerTest < ActiveSupport::TestCase
def setup
super
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests will run faster if you don't run everything that super runs. I suggest you start with empty setup and teardown methods and then add only the steps you actually need:

def setup
end

def teardown
end

test/workers/generic_worker_test.rb Show resolved Hide resolved
test/workers/generic_worker_test.rb Show resolved Hide resolved
@caiosba
Copy link
Contributor

caiosba commented Sep 14, 2024

Great stuff @vasconsaurus ! The error you're seeing in Sidekiq usually means that code was changed but the application was not restarted after that. Just remember to do touch tmp/restart.txt every time you chance code and so you should never see this error in Sidekiq.

I agree we should not do a drop-in replacement. Let's have our own implementation with our own signature. It will be easier to track during the migration too.

I added other comments directly to the code - nothing that changes the direction, and I also don't see anything missing. Once comments are addressed, all tests are passing and all the code is covered, I think this is ready for final review and tests. Great work!

I tried to create a general structure of the worker and two tests:
- one test just to make sure the GenericWorker works
- one test to make sure tags are created in the background

BUT:
- The tests are wonky
- The worker test passes but shouldn't
- The tag tests doesn't work because create_tags is private:
   - I think that is the place where/when we create the tags, seem we use
   it in a after_create callback. But I'm  not completelly sure, and if I'm
   right, not sure the best way to test it.
I think the tags are created by a after_create callback inside ProjectMedia.
I was having a bit of trouble testing that, so I added the creating of tags
to the sample_data method, ans tested that it worked.

Now that I know that works, I'll try moving it to the background.
- I added a new test for GenericWorker and specified which models we are
testing. I think that might be easier than a too generic test.
- I think now the tests make sense (though I'm not sure the tags test
should be in there or inside project media tests)
- Both tests trying to create tags fail, because we need to pass a
whole object to annotations, so that is an issue
I'm unsure if it's ok to change this, but looking in our GraphQL wiki we
should be able to create a Tag by sending the annotated_type and id.

If we update create_tags to send those two values instead of the entire
object, we are able to do it in the background job.
When creating tags while creating an item, we should wrap the whole
execution in a single worker run.

Each tag in a separate background job run means background job
overhead for a small task, and it can also lead to inconsistencies
What we are testing is that tags are being created when we create a
project media.

The method we use to do that is a method called by project_media,
so I thought it made more sense to move the tests.
Each tag in a separate background job run means background job overhead
for a small task, and it can also lead to inconsistencies
We should notify Sentry when we fail to create the tags.
(CheckSentry.notify also logs the error)

Added tests:
- make sure create_tags_in_background creates tags associated with the project_media
- make sure an error is not raised, but an error is sent to Sentry

Added .with_indifferent_access to make sure we are able to get the data
from the params hash.
We log the revisions for tag creation in case User.current exists
otherwise no logs for this action
- call the notify inside else, before I was always sending a sentry error
- it's better to send the project_media_id to Sentry, even if there isn't
a project media object, an id might exist
I think the method that was called 'in_background' is actually foreground
and vice-versa.
…d of **params

while doing this I found a bug:
GenericWorker worked for methods that took hashes as a parameter, but not
for methods that took standalone parameters. (not sure if standalone parameters
is the correct way to call this)
Tests will run faster if you don't run everything that super runs.
request here: #2005 (comment)
this way we make sure the team was created, the background job did what
it was supposed to.
@vasconsaurus vasconsaurus marked this pull request as ready for review September 20, 2024 14:02
@vasconsaurus vasconsaurus changed the title WIP – 5120 – Create Tags in the background 5120 – Create Tags in the background Sep 20, 2024
Copy link
Contributor

@caiosba caiosba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vasconsaurus for addressing all the comments from my last review! It's looking great! Just one last ask here: please add a test for this case:

klass.public_send(klass_method)
.

for check-api/app/workers/generic_worker.rb:15
klass.public_send(klass_method)
def perform(klass_name, klass_method, *method_args)
klass = klass_name.constantize
options = method_args.extract_options!.with_indifferent_access
if options
Copy link
Contributor

@caiosba caiosba Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasconsaurus the tests you added in your last commit are still not covering line 15 - I found out that the problem is here... options is always a hash, so if options is always true... you should replace this conditional by something like unless options.blank?, because when there are no parameters, options value is {}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caiosba I updated this, but have a question regarding tests:
The two tests I added were passing before, so they are not really working as expected. How could I fix it?

Also, is there any extra logging or error handling that I should add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Manu. No, I don't think any additional logging or error handling is needed here. But for the test, I suggest you actually assess that the method is executed. For now your test for the no-arguments case just verifies that no exception is raised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true 🤦‍♀️ still, do you think I should verify, assert anything else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true 🤦‍♀️ still, do you think I should verify, assert anything else?

In the test I think you should make sure that the method is executed... for example, Blank.create! is a class method that doesn't take any arguments and you can use assert_difference 'Blank.count' do to make sure it was executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated to use assert_difference both here and for the test in line 28.
But one last comment: even when the test wasn't hitting the else, because I was using if options, the test still succeeded. So blank was still created.

So I guess what I mean is, this test makes sure Blank is created, but it doesn't make sure of which branch we are hitting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I think that the other block was still able to handle it just fine. That said, maybe the if/else is not even needed, but it's more readable/clear to have them separate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, so I was wondering about that. I think it might be nice to have the option with just the one straightforward step, but it isn't really needed. Should we keep it or remove it (the conditional)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I said, I think it's more clear to have the conditional. Ruby Meta programming can be harder to read, otherwise :)

options is always a hash, so `if options` is always `true`.
replaced this conditional by `unless options.blank?`,
because when there are no parameters, options value is `{}`.
@caiosba
Copy link
Contributor

caiosba commented Sep 23, 2024

@melsawy , since you requested changes before, please re-review this PR.

Copy link
Contributor

@melsawy melsawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work @vasconsaurus
You can merge once apply Caio's requests and check test coverage.

@vasconsaurus vasconsaurus merged commit a04be2d into develop Sep 23, 2024
10 checks passed
@vasconsaurus vasconsaurus deleted the 4959-create-tags-in-background branch September 23, 2024 17:06
vasconsaurus added a commit that referenced this pull request Sep 25, 2024
Pretty straight forward. Sets the maximum retry count to 3.

Note
We want to add custom retry logic, but that works differently between Sidekiq 5 and 6.
So we think it is better to focus later on the Sidekiq upgrade, and until there, worker instances can handle their exceptions instead of relying on the generic worker for that.

References: 5120, #2005
PR: 2052
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants