Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partition work_orders #1392

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

rorymckinley
Copy link
Collaborator

@rorymckinley rorymckinley commented Nov 15, 2023

Notes for the reviewer

I tried to use as much of the existing functionality from platform as possible, to reduce the risk (given that we do add and drop tables). The SQL query that was part of PartitionTableService.remove_empty started misbehaving yesterday and I could not get to the bottom of it in a reasonable amount of time, so I decided to extract some of the logic into Elixir.

The new methods to support this were added as public methods - I felt that the improved granularity of testing outweighed the polluting of the public API :).

Also, after completing the above change, I realised that I had built a footgun into the code through a beginner's mistake in the elixr code that I added. This resulted in me extending the tests to ensure that they will hopefully catch my next stupid mistake.

Related issue

Fixes #

Review checklist

  • I have performed a self-review of my code
  • I have verified that all appropriate authorization policies have been implemented and tested
  • If needed, I have updated the changelog
  • Product has QA'd this feature

end

def drop_partition(parent, partition) do
unless valid_chars?(parent) && valid_chars?(partition) do
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String interpolation in queries makes me nervous :)- so I added some simple validation against the more obvious forms of injection (as I know that we can't bind params in DDL statements) just in case this method ends up being used someplace else.

Copy link

codecov bot commented Nov 15, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (402fd7c) 86.08% compared to head (8a0199c) 86.16%.

Files Patch % Lines
...b/lightning/maintenance/partition_table_service.ex 97.43% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1392      +/-   ##
==========================================
+ Coverage   86.08%   86.16%   +0.07%     
==========================================
  Files         218      220       +2     
  Lines        6452     6496      +44     
==========================================
+ Hits         5554     5597      +43     
- Misses        898      899       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rorymckinley rorymckinley marked this pull request as draft November 16, 2023 05:28
@rorymckinley rorymckinley force-pushed the 1254-partition-work-orders-2 branch 6 times, most recently from 52ce270 to 3411599 Compare November 20, 2023 09:43
@rorymckinley rorymckinley marked this pull request as ready for review November 20, 2023 09:56
end

@impl Oban.Worker
def perform(%Oban.Job{args: %{"drop_older_than" => %{"weeks" => weeks}}})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not get this method to work as originally implemented in platform so I modelled the arguments on the other perform variant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, this looks good to me though :)

end
end

defp all_relations() do
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up with a fair number of functions to manage the fixtures - so of which have overlaps. If the overlaps are resolved I am hoping that I could reduce the number of fixture methods.

However, I will need to extend (at least) the scenarios that cover :all use case as I partition the other tables - as such I was hoping to delay the cleanup until those scenarios have been catered for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats cool, if the functions end up living in this test file thats all good imo.

@@ -0,0 +1,17 @@
defmodule Lightning.AdminTools do
def generate_iso_weeks(start_date, end_date) do
Date.range(start_date, end_date)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was setting up tests for this class, I noticed that there was a small edge case that emerged when dates that are not mondays are passed in as arguments. I can't imagine that it was a big deal before, but the tweak to fix it was small, so I made an executive decision :).

Copy link
Member

@stuartc stuartc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice dude! This is awesome, I'll be wanting to test this against a backup of a real db before merging.

Are the migrations idempotent, i.e. if it borks out half way or we have to cancel it (on the unlikely case it takes too long) - will it leave behind a bunch of schemas that must be deleted before running it again? Not saying we have to cater for this right now; but I'd like to know more about the expected sad path.

end

@impl Oban.Worker
def perform(%Oban.Job{args: %{"drop_older_than" => %{"weeks" => weeks}}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, this looks good to me though :)

@impl Oban.Worker
def perform(%Oban.Job{args: %{"drop_older_than" => %{"weeks" => weeks}}})
when is_integer(weeks) do
upper_bound = Timex.shift(DateTime.utc_now(), weeks: weeks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try using DateTime.shift instead of Timex - nothing against what you've done here - Timex was the gold standard for all things Date but the built in DateTime has come a long way since then. And we have had issues with Timex in compiled environments (like without having the entire app running, like in migrations on production); if you can achieve a passing test suite by just replacing the use of Timex then we should be good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stuartc I have expunged all the uses of Timex from prod code, with one exception - I can't find an equivalent to iso_week on either Date or DateTime - if you think this is going to be problematic, I can try and handroll an equivalent?

@rorymckinley
Copy link
Collaborator Author

Nice dude! This is awesome, I'll be wanting to test this against a backup of a real db before merging.

Are the migrations idempotent, i.e. if it borks out half way or we have to cancel it (on the unlikely case it takes too long) - will it leave behind a bunch of schemas that must be deleted before running it again? Not saying we have to cater for this right now; but I'd like to know more about the expected sad path.

Thanks for all the pointers @stuartc - I tested what I thought would the worst case by inserting a sleep in the migration just before the last up execution. I then gave it a few seconds, did a Crtrl-C followed by hitting (k)ill a few times until my screen filled with stackdumps and the thing stopped. The migration cleaned up after itself and left no half-backed artefacts. I also tried an experiment where the last instruction was broken SQl and it produced the same result - so based on this potted example, I think we should be ok. That being said: I built things based on a combination of two criteria :
(1) The tables have less than ten thousand records in them.
(2) Downtime can be tolerated.

If either of those are no longer true, I am happy to revisit the implementation.

I have made the PR changes in a new commit for ease of re-review, if you ping me before you merge I can rebase it into the original commit.

@rorymckinley rorymckinley marked this pull request as draft December 4, 2023 09:22
@rorymckinley
Copy link
Collaborator Author

This PR is being parked, pending a decision on the further partitioning of tables.

Amongst other things, this branch contains additional tests and small code tweaks (when compared to the original code) that may be useful when doing date-based partitioning.

@rorymckinley
Copy link
Collaborator Author

This code needs to still be tested against a database with a significant number of records in the work_orders table to gauge performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Icebox
Development

Successfully merging this pull request may close these issues.

2 participants