-
-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move some common database import subroutines to Script::Utils
#3200
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems sensible to me, but I'd prefer if we had a test for the extra flags if they wouldn't be too complex. Although it does kinda feel maybe we can just drop the utf8 flag, but maybe we should check with @mayhem first if he remembers the reasons. The comment mentioning automodsaccepted
which I have never even seen suggests that this is ancient and the extra options might be unneeded.
Well, I'm not sure how to test those easily (I guess I'd have to construct, by hand, some broken dumps with artificial errors?), but I didn't change how they work either. If I somehow broke them, it'd only be by passing the options incorrectly (via I've never used these flags in all my time at MB, as far as I can remember. At the same time, I left them alone in the PR to avoid having to make a decision about removing them. :) Since it's unrelated to what I wanted to achieve here, which is making (Removing the draft status since this was finished when I submitted it.) |
2b06d5d
to
168972d
Compare
admin/MBImport.pl and admin/replication/ImportReplicationChanges contained very similar implementations of `ImportTable`, so it would be ideal to share them. I'd also like to use the same functionality in a future commit (to load dbmirror2 packets into temporary tables). The implementations in these two files did diverge slightly. For one, MBImport.pl's allowed fixing broken UTF-8 byte sequences. I'm not sure how necessary that is in 2024, or what the historical reasons for adding it were, but I kept the functionality behind a flag in `copy_table_from_file`. MBImport.pl's also supported the flags `$delete_first` (to empty the table before importing) and `$fProgress` (to control whether progress is shown). I've basically kept all of MBImport.pl's code, with these features behind `%opts` flags. I kept the definitions of `ImportTable`, but they now call `copy_table_from_file` internally. I couldn't replace all of the `ImportTable` calls with direct calls to `copy_table_from_file`, because `ImportTable` also updates statistics local to each file and has a different return value.
168972d
to
b810c1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mwiencek for this refactoring. It’s a shame we can’t find the reason for this code fixing UTF-8. Did you test your changes with correct dumps still?
Would it be reasonable to add support for setting these unused options such as show_progress
from the command line?
Problem
We have some identical or very similar-looking subroutines duplicated between admin/MBImport.pl and admin/replication/ImportReplicationChanges:
empty
(to tell if a table is empty) andImportTable
(to import a table from a file). The two implementations of the latter function differ slightly.For the incremental dumps (
MusicBrainz::Script::Role::IncrementalDump
), I'd like to reuse theImportTable
function to import a dbmirror2 packet into some temporary tables (to parse it).We currently parse the old packets by hand; I'm surprised it works at all (it probably doesn't in some cases) because I don't believe it fully or properly implement's parsing of PostgreSQL's
COPY
text
format. dbmirror2 packets contain JSON, which may contain JSON escape sequences underneathCOPY
'stext
format escape sequences, and the mixing of these can be tricky to parse. It's much, much easier to let PostgreSQL do the parsing!Solution
This just adds two shared subroutines,
is_table_empty
andcopy_table_from_file
toScript::Utils
, and modifies admin/MBImport.pl and admin/replication/ImportReplicationChanges to use them.Testing
We have existing automated tests that make heavy use of these scripts.
I did not test the
--fix-broken-utf8
or--ignore-errors
flags specifically.