-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration tests by compiling and calling Ghostferry from Ruby #64
Conversation
558c297
to
080c76a
Compare
080c76a
to
78bdc7d
Compare
78bdc7d
to
6198812
Compare
68f4f0a
to
23a602d
Compare
a456cd5
to
2b25a1d
Compare
3f45878
to
e4c88ce
Compare
e4c88ce
to
2ef8987
Compare
It would be great if I can get some reviews on this PR. It's a fairly independent piece that we'll need regardless what approach we take to implement the rest of #75 as we'll need a mechanism to test. At least half of this PR is reasonably trivial: direct translation from Go (DataWriter, TrivialIntegrationTest, TestCase), glue code to work with database (DbManager). The other half is a simple implementation of a Ghostferry with unix socket for status reporting. For future note: the integrationferry.Ferry.Main could be a method we'll eventually port back into ghostferry.Ferry, to simplify how Ghostferry application can be run as per an idea @pushrax had sometime ago. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with the overall approach of running integration tests using a separate (ruby) process. Questioning the necessity of the socket communication and being event-driven as it seems to introduce a good amount of complexity, but apart from that this looks good to me. I can't speak to the conventions or idioms of ruby testing as much as others, so I'll defer that to the other reviewers.
StatusDone string = "DONE" | ||
|
||
// Could be sent by multiple goroutines in parallel | ||
StatusBeforeRowCopy string = "BEFORE_ROW_COPY" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all of these statuses actually needed and used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They will be useful. For example, if we want to terminate Ghostferry in a test after copy has begun (in a later PR):
@ghostferry.on_status(Ghostferry::Status::AFTER_ROW_COPY) do
batches_written += 1
if batches_written >= 2
@ghostferry.send_signal("TERM")
end
end
Furthermore it is the intention to eventually port all existing Ghostferry integration tests into this framework. Lots of those Go tests uses the same statuses.
*ghostferry.Ferry | ||
} | ||
|
||
// ========================================= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work setting all of this up. It clearly works and does what it needs to do. However, I'm not sure we really need to setup and manage a socket to communicate between ghostferry and the ruby master process. Could the same thing not have been achieved using signals (SIGSTOP
, SIGCONTv
, and SIGUSR1/2
), ENV vars or even files? This just seems like significant overhead and complexity that could be simplified using other methods.
Also, I'm not sure we really need to make it event-driven (do we really need to know when each row is written?) as much as simply communicating overall state, which I again feel could be communicated using the other methods mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't understand the need for this custom communication protocol.
Another option is to use the already existing HTTPCallback
facility, similar to how the sharding
package uses it in sharding/testhelpers/unit_test_suite.go
and how the shop mover integration works (see ghostferry.rb
in core).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on slack, this ruby testing setup/structure is super weird to me. Can we follow some existing patterns instead of hacking together a bunch of binaries? I understand it works but please let's refactor to something that our colleagues can understand quickly.
What I expect:
bundle exec rake test
in the root of the repo runs all ruby tests (seeRakefile
in nginx-routing-modules)require 'test_helper'
instead ofrequire 'ghostferry_integration'
- All ruby test files follow the pattern
*_test.rb
and live undertest/integration
ortest/unit
- Avoid implicit assumptions in tests such as
@dbs
and@ghostferry
. It's fine to have them in one file (the base test case) but not all over the rest of the test cases.
I can provide a more detailed review when some of these concerns are taken care of
By the way, you may want to check out and possibly port some of the shop mover tests from core |
15c373e
to
3901412
Compare
Makes room to make the tests more like ruby tests, which will be employed heavily.
Instead of running integration tests directly in Go, we run it in Ruby by calling Ghostferry as a subprocess. The idea is that we will be able to test the interrupt/resume work more easily as well as making integration tests easier to write. In order to allow the ruby tests to inject race conditions for testing purposes, we need a way for the ruby code to pause and restart the execution of Go code at strategic locations. This is done with a local Unix socket. At locations of interests, the Go code will send some string via the Unix socket to the Ruby code. The ruby code will perform some sort of callback (such as injecting data, or locking a row) and then send a command back to the Go code via the same Unix socket, allowing it to continue executing. Note that each time we send some status to the ruby server, we do it in a new connection. This is to avoid race conditions if one shared connection was used between all the goroutines we employ inside Ghostferry. Stdout and stderr are captured and they can be asserted against/examined, more helper methods can be created later to give better structures for the stdout/stderr output such as being able to recognize a panic, or recognize particular stages of execution from within the logs. Lastly, the intention is to replace all integration tests with this framework in the future as it would likely be more robust and cause less issues such as goroutine leaks during the go test run. However, for the time being we will have both integration tests in Go and Ruby.
The DataWriterHelper is a straight port from data_writer.go, which is a multithreaded data writer that writes random data to the database. The DbHelper helps connects to the database, seed it, and verify that the source matches the target so that tests don't have to implement it over and over again.
Added some trivial integration tests and some interrupt/resume tests. Note the interrupt and resume works on master as of this commit without reconciler because the schema didn't change.
3901412
to
f1465a7
Compare
0e2a271
to
bd60675
Compare
I've updated this PR to follow more of a Ruby convention:
|
end | ||
end | ||
|
||
class DataWriter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much of this is translated directly from data_writer.go. The goal is eventually we can use Ruby tests only.
[source_metrics, target_metrics] | ||
end | ||
|
||
def table_metric(conn, table, sample_id: nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big fan of this name, any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So much better 👍 I found the Ghostferry
helper very hard to read still but that can be massaged as we go along maybe.
Looks good to go functionally though!
assert_equal( | ||
source[DEFAULT_FULL_TABLE_NAME][:sample_row], | ||
target[DEFAULT_FULL_TABLE_NAME][:sample_row], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the checksum assertion sufficient? I'm having trouble understanding where you're going with the sample rows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The checksum is likely sufficient and the row count (especially since you can have a checksum of 0
if a table is empty), I'm just being defensive. I'll remove this line.
refute dumped_state["LastSuccessfulPrimaryKeys"].nil? | ||
refute dumped_state["CompletedTables"].nil? | ||
refute dumped_state["LastWrittenBinlogPosition"].nil? | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit odd to have as a global helper and could easily go out of date. You could make it a private method where it's being used right now for the first concern. For the going out of date... maybe we need to write some negative tests? i.e. "if i remove field X, then we can't resume correctly?" or something like that?
test/lib/go/minimal.go
Outdated
if err != nil { | ||
panic(err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe keep IntegrationFerry
in the same file for now? Or did you have a purpose in mind for having a separate IntegrationFerry
abstraction ?
test/helpers/ghostferry_helper.rb
Outdated
end | ||
|
||
# When using this method, you need to call it within the block of | ||
# GhostferryIntegration::TestCase#with_state_cleanup to ensure that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with_state_cleanup
doesn't exist?
end | ||
|
||
dumped_state = ghostferry.run_expecting_interrupt | ||
assert_basic_fields_exist_in_dumped_state(dumped_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we assert more than that at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to assert the state of the database? It's difficult to know exactly what's going on due to the semi-randomness of the data writer and binlog streamer.
I'll try to add a test without the datawriter where we can better check the state of the target database.
|
||
start_datawriter_with_ghostferry(datawriter, ghostferry) | ||
stop_datawriter_during_cutover(datawriter, ghostferry) | ||
ghostferry.run(dumped_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_test_table_is_identical
?
|
||
class InterruptResumeTest < GhostferryTestCase | ||
def test_interrupt_resume_with_writes_to_source | ||
seed_simple_database_with_single_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def setup
?
// These should be kept in sync with ghostferry.rb | ||
portEnvName string = "GHOSTFERRY_INTEGRATION_PORT" | ||
timeout time.Duration = 30 * time.Second | ||
maxMessageSize int = 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unused
test/helpers/ghostferry_helper.rb
Outdated
|
||
# Keep these in sync with integrationferry.go | ||
ENV_KEY_PORT = "GHOSTFERRY_INTEGRATION_PORT" | ||
MAX_MESSAGE_SIZE = 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused as well
StatusRowCopyCompleted string = "ROW_COPY_COMPLETED" | ||
StatusDone string = "DONE" | ||
|
||
// Could be sent by multiple goroutines in parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this distinction matter?
d3b3280
to
652fdfd
Compare
I've addressed all the feedback as well as slightly reorganized the |
652fdfd
to
b8fb91a
Compare
Instead of running integration tests directly in Go, this PR enables us to run integration tests by calling Ghostferry from Ruby as a subprocess. This is needed because it is not possible for the the Go code to interrupt and resume itself.
Given that the Ruby integration test code is also much cleaner than the Go equivalent and suffer less issues such as goroutine leakage, we should prefer writing integration tests in Ruby from now on.
The Ghostferry subprocess right now is a separate binary from copydb/sharding. This decision was made because the tests needs to perform some actions (such as interrupting Ghostferry or injecting some data) in the middle of the run at strategic and reproducible times(1) and the base Ghostferry framework doesn't support it for now. In order to accomplish this, bidirectional communication needs to exists between the Ruby and Go code and this is done via a WEBrick::HTTPServer on the local machine.
Using this server, Ghostferry can send a message and pause until a reply is sent from the server. The pause allows race injection and other test code to execute.
All of this is wrapped up in the class
GhostferryHelper::Ghostferry
. To use it in tests, one can extend fromGhostferryTestCase
, where other helpers like the database connection, a parallel data writer, also exists.A test will look something like the following: