Investigate deterministic QEMU behavior #307

rafalcieslak · 2017-05-13T14:01:55Z

I've stumbled upon:

http://wiki.qemu.org/index.php/Features/record-replay

Summary:

Deterministically replays whole system execution and all contents of
the memory, state of the hardware devices, clocks, and screen of the VM.

Writes an execution log to a file, which can later be replayed multiple times
on different machines.

Performs deterministic replay of all operations with keyboard and mouse input devices.

This feature is very interesting for us! Chances are it could help us drop OVPsim entirely. It does need a closer look first. Some questions that need answering:

1. Does it support MIPS? Some sources say it does, some say it does not.
2. How well does it work with GDB? Do we get cycle-exact behavior on each replay?
3. How large are the replay files, in practice? How can we download such a replay file from Travis to investigate a test failure? Is it okay to replay with a different qemu version than the one used for recording?
4. What would be a convenient way of integrating replays with our workflow?
5. Is record-replay everything we need for testing and debugging, or do we use OVPsim's determinism in some other ways this feature would not provide?

Additional documentation can be found here:

https://github.com/qemu/qemu/blob/master/docs/replay.txt

@cahirwpz: Suppose the answer to all questions above is "Perfect for our needs". Would you then consider dropping support for OVPsim as a viable option? If so, then this task is probably high priority.

EDIT: Also:

6. Do we even need recording? Using -icount shift=N might be enough to enable deterministic behavior.
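Question 6 in practice comes down to which `-icount` suboptions get passed to QEMU. Below is a minimal sketch in Python, assuming the suboption names `shift`, `sleep`, `rr` and `rrfile` as described in the QEMU record/replay documentation linked above. The helper name `icount_args` is hypothetical and not part of our scripts.

```python
from typing import List, Optional


def icount_args(shift: int = 7,
                sleep: Optional[bool] = None,
                rr_mode: Optional[str] = None,
                rrfile: str = "replay.bin") -> List[str]:
    """Build a QEMU -icount option list.

    shift=N: the virtual CPU executes one instruction every 2**N ns of
             virtual time, decoupling guest time from the host clock.
    sleep:   None leaves QEMU's default; False adds "sleep=off".
    rr_mode: None for plain icount, or "record"/"replay" to enable
             record/replay with the execution log stored in rrfile.
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    opts = [f"shift={shift}"]
    if sleep is not None:
        opts.append("sleep=on" if sleep else "sleep=off")
    if rr_mode is not None:
        if rr_mode not in ("record", "replay"):
            raise ValueError(f"unknown rr mode: {rr_mode!r}")
        opts += [f"rr={rr_mode}", f"rrfile={rrfile}"]
    return ["-icount", ",".join(opts)]


# Plain deterministic run (question 6): no replay file is written.
print(icount_args(sleep=False))       # ['-icount', 'shift=7,sleep=off']
# Recording a run for later replay.
print(icount_args(rr_mode="record"))  # ['-icount', 'shift=7,rr=record,rrfile=replay.bin']
```

The returned list would simply be appended to whatever QEMU command line ./launch builds. Keeping "icount only" and "record/replay" behind one switch makes it cheap to try question 6 first and add recording later if it turns out to be needed.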


rafalcieslak · 2017-05-13T14:42:54Z

How large are the replay files, in practice? How can we download such a replay file from Travis to investigate a test failure? Is it okay to replay with a different qemu version than the one used for recording?

Well, the replay files aren't small. Running test=all with -icount shift=7 produces a 14MB file, which then grows by about 5MB/second. Running test=all repeat=5 produces a 40MB file.

The replays are qemu-version sensitive, so if Travis provided us with a recording of a test failure, we would need to debug it using exactly the same qemu version as the one installed on Travis. That's not a big problem, though: we can install multiple qemu versions locally, or deploy a particular newer qemu version onto Travis just as we do with the toolchain.
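Since a replay is only valid for the QEMU version that recorded it, one option is to store the output of `qemu-system-mipsel --version` next to each replay file and check it before replaying. A minimal sketch in Python, assuming the version banner starts with "QEMU emulator version X.Y.Z"; the sidecar-file convention and both helper names are hypothetical.

```python
import re
from typing import Tuple


def parse_qemu_version(version_output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, micro) from `qemu-system-* --version` output."""
    m = re.search(r"QEMU emulator version (\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if m is None:
        raise ValueError("not a QEMU version string")
    major, minor, micro = m.groups()
    # A missing micro component (e.g. "2.8") is treated as 0.
    return int(major), int(minor), int(micro or 0)


def check_replay_compatible(recorded_output: str, local_output: str) -> None:
    """Refuse to replay a log recorded by a different QEMU version."""
    recorded = parse_qemu_version(recorded_output)
    local = parse_qemu_version(local_output)
    if recorded != local:
        raise RuntimeError(
            "replay recorded with QEMU %d.%d.%d, but local QEMU is %d.%d.%d"
            % (recorded + local))
```

Comparing the full (major, minor, micro) triple is deliberately strict: it would reject a replay recorded on Travis' QEMU 2.0 when replayed locally on 2.8.1, which is exactly the mismatch we would otherwise only notice as a confusing replay failure.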

The problem might be with getting these files back from Travis. 10MB is far too much to push via raw output. Travis supports uploading result files to S3, but we would need to pay for AWS storage. Maybe it would be possible to have Travis upload results to the mimiker server, but we'd need to figure out a way of doing it securely so that nobody else can push junk onto the server.

rafalcieslak · 2017-05-13T15:00:35Z

Does it support MIPS? Some sources say it does, some say it does not.

It seems to... kinda. With -icount shift=7,sleep=off I seem to receive timer interrupts at exactly the same instruction every time. I'll need to test this in more detail (e.g. prepare a non-deterministically failing test and see whether the ktest seed is enough to reproduce it 100% of the time), but initial observations make me very hopeful!
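The experiment proposed above (is a ktest seed alone enough to reproduce a failure every time?) could be automated by running each seed several times and comparing the complete console output. A sketch in Python, assuming the guest's output is a good enough fingerprint of a run; `is_deterministic`, `flaky_seeds` and `command_runner` are hypothetical helpers, and the `seed={seed}` argument syntax for ./launch is an assumption.

```python
import subprocess
from typing import Callable, Iterable, List

# A runner takes a ktest seed and returns the captured output of one run.
Runner = Callable[[int], str]


def is_deterministic(run: Runner, seed: int, repeats: int = 5) -> bool:
    """Run the same seed several times; True if every run gave identical output."""
    if repeats < 2:
        raise ValueError("need at least two runs to compare")
    first = run(seed)
    return all(run(seed) == first for _ in range(repeats - 1))


def flaky_seeds(run: Runner, seeds: Iterable[int], repeats: int = 5) -> List[int]:
    """Seeds whose runs diverge, i.e. where the seed alone does not reproduce a run."""
    return [s for s in seeds if not is_deterministic(run, s, repeats)]


def command_runner(template: List[str], timeout: float = 120.0) -> Runner:
    """Turn a command template such as ["./launch", "seed={seed}"] into a Runner.

    The template syntax is hypothetical; adapt it to however ./launch takes
    the seed. A run that hangs raises subprocess.TimeoutExpired.
    """
    def run(seed: int) -> str:
        cmd = [arg.format(seed=seed) for arg in template]
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True, timeout=timeout)
        return result.stdout
    return run
```

With icount enabled, something like `flaky_seeds(command_runner([...]), range(100))` should ideally return an empty list; any seed it reports is one where the seed alone is not enough to reproduce the run.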

There is a problem with replaying though. It seems to be supported, but something's off and the kernel gets stuck in initrd_build_tree_and_names, looping forever. If that's a bug on our side, then it should also emerge during a recording. But maybe qemu provides the initrd somehow differently for a replay run?

cahirwpz · 2017-05-13T16:03:16Z

That's an awesome finding you've made!

@cahirwpz: Suppose the answer to all questions above is "Perfect for our needs". Would you then consider dropping support for OVPsim as a viable option? If so, then this task is probably high priority.

Knowing how deeply device emulation is broken in OVPsim - more than happily ;-)

cahirwpz · 2017-05-18T11:40:59Z

For the record, the decision of dropping OVPsim will automatically render issues #293 and #286 obsolete.

rafalcieslak · 2017-05-27T13:23:23Z

While debugging #328 I ran into a lot of random synchronization bugs that I could not reproduce. Out of curiosity, I added -icount shift=7 to the QEMU options, and now each test seed causes the kernel to crash in an identical manner! Running ./launch with the -d option allows me to immediately debug a repro case just by passing the seed.

I'd say we can safely enable -icount shift=7 in master (even if we don't want to support recording/playback ATM) and see if and how it helps us. It has an unnoticeable performance penalty, and it is very likely to help us reproduce Travis problems. Then, after some time, we'll examine whether this setup is right for us and how to proceed with OVPsim. What do you think?

cahirwpz · 2017-05-27T13:34:31Z

I'm in favour of enabling the flag in master. Please do so!

rafalcieslak · 2017-05-27T13:44:17Z

There is one detail that worries me, though. Making QEMU deterministic makes it 1000x less likely to trigger synchronization bugs. See #328: as soon as I enabled icount in a recent commit, the Travis build stopped failing!

This is related to why we had such a hard time finding bugs on OVPsim: it's because of its cycle-exact nature.

I don't know whether the reduced chance of sync problems is a bad thing (it will cause us to detect fewer problems) or actually a benefit (QEMU's default chaotic execution is unrepresentative of hardware, and it triggers buggy scenarios that would never happen on a real machine). I don't want to diminish Travis' bug-finding capability by enabling icount, but maybe what it "fixes" is stuff we shouldn't pay attention to anyway?

Another observation is that Travis uses QEMU 2.0. When testing #328 locally with the same version of QEMU, I observed no problems. However, after switching to the recent QEMU 2.8.1, the run_tests.py script was able to find (100% reproducible!) problems. Therefore, if we decide to enable icount, we'll need to find a way to get a newer QEMU on Travis.

rafalcieslak added the research label May 13, 2017

rafalcieslak self-assigned this May 13, 2017

rafalcieslak added the important label May 13, 2017

cahirwpz added this to the Summer 2017 milestone May 31, 2017

cahirwpz removed this from the Summer 2017 milestone Dec 20, 2018

cahirwpz removed the important label Feb 14, 2019
