Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible file descriptor race condition when running kirigami with many (>500) partitions #78

Open
alberdingk-thijm opened this issue Jan 15, 2022 · 1 comment
Labels
bug Something isn't working kirigami Pertains to verification via partitioning P-high High priority

Comments

@alberdingk-thijm
Copy link
Collaborator

For some examples, particularly those in benchmarks/SinglePrefix/Random and examples/TopologyZoo/kdl/, NV will randomly fail to parse an SMT reply if the input has a very large number of partitions it creates. The output looks something like the following:

...
Z3 stats:
  :memory       21.09
 :time         0.01)
Slice 0 took: 0.213472 secs to complete
Encoding network took: 0.021362 secs to complete
Compiling query took: 0.005419 secs to complete

error: Solver error: (error "line 5669 column 41: Invalid constant declaration: unknown sort 'Boo'")
Fatal error: exception Nv_lang.Console.Error("Solver error: (error \"line 5669 column 41: Invalid constant declaration: unknown sort 'Boo'\")")
Raised at Nv_lang__Console.error in file "src/lib/lang/Console.ml", line 22, characters 2-19
Called from Nv_smt__SmtUtils.parse_reply in file "src/lib/smt/SmtUtils.ml", line 638, characters 9-45
Called from Nv_smt__SmtKirigami.solve.solve_aux.get_ret in file "src/lib/smt/SmtKirigami.ml", line 42, characters 18-39
Called from Nv_smt__SmtKirigami.solve.solve_aux in file "src/lib/smt/SmtKirigami.ml", line 62, characters 9-51
Called from Nv__Main_defs.run_smt_classic_aux.get_answer in file "src/lib/Main_defs.ml", line 57, characters 10-74
Called from Nv_utils__Profile.time_profile_absolute in file "src/lib/utils/Profile.ml", line 11, characters 12-16
Called from Nv__Main_defs.run_smt_classic_aux.solve_slices in file "src/lib/Main_defs.ml", line 79, characters 8-169
Called from BatList.map.loop in file "src/batList.mlv", line 237, characters 28-33
Called from BatList.map in file "src/batList.mlv", line 240, characters 4-12
Called from Main.main_func in file "src/exe/Main.ml", line 24, characters 9-45
Called from Nv_utils__Profile.time_profile_absolute in file "src/lib/utils/Profile.ml", line 11, characters 12-16
Called from Main.main in file "src/exe/Main.ml", line 35, characters 11-76

Note the error line: it seems as though Z3 has been given part of an SMT query but the query was cut off. Looking at the code for SolverUtil.ml, which manages the creation of Z3 processes, we can see that it's doing some work setting up a pipe and creating a process to run Z3, which happens each time we call Smt.solver. Normally, this only happens once, but the kirigami code will happily call Smt.solver as many times as there are partitions, which can of course be hundreds or thousands of times.

We don't seem to report any Unix errors when doing this, which is a problem, as it makes it hard to see why this function might fail. That being said, given that I am seeing this error only when using many partitions, and not all the time (sometimes NV completes without any issues), I'm guessing we're seeing a race condition where the file descriptor of one process gets closed by accident, whether because we've hit the maximum number of file descriptors or because it was reassigned and closed by accident. I don't know much about how OCaml and Batteries' Unix library works, but I'll start off by trying to get more error reporting to see if this might indeed be the source of the bug.

@alberdingk-thijm alberdingk-thijm added bug Something isn't working P-high High priority kirigami Pertains to verification via partitioning labels Jan 15, 2022
@alberdingk-thijm
Copy link
Collaborator Author

Looks like this is related to #15, and may need to be addressed by closing descriptors as we go, as mentioned back around this commit.
@DKLoehr do we need to re-add code somewhere to call close_solver?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working kirigami Pertains to verification via partitioning P-high High priority
Projects
None yet
Development

No branches or pull requests

1 participant