Add support for interactive problems #261

Holmes98 · 2024-01-28T00:53:15Z

Depends on #260. Based on the Communication task type from CMS.

This adds an interactive_processes field to the Evaluator class.

If interactive_processes=0 (default), functionality is unchanged (evaluator runs after the user/submission program).

If interactive_processes>0, we concurrently spawn that many user processes, each connected using pipes to the manager process. After all processes exit, execution stats are merged (time/memory is added, wall time is maximised, first failing status is used).

Unlike normal evaluators, interactive evaluators (which I'm referring to as the "manager" to match CMS's terminology) must always terminate gracefully, otherwise it is judged as an evaluator error (see here for relevant discussion).
- Need to handle SIGPIPE when writing to the user pipes, in case they crashed/exited early, etc.
- It is possible for the processes to deadlock if both attempt to read input at the same time (e.g. on forgetting to flush output, or just user error, example); in this case the user process will be terminated first by isolate on reaching the wall time limit, then the manager should read EOF.
There is currently some extra handling to make MLE/TLE report correctly after merging stats; it feels a bit hacky but it seemed like the easiest approach (happy to take suggestions though).
If interactive_processes>1, the process number is sent as a command line argument to each user program, but it can also be sent by the manager if preferred.
I'm using file descriptors 5+ for the user pipes, fd 4 for "expected output", and stdin for the case input. fd 3 is currently unused; it can be considered "reserved" in case we want to add more functionality later (e.g. for "actual output" that can be shown to the user for sample cases, or for staff to debug). I'm also considering swapping file descriptors stdin/3 for non-interactive evaluators (currently 3 => case input, stdin => actual output) so they match, but have left it for now (will require manually updating existing evaluators).
interactive_processes is currently limited to 2 to avoid exhausting server resources.

Notes on resource limits:

According to CMS, the manager's wall time limit should be greater than the sum of wall time limits of all user processes, and the manager's time limit should be greater than the sum of time limits of all user processes.
Each user process is allowed to execute for the total time limit, as the constraint on total time can only be enforced after all processes have terminated.
The wall time limit for user processes is lowered to match CMS. This reduces the time taken to judge submissions in the deadlock scenario mentioned above.

Holmes98 · 2024-01-28T00:54:07Z

I've marked this as a draft but am looking for feedback if anyone is able to review.

coveralls · 2024-01-28T00:55:43Z

coverage: 38.75% (+1.2%) from 37.523%
when pulling 5894a2b on interactive-problems
into 51571c5 on master.

BelgianSalamander · 2024-05-23T13:17:18Z

I have tested this on IOI 2018 Combo and IOI 2020 Stations. No glaring issues so far. I will try to review soon.

BelgianSalamander

This seems mostly correct overall.

I like the idea of fd 3 being used for "actual output" as for some problems it definitely makes sense to show some extra feedback to users on sample cases (as currently the only feedback is the testcase outcome).

While testing this with the IOI problems, I found it difficult to debug evaluators and submissions as it can usually be unclear if the problem lies in the evaluator or the submission. I feel like it could be useful to have some way of showing the stderr of the user processes to staff to make it easier when making problems as otherwise there is no way to get any info directly from the user processes.

BelgianSalamander · 2024-05-24T01:51:54Z

app/workers/judge_submission_worker.rb

+        r['meta']['cg-mem'] = [r['meta']['cg-mem'],memory_limit*1024].min if r['meta'].has_key?('cg-mem')
+        r['meta']['max-rss'] = [r['meta']['max-rss'],memory_limit*1024].min


One small issue: The stored memory usage should probably get capped like this regardless of whether the previous status is OK. (r['meta']['status'] == 'OK' on line 358).

Otherwise, in some cases the memory usage displayed on a submission's page may be much higher than the memory limit, since this code won't be run if one of the program instances causes some other error (RE, TLE), on top of exceeding the combined memory limit.

Holmes98 force-pushed the interactive-problems branch 2 times, most recently from 49f169f to 3fdecfc Compare January 28, 2024 05:48

Base automatically changed from compiled-evaluators to master May 19, 2024 06:44

Holmes98 added 2 commits May 19, 2024 22:31

Add support for interactive problems

8756266

Add tests for interactive problems

5894a2b

Holmes98 force-pushed the interactive-problems branch from 3fdecfc to 5894a2b Compare May 19, 2024 10:32

BelgianSalamander reviewed May 24, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for interactive problems #261

Add support for interactive problems #261

Holmes98 commented Jan 28, 2024 •

edited

Loading

Holmes98 commented Jan 28, 2024

coveralls commented Jan 28, 2024 •

edited

Loading

BelgianSalamander commented May 23, 2024

BelgianSalamander left a comment

BelgianSalamander May 24, 2024

		r['meta']['cg-mem'] = [r['meta']['cg-mem'],memory_limit*1024].min if r['meta'].has_key?('cg-mem')
		r['meta']['max-rss'] = [r['meta']['max-rss'],memory_limit*1024].min

Add support for interactive problems #261

Are you sure you want to change the base?

Add support for interactive problems #261

Conversation

Holmes98 commented Jan 28, 2024 • edited Loading

Holmes98 commented Jan 28, 2024

coveralls commented Jan 28, 2024 • edited Loading

BelgianSalamander commented May 23, 2024

BelgianSalamander left a comment

Choose a reason for hiding this comment

BelgianSalamander May 24, 2024

Choose a reason for hiding this comment

Holmes98 commented Jan 28, 2024 •

edited

Loading

coveralls commented Jan 28, 2024 •

edited

Loading