
location_report_builder getting stuck on get_filedir_count() #417

Open
marxjohnson opened this issue Apr 23, 2021 · 6 comments

Comments

@marxjohnson
Contributor

We have noticed that our cron is running lots of instances of the generate_status_report scheduled task that never seem to complete.

Doing some digging, I have found that location_report_builder reaches the stage where it runs $filesystem->get_filedir_count(), which runs the following shell command: find /srv/learn2syst.open.ac.uk/www/moodledata/filedir -type f | grep -c /
For some reason, the task hangs at this point and never completes. There is no error output. Our container running the cron script remains active, so it evidently doesn't consider the cron run complete.

Stranger still, Moodle does seem to think the scheduled task is complete: it continues to run additional scheduled tasks, including further instances of generate_status_report. Watching runningtasks.php shows the task run for about 3-4 minutes, then disappear.

@marxjohnson
Contributor Author

Doing a bit more digging, it appears that the generate_status_report task does eventually complete, but this section takes about an hour to complete on our filesystem.

$rowcount = $filesystem->get_filedir_count();
$rowsum = $filesystem->get_filedir_size();

In this case $rowcount is 495263 and $rowsum is 3436428000.

So it's slow rather than dying, but it's still puzzling that Moodle doesn't seem to realise the task is still running, and continues to run additional instances alongside other tasks.
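The two numbers above can be sanity-checked outside Moodle with the same kind of `find` pipeline the plugin shells out to. A minimal sketch (the `filedir_stats` helper name and the awk size sum are my own, not part of the plugin; `grep -c /` counts one line per file, since every absolute path printed by `find` contains a slash):

```shell
# filedir_stats: print "<file count> <total bytes>" for the directory in $1.
# Hypothetical helper for reproducing the report's two measurements by hand.
filedir_stats() {
    dir="$1"
    # Same count get_filedir_count() effectively performs:
    count=$(find "$dir" -type f | grep -c / || true)
    # Total bytes: one size per line (GNU find -printf), summed with awk:
    bytes=$(find "$dir" -type f -printf '%s\n' | awk '{s += $1} END {print s + 0}')
    echo "$count $bytes"
}
```

With a cold page cache, walking hundreds of thousands of inodes is dominated by disk seeks, which could explain an hour-long runtime that looks like a hang.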

@brendanheywood
Contributor

@marxjohnson did you ever get to the bottom of this? It sounds more like a lock factory problem than an objectfs issue?

@marxjohnson
Contributor Author

@brendanheywood I didn't get to the bottom of it before I left the OU, and I haven't looked into it since.

@sammarshallou Are you still having problems with this?

@sammarshallou
Contributor

@marxjohnson @brendanheywood We still have this task disabled on the live server; I guess nobody has needed the status report...

I made it run on acct yesterday. It appeared normally in the 'running tasks' page, and although I had to go home before it finished, it did complete after just over 2 hours, and when I look now it's no longer showing on the running tasks page or anything like that. The log from task logs looks like this:

Execute scheduled task: Object status report generator task (tool_objectfs\task\generate_status_report)
... started 16:30:10. Current memory use 7.7 MB.
... used 58 dbqueries
... used 7607.4464600086 seconds
Scheduled task complete: Object status report generator task (tool_objectfs\task\generate_status_report)

One strange thing: we have custom cron logs for each cron runner, which I still use because you can reload them to monitor progress during a run rather than only afterwards. That log should contain a duplicate of the above, but it is cut off after the start:

Execute scheduled task: Object status report generator task (tool_objectfs\task\generate_status_report)
... started 14:17:02. Current memory use 4.7 MB.

I can't really understand why this log file would get cut off given that the process obviously didn't crash, but anyway, it's presumably something to do with our infrastructure rather than an indication of any problem with the task.

So in summary: it would obviously be nice if the task didn't take 2 hours to run, but other than that it looks OK.

@brendanheywood
Contributor

OK, it sounds like there are a few things here and this issue should be split up. The slowness of generate_status_report due to the SQL already has a few issues elsewhere, like #596.

The result of get_filedir_count should be small, certainly not millions of files. This can depend on the settings, e.g. if a large threshold is set for the size of files to be moved to object storage. Is this set high?
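Since files at or above the size threshold should have been migrated out of filedir, splitting the local count at the threshold is one quick way to check whether migration is keeping up. A sketch, not anything the plugin provides (the `count_by_threshold` helper is hypothetical; with `find -size`, the `c` suffix means bytes, `-N` means strictly less than N, and `+N` strictly greater than N):

```shell
# count_by_threshold: print "<below> <at-or-above>" file counts for the
# directory in $1, split at the byte threshold in $2.
count_by_threshold() {
    dir="$1"
    threshold="$2"
    # Files strictly smaller than the threshold (expected to stay local):
    small=$(find "$dir" -type f -size "-${threshold}c" | grep -c / || true)
    # Files at or above it, i.e. strictly larger than threshold - 1 bytes
    # (expected to have been moved to object storage):
    large=$(find "$dir" -type f -size "+$((threshold - 1))c" | grep -c / || true)
    echo "$small $large"
}
```

A large second number would suggest the mover tasks are behind, rather than the threshold being set high.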

@sammarshallou
Contributor

The size is the default, 10240. I checked a couple of random filedir directories (/xx/yy) and they both had approx. 20 files in them, almost all of which were smaller than that size, so I think it's working. Multiplying by the ~64K directories gives about 1.3 million files total in filedir. So it's not 'millions', but it is a million.
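The back-of-envelope maths works out: 20 files x 256 x 256 directories is about 1.31 million. The same estimate can be made cheaply by sampling a handful of leaf directories instead of walking the whole tree, assuming the standard two-level aa/bb contenthash layout. A rough sketch (the `estimate_filedir_total` helper is my own, and it assumes the path contains no spaces):

```shell
# estimate_filedir_total: extrapolate a total file count for the directory
# in $1 from a small sample of its aa/bb leaf directories.
estimate_filedir_total() {
    dir="$1"
    sampled=0
    files=0
    # Sample up to 8 second-level directories (word-splitting is fine here
    # only because filedir paths contain no whitespace):
    for leaf in $(find "$dir" -mindepth 2 -maxdepth 2 -type d | head -n 8); do
        n=$(find "$leaf" -type f | grep -c / || true)
        files=$((files + n))
        sampled=$((sampled + 1))
    done
    [ "$sampled" -gt 0 ] || { echo 0; return; }
    # 65536 = 256 * 256 possible two-hex-digit directory pairs.
    echo $(( files * 65536 / sampled ))
}
```

Against a full filedir this touches only a few directories, so it gives a near-instant ballpark where the real count takes hours.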

When I mentioned this task in a standup, the developer who knows about infrastructure said it was also expensive to run frequently, due to AWS storage costs or something (I'm not sure he's right; it's possible he was thinking of a different task, this was just a quick chat). Anyway, we are cool with leaving it disabled and running it manually only if required, so it's really OK for us that it takes 2 hours.
