[Backport] fix: Prevent error page recursion (#35209) #35259

@cmltaWt0 (Contributor) commented Aug 8, 2024

This is a backport of #35209

Fix details

We sometimes see rendering errors in the error page itself, which then cause another attempt at rendering the error page. I'm not sure exactly how the loop is occurring, but it looks something like this:

  1. An error is raised in a view or middleware and is not caught by application code
  2. Django catches the error and calls the registered uncaught error handler
  3. Our handler tries to render an error page
  4. The rendering code raises an error
  5. GOTO 2 (until some sort of server limit is reached)
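
The loop in steps 2 to 5 can be pictured with a toy model. This is only a sketch under stated assumptions: it is plain Python, not Django's actual dispatch code, and `MAX_ATTEMPTS`, `broken_render`, and `handle_uncaught` are hypothetical names invented for illustration. The real server limit and re-entry path are not known from this PR.

```python
# Toy model only: this does not reproduce Django's real error handling.
MAX_ATTEMPTS = 50  # hypothetical stand-in for "some sort of server limit"


def broken_render():
    """Stand-in for an error template that itself fails to render (step 4)."""
    raise TypeError("error while rendering the error page")


def handle_uncaught(attempt=1, log=None):
    """Steps 2-5: try to render the error page; on failure, re-enter the handler."""
    log = [] if log is None else log
    try:
        return broken_render(), log
    except Exception as exc:
        # Each failed attempt adds another logged trace.
        log.append(f"attempt {attempt}: {exc!r}")
        if attempt >= MAX_ATTEMPTS:
            # The limit is reached and a generic message is served.
            return "A server error occurred.", log
        return handle_uncaught(attempt + 1, log)


body, log = handle_uncaught()
print(len(log), "logged failures before giving up")
```

In this model every pass through the handler adds one more logged failure, which is the kind of growth that would explain very large log output.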

By catching all errors raised during error-page render and substituting in a hardcoded string, we can reduce server resource usage, avoid logging massive sequences of recursive stack traces, and still give the user some indication that yes, there was a problem.
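
As a minimal sketch of that idea (not the actual change in this PR), the guard could look roughly like the following. `render_error_page_safely`, `FALLBACK_BODY`, and the `render` callable are hypothetical names, and the fallback text is a placeholder rather than the string used in the fix.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical hardcoded fallback; the real string in the fix may differ.
FALLBACK_BODY = "<h1>Server Error</h1><p>An error occurred.</p>"


def render_error_page_safely(render):
    """Call `render` (any zero-argument callable that returns the page body).

    If rendering raises anything, log it once and return a fixed string
    instead of letting the exception reach the uncaught-error handler again.
    """
    try:
        return render()
    except Exception:  # deliberately broad: unknown rendering errors count too
        log.exception("Error page failed to render; using hardcoded fallback")
        return FALLBACK_BODY


def failing_render():
    # Mimics the template fault used in the testing steps.
    return None * 7


print(render_error_page_safely(failing_render))
```

The broad `except Exception` is intentional here, since the goal is to stop the handler from failing no matter which rendering error occurs.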

This should help address #35151

At least one of these rendering errors is known to be due to a translation error. There's a separate issue for restoring translation quality so that we avoid those issues in the future (openedx/openedx-translations#549), but in general we should catch all rendering errors, including unknown ones.

Testing:

  • In lms/envs/devstack.py change DEBUG to False to ensure that the usual error page is displayed (rather than the debug error page).
  • Add the line 1/0 to the top of the student_dashboard function in common/djangoapps/student/views/dashboard.py to make that view raise an error.
  • In lms/templates/static_templates/server-error.html replace static.get_platform_name() with None * 7 to make the error template itself produce an error.
  • Visit http://localhost:18000/dashboard.
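
For reference, the two injected faults raise different exception types, which helps when telling the view's trace apart from the error page's own trace in the logs. The snippet below is a standalone illustration, assuming the template expression is evaluated as ordinary Python; `fault_name` is a hypothetical helper.

```python
def fault_name(thunk):
    """Run `thunk` and return the name of the exception it raises, if any."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None


# Fault added to the view in the steps above.
view_fault = fault_name(lambda: 1 / 0)
# Fault added to the error template in the steps above.
template_fault = fault_name(lambda: None * 7)

print(view_fault, template_fault)
```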

Without the fix, the response takes 10 seconds and produces a 6 MB, 85k-line set of stack traces, and the page displays "A server error occurred. Please contact the administrator."

With the fix, the response takes less than a second and produces three stack traces (one of which contains the error page's rendering error).

@openedx-webhooks added the "open-source-contribution" label (PR author is not from Axim or 2U) Aug 8, 2024
@cmltaWt0 (Contributor, Author) commented Aug 8, 2024

Tested locally with the Tutor dev and local options, using the following testing steps:

- In lms/envs/devstack.py change DEBUG to False to ensure that the usual error page is displayed (rather than the debug error page).
- Add line 1/0 to the top of the student_dashboard function in common/djangoapps/student/views/dashboard.py to make that view error.
- In lms/templates/static_templates/server-error.html replace static.get_platform_name() with None * 7 to make the error template itself produce an error.
- Visit http://localhost:18000/dashboard.

[Image: Screenshot 2024-08-09 at 00 03 21]

@cmltaWt0 cmltaWt0 merged commit 3d50dd8 into openedx:open-release/redwood.master Aug 8, 2024
78 checks passed
@OmarIthawi (Member) commented

@cmltaWt0 thank you! I think this error has been going on for at least 5 years without an easy way to reproduce or fix. Thanks for fixing it!
