077 autoquant gpt fast #361
Commits on Jun 18, 2024
- fixing peak memory stats for benchmark (commit 22d5574)
  Summary: we were hitting the peak memory upon model load, not during model runtime. This is an issue because users can load the model to cpu/meta, which significantly reduces memory usage during model load/quant.
  Test Plan: sh benchmarks.sh
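The fix pattern behind this commit, resetting the peak-memory counter after model load so the benchmark only measures runtime peaks, can be sketched with Python's stdlib tracemalloc as a stand-in (the actual benchmark presumably uses CUDA peak-memory stats; the allocation sizes here are illustrative):

```python
import tracemalloc

tracemalloc.start()

# "Model load": a large, temporary allocation that dominates the peak.
weights = bytearray(10_000_000)
del weights

# Without this reset, the reported peak would still reflect the load phase.
tracemalloc.reset_peak()

# "Runtime": a much smaller allocation.
activations = bytearray(1_000_000)

_, peak = tracemalloc.get_traced_memory()
# peak now reflects only allocations made after the reset
print(peak < 10_000_000)  # True
```

The key point is the ordering: the reset must come after load/quantization but before the timed run, otherwise the load-phase peak masks the runtime peak.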
- Autoquantization work for benchmarks (commit d472855)
  Summary: autoquant wasn't working for the llama benchmarks for a few reasons, the main one being that we were logging on prefill, not on decode_one_token. We also weren't torch.compiling prefill, which defeated the point of autoquant benchmarking the torch.compiled prefill shapes. Fixing this required new autoquant functionality: an option to not automatically end logging after a single model.forward. The manual_do_autoquant flag now controls whether you have to call model.do_autoquant() yourself once logging is done, or whether it happens automatically after one forward run. A few other small fixes: 1) moved where generate.py resets cuda memory so it isn't confounded with torch.compile memory usage, 2) updated the README with new numbers, 3) improved the autoquant docstring, 4) reordered the benchmarks to match the README.
  Test Plan: sh benchmarks.sh; python test_integration.py -k "test_autoquant_manual"
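The manual-versus-automatic finalize behavior this commit describes can be illustrated with a toy wrapper (all names here are hypothetical stand-ins; torchao's real autoquant machinery differs):

```python
class AutoquantLogger:
    """Toy wrapper: records input shapes on forward and, unless
    manual=True, finalizes quantization after the first forward call."""

    def __init__(self, model, manual=False):
        self.model = model
        self.manual = manual
        self.logged_shapes = []
        self.finalized = False

    def __call__(self, x):
        if not self.finalized:
            self.logged_shapes.append(len(x))
        out = self.model(x)
        if not self.manual and not self.finalized:
            self.finalize_autoquant()  # automatic: one forward ends logging
        return out

    def finalize_autoquant(self):
        # Real autoquant would pick kernels based on the logged shapes here.
        self.finalized = True


# manual=True lets several differently shaped calls (e.g. a prefill-like
# and a decode-like shape) be logged before the user explicitly finalizes.
m = AutoquantLogger(lambda x: x, manual=True)
m([0] * 8)   # prefill-like shape
m([0] * 1)   # decode-like shape
m.finalize_autoquant()
print(m.logged_shapes)  # [8, 1]
```

With the default automatic mode, only the first call's shape would be logged, which is exactly the problem the commit fixes for the two-phase prefill/decode benchmark.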
- updating api name and improving docstrings (commit 9717087)
- oops missed a few manual_do_autoquant -> manual (commit 8f1ba0a)
  Test Plan: sh benchmarks.sh
- commit f45d9d8
- commit 8b29cd5
Commits on Jun 19, 2024
- commit b3d9816
Commits on Jun 20, 2024
- final tests and change do_autoquant to finalize_autoquant (commit c16593e)