077 autoquant gpt fast #361

Merged
merged 8 commits into from
Jun 21, 2024

Commits on Jun 18, 2024

  1. fixing peak memory stats for benchmark

    Summary: the benchmark was recording the memory peak reached at model
    load, not during model runtime. This is an issue because users can load
    the model to cpu/meta, which significantly reduces memory usage during
    model load/quantization.
    
    Test Plan: sh benchmarks.sh
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    HDCharles committed Jun 18, 2024
    22d5574
  2. Autoquantization work for benchmarks

    Summary:
    
    autoquant wasn't working for the llama benchmarks for a few reasons, the
    main one being that we were logging on prefill, not decode_one_token. We
    also weren't torch.compiling prefill, which defeated the whole point of
    autoquant benchmarking torch.compiled prefill shapes.
    
    Fixing this required new autoquant functionality: an option to not
    automatically end logging after a single call to model.forward. The flag
    manual_do_autoquant now controls whether you have to call
    model.do_autoquant() manually after logging is done, or whether it
    happens automatically after one model forward run.
    
    A few other small fixes were also made:
    1) updated where generate.py resets cuda memory so that the measurement
       is not confounded by torch.compile memory usage
    2) README updated with new numbers
    3) better autoquant docstring
    4) reordered benchmarks so they match what's in the README
    
    Test Plan: sh benchmarks.sh
    
    python test_integration.py -k "test_autoquant_manual"
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    HDCharles committed Jun 18, 2024
    d472855
  3. updating api name and improving docstrings

    Summary:
    
    Test Plan:
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    HDCharles committed Jun 18, 2024
    9717087
  4. oops missed a few manual_do_autoquant -> manual

    Summary:
    
    Test Plan:
    sh benchmarks.sh
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    HDCharles committed Jun 18, 2024
    8f1ba0a
  5. fix forward_log_only

    Summary:
    
    Test Plan:
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    HDCharles committed Jun 18, 2024
    f45d9d8
  6. improving test conditions

    Summary:
    
    Test Plan:
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    HDCharles committed Jun 18, 2024
    8b29cd5
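
The first Jun 18 commit ("fixing peak memory stats for benchmark") is about where the peak counter is read: a peak recorded across model load can hide the runtime peak. The sketch below illustrates that idea with Python's standard-library `tracemalloc` as a CPU-side analogy. It is not the benchmark's code; `load_model` and `run_model` are hypothetical stand-ins, and the source does not say which call generate.py uses. On CUDA the comparable reset is `torch.cuda.reset_peak_memory_stats()`.

```python
import tracemalloc


def load_model():
    # Hypothetical stand-in for model load: a large scratch buffer exists
    # alongside the weights and is released before returning.
    scratch = bytearray(8_000_000)
    weights = bytearray(1_000_000)
    del scratch
    return weights


def run_model(weights):
    # Hypothetical stand-in for inference: a smaller temporary buffer.
    activations = bytearray(2_000_000)
    return len(weights) + len(activations)


tracemalloc.start()
weights = load_model()
# Peak so far is dominated by the load-time scratch buffer.
_, peak_including_load = tracemalloc.get_traced_memory()

# Reset the peak after load so the next reading reflects runtime only.
tracemalloc.reset_peak()
run_model(weights)
_, runtime_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"peak including load: {peak_including_load / 1e6:.1f} MB")
print(f"peak during runtime: {runtime_peak / 1e6:.1f} MB")
```

Without the reset, the second reading would still report the load-time peak, which is the confusion the commit summary describes.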

Commits on Jun 19, 2024

  1. fixing nits

    Summary:
    
    Test Plan:
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    HDCharles committed Jun 19, 2024
    b3d9816

Commits on Jun 20, 2024

  1. final tests and change do_autoquant to finalize_autoquant

    Summary:
    
    Test Plan:
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    HDCharles committed Jun 20, 2024
    c16593e
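
By the last commit, the names used in the commit titles are a `manual` flag (renamed from `manual_do_autoquant`) and a `finalize_autoquant` method (renamed from `do_autoquant`). The control flow described in the Jun 18 autoquant commit can be sketched with a toy class. `ShapeLoggingModel` is a hypothetical stand-in written for illustration, not torchao's implementation, and the shapes are made-up examples of a prefill call and a decode call.

```python
class ShapeLoggingModel:
    """Toy stand-in: logs input shapes until logging is finalized."""

    def __init__(self, manual=False):
        self.manual = manual
        self.logged_shapes = []
        self.finalized = False

    def forward(self, shape):
        if not self.finalized:
            self.logged_shapes.append(shape)
            # Automatic mode: logging ends after the first forward call.
            if not self.manual:
                self.finalize_autoquant()
        return shape

    def finalize_autoquant(self):
        # A real implementation would pick a quantization scheme for each
        # logged shape here; this sketch only marks logging as finished.
        self.finalized = True


# Automatic mode: only the first call (prefill) is logged.
auto = ShapeLoggingModel(manual=False)
auto.forward((1, 128))  # prefill-like shape
auto.forward((1, 1))    # decode-like shape, arrives after logging ended

# Manual mode: both calls are logged, then the caller finalizes.
manual = ShapeLoggingModel(manual=True)
manual.forward((1, 128))
manual.forward((1, 1))
manual.finalize_autoquant()

print(auto.logged_shapes)    # [(1, 128)]
print(manual.logged_shapes)  # [(1, 128), (1, 1)]
```

In automatic mode the decode-like shape is never seen by the logger, which matches the failure the commit summary attributes to logging on prefill rather than decode_one_token.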