
v1.6

BogdanAtWork released this on 08 Oct 20:00 · 27 commits to main since this release · b0588cc

New Features

  • JetStream Checkpoint Converter Support: Added support for JetStream checkpoint conversion for Llama models on MaxText. (#840)
  • Automatic Deployment of CMSA and Autoscaling Config: Enabled automatic deployment of the Custom Metrics Stackdriver Adapter (CMSA) and autoscaling configurations for custom metrics with vLLM. (#825)
  • Network Optimization DaemonSet: Introduced a new DaemonSet that applies node-level network optimizations to improve performance, including IRQ spreading, tuned TCP settings, and larger GVE driver packet buffers; a sketch of this kind of tuning follows this list. (#805)
  • Server-Side Metrics Scraping: Added an initial implementation for scraping server-side metrics for analysis, with support for vLLM and JetStream. (#804)
  • Pod Label Copying to Node: The TPU Provisioner can now be configured to copy specific Pod labels to Node labels at Node Pool creation time. (#788)
  • Configurable Prompt Dataset: The prompt dataset used for benchmarking and analysis is now configurable. (#844)
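
The sketch below illustrates the kind of node-level tuning the network-optimization DaemonSet (#805) performs: spreading NIC interrupt handling across CPUs and raising TCP buffer limits. This is a minimal sketch under stated assumptions; the sysctl values, the "gve" interrupt-name match, and the script structure are illustrative and are not the manifest shipped in this release. A real DaemonSet would run logic like this in a privileged container with the host's /proc mounted.

```python
# Illustrative sketch only (not the actual DaemonSet from #805): spread NIC IRQs
# across CPUs and raise TCP buffer limits. Values and the "gve" match string are
# assumptions; running this requires root and access to the host's /proc.
import itertools
import os

# Assumed TCP buffer settings; the values applied by the real DaemonSet may differ.
TCP_SYSCTLS = {
    "/proc/sys/net/core/rmem_max": "134217728",
    "/proc/sys/net/core/wmem_max": "134217728",
    "/proc/sys/net/ipv4/tcp_rmem": "4096 131072 134217728",
    "/proc/sys/net/ipv4/tcp_wmem": "4096 131072 134217728",
}


def apply_tcp_settings() -> None:
    """Write the assumed TCP sysctl values into the host's /proc/sys tree."""
    for path, value in TCP_SYSCTLS.items():
        with open(path, "w") as f:
            f.write(value)


def spread_irqs(match: str = "gve") -> None:
    """Round-robin IRQs whose /proc/interrupts line mentions `match`
    (e.g. the GVE NIC queues) across all online CPUs."""
    cpus = itertools.cycle(range(os.cpu_count() or 1))
    with open("/proc/interrupts") as f:
        for line in f:
            if match not in line:
                continue
            irq = line.split(":", 1)[0].strip()
            if not irq.isdigit():
                continue
            with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as aff:
                aff.write(str(next(cpus)))


if __name__ == "__main__":
    apply_tcp_settings()
    spread_irqs()
```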

Improvements

  • Benchmarking Script Enhancements:
    • The benchmarking script now computes metrics from data it gathers directly instead of relying on Prometheus, resulting in more accurate and user-relevant results (see the sketch after this list). (#836)
    • Added request_rate to the summary statistics generated by the benchmarking script. (#837)
    • Made the benchmark time configurable and increased the default time to 2 minutes for improved steady-state analysis. (#833)
    • Included additional metrics in the Load Profile Generator (LPG) script output for more comprehensive analysis. (#832)
    • Ensured that the LPG script output can be collected by changing the LPG to a Deployment and enabling --save-json-results by default. (#811)
  • MLflow Fixes: Resolved issues with multiple experiment versions, duplicate model registrations, and missing system metrics in multi-node scenarios. (#813)
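
As a hedged illustration of the benchmarking changes above (#836, #837, #839), the sketch below shows how summary statistics such as request_rate and output-token throughput can be computed directly from per-request data collected by the benchmarking client instead of being scraped from Prometheus. The RequestRecord fields and output key names are assumptions for illustration, not the actual schema produced by the LPG script.

```python
# Hedged sketch: derive summary statistics from per-request records gathered by
# the benchmark client itself. Field and key names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class RequestRecord:
    start_s: float       # request start time (seconds, monotonic clock)
    end_s: float         # time the response completed
    output_tokens: int   # tokens generated for this request


def summarize(records: list[RequestRecord]) -> dict:
    """Compute request rate, output-token throughput, and mean latency."""
    if not records:
        return {}
    duration_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    if duration_s <= 0:
        return {}
    total_output_tokens = sum(r.output_tokens for r in records)
    return {
        "benchmark_time_s": duration_s,
        # Requests issued per second over the measured window (#837).
        "request_rate": len(records) / duration_s,
        # Throughput reported in output tokens per second (#839).
        "output_tokens_per_s": total_output_tokens / duration_s,
        "mean_latency_s": sum(r.end_s - r.start_s for r in records) / len(records),
    }


if __name__ == "__main__":
    demo = [RequestRecord(0.0, 1.2, 64), RequestRecord(0.5, 2.0, 128)]
    print(summarize(demo))
```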

Bug Fixes

  • Fixed Internal Links: Updated internal links to publicly accessible Cloud Console links. (#843)
  • Removed Unavailable JetStream Metrics: Removed unavailable JetStream metrics from the monitoring system. (#841)
  • Fixed Throughput Metric: Corrected the throughput metric to be reported in output tokens per second. (#839)
  • Fixed Missing JSON Fields: Added missing JSON fields to the benchmarking script output. (#835)
  • Fixed Single GPU Training Job Example: Corrected an issue in the single GPU training job example where the model directory was not being created. (#831)
  • Re-enabled CI and Fixed Flaky Tests: Re-enabled continuous integration and addressed OOM kills in the FUSE sidecar and flaky database connections. (#827)
  • Removed GKE Cluster Version: Removed the pinned GKE cluster version so the Terraform configuration uses the latest version in the REGULAR release channel. (#817)
  • Updated Pause Container Image Location: Updated the location of the pause container image to a more stable and accessible source. (#814)
  • Upgraded GKE Module: Upgraded the GKE module to avoid compatibility issues with Terraform Provider Google v5.44.0 on Autopilot. (#806)

Other Changes

  • Moved the ml-platform to the GoogleCloudPlatform/accelerated-platforms repository (#828) and included it as a submodule (#829); see the note after this list.
  • Added a tutorial for installing KServe on GKE Autopilot. (#812)
  • Removed unneeded dependencies in the LPG container image. (#830)
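
Note on the submodule change above: after #829, a fresh clone needs `git clone --recurse-submodules` (or `git submodule update --init` in an existing checkout) to fetch the ml-platform content from GoogleCloudPlatform/accelerated-platforms.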