
    Repositories list

    • vllm

      Public
      vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      Updated Jan 28, 2025
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      Updated Jan 27, 2025
    • The driver for LMCache core to run in vLLM
      Python
      Apache License 2.0
      Updated Jan 24, 2025
    • LMCache

      Public
      ROCm support for Ultra-Fast and Cheaper Long-Context LLM Inference
      Python
      Apache License 2.0
      Updated Jan 24, 2025
    • Python
      Updated Jan 23, 2025
    • Python
      Apache License 2.0
      Updated Jan 22, 2025
    • kvpress

      Public
      LLM KV cache compression made easy
      Python
      Apache License 2.0
      Updated Jan 21, 2025
    • litellm

      Public
      Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
      Python
      Other
      Updated Jan 13, 2025
    • Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
      C++
      Other
      Updated Dec 20, 2024
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Apache License 2.0
      Updated Dec 16, 2024
    • ROCm Implementation of torchac_cuda from LMCache
      Cuda
      Updated Dec 16, 2024
    • etalon

      Public
      LLM Serving Performance Evaluation Harness
      Python
      Apache License 2.0
      Updated Dec 16, 2024
    • Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
      Python
      MIT License
      Updated Dec 7, 2024
    • Efficient Triton Kernels for LLM Training
      Python
      BSD 2-Clause "Simplified" License
      Updated Dec 6, 2024
    • Efficient LLM Inference over Long Sequences
      Python
      Apache License 2.0
      Updated Nov 29, 2024
    • JamAIBase

      Public
      The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
      Python
      Apache License 2.0
      Updated Nov 29, 2024
    • A calculator to estimate the memory footprint, capacity, and latency on NVIDIA, AMD, and Intel hardware
      Python
      Updated Nov 24, 2024
    • ROCm port of Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
      Cuda
      Apache License 2.0
      Updated Nov 21, 2024
    • Go ahead and axolotl questions
      Python
      Apache License 2.0
      Updated Nov 16, 2024
    • TypeScript documentation of JamAISDK
      HTML
      Updated Nov 14, 2024
    • skypilot

      Public
      SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
      Python
      Apache License 2.0
      Updated Nov 7, 2024
    • A CI/CD pipeline that builds Docker images with flash attention pre-compiled into the image, to facilitate quicker development and deployment of other frameworks.
      Shell
      Apache License 2.0
      Updated Oct 26, 2024
    • ROCm fork of Fast and memory-efficient exact attention (this branch aims to produce a flash attention PyPI package that can be readily installed and used).
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Oct 26, 2024
    • A Python client for the Unstructured hosted API
      Python
      MIT License
      Updated Oct 14, 2024
    • EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU
      Python
      Updated Oct 6, 2024
    • Go
      Updated Sep 26, 2024
    • PowerToys

      Public
      Windows system utilities to maximize productivity
      C#
      MIT License
      Updated Aug 9, 2024
    • Arena-Hard-Auto: An automatic LLM benchmark.
      Jupyter Notebook
      Apache License 2.0
      Updated Jul 15, 2024
    • Python
      Apache License 2.0
      Updated Jul 11, 2024
    • Python
      Apache License 2.0
      Updated Jul 9, 2024