From 33acd2c749b7bd04af186da6d0c84bb6c46a0751 Mon Sep 17 00:00:00 2001
From: Jarrod Barnes
Date: Tue, 19 Nov 2024 07:30:21 -0500
Subject: [PATCH] Fix: updated README

---
 LICENSE   | 21 +++++++++++++++++++++
 README.md | 21 ++++++++++++++-------
 2 files changed, 35 insertions(+), 7 deletions(-)
 create mode 100644 LICENSE

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 000000000..6b06f5db4
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Jarrod Barnes
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

diff --git a/README.md b/README.md
index ed261fb02..b37fc9ecf 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,12 @@
-# MLX Distributed Training
+# MLX Distributed Training (Beta)
 
-A high-performance distributed training framework for MLX that enables efficient AI model training across multiple Apple Silicon devices.
+A privacy-first distributed training framework built on MLX for Apple Silicon, enabling secure and efficient AI model training across multiple devices while preserving data privacy.
 
 ![MLX Version](https://img.shields.io/badge/MLX-%3E%3D0.20.0-blue)
-![Python](https://img.shields.io/badge/Python-%3E%3D3.12-blue)
+![Python](https://img.shields.io/badge/Python-%3E%3D3.11-blue)
 ![macOS](https://img.shields.io/badge/macOS-Sonoma%2014.0%2B-brightgreen)
 ![License](https://img.shields.io/badge/License-MIT-green)
+![Status](https://img.shields.io/badge/Status-Beta-yellow)
 ![Tests](https://img.shields.io/badge/Tests-Passing-success)
 
 ## What We're Building
We're training a decoder-only transformer model from scratch, optimized for Appl
 - 8192 max sequence length
 - Training optimizations: Flash Attention, Grouped Query Attention (GQA), RoPE embeddings, SwiGLU activations
-- **Goal**: Train a competitive 1B parameter model that can match or exceed Llama 3.2's performance using distributed consumer hardware instead of traditional GPU clusters. Overall, we're aming to push the boundaries of what's possible with Apple Silicon and see how preformance scales with increasing model size on consumer hardware.
+- **Goal**: Train a competitive 1B parameter model that can match or exceed Llama 3.2's performance using distributed consumer hardware instead of traditional GPU clusters. Overall, we're aiming to push the boundaries of what's possible with Apple Silicon and see how performance scales with increasing model size on consumer hardware.
 
 ## System Architecture
 
 [![](https://mermaid.ink/img/pako:eNqlVmtv0zAU_SuWkfiUodlpWRchJGjHQ2rWinYCke6DGzutIbEjx2EbY_-dm1eTRl2LRKSqiXPO9fU959p5xKHmAnt4Y1i6Rcv3K4XgyvJ1NTA3MmHmAV0DqnpVXMtxsDRMKqk2aKy14VIxq80tOjt7i_58VtJKFsvfIvuDJksSTGRmjVznVvCSJsxtGwsAFe0mK_C-TwJfJBrm9Jlim2PY6zEJroW90-YnpJEkuZJhmcezjOWCtJkvwq3geXxshvmEBHNhIm0SpkKBJizbrjUz_HnK-IoEY62sVLnOM3T1i8V5J6mWt6vxhFmG5jIVMZQGkRZRRod6lICmGtVcC2sES4r6Tuv3U8343lJK9rTO7YMQvFKjBQjF24cDefmgeVwUNtVKKJv1UyuUmn6rYHVaH7S5g-qgOcv6sxXXjASz1MoErNEs5CblzJbCH8qsvOlZciFCrXhjyqwrBD1ktX130WPuovvuoqfdRffdRY-766T4tC8-PSE-PSo-7YlP_0P8XmpFIU-I32PM6BHx6XHxodPRm5Ly0TAuISG0eFBhKVIf8VXIzdai9wZqErLMdkA9J_kadiptCrVeoqJnjY7bPKD3qyx9AZYKiyU13X9yT4BNoOLW7S-16tCf93Zb7zYWl0aERQQ0_dKO-n7fyHWy5RMk-3F-E8APVahObtfjw7au-I1ERQNcLXdIsHUUybATZrk45PUqyDv-I88shPg0Dz49pMKkzLBEWGGybvf4x80N70pAJuztXsnCGBw2ERFK69MpknHsvYguIwe6X_8U3gvXdev7szvJ7daj6X2HmO32kIq6Xv8zNWwkqqk84v9KTVq_1Rmf5NZsOHEd2E0dOB0dcLoD55gD_nTAZw6cEA7s8w78zUhTkoYHXQgkCiQKJApg-E1hjDoz2pahgfs-IAEIuHahu1iNf4upO4vBDk4ENIPk8B3xWCi1wnYrErHCHtxyEbE8tiu8Uk8AZbnVRe9iz5pcONjofLPFXsTiDJ7yckeYSAb9kOxGU6a-a500FHjE3iO-x94ZIa_OL1w6uiCEjF7T4YA4-KEYH8H4-eXr0eXFcDAauqPhk4N_lzHIK-IS9-J8OBgMXHd46Y4cLHixFr_6FCq_iJ7-Aiso0mw)](https://mermaid.live/edit#pako:eNqlVmtv0zAU_SuWkfiUodlpWRchJGjHQ2rWinYCke6DGzutIbEjx2EbY_-dm1eTRl2LRKSqiXPO9fU959p5xKHmAnt4Y1i6Rcv3K4XgyvJ1NTA3MmHmAV0DqnpVXMtxsDRMKqk2aKy14VIxq80tOjt7i_58VtJKFsvfIvuDJksSTGRmjVznVvCSJsxtGwsAFe0mK_C-TwJfJBrm9Jlim2PY6zEJroW90-YnpJEkuZJhmcezjOWCtJkvwq3geXxshvmEBHNhIm0SpkKBJizbrjUz_HnK-IoEY62sVLnOM3T1i8V5J6mWt6vxhFmG5jIVMZQGkRZRRod6lICmGtVcC2sES4r6Tuv3U8343lJK9rTO7YMQvFKjBQjF24cDefmgeVwUNtVKKJv1UyuUmn6rYHVaH7S5g-qgOcv6sxXXjASz1MoErNEs5CblzJbCH8qsvOlZciFCrXhjyqwrBD1ktX130WPuovvuoqfdRffdRY-766T4tC8-PSE-PSo-7YlP_0P8XmpFIU-I32PM6BHx6XHxodPRm5Ly0TAuISG0eFBhKVIf8VXIzdai9wZqErLMdkA9J_kadiptCrVeoqJnjY7bPKD3qyx9AZYKiyU13X9yT4BNoOLW7S-16tCf93Zb7zYWl0aERQQ0_dKO-n7fyHWy5RMk-3F-E8APVahObtfjw7au-I1ERQNcLXdIsHUUybATZrk45PUqyDv-I88shPg0Dz49pMKkzLBEWGGybvf4x80N70pAJuztXsnCGBw2ERFK69MpknHsvYguIwe6X_8U3gvXdev7szvJ7daj6X2HmO32kIq6Xv8zNWwkqqk84v9KTVq_1Rmf5NZsOHEd2E0dOB0dcLoD55gD_nTAZw6cEA7s8w78zUhTkoYHXQgkCiQKJApg-E1hjDoz2pahgfs-IAEIuHahu1iNf4upO4vBDk4ENIPk8B3xWCi1wnYrErHCHtxyEbE8tiu8Uk8AZbnVRe9iz5pcONjofLPFXsTiDJ7yckeYSAb9kOxGU6a-a500FHjE3iO-x94ZIa_OL1w6uiCEjF7T4YA4-KEYH8H4-eXr0eXFcDAauqPhk4N_lzHIK-IS9-J8OBgMXHd46Y4cLHixFr_6FCq_iJ7-Aiso0mw)
 
+## Features
+
+- **Privacy-First**: All training happens on your devices, keeping sensitive data under your control
+- **Efficient**: Optimized for Apple Silicon using MLX, enabling fast training on consumer hardware
+- **Distributed**: Scale training across multiple Macs for better performance
+- **Flexible**: Support for various model architectures and training configurations
+
 ## Introduction to Distributed Training with MLX
 
 This project explores the potential of distributed training on Apple Silicon, specifically targeting the development of large language models. By leveraging [MLX's distributed communication framework](https://ml-explore.github.io/mlx/build/html/usage/distributed.html), we're pushing the boundaries of what's possible with consumer hardware.
@@ -72,8 +80,7 @@ This project serves as both a practical implementation and a research platform,
 ### System Requirements
 
 - macOS Sonoma 14.0+ (Apple Silicon)
-- Python 3.12+
-- Xcode Command Line Tools
+- Python 3.11+
 - MLX 0.20.0+
 - High-speed network connection (10Gbps recommended)
 - SSH access configured between devices
@@ -220,4 +227,4 @@ For detailed information about our hardware configuration, training process, and
 
 ## License
 
-MIT License
\ No newline at end of file
+MIT License - See [LICENSE](LICENSE) for details.
\ No newline at end of file
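
The attention stack listed under "What We're Building" (Grouped Query Attention, RoPE embeddings, fused attention kernels) can be pictured with a short MLX sketch. This is a hypothetical illustration rather than code from this repository: the class name, dimensions, head counts, and projection names are made up, and it assumes the `mlx.nn.RoPE` and `mx.fast.scaled_dot_product_attention` APIs available in MLX 0.20+.

```python
# Illustrative GQA + RoPE attention block in MLX (not from this repository).
import mlx.core as mx
import mlx.nn as nn


class GroupedQueryAttention(nn.Module):
    def __init__(self, dims: int = 2048, n_heads: int = 32, n_kv_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = dims // n_heads
        self.scale = self.head_dim ** -0.5
        # Queries keep the full head count; keys/values share fewer heads (GQA).
        self.q_proj = nn.Linear(dims, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dims, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dims, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dims, bias=False)
        self.rope = nn.RoPE(self.head_dim)  # rotary position embeddings

    def __call__(self, x: mx.array, mask=None) -> mx.array:
        B, L, _ = x.shape
        q = self.q_proj(x).reshape(B, L, self.n_heads, -1).transpose(0, 2, 1, 3)
        k = self.k_proj(x).reshape(B, L, self.n_kv_heads, -1).transpose(0, 2, 1, 3)
        v = self.v_proj(x).reshape(B, L, self.n_kv_heads, -1).transpose(0, 2, 1, 3)
        q, k = self.rope(q), self.rope(k)
        # Fused scaled-dot-product-attention kernel; handles the smaller KV head count.
        out = mx.fast.scaled_dot_product_attention(q, k, v, scale=self.scale, mask=mask)
        return self.o_proj(out.transpose(0, 2, 1, 3).reshape(B, L, -1))
```

A SwiGLU feed-forward block and normalization layers would sit alongside this in each decoder layer; the exact 1B-parameter configuration is left to the actual implementation.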
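The distributed side builds on the MLX communication primitives linked in the README's introduction. Below is a minimal data-parallel sketch, again illustrative only: the model, data, and hyperparameters are placeholders, and it assumes `mx.distributed.init` and `mx.distributed.all_sum` from MLX's distributed API, with one process launched per machine (for example via MPI over the SSH access listed in the system requirements).

```python
# Minimal data-parallel training sketch with MLX's distributed API.
# The model, data, and hyperparameters are placeholders, not project code.
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim
from mlx.utils import tree_map

world = mx.distributed.init()               # one process per Mac
print(f"rank {world.rank()} of {world.size()}")

model = nn.Linear(512, 512)                 # stand-in for the real transformer
optimizer = optim.AdamW(learning_rate=3e-4)


def loss_fn(model, x, y):
    return nn.losses.mse_loss(model(x), y, reduction="mean")


loss_and_grad = nn.value_and_grad(model, loss_fn)


def average_gradients(grads, size):
    # Sum each gradient array across devices, then divide by the worker count.
    return tree_map(lambda g: mx.distributed.all_sum(g) / size, grads)


for step in range(10):
    # Each rank would draw its own shard of the training data here.
    x = mx.random.normal((8, 512))
    y = mx.random.normal((8, 512))
    loss, grads = loss_and_grad(model, x, y)
    grads = average_gradients(grads, world.size())  # synchronize before updating
    optimizer.update(model, grads)
    mx.eval(model.parameters(), optimizer.state)
```

Each participating Mac runs one such process and reads its own data shard; summing gradients with `all_sum` and dividing by the world size yields averaged gradients before every optimizer step.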