nod-ai/shark-dashboard

ML Compiler Build Dashboard

A real-time build monitoring system for ML compiler projects using secure WebSocket connections. Track build status, failures, and progress across torch-mlir, ieee-mlir, and LLVM-MLIR projects through a unified interface.

Features

  • 🔄 Real-time WebSocket-based build monitoring
  • 🔒 Secure bidirectional communication (no webhooks)
  • 🏗️ ML compiler project support:
    • torch-mlir
    • ieee-mlir
    • LLVM-MLIR
  • 📊 Build metrics and failure analysis
  • 📦 S3-based artifact management
  • 📧 Email notifications via SendGrid
  • 🔐 JWT-based authentication
  • 📝 Comprehensive build logs
  • 📈 Historical data analysis

Architecture

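The system uses a WebSocket-based architecture for real-time communication. Build agents connect to an AWS API Gateway WebSocket endpoint and push build events; dashboard clients connect to the same API and subscribe to the projects and events they want to follow.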

Prerequisites

System Requirements

  • Python 3.8+
  • Node.js 18+
  • MySQL
  • AWS Account

Dependencies

# Backend
pip install flask flask-cors gitpython pymongo boto3 "python-jose[cryptography]"

# Frontend
npm install aws-sdk bcrypt jsonwebtoken

Configuration

Environment Setup

Create a .env file:

# WebSocket Configuration
WS_ENDPOINT=wss://your-api-gateway-url
WS_REGION=us-east-1

# Database
MYSQL_URI=mysql://user:password@localhost:3306/
DB_NAME=build_dashboard

# AWS Services
AWS_ACCESS_KEY=your_key
AWS_SECRET_KEY=your_secret
AWS_REGION=us-east-1
S3_BUCKET=build-artifacts

# Authentication
JWT_SECRET_KEY=your_jwt_secret
JWT_ALGORITHM=HS256

# Notifications
SECRET_API_KEY=your_sendgrid_api_key
NOTIFICATION_EMAIL=your_email@example.com
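
The backend needs these values at startup. A minimal sketch of reading the file with the standard library only, assuming simple KEY=VALUE lines with `#` comments as shown above; `load_env`, `missing_keys`, and the contents of `REQUIRED_KEYS` are hypothetical and not part of this project.

```python
from typing import Dict, List

# Hypothetical: keys the backend is assumed to need before it can start.
REQUIRED_KEYS = ("WS_ENDPOINT", "MYSQL_URI", "DB_NAME", "JWT_SECRET_KEY", "JWT_ALGORITHM")


def load_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Values are taken verbatim (no quote stripping or variable expansion)."""
    values: Dict[str, str] = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so URIs with query strings survive.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {number}: expected KEY=VALUE, got {raw!r}")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty, in REQUIRED_KEYS order."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

A typical startup would parse `.env`, fail fast if `missing_keys` returns anything, and then copy the values into `os.environ`. The `python-dotenv` package's `load_dotenv()` covers the same need with more complete parsing.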

Project Configuration

Create config.yaml:

projects:
  torch-mlir:
    repo_url: https://github.com/llvm/torch-mlir
    build_command: python setup.py build
    build_dir: ./torch-mlir-build
    notification_emails:
      - your_email@example.com
    websocket:
      reconnect_attempts: 5
      reconnect_interval: 1000
    
  ieee-mlir:
    repo_url: https://github.com/ieee-mlir/ieee-mlir
    build_command: cmake . && make
    build_dir: ./ieee-mlir-build
    websocket:
      reconnect_attempts: 3
      reconnect_interval: 2000

  llvm-mlir:
    repo_url: https://github.com/llvm/llvm-project
    build_command: |
      cmake -G Ninja ../llvm \
        -DLLVM_ENABLE_PROJECTS=mlir \
        -DLLVM_BUILD_EXAMPLES=ON && ninja
    build_dir: ./llvm-mlir-build
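
Only some projects above define a `websocket` block. A sketch of how an agent might resolve its reconnect settings, assuming the file has already been parsed into nested dicts (for example with PyYAML's `yaml.safe_load`), that `reconnect_interval` is in milliseconds, and that retries use a fixed interval. The values in `DEFAULT_WS` are hypothetical, since the README does not say what applies when the block is absent.

```python
from typing import Any, Dict, List

# Hypothetical defaults for projects without a `websocket` block
# (llvm-mlir above has none). Not specified by the README.
DEFAULT_WS = {"reconnect_attempts": 3, "reconnect_interval": 1000}


def websocket_settings(config: Dict[str, Any], project: str) -> Dict[str, int]:
    """Overlay a project's websocket block on the defaults."""
    project_cfg = config["projects"][project]
    merged = dict(DEFAULT_WS)
    # `or {}` also covers an empty `websocket:` key, which YAML parses as None.
    merged.update(project_cfg.get("websocket") or {})
    return merged


def reconnect_delays(settings: Dict[str, int]) -> List[float]:
    """Seconds to wait before each retry, assuming a fixed interval in ms."""
    interval_s = settings["reconnect_interval"] / 1000
    return [interval_s] * settings["reconnect_attempts"]
```

With the configuration above, torch-mlir would retry five times at one-second intervals and ieee-mlir three times at two-second intervals. Exponential backoff is a common alternative if agents tend to reconnect all at once.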

Implementation

Build Agent Setup

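from build_dashboard import BuildAgent

# One agent per project; ws_endpoint is the same value as WS_ENDPOINT in .env.
agent = BuildAgent(
    project_name="torch-mlir",
    api_key="your_api_key",
    ws_endpoint="wss://your-api-gateway-url"
)

# Called when a build for this project starts.
@agent.on_build_start
def handle_build_start(build_id):
    print(f"Build {build_id} started")

# Called when a build finishes; status carries the final result.
@agent.on_build_complete
def handle_build_complete(build_id, status):
    print(f"Build {build_id} completed with status: {status}")

# Start the agent so it connects to ws_endpoint and handles build events.
agent.start()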
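
The `on_build_start` and `on_build_complete` decorators register callbacks that run when matching events arrive. The internals of `build_dashboard` are not documented here, so the following is a hypothetical sketch of how such a decorator-based registry can work; `EventRegistry` and `dispatch` are illustrative names only, and the message shape follows the protocol described below.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[..., Any]


class EventRegistry:
    """Hypothetical stand-in for BuildAgent's callback registration."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def _register(self, event: str, func: Callback) -> Callback:
        self._handlers[event].append(func)
        return func  # return unchanged so the function is still callable directly

    # Decorators mirroring the names used in the agent example.
    def on_build_start(self, func: Callback) -> Callback:
        return self._register("BUILD_START", func)

    def on_build_complete(self, func: Callback) -> Callback:
        return self._register("BUILD_COMPLETE", func)

    def dispatch(self, message: dict) -> int:
        """Run every handler registered for message['type'];
        return how many handlers ran."""
        handlers = self._handlers.get(message["type"], [])
        for handler in handlers:
            if message["type"] == "BUILD_COMPLETE":
                handler(message["buildId"], message["data"]["status"])
            else:
                handler(message["buildId"])
        return len(handlers)
```

Returning the original function from each decorator keeps handlers testable on their own, and keeping a list per event type allows several callbacks for the same event.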

WebSocket Message Protocol

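// Build event types sent by agents
type BuildEventType = 'BUILD_START' | 'BUILD_UPDATE' | 'BUILD_COMPLETE';

// BuildMetrics is not defined in this README; a placeholder shape is assumed here.
type BuildMetrics = Record<string, number>;

// Build Event Message
interface BuildMessage {
    type: BuildEventType;
    buildId: string;
    project: string;
    data: {
        status: string;
        progress?: number;
        metrics?: BuildMetrics;
        error?: string;
    };
    timestamp: number;
}

// Subscription Message
interface SubscriptionMessage {
    type: 'SUBSCRIBE';
    projects: string[];
    events: BuildEventType[]; // event types to receive, e.g. 'BUILD_COMPLETE'
}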
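
On the Python side, incoming frames can be checked against the `BuildMessage` shape before they are acted on, which is one form of the message validation listed under Security. A minimal standard-library sketch; `parse_build_message` is a hypothetical name, and `metrics` is treated as opaque because `BuildMetrics` is not defined in this README.

```python
import json
from typing import Any, Dict

EVENT_TYPES = {"BUILD_START", "BUILD_UPDATE", "BUILD_COMPLETE"}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_build_message(raw: str) -> Dict[str, Any]:
    """Decode a frame and check it against the BuildMessage fields.

    Raises ValueError (JSONDecodeError included) on any mismatch."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    msg_type = msg.get("type")
    if not isinstance(msg_type, str) or msg_type not in EVENT_TYPES:
        raise ValueError(f"unknown type: {msg_type!r}")
    for field in ("buildId", "project"):
        if not isinstance(msg.get(field), str):
            raise ValueError(f"{field} must be a string")
    if not _is_number(msg.get("timestamp")):
        raise ValueError("timestamp must be a number")
    data = msg.get("data")
    if not isinstance(data, dict) or not isinstance(data.get("status"), str):
        raise ValueError("data.status must be a string")
    # Optional fields: only validated when present.
    if "progress" in data and not _is_number(data["progress"]):
        raise ValueError("data.progress must be a number")
    if "error" in data and not isinstance(data["error"], str):
        raise ValueError("data.error must be a string")
    return msg
```

Rejecting malformed frames at the boundary means later handlers can rely on `buildId`, `project`, and `data.status` being present.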

API Routes

// WebSocket Routes
const routes = {
    $connect: handleConnect,
    $disconnect: handleDisconnect,
    build_update: handleBuildUpdate,
    subscribe: handleSubscribe
};

// REST API Routes
app.get('/api/builds', listBuilds);
app.get('/api/builds/:id', getBuildDetails);
app.get('/api/builds/:id/logs', getBuildLogs);
app.get('/api/builds/:id/artifacts', listArtifacts);
app.post('/api/auth/login', login);
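
API Gateway picks one of the WebSocket routes above by evaluating the API's route selection expression against each incoming frame. A minimal local sketch of that selection, assuming the expression is `$request.body.action` (so clients send frames such as `{"action": "subscribe", ...}`) and using hypothetical handlers. `$connect` and `$disconnect` fire on the connection lifecycle rather than on message content, so they are left out; in a deployed API the gateway does this selection itself and passes the chosen key to a Lambda integration as `requestContext.routeKey`.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def handle_subscribe(body: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical: acknowledge the projects the client asked for.
    return {"statusCode": 200, "body": json.dumps({"subscribed": body.get("projects", [])})}


def handle_build_update(body: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical: a real handler would fan the update out to subscribers.
    return {"statusCode": 200, "body": "update received"}


ROUTES: Dict[str, Handler] = {
    "subscribe": handle_subscribe,
    "build_update": handle_build_update,
}


def route_message(raw: str) -> Dict[str, Any]:
    """Select a handler by the frame's `action` field."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": "invalid JSON"}
    action = body.get("action") if isinstance(body, dict) else None
    handler = ROUTES.get(action) if isinstance(action, str) else None
    if handler is None:
        # In API Gateway, unmatched frames go to a $default route if one exists.
        return {"statusCode": 400, "body": f"no route for action {action!r}"}
    return handler(body)
```

A local dispatcher like this is mainly useful for unit-testing handlers without deploying the API.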

Development

Local Setup

  1. Start local services:
docker-compose up -d mysql
  2. Install dependencies:
pip install -r requirements.txt
npm install
  3. Run the development server:
python server.py
  4. Start the dashboard:
npm run dev

Testing

# Backend tests
python -m pytest

# Frontend tests
npm run test

Deployment

AWS Deployment

  1. Deploy infrastructure:
terraform init
terraform apply
  2. Configure API Gateway:
aws apigatewayv2 create-api \
  --name "BuildDashboardAPI" \
  --protocol-type WEBSOCKET \
  --route-selection-expression '$request.body.action'
  3. Deploy application:
./deploy.sh

Monitoring

CloudWatch Metrics

  • WebSocket connection count
  • Message processing latency
  • Build duration
  • Error rates
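
Build duration and error rate can be derived from the `BUILD_START` and `BUILD_COMPLETE` messages and published as custom metrics. A minimal sketch, assuming `timestamp` is in milliseconds and that a completed build whose `data.status` is `"failure"` counts as an error; the protocol defines neither the unit nor the status values. The function only builds a list in the `MetricData` shape accepted by boto3's CloudWatch `put_metric_data`; it does not call AWS.

```python
from typing import Any, Dict, List


def build_metric_data(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn BUILD_START/BUILD_COMPLETE events into CloudWatch MetricData."""
    starts: Dict[str, float] = {}
    durations: List[float] = []
    completed = failed = 0
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["type"] == "BUILD_START":
            starts[ev["buildId"]] = ev["timestamp"]
        elif ev["type"] == "BUILD_COMPLETE" and ev["buildId"] in starts:
            # Assumes millisecond timestamps; report seconds.
            durations.append((ev["timestamp"] - starts.pop(ev["buildId"])) / 1000)
            completed += 1
            if ev["data"]["status"] == "failure":  # assumed status value
                failed += 1
        # Completions without a recorded start are skipped.
    metrics = [
        {"MetricName": "BuildDuration", "Value": d, "Unit": "Seconds"}
        for d in durations
    ]
    if completed:
        metrics.append(
            {"MetricName": "ErrorRate", "Value": 100.0 * failed / completed, "Unit": "Percent"}
        )
    return metrics
```

The result would be sent with something like `boto3.client("cloudwatch").put_metric_data(Namespace=..., MetricData=metrics)`, where the namespace is your choice. Connection count and message latency would come from API Gateway's own metrics or from the handlers.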

Logging

  • Build logs in CloudWatch
  • Agent connection logs
  • Build state transitions
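
Logging state transitions is most useful when out-of-order events are caught rather than silently recorded. A sketch assuming each build moves `BUILD_START`, then zero or more `BUILD_UPDATE`, then `BUILD_COMPLETE`; the transition table and `BuildStateTracker` are hypothetical, since the README does not define the allowed sequences.

```python
from typing import Dict, Optional, Set

# Hypothetical transition table; None means no event seen yet for the build.
ALLOWED: Dict[Optional[str], Set[str]] = {
    None: {"BUILD_START"},
    "BUILD_START": {"BUILD_UPDATE", "BUILD_COMPLETE"},
    "BUILD_UPDATE": {"BUILD_UPDATE", "BUILD_COMPLETE"},
    "BUILD_COMPLETE": set(),  # terminal state
}


class BuildStateTracker:
    """Track the last event per build and reject invalid transitions."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}

    def apply(self, build_id: str, event: str) -> str:
        current = self.state.get(build_id)
        if event not in ALLOWED[current]:
            raise ValueError(f"{build_id}: {current} -> {event} not allowed")
        self.state[build_id] = event
        # Return a log line describing the transition.
        return f"{build_id}: {current or 'NEW'} -> {event}"
```

A rejected transition, such as an update arriving after completion, is itself worth logging because it usually points to a duplicated or delayed message.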

Security

WebSocket Security

  • API key authentication
  • JWT for client connections
  • Message validation
  • Rate limiting

Data Protection

  • In-transit encryption (WSS)
  • At-rest encryption (S3/DynamoDB)
  • Access logging

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add feature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • PyTorch HUD
  • AWS WebSocket API
  • ML Compiler Communities
