A flexible, extensible, and customizable pipeline.
jiaoziflow is a pipeline framework for the concurrent preprocessing of tables, images, videos, and text. Users define and customize the behavior of each node, and the framework integrates with Jiaozifs to work with versioned data. jiaoziflow is built for cloud-native deployment and scales to handle large data volumes.
- Multi-type Data Support: Process table data, images, videos, and text.
- Concurrent Execution: Leverage parallel processing for high efficiency and scalability.
- Customizable Nodes: Users can freely define and tailor the behavior of each pipeline node.
- Jiaozifs Integration: Enhanced data versioning capabilities for more robust data management.
- Cloud-Native: Designed for easy deployment and scaling in cloud environments.
- Rust: Requires Rust 1.80.1 or higher.
- MongoDB: Stores runtime data.
- Protobuf: Protocol Buffers are used for data exchange between nodes; the Protobuf compiler must be installed.
- Kubernetes: Used for deployment and scaling. Requires Kubernetes 1.21 or higher.
- StorageClass: Requires a storage class named jz-flow-fs.
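The prerequisites above can be checked before building. The following is a minimal sketch, not part of jiaoziflow itself: the helper names `version_ge` and `check_tools` are hypothetical, `sort -V` assumes GNU coreutils, and the StorageClass lookup assumes `kubectl` is already configured for the target cluster.

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check; helper names are illustrative only.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints "ok: <tool>" or "missing: <tool>" for each command name given.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

check_tools rustc cargo protoc kubectl

# Compare the installed Rust version against the documented minimum.
if command -v rustc >/dev/null 2>&1; then
    rust_version="$(rustc --version | awk '{print $2}')"
    if version_ge "$rust_version" "1.80.1"; then
        echo "ok: rustc $rust_version"
    else
        echo "too old: rustc $rust_version (need 1.80.1 or higher)"
    fi
fi

# Look up the required StorageClass on the current cluster.
if command -v kubectl >/dev/null 2>&1; then
    if kubectl get storageclass jz-flow-fs --request-timeout=5s >/dev/null 2>&1; then
        echo "ok: StorageClass jz-flow-fs"
    else
        echo "missing: StorageClass jz-flow-fs"
    fi
fi
```

The script only reports what it finds and never exits early, so it can be run on a machine where some tools are not installed yet.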
sudo apt-get install -y protobuf-compiler pkg-config libssl-dev
git clone https://github.com/GitDataAI/jiaoziflow.git
cd jiaoziflow
make build-jz
# Don't specify a database in the Mongo URL; it is created dynamically.
./dist/jz-flow daemon --mongo-url mongodb://<ip>:27017
./dist/jz-flow job create --name simple --path ./script/example_dag.json # Create a job and deploy all pods
./dist/jz-flow job detail <job id> # Monitor the job's details
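Because the database must not be named in the URL passed to `--mongo-url`, a small guard can catch that mistake before the daemon starts. This is a sketch under assumptions: `url_has_database` and the `MONGO_URL` environment variable are hypothetical names, not jz-flow options, and the script prints the daemon command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: reject a MongoDB URL that names a database.
# url_has_database and MONGO_URL are illustrative names only.

# Succeeds when the URL has a non-empty path after host:port.
url_has_database() {
    rest="${1#mongodb://}"   # strip the scheme
    rest="${rest%%\?*}"      # strip any query string
    case "$rest" in
        */?*) return 0 ;;    # something follows the first slash
        *)    return 1 ;;
    esac
}

url="${MONGO_URL:-mongodb://127.0.0.1:27017}"

if url_has_database "$url"; then
    echo "remove the database name from $url" >&2
else
    # Print the command instead of running it, so the sketch is safe to try.
    echo "./dist/jz-flow daemon --mongo-url $url"
fi
```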