I cannot get benchmarks running in k8s. I suspect that too many tasks are being scheduled in parallel.
I added resource constraints in the code:
    @ray.remote(num_cpus=1)
    def execute_query_stage(
        ...

    @ray.remote(num_cpus=1)
    def execute_query_partition(
I am running the benchmark with
    RAY_ADDRESS='http://localhost:8265' ray job submit --working-dir `pwd` -- \
        python3 tpcbench.py \
        --benchmark tpch \
        --queries /home/ray/datafusion-benchmarks/tpch/queries/ \
        --data /mnt/bigdata/tpch/sf100 \
        --concurrency 4
My cluster definition is:
    apiVersion: ray.io/v1alpha1
    kind: RayCluster
    metadata:
      name: datafusion-ray-cluster
    spec:
      headGroupSpec:
        rayStartParams:
          num-cpus: "0"
        template:
          spec:
            containers:
              - name: ray-head
                image: andygrove/datafusion-ray-tpch:latest
                imagePullPolicy: Always
                resources:
                  limits:
                    cpu: 2
                    memory: 8Gi
                  requests:
                    cpu: 2
                    memory: 8Gi
                volumeMounts:
                  - mountPath: /mnt/bigdata  # Mount path inside the container
                    name: ray-storage
            volumes:
              - name: ray-storage
                persistentVolumeClaim:
                  claimName: ray-pvc  # Reference the PVC name here
      workerGroupSpecs:
        - replicas: 2
          groupName: "datafusion-ray"
          rayStartParams:
            num-cpus: "4"
          template:
            spec:
              containers:
                - name: ray-worker
                  image: andygrove/datafusion-ray-tpch:latest
                  imagePullPolicy: Always
                  resources:
                    limits:
                      cpu: 5
                      memory: 64Gi
                    requests:
                      cpu: 5
                      memory: 64Gi
                  volumeMounts:
                    - mountPath: /mnt/bigdata
                      name: ray-storage
              volumes:
                - name: ray-storage
                  persistentVolumeClaim:
                    claimName: ray-pvc
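For reference, here is a rough way to work out how many tasks this configuration should allow at once. It is only a sketch: it assumes Ray schedules purely on logical CPUs (the `num-cpus` values in `rayStartParams`, not the container `cpu` limits) and that each task holds its CPU slot for its whole lifetime. `NodeGroup` and `max_concurrent_tasks` are hypothetical helpers written for this illustration, not part of Ray or datafusion-ray.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    """One group of Ray nodes, as declared in the RayCluster spec (hypothetical helper)."""
    name: str
    replicas: int
    num_cpus: int  # logical CPUs advertised per node via rayStartParams "num-cpus"


def max_concurrent_tasks(groups: list[NodeGroup], cpus_per_task: int = 1) -> int:
    """Upper bound on tasks that can hold a CPU slot at the same time.

    A task cannot span nodes, so slots are counted per node and then summed.
    """
    return sum(g.replicas * (g.num_cpus // cpus_per_task) for g in groups)


# Values copied from the cluster definition above.
cluster = [
    NodeGroup(name="head", replicas=1, num_cpus=0),
    NodeGroup(name="datafusion-ray", replicas=2, num_cpus=4),
]

slots = max_concurrent_tasks(cluster, cpus_per_task=1)
print(f"at most {slots} tasks with num_cpus=1 can run at once")
```

With the values above this gives 8 slots (2 workers with 4 logical CPUs each, and the head contributes none). Under these assumptions, seeing more than 8 of the annotated tasks running in parallel would suggest that something other than the `num_cpus=1` annotation is deciding the concurrency.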
I build my image with this Dockerfile, which extends the datafusion-ray image built from the repo.
    FROM andygrove/datafusion-ray

    RUN sudo apt update && \
        sudo apt install -y git

    RUN git clone https://github.com/apache/datafusion-benchmarks.git