Failed to create GIE, with [WARNING][coordinator:297]: Connect to analytical engine failed, engine may not started or closed. code: UNAVAILABLE, details: failed to connect to all addresses #819
Replies: 3 comments 5 replies
-
Hi, @sunnyshark2018
This warning is expected, as the GAE engine may take some time to launch. As for the failure to launch GIE, could you give more details, such as the schema of the datasets and how you deployed your Kubernetes cluster? Thanks.
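To illustrate why a connection warning during startup can be harmless, here is a minimal sketch of a client that retries a TCP connection until the service is up. The helper name `wait_for_endpoint` and its parameters are hypothetical and are not part of GraphScope; the sketch only checks that a port accepts connections, not that the engine is healthy.

```python
import socket
import time


def wait_for_endpoint(host: str, port: int,
                      timeout_s: float = 60.0,
                      interval_s: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration only; not a GraphScope API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: the service may still be starting.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Early attempts fail while the service is still starting, which corresponds to the kind of warning shown in the logs; only a failure that persists past the deadline indicates a real problem.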
-
Hi, these two problems have tortured me for two days. Are they solved now?
-
Manually running the command inside the coordinator container to bring up GIE succeeds. You can see that the GIE executors were brought up on both pods.
-
1. Creating the interactive instance reports an error; GIE cannot be created:
interactive = sess.gremlin(g)
2021-09-15 18:04:46,253 [INFO][cluster:375]: Create GIE instance with command: /opt/graphscope/bin/giectl create_gremlin_instance_on_k8s /tmp/gs/fwkpmu/session_fsnvsldl 243021195744897 /tmp/graph_P07U20o0.json gs-engine-fwkpmu-hnbk4,gs-engine-fwkpmu-mwn9j engine gaia.engine.port:40476 False coordinator-fwkpmu
2021-09-15 06:05:46,371 [ERROR][rpc:232]: Runstep failed with code: INTERACTIVE_ENGINE_INTERNAL_ERROR, message: Error occurred during preprocessing, The traceback is: Traceback (most recent call last):
File "/home/graphscope/.local/lib/python3.6/site-packages/gscoordinator/coordinator.py", line 676, in _create_interactive_instance
outs, errs = proc.communicate(timeout=60)
File "/usr/lib64/python3.6/subprocess.py", line 863, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib64/python3.6/subprocess.py", line 1535, in _communicate
self._check_timeout(endtime, orig_timeout)
File "/usr/lib64/python3.6/subprocess.py", line 891, in _check_timeout
raise TimeoutExpired(self.args, orig_timeout)
subprocess.TimeoutExpired: Command '['/opt/graphscope/bin/giectl', 'create_gremlin_instance_on_k8s', '/tmp/gs/fwkpmu/session_fsnvsldl', '243021195744897', '/tmp/graph_P07U20o0.json', 'gs-engine-fwkpmu-hnbk4,gs-engine-fwkpmu-mwn9j', 'engine', 'gaia.engine.port:40476', 'False', 'coordinator-fwkpmu']' timed out after 60 seconds
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/graphscope/.local/lib/python3.6/site-packages/gscoordinator/coordinator.py", line 512, in RunStep
request.session_id, dag_def, op_results
File "/home/graphscope/.local/lib/python3.6/site-packages/gscoordinator/coordinator.py", line 428, in run_on_interactive_engine
op_result = self._create_interactive_instance(op)
File "/home/graphscope/.local/lib/python3.6/site-packages/gscoordinator/coordinator.py", line 731, in _create_interactive_instance
raise RuntimeError("Create interactive instance failed.") from e
RuntimeError: Create interactive instance failed.
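The traceback above shows the coordinator waiting on `giectl` with `proc.communicate(timeout=60)` and raising once 60 seconds pass. The following is a minimal sketch of that subprocess-with-timeout pattern, useful for checking whether a command is merely slow rather than failing. The function `run_with_timeout` is a hypothetical name and this is not the coordinator's actual code; it assumes Python 3.7+ for the `text` argument.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, stdout, stderr).

    returncode is None if the command exceeded timeout_s and was killed.
    Hypothetical helper for illustration only; not GraphScope code.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        outs, errs = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the child and drain its pipes so partial output is not lost.
        proc.kill()
        outs, errs = proc.communicate()
        return None, outs, errs
    return proc.returncode, outs, errs


if __name__ == "__main__":
    # Stand-in command; replace with the command under investigation.
    rc, outs, errs = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
    print(rc, outs.strip())
```

Returning the partial stdout and stderr after a timeout makes it easier to see how far the command got before it was killed.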
2. Creating the session always reports an error:
sess = graphscope.session(k8s_coordinator_cpu=2, k8s_coordinator_mem="4Gi", k8s_volumes=k8s_volumes)
2021-09-15 05:52:14,915 [INFO][session:640]: Initializing graphscope session with parameters: {'addr': None, 'mode': 'eager', 'cluster_type': 'k8s', 'num_workers': 2, 'preemptive': True, 'k8s_namespace': None, 'k8s_service_type': 'NodePort', 'k8s_gs_image': 'registry.cn-hongkong.aliyuncs.com/graphscope/graphscope:0.7.0', 'k8s_etcd_image': 'quay.io/coreos/etcd:v3.4.13', 'k8s_image_pull_policy': 'IfNotPresent', 'k8s_image_pull_secrets': [], 'k8s_coordinator_cpu': 2, 'k8s_coordinator_mem': '4Gi', 'k8s_etcd_num_pods': 1, 'k8s_etcd_cpu': 1.0, 'k8s_etcd_mem': '512Mi', 'k8s_vineyard_daemonset': 'none', 'k8s_vineyard_cpu': 0.2, 'k8s_vineyard_mem': '512Mi', 'vineyard_shared_mem': '4Gi', 'k8s_engine_cpu': 0.2, 'k8s_engine_mem': '1Gi', 'k8s_mars_worker_cpu': 0.2, 'k8s_mars_worker_mem': '512Mi', 'k8s_mars_scheduler_cpu': 0.2, 'k8s_mars_scheduler_mem': '512Mi', 'with_mars': False, 'enable_gaia': False, 'reconnect': False, 'k8s_volumes': {'data': {'type': 'hostPath', 'field': {'path': '/kg/test_data/', 'type': 'Directory'}, 'mounts': {'mountPath': '/home/graphscope'}}}, 'k8s_waiting_for_delete': False, 'timeout_seconds': 600, 'dangling_timeout_seconds': 600, 'k8s_client_config': {}}
2021-09-15 05:52:19,660 [INFO][cluster:308]: Launching coordinator...
2021-09-15 05:52:26,555 [INFO][utils:167]: coordinator-fwkpmu-6f684945f8-ckcht: Successfully assigned gs-gmknss/coordinator-fwkpmu-6f684945f8-ckcht to k8s-master
2021-09-15 05:52:59,975 [INFO][utils:167]: coordinator-fwkpmu-6f684945f8-ckcht: Container image "registry.cn-hongkong.aliyuncs.com/graphscope/graphscope:0.7.0" already present on machine
2021-09-15 05:53:25,171 [INFO][utils:167]: coordinator-fwkpmu-6f684945f8-ckcht: Created container coordinator
2021-09-15 05:53:29,419 [INFO][utils:167]: coordinator-fwkpmu-6f684945f8-ckcht: Started container coordinator
2021-09-15 17:53:37,173 [INFO][cluster:614]: Launching etcd ...
2021-09-15 05:53:41,689 [INFO][utils:167]: coordinator-fwkpmu-6f684945f8-ckcht: Readiness probe failed: dial tcp 10.244.0.43:59990: connect: connection refused
2021-09-15 17:53:43,796 [INFO][cluster:817]: Etcd is ready, endpoint is 10.101.120.243:58729
2021-09-15 17:53:43,797 [INFO][cluster:820]: Creating interactive engine service...
2021-09-15 17:53:43,797 [INFO][cluster:766]: Launching zetcd proxy service ...
2021-09-15 17:53:43,799 [INFO][cluster:781]: zetcd cmd /usr/local/bin/zetcd --zkaddr 0.0.0.0:2181 --endpoints http://gs-etcd-service-fwkpmu:58729,http://gs-etcd-fwkpmu-0:58729
Running zetcd proxy
Version: Version not provided (use make instead of go build)
SHA: SHA not provided (use make instead of go build)
2021-09-15 17:53:44,839 [INFO][cluster:810]: ZEtcd is ready, endpoint is 10.244.0.43:2181
2021-09-15 17:53:44,840 [INFO][cluster:828]: Creating engine replicaset...
2021-09-15 17:53:44,841 [INFO][cluster:505]: Launching GraphScope engines pod ...
2021-09-15 17:53:56,217 [INFO][cluster:879]: [gs-engine-fwkpmu-hnbk4]: Successfully assigned gs-gmknss/gs-engine-fwkpmu-hnbk4 to k8s-master
2021-09-15 17:53:57,097 [INFO][cluster:879]: [gs-engine-fwkpmu-mwn9j]: Successfully assigned gs-gmknss/gs-engine-fwkpmu-mwn9j to k8s-master
2021-09-15 17:54:55,970 [INFO][cluster:879]: [gs-engine-fwkpmu-mwn9j]: Container image "registry.cn-hongkong.aliyuncs.com/graphscope/graphscope:0.7.0" already present on machine
2021-09-15 17:55:14,603 [INFO][cluster:879]: [gs-engine-fwkpmu-hnbk4]: Container image "registry.cn-hongkong.aliyuncs.com/graphscope/graphscope:0.7.0" already present on machine
2021-09-15 17:55:32,688 [INFO][cluster:879]: [gs-engine-fwkpmu-mwn9j]: Created container engine
2021-09-15 17:55:45,774 [INFO][cluster:879]: [gs-engine-fwkpmu-mwn9j]: Started container engine
2021-09-15 17:55:52,243 [INFO][cluster:879]: [gs-engine-fwkpmu-hnbk4]: Created container engine
2021-09-15 17:56:01,819 [INFO][cluster:879]: [gs-engine-fwkpmu-hnbk4]: Started container engine
2021-09-15 17:56:22,701 [INFO][cluster:879]: [gs-engine-fwkpmu-mwn9j]: Created container vineyard
2021-09-15 17:56:35,629 [INFO][cluster:879]: [gs-engine-fwkpmu-mwn9j]: Started container vineyard
2021-09-15 17:56:39,644 [INFO][cluster:879]: [gs-engine-fwkpmu-hnbk4]: Created container vineyard
2021-09-15 17:56:39,648 [INFO][cluster:879]: [gs-engine-fwkpmu-hnbk4]: Started container vineyard
2021-09-15 17:56:50,550 [INFO][cluster:915]: GraphScope engines pod is ready.
2021-09-15 17:56:51,283 [INFO][cluster:1052]: Engines pod name list: ['gs-engine-fwkpmu-hnbk4', 'gs-engine-fwkpmu-mwn9j']
2021-09-15 17:56:51,283 [INFO][cluster:1053]: Engines pod ip list: ['10.244.0.45', '10.244.0.44']
2021-09-15 17:56:51,284 [INFO][cluster:1054]: Engines pod host ip list: ['192.168.3.10', '192.168.3.10']
2021-09-15 17:56:51,284 [INFO][cluster:1056]: Vineyard service endpoint: 192.168.3.10:30794
2021-09-15 17:56:51,284 [INFO][cluster:941]: Starting GAE rpc service on 10.244.0.45:56032 ...
2021-09-15 17:56:57,074 [INFO][coordinator:1395]: Coordinator server listen at 0.0.0.0:59990
2021-09-15 05:57:06,978 [INFO][cluster:556]: Coordinator pod start successful with address 192.168.3.10:30184, connecting to service ...
2021-09-15 05:57:07,014 [WARNING][rpc:124]: Heart beat analytical engine failed, code: DEADLINE_EXCEEDED, details: Connect to analytical engine failed, engine may not started or closed. code: UNAVAILABLE, details: failed to connect to all addresses
2021-09-15 05:57:08,031 [WARNING][rpc:124]: Heart beat analytical engine failed, code: DEADLINE_EXCEEDED, details: Connect to analytical engine failed, engine may not started or closed. code: UNAVAILABLE, details: failed to connect to all addresses