error "SSL certificate verify failed" when downloading data with intake-esm. #361
-
@claudia-wieners, sorry to hear that you're running into issues while accessing data from Google Cloud Storage. This may have to do with the versions of the packages in your environment. Could you share the output of the following?

    import intake_esm
    print(intake_esm.show_versions())
-
I'm a visiting scientist this month at an organization that uses a self-signed certificate on Azure, and we are hitting this issue.

    col = intake.open_esm_datastore("https://cmip6-pds.s3.amazonaws.com/pangeo-cmip6.json",)

fails with:

    ClientConnectorError: Cannot connect to host cmip6-pds.s3.amazonaws.com:443 ssl:default [Connection reset by peer]

I'm able to set up a connection with:

    import ssl
    import aiohttp
    CA_BUNDLE = "/etc/pki/tls/certs/ca-bundle.crt"
    PEM_PUB = "/home/la.signell/PA-RootCA-Cert-2023-Pub.pem"
    ssl_ctx = ssl.create_default_context(cafile=CA_BUNDLE)
    ssl_ctx.load_verify_locations(PEM_PUB)

We know this certificate works because we told Firefox to trust it, and Firefox can then open the catalog. But how do we communicate this to Intake? I tried:

    col = intake.open_esm_datastore("https://cmip6-pds.s3.amazonaws.com/pangeo-cmip6.json",
                                    storage_options={'ssl': ssl_ctx})

but that gave a TypeError:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File <timed exec>:4
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/intake_esm/core.py:107, in esm_datastore.__init__(self, obj, progressbar, sep, registry, read_csv_kwargs, columns_with_iterables, storage_options, **intake_kwargs)
105 self.esmcat = ESMCatalogModel.from_dict(obj)
106 else:
--> 107 self.esmcat = ESMCatalogModel.load(
108 obj, storage_options=self.storage_options, read_csv_kwargs=read_csv_kwargs
109 )
111 self.derivedcat = registry or default_registry
112 self._entries = {}
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/intake_esm/cat.py:264, in ESMCatalogModel.load(cls, json_file, storage_options, read_csv_kwargs)
262 csv_path = f'{os.path.dirname(_mapper.root)}/{cat.catalog_file}'
263 cat.catalog_file = csv_path
--> 264 df = pd.read_csv(
265 cat.catalog_file,
266 storage_options=storage_options,
267 **read_csv_kwargs,
268 )
269 else:
270 df = pd.DataFrame(cat.catalog_dict)
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/pandas/io/parsers/readers.py:912, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
899 kwds_defaults = _refine_defaults_read(
900 dialect,
901 delimiter,
(...)
908 dtype_backend=dtype_backend,
909 )
910 kwds.update(kwds_defaults)
--> 912 return _read(filepath_or_buffer, kwds)
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/pandas/io/parsers/readers.py:577, in _read(filepath_or_buffer, kwds)
574 _validate_names(kwds.get("names", None))
576 # Create the parser.
--> 577 parser = TextFileReader(filepath_or_buffer, **kwds)
579 if chunksize or iterator:
580 return parser
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1407, in TextFileReader.__init__(self, f, engine, **kwds)
1404 self.options["has_index_names"] = kwds["has_index_names"]
1406 self.handles: IOHandles | None = None
-> 1407 self._engine = self._make_engine(f, self.engine)
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1661, in TextFileReader._make_engine(self, f, engine)
1659 if "b" not in mode:
1660 mode += "b"
-> 1661 self.handles = get_handle(
1662 f,
1663 mode,
1664 encoding=self.options.get("encoding", None),
1665 compression=self.options.get("compression", None),
1666 memory_map=self.options.get("memory_map", False),
1667 is_text=is_text,
1668 errors=self.options.get("encoding_errors", "strict"),
1669 storage_options=self.options.get("storage_options", None),
1670 )
1671 assert self.handles is not None
1672 f = self.handles.handle
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/pandas/io/common.py:716, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
713 codecs.lookup_error(errors)
715 # open URLs
--> 716 ioargs = _get_filepath_or_buffer(
717 path_or_buf,
718 encoding=encoding,
719 compression=compression,
720 mode=mode,
721 storage_options=storage_options,
722 )
724 handle = ioargs.filepath_or_buffer
725 handles: list[BaseBuffer]
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/pandas/io/common.py:368, in _get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
366 # assuming storage_options is to be interpreted as headers
367 req_info = urllib.request.Request(filepath_or_buffer, headers=storage_options)
--> 368 with urlopen(req_info) as req:
369 content_encoding = req.headers.get("Content-Encoding", None)
370 if content_encoding == "gzip":
371 # Override compression based on Content-Encoding header
File ~/miniforge3/envs/pangeo/lib/python3.10/site-packages/pandas/io/common.py:270, in urlopen(*args, **kwargs)
264 """
265 Lazy-import wrapper for stdlib urlopen, as that imports a big chunk of
266 the stdlib.
267 """
268 import urllib.request
--> 270 return urllib.request.urlopen(*args, **kwargs)
File ~/miniforge3/envs/pangeo/lib/python3.10/urllib/request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
214 else:
215 opener = _opener
--> 216 return opener.open(url, data, timeout)
File ~/miniforge3/envs/pangeo/lib/python3.10/urllib/request.py:519, in OpenerDirector.open(self, fullurl, data, timeout)
516 req = meth(req)
518 sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 519 response = self._open(req, data)
521 # post-process response
522 meth_name = protocol+"_response"
File ~/miniforge3/envs/pangeo/lib/python3.10/urllib/request.py:536, in OpenerDirector._open(self, req, data)
533 return result
535 protocol = req.type
--> 536 result = self._call_chain(self.handle_open, protocol, protocol +
537 '_open', req)
538 if result:
539 return result
File ~/miniforge3/envs/pangeo/lib/python3.10/urllib/request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
494 for handler in handlers:
495 func = getattr(handler, meth_name)
--> 496 result = func(*args)
497 if result is not None:
498 return result
File ~/miniforge3/envs/pangeo/lib/python3.10/urllib/request.py:1391, in HTTPSHandler.https_open(self, req)
1390 def https_open(self, req):
-> 1391 return self.do_open(http.client.HTTPSConnection, req,
1392 context=self._context, check_hostname=self._check_hostname)
File ~/miniforge3/envs/pangeo/lib/python3.10/urllib/request.py:1348, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
1346 try:
1347 try:
-> 1348 h.request(req.get_method(), req.selector, req.data, headers,
1349 encode_chunked=req.has_header('Transfer-encoding'))
1350 except OSError as err: # timeout error
1351 raise URLError(err)
File ~/miniforge3/envs/pangeo/lib/python3.10/http/client.py:1283, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
1280 def request(self, method, url, body=None, headers={}, *,
1281 encode_chunked=False):
1282 """Send a complete request to the server."""
-> 1283 self._send_request(method, url, body, headers, encode_chunked)
File ~/miniforge3/envs/pangeo/lib/python3.10/http/client.py:1324, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
1321 encode_chunked = False
1323 for hdr, value in headers.items():
-> 1324 self.putheader(hdr, value)
1325 if isinstance(body, str):
1326 # RFC 2616 Section 3.7.1 says that text default has a
1327 # default charset of iso-8859-1.
1328 body = _encode(body, 'body')
File ~/miniforge3/envs/pangeo/lib/python3.10/http/client.py:1260, in HTTPConnection.putheader(self, header, *values)
1257 elif isinstance(one_value, int):
1258 values[i] = str(one_value).encode('ascii')
-> 1260 if _is_illegal_header_value(values[i]):
1261 raise ValueError('Invalid header value %r' % (values[i],))
1263 value = b'\r\n\t'.join(values)
TypeError: expected string or bytes-like object
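
A note on reading the traceback above: for a plain `https://` URL, pandas does not appear to go through fsspec here. As the comment at `pandas/io/common.py:366` says, it interprets `storage_options` as HTTP headers and passes them to `urllib`. The `SSLContext` therefore ends up as the value of a header named `ssl`, which `http.client` cannot validate. The following stdlib-only sketch reproduces the same `TypeError` without sending anything over the network (the request is only buffered, never sent); it is an illustration of the failure, not a fix.

```python
import http.client
import ssl

ssl_ctx = ssl.create_default_context()

# Creating the connection object and calling putrequest() only buffers
# the request line; no socket is opened at this point.
conn = http.client.HTTPSConnection("cmip6-pds.s3.amazonaws.com", context=ssl_ctx)
conn.putrequest("GET", "/pangeo-cmip6.json")

try:
    # This mirrors what happens to storage_options={'ssl': ssl_ctx}:
    # the SSLContext is treated as the value of an HTTP header named "ssl".
    conn.putheader("ssl", ssl_ctx)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

So on this code path `storage_options` is not a route for an `SSLContext`. `urllib.request.urlopen` does accept one through its `context` argument (visible in the signature in the traceback), so fetching the CSV manually and handing the bytes to pandas is one possible workaround. Whether intake-esm exposes a hook for this is not something the traceback shows.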
-
@mgrover1, any ideas?
-
Dear all,
I am new to intake-esm and also fairly new to Python, so my question may be naive; sorry for that. Answers would nonetheless be highly appreciated!
Basically following this example (http://gallery.pangeo.io/repos/pangeo-gallery/cmip6/global_mean_surface_temp.html), I managed to view and search the metadata in the file https://storage.googleapis.com/cmip6/pangeo-cmip6.json without problems.
However, when trying to actually download data, I get the following error:
Cannot connect to host storage.googleapis.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1123)')]
Some googling suggests that I might need to somehow tell python to accept the certificate of the data source I'm using.
I tried adding the certificate of "http://gallery.pangeo.io/repos/pangeo-gallery/cmip6/global_mean_surface_temp.html" to the cacert.pem file in /path-to-my-environment/lib/python3.8/site-packages/certifi/cacert.pem.
To get this certificate, I opened the website in Firefox, clicked on the lock icon, and eventually clicked "download PEM chain".
However, this did not work. Typing "export REQUESTS_CA_BUNDLE=/path/to/your/certificate.pem" into a terminal did not work either, where '/path/to/your/certificate.pem' is the path to the downloaded PEM file containing the URL's certificate.
(I used this website for reference: https://stackoverflow.com/questions/30405867/how-to-get-python-requests-to-trust-a-self-signed-ssl-certificate)
Can anyone help me with how to proceed?
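
A possible reason why `REQUESTS_CA_BUNDLE` has no effect: that variable is read by the `requests` library, while the format of the error message (`Cannot connect to host ... ssl:True`) suggests the download goes through aiohttp, which relies on Python's own `ssl` module. Assuming a standard OpenSSL-backed Python build, the `ssl` module looks at `SSL_CERT_FILE` and `SSL_CERT_DIR` instead. Note also that the failing host is storage.googleapis.com, not gallery.pangeo.io, so the certificate to trust would be whatever is presented for that host on your network. The sketch below only inspects where the interpreter looks for CA certificates; it does not change anything.

```python
import os
import ssl

paths = ssl.get_default_verify_paths()

# Names of the environment variables this Python/OpenSSL build honours
print("CA file env var:", paths.openssl_cafile_env)
print("CA dir env var: ", paths.openssl_capath_env)

# Locations currently in effect (None if the file or directory does not exist)
print("CA file in use: ", paths.cafile)
print("CA dir in use:  ", paths.capath)

# Whether those variables are set in this process at all
print("cafile env value:", os.environ.get(paths.openssl_cafile_env))
print("capath env value:", os.environ.get(paths.openssl_capath_env))
```

If the reported variable is `SSL_CERT_FILE`, exporting it so that it points at a bundle containing the extra root certificate before starting Python may be worth a try. This is a guess about the setup, not a confirmed fix.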