Changes and updates for metadata search #29

Merged
merged 13 commits into from
Feb 27, 2024
174 changes: 84 additions & 90 deletions src/sparc/client/services/metadata.py
@@ -52,8 +52,6 @@
"Accept": "application/json; charset=utf-8",
}

host_api = "https://scicrunch.org/api/1/elastic"

scicrunch_api_key: str = None
profile_name: str = None

@@ -123,103 +121,99 @@
# Function to GET content from URL with retries
def getURL(self, url, headers="NONE"):
result = "[ERROR]"
url_session = requests.Session()

Check warning on line 124 in src/sparc/client/services/metadata.py

GitHub Actions / reviewdog

[formatters] reported by reviewdog 🐶 Raw Output: src/sparc/client/services/metadata.py:124:- src/sparc/client/services/metadata.py:125:- with requests.Session() as url_session:

with requests.Session() as url_session:

retries = Retry(
total=6, backoff_factor=1, status_forcelist=[403, 404, 413, 429, 500, 502, 503, 504]
)
retries = Retry(

Check warning on line 127 in src/sparc/client/services/metadata.py

GitHub Actions / reviewdog

[formatters] reported by reviewdog 🐶 Raw Output: src/sparc/client/services/metadata.py:125:+ with requests.Session() as url_session:

total=6, backoff_factor=1, status_forcelist=[403, 404, 413, 429, 500, 502, 503, 504]

Check warning on line 128 in src/sparc/client/services/metadata.py

GitHub Actions / reviewdog

[formatters] reported by reviewdog 🐶 Raw Output: src/sparc/client/services/metadata.py:128:- total=6, backoff_factor=1, status_forcelist=[403, 404, 413, 429, 500, 502, 503, 504] src/sparc/client/services/metadata.py:127:+ total=6, src/sparc/client/services/metadata.py:128:+ backoff_factor=1, src/sparc/client/services/metadata.py:129:+ status_forcelist=[403, 404, 413, 429, 500, 502, 503, 504],

)

url_session.mount("https://", HTTPAdapter(max_retries=retries))

self.es_success = 1

try:
if headers == "NONE":
url_result = url_session.get(url)
else:
url_result = url_session.get(url, headers=headers)

if url_result.status_code == 410:
logging.warning("Retrieval Status 410 - URL Unpublished:" + url)
else:
url_result.raise_for_status()

except requests.exceptions.HTTPError as errh:
logging.error("Retrieving URL - HTTP Error:", errh)
self.es_success = 0
except requests.exceptions.ConnectionError as errc:
logging.error("Retrieving URL - Error Connecting:", errc)
self.es_success = 0
except requests.exceptions.Timeout as errt:
logging.error("Retrieving URL - Timeout Error:", errt)
self.es_success = 0
except requests.exceptions.RequestException as err:
logging.error("Retrieving URL - Something Else", err)
self.es_success = 0

url_session.close()

if self.es_success == 1:
result = url_result
else:
result = {}
url_session.mount("https://", HTTPAdapter(max_retries=retries))

self.es_success = 1

try:
if headers == "NONE":
url_result = url_session.get(url)
else:
url_result = url_session.get(url, headers=headers)

if url_result.status_code == 410:
logging.warning("Retrieval Status 410 - URL Unpublished:" + url)
else:
url_result.raise_for_status()

except requests.exceptions.HTTPError as errh:
logging.error("Retrieving URL - HTTP Error:", errh)
self.es_success = 0
except requests.exceptions.ConnectionError as errc:
logging.error("Retrieving URL - Error Connecting:", errc)
self.es_success = 0
except requests.exceptions.Timeout as errt:
logging.error("Retrieving URL - Timeout Error:", errt)
self.es_success = 0
except requests.exceptions.RequestException as err:
logging.error("Retrieving URL - Something Else", err)
self.es_success = 0

return result.json()
url_session.close()
hsorby marked this conversation as resolved.

result = url_result if self.es_success == 1 else {}

return result.json()
hsorby marked this conversation as resolved.
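
Review note: a minimal sketch of how the GET helper could look once the session is managed by `with`. This is not the merged code; `get_json`, `RETRY_STATUSES` and the `timeout` default are hypothetical names and values chosen for illustration. Two details in the diff above motivate it. First, `logging.error("Retrieving URL - HTTP Error:", errh)` passes `errh` as a format argument without a `%s` placeholder, so the logging module reports a formatting error on stderr instead of the intended message. Second, on failure `result` is `{}`, and `{}.json()` raises `AttributeError`.

```python
import logging

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logger = logging.getLogger(__name__)

# Same status codes as the status_forcelist in the diff.
RETRY_STATUSES = [403, 404, 413, 429, 500, 502, 503, 504]


def get_json(url, headers=None, timeout=30):
    """GET `url` with retries; return the decoded JSON body, or {} on failure.

    Hypothetical helper: the name and the timeout default are assumptions,
    not part of the original module.
    """
    retries = Retry(total=6, backoff_factor=1, status_forcelist=RETRY_STATUSES)

    # The context manager closes the session on exit, so no explicit
    # url_session.close() call is needed inside the block.
    with requests.Session() as session:
        session.mount("https://", HTTPAdapter(max_retries=retries))
        try:
            response = session.get(url, headers=headers, timeout=timeout)
            if response.status_code == 410:
                # Lazy %-style formatting: the URL is passed as an argument.
                logger.warning("Retrieval Status 410 - URL Unpublished: %s", url)
            else:
                response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as err:
            # HTTPError, ConnectionError and Timeout all derive from
            # RequestException, so one handler covers them.
            logger.error("Retrieving URL - %s: %s", type(err).__name__, err)
        except ValueError as err:
            # Raised when the body is not valid JSON.
            logger.error("Response body is not valid JSON: %s", err)

    # Failure path: return an empty dict directly instead of calling .json() on it.
    return {}
```

Whether a 410 response should yield its JSON body or `{}` is a design choice the diff does not settle; the sketch keeps the original behaviour of not treating 410 as an error.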

#####################################################################
# Function to retrieve content via POST from URL with retries
def postURL(self, url, body, headers="NONE"):
result = "[ERROR]"
url_session = requests.Session()

retries = Retry(
total=6, backoff_factor=1, status_forcelist=[403, 404, 413, 429, 500, 502, 503, 504]
)

url_session.mount("https://", HTTPAdapter(max_retries=retries))

try:
if type(body) is dict:
body_json = body
else:
body_json = json.loads(body)
except:
logging.error("Elasticsearch query body can not be read")

self.es_success = 1

try:
if headers == "NONE":
url_result = url_session.post(url, json=body_json)
else:
url_result = url_session.post(url, json=body_json, headers=headers)

if url_result.status_code == 410:
logging.warning("Retrieval Status 410 - URL Unpublished:" + url)
else:
url_result.raise_for_status()

except requests.exceptions.HTTPError as errh:
logging.error("Retrieving URL - HTTP Error:", errh)
self.es_success = 0
except requests.exceptions.ConnectionError as errc:
logging.error("Retrieving URL - Error Connecting:", errc)
self.es_success = 0
except requests.exceptions.Timeout as errt:
logging.error("Retrieving URL - Timeout Error:", errt)
self.es_success = 0
except requests.exceptions.RequestException as err:
logging.error("Retrieving URL - Something Else", err)
self.es_success = 0

url_session.close()

if self.es_success == 1:
result = url_result
else:
result = {}

return result.json()
with requests.Session() as url_session:

Check warning on line 170 in src/sparc/client/services/metadata.py

GitHub Actions / reviewdog

[formatters] reported by reviewdog 🐶 Raw Output: src/sparc/client/services/metadata.py:170:- with requests.Session() as url_session: src/sparc/client/services/metadata.py:171:- src/sparc/client/services/metadata.py:171:+ with requests.Session() as url_session:

retries = Retry(
total=6, backoff_factor=1, status_forcelist=[403, 404, 413, 429, 500, 502, 503, 504]

Check warning on line 173 in src/sparc/client/services/metadata.py

GitHub Actions / reviewdog

[formatters] reported by reviewdog 🐶 Raw Output: src/sparc/client/services/metadata.py:173:- total=6, backoff_factor=1, status_forcelist=[403, 404, 413, 429, 500, 502, 503, 504] src/sparc/client/services/metadata.py:173:+ total=6, src/sparc/client/services/metadata.py:174:+ backoff_factor=1, src/sparc/client/services/metadata.py:175:+ status_forcelist=[403, 404, 413, 429, 500, 502, 503, 504],

)

url_session.mount("https://", HTTPAdapter(max_retries=retries))

try:
if type(body) is dict:
body_json = body
else:
body_json = json.loads(body)
except:
logging.error("Elasticsearch query body can not be read")

self.es_success = 1

try:
if headers == "NONE":
url_result = url_session.post(url, json=body_json)
else:
url_result = url_session.post(url, json=body_json, headers=headers)

if url_result.status_code == 410:
logging.warning("Retrieval Status 410 - URL Unpublished:" + url)
else:
url_result.raise_for_status()

except requests.exceptions.HTTPError as errh:
logging.error("Retrieving URL - HTTP Error:", errh)
self.es_success = 0
except requests.exceptions.ConnectionError as errc:
logging.error("Retrieving URL - Error Connecting:", errc)
self.es_success = 0
except requests.exceptions.Timeout as errt:
logging.error("Retrieving URL - Timeout Error:", errt)
self.es_success = 0
except requests.exceptions.RequestException as err:
logging.error("Retrieving URL - Something Else", err)
self.es_success = 0

url_session.close()
hsorby marked this conversation as resolved.

result = url_result if self.es_success == 1 else {}

return result.json()
hsorby marked this conversation as resolved.
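
Review note: in `postURL` as shown, a body that fails to parse is logged by the bare `except:`, but `body_json` is then never assigned, so the later `url_session.post(..., json=body_json)` raises `UnboundLocalError`, which none of the `requests` handlers catch. A minimal sketch of one way to isolate that step follows; `parse_query_body` is a hypothetical helper name, not something in the original module.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_query_body(body):
    """Return the query body as parsed JSON, or None if it cannot be read.

    Hypothetical helper for illustration only.
    """
    if isinstance(body, dict):
        # Already a dict: pass it through unchanged.
        return body
    try:
        return json.loads(body)
    except (TypeError, ValueError) as err:
        # TypeError: body is not str/bytes/bytearray (for example None).
        # ValueError: body is text but not valid JSON (JSONDecodeError).
        logger.error("Elasticsearch query body can not be read: %s", err)
        return None
```

A caller could then return early when the result is `None` instead of attempting the POST with an undefined body.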

#####################################################################
# Metadata Search Functions
@@ -270,7 +264,7 @@

"""


Check warning on line 267 in src/sparc/client/services/metadata.py

GitHub Actions / reviewdog

[formatters] reported by reviewdog 🐶 Raw Output: src/sparc/client/services/metadata.py:267:-

list_url = self.algolia_api + "?" + "key=" + self.scicrunch_api_key

list_results = self.postURL(list_url, body=query, headers=self.default_headers)
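
Review note: `list_url` above is assembled by string concatenation, which leaves the key unescaped and raises `TypeError` if `scicrunch_api_key` is still `None`. A minimal sketch of building the query string with the standard library instead; `build_list_url` and the example URL are hypothetical. With `requests`, passing `params={"key": ...}` to the request call achieves the same encoding.

```python
from urllib.parse import urlencode


def build_list_url(base_url, api_key):
    """Append the API key to `base_url` as a URL-encoded query string.

    Hypothetical helper; the parameter name "key" is taken from the diff.
    """
    if not api_key:
        # Fail early with a clear message rather than a TypeError on str + None.
        raise ValueError("API key is not set")
    return base_url + "?" + urlencode({"key": api_key})
```

For example, `build_list_url("https://example.org/search", "a b&c")` escapes the space and the ampersand so the key cannot be split into extra query parameters.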