Dynamic Pagination in CPT Dashboard #135

Merged 9 commits on Dec 13, 2024.
Changes shown below are from 6 commits.
backend/app/api/v1/commons/hce.py (16 changes: 10 additions & 6 deletions)

```diff
@@ -3,23 +3,27 @@
 from app.services.search import ElasticService
 
 
-async def getData(start_datetime: date, end_datetime: date, configpath: str):
+async def getData(
+    start_datetime: date, end_datetime: date, size: int, offset: int, configpath: str
+):
     query = {
         "query": {"bool": {"filter": {"range": {"date": {"format": "yyyy-MM-dd"}}}}}
     }
 
     es = ElasticService(configpath=configpath)
     response = await es.post(
         query=query,
+        size=size,
         start_date=start_datetime,
         end_date=end_datetime,
         timestamp_field="date",
     )
     await es.close()
-    tasks = [item["_source"] for item in response]
+    tasks = [item["_source"] for item in response["data"]]
     jobs = pd.json_normalize(tasks)
+    if len(jobs) == 0:
+        return {"data": jobs, "total": response["total"]}
 
     jobs[["group"]] = jobs[["group"]].fillna(0)
     jobs.fillna("", inplace=True)
-    if len(jobs) == 0:
-        return jobs
-    return jobs
+    return {"data": jobs, "total": response["total"]}
```

Comment on lines +23 to +28

Member: It would reduce the code complexity if you just included those two lines in a `len(jobs) != 0` case instead of having an identical early return.

Collaborator: There are a lot of potential refactoring ideas we should pursue to make the somewhat messy codebase more maintainable.

Right now, we've got a long list of PRs we'd really like to land, and we need to focus on the most critical concerns ... starting with landing this PR, #138, and then a follow-on filtering PR which ought to complete the "revamp" branch. Once we land that onto "main" we can churn through my backlog of ilab PRs.

At that point, our focus should be cleaning up the code base (including unit testing, functional testing, lint and format checkers) to make it more maintainable with a viable CI.

As much as I hate being "pragmatic" in things like this, let's not let minor stuff like this bog us down at this point. Maybe leaving these comments in place and open as a reference for later work isn't a bad idea ...
backend/app/api/v1/commons/ocm.py (15 changes: 10 additions & 5 deletions)

```diff
@@ -3,35 +3,40 @@
 from app.services.search import ElasticService
 
 
-async def getData(start_datetime: date, end_datetime: date, configpath: str):
+async def getData(
+    start_datetime: date, end_datetime: date, size: int, offset: int, configpath: str
+):
     query = {
+        "size": size,
+        "from": offset,
         "query": {
             "bool": {
                 "filter": {"range": {"metrics.earliest": {"format": "yyyy-MM-dd"}}}
             }
-        }
+        },
     }
 
     es = ElasticService(configpath=configpath)
     response = await es.post(
         query=query,
+        size=size,
         start_date=start_datetime,
         end_date=end_datetime,
         timestamp_field="metrics.earliest",
     )
     await es.close()
-    tasks = [item["_source"] for item in response]
+    tasks = [item["_source"] for item in response["data"]]
     jobs = pd.json_normalize(tasks)
     if len(jobs) == 0:
-        return jobs
+        return {"data": jobs, "total": response["total"]}
 
     if "buildUrl" not in jobs.columns:
         jobs.insert(len(jobs.columns), "buildUrl", "")
     if "ciSystem" not in jobs.columns:
         jobs.insert(len(jobs.columns), "ciSystem", "")
     jobs.fillna("", inplace=True)
     jobs["jobStatus"] = jobs.apply(convertJobStatus, axis=1)
-    return jobs
+    return {"data": jobs, "total": response["total"]}
 
 
 def fillCiSystem(row):
```

(jaredoconnell marked a conversation on the `jobStatus` line as resolved.)
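The `"size"`/`"from"` keys added to the query body are Elasticsearch's standard offset pagination: `"size"` caps the page length and `"from"` skips that many earlier hits. A small helper, hypothetical but mirroring the query shape in these diffs, shows how a page is requested:

```python
# Hypothetical helper mirroring the paginated query bodies in this PR.
def build_paginated_query(size: int, offset: int, timestamp_field: str) -> dict:
    return {
        "size": size,        # page length
        "from": offset,      # number of earlier hits to skip
        "query": {
            "bool": {
                "filter": {"range": {timestamp_field: {"format": "yyyy-MM-dd"}}}
            }
        },
    }

# Third page of 25 results, filtered on the ocm timestamp field.
q = build_paginated_query(size=25, offset=50, timestamp_field="metrics.earliest")
```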
backend/app/api/v1/commons/ocp.py (15 changes: 10 additions & 5 deletions)

```diff
@@ -4,25 +4,30 @@
 from app.services.search import ElasticService
 
 
-async def getData(start_datetime: date, end_datetime: date, configpath: str):
+async def getData(
+    start_datetime: date, end_datetime: date, size: int, offset: int, configpath: str
+):
     query = {
+        "size": size,
+        "from": offset,
         "query": {
             "bool": {"filter": {"range": {"timestamp": {"format": "yyyy-MM-dd"}}}}
-        }
+        },
     }
 
     es = ElasticService(configpath=configpath)
     response = await es.post(
         query=query,
+        size=size,
         start_date=start_datetime,
         end_date=end_datetime,
         timestamp_field="timestamp",
     )
     await es.close()
-    tasks = [item["_source"] for item in response]
+    tasks = [item["_source"] for item in response["data"]]
     jobs = pd.json_normalize(tasks)
     if len(jobs) == 0:
-        return jobs
+        return {"data": jobs, "total": response["total"]}
 
     jobs[
         ["masterNodesCount", "workerNodesCount", "infraNodesCount", "totalNodesCount"]
@@ -52,7 +57,7 @@ async def getData(start_datetime: date, end_datetime: date, configpath: str):
     jbs = cleanJobs
     jbs["shortVersion"] = jbs["ocpVersion"].str.slice(0, 4)
 
-    return jbs
+    return {"data": jbs, "total": response["total"]}
 
 
 def fillEncryptionType(row):
```

Comment on lines 57 to +60

Member: What does cleanJobs come from? It may benefit from a comment.

And I do not think `jbs` makes sense as a variable name. It's just confusing and looks like a typo. Dave also questioned this.

Collaborator: `cleanJobs` is the result of a pandas dataframe filter to remove rows, although my understanding of the complicated pandas infrastructure is minimal.

What bugs me here is the `jbs = cleanJobs` to effectively just obscure the name before returning the data using `jbs`. That's not new with Varshini's changes, though, and I don't think it's practical to push her to rewrite more existing logic than necessary for the revamp. (On the other hand, simply replacing `cleanJobs` with `jbs` or dropping the intermediary `jbs` would make the code easier to read, and it'd be "simple"... it's just that an endless list of "simple" things isn't simple anymore. 🤔 💣 )

Member: We will definitely need to keep a note of this for a future refactor PR.
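The cleanup the reviewers defer here, dropping the `jbs = cleanJobs` aliasing, could be as small as the following sketch. It is hypothetical: plain dicts stand in for the pandas DataFrame, and the actual filter that produces `cleanJobs` sits outside the visible diff.

```python
# Sketch of the deferred refactor: operate on the filtered rows directly and
# return them, with no intermediate jbs alias. Dicts stand in for the frame.
def finalize(clean_jobs: list, total: int) -> dict:
    for job in clean_jobs:
        # mirrors jbs["ocpVersion"].str.slice(0, 4)
        job["shortVersion"] = job.get("ocpVersion", "")[:4]
    return {"data": clean_jobs, "total": total}

out = finalize([{"ocpVersion": "4.15.0-rc1"}], total=1)
```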
backend/app/api/v1/commons/quay.py (17 changes: 12 additions & 5 deletions)

```diff
@@ -4,11 +4,15 @@
 from app.services.search import ElasticService
 
 
-async def getData(start_datetime: date, end_datetime: date, configpath: str):
+async def getData(
+    start_datetime: date, end_datetime: date, size, offset, configpath: str
+):
     query = {
+        "size": size,
+        "from": offset,
         "query": {
             "bool": {"filter": {"range": {"timestamp": {"format": "yyyy-MM-dd"}}}}
-        }
+        },
     }
 
     es = ElasticService(configpath=configpath)
@@ -19,10 +23,10 @@ async def getData(start_datetime: date, end_datetime: date, configpath: str):
         timestamp_field="timestamp",
     )
     await es.close()
-    tasks = [item["_source"] for item in response]
+    tasks = [item["_source"] for item in response["data"]]
     jobs = pd.json_normalize(tasks)
     if len(jobs) == 0:
-        return jobs
+        return {"data": jobs, "total": response["total"]}
 
     jobs[
         ["masterNodesCount", "workerNodesCount", "infraNodesCount", "totalNodesCount"]
@@ -38,4 +42,7 @@ async def getData(start_datetime: date, end_datetime: date, configpath: str):
     jobs["build"] = jobs.apply(utils.getBuild, axis=1)
     jobs["shortVersion"] = jobs["ocpVersion"].str.slice(0, 4)
 
-    return jobs[jobs["platform"] != ""]
+    cleanJobs = jobs[jobs["platform"] != ""]
+
+    jbs = cleanJobs
+    return {"data": jbs, "total": response["total"]}
```

(dbutenhof marked a conversation on the new return block as resolved.)
backend/app/api/v1/commons/telco.py (14 changes: 8 additions & 6 deletions)

```diff
@@ -8,7 +8,9 @@
 import app.api.v1.endpoints.telco.telcoGraphs as telcoGraphs
 
 
-async def getData(start_datetime: date, end_datetime: date, configpath: str):
+async def getData(
+    start_datetime: date, end_datetime: date, size: int, offset: int, configpath: str
+):
     test_types = [
         "oslat",
         "cyclictest",
@@ -41,10 +43,12 @@ async def getData(start_datetime: date, end_datetime: date, configpath: str):
         ['test_type="{}"'.format(test_type) for test_type in test_types]
     )
     splunk = SplunkService(configpath=configpath)
-    response = await splunk.query(query=query, searchList=searchList)
+    response = await splunk.query(
+        query=query, size=size, offset=offset, searchList=searchList
+    )
     mapped_list = []
 
-    for each_response in response:
+    for each_response in response["data"]:
         end_timestamp = int(each_response["timestamp"])
         test_data = each_response["data"]
         threshold = await telcoGraphs.process_json(test_data, True)
@@ -83,7 +87,5 @@ async def getData(start_datetime: date, end_datetime: date, configpath: str):
     )
 
     jobs = pd.json_normalize(mapped_list)
-    if len(jobs) == 0:
-        return jobs
-
-    return jobs
+    return {"data": jobs, "total": response["total"]}
```
backend/app/api/v1/commons/utils.py (2 changes: 1 addition & 1 deletion)

```diff
@@ -7,7 +7,7 @@ async def getMetadata(uuid: str, configpath: str):
     es = ElasticService(configpath=configpath)
     response = await es.post(query=query)
     await es.close()
-    meta = [item["_source"] for item in response]
+    meta = [item["_source"] for item in response["data"]]
     return meta[0]
```

(dbutenhof marked a conversation here as resolved.)
backend/app/api/v1/endpoints/cpt/cptJobs.py (105 changes: 78 additions & 27 deletions)

```diff
@@ -28,7 +28,7 @@
 @router.get(
     "/api/v1/cpt/jobs",
     summary="Returns a job list from all the products.",
-    description="Returns a list of jobs in the specified dates. \
+    description="Returns a list of jobs in the specified dates of requested size \
     If not dates are provided the API will default the values. \
     `startDate`: will be set to the day of the request minus 5 days.\
     `endDate`: will be set to the day of the request.",
@@ -48,7 +48,10 @@ async def jobs(
         description="End date for searching jobs, format: 'YYYY-MM-DD'",
         examples=["2020-11-15"],
     ),
-    pretty: bool = Query(False, description="Output contet in pretty format."),
+    pretty: bool = Query(False, description="Output content in pretty format."),
+    size: int = Query(None, description="Number of jobs to fetch"),
+    offset: int = Query(None, description="Offset Number to fetch jobs from"),
+    totalJobs: int = Query(None, description="Total number of jobs"),
 ):
     if start_date is None:
         start_date = datetime.utcnow().date()
```
```diff
@@ -66,23 +69,35 @@
     )
 
     results_df = pd.DataFrame()
+    total_dict = {}
+    total = 0
     with ProcessPoolExecutor(max_workers=cpu_count()) as executor:
         futures = {
-            executor.submit(fetch_product, product, start_date, end_date): product
+            executor.submit(
+                fetch_product, product, start_date, end_date, size, offset
+            ): product
             for product in products
         }
         for future in as_completed(futures):
             product = futures[future]
             try:
                 result = future.result()
-                results_df = pd.concat([results_df, result])
+                total_dict[product] = result["total"]
+                results_df = pd.concat([results_df, result["data"]])
             except Exception as e:
                 print(f"Error fetching data for product {product}: {e}")
 
+    # on first hit, totalJobs is 0
+    if totalJobs == 0:
+        for product in total_dict:
+            total += int(total_dict[product])
+        totalJobs = total
     response = {
         "startDate": start_date.__str__(),
         "endDate": end_date.__str__(),
         "results": results_df.to_dict("records"),
        "total": totalJobs,
+        "offset": offset + size,
     }
 
     if pretty:
```

(dbutenhof marked a conversation on the `totalJobs` accounting as resolved.)
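The first-request accounting above can be traced by hand. A sketch with hypothetical per-product totals: the client sends `totalJobs=0` on its first request, the endpoint sums the totals each product reported alongside its page, and later requests simply echo the figure back.

```python
# Hypothetical per-product totals, mirroring the totalJobs == 0 branch above.
total_dict = {"ocp": 40, "quay": 10, "telco": 7}

def resolve_total(totalJobs: int, totals: dict) -> int:
    # on first hit, totalJobs is 0, so sum what each product reported
    if totalJobs == 0:
        totalJobs = sum(int(v) for v in totals.values())
    return totalJobs

first = resolve_total(0, total_dict)      # first page: computed from products
later = resolve_total(first, total_dict)  # later pages: passed through as-is
```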
```diff
@@ -93,34 +108,70 @@
         return jsonstring
 
 
-async def fetch_product_async(product, start_date, end_date):
+async def fetch_product_async(product, start_date, end_date, size, offset):
     try:
-        df = await products[product](start_date, end_date)
-        return (
-            df.loc[
-                :,
-                [
-                    "ciSystem",
-                    "uuid",
-                    "releaseStream",
-                    "jobStatus",
-                    "buildUrl",
-                    "startDate",
-                    "endDate",
-                    "product",
-                    "version",
-                    "testName",
-                ],
-            ]
-            if len(df) != 0
-            else df
-        )
+        response = await products[product](start_date, end_date, size, offset)
+        if response:
+            df = response["data"]
+            return {
+                "data": (
+                    df.loc[
+                        :,
+                        [
+                            "ciSystem",
+                            "uuid",
+                            "releaseStream",
+                            "jobStatus",
+                            "buildUrl",
+                            "startDate",
+                            "endDate",
+                            "product",
+                            "version",
+                            "testName",
+                        ],
+                    ]
+                    if len(df) != 0
+                    else df
+                ),
+                "total": response["total"],
+            }
     except ConnectionError:
         print("Connection Error in mapper for product " + product)
     except Exception as e:
         print(f"Error in mapper for product {product}: {e}")
     return pd.DataFrame()
 
 
-def fetch_product(product, start_date, end_date):
-    return asyncio.run(fetch_product_async(product, start_date, end_date))
+def fetch_product(product, start_date, end_date, size, offset):
+    return asyncio.run(fetch_product_async(product, start_date, end_date, size, offset))
+
+
+def is_requested_size_available(total_count, offset, requested_size):
+    """
+    Check if the requested size of data is available starting from a given offset.
+
+    Args:
+        total_count (int): Total number of available records.
+        offset (int): The starting position in the dataset.
+        requested_size (int): The number of records requested.
+
+    Returns:
+        bool: True if the requested size is available, False otherwise.
+    """
+    return (offset + requested_size) <= total_count
+
+
+def calculate_remaining_data(total_count, offset, requested_size):
+    """
+    Calculate the remaining number of data items that can be fetched based on the requested size.
+
+    Args:
+        total_count (int): Total number of available records.
+        offset (int): The starting position in the dataset.
+        requested_size (int): The number of records requested.
+
+    Returns:
+        int: The number of records that can be fetched, which may be less than or equal to requested_size.
+    """
+    available_data = total_count - offset  # Data available from the offset
+    return min(available_data, requested_size)
```

(dbutenhof marked a conversation on `calculate_remaining_data` as resolved.)
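The two new helpers are pure functions, so their behavior is easy to check directly. This reproduces them from the diff with a worked boundary case:

```python
def is_requested_size_available(total_count, offset, requested_size):
    # True only when a full page of requested_size exists at this offset.
    return (offset + requested_size) <= total_count

def calculate_remaining_data(total_count, offset, requested_size):
    # Clamp the page to whatever actually remains past the offset.
    available_data = total_count - offset
    return min(available_data, requested_size)

# With 100 records at offset 90, a request for 25 overruns the dataset:
print(is_requested_size_available(100, 90, 25))  # False
print(calculate_remaining_data(100, 90, 25))     # 10
```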