feat: Add promql query converter and request handler #111

Merged: 32 commits, Dec 9, 2021

@findingrish findingrish commented Nov 30, 2021

This PR adds support for translating a Hypertrace (ht) query request to PromQL, and defines a Prometheus view definition and a request handler.

Not every query can be served by Prometheus; a query may be served by Prometheus only when it meets the conditions listed further down.

First, a note on how data is stored in Prometheus.

Currently the num_calls metric time series is stored in Prometheus with the following attributes:
num_calls(tenant_id, service_id, service_name, api_id, api_name)
This is a subset of the data in the rawServiceView table.
In Prometheus, every metric is an individual time series, and the attributes (labels) on it are used for aggregations.

The PrometheusViewDefinition (or PrometheusMetricDefinition) defines the attributeMapping and the metricMapping.

There are two types of Prometheus query: instant and time series. Any ht query request with a dateTimeConvert function is a time series request.

Conditions for a query to be served by Prometheus:

  1. Aggregation selections: the query must be an aggregation query, each function selection must be on a metric column, and those metrics must be present in Prometheus (and thus in the view definition).
  2. Normal selections and group-by must be on the same columns (the group-by on dateTimeConvert can be skipped).
  3. Filters must be on simple columns (column or attribute only) and may be combined only with the AND operator (Prometheus only allows simple filters that are ANDed together, https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors).
  4. Time filters are passed as part of the API request to Prometheus and are omitted from the query string.
  5. orderBy is not supported yet.
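
Condition 3 can be illustrated with a minimal sketch. The `Filter` and `Op` types here are hypothetical, simplified stand-ins for the query-service messages, and `isServable` is an illustrative name, not the validator added in this PR:

```java
import java.util.List;

// Hypothetical, simplified model; the real query-service Filter message has more fields.
final class FilterEligibilitySketch {
  enum Op { AND, OR, EQ, NEQ, IN, LIKE }

  static final class Filter {
    final Op op;
    final boolean lhsIsSimpleColumn; // only meaningful for leaf filters
    final List<Filter> children; // empty for leaf filters

    private Filter(Op op, boolean lhsIsSimpleColumn, List<Filter> children) {
      this.op = op;
      this.lhsIsSimpleColumn = lhsIsSimpleColumn;
      this.children = children;
    }

    static Filter leaf(Op op, boolean lhsIsSimpleColumn) {
      return new Filter(op, lhsIsSimpleColumn, List.of());
    }

    static Filter composite(Op op, Filter... children) {
      return new Filter(op, false, List.of(children));
    }
  }

  // Condition 3: only AND composites are allowed, and every leaf must be on a simple column.
  static boolean isServable(Filter filter) {
    if (filter.children.isEmpty()) {
      return filter.lhsIsSimpleColumn;
    }
    return filter.op == Op.AND
        && filter.children.stream().allMatch(FilterEligibilitySketch::isServable);
  }

  public static void main(String[] args) {
    Filter anded = Filter.composite(Op.AND, Filter.leaf(Op.EQ, true), Filter.leaf(Op.IN, true));
    Filter ored = Filter.composite(Op.OR, Filter.leaf(Op.EQ, true), Filter.leaf(Op.EQ, true));
    System.out.println(isServable(anded)); // true
    System.out.println(isServable(ored)); // false
  }
}
```

The recursion mirrors the nested filter tree: any OR node, or any leaf whose left-hand side is not a plain column, makes the whole query ineligible.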

Conversion example:

SELECT service_name, api_name, COUNT(errorCount), AVG(num_calls) FROM ServiceView WHERE tenant_id = '__default' AND ( start_time_millis > 100 AND end_time_millis < 200 AND service_id IN ('1', '2', '3') AND REGEXP_LIKE(service_name,'someregex') ) GROUP BY service_name, api_name

translates to the following Prometheus queries:

1. "count by (service_name, api_name) (count_over_time(error_count{service_id=\"1|2|3\", service_name=~\"someregex\"}[100ms]))"
2. "avg by (service_name, api_name) (avg_over_time(num_calls{service_id=\"1|2|3\", service_name=~\"someregex\"}[100ms]))"

Rules for conversion:

  1. Each aggregation function results in one query, so the number of queries equals the number of aggregation functions requested.
  2. The column in the function expression gives the metric series to be queried.
  3. The function name is translated to the corresponding Prometheus function.
  4. The group-by list forms the by clause of the query.
  5. All filters are placed, comma separated, inside curly braces.
  6. The duration of the query is appended in square brackets.
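
The six rules can be sketched in a few lines of Java. The format string is the one quoted in the review thread for this PR; the class name, method signature and pre-rendered filter strings are assumptions made for illustration, and the `_over_time` suffix assumes a gauge metric:

```java
import java.util.List;

// Hypothetical helper, not the converter class added in this PR.
final class PromqlTemplateSketch {
  // <fn> by (<labels>) (<fn>_over_time(<metric>{<filters>}[<duration>ms]))
  private static final String TEMPLATE = "%s by (%s) (%s(%s{%s}[%sms]))";

  static String buildQuery(
      String function,
      String metricName,
      List<String> groupBy,
      List<String> filters,
      long durationMillis) {
    return String.format(
        TEMPLATE,
        function, // rule 3: translated function name
        String.join(", ", groupBy), // rule 4: by clause
        function + "_over_time", // assumes a gauge metric
        metricName, // rule 2: metric series
        String.join(", ", filters), // rule 5: comma-separated filters
        durationMillis); // rule 6: duration in square brackets
  }

  public static void main(String[] args) {
    List<String> groupBy = List.of("service_name", "api_name");
    List<String> filters = List.of("service_id=\"1|2|3\"", "service_name=~\"someregex\"");
    // Rule 1: one query per aggregation function.
    System.out.println(buildQuery("count", "error_count", groupBy, filters, 100));
    System.out.println(buildQuery("avg", "num_calls", groupBy, filters, 100));
  }
}
```

With these inputs the two printed strings match the two queries in the conversion example.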

hypertrace/hypertrace#324, hypertrace/hypertrace#320

codecov bot commented Dec 3, 2021

Codecov Report

Merging #111 (9b102f0) into main (2e9b4ec) will decrease coverage by 3.91%.
The diff coverage is 55.78%.

@@             Coverage Diff              @@
##               main     #111      +/-   ##
============================================
- Coverage     82.51%   78.60%   -3.92%     
- Complexity      456      508      +52     
============================================
  Files            41       50       +9     
  Lines          1716     2000     +284     
  Branches        180      216      +36     
============================================
+ Hits           1416     1572     +156     
- Misses          215      333     +118     
- Partials         85       95      +10     
Flag Coverage Δ
integration 78.60% <55.78%> (-3.92%) ⬇️
unit 76.53% <54.42%> (-3.95%) ⬇️

Impacted Files Coverage Δ
...service/pinot/QueryRequestToPinotSQLConverter.java 86.80% <ø> (+0.68%) ⬆️
...vice/prometheus/PrometheusBasedRequestHandler.java 0.00% <0.00%> (ø)
...e/prometheus/QueryRequestEligibilityValidator.java 13.43% <13.43%> (ø)
...ce/prometheus/PrometheusRequestHandlerBuilder.java 30.00% <30.00%> (ø)
...ypertrace/core/query/service/QueryRequestUtil.java 52.08% <36.36%> (-4.68%) ⬇️
...ervice/prometheus/PrometheusFunctionConverter.java 50.00% <50.00%> (ø)
...ry/service/prometheus/FilterToPromqlConverter.java 50.98% <50.98%> (ø)
.../hypertrace/core/query/service/QueryTimeRange.java 80.00% <80.00%> (ø)
...y/service/prometheus/PrometheusViewDefinition.java 92.30% <92.30%> (ø)
...ypertrace/core/query/service/ExecutionContext.java 94.36% <100.00%> (+0.12%) ⬆️
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@findingrish findingrish changed the title feat: Add promql query converter feat: Add promql query converter and request handler Dec 3, 2021

tenantAttributeName = tenant_id
prometheusViewDefinition {
viewName = rawServiceView
metricMap {
Contributor

For each metric map entry, can we define something like:

attribute: API.numCalls
            metric : { metricName: "num_calls",
              metricType: "GAUGE" }

Contributor

Can you log a ticket for this so we can change this structure to be more readable and align with the SERVICE.numCall attribute pattern?

@findingrish findingrish marked this pull request as ready for review December 3, 2021 18:44
Contributor

@aaron-steinfeld aaron-steinfeld left a comment

I thought the idea was to introduce a new API? Has that been done? Why did we choose to build from the bottom up? It makes it far more likely that we will have to rework things when issues come up on integration at higher layers.

return executionContext.getTimeSeriesPeriod().get();
}

private String buildQuery(
Member

Do we need this function as it is just returning the string from the arguments ?

Author

Some logic needs to go in here (in the upcoming changes).

@Test
void testCalculateCost_aggregationNotSupported() {
Builder builder = QueryRequest.newBuilder();
builder.addAggregation(
Member

Use the createCountByColumnSelection method?

private QueryRequest buildMultipleGroupByMultipleAggQueryWithMultipleFiltersAndDateTime() {
Builder builder = QueryRequest.newBuilder();
builder.addAggregation(
createFunctionExpression("Count", createColumnExpression("SERVICE.errorCount").build()));
Member

There is a separate function for count

{
name = raw-service-view-service-prometheus-handler
type = prometheus
clientConfig = ""
Contributor

We need to hook up the host and port properties for this in a follow-up PR.

metricName: "num_calls",
metricType: "GAUGE"
},
errorCount {
Contributor

We only have the num_calls emitter for now, so can you remove this? We will add it later when we add the error_count emitter.

"EVENT.protocolName": "protocol_name",
"EVENT.status_code": "status_code",
"API_TRACE.status_code": "status_code",
"API.startTime": "start_time_millis",
Contributor

Can we remove "API.startTime": "start_time_millis" and "API.endTime": "end_time_millis"?


public class PrometheusBasedRequestHandler implements RequestHandler {

public static final String VIEW_DEFINITION_CONFIG_KEY = "prometheusViewDefinition";
Contributor

public -> private

private final String viewName;
private final String tenantColumnName;
private final Map<String, MetricConfig> metricMap;
private final Map<String, String> columnMap;
Contributor

Can we rename this to attributeMap?

Map<String, MetricConfig> metricMap,
Map<String, String> columnMap) {
this.viewName = viewName;
this.tenantColumnName = tenantColumnName; // tenantAttributeName
Contributor

tenantColumnName -> tenantAttributeName ?

@Value
@AllArgsConstructor
public static class MetricConfig {
String name;
Contributor

name -> metricName ?


QueryCost calculateCost(QueryRequest queryRequest, ExecutionContext executionContext) {
try {
// orderBy to be supported later
Contributor

Can you create a ticket for this in the same epic?

return true;
}
return prometheusViewDefinition.getMetricConfigForLogicalMetricName(attributeId)
== null;
Contributor

Can we put a general comment here (e.g. only single-argument functions are supported, and only on metric columns)?

}

// filter lhs should be column or simple attribute
if (!QueryRequestUtil.isSimpleColumnExpression(filter.getLhs())) {
Contributor

Can we check this as a leaf filter condition?

return true;
}

// todo check for valid operators here
Contributor

Can you remove this comment?

== null;
});
}

Contributor

Let's have a comment here that currently only AND filters are supported. Later this can be extended to OR filters on the same column (attributeId) expression.

LinkedHashSet<Expression> allSelections,
QueryTimeRange queryTimeRange,
String timeFilterColumn) {
List<String> groupByList = getGroupByList(request);
Contributor

You will have to add the tenant filter as a default filter? Hopefully, the tenant filter is already present in QueryRequest?

Author

Yes, added the tenant filter and updated the test.


// iterate over all the functions in the query except for date time function (which is handled
// separately and not a part of the query string)
return allSelections.stream()
Contributor

allSelections = groupBy selection list + function selection list, right?

template,
function,
groupByList,
function + "_over_time", // assuming gauge type of metric
Contributor

You have metricConfig with (metricName, metricType); can we throw the exception while preparing metricConfig itself, saying that the only supported type is GAUGE for now?

Comment on lines 119 to 120
"%s by (%s) (%s(%s{%s}[%sms]))"; // sum by (a1, a2) (sum_over_time(num_calls{a4="..",
// a5=".."}[xms]))
Contributor

Can you put this example, // sum by (a1, a2) (sum_over_time(num_calls{a4="..", a5=".."}[xms])), above the function?

}

private MetricConfig getMetricConfigForFunction(Expression functionSelection) {
return prometheusViewDefinition.getMetricConfigForLogicalMetricName(
Contributor

If you want, we can also check the metric type for gauge here.

Author

Checking the type in QueryRequestEligibilityValidator

*/
void convertFilterToString(
Filter filter,
List<String> filterList,
Contributor

Looks like filterList is an output? nit: can this be the last argument? Also, can we add comments for the args?

Author

Yes, made it the last argument.

childFilter, filterList, timeFilterColumn, expressionToColumnConverter);
}
} else {
if (QueryRequestUtil.isSimpleColumnExpression(filter.getLhs())
Contributor

Do we need to check isSimpleColumnExpression again? Isn't this already checked in the selection process?

StringBuilder builder = new StringBuilder();
builder.append(expressionToColumnConverter.apply(filter.getLhs()));
builder.append(convertOperatorToString(filter.getOperator()));
builder.append(convertLiteralToString(filter.getRhs().getLiteral()));
Contributor

What will a null value from convertLiteralToString mean here? I haven't tried such a query. Instead of null, would it make sense to have a "" filter or no filter at all?

Author

Throwing an exception from convertLiteralToString.

.build();
builder.setFilter(andFilter);

builder.addGroupBy(createColumnExpression("SERVICE.name"));
Contributor

Shouldn't the selection expression be required?


// time filter is removed from the query
String query1 =
"count by (service_name, api_name) (count_over_time(error_count{service_id=\"1|2|3\", service_name=~\"someregex\"}[100ms]))";
Contributor

I think we may have to change the value 100ms in the [] expression to the value of the time series period. We can talk about this.

Contributor

@kotharironak kotharironak left a comment

overall lgtm.


public enum MetricType {
GAUGE,
COUNTER
Contributor

In a follow-up, remove COUNTER for now.

@findingrish findingrish merged commit 74a83c6 into main Dec 9, 2021
@findingrish findingrish deleted the metrics-dev branch December 9, 2021 13:38
github-actions bot commented Dec 9, 2021

Unit Test Results

  28 files  +2    28 suites  +2   8s ⏱️ -2s
152 tests +6  152 ✔️ +6  0 💤 ±0  0 ❌ ±0 

Results for commit 74a83c6. ± Comparison against base commit 2e9b4ec.

findingrish commented Dec 9, 2021

Filed the following backlog tickets: #112, #114, #115, #116, #117, #118

4 participants