Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce JPQL/HQL parser for JPQ queries #2846

Closed
wants to merge 13 commits into from
Closed

Introduce JPQL/HQL parser for JPQ queries #2846

wants to merge 13 commits into from

Conversation

gregturn
Copy link
Contributor

@gregturn gregturn commented Mar 9, 2023

Tickets that require attention once this is done:

gregturn added 7 commits March 2, 2023 14:03
Introduce grammars that support both JPQL (JPA 3.1) as well as HQL (Hibernate 6.1) and allow us to leverage it for query handling.

Related: #2814.
* Rename the QueryParser interface and its implementations.
* Move render() to inside the QueryParser to the QueryParsingEnhancer doesn't need it.
* Since the parser doesn't need to "find" the alias or projections, override the right methods to side step doing that.
* Make it so that alias-finding function (which we don't really use anyway?) operates using null instead of ""
* Start activating previously deactivated test cases.
* Flesh out more comments.

Need to activate more test cases to really ensure all the HQL ones we had pre-parser are still active and the expected outputs are being met.
* Since QueryParser had several default functions and a private utility function, I thought it best to migrate to abstract class.
* Since HqlUtils and JpqlUtils are slimmed down to one function, I moved that function into HqlQueryParse and JpqlQueryParser.
* This makes it super simply to spot the HqlQueryParser and HqlQueryTransformer (and Jpql counterparts) as the operational code here.
* Simplified QueryParsingEnhancer as well, ensuring this is super clean.
* Also activated more test cases. Will continue restoring test cases as is suitable.
* Remove no longer needed fields.
* Since we don't need the ctx object anymore, drop it from all the testing.
* Since the render() function operations on a List<QueryParsingToken>, it felt appropriate to move this utility function to QueryParsingToken, and to make it static since it's really a utility method.
* Also activate more test cases and fixed some corner cases.
QueryUtils checks if a given sort parameter is "safe" or not. Pulled this into the parsing solution.

Also re-renabled a bunch of test cases, verifying much more functionality.

The one lingering thing I don't quite see how to implement are Sort parameters the reference function aliases. For example:

select avg(e.salary) as avg from Employee e // Sort.by("avg")

In this scenario, it should NOT prefix "order by avg" with the "e" alias. Perhaps, the easiest thing is to put ALL aliases into a Set and then if a given Sort properity is IN that set, don't apply the alias?
@gregturn gregturn requested review from schauder and mp911de March 9, 2023 05:30
@mp911de mp911de self-assigned this Mar 10, 2023
Copy link
Member

@mp911de mp911de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this PR is heading. I left a few initial comments about the structure and design of the parsers. Once these addressed, we can dig into more detail of parsing and rendering queries to make more progress.

How do we ensure that our visitors make sure of all permutations and objects (i.e. that we haven't forgotten to render out a particular case that is possible with JPQL/HQL but we don't have such a case in our tests)?

* @author Greg Turnquist
* @since 3.1
*/
abstract class QueryParser {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current arrangement, we have QueryEnhancer, QueryParser, the integration with JSQLParser. QueryParser as name creates the association of also SQL query parsing. Let's rename this one to JpaQueryParser for more clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that.


private final DeclaredQuery declaredQuery;

QueryParser(DeclaredQuery declaredQuery) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should reject native queries to ensure we do not accidentally accept native queries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

* @param parsedQuery
* @param sort can be {@literal null}
*/
String createQuery(ParserRuleContext parsedQuery, Sort sort) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of methods work with ParserRuleContext. It seems that we're handing out ParserRuleContext (state) to another component. Ideally, we would keep the state inside of query parser to not leak antlr internals. We could have e.g. a Lazy<ParserRuleContext> to lazily initiate parsing and avoid requiring calling code to be aware of the parsing state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is because invoking the ANTLR-generate visit() method requires passing in one of the generated context objects. ALL of their context classes implement ParserRuleContext, so this was my way to make HQL and JPQL both have a common API.

Basically:

HqlLexer lexer = new HqlLexer(CharStreams.fromString(query));
HqlParser parser = new HqlParser(new CommonTokenStream(lexer));

parser.addErrorListener(new QueryParsingSyntaxErrorListener());

return parser.start();

...is the ANTLR pattern to parse a query and get handed back an AST.

new HqlQueryTransformer(sort).visit(parsedQuery)

...is the ANTLR way to then apply one of their visitors to that AST. I'm trying to encapsulate this flow inside of QueryParser.

The reasons I have this "visible" is so that QueryEnhancer can apply various strategies based on what function it's doing. For instance, when you do applySorting, it's supposed to raise an IllegalArgumentException.

@Override
public String applySorting(Sort sort) {

	try {
		ParserRuleContext parsedQuery = queryParser.parse();

		if (parsedQuery == null) {
			return "";
		}

		return queryParser.createQuery(parsedQuery, sort);
	} catch (QueryParsingSyntaxError e) {
		LOG.warn(e);
		throw new IllegalArgumentException(e);
	}
}

However, if you run detectAlias, it's does NOT throw anything, but instead will simply return null for an invalid query.

@Override
public String detectAlias() {

	try {
		ParserRuleContext parsedQuery = queryParser.parse();

		if (parsedQuery == null) {

			LOG.warn("Failed to parse " + queryParser.getQuery() + ". See console for more details.");
			return null;
		}

		return queryParser.findAlias(parsedQuery);
	} catch (QueryParsingSyntaxError e) {
		LOG.warn(e);
		return null;
	}
}

This is where it would be nice if we had a chance to refactor QueryEnhancer. But that feels like too big a change. I figured if we kept all the ANTLR specifics inside QueryParsingEnhancer, we could minimize that leakage.

*/
abstract boolean hasConstructor(ParserRuleContext parsedQuery);

private static final Pattern PUNCTUATION_PATTERN = Pattern.compile(".*((?![._])[\\p{Punct}|\\s])");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: These are at a strange position, should be on top of the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move this toward the top.


private static final Log LOG = LogFactory.getLog(QueryParsingEnhancer.class);

private final QueryParser queryParser;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat approach reusing existing abstractions 👍

*/
class QueryParsingSyntaxError extends ParseCancellationException {

public QueryParsingSyntaxError() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we attach the original SQL query to the exception for more context? Also, this exception should be public so that users can handle this exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can definitely attach the original query to this exception.

* @author Greg Turnquist
* @since 3.1
*/
class QueryParsingSyntaxError extends ParseCancellationException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to let the exception extend InvalidDataAccessResourceUsageException (similar to BadSqlGrammarException) instead of an antlr type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. ParseCancellationException is a signal to ANTLR to stop parsing. I can try it out and see what happens.

* @author Greg Turnquist
* @since 3.1
*/
class QueryParsingSyntaxErrorListener extends BaseErrorListener {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could turn this into an enum or create a singleton to avoid instance creation on each usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, unless we want to pass in the original query. This then becomes a factory operation. I can poke at it to see what's possible.

@@ -68,6 +67,7 @@ class StringQuery implements DeclaredQuery {
*
* @param query must not be {@literal null} or empty.
*/
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this deprecated? This leaves us without a deprecated constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is relic from when I tried to introduce "modes". I'll remove it.

* @author Greg Turnquist
* @since 3.1
*/
class JpqlQueryTransformer extends JpqlBaseVisitor<List<QueryParsingToken>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the HQL transformer require additional work.

  1. The many occurrences of getText() (e.g. new QueryParsingToken(ctx.FROM().getText(), true)) indicate that the code isn't ideal. We're creating a lot of objects for mostly static components that we could cache. Also, regular getText() calls could be abstracted away to avoid duplications.
  2. This class combines query parsing/rendering along with our domain-specific tweaks to introduce sorting, count query projections. Ideally, we can encapsulate all base-work that is necessary for query rendering into a base transformer class and have transformers for specific tasks (count query derivation, sorting, …). To put things into perspective: QueryParser has individual methods for each use case, while this class combines all use-cases.
  3. HQL and JPQL share a lot of common ground. It isn't ideal that we have so many duplications. Also, when comparing both files side-by-side, it's impossible to put both files next to each other to determine differences. Going forward, this will be the method to detect, if we apply a change to one, that we need to apply the same change to the other transformer as well. We need to fix that. First step would be to align method orders (e.g. that the HQL visitJoinType method is at the same place where the JPQL visitJoin_spec method is). While this is purely a style aspect from the technical perspective, it is something that decides over how difficult it becomes to maintain parsers.

@mp911de mp911de added the type: enhancement A general enhancement label Mar 10, 2023
There are many tokens that are reusable such as "(", ",", etc. By turning them into constant flyweights, this reduces churn. Consequence is that their "space" attribute must be made immutible to avoid side effects, so I introduced a utility function.
…sing

To clarify that this feature is JPA-specific, adopt a Jpa prefix (except for concrete situation such as Jpql or Hql specific).
Move all the "boilerplate" operations out of HqlQueryTransformer and into HqlQueryRenderer. This visitor simply re-renders the original query provided. HqlQueryTransform now extends it and ONLY overrides the relevant methods needed to apply sorting, counting, and other things.
Move all the "boilerplate" operations out of JpqlQueryTransformer and into JpqlQueryRenderer. This visitor simply re-renders the original query provided. JpqlQueryTransform now extends it and ONLY overrides the relevant methods needed to apply sorting, counting, and other things.
To minimize leakage of ANTLR details to the outside world, all the internal interactions with ANTLR types has been moved inside of JpaQueryParser. The implementing methods each parser must implement are now flagged as "protected" to further cotain this.
@gregturn
Copy link
Contributor Author

@mp911de I've processed all feedback items discussed this morning. For lack of a better name, I have split HqlQueryTransformer into HqlQueryRenderer and HqlQueryTransformer. All of the HQL specification tests I wrote earlier I now have running in a separate test class (HqlQueryRendererTests) to ensure that it renders queries exactly as they came in.

This has made HqlQueryTransformer much lighterweight, and eases our ability to make overrides and adjustments. And I implemented the counterpart to this for JPQL-based queries.

I also moved all the ANTLR specifics into JpaQueryParser such that the QueryParsingEnhancer doesn't have to see it. And the internal ANTLR-based methods are all now protected, further sealing this off.

I haven't gotten (yet) to any caching solution, but I have verified everything else is functional.

@mp911de mp911de changed the title Introduce JPQL/HQL parser for JPQ queries. Introduce JPQL/HQL parser for JPQ queries Mar 14, 2023
Align ANTLR version with Hibernate's ANTLR version to avoid version mismatch reports to syserr.

Rename types for consistent naming. Avoid duplicate creation of DeclaredQuery within the parsers. Lazify parsing and reuse cached results to avoid parser overhead. Add common configuration for parsers for consistent parser and lexer configurations.
Simplify factory methods for HQL and JPQL parsers. Simplify condition flow for query enhancer creation.

Refine fields for non-nullability requirements, avoid handing out null for a List.
@mp911de
Copy link
Member

mp911de commented Mar 14, 2023

I added a polishing commit to encapsulate components even more and reuse parser results as the initial parser yielded 1700 parse operations per second while the regex-one did 120000 ops/sec.

@mp911de mp911de added this to the 3.1 M3 (2023.0.0) milestone Mar 14, 2023
@mp911de mp911de added the in: query-parser Everything related to parsing JPQL or SQL label Mar 14, 2023
@gregturn gregturn closed this Mar 14, 2023
@gregturn gregturn deleted the issue/jpql branch March 14, 2023 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: query-parser Everything related to parsing JPQL or SQL type: enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants