-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce JPQL/HQL parser for JPQ queries #2846
Conversation
Introduce grammars that support both JPQL (JPA 3.1) as well as HQL (Hibernate 6.1) and allow us to leverage it for query handling. Related: #2814.
* Rename the QueryParser interface and its implementations. * Move render() to inside the QueryParser to the QueryParsingEnhancer doesn't need it. * Since the parser doesn't need to "find" the alias or projections, override the right methods to side step doing that. * Make it so that alias-finding function (which we don't really use anyway?) operates using null instead of "" * Start activating previously deactivated test cases. * Flesh out more comments. Need to activate more test cases to really ensure all the HQL ones we had pre-parser are still active and the expected outputs are being met.
* Since QueryParser had several default functions and a private utility function, I thought it best to migrate to abstract class. * Since HqlUtils and JpqlUtils are slimmed down to one function, I moved that function into HqlQueryParse and JpqlQueryParser. * This makes it super simply to spot the HqlQueryParser and HqlQueryTransformer (and Jpql counterparts) as the operational code here. * Simplified QueryParsingEnhancer as well, ensuring this is super clean. * Also activated more test cases. Will continue restoring test cases as is suitable.
* Remove no longer needed fields. * Since we don't need the ctx object anymore, drop it from all the testing. * Since the render() function operations on a List<QueryParsingToken>, it felt appropriate to move this utility function to QueryParsingToken, and to make it static since it's really a utility method.
* Also activate more test cases and fixed some corner cases.
QueryUtils checks if a given sort parameter is "safe" or not. Pulled this into the parsing solution. Also re-renabled a bunch of test cases, verifying much more functionality. The one lingering thing I don't quite see how to implement are Sort parameters the reference function aliases. For example: select avg(e.salary) as avg from Employee e // Sort.by("avg") In this scenario, it should NOT prefix "order by avg" with the "e" alias. Perhaps, the easiest thing is to put ALL aliases into a Set and then if a given Sort properity is IN that set, don't apply the alias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like where this PR is heading. I left a few initial comments about the structure and design of the parsers. Once these addressed, we can dig into more detail of parsing and rendering queries to make more progress.
How do we ensure that our visitors make sure of all permutations and objects (i.e. that we haven't forgotten to render out a particular case that is possible with JPQL/HQL but we don't have such a case in our tests)?
* @author Greg Turnquist | ||
* @since 3.1 | ||
*/ | ||
abstract class QueryParser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current arrangement, we have QueryEnhancer
, QueryParser
, the integration with JSQLParser. QueryParser
as name creates the association of also SQL query parsing. Let's rename this one to JpaQueryParser
for more clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that.
|
||
private final DeclaredQuery declaredQuery; | ||
|
||
QueryParser(DeclaredQuery declaredQuery) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should reject native
queries to ensure we do not accidentally accept native queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
* @param parsedQuery | ||
* @param sort can be {@literal null} | ||
*/ | ||
String createQuery(ParserRuleContext parsedQuery, Sort sort) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of methods work with ParserRuleContext
. It seems that we're handing out ParserRuleContext
(state) to another component. Ideally, we would keep the state inside of query parser to not leak antlr internals. We could have e.g. a Lazy<ParserRuleContext>
to lazily initiate parsing and avoid requiring calling code to be aware of the parsing state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is because invoking the ANTLR-generate visit()
method requires passing in one of the generated context objects. ALL of their context classes implement ParserRuleContext
, so this was my way to make HQL and JPQL both have a common API.
Basically:
HqlLexer lexer = new HqlLexer(CharStreams.fromString(query));
HqlParser parser = new HqlParser(new CommonTokenStream(lexer));
parser.addErrorListener(new QueryParsingSyntaxErrorListener());
return parser.start();
...is the ANTLR pattern to parse a query and get handed back an AST.
new HqlQueryTransformer(sort).visit(parsedQuery)
...is the ANTLR way to then apply one of their visitors to that AST. I'm trying to encapsulate this flow inside of QueryParser
.
The reasons I have this "visible" is so that QueryEnhancer
can apply various strategies based on what function it's doing. For instance, when you do applySorting
, it's supposed to raise an IllegalArgumentException
.
@Override
public String applySorting(Sort sort) {
try {
ParserRuleContext parsedQuery = queryParser.parse();
if (parsedQuery == null) {
return "";
}
return queryParser.createQuery(parsedQuery, sort);
} catch (QueryParsingSyntaxError e) {
LOG.warn(e);
throw new IllegalArgumentException(e);
}
}
However, if you run detectAlias
, it's does NOT throw anything, but instead will simply return null
for an invalid query.
@Override
public String detectAlias() {
try {
ParserRuleContext parsedQuery = queryParser.parse();
if (parsedQuery == null) {
LOG.warn("Failed to parse " + queryParser.getQuery() + ". See console for more details.");
return null;
}
return queryParser.findAlias(parsedQuery);
} catch (QueryParsingSyntaxError e) {
LOG.warn(e);
return null;
}
}
This is where it would be nice if we had a chance to refactor QueryEnhancer
. But that feels like too big a change. I figured if we kept all the ANTLR specifics inside QueryParsingEnhancer
, we could minimize that leakage.
*/ | ||
abstract boolean hasConstructor(ParserRuleContext parsedQuery); | ||
|
||
private static final Pattern PUNCTUATION_PATTERN = Pattern.compile(".*((?![._])[\\p{Punct}|\\s])"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: These are at a strange position, should be on top of the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move this toward the top.
|
||
private static final Log LOG = LogFactory.getLog(QueryParsingEnhancer.class); | ||
|
||
private final QueryParser queryParser; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a neat approach reusing existing abstractions 👍
*/ | ||
class QueryParsingSyntaxError extends ParseCancellationException { | ||
|
||
public QueryParsingSyntaxError() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we attach the original SQL query to the exception for more context? Also, this exception should be public so that users can handle this exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can definitely attach the original query to this exception.
* @author Greg Turnquist | ||
* @since 3.1 | ||
*/ | ||
class QueryParsingSyntaxError extends ParseCancellationException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to let the exception extend InvalidDataAccessResourceUsageException
(similar to BadSqlGrammarException
) instead of an antlr type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. ParseCancellationException
is a signal to ANTLR to stop parsing. I can try it out and see what happens.
* @author Greg Turnquist | ||
* @since 3.1 | ||
*/ | ||
class QueryParsingSyntaxErrorListener extends BaseErrorListener { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could turn this into an enum or create a singleton to avoid instance creation on each usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, unless we want to pass in the original query. This then becomes a factory operation. I can poke at it to see what's possible.
@@ -68,6 +67,7 @@ class StringQuery implements DeclaredQuery { | |||
* | |||
* @param query must not be {@literal null} or empty. | |||
*/ | |||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this deprecated? This leaves us without a deprecated constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is relic from when I tried to introduce "modes". I'll remove it.
* @author Greg Turnquist | ||
* @since 3.1 | ||
*/ | ||
class JpqlQueryTransformer extends JpqlBaseVisitor<List<QueryParsingToken>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the HQL transformer require additional work.
- The many occurrences of
getText()
(e.g.new QueryParsingToken(ctx.FROM().getText(), true)
) indicate that the code isn't ideal. We're creating a lot of objects for mostly static components that we could cache. Also, regulargetText()
calls could be abstracted away to avoid duplications. - This class combines query parsing/rendering along with our domain-specific tweaks to introduce sorting, count query projections. Ideally, we can encapsulate all base-work that is necessary for query rendering into a base transformer class and have transformers for specific tasks (count query derivation, sorting, …). To put things into perspective:
QueryParser
has individual methods for each use case, while this class combines all use-cases. - HQL and JPQL share a lot of common ground. It isn't ideal that we have so many duplications. Also, when comparing both files side-by-side, it's impossible to put both files next to each other to determine differences. Going forward, this will be the method to detect, if we apply a change to one, that we need to apply the same change to the other transformer as well. We need to fix that. First step would be to align method orders (e.g. that the HQL
visitJoinType
method is at the same place where the JPQLvisitJoin_spec
method is). While this is purely a style aspect from the technical perspective, it is something that decides over how difficult it becomes to maintain parsers.
There are many tokens that are reusable such as "(", ",", etc. By turning them into constant flyweights, this reduces churn. Consequence is that their "space" attribute must be made immutible to avoid side effects, so I introduced a utility function.
…sing To clarify that this feature is JPA-specific, adopt a Jpa prefix (except for concrete situation such as Jpql or Hql specific).
Move all the "boilerplate" operations out of HqlQueryTransformer and into HqlQueryRenderer. This visitor simply re-renders the original query provided. HqlQueryTransform now extends it and ONLY overrides the relevant methods needed to apply sorting, counting, and other things.
Move all the "boilerplate" operations out of JpqlQueryTransformer and into JpqlQueryRenderer. This visitor simply re-renders the original query provided. JpqlQueryTransform now extends it and ONLY overrides the relevant methods needed to apply sorting, counting, and other things.
To minimize leakage of ANTLR details to the outside world, all the internal interactions with ANTLR types has been moved inside of JpaQueryParser. The implementing methods each parser must implement are now flagged as "protected" to further cotain this.
@mp911de I've processed all feedback items discussed this morning. For lack of a better name, I have split This has made I also moved all the ANTLR specifics into I haven't gotten (yet) to any caching solution, but I have verified everything else is functional. |
Align ANTLR version with Hibernate's ANTLR version to avoid version mismatch reports to syserr. Rename types for consistent naming. Avoid duplicate creation of DeclaredQuery within the parsers. Lazify parsing and reuse cached results to avoid parser overhead. Add common configuration for parsers for consistent parser and lexer configurations. Simplify factory methods for HQL and JPQL parsers. Simplify condition flow for query enhancer creation. Refine fields for non-nullability requirements, avoid handing out null for a List.
I added a polishing commit to encapsulate components even more and reuse parser results as the initial parser yielded 1700 parse operations per second while the regex-one did 120000 ops/sec. |
Tickets that require attention once this is done: