Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix false positive for S608 where SQL-like expression follows other text #8723

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -101,4 +101,5 @@ def query41():
query = "REPLACE INTO table VALUES (%s)" % (var,)
query = "REPLACE table VALUES (%s)" % (var,)

query = "Deselect something that is not SQL even though it has a ' from ' somewhere in %s." % "there"
not_a_query = "Deselect something that is not SQL even though it has a ' from ' somewhere in %s." % "there"
not_a_query = f"Please select a value from the list of possible variants: {variants}."
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,15 @@ use crate::checkers::ast::Checker;
use super::super::helpers::string_literal;

static SQL_REGEX: Lazy<Regex> = Lazy::new(|| {
Regex::new(r"(?i)\b(select\s.+\sfrom\s|delete\s+from\s|(insert|replace)\s.+\svalues\s|update\s.+\sset\s)")
// We pass this generated expression strings like:
// "SELECT " + val + " FROM " + table
// f'delete from table where var = {var}'
// "\n SELECT *\n FROM table\n WHERE var = {}\n ".format(var)
//
// To avoid false positives, we:
// - Require the SQL to be at the start of the expression, allowing for tokens that are not a part of the string
// - Require whole-word matches for SQL keywords
Regex::new(r#"(?i)\A(\"|f\"|\'|f\'|\\|\\n|\s)*\b(select\s.+\sfrom\s|delete\s+from\s|(insert|replace)\s.+\svalues\s|update\s.+\sset\s)"#)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible thing that might help here is to use verbose mode in the regex. That lets you break things apart using insignificant whitespace and even add comments if you want:

r#"(?ix)
\A
(\"|f\"|\'|f\'|\\|\\n|\s)*
\b
(
  select\s.+\sfrom\s
  |
  delete\s+from\s
  |
  (insert|replace)\s.+\svalues\s
  |
  update\s.+\sset\s
)
"#

Or somethinglike that. Dealer's choice about how best to break it up.

Now that I've written out, it makes me wonder about the following:

  • Is it intentional to have the escapes in \"|f\"|\'|f\'|\\|\\n|\s here?
    Namely, since this is a raw string, f\" will match f\" literally and not
    f". Similarly for the other branches.
  • While the regex didn't previously have it, is it perhaps desirable to add a
    word boundary assertion at the end as well? Although maybe not, since I see
    that each of the select, delete, insert|replace and update branches
    all must end with a \s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like a verbose regex with comments :)

Is it intentional to have the escapes

For \\n, yes I want to match the literal \n not a newline. For f\" it looks like the \ just has no effect? With or without it, we still match f-strings in the test fixture.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have suggestions for handling the false negatives? It looks like the regex would need to get much more complicated. This change may not be worth it as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derp, yes, f\" is equivalent to f". (Older versions of the regex crate would actually reject such escapes as superfluous.)

With respect to FNs and FPs, are you constrained to a single regex? If not, you could combine the status quo with what you have here. You could change the status quo to only match for uppercase things like SELECT. That will reduce its false positives but greatly increase the false negatives. Then to decrease false negatives, you could use your second regex here which allows lowercase select but only at the start of a string.

It's pretty hokey and probably is an overfitting to our current tests but is perhaps an improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... definitely hokey but I was also thinking case sensitivity makes some sense...

I almost would rather have a "case insensitive" setting for this which is off by default...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For f-strings, we wouldn't need to worry about the prefix and quotes as in https://github.com/astral-sh/ruff/pull/7927/files#diff-649016c25b517ba99e70caf422c95ef9d544e7c806d26e64624c0414777d54f5 I've removed them. That is, for f-strings, we'd do:

/// Concatenates the contents of an f-string, without the prefix and quotes,
/// and escapes any special characters.
///
/// ## Example
///
/// ```python
/// "foo" f"bar {x}" "baz"
/// ```
///
/// becomes `foobar {x}baz`.

.unwrap()
});

Expand Down Expand Up @@ -51,7 +59,7 @@ fn has_string_literal(expr: &Expr) -> bool {
}

fn matches_sql_statement(string: &str) -> bool {
SQL_REGEX.is_match(string)
SQL_REGEX.is_match(string.trim_start())
}

fn matches_string_format_expression(expr: &Expr, semantic: &SemanticModel) -> bool {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -476,7 +476,7 @@ S608.py:102:9: S608 Possible SQL injection vector through string-based query con
102 | query = "REPLACE table VALUES (%s)" % (var,)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ S608
103 |
104 | query = "Deselect something that is not SQL even though it has a ' from ' somewhere in %s." % "there"
104 | not_a_query = "Deselect something that is not SQL even though it has a ' from ' somewhere in %s." % "there"
|


Loading