-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix false positive for S608
where SQL-like expression follows other text
#8723
Conversation
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
S608 | 11 | 0 | 11 | 0 | 0 |
Linter (preview)
ℹ️ ecosystem check detected linter changes. (+0 -11 violations, +0 -0 fixes in 41 projects)
apache/airflow (+0 -11 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --select ALL --preview
- airflow/example_dags/tutorial_objectstorage.py:118:22: S608 Possible SQL injection vector through string-based query construction - airflow/providers/amazon/aws/transfers/s3_to_redshift.py:185:17: S608 Possible SQL injection vector through string-based query construction - airflow/providers/databricks/operators/databricks_sql.py:323:24: S608 Possible SQL injection vector through string-based query construction - airflow/utils/db.py:1197:30: S608 Possible SQL injection vector through string-based query construction - airflow/utils/db_cleanup.py:236:12: S608 Possible SQL injection vector through string-based query construction - tests/providers/amazon/aws/transfers/test_s3_to_redshift.py:224:23: S608 Possible SQL injection vector through string-based query construction - tests/providers/databricks/operators/test_databricks_copy.py:107:12: S608 Possible SQL injection vector through string-based query construction - tests/providers/databricks/operators/test_databricks_copy.py:65:12: S608 Possible SQL injection vector through string-based query construction - tests/providers/databricks/operators/test_databricks_copy.py:87:12: S608 Possible SQL injection vector through string-based query construction - tests/system/providers/google/cloud/bigquery/example_bigquery_queries_async.py:67:18: S608 Possible SQL injection vector through string-based query construction - tests/system/providers/trino/example_trino.py:64:13: S608 Possible SQL injection vector through string-based query construction
Changes by rule (1 rules affected)
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
S608 | 11 | 0 | 11 | 0 | 0 |
All of the airflow ecosystem checks are false negatives :( |
// To avoid false positives, we: | ||
// - Require the SQL to be at the start of the expression, allowing for tokens that are not a part of the string | ||
// - Require whole-word matches for SQL keywords | ||
Regex::new(r#"(?i)\A(\"|f\"|\'|f\'|\\|\\n|\s)*\b(select\s.+\sfrom\s|delete\s+from\s|(insert|replace)\s.+\svalues\s|update\s.+\sset\s)"#) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible thing that might help here is to use verbose mode in the regex. That lets you break things apart using insignificant whitespace and even add comments if you want:
r#"(?ix)
\A
(\"|f\"|\'|f\'|\\|\\n|\s)*
\b
(
select\s.+\sfrom\s
|
delete\s+from\s
|
(insert|replace)\s.+\svalues\s
|
update\s.+\sset\s
)
"#
Or somethinglike that. Dealer's choice about how best to break it up.
Now that I've written out, it makes me wonder about the following:
- Is it intentional to have the escapes in
\"|f\"|\'|f\'|\\|\\n|\s
here?
Namely, since this is a raw string,f\"
will matchf\"
literally and not
f"
. Similarly for the other branches. - While the regex didn't previously have it, is it perhaps desirable to add a
word boundary assertion at the end as well? Although maybe not, since I see
that each of theselect
,delete
,insert|replace
andupdate
branches
all must end with a\s
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like a verbose regex with comments :)
Is it intentional to have the escapes
For \\n
, yes I want to match the literal \n
not a newline. For f\"
it looks like the \
just has no effect? With or without it, we still match f-strings in the test fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have suggestions for handling the false negatives? It looks like the regex would need to get much more complicated. This change may not be worth it as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derp, yes, f\"
is equivalent to f"
. (Older versions of the regex crate would actually reject such escapes as superfluous.)
With respect to FNs and FPs, are you constrained to a single regex? If not, you could combine the status quo with what you have here. You could change the status quo to only match for uppercase things like SELECT
. That will reduce its false positives but greatly increase the false negatives. Then to decrease false negatives, you could use your second regex here which allows lowercase select
but only at the start of a string.
It's pretty hokey and probably is an overfitting to our current tests but is perhaps an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... definitely hokey but I was also thinking case sensitivity makes some sense...
I almost would rather have a "case insensitive" setting for this which is off by default...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For f-strings, we wouldn't need to worry about the prefix and quotes as in https://github.com/astral-sh/ruff/pull/7927/files#diff-649016c25b517ba99e70caf422c95ef9d544e7c806d26e64624c0414777d54f5 I've removed them. That is, for f-strings, we'd do:
/// Concatenates the contents of an f-string, without the prefix and quotes,
/// and escapes any special characters.
///
/// ## Example
///
/// ```python
/// "foo" f"bar {x}" "baz"
/// ```
///
/// becomes `foobar {x}baz`.
While this regex is being improved, here's an example that is NOT detected by ruff currently (1.0.8) but is detected by bandit:
|
Thank you @brondsem :) |
This turned out to be hard to get right. I'd welcome help improving this but it's going to be tough to balance. |
i have this problem :( S608 Possible SQL injection vector through string-based query construction
|
Closes #8717
Updates the regex to enforce that the SQL keywords are at the start of the string. Unfortunately we are passing "generated expression strings" to the regular expression instead of the inner contents of a string, so I need to allow for all sorts of cruft in front e.g.
f"
,f'
,'
,"
,\n
(literal), and whitespace.