Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix false positive for S608 where SQL-like expression follows other text #8723

Closed
wants to merge 1 commit into from

Conversation

zanieb
Copy link
Member

@zanieb zanieb commented Nov 16, 2023

Closes #8717

Updates the regex to enforce that the SQL keywords are at the start of the string. Unfortunately we are passing "generated expression strings" to the regular expression instead of the inner contents of a string, so I need to allow for all sorts of cruft in front e.g. f", f', ', ", \n (literal), and whitespace.

@zanieb zanieb added the bug Something isn't working label Nov 16, 2023
Copy link
Contributor

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+0 -11 violations, +0 -0 fixes in 41 projects)

apache/airflow (+0 -11 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --select ALL

- airflow/example_dags/tutorial_objectstorage.py:118:22: S608 Possible SQL injection vector through string-based query construction
- airflow/providers/amazon/aws/transfers/s3_to_redshift.py:185:17: S608 Possible SQL injection vector through string-based query construction
- airflow/providers/databricks/operators/databricks_sql.py:323:24: S608 Possible SQL injection vector through string-based query construction
- airflow/utils/db.py:1197:30: S608 Possible SQL injection vector through string-based query construction
- airflow/utils/db_cleanup.py:236:12: S608 Possible SQL injection vector through string-based query construction
- tests/providers/amazon/aws/transfers/test_s3_to_redshift.py:224:23: S608 Possible SQL injection vector through string-based query construction
- tests/providers/databricks/operators/test_databricks_copy.py:107:12: S608 Possible SQL injection vector through string-based query construction
- tests/providers/databricks/operators/test_databricks_copy.py:65:12: S608 Possible SQL injection vector through string-based query construction
- tests/providers/databricks/operators/test_databricks_copy.py:87:12: S608 Possible SQL injection vector through string-based query construction
- tests/system/providers/google/cloud/bigquery/example_bigquery_queries_async.py:67:18: S608 Possible SQL injection vector through string-based query construction
- tests/system/providers/trino/example_trino.py:64:13: S608 Possible SQL injection vector through string-based query construction

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
S608 11 0 11 0 0

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -11 violations, +0 -0 fixes in 41 projects)

apache/airflow (+0 -11 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --select ALL --preview

- airflow/example_dags/tutorial_objectstorage.py:118:22: S608 Possible SQL injection vector through string-based query construction
- airflow/providers/amazon/aws/transfers/s3_to_redshift.py:185:17: S608 Possible SQL injection vector through string-based query construction
- airflow/providers/databricks/operators/databricks_sql.py:323:24: S608 Possible SQL injection vector through string-based query construction
- airflow/utils/db.py:1197:30: S608 Possible SQL injection vector through string-based query construction
- airflow/utils/db_cleanup.py:236:12: S608 Possible SQL injection vector through string-based query construction
- tests/providers/amazon/aws/transfers/test_s3_to_redshift.py:224:23: S608 Possible SQL injection vector through string-based query construction
- tests/providers/databricks/operators/test_databricks_copy.py:107:12: S608 Possible SQL injection vector through string-based query construction
- tests/providers/databricks/operators/test_databricks_copy.py:65:12: S608 Possible SQL injection vector through string-based query construction
- tests/providers/databricks/operators/test_databricks_copy.py:87:12: S608 Possible SQL injection vector through string-based query construction
- tests/system/providers/google/cloud/bigquery/example_bigquery_queries_async.py:67:18: S608 Possible SQL injection vector through string-based query construction
- tests/system/providers/trino/example_trino.py:64:13: S608 Possible SQL injection vector through string-based query construction

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
S608 11 0 11 0 0

@zanieb
Copy link
Member Author

zanieb commented Nov 16, 2023

All of the airflow ecosystem checks are false negatives :(

// To avoid false positives, we:
// - Require the SQL to be at the start of the expression, allowing for tokens that are not a part of the string
// - Require whole-word matches for SQL keywords
Regex::new(r#"(?i)\A(\"|f\"|\'|f\'|\\|\\n|\s)*\b(select\s.+\sfrom\s|delete\s+from\s|(insert|replace)\s.+\svalues\s|update\s.+\sset\s)"#)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible thing that might help here is to use verbose mode in the regex. That lets you break things apart using insignificant whitespace and even add comments if you want:

r#"(?ix)
\A
(\"|f\"|\'|f\'|\\|\\n|\s)*
\b
(
  select\s.+\sfrom\s
  |
  delete\s+from\s
  |
  (insert|replace)\s.+\svalues\s
  |
  update\s.+\sset\s
)
"#

Or somethinglike that. Dealer's choice about how best to break it up.

Now that I've written out, it makes me wonder about the following:

  • Is it intentional to have the escapes in \"|f\"|\'|f\'|\\|\\n|\s here?
    Namely, since this is a raw string, f\" will match f\" literally and not
    f". Similarly for the other branches.
  • While the regex didn't previously have it, is it perhaps desirable to add a
    word boundary assertion at the end as well? Although maybe not, since I see
    that each of the select, delete, insert|replace and update branches
    all must end with a \s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like a verbose regex with comments :)

Is it intentional to have the escapes

For \\n, yes I want to match the literal \n not a newline. For f\" it looks like the \ just has no effect? With or without it, we still match f-strings in the test fixture.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have suggestions for handling the false negatives? It looks like the regex would need to get much more complicated. This change may not be worth it as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derp, yes, f\" is equivalent to f". (Older versions of the regex crate would actually reject such escapes as superfluous.)

With respect to FNs and FPs, are you constrained to a single regex? If not, you could combine the status quo with what you have here. You could change the status quo to only match for uppercase things like SELECT. That will reduce its false positives but greatly increase the false negatives. Then to decrease false negatives, you could use your second regex here which allows lowercase select but only at the start of a string.

It's pretty hokey and probably is an overfitting to our current tests but is perhaps an improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... definitely hokey but I was also thinking case sensitivity makes some sense...

I almost would rather have a "case insensitive" setting for this which is off by default...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For f-strings, we wouldn't need to worry about the prefix and quotes as in https://github.com/astral-sh/ruff/pull/7927/files#diff-649016c25b517ba99e70caf422c95ef9d544e7c806d26e64624c0414777d54f5 I've removed them. That is, for f-strings, we'd do:

/// Concatenates the contents of an f-string, without the prefix and quotes,
/// and escapes any special characters.
///
/// ## Example
///
/// ```python
/// "foo" f"bar {x}" "baz"
/// ```
///
/// becomes `foobar {x}baz`.

@brondsem
Copy link

While this regex is being improved, here's an example that is NOT detected by ruff currently (1.0.8) but is detected by bandit:

f"""SELECT
    foo from bar where a={x}"""

@charliermarsh
Copy link
Member

Thank you @brondsem :)

@zanieb zanieb marked this pull request as draft March 12, 2024 18:38
@zanieb
Copy link
Member Author

zanieb commented Mar 12, 2024

This turned out to be hard to get right. I'd welcome help improving this but it's going to be tough to balance.

@zanieb zanieb closed this Mar 12, 2024
@Arexils
Copy link

Arexils commented Aug 24, 2024

i have this problem :(

S608 Possible SQL injection vector through string-based query construction

logger.warning(f'The linked role <{1}> has been delete from the guild {2}')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FALSE POSITIVE] "S608 Possible SQL injection" when not applicable
6 participants