Rust: Add generated models for standard libraries including core #18787

Open
wants to merge 6 commits into
base: main

@paldepind paldepind commented Feb 14, 2025

This adds generated models for some of the standard Rust libraries, core, std, alloc, and proc_macro.

We had some tests whose .expected output grew with the number of models or the taint steps caused by models. That didn't scale to the new number of models, so I've tweaked those tests.

@github-actions github-actions bot added the Rust Pull requests that update Rust code label Feb 14, 2025
@paldepind paldepind force-pushed the rust-core-std-models branch 2 times, most recently from 7bce170 to bfb716b Compare February 17, 2025 09:57
@paldepind paldepind force-pushed the rust-core-std-models branch from bfb716b to 0c3e8a0 Compare February 17, 2025 10:08
@paldepind paldepind marked this pull request as ready for review February 17, 2025 10:20
@Copilot Copilot bot review requested due to automatic review settings February 17, 2025 10:20

PR Overview

This PR adds generated models for standard Rust libraries (core, std, alloc, and proc_macro) and updates the associated tests. The key changes are:

  • Updating a test annotation in dataflow/strings/main.rs from "hasTaintFlow" to "hasValueFlow".
  • Removing several taint and value model entries in lang-core.model.yml to better scale with the new models.

Changes

| File | Description |
| --- | --- |
| rust/ql/test/library-tests/dataflow/strings/main.rs | Updated test comment annotation to reflect new model expectations. |
| rust/ql/lib/codeql/rust/frameworks/stdlib/lang-core.model.yml | Removed several model entries to adjust for increased model volume. |

Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

rust/ql/test/library-tests/dataflow/strings/main.rs:53

  • Please ensure that changing the annotation from 'hasTaintFlow' to 'hasValueFlow' aligns with the updated test expectations and model semantics.
sink(s2); // $ hasValueFlow=36

rust/ql/lib/codeql/rust/frameworks/stdlib/lang-core.model.yml:7

  • Review the removal of the model entry for 'crate::hint::must_use' to ensure that tests or taint propagation flows are still adequately covered.
-      - ["lang:core", "crate::hint::must_use", "Argument[0]", "ReturnValue", "value", "manual"]

input = "Argument[self]" and
output = "ReturnValue" and
preservesValue = true and
model = "generated"
Contributor Author

Previously this had a model of "" and it seemed to be disabled/overwritten by the generated models. The generated models include a model for clone on i64, which caused the test for this method to fail. Changing the model to generated or manual fixed the problem. I just went with generated without worrying too much as this is temporary anyway.
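For readers unfamiliar with why a value-preserving summary from `Argument[self]` to `ReturnValue` fits `clone` on `i64`, here is a minimal Rust sketch. The helper names `source` and `clone_source` are hypothetical and are not taken from the test suite in this PR.

```rust
/// Hypothetical stand-in for a test source: produces the tracked value.
fn source(id: i64) -> i64 {
    id
}

/// Clones an `i64`. Since `i64` is `Copy`, `clone` returns exactly the
/// receiver's value, which is what a summary with `preservesValue = true`
/// from the receiver to the return value describes.
fn clone_source(id: i64) -> i64 {
    let a = source(id);
    a.clone()
}
```

Calling `clone_source(36)` yields the same value that went in, so a flow from the receiver to the result is a value flow rather than only a taint flow.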

@@ -11,6 +11,8 @@ private module Input implements InputSig<Location, RustDataFlow> {
not exists(n.asExpr().getLocation())
}

predicate postWithInFlowExclude(RustDataFlow::Node n) { n instanceof Node::FlowSummaryNode }
Contributor Author

This fixes some data flow inconsistencies otherwise introduced by the new models. Ruby and C# have the same, so I think this is appropriate.

@@ -14,7 +14,7 @@
| Macro calls - resolved | 2 |
| Macro calls - total | 2 |
| Macro calls - unresolved | 0 |
-| Taint edges - number of edges | 4 |
+| Taint edges - number of edges | 1465 |
Contributor Author

A 366x increase in taint edges 📈 😃

Contributor

Sounds great, though I wonder what they all are. I'm assuming hello-project is pretty basic.

@paldepind
Contributor Author

DCA shows taint reach going down by 1 on the iced project. That's unexpected, but in the tests things look good, so I don't think there's much to worry about.

@paldepind paldepind requested a review from hvitved February 20, 2025 11:28
@geoffw0
Contributor

geoffw0 commented Feb 24, 2025

> We had some tests whose .expected output grew with the number of models or the taint steps caused by models. That didn't scale to the new number of models, so I've tweaked those tests.

Those tests have been starting to irritate me even before we started adding generated models. Thanks for cleaning them up. 👍

> DCA shows taint reach going down by 1 on the iced project. That's unexpected, but in the tests things look good, so I don't think there's much to worry about.

This is very minor, but surprising enough that it might be worth investigating. If you download the database from DCA, you could try to narrow down which taint edges we have before the changes here but not afterwards.

Contributor

@geoffw0 geoffw0 left a comment

Looks really good, a few points to discuss, and I should really review a few more of the models (at random)...

- ["lang:core", "<crate::result::Result>::unwrap_or", "Argument[self].Field[crate::result::Result::Ok(0)]", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_default", "Argument[self].Field[crate::result::Result::Ok(0)]", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[0].ReturnValue", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[self].Field[crate::result::Result::Err(0)].Reference", "ReturnValue", "value", "dfc-generated"]
Contributor

I don't see why this is true (the described method is here). Though (assuming I'm right) I doubt the model will do much harm anyway.

Contributor Author

Nicely spotted! That model is indeed odd, both because the error value is not directly returned and because there are no references involved. The latter might be due to some mistakenly inserted reference read step.

In any case, the implementation is very simple, so I would expect the model to be accurate. I've created an internal issue for me to fix this.

- ["lang:core", "<crate::result::Result>::unwrap_or_default", "Argument[self].Field[crate::result::Result::Ok(0)]", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[0].ReturnValue", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[self].Field[crate::result::Result::Err(0)].Reference", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[self].Field[crate::result::Result::Err(0)]", "Argument[0].Parameter[0]", "value", "dfc-generated"]
Contributor

On the other hand this model is perfect and I missed it in the manual models. ✨
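To make the two `unwrap_or_else` models discussed here concrete, the following Rust sketch shows where the `Ok` and `Err` payloads actually go. The function names are hypothetical and chosen only for illustration; the behavior relies solely on the standard `Result::unwrap_or_else` method.

```rust
/// `Ok` case: the payload is returned directly and the closure is not called.
fn ok_case() -> i64 {
    let r: Result<i64, String> = Ok(7);
    r.unwrap_or_else(|e| e.len() as i64)
}

/// `Err` case: the error payload is passed to the closure's first parameter,
/// and the closure's return value becomes the result.
fn err_case() -> i64 {
    let r: Result<i64, String> = Err(String::from("boom"));
    r.unwrap_or_else(|e| e.len() as i64)
}

/// The error payload only reaches the result if the closure returns it,
/// here by using an identity closure.
fn err_identity() -> String {
    let r: Result<String, String> = Err(String::from("boom"));
    r.unwrap_or_else(|e| e)
}
```

Under this reading, the error payload flows into `Argument[0].Parameter[0]`, and whatever the closure returns (`Argument[0].ReturnValue`) flows to the result, so any flow from the error payload to the return value is indirect, through the closure.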

@paldepind
Contributor Author

> Looks really good, a few points to discuss, and I should really review a few more of the models (at random)...

Spotting mistakes like the one in unwrap_or_else is valuable, but I would suggest we go ahead with the models in the PR as is. They already add a lot of value, and many of the flaws come from known limitations. Instead, I suggest we continuously regenerate the models as the data flow library improves and fix things when we run into problems.

geoffw0 previously approved these changes Feb 25, 2025
Contributor

@geoffw0 geoffw0 left a comment

Yep, I agree we should merge this ASAP, but continue discussions about possible follow-up improvements.

I think I just created some merge conflicts by merging #18701; let me know if you need any help untangling what happened there (I expect it's mostly those .expected files that change too often).

pack: codeql/rust-all
extensible: summaryModel
data:
- ["lang:std", "<&[u8] as crate::io::BufRead>::consume", "Argument[self].Element", "Argument[self].Reference.Reference", "value", "dfc-generated"]
Contributor

This is another weird edge involving references. Based on the description I don't think consume should have any taint flows.
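As a small illustration of what `consume` does for `&[u8]`, the Rust sketch below advances a byte slice and inspects what remains. The helper name `remaining_after_consume` is hypothetical, and the sketch assumes `amt` does not exceed the slice length.

```rust
use std::io::BufRead;

/// Advances a byte-slice reader by `amt` bytes and returns what is left.
/// `consume` itself returns `()`; it only moves the slice's start forward.
/// Assumes `amt <= data.len()`.
fn remaining_after_consume(data: &[u8], amt: usize) -> Vec<u8> {
    let mut reader: &[u8] = data;
    reader.consume(amt);
    reader.to_vec()
}
```

The call produces no return value and writes no new data; it only shortens the view onto bytes that were already reachable through the receiver.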
