Support alloc-free dev #794

Draft
wants to merge 14 commits into
base: alloc-free

CapZTr
Contributor

@CapZTr CapZTr commented Jan 21, 2025

No description provided.

Collaborator

@ThomasHaas ThomasHaas left a comment

On a more general note about handling Free/Alloc: I'm not convinced that changing aliasing is the right concept for Alloc/Free. I think it is a misleading road.
What you really care about is not the individual addresses that the Alloc/Free targets, but only the memory objects as a whole. Similarly, you care about the objects a memory access can address if you want to talk about bugs like use-after-free.
So a canAccessSameObject(x, y) method would be more appropriate and would also alleviate the issue of requiring allocations of known size (the size does not matter!).
If you then introduce a corresponding sameObj relation in .cat, you should be able to treat Free and Alloc more-or-less as single-address events (i.e., simple memory events).
A use-after-free (or racy free) would then simply be ~empty ([Free];sameObj;[M] \ hb).
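
As an illustration of the object-based check described above, here is a minimal Java sketch. All names in it (`SameObjSketch`, `Ev`, `Pair`, `violations`) are hypothetical and not part of the project. It also assumes one particular reading of the one-line formula: an access is considered safe only if it is hb-ordered before the Free, so the sketch reports every (Free, access) pair on a common object that lacks that ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SameObjSketch {
    enum Kind { ALLOC, FREE, ACCESS }

    /** Hypothetical event: an id, a kind, and the names of the memory objects it may refer to. */
    record Ev(int id, Kind kind, Set<String> objects) {}

    /** Ordered pair of event ids. */
    record Pair(int first, int second) {}

    /** Object-level check: true if both events can refer to a common memory object. */
    static boolean canAccessSameObject(Ev x, Ev y) {
        return x.objects().stream().anyMatch(y.objects()::contains);
    }

    /**
     * Returns all (free, access) pairs on a common object where the access is NOT
     * ordered before the free. Assumption: hb holds (earlier, later) id pairs.
     */
    static List<Pair> violations(List<Ev> events, Set<Pair> hb) {
        List<Pair> result = new ArrayList<>();
        for (Ev f : events) {
            if (f.kind() != Kind.FREE) {
                continue;
            }
            for (Ev m : events) {
                if (m.kind() != Kind.ACCESS) {
                    continue;
                }
                boolean accessBeforeFree = hb.contains(new Pair(m.id(), f.id()));
                if (canAccessSameObject(f, m) && !accessBeforeFree) {
                    result.add(new Pair(f.id(), m.id()));
                }
            }
        }
        return result;
    }
}
```

Note that no object size appears anywhere in the sketch; a non-empty result corresponds to the `~empty` flag in the formula.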

Comment on lines 36 to 46
boolean mustAlias(Alloc a, MemoryCoreEvent e);

boolean mayAlias(Alloc a, MemoryCoreEvent e);

boolean mustAlias(Alloc a, MemFree f);

boolean mayAlias(Alloc a, MemFree f);

boolean mustAlias(MemFree a, MemFree b);

boolean mayAlias(MemFree a, MemFree b);
Collaborator

I know this is a draft, but you don't really want to go with so many overloads, do you?

@natgavrilenko
Collaborator

On a more general note about handling Free/Alloc: I'm not convinced that changing aliasing is the right concept for Alloc/Free. I think it is a misleading road. What you really care about is not the individual addresses that the Alloc/Free targets, but only the memory objects as a whole. Similarly, you care about the objects a memory access can address if you want to talk about bugs like use-after-free. So a canAccessSameObject(x, y) method would be more appropriate and would also alleviate the issue of requiring allocations of known size (the size does not matter!). If you then introduce a corresponding sameObj relation in .cat, you should be able to treat Free and Alloc more-or-less as single-address events (i.e., simple memory events). A use-after-free (or racy free) would then simply be ~empty ([Free];sameObj;[M] \ hb).

I think you didn't really check what the code is doing. We need both, individual addresses (i.e. the pointer returned by alloc) and the full allocated memory region to compare with addresses of memory accesses.

@ThomasHaas
Collaborator

Why do you need the individual addresses in the full region if you instead had an accessesSameObject(x, y) method? This would hold true for an Alloc that is considered accessing the memory object it allocates and a memory access to anywhere inside that object.
Btw. the most compact representation of the memory region of a memory object is just the memory object itself, rather than the individual addresses inside. This makes the approach viable even for objects of unknown/unbounded size.

@natgavrilenko
Collaborator

Why do you need the individual addresses in the full region if you instead had an accessesSameObject(x, y) method? This would hold true for an Alloc that is considered accessing the memory object it allocates and a memory access to anywhere inside that object. Btw. the most compact representation of the memory region of a memory object is just the memory object itself, rather than the individual addresses inside. This makes the approach viable even for objects of unknown/unbounded size.

please read the code first

@ThomasHaas
Collaborator

Huh, I read the code?! I can see all the checks about bounded-size allocations, and mayAlias(Alloc x, MemoryCoreEvent y) checking that y accesses something inside x. What else should I read in this PR?

@hernanponcedeleon
Owner

I'm not convinced that changing aliasing is the right concept for Alloc/Free. I think it is a misleading road ... So a canAccessSameObject(x, y) method would be more appropriate

Isn't the latter some kind of alias (maybe not as fine grained as per address, but at least per object)?

and would also alleviate the issue of requiring allocations of known size (the size does not matter!).

This is assuming no OOB, right?

If you then introduce a corresponding sameObj relation in .cat, you should be able to treat Free and Alloc more-or-less as single-address events (i.e., simple memory events). A use-after-free (or racy free) would then simply be ~empty ([Free];sameObj;[M] \ hb).

Isn't this kind of what the two new relations allocptr and allocmem are doing? Those relations are not visible in the diff of this PR (which targets the initial draft from Natalia), but maybe this is the source of the misunderstanding.

@ThomasHaas
Collaborator

I'm not convinced that changing aliasing is the right concept for Alloc/Free. I think it is a misleading road ... So a canAccessSameObject(x, y) method would be more appropriate

Isn't the latter some kind of alias (maybe not as fine grained as per address, but at least per object)?

Yes, it is a kind of aliasing. But I think it is worth differentiating between the concepts of "same address", "overlapping address" and "same object". Covering them all under the term "aliasing" is bound to cause confusion.
Importantly, sameObject is actually easier to compute/reason about because we only ever have finitely many objects but possibly unboundedly many addresses.
For example, our newest alias analysis was not updated in this PR which I think is only partly because of its difficulty, but also partly because it does not compute explicit addresses like the other alias analyses do. However, the analysis does have information about which memory objects a memory event may access, and this is sufficient to implement the desired feature.
That being said, the implementation of canAccessSameObject for the other alias analyses would pretty much coincide with what this PR does.
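
To make the "finitely many objects" point concrete, here is a small hedged sketch of an object-level may-check. The class and record names (`ObjectLevelAliasSketch`, `MemObj`) and the string event keys are made up for illustration; the only idea taken from the discussion is that each event maps to a finite set of memory objects and that no size or offset is needed.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

final class ObjectLevelAliasSketch {
    /** Hypothetical memory object; deliberately carries no size, it may be unknown or unbounded. */
    record MemObj(String name) {}

    /** For each event (identified by a name here), the objects it may refer to. */
    private final Map<String, Set<MemObj>> accessible;

    ObjectLevelAliasSketch(Map<String, Set<MemObj>> accessible) {
        this.accessible = accessible;
    }

    /** May-check: conservative "true" when nothing is known about one of the events. */
    boolean canAccessSameObject(String x, String y) {
        Set<MemObj> ox = accessible.get(x);
        Set<MemObj> oy = accessible.get(y);
        if (ox == null || oy == null) {
            return true;
        }
        // Finite set intersection; no address enumeration is involved.
        return !Collections.disjoint(ox, oy);
    }
}
```

An analysis that tracks only which objects an event may touch, without explicit addresses, can answer this query directly.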

and would also alleviate the issue of requiring allocations of known size (the size does not matter!).

This is assuming no OOB, right?

All alias analyses assume no OOB anyway. So nothing changes there.

If you then introduce a corresponding sameObj relation in .cat, you should be able to treat Free and Alloc more-or-less as single-address events (i.e., simple memory events). A use-after-free (or racy free) would then simply be ~empty ([Free];sameObj;[M] \ hb).

Isn't this kind of what the two new relations allocptr and allocmem are doing? Those relations are not visible in the diff of this PR (which targets the initial draft from Natalia), but maybe this is the source of the misunderstanding.

Yes, I know about those relations. I'm saying that you can likely replace them by a general sameObj relation. At least concept-wise, this is worth considering even if the multiple new base relations are preferred for some other reason. Either way, the underlying idea of those relations is still based more on object-based reasoning rather than address-based reasoning.
Lastly, encoding sameObj can be easier than sameAddress, especially if we eventually use a provenance-based pointer model as mentioned in #793.

public boolean mayAlias(MemFree a, MemFree b) {
return a1.mayAlias(a, b) && a2.mayAlias(a, b);
}

@Override
public Graphviz getGraphVisualization() {
Collaborator

We will also need to add the new events to the graph

Signed-off-by: Tianrui Zheng <[email protected]>
@@ -774,14 +774,24 @@ public MutableKnowledge visitAllocPtr(AllocPtr aref) {
MutableEventGraph must = new MapEventGraph();
Collaborator

The TODOs can be removed now from both relations.

Could you also rename parameter aref -> allocPtr and aloc -> allocMem to match the other names in the class? I forgot to rename them after changing relation names.

@@ -774,14 +774,24 @@ public MutableKnowledge visitAllocPtr(AllocPtr aref) {
MutableEventGraph must = new MapEventGraph();
for (Alloc e1 : program.getThreadEvents(Alloc.class)) {
if (e1.isHeapAllocation()) {
Collaborator

We can merge these two loops into one, something like this:
List<Alloc> allocEvents = program.getThreadEvents(Alloc.class).stream().filter(Alloc::isHeapAllocation).toList();
List<MemFree> freeEvents = program.getThreadEvents(MemFree.class);
Stream.concat(allocEvents.stream(), freeEvents.stream()).forEach(e1 -> freeEvents.forEach(e2 -> {
    ...
}));

@@ -795,8 +805,13 @@ public MutableKnowledge visitAllocMem(AllocMem aloc) {
MutableEventGraph must = new MapEventGraph();
for (Alloc e1 : program.getThreadEvents(Alloc.class)) {
if (e1.isHeapAllocation()) {
for (Event e2 : program.getThreadEvents(MemoryEvent.class)) {
may.add(e1, e2);
for (MemoryCoreEvent e2 : program.getThreadEvents(MemoryCoreEvent.class)) {
Collaborator

Could you create a list of MemoryCoreEvent before the first loop, so that we don't need to call getThreadEvents each time?
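
A rough sketch of the suggested hoisting, using a stand-in program class rather than the real one: `FakeProgram` and `getMemoryEvents` are hypothetical and only count how often the lookup runs.

```java
import java.util.ArrayList;
import java.util.List;

final class HoistedLookupSketch {
    /** Stand-in for the program; counts how often the event lookup is performed. */
    static final class FakeProgram {
        int lookups = 0;
        private final List<String> memoryEvents;

        FakeProgram(List<String> memoryEvents) {
            this.memoryEvents = memoryEvents;
        }

        List<String> getMemoryEvents() {
            lookups++;
            return List.copyOf(memoryEvents);
        }
    }

    /** Builds all (alloc, memory event) pairs with a single lookup before the loops. */
    static List<String> pairs(FakeProgram program, List<String> allocs) {
        List<String> memoryEvents = program.getMemoryEvents(); // fetched once, reused below
        List<String> result = new ArrayList<>();
        for (String alloc : allocs) {
            for (String mem : memoryEvents) {
                result.add(alloc + "->" + mem);
            }
        }
        return result;
    }
}
```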

@@ -795,8 +805,13 @@ public MutableKnowledge visitAllocMem(AllocMem aloc) {
MutableEventGraph must = new MapEventGraph();
for (Alloc e1 : program.getThreadEvents(Alloc.class)) {
if (e1.isHeapAllocation()) {
Collaborator

I think it should be easy to extend this approach to stack events, but I guess you want to get the heap part ready first.

Set<Location> target = targets.get(address);
addresses = target != null ? target : getAddresses(address);
if (addrExpr instanceof Register register) {
Set<Location> target = targets.get(register);
Collaborator

Can be the same addrExpr (without cast)

return getMaxAddressSet(e).stream().anyMatch(
l -> l.base.equals(a.getAllocatedObject()) && l.offset < getAllocatedSize(a)
);
public boolean mayAlias(Event a, Event b) {
Collaborator

I think this typecheck is a bit of overkill. We have two cases:

  1. A pair of alloc and memory event -> we need to check if memory event may/must access any address from the allocated region.
  2. Everything else, including alloc-free and free-free pairs -> we need to check the "normal" may/must alias for a single pointer.

How about something like this?

@Override
public boolean mayAlias(Event a, Event b) {
    if (a instanceof Alloc alloc && b instanceof MemoryCoreEvent mem) {
        return mayAccessAllocatedBy(alloc, mem);
    }
    if (b instanceof Alloc alloc && a instanceof MemoryCoreEvent mem) {
        // This case shouldn't be called because the alloc->mem relation always starts at alloc; we can keep both to be on the safe side.
        return mayAccessAllocatedBy(alloc, mem);
    }
    return mayAccessSameAddress(a, b);
}

And the same for must sets and the other analysis classes.

@Override
public boolean mayAlias(MemFree a, MemFree b) {
return !Sets.intersection(getFreedAddresses(a), getFreedAddresses(b)).isEmpty();
private boolean mayAccessAllocatedBy(Alloc a, Event e) {
Collaborator

I would keep only the first part and handle the alloc-free pair in may/mustAccessSameAddress.

}
}

private void processAllocs(Alloc a) {
if (!a.isHeapAllocation()) {
return;
}
Register r = a.getResultRegister();
if (a.getAllocationSize() instanceof IntLiteral i) {
Collaborator

We can add alloc to eventAddressSpaceMap and then check alloc-free pair via getMaxAddressSet:

eventAddressSpaceMap.put(a, Set.of(new Location(a.getAllocatedObject(), 0)));

throw new IllegalArgumentException("Unsupported event types for EqualityAliasAnalysis");
}

private boolean mustAccessSameAddress(MemoryCoreEvent a, MemoryCoreEvent b) {
Collaborator

I think we don't need all this complex reasoning in the mayAlias/mustAlias methods. In the old comment, I meant that we can implement mustAccessAllocatedBy(Alloc a, MemoryCoreEvent e) following the same logic as for mustAccessSameAddress, i.e. must access is true if 1) the memory event uses the register of the alloc event and 2) the register value has not been overwritten.

We can also try reasoning about register + index accesses, but it will be a bit more complex, because we also need to consider sizes of the member elements and offsets.
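
The two conditions above can be sketched on a toy model. This is not the project's implementation: the instruction records (`AllocInstr`, `WriteReg`, `Access`) are invented for the example, and the sketch assumes a single thread of straight-line code with no branches and no register + index addressing.

```java
import java.util.List;

final class MustAccessAllocatedBySketch {
    interface Instr {}

    /** Allocation that stores the returned pointer in resultRegister. */
    record AllocInstr(String resultRegister) implements Instr {}

    /** Any instruction that overwrites a register. */
    record WriteReg(String register) implements Instr {}

    /** Memory access whose address is exactly the value of addressRegister. */
    record Access(String addressRegister) implements Instr {}

    /**
     * True if the access at accessIndex must hit the object allocated at allocIndex:
     * 1) it uses the alloc's result register, and
     * 2) that register is not overwritten between the two instructions.
     */
    static boolean mustAccessAllocatedBy(List<Instr> thread, int allocIndex, int accessIndex) {
        if (allocIndex < 0 || accessIndex >= thread.size() || allocIndex >= accessIndex) {
            return false;
        }
        if (!(thread.get(allocIndex) instanceof AllocInstr alloc)) {
            return false;
        }
        if (!(thread.get(accessIndex) instanceof Access access)) {
            return false;
        }
        String reg = alloc.resultRegister();
        if (!access.addressRegister().equals(reg)) {
            return false;
        }
        for (int i = allocIndex + 1; i < accessIndex; i++) {
            Instr instr = thread.get(i);
            boolean overwrites =
                    (instr instanceof WriteReg w && w.register().equals(reg))
                    || (instr instanceof AllocInstr other && other.resultRegister().equals(reg));
            if (overwrites) {
                return false;
            }
        }
        return true;
    }
}
```

Returning false here only means "not provably must"; a may-analysis would still have to consider the pair.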

final DerivedVariable vx = addressVariables.get(x);
final DerivedVariable vy = addressVariables.get(y);
return vx != null && vy != null && vx.base == vy.base && vx.modifier.offset == vy.modifier.offset &&
isConstant(vx.modifier) && isConstant(vy.modifier);
}

private boolean mayAccessAllocatedBy(Alloc a, Event e) {
Collaborator

I will carefully check it and add comments a bit later.

@xeren
Collaborator

xeren commented Feb 5, 2025

You may add the following methods to AliasAnalysis and use them instead:

boolean mayObjectAlias(Event a, Event b);
boolean mustObjectAlias(Event a, Event b);

Implementing it in InclusionBasedPointerAnalysis:

@Override
public boolean mayObjectAlias(Event a, Event b) {
    DerivedVariable addressA = addressVariables.get(a);
    DerivedVariable addressB = addressVariables.get(b);
    return addressA == null || addressB == null ||
            !Collections.disjoint(getAccessibleObjects(addressA), getAccessibleObjects(addressB));
}
@Override
public boolean mustObjectAlias(Event a, Event b) {
    DerivedVariable addressA = addressVariables.get(a);
    DerivedVariable addressB = addressVariables.get(b);
    if (addressA == null || addressB == null) {
        return false;
    }
    if (addressA.base == addressB.base) {
        return true;
    }
    Set<MemoryObject> objectsA = getAccessibleObjects(addressA);
    return objectsA.size() == 1 && objectsA.equals(getAccessibleObjects(addressB));
}
private Set<MemoryObject> getAccessibleObjects(DerivedVariable address) {
    var objects = new HashSet<MemoryObject>();
    objects.add(address.base.object);
    for (IncludeEdge edge : address.base.includes) {
        objects.add(edge.source.object);
    }
    objects.remove(null);
    return objects;
}
private void run(Program program, AliasAnalysis.Config configuration) {
    ...
    for (Alloc alloc : program.getThreadEvents(Alloc.class)) {
        addressVariables.put(alloc, derive(objectVariables.get(alloc.getAllocatedObject())));
    }
    ...
}

Implementing it in EqualityAliasAnalysis:

@Override
public boolean mayObjectAlias(MemoryCoreEvent a, MemoryCoreEvent b) {
    return true;
}
@Override
public boolean mustObjectAlias(MemoryCoreEvent a, MemoryCoreEvent b) {
    return mustAlias(a, b);
}

Implementation for CombinedAliasAnalysis:

@Override
public boolean mayObjectAlias(MemoryCoreEvent a, MemoryCoreEvent b) {
    return a1.mayObjectAlias(a, b) && a2.mayObjectAlias(a, b);
}
@Override
public boolean mustObjectAlias(MemoryCoreEvent a, MemoryCoreEvent b) {
    return a1.mustObjectAlias(a, b) || a2.mustObjectAlias(a, b);
}

Implementing it in AndersenAliasAnalysis and FieldSensitiveAndersen:

@Override
public boolean mayObjectAlias(MemoryCoreEvent a, MemoryCoreEvent b) {
    return !Collections.disjoint(getAccessibleObjects(a), getAccessibleObjects(b));
}
@Override
public boolean mustObjectAlias(MemoryCoreEvent a, MemoryCoreEvent b) {
    Set<MemoryObject> objects = getAccessibleObjects(a);
    return objects.size() == 1 && objects.containsAll(getAccessibleObjects(b));
}
private Set<MemoryObject> getAccessibleObjects(MemoryCoreEvent event) {
    var objects = new HashSet<MemoryObject>();
    for (Location location : getMaxAddressSet(event)) {
        objects.add(location.base);
    }
    return objects;
}

@CapZTr
Contributor Author

CapZTr commented Feb 5, 2025

Are you suggesting that we only care about the base object and don't consider the offset at all? Won't this result in a loss of precision for the analysis when the size of the object is an integer?

@ThomasHaas
Collaborator

Not really. You would only get precision in the presence of out-of-bounds accesses (UB), which we don't handle correctly either way.
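
A small sketch of why the offset check adds nothing for in-bounds accesses. The `Location` record and both helper methods are hypothetical; the range check is modelled on the `offset >= 0 && offset < size` condition that appears in the diff.

```java
final class OffsetPrecisionSketch {
    /** Hypothetical address: a base object name plus a constant offset. */
    record Location(String base, int offset) {}

    /** Object-only check: ignores the offset entirely. */
    static boolean mayAccessByBase(String allocated, Location l) {
        return l.base().equals(allocated);
    }

    /** Offset-aware check: additionally requires the offset to be inside the allocation. */
    static boolean mayAccessByRange(String allocated, int size, Location l) {
        return l.base().equals(allocated) && l.offset() >= 0 && l.offset() < size;
    }

    /** True if both checks give the same answer for every in-bounds offset. */
    static boolean agreeOnInBounds(String allocated, int size) {
        for (int offset = 0; offset < size; offset++) {
            Location l = new Location(allocated, offset);
            if (mayAccessByBase(allocated, l) != mayAccessByRange(allocated, size, l)) {
                return false;
            }
        }
        return true;
    }
}
```

The two checks can only disagree for a location whose offset lies outside `[0, size)`, which is exactly the out-of-bounds case.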

for (Event e2 : program.getThreadEvents(MemoryEvent.class)) {
may.add(e1, e2);
for (MemoryCoreEvent e2 : program.getThreadEvents(MemoryCoreEvent.class)) {
if (alias.mayAlias(e1, e2)) {
Collaborator

We can skip e2 if it is an instance of Init

@@ -202,6 +332,9 @@ private void run(Program program, AliasAnalysis.Config configuration) {
for (final MemoryCoreEvent memoryEvent : program.getThreadEvents(MemoryCoreEvent.class)) {
processMemoryEvent(memoryEvent);
}
for (final MemFree free : program.getThreadEvents(MemFree.class)) {
Collaborator

Similar to the other analysis, Alloc -> MemFree should check for an exact pointer match, not the whole allocated region. So, in order to reuse the same algorithm, you need to add allocs to addressVariables.

continue;
}
final Modifier m = compose(i.modifier, v.modifier);
final boolean may = isConstant(m) ? m.offset < size && m.offset >= 0
Collaborator

I would use Rene's suggestion and compare only base objects when checking the whole memory region.
