Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XXX [mycpp] add LinkedList #1769

Closed
wants to merge 1 commit into from
Closed

XXX [mycpp] add LinkedList #1769

wants to merge 1 commit into from

Conversation

melvinw
Copy link
Collaborator

@melvinw melvinw commented Dec 18, 2023

This commit adds LinkedList, which implements a simple doubly-linked list. This will be useful for building caches.

This commit adds LinkedList, which implements a simple doubly-linked
list. This will be useful for building caches.
@andychu
Copy link
Contributor

andychu commented Dec 18, 2023

Hmm is there any reason to make it GC-managed? I think the ownership is clear for a cache -- it's global and can be torn down at process exit

i.e. We could just have our own simple manually managed linked list.

Oh OK well I guess a reason is if the entries themselves are GC objects.

In this case the value is a regex_t which is NOT a GC object. Actually that will complicate tracing. And the key is a Str, which in general IS a GC object

But we could also have the cache own copies of the Str as a char* or whatever


Tracing is still expensive on some workloads

http://travis-ci.oilshell.org/github-jobs/5644/raw-vm.wwz/_tmp/perf/osh-parse.report.txt

I would tend to go for the minimal thing that will solve regcomp() first ... i.e. whatever makes it fast

Not sure if it has to be integrated with the GC

@andychu
Copy link
Contributor

andychu commented Dec 18, 2023

Oh yeah and remember we actually have no code hooks for deallocating ... So we have to figure out where to put the regfree().

I guess that's a main goal -- to prevent that potentially infinite memory leak (while not redoing regcomp() and regfree() every time unnecessarily)

The dealloc hook / finalizer is a pretty hard problem and I don't think we need it in general, since this is I think the first time its come up

}

static constexpr uint32_t field_mask() {
return maskbit(offsetof(LinkedListNode, obj_)) |
Copy link
Contributor

@andychu andychu Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky because whether this bit is set depends on whether T is a pointer or not

In the unit tests, it looks like it's not a pointer, which I think means bad things will happen here

For Slab and Tuple{2,3,4} I think we're using std::is_pointer<T> at compile time to conditionally set the mask! That took a bit of figuring out ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

@melvinw
Copy link
Collaborator Author

melvinw commented Dec 19, 2023

Hmm is there any reason to make it GC-managed? I think the ownership is clear for a cache -- it's global and can be torn down at process exit

i.e. We could just have our own simple manually managed linked list.

Oh OK well I guess a reason is if the entries themselves are GC objects.

In this case the value is a regex_t which is NOT a GC object. Actually that will complicate tracing. And the key is a Str, which in general IS a GC object

Oh, yeah. I guess this doesn't need to be managed by the GC (marking all the nodes could be wasteful...). I went with this since we needed Str* as keys. More commentary on this directly below.

But we could also have the cache own copies of the Str as a char* or whatever

This probably isn't necessary. The cache type that uses this list will still have to have a Dict<Str*, LinkedListNode<...>*> anyway, right? It should be safe to just have the linked list use Tuple2<Str*, regex_t> since the lifetimes should be coupled. Does this make sense? I didn't realize this until just now.

Oh yeah and remember we actually have no code hooks for deallocating ... So we have to figure out where to put the regfree().

Oh, yeah, this could be messy. The clearest spot to add this is probably at evict-time. Two ideas below.

We could have LRUCache::Set() return the handle to the value of the evicted entry if it had to evict something. Then the caller could unpack and do the appropriate cleanup. I don't love this.

Maybe we could have the user provide a cleanup method in the constructor? We could have LRUCache<K, V>::LRUCache(int capacity, void(V)* cleanup). Then whenever the cache decides to evict something, it calls the provided cleanup func on the victim

@andychu
Copy link
Contributor

andychu commented Dec 19, 2023

Hm yeah not sure, I'd be tempted just to hard-code regfree() and see what it looks like

It doesn't have to be generic at first

I think the other instances of LRU caches are in Python code, and it seems like we're not sure how or if to unify them ...


In general I think this is tricky / unique because it's one of few places (only place?) where we actually have a heap object of a type we didn't define ourselves -- the regex_t

@melvinw
Copy link
Collaborator Author

melvinw commented Dec 19, 2023

Hmm ok. Let me ditch this pr and just send some specialized thing then

@melvinw melvinw closed this Dec 19, 2023
@melvinw melvinw deleted the linked-list branch December 19, 2023 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants