turn off the cache for a directory #28

Open
beroal opened this issue Mar 18, 2016 · 14 comments

Comments

@beroal

beroal commented Mar 18, 2016

This is more of a support request. Can I disable the cache for specific directories only? The benefit is as follows: a program transmits big files over a network and stores the hosts' metadata in a metadata file. It's better not to disable the cache for the metadata file.

@Feh
Owner

Feh commented Mar 19, 2016

A more generic solution would be to introduce a method to either include (“whitelist”) or exclude (“blacklist”) certain glob patterns. The problem here is that the only way you can configure nocache’s behavior is by specifying environment variables, so this approach would mean you have to: think of two good variable names, parse their contents into an array or a list and then match every open() call against each expression. Feel free to implement this if you need it; I’ll take a look at it, but I currently don’t have the time to do it myself.
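A minimal sketch of the approach described above, assuming the patterns arrive as one colon-separated string (for example from an environment variable). The `pattern_list` type, its functions, and the separator are hypothetical and not part of nocache; a real implementation would call the matcher from its open() wrapper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a parsed list of glob patterns. */
struct pattern_list {
    char *storage;    /* private copy of the variable's value */
    char **patterns;  /* pointers into storage */
    size_t count;
};

/* Split "a:b:c" into patterns. Returns 0 on success, -1 if out of memory. */
static int pattern_list_parse(struct pattern_list *pl, const char *value)
{
    assert(pl != NULL);
    pl->storage = NULL;
    pl->patterns = NULL;
    pl->count = 0;
    if (value == NULL || *value == '\0')
        return 0;                       /* unset or empty: no patterns */

    size_t max = 1;                     /* upper bound on the token count */
    for (const char *p = value; *p != '\0'; p++)
        if (*p == ':')
            max++;

    pl->storage = strdup(value);
    pl->patterns = calloc(max, sizeof *pl->patterns);
    if (pl->storage == NULL || pl->patterns == NULL) {
        free(pl->storage);
        free(pl->patterns);
        pl->storage = NULL;
        pl->patterns = NULL;
        return -1;
    }

    char *save = NULL;
    for (char *tok = strtok_r(pl->storage, ":", &save); tok != NULL;
         tok = strtok_r(NULL, ":", &save))
        pl->patterns[pl->count++] = tok;
    return 0;
}

/* Returns 1 if path matches any pattern. flags == 0 lets '*' match '/'. */
static int pattern_list_matches(const struct pattern_list *pl, const char *path)
{
    for (size_t i = 0; i < pl->count; i++)
        if (fnmatch(pl->patterns[i], path, 0) == 0)
            return 1;
    return 0;
}

static void pattern_list_free(struct pattern_list *pl)
{
    free(pl->patterns);
    free(pl->storage);
    pl->patterns = NULL;
    pl->storage = NULL;
    pl->count = 0;
}
```

The list is parsed once at load time, so each open() only pays for the fnmatch() calls.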

@beroal
Author

beroal commented Mar 19, 2016

> The problem here is that the only way you can configure nocache’s behavior is by specifying environment variables

Why so? There are commands which accept options and a command, for example, "nice", "sudo", "env", "time", "xargs".

@Feh
Owner

Feh commented Mar 19, 2016

True; but the functionality of nocache is achieved by the wrapper shell script setting the LD_PRELOAD env variable. The initializer of nocache.so runs only inside the process of the specified executable and has no access to the wrapper's command line arguments. See for example how the -n option is implemented.
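To illustrate why configuration has to travel through the environment, here is a sketch of a preloaded library's initializer picking up a setting with getenv(). The variable name `EXAMPLE_MAX_FDS`, the function names, and the default are made up for illustration and are not nocache's actual code; `__attribute__((constructor))` is a GCC/Clang extension.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical setting, filled in when the library is loaded. */
static long example_max_fds = 0;

/* Read a non-negative integer from the environment; use fallback if the
 * variable is unset, empty, negative, or not a plain decimal number. */
static long env_long_or(const char *name, long fallback)
{
    assert(name != NULL);
    const char *value = getenv(name);
    if (value == NULL || *value == '\0')
        return fallback;
    char *end = NULL;
    long n = strtol(value, &end, 10);
    if (*end != '\0' || n < 0)
        return fallback;
    return n;
}

/* Runs inside the target process when the loader maps the library, before
 * main(). The wrapper script's own arguments are not visible here; only
 * what the wrapper exported into the environment is. */
__attribute__((constructor))
static void example_init(void)
{
    example_max_fds = env_long_or("EXAMPLE_MAX_FDS", 1024);
}
```

Built as a shared object (for instance with `gcc -shared -fPIC`), such a library would be used as `EXAMPLE_MAX_FDS=64 LD_PRELOAD=./libexample.so some_command`; a wrapper script can translate a command line option into that variable before exec'ing the command.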

@beroal
Author

beroal commented Mar 19, 2016

> A more generic solution would be to introduce a method to either include (“whitelist”) or exclude (“blacklist”) certain glob patterns.

A wildcard never matches the pathname separator, so how do I specify all descendants of a directory by a glob pattern?

@Feh
Owner

Feh commented Mar 19, 2016

Use fnmatch(3) without the FNM_PATHNAME flag; then a wildcard also matches slashes.

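$ cat fnmatch.c
#include <fnmatch.h>
/* Exit status is 0 if argv[1] matches "foo/*"; flags == 0 lets '*' match '/'. */
int main(int argc, char *argv[]) {
        if (argc < 2)
                return 2;       /* no path given */
        return fnmatch("foo/*", argv[1], 0);
}
$ gcc -Wall -o fnmatch fnmatch.c
$ ./fnmatch foo/bar/baz && echo it matches
it matches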

@beroal
Author

beroal commented Mar 19, 2016

Well, I implemented this feature request in my fork. I decided to use POSIX Extended Regular Expressions because they are more straightforward and more powerful than glob patterns. What do you think?

@beroal
Author

beroal commented Mar 19, 2016

> Because the library remembers which pages (ie., 4K-blocks of the file) were already in file system cache when the file was opened, these will not be marked as "don't need", because other applications might need that, although they are not actively used (think: hot standby).

I don't understand this. Do you think that the OS uses the last suggestion instead of joining the suggestions from all processes?

@Feh
Owner

Feh commented Mar 19, 2016

I’ve added some comments to your commit beroal@c3956d3

> Do you think that the OS uses the last suggestion instead of joining the suggestions from all processes?

The reality is a bit more complicated, but in principle, yes. If process A reads file X completely, X is in the FS cache; if B now maps X and does an fadvise with “don’t need” (POSIX_FADV_DONTNEED) on the file descriptor, the contents are evicted from the cache, and subsequent reads of X by A will require going back to the storage medium to retrieve the data.

In other words: Without this mechanism, you might evict files that are in active use, thereby impacting other processes.
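
A rough sketch of that mechanism, assuming a Linux/glibc system: snapshot which pages are resident when the file is opened, then later advise "don't need" only for the pages that were not resident at that time. The function names are hypothetical, error handling is minimal, and this is not claimed to be nocache's actual code.

```c
#define _DEFAULT_SOURCE          /* for mincore() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Snapshot page cache residency of the first `size` bytes of fd.
 * Returns a malloc'd array with one byte per page (bit 0 set = resident),
 * or NULL on error; *npages receives the number of pages. */
static unsigned char *snapshot_residency(int fd, size_t size, size_t *npages)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    if (size == 0 || pagesize <= 0)
        return NULL;
    *npages = (size + (size_t)pagesize - 1) / (size_t)pagesize;

    /* PROT_NONE: the mapping is only used to query residency. */
    void *map = mmap(NULL, size, PROT_NONE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return NULL;
    unsigned char *vec = malloc(*npages);
    if (vec != NULL && mincore(map, size, vec) != 0) {
        free(vec);
        vec = NULL;
    }
    munmap(map, size);
    return vec;
}

/* Find the next run of pages that were NOT resident in the snapshot,
 * searching from page `from`. Returns 1 and fills start/len, else 0. */
static int next_cold_run(const unsigned char *was_resident, size_t npages,
                         size_t from, size_t *start, size_t *len)
{
    size_t i = from;
    while (i < npages && (was_resident[i] & 1))
        i++;
    if (i >= npages)
        return 0;
    *start = i;
    while (i < npages && !(was_resident[i] & 1))
        i++;
    *len = i - *start;
    return 1;
}

/* Advise the kernel to drop only the pages this process brought in.
 * Returns the number of posix_fadvise() calls issued. */
static int drop_new_pages(int fd, const unsigned char *was_resident,
                          size_t npages)
{
    assert(was_resident != NULL);
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t start = 0, len = 0, from = 0;
    int calls = 0;
    while (next_cold_run(was_resident, npages, from, &start, &len)) {
        posix_fadvise(fd, (off_t)start * pagesize, (off_t)len * pagesize,
                      POSIX_FADV_DONTNEED);
        calls++;
        from = start + len;
    }
    return calls;
}
```

Pages that were already cached at open time are skipped, so data another process keeps warm is not evicted on its behalf.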

@beroal
Author

beroal commented Mar 19, 2016

Then the Linux kernel is kind of stupid.

@beroal
Author

beroal commented Mar 19, 2016

Regarding maybe_store_pageinfo: all my additions contain cond or pattern, because I group code by keywords. The other suggestions are implemented.

@beroal
Author

beroal commented Mar 19, 2016

Documentation. The cache is disabled for a file iff (I and not E), where I holds iff the file name matches the environment variable NOCACHE_PATTERN_INCLUDE (default: true) and E holds iff the file name matches the environment variable NOCACHE_PATTERN_EXCLUDE (default: false). Both variables are treated as POSIX Extended Regular Expressions.
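
The rule above can be sketched with regcomp(3) and regexec(3) as follows. Only the two variable names come from the description; the struct, the function names, and the unanchored search semantics of regexec() are assumptions of this sketch, not necessarily how the fork implements it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical holder for the two compiled expressions. */
struct nocache_filter {
    regex_t include, exclude;
    int has_include, has_exclude;
};

static void filter_free(struct nocache_filter *f)
{
    if (f->has_include)
        regfree(&f->include);
    if (f->has_exclude)
        regfree(&f->exclude);
    f->has_include = f->has_exclude = 0;
}

/* Compile whichever variables are set. Returns 0 on success, otherwise the
 * nonzero error code from regcomp(). */
static int filter_init(struct nocache_filter *f)
{
    assert(f != NULL);
    f->has_include = f->has_exclude = 0;
    const char *inc = getenv("NOCACHE_PATTERN_INCLUDE");
    const char *exc = getenv("NOCACHE_PATTERN_EXCLUDE");
    int rc;

    if (inc != NULL) {
        rc = regcomp(&f->include, inc, REG_EXTENDED | REG_NOSUB);
        if (rc != 0)
            return rc;
        f->has_include = 1;
    }
    if (exc != NULL) {
        rc = regcomp(&f->exclude, exc, REG_EXTENDED | REG_NOSUB);
        if (rc != 0) {
            filter_free(f);
            return rc;
        }
        f->has_exclude = 1;
    }
    return 0;
}

/* "I and not E": I defaults to true and E to false when the variable is unset. */
static int filter_disables_cache(const struct nocache_filter *f,
                                 const char *path)
{
    int included = !f->has_include ||
                   regexec(&f->include, path, 0, NULL, 0) == 0;
    int excluded = f->has_exclude &&
                   regexec(&f->exclude, path, 0, NULL, 0) == 0;
    return included && !excluded;
}
```

For the use case from the original request, one could set NOCACHE_PATTERN_INCLUDE to `^/srv/transfer/` and NOCACHE_PATTERN_EXCLUDE to `/metadata\.db$` (both paths are made up): big files under the directory bypass the cache, while the metadata file stays cached.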

@Feh
Owner

Feh commented Mar 19, 2016

I left some comments on beroal@1e6061c again.

> Then the Linux kernel is kind of stupid.

Yes, and you’re welcome to improve it. The code is in mm/fadvise.c. Beware though that good and robust cache invalidation is one of the harder problems in programming.

> Documentation.

Can you please add command line options to the nocache shell wrapper and add documentation to the Readme?

@beroal
Author

beroal commented Mar 20, 2016

> I’d make explicit what you expect, i.e. if(regcomp(…) != NULL)

Look at the return type of regcomp: it returns an int (0 on success), so there is no pointer to compare against NULL.

@beroal
Author

beroal commented Mar 20, 2016

> Can you please add command line options to the nocache shell wrapper and add documentation to the Readme?

Sorry, I don't know the Bash programming language and I'm happy with that. ;-)
