When using advanced storage strategy, why copy the peer data file into the same dir as the output file? #1683

Open
embroede opened this issue Sep 16, 2022 · 7 comments

Comments

@embroede
Contributor

Per the documentation, when using the io.d7y.storage.v2.advance storage strategy, the peer data file is copied into the same directory as the output file.

After running dfget <url> -O /tmp/eddie_test, I have observed that there are actually 3 hard links to the file. They are:

  • /tmp/eddie_test
  • /tmp/.eddie_test.dfget.cache.<req.PeerID>
  • <dataDir>/<req.TaskID>/<req.PeerID>/data

Why not just copy the file to the dataDir, and link from there?

@gaius-qi
Member

To avoid copying the daemon cache across filesystems to the specified directory.

@jim3ma
Member

jim3ma commented Sep 26, 2022

A hard link is faster than copying the file. The io.d7y.storage.v2.simple storage strategy copies the file.

@embroede
Contributor Author

embroede commented Sep 26, 2022

Yep I definitely like the hard link approach. But I don't see why links need to exist in <dataDir> and in the output path.

If the <dataDir> is on a different filesystem, I believe we could use a symlink (and I see there is code already to do this).

So just download straight to <dataDir>, and then either hard link or symlink to the output path?

@jim3ma
Member

jim3ma commented Sep 27, 2022

The io.d7y.storage.v2.simple strategy will create a symlink if the data directory and the output path are on different filesystems.

@embroede
Contributor Author

It appears that in https://github.com/dragonflyoss/Dragonfly2/blob/main/client/daemon/storage/storage_manager.go#L454 the symlink is done as a fallback if the hard link fails, when using io.d7y.storage.v2.advance.

@embroede
Contributor Author

I just updated my comment above, as my <data_dir> wasn't in backticks, so was being hidden.

@embroede
Contributor Author

@jim3ma @gaius-qi To clarify, what I'd like to know is: Why is it not sufficient to download the file to the dataDir, and then link (either hardlink or symlink) to it?

Why do we need /tmp/.eddie_test.dfget.cache.<req.PeerID>?
