Nix and IPFS #859

Open
vcunat opened this issue Mar 24, 2016 · 179 comments
Assignees
Labels
feature Feature request or proposal fetching Networking with the outside (non-Nix) world, input locking performance significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. store Issues and pull requests concerning the Nix store UX The way in which users interact with Nix. Higher level than UI.

Comments

@vcunat
Member

vcunat commented Mar 24, 2016

(I wanted to split this thread from #296 (comment) .)

Let's discuss relations with IPFS here. As I see it, mainly a decentralized way to distribute nix-stored data would be appreciated.

What we might start with

The easiest usable step might be to allow distribution of fixed-output derivations over IPFS. Those are paths that are already content-addressed, typically by (truncated) sha256 over either a flat file or a tar-like dump of a directory tree; more details are in the docs. These paths are mainly used for compressed tarballs of sources. This step by itself should avoid lots of problems with unstable upstream downloads, assuming we could convince enough nixers to serve their files over IPFS.

Converting hashes

One of the difficulties is that we use different kinds of hashing than in IPFS, and I don't think it would be good to require converting those many thousands of hashes in our expressions. (Note that it's infeasible to convert among those hashes unless you have the whole content.) IPFS people might best suggest how to work around this. I imagine we want to "serve" a mapping from the hashes we use to the IPFS's hashes, perhaps realized through IPNS. (I don't know details of IPFS's design, I'm afraid.) There's an advantage that one can easily verify the nix-style hash in the end after obtaining the paths in any way.
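To make the mapping idea a bit more concrete, here is a minimal Python sketch. Everything named in it (the `NIX_TO_IPFS` table, `register`, `fetch_verified`, the made-up CID string) is an assumption for illustration; a real setup would resolve the mapping through IPFS or IPNS rather than an in-memory dict. The point it shows is the last one above: whatever route the bytes take, the Nix-style hash can be checked at the end.

```python
import hashlib

# Hypothetical mapping from a Nix-style sha256 (hex) to an IPFS identifier.
# In the idea above, this table would be served to clients, perhaps via IPNS.
NIX_TO_IPFS = {}

def register(content: bytes, ipfs_cid: str) -> str:
    """Record which IPFS object holds the content with this sha256."""
    digest = hashlib.sha256(content).hexdigest()
    NIX_TO_IPFS[digest] = ipfs_cid
    return digest

def fetch_verified(expected_sha256: str, fetch_from_ipfs) -> bytes:
    """Look up the CID, fetch it, and check the Nix-style hash at the end."""
    cid = NIX_TO_IPFS[expected_sha256]
    data = fetch_from_ipfs(cid)
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch for {cid}: got {actual}")
    return data

# Stand-in for a real IPFS client: an in-memory store keyed by a made-up CID.
fake_ipfs = {"cid-example-1": b"source tarball bytes"}
wanted = register(b"source tarball bytes", "cid-example-1")
data = fetch_verified(wanted, fake_ipfs.__getitem__)
```

Because the check happens after the fetch, the mapping itself does not need to be trusted for integrity; a wrong entry only causes a failed download.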

Non-fixed content

If we get that far, it shouldn't be too hard to manage distributing everything via IPFS, as for all other derivations we use something we could call indirect content addressing. To explain that, let's look at how we distribute binaries now – our binary caches. We hash the build recipe, including all its recipe dependencies, and we inspect the corresponding narinfo URL on cache.nixos.org. If our build farm has built that recipe, various information is in that file, mainly the hashes of the content of the resulting outputs of that build and crypto-signatures of them.

Note that this narinfo step just converts our problem to the previous fixed-output case, and the conversion itself seems very reminiscent of IPNS.
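As a rough illustration of that narinfo step, the following Python sketch builds the lookup URL from the hash part of a store path and parses the "Key: value" lines of a narinfo file. The function names and the sample values are assumptions made up for the example; the sample is not a real cache entry, and no network access is performed.

```python
def narinfo_url(store_path: str, cache: str = "https://cache.nixos.org") -> str:
    """Build the narinfo URL from the hash part of a store path."""
    name = store_path.rsplit("/", 1)[-1]      # "<hash>-<name>"
    hash_part = name.split("-", 1)[0]
    return f"{cache}/{hash_part}.narinfo"

def parse_narinfo(text: str) -> dict:
    """Parse the 'Key: value' lines of a narinfo file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields

# Illustrative values only; not a real cache entry.
SAMPLE = """StorePath: /nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-hello-1.0
URL: nar/bbbb.nar.xz
Compression: xz
NarHash: sha256:cccc
NarSize: 1234
Sig: example-key:dddd
"""

info = parse_narinfo(SAMPLE)
url = narinfo_url(info["StorePath"])
```

The `NarHash` field is what turns the lookup into the fixed-output case: once it is known, the content can come from anywhere and still be verified.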

Deduplication

Note that nix-built stuff has significantly greater than usual potential for chunk-level deduplication. Very often we rebuild a package only because something in a dependency has changed, so only very minor changes are expected in the results, mainly just exchanging the references to runtime dependencies as their paths have changed. (On rare occasions even the lengths of the paths would change.) There's great potential to save on that during distribution of binaries, which would be utilized by implementing the section above, and even potential for saving disk space in comparison to our way of hardlinking equal files (see the next section).
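A toy Python sketch of the effect, under simplifying assumptions: fixed-size chunks, two made-up "build outputs" that differ only in one store path of unchanged length, and hypothetical helper names. It measures what fraction of the new build's chunks the old build already provides.

```python
import hashlib

def chunk_hashes(data: bytes, size: int = 64) -> list:
    """Split data into fixed-size chunks and hash each one."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def shared_fraction(old: bytes, new: bytes, size: int = 64) -> float:
    """Fraction of the new build's chunks already present in the old build."""
    old_set = set(chunk_hashes(old, size))
    new_chunks = chunk_hashes(new, size)
    return sum(h in old_set for h in new_chunks) / len(new_chunks)

# Two made-up build outputs that differ only in one same-length store path.
body = bytes(range(256)) * 16
old_build = body + b"/nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-dep" + body
new_build = body + b"/nix/store/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb-dep" + body
ratio = shared_fraction(old_build, new_build)
```

Fixed-size chunking only works this well because the path length is unchanged; in the rarer case where lengths shift, everything after the change would move, and a content-defined chunking scheme would be needed to keep the sharing.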

Saving disk space

Another use might be to actually store the files in a FS similar to what IPFS uses. That seems a somewhat more complex and tricky thing to deploy; e.g. I'm not sure anyone already trusts the implementation of the FS enough to have the whole OS running off it.

It's probably premature to speculate too much on this use ATM; I'll just write I can imagine having symlinks from /nix/store/foo to /ipfs/*, representing the locally trusted version of that path. (That's working around the problems related to making /nix/store/foo content-addressed.) Perhaps it could start as a per-path opt-in, so one could move only the less vital paths out of /nix/store itself.


I can help personally with bridging the two communities in my spare time. Not too long ago, I spent many months on researching various ways to handle "highly redundant" data, mainly from the point of view of theoretical computer science.

@ehmry
Contributor

ehmry commented Mar 24, 2016

I'm curious what the minimalist way to associate store paths to IPFS objects while interfering as little as possible with IPFS-unaware tools would be.

@vcunat
Member Author

vcunat commented Mar 24, 2016

I described such a way in the second paragraph from bottom. It should work with IPFS and nix store as they are, perhaps with some script that would move the data, create the symlink and pin the path in IPFS to avoid losing it during GC. (It could be unpinned when nix deletes the symlink during GC.)
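Such a script could look roughly like the Python sketch below. It is only an outline under assumptions: the IPFS side is stood in for by an ordinary directory, and pinning and unpinning are passed in as callbacks (`pin`, `unpin`) rather than calling any real IPFS tool; all function names are made up.

```python
import os
import shutil

def move_to_ipfs(store_path: str, ipfs_path: str, pin) -> None:
    """Replace a store path with a symlink to its copy on the IPFS side.

    `pin` is a caller-supplied callback standing in for whatever pins the
    object in IPFS so that IPFS garbage collection does not drop it.
    """
    shutil.move(store_path, ipfs_path)   # move the data out of the store
    os.symlink(ipfs_path, store_path)    # leave a symlink behind
    pin(ipfs_path)                       # keep it alive during IPFS GC

def collect(store_path: str, unpin) -> None:
    """What the Nix GC side could do: drop the symlink, then unpin."""
    target = os.readlink(store_path)
    os.unlink(store_path)
    unpin(target)
```

With a real IPFS mount the "move" would more likely be an add into IPFS followed by deleting the local copy, so treat the first step as a placeholder.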

@ehmry
Contributor

ehmry commented Mar 24, 2016

I was thinking about storing store objects in something that wouldn't require a daemon, but of course you can't have everything.

@Ericson2314
Member

@vcunat Great write up! More thoughts on this later, but one thing that gets me is the tension between wanting incremental goals, and avoiding work we don't need long term. For example it will take some heroics to use our current hashing schemes, but for things like dedup and the intensional store we'd want to switch to what IPFS already does (or much closer to that) anyways.

Maybe the best first step is a new non-flat/non-NAR hashing strategy for fixed output derivations? We can slowly convert nixpkgs to use that, and get IPFS mirroring and dedup in the fixed-output case. Another step is using git tree hashes for fetch git. We already want to do that, and I suspect IPFS would want that too for other users. IPFS's multihash can certainly be heavily abused for such a thing :).

@Ericson2314
Member

For me the end goal should be only using IPNS for the derivation -> build map. Any trust-based compatibility map between hashing schemes long term makes the perfectionist in me sad :).

@vcunat
Member Author

vcunat commented Mar 24, 2016

For example it will take some heroics to use our current hashing schemes, but for things like dedup and the intensional store we'd want to switch to what IPFS already does (or much closer to that) anyways.

I meant that we would "use" some IPFS hashes but also utilize a mapping from our current hashes, perhaps run over IPNS, so that it would still be possible to run our fetchurl { sha256 = "..." } without modification. Note that it's these flat tarball hashes that most upstreams release and sign, and that's not going to change anytime soon, moreover there's not much point in trying to deduplicate compressed tarballs anyway. (We might choose to use uncompressed sources instead, but that's just another partially independent decision I'm not sure about.)

@Ericson2314
Member

For single files / IPFS blobs, we should be able to hash the same way without modification.

@Ericson2314
Member

But for VCS fetches we currently do a recursive/nar hash right? That is what I was worried about.
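For reference, the difference between the two hashing modes can be sketched in Python as below. This is my understanding of the NAR serialization restricted to a single regular file (directories and symlinks are left out), and the helper names are made up; it is meant to show why the flat hash and the NAR hash of the same bytes differ, not to replace Nix's own implementation.

```python
import hashlib
import struct

def nar_str(s: bytes) -> bytes:
    """NAR string: 64-bit little-endian length, the bytes, zero-padded to 8."""
    pad = (8 - len(s) % 8) % 8
    return struct.pack("<Q", len(s)) + s + b"\0" * pad

def nar_of_file(contents: bytes, executable: bool = False) -> bytes:
    """Serialize a single regular file as a NAR archive."""
    out = nar_str(b"nix-archive-1") + nar_str(b"(")
    out += nar_str(b"type") + nar_str(b"regular")
    if executable:
        out += nar_str(b"executable") + nar_str(b"")
    out += nar_str(b"contents") + nar_str(contents)
    out += nar_str(b")")
    return out

data = b"hello\n"
flat_hash = hashlib.sha256(data).hexdigest()               # "flat" mode
nar_hash = hashlib.sha256(nar_of_file(data)).hexdigest()   # "recursive" mode
```

Since the NAR wraps the contents in framing (and records the executable bit), an IPFS object hashed over the raw bytes can match a flat hash but not a recursive one.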

@Ericson2314
Member

@ehmry I assume it would be pretty easy to make the nix store an immutable FUSE filesystem backed by IPFS (hopefully such a thing exists already). Down the road I'd like to have package references and the other things currently in the SQLite database also backed by IPFS: they would "appear" in the fuse filesystem as specially-named symlinks/hard-links/duplicated sub-directories. "referees" is the only field I'm aware of that'd be a cache on top. Nix would keep track of roots, but IPFS would do GC itself, in the obvious way.

@cleverca22
Contributor

one idea i had, was to keep all outputs in NAR format, and have the fuse layer dynamically unpack things on-demand, that can then be used with some other planned IPFS features to share a file without copying it into the block storage

then you get a compressed store and don't have to store 2 copies of everything (the nar for sharing and the installed)

@nmikhailov

@cleverca22 yeah, I had the same thoughts about that; it's unclear how hard this would impact performance though

@cleverca22
Contributor

could keep a cache of recently used files in a normal tmpfs, and relay things over to that to boost performance back up
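a minimal Python sketch of that kind of cache, assuming a least-recently-used eviction policy (the policy, the class name `UnpackCache` and the `unpack` callback are all assumptions; a real version would write files into the tmpfs instead of holding bytes in memory):

```python
from collections import OrderedDict

class UnpackCache:
    """Keep the most recently read files; evict the least recently used."""

    def __init__(self, capacity: int, unpack):
        self.capacity = capacity
        self.unpack = unpack            # slow path, e.g. reading from a NAR
        self.files = OrderedDict()

    def read(self, path: str) -> bytes:
        if path in self.files:
            self.files.move_to_end(path)    # mark as recently used
            return self.files[path]
        data = self.unpack(path)
        self.files[path] = data
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # drop the oldest entry
        return data
```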

@davidar

davidar commented Apr 8, 2016

@cleverca22 another idea that was mentioned previously was to add support for NAR to ipfs, so that we can transparently unpack it as we do with TAR currently (ipfs tar --help)

@Ericson2314
Member

NAR sucks though---no file-level dedup we could otherwise get for free. The above might be fine as a temporary step, but Nix should learn about a better format.

@davidar

davidar commented Apr 9, 2016

@Ericson2314 another option that was mentioned was for Nix and IPFS (and perhaps others) to try to standardise on a common archive format

@Ericson2314
Member

@davidar Sure that's always good. For the shortish term, I was leaning towards a stripped down unixfs with just the attributes NAR cares about. As far as Nix is concerned this is basically the same format but with a different hashing scheme.

@Ericson2314
Member

Yeah, looking at Car, it seems to be both an "IPFS Schema" over the IPFS Merkle DAG (unless it just reuses unixfs), and then an interchange format for packing the DAG into one binary blob.

The former is cool, but I don't think Nix even needs the latter (except perhaps as a new way to fall back on http etc. if IPFS is not available while using a compatible format). For normal operation, I'd hope nix could just ask IPFS to populate the fuse filesystem that is the store given a hash, and everything else would be transparent.

@cleverca22
Contributor

https://github.com/cleverca22/fusenar

i now have a nixos container booting with a fuse filesystem at /nix/store, which mmap's a bunch of .nar files, and transparently reads the requested files

@knupfer

knupfer commented Jul 20, 2016

What is currently missing for using IPFS? How could I contribute? I really need this feature for work.

@knupfer

knupfer commented Jul 20, 2016

Pinging @jbenet and @whyrusleeping because they are only enlisted on the old issue.

@copumpkin
Member

@knupfer I think writing a fetchIPFS would be a pretty easy first step. Deeper integration will be more work and require touching Nix itself.

@knupfer

knupfer commented Jul 28, 2016

Ok, I'm working on it but there are some problems. Apparently, ipfs doesn't save the executable flag, so stuff like stdenv doesn't work, because it expects an executable configure. The alternative would be to distribute tarballs and not directories, but that would be clearly inferior because it would exclude deduplication on file level. Any thoughts on that? I could make every file executable, but that would be not very nice...

@copumpkin
Member

@knupfer it's not great, but would it be possible to distribute a "permissions spec file" paired with a derivation, that specifies file modes out of band? Think of it like a JSON file or whatever format: your thing pulls from IPFS, then applies the file modes to the contents of the directory as specified in the spec. The spec could be identified uniquely by the folder it's a spec for.

@copumpkin
Member

copumpkin commented Jul 28, 2016

In fact, the unit of distribution could be something like:

{
  "contents": "/ipfs/12345",
  "permissions": "/ipfs/647123"
}
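A small Python sketch of how the "permissions" half could be applied after fetching. The spec layout (relative path mapped to an octal mode string) and both function names are assumptions for illustration; nothing here is an existing format.

```python
import os
import stat

def apply_permissions(root: str, spec: dict) -> None:
    """Apply file modes from a spec of {relative path: octal mode string}."""
    for rel_path, mode in spec.items():
        os.chmod(os.path.join(root, rel_path), int(mode, 8))

def record_permissions(root: str) -> dict:
    """Generate such a spec from an existing tree (e.g. after a normal fetch)."""
    spec = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.stat(full).st_mode)
            spec[os.path.relpath(full, root)] = format(mode, "o")
    return spec
```

Usage would be along the lines of `apply_permissions(unpacked_dir, {"configure": "755"})` once the contents have been pulled from IPFS.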

@knupfer

knupfer commented Jul 28, 2016

Yep, that would work. Albeit it makes it more complicated for the user to add some sources to ipfs. But we could for example give an additional url in the fetchIPFS which wouldn't be in ipfs, and if it fetches from normal web automatically generate the permissions file and add that to ipfs... I'll think a bit about it.

@davidak
Member

davidak commented Jul 28, 2016

ipfs doesn't save the executable flag

should it? @jbenet

how is ipfs-npm doing it? maybe also just distributes tarballs. that is of course not the most elegant solution.

@stale

stale bot commented Nov 16, 2021

I marked this as stale due to inactivity.

@stale stale bot added the stale label Nov 16, 2021
@davidak
Member

davidak commented Nov 16, 2021

What are the next steps to have this as an official feature? Do we still have to wait for CA or is it good enough? Do we need an RFC?

@stale stale bot removed the stale label Nov 16, 2021
@Ericson2314
Member

@davidak As far as I am concerned, the functionality is good enough to head straight to code review and land some experimental features, as experimental features do not require an RFC (cf. NixOS/rfcs#92 (review)). (There are now a decent number of conflicts in the later PRs, but I would happily go fix this if we started merging the earlier ones.)

Now, when I talked to @edolstra before, he was a bit skeptical of this, especially about there being good concrete use-cases from the get-go. I was hoping to complete https://nlnet.nl/project/SoftwareHeritage-P2P/ to make a more concrete use-case before taking up the issue again. But we haven't started that yet because of staffing constraints which will hopefully dissipate soon.

I suppose I could start an RFC now anyways, even if it isn't strictly required, so the portion of the community interested can make itself heard, and we have a more spec-style feature list as opposed to the tutorial-style one that is https://github.com/obsidiansystems/ipfs-nix-guide/blob/master/tutorial.md. I didn't do that yet because, again, I wanted the SWH use-case, and also because I have other RFCs in flight and limited time, but I could be convinced I ought to go write that RFC anyways :).

@Ericson2314
Member

https://www.softwareheritage.org/2022/02/10/building-bridge-to-the-software-heritage-archive/ We have kicked off work on this! I hope once it wraps up, we will be able to make a tighter case for Nix and IPFS, and open an RFC, so our stuff from 2020 can finally get merged on an experimental basis.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/should-ipfs-be-used-as-a-source-for-fetchurl-in-nixpkgs/16312/19

@Ericson2314
Member

NixOS/rfcs#133 we have an RFC for finally getting some or all of our work merged upstream.

@stale stale bot removed the stale label Sep 7, 2022
@fricklerhandwerk fricklerhandwerk added feature Feature request or proposal performance UX The way in which users interact with Nix. Higher level than UI. labels Sep 12, 2022
@lemanschik

Oh, I am happy that you understand that someone pointed me here. I can tell you this is closed by

To be more exact, the linked PR: incremental refactoring to web-modules of vscode / code-oss. It will integrate a distributed build cache on web scale via P2P. I also superseded the IPFS standard via web-modules, an interplanetary module system that is *nix compatible. I will update you all soon; sorry, I am alone.

@roberth roberth added significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. store Issues and pull requests concerning the Nix store fetching Networking with the outside (non-Nix) world, input locking labels Jun 2, 2023
@malikwirin

Content-addressed Nix will be an important step for this.

@lemanschik

@malikwirin At least content-addressed packages before build, as not all builds are deterministic. That means when you build the same source twice you can get a different binary. So you need to link the build output to the content that created it, and that is already done via git, as git works with two hashes: the blob hash, which is the content hash, and the commit hash, which varies.

@malikwirin

@malikwirin At least content-addressed packages before build, as not all builds are deterministic. That means when you build the same source twice you can get a different binary. So you need to link the build output to the content that created it, and that is already done via git, as git works with two hashes: the blob hash, which is the content hash, and the commit hash, which varies.

And in the flake.lock it uses the blob hash?

@lemanschik

lemanschik commented Dec 1, 2024

@malikwirin Nope, that lock file references a commit hash, but that leads to the correct blob hashes. The commit hash references a tree of commits that can be turned into a tree of blob hashes.

So the Nix internal lock does use the commit hash, but it can easily be turned into blob hashes, and it saves some space. My research shows that this pattern is usable for packaging, as you can infer the real blob content hashes by walking the git tree.
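For readers unfamiliar with git blob hashes, a short Python sketch of how git derives one from file content alone (the function name is made up; SHA-1 object ids are assumed, which is git's default object format):

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """SHA-1 object id git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

empty = git_blob_hash(b"")
```

The id depends only on the bytes of the file, so the empty blob is always e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, whereas a commit id also covers author, date, message and parents.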

So, long story short, Nix is content-addressable, at least for the most part. In cases where it builds files that are not part of the initial git repo, it is not content-addressed. And I am not sure if such files should even be content-addressed in general.

Using the repo tree commit hash is a nice way to have optional content-addressed Nix, but it does not end there!

In my research I always use BTRFS, which also does content hashing and has configurable checksums. Other filesystems also use checksums, so they are also content-addressed.

My goal is universal content addressing at block level and non-block level, and it is possible.

But this Nix IPFS issue, or let's call it discussion, is useless.

The Main goal of this discussion is to get faster builds. And faster builds can only be achieved at the build-system level because of variations. The solution for faster builds is an LLM or an NNCC (Neuronal Network Compression Container). I am working on such stuff to speed up builds of software like the whole Google stack, e.g. ChromeOS, Chrome, Android and LaCrOS, which are indeed based on a lot of shared components. Inside Google that gets managed via the so-called Goma distributed build backend.

Goma is exactly what this issue is trying to replicate: it wants to save time by storing already built parts and only building what is needed for the change.

Hope that makes some sense

Current Efforts World Wide

  • Distributed shared builds via efforts in all build tooling around the world
  • My AwesomeOS effort (LLM+NNCC) to build stuff from scratch directly via LLM output (ASM + optional C bindings if an operating system is already installed, else only ASM)

Related Issues

Incremental Path forward

To get the Nix IPFS effort or distributed build effort complete, throwing in git-annex would be the next step, to handle files that are binary or not tracked by normal git.

https://git-annex.branchable.com/

@ThibaultLemaire

Hi, and sorry to barge into this issue that I have only been quietly following (for some years now), but I fundamentally disagree with @lemanschik and don't want to see the discussion get side-tracked or down-right poisoned.

First and foremost: Calling this issue "useless" to boast about your own solution is just impolite. Please refrain from doing that, or at least mind your language (here or in any other project).

Judging from the figures on this issue, at least 3 people have worked on this, 35 others have participated in the discussion, and 100 more cared enough about this issue to leave a positive reaction to it. I can understand that you believe your own work makes that irrelevant, but 1) there are better ways to suggest it and 2) you may very well be mistaken.

The Main goal of this discussion is to get faster builds

I disagree, both in form, and substance.

In form: You have worded this as if you have authority over the fact, while this is only what you think is the main goal. So please just indicate that. "I think the Main goal of this discussion is to get faster builds." or "I believe", or "To me", etc. This will let other people discussing it agree or disagree, and the people actually working on it gently correct you if that is not the case.

In substance, having faster builds might be a side-effect of using IPFS with nix, but to me it's definitely not the whole point at all. The reason I'd like to see nix + IPFS happen is to decentralize the current build infrastructure, and empower smaller teams and projects to maintain their own set of nix derivations.

https://cache.nixos.org/ is nice and all, but what if I want to make and distribute my own piece of software via nix, independently from nixpkgs? (Or what happens the day money to maintain Hydra runs out? Or what if I want a very old binary that got evicted from the cache? Or what if I want to maintain x86 builds, or ppc builds for very old hardware?) Flakes are awesome but they mean my users will have to recompile every time, and having my own cache server and having my users set it up properly isn't exactly as easy as nix run.

So, to me, the point of nix + IPFS is mostly to diversify the sources of binary caches and maybe alleviate some network traffic and disk usage from https://cache.nixos.org/.

@ThibaultLemaire

ThibaultLemaire commented Dec 1, 2024

The less polite (and more political) reply

So, you see, even if you had a better solution for faster builds, that is not even the point. Now, about your solution.

I'm sorry but I'm having a hard time even understanding what it is exactly. I try not to shame people on how well they speak English, and if the rest of your comment hadn't pissed me off, I would probably have tried harder. But my general impression is a bunch of tangentially related lingo thrown around and barely articulated into a whole that remains completely fuzzy to me. (Also... Where are your commas? You know, this thing: ',' Is your key broken or something?)

And then you dropped the LLM nonsense. And this is when I knew I had to be harsh:

STOP USING AI

I know, AI is cool. I was the first to get curious and play with it as well.

But so are cars. The difference between cars and AI is that every aspect of our society has yet to rely on AI. It is not too late to forget it (or at least think 7 times before using any of it).

  • AI is yet another environmental foot-gun. It has enormous direct eCO2 emissions and water consumption, and even greater indirect ones (The making and throwing away of dedicated hardware for data-centers). And that's not even talking about metals and rare earths consumption (none of which are re-used, re-cycled, or even re-cyclable).
  • AI is trained by the unwilling work of billions of unpaid (or underpaid) workers.
  • AI is, and will continue to be a tool of control. Only Big Tech companies have the means (monetary and otherwise) to drive this domain, the rest of the world is condemned to use their tools on their terms.
  • AI is organized theft. Entire communities see their work appropriated by the platform they're using to train an AI model that will reproduce their work without credit or compensation. (Yes, I'm looking at you, GitHub.)
  • etc.

I strongly encourage you to do your own research.

(This is a separate comment because it expresses my own political views and I don't want to mix that up with my previous points. Down-vote it or flag it at will.)

@malikwirin

malikwirin commented Dec 2, 2024

The Main goal of this discussion is to get faster builds.

@lemanschik Not at all for me. As far as I am concerned, coming from Gentoo, build times are already fast enough.

My interest in IPFS is increasing decentralization to secure community control in FOSS, so we depend less on centralized git and cache hosting.

For decentralizing git hosting, Radicle might be a promising solution that is already compatible with Nix when using an HTTP gateway.

@lemanschik

OK, my mistake, sorry for the confusion. If you all think that IPFS solves your caching and mirroring issues, then I am fine with that. I see no way that it will work or look good in real life, but if so many people agree on that, I will not be the blocker.

And yes, git mirroring is, in my view, the right way to do so, and the git-annex solution that I proposed would meet those needs. It is made exactly for this case, as it mirrors LFS data tracked via Git.

I will not write here anymore, as I see that IPFS is a thing in your world. I accept that. Hope to see you soon running IPFS nodes :)

@malikwirin

@lemanschik please don't think your input is absolutely useless. We also need critics to improve our ideas.
I personally would not use the word "mirror". It implies there is a centralized source of truth that's more legitimate than the decentralized source. I come from a position where I wish for FOSS communities to prepare for possible Cyber-Balkanization or Splinternet and increased censorship.

I am also against working on IPFS compatibility for the sake of it. And I am very interested in other possible solutions to achieve the goals of increased resilience through decentralization.

Git-Annex is also something I am currently looking into.
How is the compatibility between git-annex and IPFS?

@lemanschik

@malikwirin You can also call it a fork and not a mirror. Every mirror is also a fork, but "mirror" implies that it gets auto-updated from somewhere and does not get its own changes, that's true.

About your needs: I thought about that for many years, and the most important thing I discovered is DNS. You always need a way to find an entry point, so you will always need some kind of distributed DNS system, and the DNS system can also return dynamic entries with the nodes that have the content. DNS also replaces the need for hashed URLs: you can directly get blocks or block ranges via subdomain patterns.

I also identified WebRTC as the only really working P2P protocol, as it has decoupled ICE (Interactive Connectivity Establishment) and is text based. So the exchange can even be done via external devices; with the image AI evolution it is easier than ever to exchange connection details, and it works across router (NAT) boundaries and so on.

To achieve your goal you would need a DNS system, e.g. DNS over HTTPS or plain DNS and so on, plus WebRTC TXT entries to exchange the ICE data.

That's the only truly decentralized system that would work, where we can sync our DNS entries to build a global-scale content index that is also accessible via good names if needed.

@malikwirin

@lemanschik The term fork also implies a centralized source of truth.

The content addressing of IPFS should make the DNS problem obsolete.
