msc placeholder: "Swarming LLM": decentralised artificial intelligence #7633

synctext commented Oct 16, 2023

Open master thesis project, looking for student(s). Warning: honours-level challenge ⚠️ ⚠️ ⚠️ This topic is even suitable at the post-doc or PhD student level!

Goal: Collective intelligence based on a swarm of LLMs with true full decentralisation.

Outcome: an operational LLM swarm on mobiles. Operational decentralised LLM technology on a swarm of smartphones without any server, coordinator, or owner. Collective intelligence with Bitcoin- and BitTorrent-level decentralisation, unstoppable. Combine state-of-the-art LLM technology with the latest understanding of self-organising collective intelligence from biologists.

  1. Intrinsic decentralisation. The fundamentals of biologically-grounded intelligence indicate that intelligence is intrinsically decentralised. Read about decentralised swarm robotics. Very advanced bio-focused background reading: The collective intelligence of evolution and development. This advanced reading also shows that evolution is more intelligent than we realised (Connectionist models of conventional learning). Finally, this reading presents the detailed mechanism of decentralisation within biological creatures and the role of credit assignment in individuals and collectives.
  2. On-device LLM. Small models such as GPT-2 fit on a swarm of mobile phones. See the tutorial on an on-device text generation app using GPT-2. Further reading on on-device LLMs using TFLite: converting-dialogpt-gpt-2-to-tensorflow-lite.ipynb. A minimal TFLite conversion sketch is included after this list.
  3. Swarming LLM. Beyond federated learning. The most difficult challenge of this master thesis project is collective learning. On-device LLMs are currently the cutting edge of science; the next step has not yet been explored: a collective of on-device LLMs without any central coordinator or {federated} server. Some pathfinding work in this direction is the distributed machine learning by @quintene. His pioneering work shows that adding new vectors to the learning space is not supported by the Google Research code used with TFLite, which remains an ongoing challenge. Task: apply his technique for distributed machine learning of recommendations to state-of-the-art on-device LLMs. A coordinator-free gossip-averaging sketch follows this list.
  4. Pairwise training data exchange. Collective learning emerges from two agents exchanging new data. Pairwise data exchange without central coordination and without a giant dataset in the cloud, using merely asynchronous messages, is hard. High-quality (distributed) training datasets are an unsolved challenge. Continuous learning from humans is probably required, giving new life to crowdsourcing ('human feedback') in the age of AI. See, for example, fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU. Broad context: Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning. Related approach: Learning Compressed Embeddings for On-Device Inference. Asynchronous (thanks @devos50!); slow training nodes should not hold back others: Asynchronous Decentralized Online Learning and Federated Learning with Buffered Asynchronous Aggregation. An asynchronous pairwise exchange sketch is included after this list.
  5. Swarm networking. True swarming is self-organising. LLM technology traditionally relies critically on the cloud; servers are essential to the functioning of such systems. This architecture is the complete opposite of Bitcoin and BitTorrent, which are designed to eliminate the need for any intermediaries in financial transactions and content dissemination. Connecting 4G, 5G, and wifi devices into a global network without any servers is unsolved: no DNS usage, no relay proxies, no STUN server, no TURN server. A pure autonomous swarm which does not rely on donated servers. See the survey and experimentation with carrier-grade NATs of 5G networks. @OrestisKan has conducted the first successful birthday attack on the Vodafone 5G network. This work will mature into an easy-to-use IPv8 library and be ready for integration at the end of 2023. A minimal NAT hole-punching sketch follows this list.
  6. Trustworthy swarming AI. Trust is beyond the scope of this thesis and reserved as a medium-term future challenge. This future work is to integrate ConTrib: Maintaining fairness in decentralized big tech alternatives by accounting work, tamper-proof accounting with the IETF spec, and tolerance against the Sybil attack (MeritRank: Sybil Tolerant Reputation for Merit-based Tokenomics). For further info, read our work "Towards Sybil Resilience in Decentralized Learning".
  7. Stop hallucinations. Future and ongoing work. The LLM needs to be loaded with real facts, real places, real events, and real URLs, plus real-time updates (see our survey on "augmenting LLMs with knowledge"). An ongoing proof-of-principle adds 25k Youtube URLs to an LLM using the unembedding matrix; a vocabulary-extension sketch follows this list. Note: this is a ridiculously disruptive research direction (see startup in this area). If LLMs learn about URLs at scale, stop hallucinating, and support lifelong learning, all sorts of services will be disrupted. A swarming LLM with URL understanding is an alternative for the Google search engine, the Youtube video service, the TikTok For You Page, Spotify (requires our artist investment token + shared investment FROST wallet), and even Netflix (along with the "Web3 economics" model we have prototyped), and music discovery.
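
For item 2, a minimal sketch of exporting a small GPT-2 to TensorFlow Lite for on-device inference, assuming the transformers and tensorflow packages. The fixed sequence length of 64 and the output filename are illustrative choices; a full generation loop with cached past key/values, as in the linked notebook, is more involved.

```python
import tensorflow as tf
from transformers import TFGPT2LMHeadModel

# Load the smallest GPT-2 (~124M parameters); small enough for modern phones.
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

# Trace one forward pass with a fixed input shape so the TFLite converter can handle it.
@tf.function(input_signature=[tf.TensorSpec([1, 64], tf.int32, name="input_ids")])
def serving(input_ids):
    outputs = model(input_ids)
    return {"logits": outputs.logits}

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [serving.get_concrete_function()], model
)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training weight quantisation
# Some GPT-2 ops are not TFLite builtins; allow falling back to select TF ops.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("gpt2_seq64.tflite", "wb") as f:
    f.write(tflite_model)
```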
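For item 3, a toy sketch of the coordinator-free learning step: two peers train locally and then merge their weights pairwise by gossip averaging. This is not @quintene's exact technique and ignores the TFLite constraints he identified; it only illustrates the serverless merge a swarming LLM needs.

```python
import numpy as np

def local_step(weights, gradients, lr=0.01):
    """One on-device training step (gradients supplied by the local LLM)."""
    return [w - lr * g for w, g in zip(weights, gradients)]

def gossip_merge(own_weights, peer_weights, alpha=0.5):
    """Pairwise merge: move towards the peer's model, no server or coordinator."""
    return [alpha * w_a + (1 - alpha) * w_b
            for w_a, w_b in zip(own_weights, peer_weights)]

# Toy run: two peers with a 2-tensor "model" drift apart locally, then re-converge.
peer_a = [np.ones((4, 4)), np.zeros(4)]
peer_b = [np.zeros((4, 4)), np.ones(4)]
peer_a = gossip_merge(peer_a, peer_b)
peer_b = gossip_merge(peer_b, peer_a)
```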
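For item 4, a sketch of asynchronous pairwise sample exchange in which a slow peer never blocks a fast one. The in-process asyncio queues stand in for real network messages (e.g. IPv8 packets), and the message format is purely illustrative.

```python
import asyncio
import json

async def peer(name, inbox, outbox, local_samples):
    """Push own samples fire-and-forget, then absorb whatever the peer has sent."""
    dataset = list(local_samples)
    for sample in local_samples:
        await outbox.put(json.dumps({"from": name, "text": sample}))
    while True:
        try:
            msg = await asyncio.wait_for(inbox.get(), timeout=0.1)
        except asyncio.TimeoutError:
            break  # nothing new arrived; carry on training locally
        dataset.append(json.loads(msg)["text"])
    return dataset

async def main():
    a_to_b, b_to_a = asyncio.Queue(), asyncio.Queue()
    got_a, got_b = await asyncio.gather(
        peer("A", b_to_a, a_to_b, ["prompt 1", "prompt 2"]),
        peer("B", a_to_b, b_to_a, ["prompt 3"]),
    )
    print(got_a, got_b)

asyncio.run(main())
```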
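For item 5, a minimal UDP hole-punching sketch showing the principle of direct peer connectivity without relays. The real work targets the IPv8 library and carrier-grade NATs; the peer address here is assumed to have been exchanged out of band, and symmetric or carrier-grade NATs may still defeat this naive version.

```python
import socket

def punch(local_port, peer_addr, payload=b"hello-swarm"):
    """Attempt a direct UDP connection to a peer behind a NAT, no relay or TURN server."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", local_port))
    sock.settimeout(2.0)
    # The outgoing packet opens a NAT mapping; the peer does the same towards us.
    sock.sendto(payload, peer_addr)
    try:
        data, addr = sock.recvfrom(1024)  # succeeds once both mappings exist
        return addr, data
    except socket.timeout:
        return None, None  # stricter NATs need the birthday-attack style puncturing
```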
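For item 7, a hedged sketch of how URLs could be added as new vocabulary entries so the model can emit them directly. In Hugging Face transformers, resize_token_embeddings grows both the input embedding and the tied output (unembedding) matrix; the single example URL is a placeholder, and this is not the exact 25k-URL proof-of-principle mentioned above.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Register URLs as whole tokens so a single output step can produce a full URL.
urls = ["https://youtube.com/watch?v=dQw4w9WgXcQ"]  # illustrative placeholder
num_added = tokenizer.add_tokens(urls)

# Grow the embedding and the tied output (unembedding) matrix by num_added rows.
model.resize_token_embeddings(len(tokenizer))

# The new rows are near-random; fine-tuning on (context, URL) pairs gives them meaning.
print(num_added, model.get_output_embeddings().weight.shape)
```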

required reading
