Update readme (#189)
SHAcollision authored Nov 12, 2024
1 parent b83c144 commit 2abd7f8
Showing 1 changed file with 92 additions and 91 deletions.
183 changes: 92 additions & 91 deletions README.md
@@ -1,133 +1,134 @@
![Integration Tests](https://github.com/pubky/pubky-nexus/actions/workflows/test.yml/badge.svg?branch=main)

# Pubky-Nexus
# Pubky Nexus

The Nexus between Pubky homeservers and Pubky-App social frontend. Pubky Nexus constructs a social graph out of all of the events on pubky-core homeservers and exposes a social-media-like API capable of powerful Web-of-Trust inference.
Pubky Nexus is the central bridge connecting Pubky homeservers with Pubky-App’s social clients. By aggregating events from homeservers into a rich social graph, Nexus transforms decentralized interactions into a high-performance, fully featured social-media-like API. It's designed to support Social-Semantic-Graph (SSG) inference, and more.

## 💻 Development Roadmap
## 🌟 Key Features

### [Pubky Backend Development Roadmap](https://github.com/pubky/pubky-app-backend/issues/1)
- **Real-time Social Graph Aggregation**: Nexus ingests events from multiple Pubky homeservers, generating a structured social graph in real time.
- **Full-Content Indexing**: Nexus serves content directly, improving latency and user experience. Clients do not need to locate homeservers to retrieve content unless they wish to perform content attestation. We also envision a lightweight Nexus mode that merely points clients to homeserver locations using Pubky URIs.
- **High Performance & Scalability**: Built in Rust, Nexus is optimized for speed and efficiency, handling complex social queries across distributed systems with minimal latency.
- **Powerful Social Semantic Graph**: Nexus supports SSG-based interactions, fostering secure and trusted connections between users.
- **Graph-Enhanced Search & Recommendations**: Nexus leverages Neo4j to offer deep insights, relationships, and recommendations based on the social graph.
- **Flexible Caching Layer**: A Redis cache accelerates access to common queries and minimizes database load, ensuring a smooth user experience. Most requests can be served in less than 1 ms, at constant time complexity with respect to the number of users.
- **Rich Observability**: Easily explore the indexed data using [Redis Insight](https://redis.io/insight/) and visualize the graph database with [Neo4j Browser](https://browser.neo4j.io/).

## 🏗️ Objective for Alpha v0.1.0 Milestone
## 🌐 Accessing the API

Reach feature parity with `skunk-work` indexer improving on the following:
> ⚠️ **Warning**: The API is currently **unstable**. We are using the `/v0` route prefix while the API undergoes active development and changes. Expect potential breaking changes as we work toward stability.
1. High performance: no inefficient lookups, maximum normalization, maximum atomic indexing, full async, full multi-thread, rust performance.
2. Clear vision forward: simplify the implementation of exciting future features: WoT, graph queries, etc.
3. Free of bugs: hopefully.
4. Cleaner dev experience.
5. Modern stack.
6. Excellent observability (browse over our indexes with [redis-insight](https://redis.io/insight/) or graph with [neo4j-browser](https://browser.neo4j.io/))
Nexus provides a REST API, accessible via Swagger UI:

## 🏠Architecture
- **Staging API** (latest): [https://nexus.staging.pubky.app/swagger-ui/](https://nexus.staging.pubky.app/swagger-ui/)
- **Production API** (current): [https://nexus.pubky.app/swagger-ui/](https://nexus.pubky.app/swagger-ui/)

![pubky-nexus-arch](docs/images/pubky-nexus-arch.png)
You can explore available endpoints, test queries, and view schema definitions directly within Swagger.
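
Because the route prefix may change while the API is unstable, a client can keep it in one place. The following is a minimal sketch of that idea, not part of Nexus: `API_PREFIX` and `endpoint_url` are hypothetical names introduced for illustration.

```rust
/// Route prefix currently used by the API; change it here if the version moves on.
const API_PREFIX: &str = "/v0";

/// Joins a base URL, the version prefix, and an endpoint path into one URL.
/// Hypothetical helper for illustration; not part of Nexus.
fn endpoint_url(base: &str, path: &str) -> String {
    format!(
        "{}{}/{}",
        base.trim_end_matches('/'),
        API_PREFIX,
        path.trim_start_matches('/')
    )
}

fn main() {
    // `info` is the route this README mentions for a locally running service.
    println!("{}", endpoint_url("http://localhost:8080", "info"));
}
```

Trimming the slashes on both sides means callers can pass `"info"` or `"/info"` and get the same URL.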

## 🏗️ Architecture Overview

Nexus is composed of several core components:

- **service.rs**: The REST API server for handling client requests, querying databases, and returning responses to the Pubky-App frontend.
- **watcher.rs**: The event aggregator that listens to homeserver events, translating them into social graph updates within the Nexus databases.
- **lib.rs**: A library crate containing common functionalities shared by `service` and `watcher`, including database connectors, models, and queries.

- **service.rs**: binary that serves REST request to the pubky-app clients reading from our DBs.
- **watcher.rs**: binary that subscribes to homeservers and populate our DBs
- **lib.rs**: library crate with all of the common functionalities (connector, models, queries) needed for `watcher` and `service`
### Data Flow

1. The watcher does effectively work as an aggregator (a translator from Homeserver events to a social network graph).
2. The service reads from the indexes and performs queries to the graph in order to serve responses to the pubky-app clients.
3. As long as it is possible and not too troublesome, most relationships, query results and cache should be indexed by `key: value` and retrieved from Redis. We should take inspiration on current use of LMDB in `skunk-works` (a lot of things can be done using plain `key: value` but some are too troublesome to implement: then we query our graph directly)
4. The social graph DB (Neo4J) is intended for holding a complete view of the network. It should be queried as little as possible but we can abuse it at the beginning in order to complete features faster.
![pubky-nexus-arch](docs/images/pubky-nexus-arch.png)

1. **Event Ingestion**: The watcher ingests events from Pubky homeservers and indexes them into our social graph.
2. **Indexing and Caching**: Relationships, common queries, and graph data are cached in Redis for high-speed access. Complex queries use Neo4j.
3. **API Responses**: The service server reads from these indexes to respond to client queries efficiently.
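
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Nexus implementation: in-memory maps stand in for Neo4j and Redis, and `Event`, `Graph`, `Index`, `ingest`, and `follower_count` are all hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical homeserver event; the real Nexus event types may differ.
enum Event {
    Follow { follower: String, followee: String },
}

/// Stand-in for the Neo4j graph: followee -> set of followers.
#[derive(Default)]
struct Graph {
    followers: HashMap<String, HashSet<String>>,
}

/// Stand-in for the Redis key:value index: cache key -> follower count.
#[derive(Default)]
struct Index {
    counts: HashMap<String, usize>,
}

/// Steps 1 and 2: translate an event into a graph update, then refresh
/// the key:value entry that the update affects.
fn ingest(graph: &mut Graph, index: &mut Index, event: Event) {
    match event {
        Event::Follow { follower, followee } => {
            let set = graph.followers.entry(followee.clone()).or_default();
            set.insert(follower);
            index.counts.insert(format!("followers:{followee}"), set.len());
        }
    }
}

/// Step 3: answer from the index, falling back to the graph only on a miss.
fn follower_count(graph: &Graph, index: &mut Index, user: &str) -> usize {
    let key = format!("followers:{user}");
    if let Some(count) = index.counts.get(&key) {
        return *count;
    }
    // Cache miss: query the graph once and remember the result.
    let count = graph.followers.get(user).map_or(0, |s| s.len());
    index.counts.insert(key, count);
    count
}

fn main() {
    let mut graph = Graph::default();
    let mut index = Index::default();
    let event = Event::Follow { follower: "alice".into(), followee: "bob".into() };
    ingest(&mut graph, &mut index, event);
    println!("bob has {} follower(s)", follower_count(&graph, &mut index, "bob"));
}
```

The read path touches the graph only when a key is missing, which is what keeps common queries a single key lookup.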

![pubky-nexus-graph](docs/images/pubky-nexus-graph.png)

## ⚙️ Preparing the Environment
Nexus graph schema.

Before running the project, several configurations must be set up. Let’s start by configuring the databases
## ⚙️ Setting Up the Development Environment

```bash
cd docker
# Create a new `.env` file from the `.env-sample` template
cp .env-sample .env
# Run the databases (Neo4j and Redis databases will spin up empty)
docker-compose up -d
# Populate the graph database with initial data
docker exec neo4j bash /db-graph/run-queries.sh
```
To get started with Nexus, first set up the required databases: Neo4j and Redis.

### 1. Configure Databases

1. Clone the repository and navigate to the project directory.
2. Copy the environment template and set up the Docker environment:

```bash
cd docker
cp .env-sample .env
docker-compose up -d
```

3. Populate the Neo4j database with initial data:

```bash
docker exec neo4j bash /db-graph/run-queries.sh
```

Once the `Neo4j` graph database is seeded with data, the next step is to populate the `Redis` database by running the _nexus-service_.

> If the Redis cache is empty, the nexus-service will handle it automatically. If it is not, follow the steps in the Troubleshooting section.

4. Run the Nexus service:

```bash
cargo run
```
5. **Access Redis and Neo4j UIs**:
   - Redis UI: [http://localhost:8001/redis-stack/browser](http://localhost:8001/redis-stack/browser)
   - Neo4J UI: [http://localhost:7474/browser/](http://localhost:7474/browser/)

## 🚀 Contributing

To contribute to Nexus, follow these steps:

1. **Fork the Repository** and create a feature branch.
2. **Write Tests**: Ensure new features and changes are tested and benchmarked.
3. **Submit a Pull Request** and provide a description of the changes.

### Running Tests

To run all tests:

```bash
cargo run
cargo test
```

## 👨‍💻 Quick Development Setup

To enable auto-rebuilding and testing while developing within the `/service`:
To test specific modules or features:

```bash
# Install `cargo-watch` to monitor changes and auto-rebuild on save
cargo install cargo-watch
# Ensure the environment variables are set. You might have already done this in the previous step:
cp .env-sample .env
# Run the service and tests in separate terminals:

# Terminal 1: Start the service with auto-reload on changes:
cargo watch -q -c -w src/ -x "run --bin service"
# The service will be available at localhost:8080/v0/info on your browser

# Terminal 2: Run tests (note that for tests to pass, the Neo4j instance must have example data)
# Ensure you've followed the steps above to set up Neo4j with the example dataset
cargo watch -q -c -w tests/ -x "test -- --nocapture"
# Test specific folder of a domain
cargo test watcher::users
# Test specific test
cargo test test_homeserver_user_event
# Run benchmarks (e.g., get user by ID benchmark)
cargo bench --bench user get_user_view_by_id
```

## Developing the homeserver watcher

Running the `/tests/` that require the homeserver does not require running a homeserver. However, running the playground or the `watcher.rs` binary does. This is how you can run a pubky homeserver locally in testnet mode.

We are using `pubky` repo as a git submodule of `pubky-nexus`, given that `pubky` is still a private repository and the crates for the client and homeserver are not yet published.
**Benchmarking**:

```bash
git submodule init
git submodule update --init --recursive
cd pubky/pubky-homeserver
cargo run -- --testnet
cargo bench --bench user get_user_view_by_id
```

Take a look at the logs for

1. `testnet.bootstrap=["127.0.0.1:6881"]`
2. Your homeserver listening url `http://localhost:15411` and
3. the pubky URI `pubky://8pinxxgqs41n4aididenw5apqp1urfmzdztr8jt4abrkdn435ewo` and make sure your `.env` has the correct settings
## ⚠️ Troubleshooting

## ⚠️ Warning
If tests or the development environment seem out of sync, follow these steps to reset:

There are scenarios where the **integration tests** might fail. This typically occurs when new changes are pulled from the repository, as the schemas for our indexes may have changed, or when the database data is out of sync with the current integration tests. To resolve this, you need to reset the Neo4j graph database and Redis cache, and then re-seed them with the correct data. Follow these steps:
1. **Reset Neo4j**:

### Real time explore the databases
```bash
docker exec neo4j bash -c "cypher-shell -u neo4j -p 12345678 'MATCH (n) DETACH DELETE n;'"
docker exec neo4j bash /db-graph/run-queries.sh
```

```bash
# Run the following Cypher query to remove all nodes and relationships in the database
docker exec neo4j bash -c "cypher-shell -u neo4j -p 12345678 'MATCH (n) DETACH DELETE n;'"
# Re-populate the database with the correct dataset
docker exec neo4j bash /db-graph/run-queries.sh
# Set the REINDEX environment variable to true for the reindexing process
REINDEX=true
# Start the reindexing process
cargo run
# After reindexing, set REINDEX to false to prevent reindexing on every build
REINDEX=false
```
2. **Re-index Redis Cache**:

In some cases, compilation might fail due to issues with the dependency in the pubky repository. To resolve this, run the following command:

```bash
git pull --recurse-submodules
```
```bash
REINDEX=true cargo run
```
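
Writing `REINDEX=true cargo run` sets the variable only for that one process. How Nexus interprets the value is not shown here. As a minimal sketch, a Rust binary could read such a flag as below; `reindex_requested` is a hypothetical helper, and treating only `true` as enabling is an assumption.

```rust
use std::env;

/// Interprets a raw flag value; only "true" (case-insensitive) enables reindexing.
/// Hypothetical helper: the parsing rules Nexus actually uses may differ.
fn reindex_requested(raw: Option<&str>) -> bool {
    matches!(raw, Some(v) if v.trim().eq_ignore_ascii_case("true"))
}

fn main() {
    // `env::var` returns Err when the variable is unset, which maps to `None` here.
    let raw = env::var("REINDEX").ok();
    if reindex_requested(raw.as_deref()) {
        println!("reindexing requested");
    } else {
        println!("using existing indexes");
    }
}
```

Defaulting to `false` when the variable is unset or unrecognized keeps an ordinary `cargo run` from triggering a reindex.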

## Useful links
## 🌐 Useful Links

- Swagger UI: http://localhost:8080/swagger-ui/
- Redis: http://localhost:8001/redis-stack/browser
- Neo4J: http://localhost:7474/browser/
- **Swagger API**:
  - Staging: [https://nexus.staging.pubky.app/swagger-ui/](https://nexus.staging.pubky.app/swagger-ui/)
  - Production: [https://nexus.pubky.app/swagger-ui/](https://nexus.pubky.app/swagger-ui/)
- **Local Redis Insight**: [http://localhost:8001/redis-stack/browser](http://localhost:8001/redis-stack/browser)
- **Local Neo4J Browser**: [http://localhost:7474/browser/](http://localhost:7474/browser/)
