ArcadeSocial

NIP-32 creates a robust label system by which public keys (identities) or objects may be labeled or rated by others. This is compelling because it allows for a decentralized reputation system that may be applied to anything on the nostr protocol. Arcade Labs contributed to this NIP, and the improvements made on it are very welcome.

However, reputation events are not spam/Sybil resistant. When ratings are essentially free to create, they can be problematic when bad actors get involved.

Luckily, we see an easy solution to this problem, and we have built it into our open-source nostr library, arclib.

When it comes to determining the reputation of a person (pubkey) based on NIP-32 ratings, we can overlay the ratings onto your social graph and weight each rating according to the rater's social graph distance from you. Suddenly, the only ratings that really matter are your own, your friends', and your friends' friends', in that order.

Arcade uses this social graph rating system to enhance NIP-32 labels to produce the first sensible, simple, and effective decentralized rating system on nostr, and perhaps the internet.

If you're interested in using this system, which we call ArcadeSocial, you can include it in your project from the arclib repository on GitHub.

In Arcade you'll see that every user now has a reputation next to their name. This reputation is composed of all the ratings that user has received, overlaid on your personal social graph. Therefore, an individual's rating will be different for almost everyone. That's OK. It's personalized to you based on your social network, and it updates automatically.

Here are the technical details of how we implement ArcadeSocial.

ArcadeSocial Specification

NIP-32 Ratings

Here is an example of a rating event that assigns a rating to another user (evaluee) using NIP-32:

{ 
    kind: 1985,
    tags: [
      ["L", "city.arcade"],
      ["l", "trade", "city.arcade", "{\"quality\":" + <quality> + "}"],
      ["p", <evaluee pubkey>],
      ["e", <completed trade event>]
    ],
    content: <optional message>
}

<quality> is a number between 0 and 1 inclusive
<evaluee pubkey> is the hex public key that you are rating
<optional message> is a public message that goes with your rating
<completed trade event> is an event ID that represents the completed trade

The first l tag parameter is trade, which is the rating label that the quality amount is applied to. In Arcade, the trade label is used to evaluate your counterparty after the trade has been completed.
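As a minimal sketch of how such an event could be assembled in TypeScript, the helper below builds the unsigned part of the rating event shown above. The names `buildRatingEvent` and `RatingEventTemplate` are hypothetical and are not taken from arclib. The remaining fields of a nostr event (pubkey, created_at, id, sig) are assumed to be added by whatever library you use to sign and publish.

```typescript
// Unsigned NIP-32 rating event, mirroring the example above.
interface RatingEventTemplate {
  kind: number;
  tags: string[][];
  content: string;
}

// Hypothetical helper for illustration; not part of arclib's documented API.
function buildRatingEvent(
  label: string,          // e.g. "trade" or "social"
  quality: number,        // between 0 and 1 inclusive
  evalueePubkey: string,  // hex public key being rated
  tradeEventId: string,   // event ID of the completed trade
  message: string = ""    // optional public message
): RatingEventTemplate {
  if (typeof quality !== "number" || isNaN(quality) || quality < 0 || quality > 1) {
    throw new RangeError("quality must be a number between 0 and 1 inclusive");
  }
  return {
    kind: 1985,
    tags: [
      ["L", "city.arcade"],
      // The quality travels as a JSON string in the fourth element of the "l" tag.
      ["l", label, "city.arcade", JSON.stringify({ quality: quality })],
      ["p", evalueePubkey],
      ["e", tradeEventId],
    ],
    content: message,
  };
}
```

Validating the quality before building the tags keeps out-of-range values from being published, since readers are expected to discard them anyway.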

The L tag references the Arcade nomenclature, city.arcade, which can be used by anyone — even other apps. It is defined below:

Arcade City Nomenclature for NIP-32

The "L" namespace tag must be city.arcade to specify this nomenclature.

The official Arcade City nomenclature includes the following labels:

trade: quality between 0 and 1 representing the evaluee's desirability as a counterparty.
social: quality between 0 and 1 representing the evaluee's social conduct. A general reputation label.

Additionally, the Arcade City nomenclature specifies that unofficial labels may also be used under this nomenclature, but they must follow these rules:

The label must be in English.
If the label is a noun, it must be singular. Ex: party, not parties
If the label is a verb, it must be in the bare infinitive form (no "-ing"). Ex: trade, not trading.

New labels may be added to the official Arcade City NIP-32 Nomenclature at any time, and PRs are welcome to make suggestions.

Quality Scoring Mechanisms

The quality value, a number between 0 and 1, reflects the user's assessment of their counterparty in the context of a trade. Arcade uses its own system to gather this assessment and to spread the resulting scores across that range.

When a trade is completed and no kind 1985 rating event is detected for <completed trade event>, the user will be presented with the option to rate their counterparty.

Below we discuss our QTS rating system and why it is better than a traditional 5 Star rating system.

The "Qualitative Thumb System" (QTS)

The rater is first asked to provide a "thumbs up" or "thumbs down" on the counterparty to indicate their general satisfaction. As per NIP-32, the resulting score that is used for the quality must be between 0 and 1 inclusive, so the "thumbs up/down" is interpreted like this:

Thumbs up: +0.50
Thumbs down: +0.00

Then, there are 5 bonus toggle-buttons available for the user to interact with. Each button that is toggled ON will add +0.10 to the quality score:

Friendly
Responsive
Expert
Good Value
Flexible

This means there is a maximum score of 1.00 if all 5 toggle-buttons are ON with a Thumbs Up.
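The scoring rule above can be sketched as a small TypeScript function. The names `qtsScore` and `QtsBonus` are hypothetical. The sketch assumes the bonus toggles add to the score whether the thumb is up or down, which the text implies but does not state outright.

```typescript
// The five bonus toggles named above.
type QtsBonus = "Friendly" | "Responsive" | "Expert" | "Good Value" | "Flexible";

// Hypothetical helper: turns a thumb plus toggled bonuses into a quality in [0, 1].
function qtsScore(thumbsUp: boolean, bonuses: QtsBonus[]): number {
  // Count each distinct bonus once, even if it appears twice in the input.
  const seen: { [name: string]: boolean } = {};
  let bonusCount = 0;
  for (let i = 0; i < bonuses.length; i++) {
    if (!seen[bonuses[i]]) {
      seen[bonuses[i]] = true;
      bonusCount++;
    }
  }
  // Work in tenths so repeated 0.1 additions cannot drift (0.5 + 0.1 + 0.1 + ...).
  const tenths = (thumbsUp ? 5 : 0) + bonusCount;
  return tenths / 10;
}
```

For example, a thumbs up with Friendly and Expert toggled ON gives 0.7, and a thumbs down with no toggles gives 0.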

The practical result of this kind of rating system is this:

Any score 0.50 or higher has a positive sentiment.
Any score less than 0.50 has a negative sentiment.
Scores significantly higher than 50% are "extra good" because they were achieved through bonuses (the bonus toggle-buttons).
Scores slightly less than 50% are still acceptable. These would be seen after an average of some low scores and some medium/high scores.
The numbers behind this scoring system should not be shown to the end user because this will lead to misinterpretations. People do not normally think of a 50% average rating as "good". The resulting reputation of a user will not be displayed as a number.

This kind of rating system makes the range of desirable ratings as large as possible: from 50% to 100%. This balances negative and positive sentiment, allowing users not only to compare positive ratings with each other, but also to compare positive and negative ratings (e.g. "Is this good rating stronger than this bad rating?").

Most importantly, it effortlessly translates the qualitative experience of a trade into something quantitative and meaningful while liberating the user from having to do the work of translating their feelings into a number.

We'd like to demonstrate why we think this system is better than a traditional rating system.

The 5 Star Average Rating System Sucks

We assert that the QTS rating system is better than traditional 5 Star rating systems used by Uber, Lyft, Amazon, iTunes, Google, and many other products and companies, for the following reasons.

The majority of users tend to leave only 5 stars or 1 star.

This behavior can be observed anywhere you find a 5 star rating system, and this tendency leads to a very polarized aggregate rating. One 1-star rating can bring down a reputation significantly unless there are many (thousands of) 5-star ratings.

The 5 Star rating system weights the majority of ratings as "negative" sentiment.

If you translate a 5 star rating into a percent, 0% to 90% are generally regarded as "unsatisfactory". This is exemplified by the "4.5 stars+" search filter on Amazon, or the Uber policy of suspending drivers when they fall below a 4.6 star rating. Giving a driver a 1-star rating on Uber basically ends their career instantly.

The 5 Star rating system does not enable you to clearly distinguish between "good" ratings.

If a rating between 0 and 4.4 stars is generally "bad", then that only leaves ratings between 4.5 and 5 to represent all "good" ratings. Humans aren't great at making sense of very small fractional differences, and this thin sliver of data makes it difficult to tell how good one rating is versus another.

The result of this is that everything is basically either varying degrees of "bad" or simply "good", with very little nuance. This is good for the purveyor of the rating system (e.g. Amazon) because they want their customers to be generally satisfied, but they don't want customers to have the data to distinguish between vendors.

Most 5 Star rating systems could simply be replaced with a pure thumbs up/down average to roughly the same effect.

Social Graph Reputation

Once a user has been rated, their reputation may be shown to other users by processing all their ratings. However, it is trivial to spam ratings, so ArcadeSocial utilizes your social graph to make sense of the ratings for a given user.

In ArcadeSocial, ratings are given a weight based on how close their authors are to you in your social graph. Your own rating has the heaviest weight by an overwhelming amount. Your friends' ratings are weighted next heaviest, followed by your friends' friends' ratings. Ratings from anyone beyond that are weighted the least, making them mostly ineffectual.

The practical result of this is that the people you follow, or presumably trust, are the people whose opinions you care about the most. It is quite unlikely that spammers are people you follow, and therefore their ratings are squashed into oblivion.

Here is the algorithm:

Collect all 1985 rating events for the evaluee pubkey, optionally filtered by nomenclature L and/or label l:

// in this example we are evaluating a user's `trade` rating.
filter: {
  kinds: [1985],
  "#p": ["<evaluee pubkey>"],
  "#L": ["city.arcade"],
  "#l": ["trade"]
}

For each event author, keep only the most recent event by created_at. The following steps operate only on the remaining events.
For each event, parse the quality as a float, called score. Discard events with unparseable values or values outside the 0 to 1 range.
For each event, get the social graph distance between yourself and the event author, called distance. If the distance is unknown or cannot be calculated, it is set to 3.
For each event, calculate 1 / (distance ** 2) as weight. If distance is 0 (you to yourself), weight is set to 10,000.
For each event, calculate weight * score as scaledScore.
Calculate the sum of all weights as weightSum.
Calculate the sum of all scaledScores as scaledScoreSum.
Calculate scaledScoreSum / weightSum as weightedAverageScore.
The weightedAverageScore is the evaluee's social graph reputation for the chosen label.
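The steps above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not arclib's actual code: `RatingEvent`, `parseQuality`, and `computeReputation` are hypothetical names, the events are assumed to have already been fetched with the filter shown earlier, and `distanceTo` is a caller-supplied lookup that returns `undefined` when the distance is unknown. Returning `null` when no valid ratings remain is also an assumption, since the text does not cover that case.

```typescript
interface RatingEvent {
  pubkey: string;      // author of the rating
  created_at: number;  // unix timestamp in seconds
  tags: string[][];
}

// Read the quality from the "l" tag's JSON annotation; null if missing or invalid.
function parseQuality(event: RatingEvent): number | null {
  for (let i = 0; i < event.tags.length; i++) {
    const tag = event.tags[i];
    if (tag[0] !== "l" || tag.length < 4) continue;
    try {
      const score = parseFloat(String(JSON.parse(tag[3]).quality));
      if (score >= 0 && score <= 1) return score; // NaN fails both comparisons
    } catch (e) {
      // Unparseable annotation: treat as invalid.
    }
  }
  return null;
}

function computeReputation(
  events: RatingEvent[],
  distanceTo: (author: string) => number | undefined
): number | null {
  // Keep only the most recent event per author.
  const latest: { [author: string]: RatingEvent } = Object.create(null);
  events.forEach((ev) => {
    const prev = latest[ev.pubkey];
    if (!prev || ev.created_at > prev.created_at) latest[ev.pubkey] = ev;
  });

  let weightSum = 0;
  let scaledScoreSum = 0;
  Object.keys(latest).forEach((author) => {
    const score = parseQuality(latest[author]);
    if (score === null) return; // discard unparseable or out-of-range values
    const known = distanceTo(author);
    const distance = known === undefined ? 3 : known; // unknown distance counts as 3
    const weight = distance === 0 ? 10000 : 1 / (distance * distance);
    weightSum += weight;
    scaledScoreSum += weight * score;
  });

  // Weighted average; null when there is nothing to average.
  return weightSum > 0 ? scaledScoreSum / weightSum : null;
}
```

With one friend (distance 1) rating 1.0 and one friend of a friend (distance 2) rating 0.5, the weights are 1 and 0.25, so the result is (1 + 0.125) / 1.25 = 0.9.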

Distance

You have 0 distance to yourself.

You have 1 distance from any pubkey you follow ("friends").

You have 2 distance from any pubkey you don't follow but your friends do follow.

You have 3 distance from any pubkey that neither you nor your friends follow. 3 is the maximum distance. If the pubkey is not in the graph, it gets a distance of 3 automatically.

Building a social graph to accommodate this only requires a depth of 2 (level 1 is your friends, level 2 is who your friends follow). Building a social graph deeper than 2 levels is unnecessarily expensive for negligible gain.
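As a rough sketch of these distance rules, the function below looks up a target pubkey in a two-level follow graph. `FollowGraph` and `socialDistance` are hypothetical names. The graph is assumed to be a plain object mapping each pubkey to the list of pubkeys it follows, for example as collected from follow lists. How that data is fetched is left out.

```typescript
// Maps a pubkey to the pubkeys it follows. Only you and your friends need entries.
type FollowGraph = { [pubkey: string]: string[] };

// Hypothetical helper: returns 0, 1, 2, or 3 following the rules above.
function socialDistance(me: string, target: string, follows: FollowGraph): number {
  if (target === me) return 0;

  const friends = follows[me] || [];
  if (friends.indexOf(target) !== -1) return 1;

  // Level 2: anyone followed by at least one of your friends.
  for (let i = 0; i < friends.length; i++) {
    const theirFollows = follows[friends[i]] || [];
    if (theirFollows.indexOf(target) !== -1) return 2;
  }

  // Not found within two levels, or not in the graph at all.
  return 3;
}
```

Because the search never goes past your friends' follow lists, the cost is bounded by the size of the two-level graph described above.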

The Score

Because the resulting reputation score is a weighted average of quality values between 0 and 1, it will always fall between 0 and 1 inclusive. A score of 1 could be considered "fully trustworthy". Practically speaking, what we care about is the gradient between 0 and 1.

The reputation score number, as mentioned earlier, should not be shown directly to the end user. Instead, Arcade presents the score as a badge from a series of badges.

[Image: reputation icons stacked]

This icon is positioned next to the user's username/NIP-05 identifier.

Rank

Instead of using the numeric social reputation, we use an icon that represents a ranking of the user through a static hierarchy of symbols. We may decide to give these symbols names at some point, solidifying their status as rankings in the Arcade ecosystem.

It is important to show users this symbol hierarchy when they click on a ranking so that they can understand where a symbol falls on a general gradient from "bad" to "good", without showing them numbers that may improperly alter their perception of the ranking.

Color gradients and other thematic design elements may also be implemented as a non-numeric way to aid in the communication of one's rank, in addition to the symbols.

Level

An additional metric for assessing other users is their number of completed trades. Using each trade as 1 experience point, we can create a level system where Arcade users can level up! This level will be shown as a number for the currently attained level, along with a progress bar showing progress toward the next level.

This level and progress bar will be shown next to the rank icon under the username.
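To illustrate how a level and progress bar could be derived from completed trades, here is a small TypeScript sketch. The text does not specify how many trades a level requires, so `TRADES_PER_LEVEL = 10` is a purely hypothetical value, as are the names `levelFromTrades` and `LevelProgress` and the choice to start at level 1.

```typescript
// HYPOTHETICAL: the number of trades per level is not defined in this document.
const TRADES_PER_LEVEL = 10;

interface LevelProgress {
  level: number;     // currently attained level, starting at 1 (assumption)
  progress: number;  // fraction of the way to the next level, from 0 up to (not including) 1
}

// Each completed trade counts as 1 experience point.
function levelFromTrades(completedTrades: number): LevelProgress {
  const xp = Math.max(0, Math.floor(completedTrades));
  return {
    level: Math.floor(xp / TRADES_PER_LEVEL) + 1,
    progress: (xp % TRADES_PER_LEVEL) / TRADES_PER_LEVEL,
  };
}
```

Under these assumed values, a user with 25 completed trades would be level 3 and halfway to level 4.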
