scittgit: statement URN used for indexing #16

Open
1 of 5 tasks
johnandersen777 opened this issue Nov 29, 2024 · 4 comments

@johnandersen777
Contributor

johnandersen777 commented Nov 29, 2024

We need a way to say "these are my identities".

This way, any instance you query returns transparent statements (and their payloads) that allow resolution across federated instances. This enables queries in isolated or segmented networks, regardless of protocol.

Use the transparency service's insert policy, and the GitHub Actions schema exec SCITT policy engine, to verify did:plc (or other) ownership on insert. We can use this to verify the post used as the user's index without needing to pin it. This will become our decentralized indexing mechanism.

  • The SCITT URN of the index statement becomes the equivalent of how we post / use PGP fingerprints.
    • This signed statement URN functions, via its payload, as the root node for OpenSSF S2C2F ING-4 discovery across trust boundaries.
    • SCITT instances federate statements that satisfy the insert policy, which verifies the identity of each signer to prove that the statement is their index root.
      • The insert policy on this instance acts like your ING-4 Keybase verifier.
    • docs: adrs: governance: Policies #8
      • We use an in-repo upstream.yml policy for each branch to define who we federate with.
        • Instead of keys, we use the URN of each owner's root index.
  • TODOs
  • Future
    • A SCITT instance that we trust because it is attested, so that we can rely on it as the identity verification and discovery service within the trust boundary, based on the software it runs and the policy it enforces.
    • For subjects (feeds) we may want to use a data type (com.atproto.x.y.z style) as the subject. The subject would then be the URN of a payload containing the URN of the thing we want to reference plus the data type string identifier.
    • Use the same mechanism we use to push git internal files to push files stored within repos; this becomes the basis for our policy engines. This matters because we build the BOM tree / workflow dependency tree (workflow graph and action sources), which is consumed by policy engines across federated sets of instances. Once this is complete, we'll start on executing policies stored in this format.
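Since each owner is identified by the URN of their root index statement rather than by a key, a federating instance at least needs to parse and validate those URNs. A minimal Python sketch, assuming the URN layout from the architecture draft quoted at the end of this issue (`urn:ietf:params:scitt:{message-type}:{hash-name}:{base-encoding}:{digest}`); `ScittUrn`, `parse_scitt_urn`, and `validate_owner_indexes` are hypothetical names, not part of scitt-api-emulator.

```python
import re
from typing import Dict, NamedTuple

# Hypothetical pattern for the SCITT URN layout described in the draft:
# urn:ietf:params:scitt:{message-type}:{hash-name}:{base-encoding}:{digest}
SCITT_URN_RE = re.compile(
    r"^urn:ietf:params:scitt:"
    r"(?P<message_type>[a-z-]+):"
    r"(?P<hash_name>[a-z0-9-]+):"
    r"(?P<base_encoding>[a-z0-9]+):"
    r"(?P<digest>[A-Za-z0-9_-]+=*)$"  # base64url alphabet, optional padding
)


class ScittUrn(NamedTuple):
    message_type: str
    hash_name: str
    base_encoding: str
    digest: str


def parse_scitt_urn(urn: str) -> ScittUrn:
    """Split a SCITT URN into its components, raising ValueError if malformed."""
    match = SCITT_URN_RE.match(urn)
    if match is None:
        raise ValueError(f"not a SCITT URN: {urn!r}")
    return ScittUrn(**match.groupdict())


def validate_owner_indexes(owner_indexes: Dict[str, str]) -> Dict[str, ScittUrn]:
    """Parse every owner's root index URN (used in place of a list of keys)."""
    return {owner: parse_scitt_urn(urn) for owner, urn in owner_indexes.items()}
```

For example, `parse_scitt_urn` applied to the URN printed by `create_statement.py` in the next comment yields message type `signed-statement`, hash `sha256`, and encoding `base64url`.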
johnandersen777 transferred this issue from publicdomainrelay/gitatp Nov 29, 2024
@johnandersen777
Contributor Author

johnandersen777 commented Nov 30, 2024

$ python -u scitt_emulator/create_statement.py --out statement.signed --subject test --content-type 'text/plain' --payload test
urn:ietf:params:scitt:signed-statement:sha256:base64url:Q9OWRUhmW_ZOPTU3yaau9AfCGxPHO2QQ8fuP0dztdnw=
diff --git a/scitt_emulator/create_statement.py b/scitt_emulator/create_statement.py
index da1c6da..4caea85 100644
--- a/scitt_emulator/create_statement.py
+++ b/scitt_emulator/create_statement.py
@@ -2,6 +2,7 @@
 # Licensed under the MIT License.
 import base64
 import pathlib
+import hashlib
 import argparse
 from typing import Union, Optional, List
 
@@ -169,6 +170,26 @@ def create_claim(
     if private_key_pem_path and not private_key_pem_path.exists():
         private_key_pem_path.write_bytes(key_as_pem_bytes)
 
+    # https://github.com/TimothyClaeys/pycose/blob/e527e79b611f6cc6673bbb694056a7468c2eef75/pycose/messages/sign1message.py#L66C9-L79
+    msg.signature = b""
+    # https://github.com/TimothyClaeys/pycose/blob/e527e79b611f6cc6673bbb694056a7468c2eef75/pycose/messages/cosemessage.py#L143
+    claim = msg.encode(tag=True, sign=False)
+
+    # https://www.ietf.org/archive/id/draft-ietf-scitt-architecture-10.html#appendix-B.2-5
+    # signed statement and statement are identical AFAIK
+    message_type = "signed-statement"
+
+    hash_name = "sha256"
+    hash_instance = hashlib.new(hash_name)
+    hash_instance.update(claim)
+
+    base_encoding = "base64url"
+    base64url_encoded_bytes_digest = base64.urlsafe_b64encode(
+        hash_instance.digest(),
+    ).decode()
+
+    return f"urn:ietf:params:scitt:{message_type}:{hash_name}:{base_encoding}:{base64url_encoded_bytes_digest}"
+
 
 def cli(fn):
     p = fn("create-claim", description="Create a fake SCITT claim")
@@ -195,7 +216,8 @@ def cli(fn):
 def main(argv=None):
     parser = cli(argparse.ArgumentParser)
     args = parser.parse_args(argv)
-    args.func(args)
+    urn = args.func(args)
+    print(urn)
 
 
 if __name__ == "__main__":

johnandersen777 added a commit to johnandersen777/scitt-api-emulator that referenced this issue Nov 30, 2024
johnandersen777 changed the title from "scittatp: statement URN used for indexing" to "scittgit: statement URN used for indexing" Dec 11, 2024
@johnandersen777
Contributor Author

johnandersen777 commented Dec 11, 2024

  • We create the tree similarly to how we do with ATProto
    • We use the subject as a URN (as described somewhere in the docs)
      • urn:ietf:params:scitt:signed-statement:sha256:base64url:Q9OWRUhmW_ZOPTU3yaau9AfCGxPHO2QQ8fuP0dztdnw=
  • When we federate, the subjects are "feeds" (we were calling them that for a while in SCITT)
    • This allows us to create the tree like structure we have with ATProto
    • Federation uses URNs to de-duplicate across instances
      • TODO work out ING-4 style mods per org policy / instance
    • We can also use this for attaching CI/CD results similar to how we have .git and metadata feeds within indexes
  • https://bsky.app/profile/john.atproto.chadig.com/post/3lcldiuh5ck2i
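Federation de-duplicates by URN: because the URN is derived from the statement bytes, two instances holding the same statement compute the same key. A minimal sketch, assuming each instance hands over its statements as raw encoded bytes and that the URN is computed as in the create_statement.py diff above (SHA-256, padded base64url); `statement_urn` and `federate` are hypothetical helpers, and they hash whatever bytes they are given, so callers must use the same encoding everywhere.

```python
import base64
import hashlib
from typing import Dict, Iterable


def statement_urn(encoded: bytes, message_type: str = "signed-statement") -> str:
    """Content-address encoded statement bytes (SHA-256, padded base64url)."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(encoded).digest()).decode()
    return f"urn:ietf:params:scitt:{message_type}:sha256:base64url:{digest}"


def federate(instances: Iterable[Iterable[bytes]]) -> Dict[str, bytes]:
    """Merge statements from several instances, keeping one copy per URN."""
    merged: Dict[str, bytes] = {}
    for statements in instances:
        for encoded in statements:
            # Identical bytes always produce identical URNs, so setdefault
            # keeps the first copy seen and silently drops duplicates.
            merged.setdefault(statement_urn(encoded), encoded)
    return merged
```

Merging `[[b"a", b"b"], [b"b", b"c"]]` from two instances leaves three entries, since `b"b"` is stored once.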
name: 'Maintainers of main branch'
data:
  federation:
    - protocol: 'publicdomainrelay/federation-git@v1'
      data:
        repos:
          - namespace: 'publicdomainrelay'
            name: 'example-policy-maintainers'
            group: true
            indexes:
              - 'github'
          - namespace: 'john'
            name: 'test-4'
            indexes:
              - 'atproto'
          - namespace: 'alice'
            name: 'example-policy-maintainers-stored-in-atproto'
            indexes:
              - 'atproto'
  namespaces:
    publicdomainrelay:
      indexes:
        github:
          protocol: 'publicdomainrelay/index-github@v1'
          data:
            owner: 'publicdomainrelay'
    john:
      indexes:
        github:
          protocol: 'publicdomainrelay/index-github@v1'
          data:
            owner: 'johnandersen777'
        atproto:
          protocol: 'publicdomainrelay/index-atproto-v2@v1'
          data:
            handle: 'john.atproto.chadig.com'
            uri: 'at://did:plc:w4524qnuvc7o6ojwjwtnvh75/app.bsky.feed.post/3lc2smchqf22i'
            cid: 'bafyreiebgxcpue5xjy5hmpfw7mnwdc2ss7nsia2ixmdm4zd7twu6bgqbky'
    alice:
      indexes:
        github:
          protocol: 'publicdomainrelay/index-github@v1'
          data:
            owner: 'aliceoa'
        atproto:
          protocol: 'publicdomainrelay/index-atproto-v2@v1'
          data:
            handle: 'alice.atproto.chadig.com'
            uri: 'at://did:plc:vjnm5ukoaxy4fi4clcqhagud/app.bsky.feed.post/3lbxet47fu22i'
            cid: 'bafyreicrrqguwnmkc6djw4motgree4qdt3agfjnesv532kxxgdrlomphqi'
  owners:
    - 'publicdomainrelay'
    - 'john'
    - 'alice'
  # TODO Pull requests. If you want to confirm a pull request, we have to have
  # the HEAD for the branch advanced by each user within their repo. So each
  # owner has confirmed that that ref advanced.
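The policy above pairs each federated repo with a namespace, and each namespace with its per-protocol index data. A minimal sketch of how a consumer might resolve which index to query for each repo, assuming the YAML has already been parsed into plain dicts (for example with PyYAML's `yaml.safe_load`); the trimmed `POLICY` literal and `resolve_federation` are hypothetical and are not the publicdomainrelay federation-git implementation.

```python
from typing import Any, Dict, List, Tuple

# A trimmed copy of the policy above, in the shape yaml.safe_load returns.
POLICY: Dict[str, Any] = {
    "name": "Maintainers of main branch",
    "data": {
        "federation": [
            {
                "protocol": "publicdomainrelay/federation-git@v1",
                "data": {
                    "repos": [
                        {"namespace": "john", "name": "test-4", "indexes": ["atproto"]},
                    ],
                },
            },
        ],
        "namespaces": {
            "john": {
                "indexes": {
                    "atproto": {
                        "protocol": "publicdomainrelay/index-atproto-v2@v1",
                        "data": {"handle": "john.atproto.chadig.com"},
                    },
                },
            },
        },
        "owners": ["john"],
    },
}


def resolve_federation(policy: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Return (namespace, repo name, index name, index protocol) for each repo index."""
    data = policy["data"]
    namespaces = data["namespaces"]
    # Every owner must have a namespace entry describing where their index lives.
    missing = [owner for owner in data["owners"] if owner not in namespaces]
    if missing:
        raise ValueError(f"owners without a namespace entry: {missing}")
    resolved = []
    for federation in data["federation"]:
        for repo in federation["data"]["repos"]:
            ns_indexes = namespaces[repo["namespace"]]["indexes"]
            for index_name in repo["indexes"]:
                index = ns_indexes[index_name]
                resolved.append((repo["namespace"], repo["name"], index_name, index["protocol"]))
    return resolved
```

Keying the lookup on the namespace keeps the repo list short; per-protocol details such as the ATProto handle, URI, and CID live only in the `namespaces` section.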

@johnandersen777
Contributor Author

I have choice words for Microsoft......

@johnandersen777
Contributor Author

  • I strongly dislike the need for a secondary index to support content
    addressability of statements.
    • Maybe that's just because I care about the content in the TRANSPARENCY
      service, but that's just me. Fucking UUIDs, sure, pick chaos. Who needs
      useful constructs like content addressability when you have chaos? Maybe I'm
      too bitter and jaded to go back to software if we're just going to keep
      burning everything to the fucking ground. Why can't we have nice things?
  • ING-4 attestations in SCITT indexed by URN allow mirroring an upstream
    across walled gardens to be expressed as the issuance of a transparent
    statement against each upstream (or rather, against the statement
    representing the upstream)
  • Arch doc with URN definitions
B.1. Identifiers For Binary Content
Identifiers for binary content, such as Statements, or even Artifacts themselves are computed as follows:

Let the base64url-encoded-bytes-digest for the message be the base64url encoded digest with the chosen hash algorithm of bytes / octets.

Let the SCITT name for the message be the URN constructed from the following URI template, according to [RFC6570]:

Let the message-type, be "statement" for Statements about Artifacts.

urn:ietf:params:scitt:\
{message-type}:\
{hash-name}:{base-encoding}:\
{base64url-encoded-bytes-digest}
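The B.1 template reduces to hashing the raw bytes and filling in the URN fields. A minimal sketch, assuming SHA-256 as the chosen hash and keeping the trailing `=` padding as the create_statement.py diff does (whether padding should be stripped is an assumption not settled here); `binary_content_urn` is a hypothetical helper.

```python
import base64
import hashlib


def binary_content_urn(content: bytes, message_type: str = "statement", hash_name: str = "sha256") -> str:
    """Build a B.1-style SCITT URN over raw bytes (padded base64url digest)."""
    digest = hashlib.new(hash_name, content).digest()
    encoded = base64.urlsafe_b64encode(digest).decode()
    return f"urn:ietf:params:scitt:{message_type}:{hash_name}:base64url:{encoded}"
```

Per B.3, a Transparent Statement identifier is the same computation over the full encoded Transparent Statement with `message_type="transparent-statement"`.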
B.2. Identifiers For SCITT Messages
Identifiers for COSE Sign 1 based messages, such as identifiers for Signed Statements and Receipts are computed as follows:

Let the base64url-encoded-to-be-signed-bytes-digest for the message be the base64url encoded digest with the chosen hash algorithm of the "to-be-signed bytes", according to Section 8.1 of [RFC9052].

Let the SCITT name for the message be the URN constructed from the following URI template, according to [RFC6570]:

Let the message-type, be "signed-statement" for Signed Statements, and "receipt" for Receipts.

urn:ietf:params:scitt:\
{message-type}:\
{hash-name}:{base-encoding}:\
{base64url-encoded-to-be-signed-bytes-digest}
Note that this means the content of the signature is not included in the identifier, even though signature related Claims, such as activation or expiration information in protected headers are included.

As a result, an attacker may construct a new Signed Statement that has the same identifier as a previous Signed Statement, but has a different signature.

B.3. Identifiers For Transparent Statements
Identifiers for Transparent Statements are defined as identifiers for binary content, but with "transparent-statement" as the message-type.

urn:ietf:params:scitt:\
{message-type}:\
{hash-name}:{base-encoding}:\
{base64url-encoded-bytes-digest}
Note that because this identifier is computed over the unprotected header of the Signed Statement, any changes to the unprotected header, such as changing the order of the unprotected header map key value pairs, adding additional Receipts, or adding additional proofs to a Receipt, will change the identifier of a Transparent Statement.

Note that because this identifier is computed over the signatures of the Signed Statement and signatures in each Receipt, any canonicalization of the signatures after the fact will produce a distinct identifier.
