This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Support for "standard" hexadecimal hashes #49

Open
Lekensteyn opened this issue Aug 25, 2014 · 11 comments

Comments

@Lekensteyn

Given that many tools output base-16 representations of hashes, what about adding support for that format? The length of the hash could be used to detect the format (base64 vs hex):

# sha256: 1YiBrxeoKyxVnuPvaVEF96AfayYZap-zXLHxUPxPjzw
# sha256: 3f7a8ec45765cc33daac2448c609ab08e76ffb5a
peep==1.3
@willkg
Collaborator

willkg commented Aug 29, 2014

Which tools are you referring to?

@Lekensteyn
Author

sha256sum, the hexdigest() method of Python's hashlib hash objects, openssl sha256.

Base64-encoded digests are slightly unusual.

@willkg
Collaborator

willkg commented Aug 29, 2014

Are you using those tools with peep? Is using base64 representation making something you want to do difficult?

@Lekensteyn
Author

The hex digests are posted on PyPI and in GPG-signed release announcements (in the case of Django), so yes, the base64 format introduces an additional step of complexity.

I was thinking of this workflow:

  1. Pick the published sums from a known-good source.
  2. Paste them into the requirements file.
  3. Install everything using peep.

If I have to use base64-encoded strings, then I need to convert the hex digests to bytes and then encode those bytes as base64, substituting the two non-alphanumeric characters (which I need to figure out first) and maybe appending padding.

So it would be easier if I could use the existing sources rather than converting them to an unusual format.
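
For reference, the manual conversion described above can be sketched in a few lines of Python. This is only an illustration, not code from peep: the function name hex_to_urlsafe_b64 is hypothetical, and it assumes the target format is URL-safe base64 with the trailing "=" padding stripped, which is what the 43-character example at the top of this issue suggests.

```python
import base64
import binascii


def hex_to_urlsafe_b64(hex_digest):
    """Convert a hex digest to unpadded URL-safe base64.

    Hypothetical helper; assumes the target format uses '-' and '_'
    in place of '+' and '/' and drops the '=' padding.
    """
    raw = binascii.unhexlify(hex_digest.strip())  # hex text -> raw bytes
    encoded = base64.urlsafe_b64encode(raw)       # bytes -> base64 with '-' and '_'
    return encoded.rstrip(b"=").decode("ascii")   # drop the padding
```

Calling hex_to_urlsafe_b64() on the output of sha256sum for a downloaded archive would give a string that can be compared against a base64-style hash line.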

@willkg
Collaborator

willkg commented Aug 29, 2014

Got it. That's a much more complete report of the issue than the original description.

More complexity yields more bugs, so I don't think we should support two hash formats. I think the way forward with this would be either:

  1. decide it's not worth doing
  2. switch from base64 to base16

I'm inclined to go with number 2, though I suspect that using base16 encodings to make it easier to reuse other people's hashes leads to a false sense of security, which makes me wonder whether we should decide it's not worth doing.

@erikrose Your thoughts?

@Lekensteyn
Author

The biggest threat I am worried about is that the file gets changed in the future (or now, via a MitM), not necessarily fake checksums.

The sources for checksums would be:

  • GPG-signed release announcements
  • checksums used by Linux distributions (in their packaging)
  • ...
  • PyPI

@mythmon
Contributor

mythmon commented Oct 2, 2014

+1 to allowing base16 checksums. I don't think I've ever seen base64 hashes anywhere else, which makes it hard to use other tools. I tried to use sha256sum to verify hashes for a while, and it really wasn't clear to me until later why they weren't matching up.

base64 hashes are slightly shorter (43 vs 64 characters), but I don't think that matters much.

Personally, I think that peep should be able to handle more types of checksum than just base64-encoded sha256 checksums. I would be OK with auto-detecting base64 or base16.
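
The 43 vs 64 figure follows from the 32-byte size of a SHA-256 digest: hex uses two characters per byte, while base64 uses four characters per three bytes (44 with padding, 43 once the single "=" is stripped). A quick check, assuming unpadded URL-safe base64 as in the example at the top of this issue; the input b"example" is arbitrary:

```python
import base64
import hashlib

digest = hashlib.sha256(b"example").digest()  # 32 raw bytes

hex_form = digest.hex()  # 2 characters per byte
b64_form = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

print(len(digest), len(hex_form), len(b64_form))  # 32 64 43
```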

@erikrose
Owner

erikrose commented Oct 3, 2014

Where are the hex sha256 digests on PyPI? All I see is sha1 hashes, and I have to dig into the DOAP records to see them. If you're talking about md5 hashes, those and sha1 are fairly thoroughly broken. While I could see them being useful as an easier way to guard solely against accidents, I am reluctant to make them the path of least resistance.

@erikrose
Owner

erikrose commented Oct 3, 2014

Supporting the hex versions of sha256 hashes seems like a no-brainer to me; we can easily distinguish them based on length.
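
A minimal sketch of what length-based detection could look like. This is not peep's implementation: parse_sha256 and the two pattern names are hypothetical, and it assumes the base64 form is the unpadded URL-safe variant (43 characters) while the hex form is 64 characters.

```python
import base64
import binascii
import re

# Hypothetical patterns: 64 hex characters, or 43 URL-safe base64 characters.
HEX_SHA256 = re.compile(r"[0-9a-fA-F]{64}")
B64_SHA256 = re.compile(r"[A-Za-z0-9_-]{43}")


def parse_sha256(text):
    """Return the raw 32-byte digest for either supported encoding."""
    text = text.strip()
    if HEX_SHA256.fullmatch(text):
        return binascii.unhexlify(text)
    if B64_SHA256.fullmatch(text):
        # Restore the single '=' that was stripped before decoding.
        return base64.urlsafe_b64decode(text + "=")
    raise ValueError("not a recognized sha256 encoding: %r" % text)
```

Because both forms are normalized to raw bytes, the comparison against the downloaded file's digest stays the same regardless of which encoding appears in the requirements file. A 40-character hex string (the length of a SHA-1 digest) matches neither pattern and is rejected.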

@Lekensteyn
Author

Depending on your threat model, md5 can be sufficient "for now". Collision vulnerabilities affect the trust you can have in the integrity of files whose checksum comes from an untrusted source. If you already have an md5 hash which is not specially crafted, then a second preimage attack is more difficult.

SHA-1 does not have known preimage vulnerabilities at all.

So, when do you expect to implement the hex versions?

@erikrose
Owner

erikrose commented Oct 8, 2014

I can't believe I haven't mentioned the peep hash command. If you have the tarballs or wheels downloaded, that's what you run to get the wheel-formatted hashes of them; you don't need to muck around in the REPL yourself or anything like that. FWIW, the base64'd hashes come from http://legacy.python.org/dev/peps/pep-0427/#signed-wheel-files, the format wheels use internally. Those didn't become popular [yet], so the lack of tooling around them is annoying.

Hex versions honestly aren't a big priority for me, since they don't scratch any of my itches, but I'll gladly accept a patch.
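
For anyone comparing against sha256sum output, here is a rough sketch of computing both encodings of a downloaded file's digest in one pass. It is not the code behind peep hash; the function name sha256_both is hypothetical, and the base64 form assumes the unpadded URL-safe encoding described in the PEP 427 section linked above.

```python
import base64
import hashlib


def sha256_both(path, chunk_size=65536):
    """Return (hex, unpadded URL-safe base64) SHA-256 digests of a file.

    Hypothetical helper for illustration only.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    b64_form = base64.urlsafe_b64encode(h.digest()).rstrip(b"=").decode("ascii")
    return h.hexdigest(), b64_form
```

The first element should match what sha256sum prints for the same file; the second is the same digest in the base64 style.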


4 participants