Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to pyca/cryptography as crypto backend #41

Merged
merged 7 commits into from
Aug 31, 2018

Conversation

sklemer1
Copy link
Contributor

@sklemer1 sklemer1 commented Aug 7, 2018

x509_rsa_pem consists of an old fork of the no longer maintained pycrypto lib as well as an old version of pyasn which got some security fixes in the meantime. Also it's not supporting EC and high level python only without SIMD optimizations. So the plan is to replace it with pyca/cryptography.

I ripped the x509_rsa_pem folder away and changed the XMlSecCrypto object as well as the utils.pem2cert function that is used by pyff. Sadly the latter one gave pretty internal APIs to the outside so I also had to add some kind of wrapper/dummy class for the meantime that implements everything that I supposed people might have used. We should mark this deprecated somehow and make up some nice API for things like cert.subject, cert.keysize etc. (or decide to expose the cryptography-object) and change to 2 parts of pyff using it.

I'm not very satisfied with __init__._verify and sign (pretty hard to understand what is going on there -- The difference among digest(), _cm_alg(), _transform() is hard to grab). I also moved the digest and padding to pyca as I trust them more and hope for optimized algorithms. Therefor I had to change the XMlSecCrypto.sign() and verify() interface to include the hashing algorithm, but it's downwards compatible.

Cryptography distinguishes certificates and public keys. I assumed that we will always only have to deal with certs, never with pure pub keys.

This is a first version of the patch for review trying to change as few as possible. After getting this together we might want to use more of crypotgraphys functions, e.g. in crypto._cert_fingerprint(), password encrypted private keys, certificate chains etc.

The attached patch passes all internal and pyff unit tests that are also passed with master. I couldn't test pkcs11 which I don't have setup for atm.

Have a look at it and start flaming ;)

All Submissions:

[...]

  • This project follows PEP8 style guide. Have you run your code against the 'flake8' linter?

flake8 complains very heavily about a lot of stuff already in there -- we might want to address this :).

@leifj leifj requested review from c00kiemon5ter and fredrikt August 7, 2018 14:51
@leifj
Copy link
Contributor

leifj commented Aug 7, 2018

Its easy to test p11 - just install softhsm and pykcs11 and those tests should be run automatically!

Copy link
Member

@c00kiemon5ter c00kiemon5ter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is good and should be easy to merge. I made some comments for things that I think need to change. I don't think there is anything really crucial there. Lets discuss these.

def get_issuer(self):
return self.get_issuer()

def getValidity(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lie. This is not get-ting information, but parsing and creating. This work should be done once on __init__(). (The same could be said for every other method that dereferences self.cert.)

return self.cert.issuer

def get_issuer(self):
return self.get_issuer()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#itsatrap! This will recurse until the stack explodes.
I guess this should be self.getIssuer() .. but then I see that his was broken before:

-    def get_issuer(self):
-        return self.get_issuer()

Lets fix this and never look back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go one further and remove it as this was obviously never used by anyone and I'd like to get rid of this interface anyway -- see below.


def getValidity(self):
d = dict()
d['notAfter'] = [self.cert.not_valid_after.strftime("%y%m%d%H%M%SZ")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time format %y%m%d%H%M%SZ should be declared once with a proper name.

@@ -0,0 +1,38 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line should be removed; or better yet, replaced with an appropriate docstring.

@@ -0,0 +1,38 @@

class LegacyCertificate(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it legacy because it is deprecated? If so, we should add deprecation warnings on module import and class usage.

Copy link
Contributor Author

@sklemer1 sklemer1 Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called it legacy because it emulates the inner-api of pycrypto that was shown to the outside by us (and used at least by pyff). Shall I just add a calls to warn.warning or do we want something more sophisticated like pyff's decorator:

def deprecated(logger=log):
    """This is a decorator which can be used to mark functions
    as deprecated. It will result in a warning being emitted
    when the function is used."""

    def decorating(func):
        def new_func(*args, **kwargs):
            msg = "Call to deprecated function %s at %s:%d" % (func.__name__,
                                                               func.func_code.co_filename,
                                                               func.func_code.co_firstlineno + 1)
            if logger:
                logger.warn(msg)
            else:
                print(msg)

            return func(*args, **kwargs)

        return new_func
    return decorating

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call it a shim and name it as such. If its legacy then there must be a clear path to removing the API but I don't really see that... or what am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A deprecation warning serves two purposes:

  1. indicate that the call will be removed in the future
  2. suggest a migration path away from this call

The @deprecated decorator should include a reason parameter.
The deprecation notice is also a warning and should not bother with the logger.
Warnings can be optionally logged using logging.captureWarnings().

See also, the deprecated and deprecation packages as to how they handle this behaviour.

Copy link
Contributor Author

@sklemer1 sklemer1 Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leifj Let's look at pem2cert: What do I want this function for?

a) parse a pem and tell me if it's correctly formed
b) give me some getters or attributes to access properties like keysize, issuer, etc.

At the moment this is done via a dictionary (fine) where one element is an object which itself used to be the backend cert object. Both are used at the same time at least by pyff. I want to get rid of the exposure of the backend object but therefor we have to provide what's missing in the dictionary -- and I'm not sure if a dict is the right type of object for this job.

I'd say we create a new class (or 2) Certificate and Key that has getters for everything interesting. Again, c.f. #42 .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for getters only indicates that this is a read-only object; iow it should be immutable. Another hint we get is that it only stores values. dict is not the right type because it allows its contents to change. To keep things simple, I would make this a namedtuple (which can also be exposed as dict); then revise if the needs turn out to be different.

self.cert_pem = self.key.private_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PrivateFormat.PKCS8,
encryption_algorithm=serialization.NoEncryption())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce whitespace - indent to four spaces more for each nested block.

if private:
self.key = serialization.load_pem_private_key(file.read(),password=None, backend=default_backend())
if not isinstance(self.key, rsa.RSAPrivateKey):
raise XMLSigException("We don't support non-RSA keys at the moment.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] non-RSA private keys [...]

else:
self.key = load_pem_x509_certificate(file.read(), backend=default_backend())
if not isinstance(self.key.public_key(), rsa.RSAPublicKey):
raise XMLSigException("We don't support non-RSA keys at the moment.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] non-RSA public keys [...]

@@ -230,6 +277,7 @@ def __delitem__(self, key):
del self.certs[key]

def _cert_fingerprint(cert_pem):
# XXX might use cryptography internals instead of parsing it on our own
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 the less parsing we do, the better

NEWS.txt Outdated
@@ -158,3 +158,6 @@ News
*Release fre 25 maj 2018 19:43:54 CEST*

* fix verification bug affecting sha512

0.19
===
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a mistake(?). I do not see any changelog.

@c00kiemon5ter
Copy link
Member

(or decide to expose the cryptography-object)

Do not do that. This means the library would be very tightly coupled to an API we do not have any control over. Wrap the object, define the needed interface, and expose only the parts that are necessary. The wrapper-object gets implemented by internally invoking the wrapped-object methods or properties, encapsulating its data and behaviour.

@c00kiemon5ter c00kiemon5ter changed the title Switch to pyca.cryptography as crypto backend Switch to pyca/cryptography as crypto backend Aug 7, 2018
@sklemer1
Copy link
Contributor Author

sklemer1 commented Aug 8, 2018

Any ideas for a 'good' API that we might want to expose? Scope should be similar to what we have in the util-functions but keep EC in mind. Anyway: This discussion should go into its own feature request #42 and has nothing to do with the backend ;).

@sklemer1
Copy link
Contributor Author

sklemer1 commented Aug 8, 2018

I pushed a new version with the trivial things fixed. What's missing besides the cert-object-API (deprecation or no deprecation) discussion that I don't wont to address now but in #42 is the hash_alg handling -- will add this and then I hope things are ready to fly.

@coveralls
Copy link

coveralls commented Aug 8, 2018

Coverage Status

Coverage decreased (-4.8%) to 51.978% when pulling a354558 on sklemer1:newcrypto into 982550b on IdentityPython:master.

@sklemer1
Copy link
Contributor Author

sklemer1 commented Aug 8, 2018

Its easy to test p11 - just install softhsm and pykcs11 and those tests should be run automatically!

About this: I'm on some recent ubuntu that comes with softhsm2 which behaves in quiet some ways different than softhsm and than the unit tests expect them (slot 0 is renumbered after init so basically everything throws errors about a non-existing slot; new config format; new environment var; different lib pathes). I have a patch for this halfway ready but am away until next week.

@leifj
Copy link
Contributor

leifj commented Aug 14, 2018

Where are we with this PR?

@sklemer1
Copy link
Contributor Author

sklemer1 commented Aug 14, 2018

I started to encapsulate our supported hashes into some kind of factory -- should be extensible enough for now to later support ECDH et al. We might want to change other parts to use this as well and start some overall cleanup but I think this might do the trick to get this PR done.

I won't touch the concept of the XMlSecCrypto in this PR and won't do the #42 "give users an abstract certificate representation" part here.

So I think this could be merged :). Further work later on.

@sklemer1
Copy link
Contributor Author

Okay... one more commit :). I replaced the cert parsing and fingerprinting. I tried to keep the CertDict interface stable, BUT it will now return complete PEMs including the header and not a b64 encoded DER like it used to. Tried it with pyff's unit tests and some scripts of my own -- still seems to work.

Someone should write a unit test for the CertDict.

Copy link
Contributor

@fredrikt fredrikt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at the diff and have no objections. I've also tried the new version with the eduid idp and unless I screwed up getting the right code into all the right containers, it seems to work without issues.

Copy link
Member

@c00kiemon5ter c00kiemon5ter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see these changes. I am open to discussing the concepts in more detail if you want to,

I will not insist though, if you want to merge this and revisit later. In the future we can refine this even further.

@@ -116,7 +116,7 @@ def _get_by_id(t, id_v):

def _alg(elt):
"""
Return the hashlib name of an Algorithm. Hopefully.
Return the xmldsig name of an Algorithm. Hopefully.
:returns: None or string
"""
uri = elt.get('Algorithm', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What follows this line is a bit funny. This should just return uri

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! But I want to emphasize that this piece of code was from @leifj 😛 ... if I read the github blame correctly.

ALGORITHM_SIGNATURE_RSA_SHA512 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha512"



class _HashType(object):
def __init__(self, hashlib_name, xmldsig_name, pyca_name=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to be holding on three different names. There should be just one internal name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this, I see we need one name for use by xmldsig, one for hashlib and one for pyca. The correct way to implement this is to make it explicit by denoting the respective conversions. We need

  • our own internal name
  • appropriate mappings from each producer
  • appropriate mappings to each consumer/user

The internal name is used everywhere, while conversion takes place on the edges.

sign_alg_internal_to_xmldsig = {
  'sha1': ALGORITHM_SIGNATURE_RSA_SHA1,
  'sha256': ALGORITHM_SIGNATURE_RSA_SHA256,
  'sha384': ALGORITHM_SIGNATURE_RSA_SHA384,
  'sha512': ALGORITHM_SIGNATURE_RSA_SHA512,
}

sign_alg_xmldsig_to_internal = {
  ALGORITHM_SIGNATURE_RSA_SHA1: 'sha1',
  ALGORITHM_SIGNATURE_RSA_SHA256: 'sha256',
  ALGORITHM_SIGNATURE_RSA_SHA384: 'sha384',
  ALGORITHM_SIGNATURE_RSA_SHA512: 'sha512',
}

sign_alg_internal_to_hashlib = {
  'sha1': 'sha1',
  'sha256': 'sha256',
  'sha384': 'sha384',
  'sha512': 'sha512',
}

sign_alg_internal_to_pyca_cryptography = {
  'sha1': 'SHA1',
  'sha256': 'SHA256',
  'sha384': 'SHA384',
  'sha512': 'SHA512',
}

# and the other way around like xmldsig ..

Then, consumers convert to the internal representation and producers to the desired representation.

def _sig_alg(si):
    sm = si.find(".//{%s}SignatureMethod" % NS['ds'])
    sig_alg = _alg(sm)
    if sm is None or sig_alg is None:
        raise XMLSigException("No SignatureMethod")
    return constants.sign_alg_xmldsig_to_internal[sig_alg]

def sign(self, data, hash_alg, pad_alg="PKCS1v15"):
  # [...]
  hash_alg = constants.sign_alg_internal_to_pyca_cryptography[hash_alg]
  hasher = getattr(pyca_hashes, hash_alg)
  padder = getattr(pyca_padding, pad_alg)
  return self.key.sign(data, padder(), hasher())

In the above, the factory is the mapping itself.
If you want to handle errors explicitly you need an actual factory function.

def sign_alg_xmldsig_to_internal(xmldsig_sign_alg):
  try:
    constants.sign_alg_xmldsig_to_internal[xmldsig_sign_alg]
  except KeyError as e:
    raise UnsupportedSignAlg(xmldsig_sign_alg) from e

(which can be generalized to a factory-of-factories by making constants.sign_alg_xmldsig_to_internal a parameter)


PS: the same holds for padding


This now raises another question: Do we need to be managing hashlib names when we are using a full-featured cryptography library? I do not think so. We can get rid of hashlib and use the primitives from pyca/cryptography directly, thus not having to maintain that mapping and also not having to maintain a synchronization of supported hashes between xmldsig, hashlib and pyca/cryptography.

self.name = hashlib_name
self.xmldsig_name = xmldsig_name
if pyca_name is None:
self.pyca_name = self.name.upper()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this (.upper) is correct. Currently and for the algorithms we use, this may hold. But, for the general case, I do not think pyca/cryptography has made any promise to such a rule. They are free to decide how they name the algorithms they implement and how to expose them.


def _sig_digest(si):
return (_sig_alg(si).split("-"))[1]
return getattr(constants.Hashes,sig_alg.lower())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.lower is a detail that the caller should not care about. This call belongs to the factory internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually kept this -- don't know -- it's internal stuff anyway. Could be implemented in sign_alg_xmldsig_sig_to_internal() but... ... don't know.

@sklemer1
Copy link
Contributor Author

sklemer1 commented Aug 29, 2018

I completely agree with your words about hashlib-usage and looked why and where we used hashlib and hashlib names. The only place I found was first of all unrelated to my code and 2nd could be transformed into pyca/cryptography-calls. I don't think hashlib names were ever exposed to outside callers.

With this in mind I got rid of the hash-method-abstraction-thingy, clarified the 2 different types/ parts of hashing/ digesting that goes on in the code and hope that things got simpler with this. _digest() went into the crypto.py-file because it's using the pyca-backend now which I didn't want to expose to __init__. We do this in utils.py for now but this might be changed in the future :).

I also removed the Padding-abstraction for now as it wasn't used by my code at the moment anyway.


One more thing: As I, or better my work place DFN, contributed some substantial code now -- where can I put our name as I don't see a contributors-file and the source files are missing copyright-statements.

@c00kiemon5ter
Copy link
Member

c00kiemon5ter commented Aug 29, 2018

I think it is simpler and good enough for now.


@leifj should answer your other question; I'm not good with these legal-type of things, but I think either we introduce a CONTRIBUTORS.txt file, or amend LICENSE.txt with the appropriate attribution.

@leifj
Copy link
Contributor

leifj commented Aug 29, 2018

The question of attribution is probably best resolved for the whole IdentityPython project - a topic for the next call? My gut feeling is that CONTRIBUTORS is best since we actually want most of the (c) to be with IdentityPython longterm.

@leifj
Copy link
Contributor

leifj commented Aug 29, 2018

I'm gona pull the trigger on this PR tomorrow because I gotta hit the sack right now and I want to do some tests with some of my eIDAS cruft too. I expect all of that to be 100% ok tho. Thx for all the great work on this byt @sklemer1 @c00kiemon5ter and @fredrikt . This feels like a major milestone for this library and for the IdentityPython community!

@leifj leifj merged commit f6c52ea into IdentityPython:master Aug 31, 2018
@sklemer1 sklemer1 deleted the newcrypto branch October 17, 2018 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants