Custom service to service authentication using IAM/KMS

Similar to abusing IAM and STS, we can also abuse IAM and KMS to let Amazon do our service-to-service authentication for us. Unlike STS, though, KMS is almost perfect for this use case.

Let’s recap a bit from the STS post, though. What I’m aiming for is service to service authentication with the following specs:

  1. Has no chicken-and-egg trust problem. We re-use AWS to provide the chicken; we’ll use it to lay the eggs.
  2. Can be used only from one service to another service. The service receiving the token shouldn’t be able to reuse the token to impersonate the sender.
  3. Has token rotation, and specifically lifetime validity constraints.
  4. Can have future lifetime validity constraints, so that it can be stored with enqueued work.
  5. Can have scoped tokens, also to support enqueued work.

KMS at first glance looks like a relatively boring HSM-as-a-service, providing encryption/decryption and random data generation. It gets really interesting once you look at its implementation of AES-GCM with AAD (additional authenticated data), which KMS calls encryption context: the encryption context can be used along with IAM policy to restrict access to encryption or decryption requests. Let’s see an example:

>>> import boto3
>>> import datetime
>>>
>>> now = datetime.datetime.now()
>>> not_after = now + datetime.timedelta(minutes=60)
>>> now = now.strftime("%Y%m%dT%H%M%SZ")
>>> not_after = not_after.strftime("%Y%m%dT%H%M%SZ")
>>>
>>> kms = boto3.client('kms')
>>> token = kms.encrypt(KeyId='alias/authnz-testing', Plaintext='testdata', EncryptionContext={'from': 'servicea-development-iad', 'to': 'serviceb-development-iad', 'not_before': now, 'not_after': not_after})
>>> token
{u'KeyId': u'arn:aws:kms:us-east-1:12345:key/abcdefgh-1234-5678-9abcd-ee72ac95ae8c', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RequestId': '3a48f2ad-072d-11e5-88fb-17df9ce1a01a'}, u'CiphertextBlob': '\n \x999\x9e$yO\x92\x1dg\xbbZ^S\x84\xdaI\xbf\x14@\x81\x8a\x1c\xf2\xf8Z\x05\xed\xed\xb2\x8d)T\x12\x8f\x01\x01\x01\x02\x00x\x999\x9e$yO\x92\x1dg\xbbZ^S\x84\xdaI\xbf\x14@\x81\x8a\x1c\xf2\xf8Z\x05\xed\xed\xb2\x8d)T\x00\x00\x00f0d\x06\t*\x86H\x86\xf7\r\x01\x07\x06\xa0W0U\x02\x01\x000P\x06\t*\x86H\x86\xf7\r\x01\x07\x010\x1e\x06\t`\x86H\x01e\x03\x04\x01.0\x11\x04\x0c\xd3\x96\x0c\x91\x83\xd2l!\xfb\xa6\xc2\x90\x02\x01\x10\x80#\x97Z\xd1\xbb\xb4_\x12\xea\x1a\xed\x85\x0e\x9b1\xfa0j\xca1(\xc7\xc3\x8czT\xd4\x8fk\x08\x00\xa8\xcd\xe5\x82\xb3'}
>>> kms.decrypt(CiphertextBlob=token['CiphertextBlob'], EncryptionContext={'from': 'servicea-development-iad', 'to': 'serviceb-development-iad', 'not_before': now, 'not_after': not_after})
{u'Plaintext': 'testdata', u'KeyId': u'arn:aws:kms:us-east-1:12345:key/abcdefgh-1234-5678-9abcd-ee72ac95ae8c', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RequestId': '6450392b-072d-11e5-87df-5345698b39e1'}}

You may see where I’m going here. As in the STS post, I’m including from and to mappings so that IAM policy can tie a token to a specific sender and receiver; the ‘to’ service can’t re-use the token to authenticate to other services as the ‘from’ service. Something new I’ve added, though, is not_before and not_after, the time period the auth token is valid for. Unlike the STS solution, this allows us to enqueue work with a token that’s valid during the period the work is expected to be done.

So, this is the context we’re working with. Using either KMS key policy or KMS grants, we can limit which principals can encrypt or decrypt using the key. Most importantly, we can use the encryption context to control this. Let’s make some grants:

$ salt-call boto_kms.create_grant 'alias/authnz-testing' grantee_principal='arn:aws:iam::12345:user/servicea-development-iad' operations='["Encrypt"]' constraints='{"EncryptionContextSubset":{"from":"servicea-development-iad"}}' > /dev/null
$ salt-call boto_kms.create_grant 'alias/authnz-testing' grantee_principal='arn:aws:iam::12345:user/servicea-development-iad' operations='["Decrypt"]' constraints='{"EncryptionContextSubset":{"to":"servicea-development-iad"}}' > /dev/null
$ salt-call boto_kms.create_grant 'alias/authnz-testing' grantee_principal='arn:aws:iam::12345:user/serviceb-development-iad' operations='["Decrypt"]' constraints='{"EncryptionContextSubset":{"to":"serviceb-development-iad"}}' > /dev/null
$ salt-call boto_kms.create_grant 'alias/authnz-testing' grantee_principal='arn:aws:iam::12345:user/serviceb-development-iad' operations='["Encrypt"]' constraints='{"EncryptionContextSubset":{"from":"serviceb-development-iad"}}' > /dev/null
$ salt-call boto_kms.list_grants 'alias/authnz-testing'
local:
    ----------
    grants:
        |_
          ----------
          Constraints:
              ----------
              EncryptionContextSubset:
                  ----------
                  from:
                      servicea-development-iad
          GrantId:
              WZ9Y6I7S05pR0LjYzEXKhzVX0JWzapkxPjl3KiXH8BrMI1d4D5pecZ51FnOe11g56
          GranteePrincipal:
              arn:aws:iam::12345:user/servicea-development-iad
          IssuingAccount:
              arn:aws:iam::12345:root
          Operations:
              - Encrypt
        |_
          ----------
          Constraints:
              ----------
              EncryptionContextSubset:
                  ----------
                  to:
                      servicea-development-iad
          GrantId:
              EFm4L4FCsnM5ba23dmdC05Stw1oojsYVjONDkCwJpegmHdJ0gRF8jQd9NZmdXYXfA
          GranteePrincipal:
              arn:aws:iam::12345:user/servicea-development-iad
          IssuingAccount:
              arn:aws:iam::12345:root
          Operations:
              - Decrypt
        |_
          ----------
          Constraints:
              ----------
              EncryptionContextSubset:
                  ----------
                  from:
                      serviceb-development-iad
          GrantId:
              JoT9F5h19KqpunXfo89CnDB1PI1ig4ApuOYwsP20Pc6GFOBX1lWlx72oAh600aYXN
          GranteePrincipal:
              arn:aws:iam::12345:user/serviceb-development-iad
          IssuingAccount:
              arn:aws:iam::12345:root
          Operations:
              - Encrypt
        |_
          ----------
          Constraints:
              ----------
              EncryptionContextSubset:
                  ----------
                  to:
                      serviceb-development-iad
          GrantId:
              8hDVrUmkgcZxIJ8h2WHtgSU7sy3HcSm5dQg3u0uWKBpBcbPGUL27rkGmjTUcvn9JD
          GranteePrincipal:
              arn:aws:iam::12345:user/serviceb-development-iad
          IssuingAccount:
              arn:aws:iam::12345:root
          Operations:
              - Decrypt

As a quick aside: anything we’re doing here through grants can also be done through key policy. However, key policies are limited in size and can’t easily be updated dynamically. We could try to keep the policy small by using IAM policy variables, but the variable we’d need for this (aws:userid) doesn’t work because it includes the instance-id along with the role, so we can’t target the ‘to’ service that way. That leaves us needing a grant (or policy statement) per service, and since grants can be created and revoked at will, that’s why I’ve chosen them.
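
For illustration only (it’s not the approach I’m taking), the key-policy equivalent of one of the grants above would look roughly like the following sketch, using the kms:EncryptionContext condition keys. Each sending service would need its own statement inside the single, size-limited policy document, which is exactly the problem described above:

import json

import boto3

kms = boto3.client('kms')

# Hypothetical sketch: one statement per service, embedded in the key's
# single policy document. A real policy also needs the usual admin
# statements, or you'll lock yourself out of the key.
statement = {
    'Sid': 'Allow servicea to encrypt tokens it sends',
    'Effect': 'Allow',
    'Principal': {'AWS': 'arn:aws:iam::12345:user/servicea-development-iad'},
    'Action': 'kms:Encrypt',
    'Resource': '*',
    'Condition': {
        'StringEquals': {
            'kms:EncryptionContext:from': 'servicea-development-iad'
        }
    }
}
policy = {'Version': '2012-10-17', 'Statement': [statement]}

# PutKeyPolicy takes the key id or ARN and the policy name 'default'.
kms.put_key_policy(
    KeyId='arn:aws:kms:us-east-1:12345:key/abcdefgh-1234-5678-9abcd-ee72ac95ae8c',
    PolicyName='default',
    Policy=json.dumps(policy)
)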

I have two grants per service. One grant that allows the service to decrypt anything that’s sent to it (to) and another to encrypt anything that it’s going to send (from). The important bits are the GranteePrincipal, Operations and Constraints attributes. We allow the GranteePrincipal to perform the Operations listed, as long as the encryption context contains at least the key/value listed in the Constraints. We specify ‘at least the key/value listed’ by using EncryptionContextSubset in the constraints, rather than EncryptionContextEquals.
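
For reference, the salt-call commands above are thin wrappers around the KMS CreateGrant API; the pair of grants for a single service looks roughly like this in boto3:

import boto3

kms = boto3.client('kms')

# The key ARN from the encrypt response earlier; grants are created against
# the key itself rather than an alias.
key_arn = 'arn:aws:kms:us-east-1:12345:key/abcdefgh-1234-5678-9abcd-ee72ac95ae8c'
service = 'servicea-development-iad'
principal = 'arn:aws:iam::12345:user/{0}'.format(service)

# The service may only encrypt tokens claiming to be 'from' itself...
kms.create_grant(
    KeyId=key_arn,
    GranteePrincipal=principal,
    Operations=['Encrypt'],
    Constraints={'EncryptionContextSubset': {'from': service}}
)
# ...and may only decrypt tokens addressed 'to' itself.
kms.create_grant(
    KeyId=key_arn,
    GranteePrincipal=principal,
    Operations=['Decrypt'],
    Constraints={'EncryptionContextSubset': {'to': service}}
)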

One thing the grants ignore is not_before and not_after. A nice property of encryption context is that data must be decrypted with exactly the same context it was encrypted with. So, for instance, this doesn’t work:

>>> key = kms.encrypt(KeyId='alias/authnz-testing', Plaintext='testdata', EncryptionContext={'from': 'servicea-development-iad', 'to': 'serviceb-development-iad', 'not_before': now, 'not_after': not_after})
>>> key
{u'KeyId': u'arn:aws:kms:us-east-1:12345:key/abcdefgh-1234-5678-9abcd-ee72ac95ae8c', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RequestId': '3a48f2ad-072d-11e5-88fb-17df9ce1a01a'}, u'CiphertextBlob': '\n \x999\x9e$yO\x92\x1dg\xbbZ^S\x84\xdaI\xbf\x14@\x81\x8a\x1c\xf2\xf8Z\x05\xed\xed\xb2\x8d)T\x12\x8f\x01\x01\x01\x02\x00x\x999\x9e$yO\x92\x1dg\xbbZ^S\x84\xdaI\xbf\x14@\x81\x8a\x1c\xf2\xf8Z\x05\xed\xed\xb2\x8d)T\x00\x00\x00f0d\x06\t*\x86H\x86\xf7\r\x01\x07\x06\xa0W0U\x02\x01\x000P\x06\t*\x86H\x86\xf7\r\x01\x07\x010\x1e\x06\t`\x86H\x01e\x03\x04\x01.0\x11\x04\x0c\xd3\x96\x0c\x91\x83\xd2l!\xfb\xa6\xc2\x90\x02\x01\x10\x80#\x97Z\xd1\xbb\xb4_\x12\xea\x1a\xed\x85\x0e\x9b1\xfa0j\xca1(\xc7\xc3\x8czT\xd4\x8fk\x08\x00\xa8\xcd\xe5\x82\xb3'}
>>> kms.decrypt(CiphertextBlob=key['CiphertextBlob'], EncryptionContext={'from': 'servicea-development-iad', 'to': 'serviceb-development-iad'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rlane/Envs/boto3/lib/python2.7/site-packages/botocore/client.py", line 249, in _api_call
    raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidCiphertextException) when calling the Decrypt operation: None

The decryption request fails because the not_before and not_after keys/values are missing (or incorrect). Given that, we can check the time validity in our application logic. It would be nice if KMS could handle this for us, but at this time it’s not possible to use IAM policy variables in grants, only in key policy.

A downside of needing to pass the encryption context in for the decryption request is that the decrypting side has to know all of that information. This means that when servicea makes a request to serviceb, it needs to pass the context along with it. It’s another reason I don’t bother with IAM policy for not_before and not_after: they have to travel with the request anyway.

Let’s look at some code for doing KMS authentication. Here’s the server side (using flask):

import base64
import datetime

from functools import wraps

import boto3
from flask import abort, request

# `app`, `keymanager`, `TokenDecryptError` and `key_has_privilege` are
# application-level objects assumed to be defined elsewhere.
kms = boto3.client('kms')


def get_key_arn():
    # You should cache this.
    key = kms.describe_key(
        KeyId='alias/{0}'.format(app.config['MASTER_KEY_ID'])
    )
    return key['KeyMetadata']['Arn']


def decrypt_token(token, _from, not_before, not_after):
    time_format = "%Y%m%dT%H%M%SZ"
    now = datetime.datetime.utcnow()
    _not_before = datetime.datetime.strptime(not_before, time_format)
    _not_after = datetime.datetime.strptime(not_after, time_format)
    # Ensure the token is within its validity window.
    if not (_not_before <= now <= _not_after):
        raise TokenDecryptError('Authentication error.')
    try:
        token = base64.b64decode(token)
        data = kms.decrypt(
            CiphertextBlob=token,
            EncryptionContext={
                # This token is sent to us.
                'to': app.config['IAM_ROLE'],
                # From another service.
                'from': _from,
                # It's valid from this time.
                'not_before': not_before,
                # And valid to this time.
                'not_after': not_after
            }
        )
        # Decrypt doesn't take KeyId as an argument. We need to verify the correct
        # key was used to do the decryption.
        # Annoyingly, the KeyId from the data is actually an arn.
        key_arn = data['KeyId']
        if key_arn != get_key_arn():
            raise TokenDecryptError('Authentication error.')
        plaintext = data['Plaintext']
    # We don't care which exception is thrown. If anything fails, we fail.
    except Exception:
        raise TokenDecryptError('Authentication error.')
    return plaintext


def require_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        try:
            authz_subset = keymanager.decrypt_token(
                request.headers['X-Auth-Token'],
                request.headers['X-Auth-From'],
                request.headers['X-Auth-Not-Before'],
                request.headers['X-Auth-Not-After']
            )
            if key_has_privilege(authz_subset, f.func_name):
                return f(*args, **kwargs)
            else:
                return abort(401)
        except TokenDecryptError:
            return abort(401)
        # Paranoia
        return abort(401)
    return decorated

And here’s the client code (using requests):

import datetime
import boto3
import base64
import requests

now = datetime.datetime.utcnow()
not_after = now + datetime.timedelta(minutes=60)
now = now.strftime("%Y%m%dT%H%M%SZ")
not_after = not_after.strftime("%Y%m%dT%H%M%SZ")
auth_context = {
    'from': 'servicea-development-iad',
    'to': 'serviceb-development-iad',
    'not_before': now,
    'not_after': not_after
}
kms = boto3.client('kms')
token = kms.encrypt(
    KeyId='alias/authnz-testing',
    Plaintext='{"Actions":"GetMyUser"}',
    EncryptionContext=auth_context
)['CiphertextBlob']
token = base64.b64encode(token)
headers = {
    'X-Auth-Token': token,
    'X-Auth-From': auth_context['from'],
    'X-Auth-Not-Before': auth_context['not_before'],
    'X-Auth-Not-After': auth_context['not_after']
}
# serviceb's endpoint; the host here is just a placeholder.
response = requests.get('https://serviceb.example.com/myuser', headers=headers)

Notice that we’re doing something extra fun here: we’re limiting the authorization scope of the authentication token from the client side. Even if this token gets stolen, it can only be used to perform the actions specified in it, and since the payload is encrypted, an attacker wouldn’t even know which actions those are. Doing this is likely a good idea for asynchronous calls enqueued for the future.
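
For completeness, here’s a minimal sketch of what the key_has_privilege helper referenced in the server snippet could look like, assuming the decrypted plaintext is the JSON Actions document encrypted above (the helper’s name comes from the server code; this implementation is hypothetical):

import json


def key_has_privilege(authz_subset, action):
    # The decrypted plaintext is the client's authz payload, e.g.
    # '{"Actions":"GetMyUser"}'. Allow the call only if the requested
    # action is listed, or the payload grants everything with '*'.
    try:
        payload = json.loads(authz_subset)
    except (TypeError, ValueError):
        return False
    actions = payload.get('Actions', [])
    if isinstance(actions, basestring):
        actions = [actions]
    return '*' in actions or action in actions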

Of course, we have to consider both KMS rate limiting and latency. In general we should use full-privilege tokens that last long enough to ensure we never hit rate limiting. We should also try to avoid the encrypt/decrypt latency that comes with calls to KMS. We’ll need some caching for this to work at any reasonable scale.

I won’t go into lengthy detail here, since there are numerous ways of handling this, but I’ll give a few ideas:

  1. Slightly change our model. Rather than calling encrypt and decrypt directly, we could generate data keys and pass each data key along with data encrypted using it (the encrypted data being our token). We can then cache the encrypted data key in a central location (like DynamoDB, etcd, zookeeper, etc.). The data key itself would be encrypted with the encryption context described above. All clients and all targeted services could keep a decrypted version of the data key in memory for its validity period. Assuming 50 clients, 50 servers, and a one-hour data key validity, that’s 100 KMS decryption requests per hour and only one encryption request per hour (see the sketch after this list). Using this strategy it’s also possible to handle the encryption/decryption requests out of band of the applications to avoid the latency hit, though with a large number of service-to-service mappings, managing this out of band could be complex.
  2. Only use KMS auth to get a session, then use the session for all further calls. This is the lazy way, since it isn’t much work, but it also won’t provide quite as big a win. We take the latency hit for each initial auth, we need an encryption request per hour per client, and we need a decryption request for every initial request from each client. Another downside is that unless we cache the authz payload somewhere central, we lose the ability to scope the tokens.
  3. Fingerprint the token on initial decryption and store it in a centralized cache (like memcache or redis) along with its authz payload and validity data. Subsequent requests would check the cache, avoiding a decryption request. Similar to #2, it’s not quite as effective as #1, since we need an encryption request per client each hour and would also need a decryption request for each initial request from each client, every hour. It also requires a central cache for each service accepting requests.
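
Here’s a rough sketch of idea #1, using GenerateDataKey (which the grants would also need to permit) and the cryptography library’s Fernet for the local token encryption. The central cache is omitted, so treat this as an illustration of the flow rather than a full implementation:

import base64
import datetime

import boto3
from cryptography.fernet import Fernet

kms = boto3.client('kms')

now = datetime.datetime.utcnow()
not_after = now + datetime.timedelta(hours=1)
context = {
    'from': 'servicea-development-iad',
    'to': 'serviceb-development-iad',
    'not_before': now.strftime('%Y%m%dT%H%M%SZ'),
    'not_after': not_after.strftime('%Y%m%dT%H%M%SZ')
}

# One KMS call per validity period: generate a data key bound to the
# encryption context. The CiphertextBlob is what gets cached centrally;
# the Plaintext key stays in memory for the validity period.
data_key = kms.generate_data_key(
    KeyId='alias/authnz-testing',
    EncryptionContext=context,
    KeySpec='AES_256'
)
fernet = Fernet(base64.urlsafe_b64encode(data_key['Plaintext']))

# Tokens are now produced locally, with no per-request KMS calls.
token = fernet.encrypt('{"Actions":"GetMyUser"}')

# The receiving service decrypts the cached data key once (a KMS call that
# only succeeds if its grant matches the context), then verifies tokens
# locally for the rest of the validity period.
plaintext_key = kms.decrypt(
    CiphertextBlob=data_key['CiphertextBlob'],
    EncryptionContext=context
)['Plaintext']
assert Fernet(base64.urlsafe_b64encode(plaintext_key)).decrypt(token) == '{"Actions":"GetMyUser"}'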

In all of the above solutions, if the caching fails, we fall back to encryption/decryption requests, which puts us at risk of rate-limit failures but could still allow authentication to continue working.

Like the STS solution, this is a proof of concept. It’s mostly an idea of how you could re-use AWS’s services to avoid having to do the initial trust step in your bootstrap process. I’m sure it has some holes I haven’t thought about, and I’d love to get your feedback!

Custom service to service authentication using IAM/STS

Lately I’ve wanted to be able to use IAM directly for authentication. Specifically, I wanted a way for a service to verify that a request from another service came from a particular IAM role and that the request’s auth is still valid. I want this because I want to avoid the chicken-and-egg problem of bootstrapping auth systems in the cloud: you need some secret on your instances that allows them to talk to an auth system to do authentication between services. Amazon already provides this in the form of IAM roles. Unfortunately, only AWS services have access to verify these credentials. However, it’s possible to abuse IAM and STS to achieve this goal.

DynamoDB offers fine-grained IAM resource protection. If we have a table with a string hash key of role_name, we can allow a service with the role example-production-iad to access its item in the table using IAM policy attached to the example-production-iad role. We can pass the role credentials from example-production-iad into another service, which would allow the other service to fetch the item using the passed-in credentials. By doing so we can verify that the service making the request is example-production-iad, since we’re limiting access to that item to example-production-iad.

Of course, it’s insecure to pass IAM credentials from one service into another, since the receiving service would then have the entire permission set of the role being passed in. Thankfully, we can limit the scope of an assumed role as much as we want by passing in the policy we want to limit the token to.

First modify the role’s trust relationships:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com",
        "AWS": "arn:aws:iam::12345:role/example-production-iad"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then give the service access to read its own item in DynamoDB:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "dynamodb:GetItem"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:dynamodb:us-east-1:12345:table/authnz"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "example-production-iad"
                    ],
                    "dynamodb:Attributes": [
                        "role_name"
                    ]
                },
                "StringEquals": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                }
            }
        }
    ]
}

Note that the above limits the returned attributes to the role_name attribute, which is also the hash key, meaning the service can’t get anything back that it isn’t already sending in.

Now, we can limit the scope of the token by assuming the role:

import boto
import boto.sts

conn = boto.sts.connect_to_region('us-east-1')
role = conn.assume_role("arn:aws:iam::12345:role/example-production-iad", 'auth', policy='{"Version":"2012-10-17","Statement":[{"Sid":"example1234","Effect":"Allow","Action":["dynamodb:GetItem"],"Condition":{"ForAllValues:StringEquals":{"dynamodb:LeadingKeys":"example-production-iad","dynamodb:Attributes":["role_name"]},"StringEquals":{"dynamodb:Select":"SPECIFIC_ATTRIBUTES"}},"Resource":["arn:aws:dynamodb:us-east-1:12345:table/authnz"]}]}')

These role credentials are now only allowed to get the role_name field of the role’s own item in the DynamoDB table.
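
On the receiving side, verification is simply an attempt to fetch the caller’s own item using the credentials the caller passed in. Here’s a minimal sketch (using boto3 for brevity; the exact request shape has to line up with the fine-grained policy above):

import boto3


def verify_caller(role_name, access_key, secret_key, session_token):
    # Build a DynamoDB client from the caller-supplied temporary credentials.
    dynamodb = boto3.client(
        'dynamodb',
        region_name='us-east-1',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=session_token
    )
    try:
        result = dynamodb.get_item(
            TableName='authnz',
            Key={'role_name': {'S': role_name}},
            ProjectionExpression='role_name'
        )
    except Exception:
        # Expired credentials, the wrong role, or an over-scoped policy: reject.
        return False
    item = result.get('Item', {})
    return item.get('role_name', {}).get('S') == role_name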

There are a couple of problems with this so far:

  1. This solution allows the target service to authenticate as example-production-iad to other services, since the role doesn’t have any information about the scope of the token (from service/to service).
  2. If you allow a role to assume itself, the assumed role credentials can be used to assume the role, which lets you extend the lifetime of the token via another token, which you can use to get another token which has another extended lifetime… you can probably see where this is going:
>>> conn = boto.sts.connect_to_region('us-east-1')
>>> role = conn.assume_role("arn:aws:iam::12345:role/example-production-iad", 'auth', policy='{"Version":"2012-10-17","Statement":[{"Sid":"example1234","Effect":"Allow","Action":["dynamodb:GetItem"],"Condition":{"ForAllValues:StringEquals":{"dynamodb:LeadingKeys":"example-production-iad","dynamodb:Attributes":["role_name"]},"StringEquals":{"dynamodb:Select":"SPECIFIC_ATTRIBUTES"}},"Resource":["arn:aws:dynamodb:us-east-1:1234:table/authz"]}]}', duration_seconds=900)
>>> 
>>> conn2 = boto.sts.connect_to_region('us-east-1', aws_access_key_id=role.credentials.access_key, aws_secret_access_key=role.credentials.secret_key, security_token=role.credentials.session_token)
>>> role2 = conn2.assume_role("arn:aws:iam::12345:role/example-production-iad", 'auth', policy='{"Version":"2012-10-17","Statement":[{"Sid":"example1234","Effect":"Allow","Action":["dynamodb:GetItem"],"Condition":{"ForAllValues:StringEquals":{"dynamodb:LeadingKeys":"example-production-iad","dynamodb:Attributes":["role_name"]},"StringEquals":{"dynamodb:Select":"SPECIFIC_ATTRIBUTES"}},"Resource":["arn:aws:dynamodb:us-east-1:12345:table/authnz"]}]}', duration_seconds=900)
>>> role.credentials.expiration
u'2015-04-30T18:47:23Z'
>>> role2.credentials.expiration
u'2015-04-30T18:52:43Z'

As you can see, once you assume a role, you can use the currently unexpired credentials to re-assume the role which gives you a new set of credentials that expire in the future. Once you assume a role, you can assume it forever. This isn’t any better than using IAM users (it’s probably much, much worse, in fact).

Let’s make some changes to make this more secure. First, let’s add an example-production-iad-auth role that the example-production-iad role can assume. This solves problem #2, since we won’t let the auth role assume itself or other roles. We can also limit the scope of this role directly on the role’s policy, rather than having to limit the scope when assuming the role.

Next, let’s change up the dynamo table to add some scoping data. Rather than just having a primary hash key of the role’s name, let’s also add from and to fields, which will contain sets of role names. Now let’s modify the IAM policy for example-production-iad-auth:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "dynamodb:GetItem"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:dynamodb:us-east-1:12345:table/authnz"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "example-production-iad"
                    ],
                    "dynamodb:Attributes": [
                        "role_name",
                        "from"
                    ]
                },
                "StringEquals": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                }
            }
        },
        {
            "Action": [
                "dynamodb:GetItem"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:dynamodb:us-east-1:12345:table/authnz"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "targetservice-production-iad",
                        "targetservice2-production-iad"
                    ],
                    "dynamodb:Attributes": [
                        "role_name",
                        "to"
                    ]
                },
                "StringEquals": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                }
            }
        }
    ]
}

Now we’re allowing example-production-iad-auth to get both the example-production-iad item and the target services’ items, but we limit access to the from and to fields respectively. This lets us scope the assumed role from example-production-iad to targetservice-production-iad. Notice that we’re allowing access from example-production-iad to multiple target services, so we’ll again want to use STS’s scope-limiting functionality to restrict the assumed role from example-production-iad to whichever service it’s authenticating to.

Note that making the to and from fields on the item lists of IAM role names gives us a nice extra feature: when we fetch the item, we can check that the sender is still in the from list, letting us immediately revoke a service’s access.

There are still a few more problems with this solution:

  1. We can’t have future-scoped auth tokens. STS only supports assumed roles that are created now and expire within 15 minutes to 1 hour. This is a hard limitation at this point in time, which is problematic for things like callback jobs enqueued far in the future.
  2. We have to create two roles for every service. Assuming 100 services and 3 environments, that’s a lot of roles to maintain (though with good orchestration, that’s not a major problem).
  3. For every auth we’re doing a batch-get of two DynamoDB items. This could get expensive quickly. We could cache a role for the duration of its validity, but only the caller knows that, and you can’t look up the expiration of a token. At best we can cache it for whatever period of time we’re willing to accept the risk for.

The nicest thing about this solution is that it has built-in key rotation: STS assumed roles are limited to a minimum lifetime of 15 minutes and a maximum of 1 hour, there’s no rate limiting on assuming roles, and there’s no cost associated with doing so.

It’s of course possible to extend this concept to authorization as well. We can add a policy field to our dynamo items. When a service authenticates a request, it can also get back a set of actions the requester is allowed to perform on which resources. Basically we’re extending STS and IAM completely at that point.

One thing you may be asking yourself is: do we actually need to use DynamoDB here? If we’re doing authentication by making a call to a resource protected by fine-grained access policy, then we can use any service that supports this, including S3, which is incredibly cheap per call and has relatively high rate limits. The biggest reason for using DynamoDB is that it gives you reliable performance guarantees and fairly low latency; S3’s latency is often quite high and completely unpredictable. If you need to auth every request, you have to know how much latency overhead you’re adding.

This auth experiment was something I did as a hackathon project over a couple days. Since then I’ve found a much better method of accomplishing service to service authentication using AWS, and that’s by abusing KMS. I’ll get into that in my next blog post.

Using development branch SaltStack python modules in the stable release

SaltStack has a lot of development work occurring and sometimes you want early access to features. Thankfully, for a number of Salt’s features it’s possible to very easily bring them into the stable release without needing to maintain a fork or a non-standard version.

Salt is built to be very modular, so that you can write/run your own custom modules. This same mechanism works for overriding the core modules as well. I’ll explain how this works in a masterless setup (since that’s what I use), but the process is pretty similar for master/minion setups as well.

Salt’s minion configuration has an option, module_dirs, that lets you define extra module locations. Salt will look inside each of these directories for grains, modules, pillar, states and utils subdirectories, and inside each of those are the Python files associated with that type of module. utils is a bit of a special case: in Salt 2015.5 (Lithium) a new abstraction layer was added so that common utility code shared by other modules can be overridden too. Let’s backport boto_kms as an example.

First, let’s add a modules directory, in /etc/salt/minion:

module_dirs:
  - /srv/salt/extra_modules

Now, we’ll copy some modules from the develop branch of salt:

$ cd git/salt/salt

$ mkdir /srv/salt/extra_modules/modules
$ cp modules/boto_kms.py /srv/salt/extra_modules/modules

$ mkdir /srv/salt/extra_modules/states
$ cp states/boto_kms.py /srv/salt/extra_modules/states

$ mkdir /srv/salt/extra_modules/utils
$ cp utils/boto.py /srv/salt/extra_modules/utils

Now we can run boto_kms:

$ salt-call boto_kms.describe_key 'alias/my-master-key'
local:
    ----------
    key_metadata:
        ----------
        AWSAccountId:
            12345
        Arn:
            arn:aws:kms:us-east-1:12345:key/76g7gg7g-778g-8h9h-hh8h-n99n9n9n9n
        CreationDate:
            1423687326.34
        Description:
            my-master-key
        Enabled:
            True
        KeyId:
            76g7gg7g-778g-8h9h-hh8h-n99n9n9n9n
        KeyUsage:
            ENCRYPT_DECRYPT

In general we try to keep the boto modules in a relatively stable state in the develop branch. Lyft’s development model for the boto modules is to upstream changes to the develop branch first, then override the modules in our own repos after they’ve been merged into develop. This also means that we try our best to ensure the boto modules are backwards compatible with the stable release at all times.

KMS support added to SaltStack Beryllium (development) branch

In the development branch of SaltStack (to be the Beryllium release) I’ve added the boto_kms state and execution modules. This allows you to manage KMS master keys, their policies, key rotation, and other attributes via states. It also allows you to make KMS calls from other state and execution modules.

Here’s an example of managing a key and its attributes via a state:

Ensure my-master-key is managed:
  boto_kms.key_present:
    - name: my-master-key
    - policy:
        Id: key-consolepolicy
        Statement:
          - Action: 'kms:*'
            Effect: Allow
            Principal:
              AWS:
                - 'arn:aws:iam::12345:user/rlane'
            Resource: '*'
            Sid: Enable IAM User Permissions
          - Action:
              - kms:Describe*
              - kms:Put*
              - kms:Create*
              - kms:Update*
              - kms:Enable*
              - kms:Revoke*
              - kms:List*
              - kms:Get*
              - kms:Disable*
              - kms:Delete*
            Effect: Allow
            Principal:
              AWS:
                - 'arn:aws:iam::12345:root'
            Resource: '*'
            Sid: Allow access for Key Administrators
          - Action:
              - kms:DescribeKey
              - kms:GenerateDataKey*
              - kms:Encrypt
              - kms:ReEncrypt*
              - kms:Decrypt
            Effect: Allow
            Principal:
              AWS:
                - 'arn:aws:iam::12345:role/my-service'
            Resource: '*'
            Sid: Allow use of the key
          - Action:
              - kms:ListGrants
              - kms:CreateGrant
              - kms:RevokeGrant
            Condition:
              Bool:
                'kms:GrantIsForAWSResource': true
            Effect: Allow
            Principal:
              AWS:
                - 'arn:aws:iam::12345:user/rlane'
            Resource: '*'
            Sid: Allow attachment of persistent resources
        Version: '2012-10-17'
    - description: 'Testing key. Feel free to disable.'
    - key_rotation: False
    - enabled: True

Note that you should be very careful when defining the policy for a key. It’s apparently possible to create a key that even your root user can’t access. If you can’t access the key, then you also can’t modify the policy, so you have a permanently broken key (I created 4 broken keys while creating these modules).

Though these modules are written for the development branch, the state module is API stable and both are stable enough for use. The execution module’s API may change slightly before release. It’s possible to use these modules in the 2015.5 (Lithium) release of Salt by including them as custom modules.

Investigating local queuing: Redis, NSQ and LMDB

Systems designed for cloud services assume instances can die at any time, so they’re written to defend against this. It’s important to remember that networks in cloud environments are also incredibly unreliable, often much less reliable than the instances themselves. When considering a design, remember that a node can be partitioned from other services, possibly for long periods of time.

One easy consideration here is logs (including stats and analytics events). We want to ensure delivery of logs, but we also don’t want delivery to affect service operation.

There are lots of ways to handle this. Our original solution was to write logs to files, then forward them with logstash. We were doing this for bulk logs and for analytics events. However, logstash was using considerable resources per node, so we switched to a local Redis daemon and a local Python daemon (using gevent) to forward analytics events.

For short partition times a local Redis daemon with a worker is quite effective. Delivery is quick and the queue stays empty. For long partition times (or a long failure in a remote service) we’d continue serving requests, but at some point Redis would run out of memory and we could start dropping events.

We’ve been really happy with the Redis-based solution. To date we haven’t had a partition event (either network failure or service failure) long enough for us to worry, but we had a mishmash of solutions for handling analytics events (and partitioning) across our services and wanted a standard solution to the problem. We had the option of rolling the local Redis solution out to everything, or going with something a bit more robust.

We chose to investigate options that were in-memory but could go to disk when the data set grew past memory limits. I won’t go too much into the details of the investigation (sorry), but we eventually narrowed the choice to NSQ.

During this same time period I had been solving another issue using LMDB, a memory-mapped database that’s thread-safe and multiprocess-safe. I wondered if we could avoid running a daemon for the queue at all and simply have the processes push and pop from LMDB. Fewer daemons can mean less work and fewer possible failure points.

Before going too far into LMDB I also considered some other memory-mapped databases, but most explicitly state that they’re only thread safe and shouldn’t be used multi-process (like LevelDB and its variants). BDB could be a consideration, but its licensing change to AGPL3 makes it a bit toxic.

Initial testing for LMDB was promising. Write speeds were more than adequate, even with the writes being serialized across processes. Library support was generally adequate, even across languages. However, the major consideration was deadlocks across processes. LMDB claims it supports multi-process concurrency, which is true assuming perfect conditions.

With LMDB, reads are never blocked, but writes are serial, using a mutually exclusive lock at the database level. The lock is taken when a write transaction starts. Once a write transaction starts, all other threads or processes waiting for the write lock will block. If a process starts a transaction and exits uncleanly, any other process that was waiting on the lock will block indefinitely.
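
As a rough illustration with the py-lmdb bindings (the path and keys here are made up), every producer pushing into such a queue has to take that write lock, which is exactly where an unclean exit leaves everyone else stuck:

import lmdb

# A memory-mapped environment shared by all producer and consumer processes.
env = lmdb.open('/var/tmp/event-queue', map_size=1024 * 1024 * 1024)


def push(key, value):
    # begin(write=True) takes the database-wide write lock; all other
    # writers (threads or processes) block here until the transaction
    # commits or aborts. If the holder exits uncleanly mid-transaction,
    # without robust mutexes the waiters block forever.
    with env.begin(write=True) as txn:
        txn.put(key, value)


push(b'00000001', b'{"event": "ride_requested"}')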

In LMDB’s development branch support has been added for robust mutexes, which solves this problem; however, it’s not available in a stable release and I also can’t seem to find any information about robust mutex support across containers, which would be necessary for this solution to work for us in the distant future.

LMDB was a fun diversion and mental exercise, but wasn’t an ideal solution for this. After spending a couple days on exploring LMDB I moved on to a product we had been wanting to explore for a while: NSQ.

NSQ is a realtime distributed messaging platform. In our use case we’re only using it for local queuing, but it’s really well suited to that. Performance was more than adequate and library support is reasonable. Even where there are no libraries, the write protocol is simple and can be either TCP or HTTP based. The Python support is good if you’re using tornado, but the gevent support isn’t wonderful: there’s a fork of the bitly Python library that adds gevent support, but it hasn’t been updated in a while, and it looks like it was meant as a temporary project to make the bitly library more generic, an effort that hasn’t been fully followed through.
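
For example, even without a client library, publishing to the local nsqd over its HTTP interface takes only a couple of lines (a sketch assuming nsqd’s default HTTP port of 4151 and a made-up topic name):

import requests

# POST the raw message body to the local nsqd; the topic is created on
# first publish if it doesn't already exist.
response = requests.post(
    'http://127.0.0.1:4151/pub',
    params={'topic': 'analytics_events'},
    data='{"event": "ride_requested"}'
)
response.raise_for_status()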

From the consumer side we’re using the same in-house custom python daemon, adapted for NSQ, using the forked nsq-py project. The fork met the needs of our use-case, though it had issues with stale connections (which we’ve fixed).

The biggest benefit we’ve gained from the switch is that we can overflow to disk in the case of long partitions. That said, we have a lot of other options now as well. NSQ has a healthy suite of utilities: we could replace our custom Python daemon with nsq_to_http, we could listen to a topic from multiple channels to get a fast path (off to http) and a slow path (off to S3) for events, and we could forward from the local NSQ to centralized NSQs using nsq_to_nsq. Additionally, the monitoring of NSQ is quite good. There’s a really helpful CLI utility, nsq_stat, for quick monitoring, and by default NSQ ships stats off to statsd.

NSQ doesn’t seem to have a robust method of restarting or reloading for configuration changes, but we will rarely need to restart the daemon. NSQ process restarts are generally less than one second, so for services pushing into NSQ we do retries with backoff and take the latency hit associated with it.

DynamoDB support in SaltStack 2015.2

In the 2015.2 SaltStack release we’ve added the boto_dynamodb execution module and boto_dynamodb state module. This allows you to create DynamoDB tables via states:

Ensure DynamoDB table exists:
  boto_dynamodb.present:
    - table_name: {{ grains.cluster_name }}
    - read_capacity_units: 10
    - write_capacity_units: 10
    - hash_key: id
    - hash_key_data_type: S
    - global_indexes:
      - data_type_date_index:
        - name: "data_type_date_index"
        - read_capacity_units: 10
        - write_capacity_units: 10
        - hash_key: data_type
        - hash_key_data_type: S
        - range_key: modified_date
        - range_key_data_type: S
      - data_type_revision_index:
        - name: "data_type_revision_index"
        - read_capacity_units: 10
        - write_capacity_units: 10
        - hash_key: data_type
        - hash_key_data_type: S
        - range_key: revision
        - range_key_data_type: N

Note that at this point the module will only create and delete tables and indexes. It doesn’t currently support dynamically adding and removing indexes or changing read or write capacity units. These are features we’d love to see, and will likely add in the future. If you’d like to beat us to this, please send in pull requests!


Want to help us write and upstream software like this? Apply for a position at Lyft. If you want to work directly with me, apply for a DevOps Engineer, Senior DevOps Engineer, or Senior Platform Engineer position.

Splunk saved search state and execution module support in SaltStack

We (Lyft) believe strongly in the concept of infrastructure as code. If it isn’t automated, it isn’t finished. This belief also applies to our monitoring and alerting. We’re using Splunk saved searches for portions of our alerting and want to ensure that our developers can quickly and easily define alarms in a standard way to be able to share alarms between services.

We’ve added the splunk_search execution module and splunk_search state module to the 2015.2 Saltstack release (in release candidate status at the time of this writing) so that we can manage our searches via orchestration.

This lets us define a saved search with an alarm, as a state, like so:

Manage splunk search {{ grains.service_group }} no call volume:
  splunk_search.present:
  - name: {{ grains.service_group }} no call volume
  - action.email.format: plain
  - action.email.inline: '1'
  - action.email.sendresults: '1'
  - action.email.to: {{ grains.service_group }}@myorg.pagerduty.com
  - actions: email
  - alert.expires: 1d
  - alert.severity: '4'
  - alert.suppress: '1'
  - alert.suppress.period: 30m
  - alert.track: '1'
  - alert_comparator: greater than
  - alert_threshold: '0'
  - alert_type: number of events
  - cron_schedule: '*/5 * * * *'
  - description: '**MANAGED BY ORCHESTRATION** Fires when {{ grains.service_group }} has no volume for X minutes'
  - dispatch.earliest_time: -6m
  - dispatch.latest_time: -1m
  - dispatch.ttl: 1p
  - is_scheduled: '1'
  - search: 'index=* source="*access.log" host="{{ grains.service_group }}*" | regex method="GET|POST|PUT|DELETE" | stats count as count | where count = 0'

This saved search will send an alert to PagerDuty if the service has no call volume for a period of time across all of its nodes.

Like all of our modules, these were written mostly for our use-cases, but we hope they’re useful to you as well. Please contribute back if there are any features you need!


Want to help us write and upstream software like this? Apply for a position at Lyft. If you want to work directly with me, apply for a DevOps Engineer, Senior DevOps Engineer, or Senior Platform Engineer position.

Grafana dashboard orchestration using SaltStack

As mentioned in my post on CloudWatch alarms, we (Lyft) believe that it should be easy to do the right thing and difficult to do the wrong thing. We operate on the concept “If you build it, you run it.” Running your own service isn’t easy if you don’t have the right tools to help you, though.

We’re using Graphite and Grafana for time series data dashboards. With a consistent configuration management pattern all new services start with their data flowing into Graphite. Dashboard management is tricky, though. We encourage teams to add custom metrics to their services and use them in panels and rows for their services, but we also want to provide a number of consistent panels/rows for all services. We also want to avoid making teams go between multiple dashboards to monitor their own services.

To make it easy for services to manage their own dashboards we’re using Grafana backed by Elasticsearch. Teams can add new metrics to their services, then add rows and panels to their dashboards. Our services are very consistent, though, and there are a number of dashboards that basically all services need, plus a subset of dashboards that services of a specific type need. So, what we want is a set of managed dashboards that can easily be defined in code.

To handle this, we’ve added a grafana state module and an elasticsearch execution module to the 2015.2 SaltStack release (in release candidate status at the time of this writing). The Grafana state lets you manage rows in dashboards. If no dashboard exists, the module will create it, but it will only manage rows after that point. Dashboards and rows can be defined directly through the state, but since dashboard definitions can be verbose (and laborious to define), it’s also possible to define them through specified pillar keys or through default pillar keys.

Here’s an example of defining a dashboard through a state:

    Ensure myservice dashboard is managed:
      grafana.dashboard_present:
        - name: myservice
        - dashboard:
            annotations:
              enable: true
              list: []
            editable: true
            hideAllLegends: false
            hideControls: false
            nav:
              - collapse: false
                enable: true
                notice: false
                now: true
                refresh_intervals:
                  - 10s
                  - 30s
                  - 1m
                  - 5m
                  - 15m
                  - 30m
                  - 1h
                  - 2h
                  - 1d
                status: Stable
...
        - rows:
            - collapse: false
              editable: true
              height: 150px
              title: System Health
              panels:
                - aliasColors: {}
                  id: 200000
                  annotate:
                    enable: false
                  bars: false
                  datasource: null
                  editable: true
                  error: false
                  fill: 7
                  grid:
                    leftMax: 100
                    leftMin: null
                    rightMax: null
                    rightMin: null
                    threshold1: 60
                    threshold1Color: rgb(216, 27, 27)
...

This is just a small excerpt from what would be a very, very long dashboard definition. Adding this to every service would be really painful and difficult to maintain. So, let’s move this into the pillars:

grafana.sls:

grafana_dashboards:
  default:
    annotations:
      enable: true
      list: []
    editable: true
    hideAllLegends: false
    hideControls: false
    nav:
      - collapse: false
        enable: true
        notice: false
        now: true
        refresh_intervals:
          - 1m
          - 5m
          - 15m
          - 30m
...

grafana_rows:
  service:
    - collapse: false
      editable: false
      height: 25px
      title: "Panels/rows marked with (M) are managed by orchestration. Don't edit them!"
      panels: []
      showTitle: true
    - collapse: false
      editable: false
      height: 150px
      title: {{ grains.service_name }} (M)
      panels:
        - aliasColors: {}
          aliasYAxis: {}
          annotate:
            enable: false
          bars: false
          datasource: null
          editable: false
...
  systemhealth:
    - collapse: false
      editable: false
      height: 150px
      title: System Health (M)
      showTitle: true
      panels:
        - aliasColors: {}
          annotate:
            enable: false
          bars: false
...

Notice that we’re making it possible to define multiple dashboards and multiple rows, by making them keys in the related dictionaries. Let’s see how this is used:

Ensure {{ grains.service_name }} grafana dashboard is managed:
  grafana.dashboard_present:
    - name: {{ grains.service_name }}
    - dashboard_from_pillar: 'grafana_dashboards:default'
    - rows_from_pillar:
      - 'grafana_rows:service'
      - 'grafana_rows:systemhealth'
      ...

Now with a very small amount of code in a service’s orchestration, the service can have a default dashboard with a managed set of rows. The best part is that if we need to modify these rows we can now modify them in a single place and all services will have their dashboards updated to look like every other service.

We’re really excited to share this back with the community and hope that people will enjoy it and contribute back with features they’d like added. Here’s one example of an addition we’d love to see:

It would be nice to be able to define the dashboards through file templates, rather than just through pillars, since you can pass context from the state into file templates, whereas it’s not possible to do so through pillars.


Want to help us write and upstream software like this? Apply for a position at Lyft. If you want to work directly with me, apply for a DevOps Engineer, Senior DevOps Engineer, or Senior Platform Engineer position.

SaltConf15: Sequentially Ordered Execution in SaltStack talk and slides

Here’s another talk that I gave at SaltConf15. It’s about sequentially ordered Salt; if you’ve read my blog posts on the topic, this probably won’t add a lot of technical info, but it’ll give a lot more context behind why you’d want to use Salt in a sequentially ordered way. Enjoy!

Sequentially Ordered Execution in SaltStack

Here are the slides. Note: though my blog is creative commons licensed, the slides are all rights reserved (sorry!).

SaltConf15: Masterless SaltStack at Scale talk and slides

I gave a talk at SaltConf15. It’s about masterless SaltStack, AWS orchestration, Docker management using Salt and other fun things. Enjoy!

Here are the slides. Note: though my blog is creative commons licensed, the slides are all rights reserved (sorry!).