Attach read replicas, EC2 subnets, and EC2 security groups to RDS instances (#20)

* Add RDS DNS endpoint ingestion (#6)

* Add RDS DNS endpoint ingestion
* Update schema for RDS endpoint fields
* Added logging if endpoint is missing

* Add flake8 linter to default unit tests. (#8)

* Add flake8 lint tests.

* add newline

* fix flake errors in rds

* Attach RDS read replicas to each other (#7)

* Attach read replicas

* Update schema doc for IS_READ_REPLICA_OF

* Fix format strings

* Add lastupdated field to rds instance

* Add docs on extending with Analysis Jobs (#14)

* Add link to analysis job documentation, add link to angrypuppy (just for completeness' sake :))

* Tabs to spaces

* PR comments

* Attach EC2 Security Groups to RDS Instances (#9)

* Add EC2 security group relationship to RDS

* Update schema illustration

* Fix format strings

* Add lastupdated field to rds instance

* Denote indexed fields with bolded notation in the schema docs

* Fix cleanup job to handle orphaned relationships. Fix EC2 sec group query to use indexed field (id).

* Attach EC2Subnets to RDSInstances (#10)

* Attach DBSubnetGroup to RDSInstance.  Attach EC2Subnets to DBSubnetGroups.

* Add lastupdated field to rds instance

* Add docs on DBSubnetGroups

* Clean up orphaned rels between DBSubnetGroups and EC2Subnets

* Add arn to db subnet group

* MERGE subnets and security groups instead of MATCHing them. Refactor DB Subnet Group ARN out to a function.

* ARN fix. Indent fix. Make it more obvious where failures come from by removing extra if-elses.

* Increment prerelease version to 0.2.0rc1 (#17)

* rc2 (#19)
achantavy committed Mar 14, 2019
1 parent 405f866 commit 6763580
Showing 11 changed files with 463 additions and 76 deletions.
7 changes: 6 additions & 1 deletion Makefile
@@ -1,2 +1,7 @@
-test:
+test: test_lint test_unit
+
+test_lint:
+	flake8
+
+test_unit:
	pytest tests/unit
6 changes: 5 additions & 1 deletion README.md
@@ -6,7 +6,7 @@ Cartography aims to enable a broad set of exploration and automation scenarios.

Service owners can generate asset reports, Red Teamers can discover attack paths, and Blue Teamers can identify areas for security improvement. All can benefit from using the graph for manual exploration through a web frontend interface, or in an automated fashion by calling the APIs.

-Cartography is not the only [security](https://github.com/dowjones/hammer) [graph](https://github.com/BloodHoundAD/BloodHound) [tool](https://github.com/Netflix/security_monkey) [out](https://github.com/duo-labs/cloudmapper) there, but it differentiates itself by being fully-featured yet generic and extensible enough to help make anyone better understand their risk exposure, regardless of what platforms they use. Rather than being focused on one core scenario or attack vector like the other linked tools, Cartography focuses on flexibility and exploration.
+Cartography is not the only [security](https://github.com/dowjones/hammer) [graph](https://github.com/BloodHoundAD/BloodHound) [tool](https://github.com/Netflix/security_monkey) [out](https://github.com/vysecurity/ANGRYPUPPY) [there](https://github.com/duo-labs/cloudmapper), but it differentiates itself by being fully-featured yet generic and [extensible](docs/writing-analysis-jobs.md) enough to help make anyone better understand their risk exposure, regardless of what platforms they use. Rather than being focused on one core scenario or attack vector like the other linked tools, Cartography focuses on flexibility and exploration.

You can learn more about the story behind Cartography in our [presentation at BSidesSF 2018](https://www.youtube.com/watch?v=8TV9TSNh7pA).

@@ -140,6 +140,10 @@ RETURN a.name as AWSAccount, count(rds) as UnencryptedInstances
If you want to learn more in depth about Neo4j and Cypher queries you can look at [this tutorial](https://neo4j.com/developer/cypher-query-language/) and see this [reference card](https://neo4j.com/docs/cypher-refcard/current/).


+## Extending Cartography with Analysis Jobs
+You can add your own custom attributes and relationships without writing Python code! Here's [how](docs/writing-analysis-jobs.md).


## Contributing

### Code of conduct
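The analysis-job mechanism referenced above uses the same statement-list JSON format as the cleanup job shown in the next file. As a rough sketch only (hypothetical job name; property names taken from the RDS schema in this commit; see docs/writing-analysis-jobs.md for the authoritative format), a job that flags unencrypted, publicly accessible RDS instances might look like:

{
    "statements": [
        {
            "query": "MATCH (rds:RDSInstance) WHERE rds.publicly_accessible = true AND rds.storage_encrypted = false AND rds.exposed_internet IS NULL WITH rds LIMIT {LIMIT_SIZE} SET rds.exposed_internet = true return COUNT(*) as TotalCompleted",
            "iterative": true,
            "iterationsize": 100
        }
    ],
    "name": "example RDS exposure analysis"
}

The `rds.exposed_internet IS NULL` guard makes the iterative batches terminate: each pass only touches instances not yet flagged.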
aws_import_rds_instances_cleanup.json
@@ -1,8 +1,47 @@
{
-    "statements": [{
-        "query": "MATCH (n:RDSInstance)<-[:RESOURCE]-(:AWSAccount{id: {AWS_ID}}) WHERE n.lastupdated <> {UPDATE_TAG} WITH n LIMIT {LIMIT_SIZE} DETACH DELETE (n) return COUNT(*) as TotalCompleted",
-        "iterative": true,
-        "iterationsize": 100
-    }],
+    "statements": [
+        {
+            "query": "MATCH (sng:DBSubnetGroup)<-[:MEMBER_OF_DB_SUBNET_GROUP]-(:RDSInstance)<-[:RESOURCE]-(:AWSAccount{id: {AWS_ID}}) WHERE sng.lastupdated <> {UPDATE_TAG} WITH sng LIMIT {LIMIT_SIZE} DETACH DELETE (sng) return COUNT(*) as TotalCompleted",
+            "iterative": true,
+            "iterationsize": 100,
+            "__comment__": "Delete DBSubnetGroups that no longer exist and DETACH them from their RDS instances."
+        },
+        {
+            "query": "MATCH (:DBSubnetGroup)<-[r:MEMBER_OF_DB_SUBNET_GROUP]-(:RDSInstance)<-[:RESOURCE]-(:AWSAccount{id: {AWS_ID}}) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE (r) return COUNT(*) as TotalCompleted",
+            "iterative": true,
+            "iterationsize": 100,
+            "__comment__": "Delete the link between orphaned DB Subnet Groups and their RDS Instances."
+        },
+        {
+            "query": "MATCH (:EC2Subnet)<-[r:RESOURCE]-(:DBSubnetGroup)<-[:MEMBER_OF_DB_SUBNET_GROUP]-(:RDSInstance)<-[:RESOURCE]-(:AWSAccount{id: {AWS_ID}}) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE (r) return COUNT(*) as TotalCompleted",
+            "iterative": true,
+            "iterationsize": 100,
+            "__comment__": "Delete the link between orphaned DB Subnet Groups and their EC2 Subnets."
+        },
+        {
+            "query": "MATCH (n:RDSInstance)<-[:RESOURCE]-(:AWSAccount{id: {AWS_ID}}) WHERE n.lastupdated <> {UPDATE_TAG} WITH n LIMIT {LIMIT_SIZE} DETACH DELETE (n) return COUNT(*) as TotalCompleted",
+            "iterative": true,
+            "iterationsize": 100,
+            "__comment__": "Delete RDS instances that no longer exist and DETACH them from all nodes they were previously connected to."
+        },
+        {
+            "query": "MATCH (:RDSInstance)<-[r:RESOURCE]-(:AWSAccount{id: {AWS_ID}}) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE (r) return COUNT(*) as TotalCompleted",
+            "iterative": true,
+            "iterationsize": 100,
+            "__comment__": "If an RDS instance still exists but is no longer associated with its old AWS Account, delete the relationship between them."
+        },
+        {
+            "query": "MATCH (:EC2SecurityGroup)<-[r:MEMBER_OF_EC2_SECURITY_GROUP]-(:RDSInstance)<-[:RESOURCE]-(:AWSAccount{id: {AWS_ID}}) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE (r) return COUNT(*) as TotalCompleted",
+            "iterative": true,
+            "iterationsize": 100,
+            "__comment__": "If an RDS instance still exists and is no longer a part of its old EC2SecurityGroup, delete the relationship between them."
+        },
+        {
+            "query": "MATCH (:RDSInstance)<-[r:IS_READ_REPLICA_OF]-(:RDSInstance)<-[:RESOURCE]-(:AWSAccount{id: {AWS_ID}}) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE (r) return COUNT(*) as TotalCompleted",
+            "iterative": true,
+            "iterationsize": 100,
+            "__comment__": "If an RDS instance still exists and is no longer a read replica of another RDS instance, delete the relationship between them."
+        }
+    ],
    "name": "cleanup RDSInstance"
}
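Each statement above is a parameterized Cypher query: {AWS_ID}, {UPDATE_TAG}, and {LIMIT_SIZE} are supplied at run time, and a statement marked "iterative" is re-run in batches of "iterationsize" rows until nothing is left to delete. A minimal sketch of that loop (assumed semantics of the iterative fields; the real logic lives behind cartography's run_cleanup_job helper, called from rds.py below):

def run_iterative_statement(neo4j_session, query, aws_id, update_tag, batch_size=100):
    # Re-run the batched DELETE until a pass removes nothing.
    while True:
        result = neo4j_session.run(query, AWS_ID=aws_id, UPDATE_TAG=update_tag, LIMIT_SIZE=batch_size)
        if result.single()['TotalCompleted'] == 0:
            break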
196 changes: 170 additions & 26 deletions cartography/intel/aws/rds.py
@@ -48,52 +48,196 @@ def load_rds_instances(neo4j_session, data, region, current_aws_account_id, aws_update_tag):
    rds.preferred_backup_window = {PreferredBackupWindow},
    rds.latest_restorable_time = {LatestRestorableTime},
    rds.preferred_maintenance_window = {PreferredMaintenanceWindow},
-    rds.backup_retention_period = {BackupRetentionPeriod}
+    rds.backup_retention_period = {BackupRetentionPeriod},
+    rds.endpoint_address = {EndpointAddress},
+    rds.endpoint_hostedzoneid = {EndpointHostedZoneId},
+    rds.endpoint_port = {EndpointPort},
+    rds.lastupdated = {aws_update_tag}
    WITH rds
    MATCH (aa:AWSAccount{id: {AWS_ACCOUNT_ID}})
    MERGE (aa)-[r:RESOURCE]->(rds)
    ON CREATE SET r.firstseen = timestamp()
    SET r.lastupdated = {aws_update_tag}
    """
+    read_replicas = []

    for rds in data.get('DBInstances', []):
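        # Shape note (illustrative values, not from this commit): `data` is the raw
        # boto3 describe_db_instances() response, so each `rds` item is a DBInstance
        # dict along the lines of
        #     {'DBInstanceArn': 'arn:aws:rds:us-east-1:123456789012:db:mydb',
        #      'DBInstanceIdentifier': 'mydb',
        #      'Endpoint': {'Address': 'mydb.example.us-east-1.rds.amazonaws.com',
        #                   'HostedZoneId': 'EXAMPLEZONEID', 'Port': 5432},
        #      'ReadReplicaSourceDBInstanceIdentifier': 'mydb-source', ...}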
        instance_create_time = str(rds['InstanceCreateTime']) if 'InstanceCreateTime' in rds else None
        latest_restorable_time = str(rds['LatestRestorableTime']) if 'LatestRestorableTime' in rds else None

+        ep = _validate_rds_endpoint(rds)
+
+        # Keep track of instances that are read replicas so we can attach them to their source instances later
+        if rds.get("ReadReplicaSourceDBInstanceIdentifier"):
+            read_replicas.append(rds)
+
        neo4j_session.run(
            ingest_rds_instance,
            DBInstanceArn=rds['DBInstanceArn'],
-            DBInstanceIdentifier=rds.get('DBInstanceIdentifier', None),
-            DBInstanceClass=rds.get('DBInstanceClass', None),
-            Engine=rds.get('Engine', None),
-            MasterUsername=rds.get('MasterUsername', None),
-            DBName=rds.get('DBName', None),
+            DBInstanceIdentifier=rds['DBInstanceIdentifier'],
+            DBInstanceClass=rds.get('DBInstanceClass'),
+            Engine=rds.get('Engine'),
+            MasterUsername=rds.get('MasterUsername'),
+            DBName=rds.get('DBName'),
            InstanceCreateTime=instance_create_time,
-            AvailabilityZone=rds.get('AvailabilityZone', None),
-            MultiAZ=rds.get('MultiAZ', None),
-            EngineVersion=rds.get('EngineVersion', None),
-            PubliclyAccessible=rds.get('PubliclyAccessible', None),
-            DBClusterIdentifier=rds.get('DBClusterIdentifier', None),
-            StorageEncrypted=rds.get('StorageEncrypted', None),
-            KmsKeyId=rds.get('KmsKeyId', None),
-            DbiResourceId=rds.get('DbiResourceId', None),
-            CACertificateIdentifier=rds.get('CACertificateIdentifier', None),
-            EnhancedMonitoringResourceArn=rds.get('EnhancedMonitoringResourceArn', None),
-            MonitoringRoleArn=rds.get('MonitoringRoleArn', None),
-            PerformanceInsightsEnabled=rds.get('PerformanceInsightsEnabled', None),
-            PerformanceInsightsKMSKeyId=rds.get('PerformanceInsightsKMSKeyId', None),
-            DeletionProtection=rds.get('DeletionProtection', None),
-            BackupRetentionPeriod=rds.get('BackupRetentionPeriod', None),
-            PreferredBackupWindow=rds.get('PreferredBackupWindow', None),
+            AvailabilityZone=rds.get('AvailabilityZone'),
+            MultiAZ=rds.get('MultiAZ'),
+            EngineVersion=rds.get('EngineVersion'),
+            PubliclyAccessible=rds.get('PubliclyAccessible'),
+            DBClusterIdentifier=rds.get('DBClusterIdentifier'),
+            StorageEncrypted=rds.get('StorageEncrypted'),
+            KmsKeyId=rds.get('KmsKeyId'),
+            DbiResourceId=rds.get('DbiResourceId'),
+            CACertificateIdentifier=rds.get('CACertificateIdentifier'),
+            EnhancedMonitoringResourceArn=rds.get('EnhancedMonitoringResourceArn'),
+            MonitoringRoleArn=rds.get('MonitoringRoleArn'),
+            PerformanceInsightsEnabled=rds.get('PerformanceInsightsEnabled'),
+            PerformanceInsightsKMSKeyId=rds.get('PerformanceInsightsKMSKeyId'),
+            DeletionProtection=rds.get('DeletionProtection'),
+            BackupRetentionPeriod=rds.get('BackupRetentionPeriod'),
+            PreferredBackupWindow=rds.get('PreferredBackupWindow'),
            LatestRestorableTime=latest_restorable_time,
-            PreferredMaintenanceWindow=rds.get('PreferredMaintenanceWindow', None),
+            PreferredMaintenanceWindow=rds.get('PreferredMaintenanceWindow'),
+            EndpointAddress=ep.get('Address'),
+            EndpointHostedZoneId=ep.get('HostedZoneId'),
+            EndpointPort=ep.get('Port'),
            Region=region,
            AWS_ACCOUNT_ID=current_aws_account_id,
            aws_update_tag=aws_update_tag
        )
+        _attach_ec2_security_groups(neo4j_session, rds, aws_update_tag)
+        _attach_ec2_subnet_groups(neo4j_session, rds, region, current_aws_account_id, aws_update_tag)
+    _attach_read_replicas(neo4j_session, read_replicas, aws_update_tag)
+
+
+def _attach_ec2_subnet_groups(neo4j_session, instance, region, current_aws_account_id, aws_update_tag):
+    """
+    Attach RDS instance to its EC2 subnets
+    """
+    attach_rds_to_subnet_group = """
+    MERGE(sng:DBSubnetGroup{id:{sng_arn}})
+    ON CREATE SET sng.firstseen = timestamp()
+    SET sng.name = {DBSubnetGroupName},
+    sng.vpc_id = {VpcId},
+    sng.description = {DBSubnetGroupDescription},
+    sng.status = {DBSubnetGroupStatus},
+    sng.lastupdated = {aws_update_tag}
+    WITH sng
+    MATCH(rds:RDSInstance{id:{DBInstanceArn}})
+    MERGE(rds)-[r:MEMBER_OF_DB_SUBNET_GROUP]->(sng)
+    ON CREATE SET r.firstseen = timestamp()
+    SET r.lastupdated = {aws_update_tag}
+    """
+    db_sng = instance['DBSubnetGroup']
+    arn = _get_db_subnet_group_arn(region, current_aws_account_id, db_sng['DBSubnetGroupName'])
+    neo4j_session.run(
+        attach_rds_to_subnet_group,
+        sng_arn=arn,
+        DBSubnetGroupName=db_sng['DBSubnetGroupName'],
+        VpcId=db_sng.get("VpcId"),
+        DBSubnetGroupDescription=db_sng.get('DBSubnetGroupDescription'),
+        DBSubnetGroupStatus=db_sng.get('SubnetGroupStatus'),
+        DBInstanceArn=instance['DBInstanceArn'],
+        aws_update_tag=aws_update_tag
+    )
+    _attach_ec2_subnets_to_subnetgroup(neo4j_session, db_sng, region, current_aws_account_id, aws_update_tag)
+
+
+def _attach_ec2_subnets_to_subnetgroup(neo4j_session, db_subnet_group, region, current_aws_account_id, aws_update_tag):
+    """
+    Attach EC2Subnets to the DB Subnet Group.
+    From https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.WorkingWithRDSInstanceinaVPC.html:
+    `Each DB subnet group should have subnets in at least two Availability Zones in a given region. When creating a DB
+    instance in a VPC, you must select a DB subnet group. Amazon RDS uses that DB subnet group and your preferred
+    Availability Zone to select a subnet and an IP address within that subnet to associate with your DB instance.`
+    """
+    attach_subnets_to_sng = """
+    MATCH(sng:DBSubnetGroup{id:{sng_arn}})
+    MERGE(subnet:EC2Subnet{subnetid:{SubnetIdentifier}})
+    ON CREATE SET subnet.firstseen = timestamp()
+    MERGE(sng)-[r:RESOURCE]->(subnet)
+    ON CREATE SET r.firstseen = timestamp()
+    SET r.lastupdated = {aws_update_tag},
+    subnet.availability_zone = {SubnetAvailabilityZone},
+    subnet.lastupdated = {aws_update_tag}
+    """
+    for sn in db_subnet_group.get('Subnets', []):
+        subnet_id = sn.get('SubnetIdentifier')
+        arn = _get_db_subnet_group_arn(region, current_aws_account_id, db_subnet_group['DBSubnetGroupName'])
+        neo4j_session.run(
+            attach_subnets_to_sng,
+            SubnetIdentifier=subnet_id,
+            sng_arn=arn,
+            aws_update_tag=aws_update_tag,
+            SubnetAvailabilityZone=sn.get('SubnetAvailabilityZone', {}).get('Name')
+        )
+
+
+def _attach_ec2_security_groups(neo4j_session, instance, aws_update_tag):
+    """
+    Attach an RDS instance to its EC2SecurityGroups
+    """
+    attach_rds_to_group = """
+    MATCH (rds:RDSInstance{id:{RdsArn}})
+    MERGE (sg:EC2SecurityGroup{id:{GroupId}})
+    MERGE (rds)-[m:MEMBER_OF_EC2_SECURITY_GROUP]->(sg)
+    ON CREATE SET m.firstseen = timestamp()
+    SET m.lastupdated = {aws_update_tag}
+    """
+    for group in instance.get('VpcSecurityGroups', []):
+        neo4j_session.run(
+            attach_rds_to_group,
+            RdsArn=instance['DBInstanceArn'],
+            GroupId=group['VpcSecurityGroupId'],
+            aws_update_tag=aws_update_tag
+        )
+
+
+def _attach_read_replicas(neo4j_session, read_replicas, aws_update_tag):
+    """
+    Attach read replicas to their source instances
+    """
+    attach_replica_to_source = """
+    MATCH (replica:RDSInstance{id:{ReplicaArn}}),
+    (source:RDSInstance{db_instance_identifier:{SourceInstanceIdentifier}})
+    MERGE (replica)-[r:IS_READ_REPLICA_OF]->(source)
+    ON CREATE SET r.firstseen = timestamp()
+    SET r.lastupdated = {aws_update_tag}
+    """
+    for replica in read_replicas:
+        neo4j_session.run(
+            attach_replica_to_source,
+            ReplicaArn=replica['DBInstanceArn'],
+            SourceInstanceIdentifier=replica['ReadReplicaSourceDBInstanceIdentifier'],
+            aws_update_tag=aws_update_tag
+        )
+
+
+def _validate_rds_endpoint(rds):
+    """
+    Get Endpoint from RDS data structure. Log to debug if an Endpoint field does not exist.
+    """
+    ep = rds.get('Endpoint', {})
+    if not ep:
+        logger.debug("RDS instance does not have an Endpoint field. Here is the object: %r", rds)
+    return ep
+
+
+def _get_db_subnet_group_arn(region, current_aws_account_id, db_subnet_group_name):
+    """
+    Return an ARN for the DB subnet group name by concatenating the account name and region.
+    This is done to avoid another AWS API call since the describe_db_instances boto call does not return the DB subnet
+    group ARN.
+    Form is arn:aws:rds:{region}:{account-id}:subgrp:{subnet-group-name}
+    as per https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html
+    """
+    return f"arn:aws:rds:{region}:{current_aws_account_id}:subgrp:{db_subnet_group_name}"
+
+
-def cleanup_rds_instances(neo4j_session, common_job_parameters):
+def cleanup_rds_instances_and_db_subnet_groups(neo4j_session, common_job_parameters):
    """
-    Remove RDS graph nodes that were created from other ingestion runs
+    Remove RDS graph nodes and DBSubnetGroups that were created from other ingestion runs
    """
    run_cleanup_job('aws_import_rds_instances_cleanup.json', neo4j_session, common_job_parameters)

@@ -107,4 +251,4 @@ def sync_rds_instances(neo4j_session, boto3_session, regions, current_aws_account_id, …
        logger.info("Syncing RDS for region '%s' in account '%s'.", region, current_aws_account_id)
        data = get_rds_instance_data(boto3_session, region)
        load_rds_instances(neo4j_session, data, region, current_aws_account_id, aws_update_tag)
-    cleanup_rds_instances(neo4j_session, common_job_parameters)
+    cleanup_rds_instances_and_db_subnet_groups(neo4j_session, common_job_parameters)
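With these relationships loaded, the graph can be queried across RDS networking directly. A usage sketch (labels and property names as defined in this commit; Neo4j session setup omitted) listing each instance with its subnet group, subnets, and security groups:

rds_network_query = """
MATCH (rds:RDSInstance)-[:MEMBER_OF_DB_SUBNET_GROUP]->(sng:DBSubnetGroup)-[:RESOURCE]->(subnet:EC2Subnet)
OPTIONAL MATCH (rds)-[:MEMBER_OF_EC2_SECURITY_GROUP]->(sg:EC2SecurityGroup)
RETURN rds.db_instance_identifier AS instance, sng.name AS subnet_group,
       collect(DISTINCT subnet.subnetid) AS subnets, collect(DISTINCT sg.id) AS security_groups
"""
for record in neo4j_session.run(rds_network_query):
    print(record['instance'], record['subnet_group'], record['subnets'], record['security_groups'])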
Binary file modified docs/images/cartography-schema-complete-open-source.png
Binary file added docs/images/exposed-internet.png