Feature crawler #711

Merged
merged 66 commits into from
Mar 28, 2024
66 commits
6b31eb8
wip
iakov-aws Nov 6, 2023
564e978
wip
iakov-aws Nov 24, 2023
14929cb
add management of CUR fields
iakov-aws Dec 23, 2023
d057156
wip
iakov-aws Dec 23, 2023
3423e1f
better type management
iakov-aws Dec 23, 2023
b23941a
refactoring
iakov-aws Dec 23, 2023
c6caf78
fixes
iakov-aws Dec 23, 2023
d0b8578
add warning if crawler is not well configured
iakov-aws Dec 23, 2023
3508256
sort
iakov-aws Dec 23, 2023
6d8f314
fix import
iakov-aws Dec 23, 2023
3acf9a6
Merge branch 'main' into feature-crawler
iakov-aws Dec 23, 2023
1edc2d2
Merge
iakov-aws Dec 23, 2023
900a42a
refactoring cur
iakov-aws Dec 24, 2023
9839247
remove resource id dep
iakov-aws Dec 24, 2023
17db51a
remove resource_id dep
iakov-aws Dec 24, 2023
7c53141
more refactoring
iakov-aws Dec 24, 2023
c4af0fa
more refactoring
iakov-aws Dec 24, 2023
7785277
fixes
iakov-aws Dec 24, 2023
3f54b03
merge
iakov-aws Dec 24, 2023
d129f30
fixes
iakov-aws Dec 24, 2023
ca45cdc
Merge branch 'feature-manage-cur-fields' into feature-crawler
iakov-aws Dec 24, 2023
3b7e6b0
add cur creation
iakov-aws Dec 25, 2023
bb3d1d1
fixes
iakov-aws Dec 25, 2023
a25e737
fixes
iakov-aws Dec 25, 2023
113b92f
fixes
iakov-aws Dec 25, 2023
e25d45b
better creation workflow and messages plus database creation
iakov-aws Dec 26, 2023
cbd7d4a
doc and workflow fixes
iakov-aws Dec 26, 2023
4246a5b
doc and workflow fixes
iakov-aws Dec 26, 2023
60d33f9
add cur.yaml
iakov-aws Dec 26, 2023
a41d21c
add QS DS role management
iakov-aws Dec 26, 2023
4a78973
refactor iam
iakov-aws Dec 26, 2023
f8a9fab
allow customer to create dataset with new QS role
iakov-aws Dec 27, 2023
6e46436
release 0.3.0
iakov-aws Dec 27, 2023
16d427f
Merge branch 'main' into feature-crawler
iakov-aws Dec 27, 2023
30f623b
refactor export
iakov-aws Dec 28, 2023
e4b2902
lint
iakov-aws Dec 30, 2023
36a72e9
lint
iakov-aws Dec 30, 2023
0df6174
fix cur yaml
iakov-aws Jan 24, 2024
5eca19b
fix role creation
iakov-aws Jan 24, 2024
37b2e40
workaround IAM
iakov-aws Jan 24, 2024
da1436a
wip
iakov-aws Jan 24, 2024
3dc81a3
merge
iakov-aws Jan 24, 2024
a55b8a9
more fixes
iakov-aws Jan 24, 2024
1a4a4c1
various fixes for the crawler and roles
iakov-aws Jan 25, 2024
3191172
various fixes for the crawler and roles
iakov-aws Jan 25, 2024
520fd35
Update cid/helpers/quicksight/__init__.py
iakov-aws Jan 31, 2024
4a1c6a8
Update cid/cli.py
iakov-aws Jan 31, 2024
98d16cc
Update cid/commands/init_qs.py
iakov-aws Jan 31, 2024
be8c3ae
Update cid/commands/init_qs.py
iakov-aws Jan 31, 2024
bfbf9a0
Merge branch 'main' into feature-crawler
iakov-aws Jan 31, 2024
ea82b5b
review fixes
iakov-aws Jan 31, 2024
67d1e06
wip
iakov-aws Feb 1, 2024
1da954d
align roles
iakov-aws Mar 10, 2024
8e3e6b1
merge
iakov-aws Mar 10, 2024
f5c63b3
lint
iakov-aws Mar 10, 2024
8dd9f38
more fixes
iakov-aws Mar 10, 2024
bc21c90
Merge branch 'main' into feature-crawler
iakov-aws Mar 22, 2024
6e4cdcb
minor fixes
iakov-aws Mar 26, 2024
19301b0
align role names
iakov-aws Mar 26, 2024
010d9e9
merge main
iakov-aws Mar 26, 2024
28dbe5b
fix comment
iakov-aws Mar 26, 2024
f404504
merge
iakov-aws Mar 26, 2024
b1f75c3
remove print
iakov-aws Mar 26, 2024
4640404
Update cid/common.py
iakov-aws Mar 27, 2024
a4d14fb
Update cid/common.py
iakov-aws Mar 28, 2024
68a841f
merge
iakov-aws Mar 28, 2024
13 changes: 10 additions & 3 deletions README.md
@@ -83,21 +83,28 @@ cid-cmd share
```

#### Initialize Amazon QuickSight
One time action to intialize Amazon QuickSight Enerprise Edition.
One time action to initialize Amazon QuickSight Enterprise Edition.

```bash
cid-cmd initqs
cid-cmd init-qs
```

#### Initialize CUR
One time action to initialize Athena table and Crawler from s3 with CUR data.

```bash
cid-cmd init-cur
```

#### Delete Dashboard and all dependencies unused by other
Delete Dashboards and all dependencies unused by other CID-managed dashboards.(including QuickSight datasets, Athena views and tables)
```bash
cid-cmd delete
```

#### Delete Command Options:
```
--dashboard-id TEXT QuickSight dashboard id
--dashboard-id TEXT QuickSight dashboard id
--athena-database TEXT Athena database
```

110 changes: 44 additions & 66 deletions cfn-templates/cid-cfn.yml
@@ -1,5 +1,5 @@
AWSTemplateFormatVersion: '2010-09-09'
Description: Deployment of Cloud Intelligence Dashboards v0.2.47
Description: Deployment of Cloud Intelligence Dashboards v0.3.0
Metadata:
AWS::CloudFormation::Interface:
ParameterGroups:
@@ -324,7 +324,7 @@ Resources:
FunctionName: !Sub 'CidSpiceRefreshLambda${Suffix}'
Role: !GetAtt SpiceRefreshExecutionRole.Arn
Description: 'Refresh QuickSight DataSets for CID'
Runtime: python3.10
Runtime: python3.11
Architectures: [ x86_64 ] #Compatible with arm64 but it is not supported in all regions
MemorySize: 128
Timeout: 60
@@ -481,7 +481,7 @@ Resources:
FunctionName: !Sub CidInitialSetup-DoNotRun${Suffix}
Role: !GetAtt 'InitLambdaExecutionRole.Arn'
Description: "CID legacy setup"
Runtime: python3.10
Runtime: python3.11
Handler: 'index.lambda_handler'
Code:
ZipFile: |
@@ -519,7 +519,7 @@ Resources:
FunctionName: !Sub "CidCustomResourceFunctionInit-DoNotRun${Suffix}"
Role: !GetAtt 'InitLambdaExecutionRole.Arn'
Description: "Do what CFN cannot: start crawler, delete bucket with objects and delete an non empty workgroup"
Runtime: python3.10
Runtime: python3.11
Architectures: [ x86_64 ] #Compatible with arm64 but it is not supported in all regions
MemorySize: 128
Timeout: 300
@@ -763,7 +763,7 @@ Resources:
Role: !GetAtt 'ProcessPathLambdaExecutionRole.Arn'
FunctionName: !Sub "CidCustomResourceProcessPath-DoNotRun${Suffix}"
Description: "Do what CFN cannot: process string of path"
Runtime: python3.10
Runtime: python3.11
Architectures: [ x86_64 ] #Compatible with arm64 but it is not supported in all regions
MemorySize: 128
Timeout: 60
@@ -1091,15 +1091,17 @@ Resources:
PolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
- Sid: AllowAthenaReads
Effect: Allow
Action:
- lakeformation:GetDataAccess
- athena:ListDataCatalogs
- athena:ListDatabases
- athena:ListTableMetadata
Resource: "*" # required https://docs.aws.amazon.com/lake-formation/latest/dg/access-control-underlying-data.html
# Cannot restrict this. See https://docs.aws.amazon.com/athena/latest/ug/datacatalogs-example-policies.html#datacatalog-policy-listing-data-catalogs
- Effect: Allow
- Sid: AllowGlue
Effect: Allow
Action:
- glue:GetPartition
- glue:GetPartitions
@@ -1120,7 +1122,8 @@ Resources:
- !Sub arn:${AWS::Partition}:glue:${AWS::Region}:${AWS::AccountId}:database/optimization_data
- !Sub arn:${AWS::Partition}:glue:${AWS::Region}:${AWS::AccountId}:table/cid_data_collection/*
- !Sub arn:${AWS::Partition}:glue:${AWS::Region}:${AWS::AccountId}:database/cid_data_collection
- Effect: Allow
- Sid: AllowAthena
Effect: Allow
Action:
- athena:ListDatabases
- athena:ListDataCatalogs
@@ -1132,11 +1135,11 @@ Resources:
- athena:ListTableMetadata
- athena:GetTableMetadata
Resource:
- !Sub 'arn:${AWS::Partition}:athena:${AWS::Region}:${AWS::AccountId}:datacatalog/${GlueDataCatalog}'
- Fn::If:
- NeedDatabase
- !Sub arn:${AWS::Partition}:athena:${AWS::Region}:${AWS::AccountId}:database/${CidDatabase}
- !Sub arn:${AWS::Partition}:athena:${AWS::Region}:${AWS::AccountId}:database/${DatabaseName}
- !Sub 'arn:${AWS::Partition}:athena:${AWS::Region}:${AWS::AccountId}:datacatalog/${GlueDataCatalog}'
- Fn::If:
- NeedAthenaWorkgroup
- !Sub 'arn:${AWS::Partition}:athena:${AWS::Region}:${AWS::AccountId}:workgroup/${MyAthenaWorkGroup}'
@@ -1159,68 +1162,41 @@ Resources:
- NeedAthenaQueryResultsBucket
- !Sub 'arn:${AWS::Partition}:s3:::${MyAthenaQueryResultsBucket}/*'
- !Sub 'arn:${AWS::Partition}:s3:::${AthenaQueryResultsBucket}/*'
- Sid: AllowListBucket
Effect: Allow
Action: s3:ListBucket
Resource:
- !Sub arn:aws:s3:::${ODCPath.Bucket}
- !If
- NeedQuickSightDataSourceRoleAndCUR
- !Sub arn:aws:s3:::${CURPath.Bucket}
- !Ref "AWS::NoValue"
- Sid: AllowReadBucket
Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectVersion
Resource:
- !Sub arn:aws:s3:::${ODCPath.Bucket}/*
- !If
- NeedQuickSightDataSourceRoleAndCUR
- !Sub arn:aws:s3:::${CURPath.Bucket}/*
- !Ref "AWS::NoValue"
- !If
- NeedQuickSightDataSourceKMS
- Sid: AllowKmsDecrypt
Effect: Allow
Action:
- 'kms:Decrypt'
Resource: !Split [ ',', !Ref DataBucketsKmsKeysArns ]
- !Ref "AWS::NoValue"
Metadata:
cfn_nag:
rules_to_suppress:
- id: 'W11'
reason: "Need to use * for Lakeformation and Athena"
- id: 'W28'
reason: "Need explicit name to give permissions"
QuickSightDataSourceRolePolicyForODCBucket:
Type: AWS::IAM::Policy
Condition: NeedQuickSightDataSourceRole # We need ODC bucket even if ODC dashboards are not activated (ex: for account map)
Properties:
PolicyName: QuickSightDataSource-S3AccessODC
PolicyDocument:
Version: 2012-10-17
Statement:
- Sid: CidAllowListBucket
Effect: Allow
Action: s3:ListBucket
Resource: !Sub arn:${AWS::Partition}:s3:::${ODCPath.Bucket}
- Sid: CidAllowReadBucket
Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectVersion
Resource: !Sub arn:${AWS::Partition}:s3:::${ODCPath.Bucket}/*
Roles:
- !Ref QuickSightDataSourceRole
QuickSightDataSourceRolePolicyForCURBucket:
Type: AWS::IAM::Policy
Condition: NeedQuickSightDataSourceRoleAndCUR
Properties:
PolicyName: QuickSightDataSource-S3AccessCUR
PolicyDocument:
Version: 2012-10-17
Statement:
- Sid: CidAllowListBucket
Effect: Allow
Action: s3:ListBucket
Resource: !Sub arn:${AWS::Partition}:s3:::${CURPath.Bucket}
- Sid: CidAllowReadBucket
Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectVersion
Resource: !Sub arn:${AWS::Partition}:s3:::${CURPath.Bucket}/*
Roles:
- !Ref QuickSightDataSourceRole

KmsPolicyForQuickSightDataSourceRole:
Type: AWS::IAM::Policy
Condition: NeedQuickSightDataSourceKMS
Properties:
PolicyName: QuickSightDataSourceKmsDecryption
PolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Action:
- 'kms:Decrypt'
Resource: !Split [ ',', !Ref DataBucketsKmsKeysArns ]
Roles:
- !Ref QuickSightDataSourceRole

CidAthenaDataSource:
Type: AWS::QuickSight::DataSource
@@ -1466,7 +1442,7 @@ Resources:
FunctionName: !Sub 'CidCustomResourceDashboard${Suffix}'
Description: 'A lambda that manage create delete update of Athena views, QuickSight Datasets and dashboards using CID-CMD tool'
Role: !GetAtt CidExecRole.Arn
Runtime: python3.10
Runtime: python3.11
Architectures: [ x86_64 ] #Compatible with arm64 but it is not supported in all regions
MemorySize: 2688
Timeout: 300 # Time of discovery depend on number of dashboards
@@ -1577,9 +1553,11 @@ Resources:
Description: An AWS managed layer with a cid-cmd package installed
Content:
S3Bucket: !Sub '${LambdaLayerBucketPrefix}-${AWS::Region}'
S3Key: 'cid-resource-lambda-layer/cid-0.2.47.zip' #replace version here if needed
S3Key: 'cid-resource-lambda-layer/cid-0.3.0.zip' #replace version here if needed
CompatibleRuntimes:
- python3.10
- python3.11
- python3.12

CostIntelligenceDashboard:
Type: Custom::CidDashboard
3 changes: 1 addition & 2 deletions cid/_version.py
@@ -1,2 +1 @@

__version__ = '0.2.47'
__version__ = '0.3.0'
89 changes: 89 additions & 0 deletions cid/builtin/core/data/queries/shared/cur.yaml
@@ -0,0 +1,89 @@
DatabaseName: "${athena_database_name}"
TableInput:
Name: "${athenaTableName}"
Owner: owner
Retention: 0
TableType: EXTERNAL_TABLE
Parameters:
compressionType: none
classification: parquet
UPDATED_BY_CRAWLER: CidCurCrawler # Hard coded Crawler Name
StorageDescriptor:
BucketColumns: []
Compressed: false
Location: "${location}"
NumberOfBuckets: -1
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
SerdeInfo:
Parameters:
serialization.format: '1'
SerializationLibrary: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
StoredAsSubDirectories: false
Columns: # All fields required for CID
- {"Name": "bill_bill_type", "Type": "string" }
- {"Name": "bill_billing_entity", "Type": "string" }
- {"Name": "bill_billing_period_end_date", "Type": "timestamp" }
- {"Name": "bill_billing_period_start_date", "Type": "timestamp" }
- {"Name": "bill_invoice_id", "Type": "string" }
- {"Name": "bill_payer_account_id", "Type": "string" }
- {"Name": "identity_line_item_id", "Type": "string" }
- {"Name": "identity_time_interval", "Type": "string" }
- {"Name": "line_item_availability_zone", "Type": "string" }
- {"Name": "line_item_legal_entity", "Type": "string" }
- {"Name": "line_item_line_item_description", "Type": "string" }
- {"Name": "line_item_line_item_type", "Type": "string" }
- {"Name": "line_item_operation", "Type": "string" }
- {"Name": "line_item_product_code", "Type": "string" }
- {"Name": "line_item_resource_id", "Type": "string" }
- {"Name": "line_item_unblended_cost", "Type": "double" }
- {"Name": "line_item_usage_account_id", "Type": "string" }
- {"Name": "line_item_usage_amount", "Type": "double" }
- {"Name": "line_item_usage_end_date", "Type": "timestamp" }
- {"Name": "line_item_usage_start_date", "Type": "timestamp" }
- {"Name": "line_item_usage_type", "Type": "string" }
- {"Name": "pricing_lease_contract_length", "Type": "string" }
- {"Name": "pricing_offering_class", "Type": "string" }
- {"Name": "pricing_public_on_demand_cost", "Type": "double" }
- {"Name": "pricing_purchase_option", "Type": "string" }
- {"Name": "pricing_term", "Type": "string" }
- {"Name": "pricing_unit", "Type": "string" }
- {"Name": "product_cache_engine", "Type": "string" }
- {"Name": "product_current_generation", "Type": "string" }
- {"Name": "product_database_engine", "Type": "string" }
- {"Name": "product_deployment_option", "Type": "string" }
- {"Name": "product_from_location", "Type": "string" }
- {"Name": "product_group", "Type": "string" }
- {"Name": "product_instance_type", "Type": "string" }
- {"Name": "product_instance_type_family", "Type": "string" }
- {"Name": "product_license_model", "Type": "string" }
- {"Name": "product_operating_system", "Type": "string" }
- {"Name": "product_physical_processor", "Type": "string" }
- {"Name": "product_processor_features", "Type": "string" }
- {"Name": "product_product_family", "Type": "string" }
- {"Name": "product_product_name", "Type": "string" }
- {"Name": "product_region", "Type": "string" }
- {"Name": "product_servicecode", "Type": "string" }
- {"Name": "product_storage", "Type": "string" }
- {"Name": "product_tenancy", "Type": "string" }
- {"Name": "product_to_location", "Type": "string" }
- {"Name": "product_volume_api_name", "Type": "string" }
- {"Name": "product_volume_type", "Type": "string" }
- {"Name": "reservation_amortized_upfront_fee_for_billing_period", "Type": "double" }
- {"Name": "reservation_effective_cost", "Type": "double" }
- {"Name": "reservation_end_time", "Type": "string" }
- {"Name": "reservation_reservation_a_r_n", "Type": "string" }
- {"Name": "reservation_start_time", "Type": "string" }
- {"Name": "reservation_unused_amortized_upfront_fee_for_billing_period", "Type": "double" }
- {"Name": "reservation_unused_recurring_fee", "Type": "double" }
- {"Name": "savings_plan_amortized_upfront_commitment_for_billing_period", "Type": "double" }
- {"Name": "savings_plan_end_time", "Type": "string" }
- {"Name": "savings_plan_offering_type", "Type": "string" }
- {"Name": "savings_plan_payment_option", "Type": "string" }
- {"Name": "savings_plan_purchase_term", "Type": "string" }
- {"Name": "savings_plan_savings_plan_a_r_n", "Type": "string" }
- {"Name": "savings_plan_savings_plan_effective_cost", "Type": "double" }
- {"Name": "savings_plan_start_time", "Type": "string" }
- {"Name": "savings_plan_total_commitment_to_date", "Type": "double" }
- {"Name": "savings_plan_used_commitment", "Type": "double" }
PartitionKeys: ${partitions} # can be a list
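
Note on the template above: the `${...}` placeholders in `cur.yaml` are filled in before the table definition is sent to Glue. The diff does not show how cid-cmd performs that substitution, so the following is only a minimal sketch that assumes Python's standard `string.Template`; the abbreviated template, the parameter values, and the account id are hypothetical. It illustrates why `${partitions}` can be a JSON list: JSON flow syntax is also valid YAML.

```python
import json
from string import Template

# Abbreviated copy of the cur.yaml template from this diff (most columns omitted).
CUR_TEMPLATE = Template("""\
DatabaseName: "${athena_database_name}"
TableInput:
  Name: "${athenaTableName}"
  TableType: EXTERNAL_TABLE
  StorageDescriptor:
    Location: "${location}"
    Columns:
      - {"Name": "bill_payer_account_id", "Type": "string" }
  PartitionKeys: ${partitions} # can be a list
""")

# Hypothetical parameter values, chosen only for illustration.
params = {
    "athena_database_name": "cid_cur",
    "athenaTableName": "cur",
    "location": "s3://cid-123456789012-shared/cur/",
    # Partition keys are serialized as JSON so they drop into the YAML as a flow list.
    "partitions": json.dumps(
        [{"Name": "year", "Type": "string"}, {"Name": "month", "Type": "string"}]
    ),
}

# substitute() raises KeyError if any placeholder is left without a value.
rendered = CUR_TEMPLATE.substitute(params)
print(rendered)
```

Using `substitute()` rather than `safe_substitute()` makes a missing parameter fail loudly instead of leaving a literal `${...}` in the table definition.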
55 changes: 55 additions & 0 deletions cid/builtin/core/data/resources.yaml
@@ -623,6 +623,19 @@ views:
- aws_accounts
ta_descriptions:
File: shared/ta_descriptions.sql
cur:
type: Glue_Table
File: shared/cur.yaml
crawler: cur
parameters:
partitions:
type: cur.partitions
description: 'comma separated list of CUR partitions ex: ["source_account_id","cur_name_1","cur_name_2","year","month"] ex2: ["year","month"]'
default: '[{"Name":"source_account_id","Type":"string"},{"Name":"cur_name_1","Type":"string"},{"Name":"cur_name_2","Type":"string"},{"Name":"year","Type":"string"},{"Name":"month","Type":"string"}]'
location:
type: cur.location
description: 's3 path'
default: 's3://cid-{account_id}-shared/cur/'

# Refresh Schedules for QuickSight DataSets
schedules:
@@ -632,3 +645,45 @@ schedules:
Interval: DAILY
TimeOfTheDay: '02:00-05:00'
RefreshType: FULL_REFRESH

crawlers:
cur:
data:
Name: 'CidCmdCurCrawler'
Description: A recurring crawler that keeps your CUR table in Athena up-to-date.
Role: ${crawler_role_arn}
DatabaseName: "${athena_database_name}"
Targets:
S3Targets:
- Path: ${location}
Exclusions:
- '**.json'
- '**.yml'
- '**.sql'
- '**.csv'
- '**.csv.metadata'
- '**.gz'
- '**.zip'
- '**/cost_and_usage_data_status/*'
- 'aws-programmatic-access-test-object'
SchemaChangePolicy:
DeleteBehavior: LOG
RecrawlPolicy:
RecrawlBehavior: CRAWL_EVERYTHING
Schedule: cron(0 2 * * ? *)
Configuration: |
{
"Version":1.0,
"Grouping": {
"TableGroupingPolicy": "CombineCompatibleSchemas"
},
"CrawlerOutput":{
"Tables":{
"AddOrUpdateBehavior":"MergeNewColumns"
}
}
}
parameters:
s3path:
default: 's3://cid-{account_id}-cur/cur/'
description: CUR path
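
Note on the crawler definition above: its keys match the parameters of the AWS Glue `CreateCrawler` API. The sketch below shows how the rendered `data` block could be turned into keyword arguments for boto3's `glue.create_crawler`. The helper name `build_crawler_kwargs` and the example ARN, database, and path are hypothetical, and the diff does not show how cid-cmd actually issues the call. The boto3 call itself is left as a comment so the block runs without AWS credentials.

```python
import json


def build_crawler_kwargs(crawler_role_arn, athena_database_name, location):
    """Build create_crawler keyword arguments mirroring the `crawlers.cur.data` block."""
    configuration = {
        "Version": 1.0,
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        "CrawlerOutput": {"Tables": {"AddOrUpdateBehavior": "MergeNewColumns"}},
    }
    return {
        "Name": "CidCmdCurCrawler",
        "Description": "A recurring crawler that keeps your CUR table in Athena up-to-date.",
        "Role": crawler_role_arn,
        "DatabaseName": athena_database_name,
        "Targets": {
            "S3Targets": [
                {
                    "Path": location,
                    # Subset of the exclusion patterns listed in resources.yaml.
                    "Exclusions": ["**.json", "**.yml", "**.sql", "**.csv", "**.gz", "**.zip"],
                }
            ]
        },
        "SchemaChangePolicy": {"DeleteBehavior": "LOG"},
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVERYTHING"},
        "Schedule": "cron(0 2 * * ? *)",
        # Glue expects the Configuration value as a JSON string, not a dict.
        "Configuration": json.dumps(configuration),
    }


# Hypothetical inputs, for illustration only.
kwargs = build_crawler_kwargs(
    crawler_role_arn="arn:aws:iam::123456789012:role/ExampleCrawlerRole",
    athena_database_name="cid_cur",
    location="s3://cid-123456789012-shared/cur/",
)

# With boto3 installed and credentials configured, one could then call:
#   boto3.client("glue").create_crawler(**kwargs)
print(json.dumps(kwargs, indent=2))
```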