Feature crawler #711

Merged: 66 commits, merged on Mar 28, 2024 (changes shown from 45 commits)

Commits
6b31eb8
wip
iakov-aws Nov 6, 2023
564e978
wip
iakov-aws Nov 24, 2023
14929cb
add management of CUR fields
iakov-aws Dec 23, 2023
d057156
wip
iakov-aws Dec 23, 2023
3423e1f
better type management
iakov-aws Dec 23, 2023
b23941a
refactoring
iakov-aws Dec 23, 2023
c6caf78
fixes
iakov-aws Dec 23, 2023
d0b8578
add warning if crawler is not well configured
iakov-aws Dec 23, 2023
3508256
sort
iakov-aws Dec 23, 2023
6d8f314
fix import
iakov-aws Dec 23, 2023
3acf9a6
Merge branch 'main' into feature-crawler
iakov-aws Dec 23, 2023
1edc2d2
Merge
iakov-aws Dec 23, 2023
900a42a
refactoring cur
iakov-aws Dec 24, 2023
9839247
remove resource id dep
iakov-aws Dec 24, 2023
17db51a
remove resource_id dep
iakov-aws Dec 24, 2023
7c53141
more refactoring
iakov-aws Dec 24, 2023
c4af0fa
more refactoring
iakov-aws Dec 24, 2023
7785277
fixes
iakov-aws Dec 24, 2023
3f54b03
merge
iakov-aws Dec 24, 2023
d129f30
fixes
iakov-aws Dec 24, 2023
ca45cdc
Merge branch 'feature-manage-cur-fields' into feature-crawler
iakov-aws Dec 24, 2023
3b7e6b0
add cur creation
iakov-aws Dec 25, 2023
bb3d1d1
fixes
iakov-aws Dec 25, 2023
a25e737
fixes
iakov-aws Dec 25, 2023
113b92f
fixes
iakov-aws Dec 25, 2023
e25d45b
better creation workflow and messages plus database creation
iakov-aws Dec 26, 2023
cbd7d4a
doc and workflow fixes
iakov-aws Dec 26, 2023
4246a5b
doc and workflow fixes
iakov-aws Dec 26, 2023
60d33f9
add cur.yaml
iakov-aws Dec 26, 2023
a41d21c
add QS DS role management
iakov-aws Dec 26, 2023
4a78973
refactor iam
iakov-aws Dec 26, 2023
f8a9fab
allow customer to create dataset with new QS role
iakov-aws Dec 27, 2023
6e46436
release 0.3.0
iakov-aws Dec 27, 2023
16d427f
Merge branch 'main' into feature-crawler
iakov-aws Dec 27, 2023
30f623b
refactor export
iakov-aws Dec 28, 2023
e4b2902
lint
iakov-aws Dec 30, 2023
36a72e9
lint
iakov-aws Dec 30, 2023
0df6174
fix cur yaml
iakov-aws Jan 24, 2024
5eca19b
fix role creation
iakov-aws Jan 24, 2024
37b2e40
workaround IAM
iakov-aws Jan 24, 2024
da1436a
wip
iakov-aws Jan 24, 2024
3dc81a3
merge
iakov-aws Jan 24, 2024
a55b8a9
more fixes
iakov-aws Jan 24, 2024
1a4a4c1
various fixes for the crawler and roles
iakov-aws Jan 25, 2024
3191172
various fixes for the crawler and roles
iakov-aws Jan 25, 2024
520fd35
Update cid/helpers/quicksight/__init__.py
iakov-aws Jan 31, 2024
4a1c6a8
Update cid/cli.py
iakov-aws Jan 31, 2024
98d16cc
Update cid/commands/init_qs.py
iakov-aws Jan 31, 2024
be8c3ae
Update cid/commands/init_qs.py
iakov-aws Jan 31, 2024
bfbf9a0
Merge branch 'main' into feature-crawler
iakov-aws Jan 31, 2024
ea82b5b
review fixes
iakov-aws Jan 31, 2024
67d1e06
wip
iakov-aws Feb 1, 2024
1da954d
align roles
iakov-aws Mar 10, 2024
8e3e6b1
merge
iakov-aws Mar 10, 2024
f5c63b3
lint
iakov-aws Mar 10, 2024
8dd9f38
more fixes
iakov-aws Mar 10, 2024
bc21c90
Merge branch 'main' into feature-crawler
iakov-aws Mar 22, 2024
6e4cdcb
minor fixes
iakov-aws Mar 26, 2024
19301b0
align role names
iakov-aws Mar 26, 2024
010d9e9
merge main
iakov-aws Mar 26, 2024
28dbe5b
fix comment
iakov-aws Mar 26, 2024
f404504
merge
iakov-aws Mar 26, 2024
b1f75c3
remove print
iakov-aws Mar 26, 2024
4640404
Update cid/common.py
iakov-aws Mar 27, 2024
a4d14fb
Update cid/common.py
iakov-aws Mar 28, 2024
68a841f
merge
iakov-aws Mar 28, 2024
13 changes: 10 additions & 3 deletions README.md
@@ -83,21 +83,28 @@ cid-cmd share
```

#### Initialize Amazon QuickSight
One time action to intialize Amazon QuickSight Enerprise Edition.
One time action to initialize Amazon QuickSight Enterprise Edition.

```bash
cid-cmd initqs
cid-cmd init-qs
```

#### Initialize CUR
One time action to initialize Athena table and Crawler from s3 with CUR data.

```bash
cid-cmd init-cur
```

#### Delete Dashboard and all dependencies unused by other
Delete Dashboards and all dependencies unused by other CID-managed dashboards.(including QuickSight datasets, Athena views and tables)
```bash
cid-cmd delete
```

#### Delete Command Options:
```
--dashboard-id TEXT QuickSight dashboard id
--dashboard-id TEXT QuickSight dashboard id
--athena-database TEXT Athena database
```

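In `cid/cli.py` the command functions are named `init_qs` and `init_cur`, while the README documents them as `cid-cmd init-qs` and `cid-cmd init-cur`. Click 7.0 and later derives a command's default name from the decorated function by lowercasing it and turning underscores into dashes, which would explain this mapping, assuming `cid_command` relies on Click's default naming. The helper below is a simplified stand-in for that rule, written for illustration; it is not Click's own code.

```python
def click_style_command_name(func_name: str) -> str:
    """Approximate Click's default command naming: lowercase, '_' -> '-'.

    Simplified stand-in for illustration; it ignores any extra
    normalization that newer Click releases may apply.
    """
    return func_name.lower().replace("_", "-")


for name in ("initqs", "init_qs", "init_cur", "teardown"):
    print(f"{name} -> cid-cmd {click_style_command_name(name)}")
```

This is also why the old `initqs` spelling produced a `cid-cmd initqs` command: without an underscore there is nothing to convert.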
4 changes: 2 additions & 2 deletions cfn-templates/cid-cfn.yml
@@ -1,5 +1,5 @@
AWSTemplateFormatVersion: '2010-09-09'
Description: Deployment of Cloud Intelligence Dashboards v0.2.42
Description: Deployment of Cloud Intelligence Dashboards v0.3.0
Metadata:
AWS::CloudFormation::Interface:
ParameterGroups:
@@ -1506,7 +1506,7 @@ Resources:
Description: An AWS managed layer with a cid-cmd package installed
Content:
S3Bucket: !Sub '${LambdaLayerBucketPrefix}-${AWS::Region}'
S3Key: 'cid-resource-lambda-layer/cid-0.2.42.zip' #replace version here if needed
S3Key: 'cid-resource-lambda-layer/cid-0.3.0.zip' #replace version here if needed
CompatibleRuntimes:
- python3.10

3 changes: 1 addition & 2 deletions cid/_version.py
@@ -1,2 +1 @@

__version__ = '0.2.42'
__version__ = '0.3.0'
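Version 0.3.0 has to be bumped in three places: `cid/_version.py`, the `Description` of `cfn-templates/cid-cfn.yml`, and the Lambda layer `S3Key`, whose comment asks for a manual replacement. A check such as the hypothetical `find_version_mismatches` below (a sketch, not part of this PR) could flag a forgotten bump. It takes the template text as a string, so reading the file is left to the caller.

```python
import re

# The two version-bearing patterns bumped in cid-cfn.yml in this PR.
VERSION_PATTERNS = [
    re.compile(r"Cloud Intelligence Dashboards v(\d+\.\d+\.\d+)"),
    re.compile(r"cid-(\d+\.\d+\.\d+)\.zip"),
]


def find_version_mismatches(version: str, template_text: str) -> list[str]:
    """Return template lines that mention a CID version other than `version`."""
    mismatches = []
    for line in template_text.splitlines():
        for pattern in VERSION_PATTERNS:
            match = pattern.search(line)
            if match and match.group(1) != version:
                mismatches.append(line.strip())
                break  # report each line at most once
    return mismatches


template = """Description: Deployment of Cloud Intelligence Dashboards v0.3.0
S3Key: 'cid-resource-lambda-layer/cid-0.2.42.zip' #replace version here if needed
"""
print(find_version_mismatches("0.3.0", template))
```

In this example only the `S3Key` line is reported, because its layer archive still carries the old 0.2.42 version.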
89 changes: 89 additions & 0 deletions cid/builtin/core/data/queries/shared/cur.yaml
@@ -0,0 +1,89 @@
DatabaseName: "${athena_database_name}"
TableInput:
  Name: "${athenaTableName}"
  Owner: owner
  Retention: 0
  TableType: EXTERNAL_TABLE
  Parameters:
    compressionType: none
    classification: parquet
    UPDATED_BY_CRAWLER: CidCurCrawler # Hard coded Crawler Name
  StorageDescriptor:
    BucketColumns: []
    Compressed: false
    Location: "${location}"
    NumberOfBuckets: -1
    InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
    SerdeInfo:
      Parameters:
        serialization.format: '1'
      SerializationLibrary: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
    StoredAsSubDirectories: false
    Columns: # All fields required for CID
      - {"Name": "bill_bill_type", "Type": "string" }
      - {"Name": "bill_billing_entity", "Type": "string" }
      - {"Name": "bill_billing_period_end_date", "Type": "timestamp" }
      - {"Name": "bill_billing_period_start_date", "Type": "timestamp" }
      - {"Name": "bill_invoice_id", "Type": "string" }
      - {"Name": "bill_payer_account_id", "Type": "string" }
      - {"Name": "identity_line_item_id", "Type": "string" }
      - {"Name": "identity_time_interval", "Type": "string" }
      - {"Name": "line_item_availability_zone", "Type": "string" }
      - {"Name": "line_item_legal_entity", "Type": "string" }
      - {"Name": "line_item_line_item_description", "Type": "string" }
      - {"Name": "line_item_line_item_type", "Type": "string" }
      - {"Name": "line_item_operation", "Type": "string" }
      - {"Name": "line_item_product_code", "Type": "string" }
      - {"Name": "line_item_resource_id", "Type": "string" }
      - {"Name": "line_item_unblended_cost", "Type": "double" }
      - {"Name": "line_item_usage_account_id", "Type": "string" }
      - {"Name": "line_item_usage_amount", "Type": "double" }
      - {"Name": "line_item_usage_end_date", "Type": "timestamp" }
      - {"Name": "line_item_usage_start_date", "Type": "timestamp" }
      - {"Name": "line_item_usage_type", "Type": "string" }
      - {"Name": "pricing_lease_contract_length", "Type": "string" }
      - {"Name": "pricing_offering_class", "Type": "string" }
      - {"Name": "pricing_public_on_demand_cost", "Type": "double" }
      - {"Name": "pricing_purchase_option", "Type": "string" }
      - {"Name": "pricing_term", "Type": "string" }
      - {"Name": "pricing_unit", "Type": "string" }
      - {"Name": "product_cache_engine", "Type": "string" }
      - {"Name": "product_current_generation", "Type": "string" }
      - {"Name": "product_database_engine", "Type": "string" }
      - {"Name": "product_deployment_option", "Type": "string" }
      - {"Name": "product_from_location", "Type": "string" }
      - {"Name": "product_group", "Type": "string" }
      - {"Name": "product_instance_type", "Type": "string" }
      - {"Name": "product_instance_type_family", "Type": "string" }
      - {"Name": "product_license_model", "Type": "string" }
      - {"Name": "product_operating_system", "Type": "string" }
      - {"Name": "product_physical_processor", "Type": "string" }
      - {"Name": "product_processor_features", "Type": "string" }
      - {"Name": "product_product_family", "Type": "string" }
      - {"Name": "product_product_name", "Type": "string" }
      - {"Name": "product_region", "Type": "string" }
      - {"Name": "product_servicecode", "Type": "string" }
      - {"Name": "product_storage", "Type": "string" }
      - {"Name": "product_tenancy", "Type": "string" }
      - {"Name": "product_to_location", "Type": "string" }
      - {"Name": "product_volume_api_name", "Type": "string" }
      - {"Name": "product_volume_type", "Type": "string" }
      - {"Name": "reservation_amortized_upfront_fee_for_billing_period", "Type": "double" }
      - {"Name": "reservation_effective_cost", "Type": "double" }
      - {"Name": "reservation_end_time", "Type": "string" }
      - {"Name": "reservation_reservation_a_r_n", "Type": "string" }
      - {"Name": "reservation_start_time", "Type": "string" }
      - {"Name": "reservation_unused_amortized_upfront_fee_for_billing_period", "Type": "double" }
      - {"Name": "reservation_unused_recurring_fee", "Type": "double" }
      - {"Name": "savings_plan_amortized_upfront_commitment_for_billing_period", "Type": "double" }
      - {"Name": "savings_plan_end_time", "Type": "string" }
      - {"Name": "savings_plan_offering_type", "Type": "string" }
      - {"Name": "savings_plan_payment_option", "Type": "string" }
      - {"Name": "savings_plan_purchase_term", "Type": "string" }
      - {"Name": "savings_plan_savings_plan_a_r_n", "Type": "string" }
      - {"Name": "savings_plan_savings_plan_effective_cost", "Type": "double" }
      - {"Name": "savings_plan_start_time", "Type": "string" }
      - {"Name": "savings_plan_total_commitment_to_date", "Type": "double" }
      - {"Name": "savings_plan_used_commitment", "Type": "double" }
  PartitionKeys: ${partitions} # can be a list
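`cur.yaml` is a template: `${athena_database_name}`, `${athenaTableName}`, `${location}` and `${partitions}` must be filled in before the result is used as a Glue table definition (its top-level keys mirror the `DatabaseName` and `TableInput` arguments of the Glue `CreateTable` API). The diff does not show the rendering code, so the sketch below assumes `string.Template`-style substitution and works on a trimmed excerpt of the template; `render_cur_table` and the sample database, table and bucket names are hypothetical. Because `${partitions}` sits in a YAML value position, substituting a JSON list keeps the output valid YAML.

```python
import json
from string import Template

# Trimmed excerpt of cur.yaml; the full file also lists every CUR column.
CUR_TABLE_EXCERPT = """\
DatabaseName: "${athena_database_name}"
TableInput:
  Name: "${athenaTableName}"
  StorageDescriptor:
    Location: "${location}"
  PartitionKeys: ${partitions} # can be a list
"""


def render_cur_table(database: str, table: str, location: str,
                     partition_names: list[str]) -> str:
    """Fill the placeholders, assuming string.Template semantics."""
    partition_keys = [{"Name": name, "Type": "string"} for name in partition_names]
    return Template(CUR_TABLE_EXCERPT).substitute(
        athena_database_name=database,
        athenaTableName=table,
        location=location,
        partitions=json.dumps(partition_keys),
    )


print(render_cur_table("cid_cur", "cur", "s3://example-bucket/cur/", ["year", "month"]))
```

`Template.substitute` raises `KeyError` if any placeholder is left without a value, which is a reasonable guard for a table definition that must be complete.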
55 changes: 55 additions & 0 deletions cid/builtin/core/data/resources.yaml
@@ -623,6 +623,19 @@ views:
- aws_accounts
  ta_descriptions:
    File: shared/ta_descriptions.sql
  cur:
    type: Glue_Table
    File: shared/cur.yaml
    crawler: cur
    parameters:
      partitions:
        type: cur.partitions
        description: 'comma separated list of CUR partitions ex: ["source_account_id","cur_name_1","cur_name_2","year","month"] ex2: ["year","month"]'
        default: '[{"Name":"source_account_id","Type":"string"},{"Name":"cur_name_1","Type":"string"},{"Name":"cur_name_2","Type":"string"},{"Name":"year","Type":"string"},{"Name":"month","Type":"string"}]'
      location:
        type: cur.location
        description: 's3 path'
        default: 's3://cid-{account_id}-shared/cur/'

# Refresh Schedules for QuickSight DataSets
schedules:
@@ -632,3 +645,45 @@ schedules:
Interval: DAILY
TimeOfTheDay: '02:00-05:00'
RefreshType: FULL_REFRESH

crawlers:
  cur:
    data:
      Name: 'CidCmdCurCrawler'
      Description: A recurring crawler that keeps your CUR table in Athena up-to-date.
      Role: ${crawler_role_arn}
      DatabaseName: "${athena_database_name}"
      Targets:
        S3Targets:
          - Path: ${location}
            Exclusions:
              - '**.json'
              - '**.yml'
              - '**.sql'
              - '**.csv'
              - '**.csv.metadata'
              - '**.gz'
              - '**.zip'
              - '**/cost_and_usage_data_status/*'
              - 'aws-programmatic-access-test-object'
      SchemaChangePolicy:
        DeleteBehavior: LOG
      RecrawlPolicy:
        RecrawlBehavior: CRAWL_EVERYTHING
      Schedule: cron(0 2 * * ? *)
      Configuration: |
        {
          "Version":1.0,
          "Grouping": {
            "TableGroupingPolicy": "CombineCompatibleSchemas"
          },
          "CrawlerOutput":{
            "Tables":{
              "AddOrUpdateBehavior":"MergeNewColumns"
            }
          }
        }
    parameters:
      s3path:
        default: 's3://cid-{account_id}-cur/cur/'
        description: CUR path
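The `partitions` parameter is described as a "comma separated list" and its examples are plain name lists (`["year","month"]`), while its default is a list of `{"Name": ..., "Type": ...}` objects, the shape Glue uses for `PartitionKeys`. A tolerant parser could accept all three forms. `parse_partitions` below is a hypothetical helper, not code from this PR, and it assumes that a bare name means a `string` partition, as every partition in the default is.

```python
import json


def parse_partitions(raw: str) -> list[dict]:
    """Normalize a partitions parameter into Glue-style PartitionKeys.

    Accepts '["year","month"]', 'year,month', or
    '[{"Name": "year", "Type": "string"}, ...]'.
    Bare names are assumed to be of type "string".
    """
    text = raw.strip()
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: treat it as a bare comma-separated string such as "year,month".
        items = [part.strip() for part in text.split(",") if part.strip()]
    if not isinstance(items, list):
        raise ValueError("partitions must be a list")

    keys = []
    for item in items:
        if isinstance(item, str):
            keys.append({"Name": item, "Type": "string"})
        elif isinstance(item, dict) and "Name" in item:
            keys.append({"Name": item["Name"], "Type": item.get("Type", "string")})
        else:
            raise ValueError(f"unsupported partition entry: {item!r}")
    return keys


print(parse_partitions('["year","month"]'))
```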
35 changes: 30 additions & 5 deletions cid/cli.py
@@ -227,23 +227,48 @@ def cleanup(ctx, **kwargs):
@cid_command
def share(ctx, dashboard_id, **kwargs):
"""Share QuickSight resources (Dashboard, Datasets, DataSource)"""

ctx.obj.share(dashboard_id)

@click.option('-v', '--verbose', count=True)
@click.option('-y', '--yes', help='confirm all', is_flag=True, default=False)
@cid_command
def initqs(ctx, **kwargs):
def init_qs(ctx, **kwargs):
"""Initialize Amazon QuickSight

\b

--enable-quicksight-enterprise (yes|no) Confirm the activation of QuickSight
--account-name NAME Unique QuickSight account name (Unique across all AWS users)
--account-name NAME Unique QuickSight account name (Unique across all AWS users)
--notification-email EMAIL User's email for QuickSight notifications
"""

ctx.obj.initqs(**kwargs)
ctx.obj.init_qs(**kwargs)

@click.option('-v', '--verbose', count=True)
@cid_command
def init_cur(ctx, **kwargs):
"""Initialize CUR table

\b
--view-cur-location s3://BUCKET/PATH S3 path with CUR data. We support only 2 types of CUR path: 's3://{bucket}/cur' and 's3://{bucket}/{prefix}/{name}/{name}'
--crawler-role ROLE Name or ARN of crawler role
"""

ctx.obj.init_cur(**kwargs)

@click.option('-v', '--verbose', count=True)
@click.option('-y', '--yes', help='confirm all', is_flag=True, default=False)
@cid_command
def teardown(ctx, **kwargs):
"""Delete all CID assets

\b

THIS IS VERY DANGEROUS. DO NOT USE IT.
"""

ctx.obj.teardown(**kwargs)

if __name__ == '__main__':
main()
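The `init-cur` help text accepts only two CUR path layouts: `s3://{bucket}/cur` and `s3://{bucket}/{prefix}/{name}/{name}`. A check along the lines below could reject other paths before any Glue resources are created. `classify_cur_path` is a hypothetical helper; it assumes that the repeated `{name}` means two identical trailing segments, that `{prefix}` may span several segments, and that a trailing slash is insignificant.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_cur_path(path: str) -> Optional[str]:
    """Return which supported CUR layout `path` matches, or None."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        return None
    # Drop empty segments so a trailing slash does not matter.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if parts == ["cur"]:
        return "s3://{bucket}/cur"
    # At least one prefix segment followed by the same report name twice.
    if len(parts) >= 3 and parts[-1] == parts[-2]:
        return "s3://{bucket}/{prefix}/{name}/{name}"
    return None


print(classify_cur_path("s3://my-bucket/cur/"))
print(classify_cur_path("s3://my-bucket/reports/mycur/mycur"))
```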


12 changes: 6 additions & 6 deletions cid/commands/init_qs.py
@@ -24,27 +24,27 @@ def __init__(self, cid, **kwargs):
self.cid = cid

def execute(self, *args, **kwargs):
"""Execute the initilization"""
"""Execute the initialization"""
self._create_quicksight_enterprise_subscription()

def _create_quicksight_enterprise_subscription(self):
"""Enable QuickSight Enterprise if not enabled already"""
cid_print('Analysing QuickSight Status')
cid_print('Analyzing QuickSight Status')
if self.cid.qs.edition(fresh=True) in ('ENTERPRISE', 'ENTERPRISE_AND_Q'):
cid_print(f'QuickSight Edition is {self.cid.qs.edition()}')
return

cid_print(
'<BOLD><RED>IMPORTANT<END>: <BOLD>Amazion QuickSight Enterprise Edition is required for Cost Intelligence Dashboard. '
'<BOLD><RED>IMPORTANT<END>: <BOLD>Amazon QuickSight Enterprise Edition is required for Cost Intelligence Dashboard. '
'This will lead to costs in your AWS account (https://aws.amazon.com/quicksight/pricing/).<END>'
)

if not self.cid.all_yes and not get_yesno_parameter(
param_name='enable-quicksight-enterprise',
message='Please, confirm enabling of Amazion QuickSight Enterprise',
message='Please, confirm enabling of Amazon QuickSight Enterprise',
default='no'
):
cid_print('\tInitalization cancelled')
cid_print('\tInitialization cancelled')
return

for counter in range(MAX_ITERATIONS):
@@ -74,7 +74,7 @@ def _create_quicksight_enterprise_subscription(self):
cid_print(f'\tQuickSight Edition is {self.cid.qs.edition()}.')

def _get_account_name_for_quicksight(self):
"""Get the account name for quicksight"""
"""Get the account name for quicksight"""
for _ in range(MAX_ITERATIONS):
account_name = get_parameter(
'account-name',
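`_create_quicksight_enterprise_subscription` returns early when the edition is already `ENTERPRISE` or `ENTERPRISE_AND_Q`, and its follow-up work is bounded by `for counter in range(MAX_ITERATIONS)` (the loop body is collapsed in this diff). A bounded wait of that kind might look like the sketch below. `wait_for_enterprise`, the injected `get_edition` callable and the delay parameter are assumptions for illustration, not the PR's implementation; injecting `get_edition` keeps the loop testable without a QuickSight account.

```python
import time
from typing import Callable

ENTERPRISE_EDITIONS = ("ENTERPRISE", "ENTERPRISE_AND_Q")


def wait_for_enterprise(get_edition: Callable[[], str],
                        max_iterations: int = 5,
                        delay_seconds: float = 10.0) -> bool:
    """Poll get_edition() until it reports an Enterprise edition.

    Returns True on success, False after max_iterations attempts.
    Sleeps only between attempts, not after the last one.
    """
    for attempt in range(max_iterations):
        if get_edition() in ENTERPRISE_EDITIONS:
            return True
        if attempt < max_iterations - 1:
            time.sleep(delay_seconds)
    return False


editions = iter(["STANDARD", "STANDARD", "ENTERPRISE"])
print(wait_for_enterprise(lambda: next(editions), max_iterations=5, delay_seconds=0))
```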