add compare_workflow_task sample to monitoring/app (#405)
* add compare_workflow_task sample to monitoring/app
o-mura authored Jul 4, 2024
1 parent 0ee80ba commit 83bf2e8
Showing 11 changed files with 134 additions and 0 deletions.
34 changes: 34 additions & 0 deletions scenarios/monitoring/app/compare_workflow_task/README.md
@@ -0,0 +1,34 @@
# Workflow: Scenario (compare task durations in a workflow between multiple attempts)

## Scenario

The purpose of this scenario is to compare the durations of the tasks in a workflow across multiple attempts.

### Steps
#### 1. push this workflow to Treasure Data
```
> cd compare_workflow_task
> td push compare_workflow_task
```
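
The project directory pushed here contains everything this commit adds. The name of the workflow definition file is an assumption (digdag samples conventionally name it after the project):

```
compare_workflow_task/
├── README.md
├── compare_workflow_task.dig   <- assumed name; the workflow definition shown below
├── queries/
│   ├── compare_task.sql
│   └── gen_query.sql
├── scripts/
│   └── ingest_task.py
└── images/                     <- 1.png through 6.png, the screenshots below
```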

#### 2. configure endpoint settings
- api_endpoint
- workflow_endpoint
![](images/1.png)

#### 3. configure the attempt IDs you want to compare
![](images/2.png)
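
Both the endpoints (step 2) and the attempt IDs live in the workflow's `_export` block, shown in full later in this diff; a minimal sketch (the attempt IDs here are the sample defaults, so replace them with your own):

```
_export:
  td:
    api_endpoint: api.treasuredata.com
    workflow_endpoint: api-workflow.treasuredata.com
  attempt_ids:
    - 1201247649
    - 1200176632
```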

#### 4. register td.apikey as a secret (the owner of td.apikey must be the owner of the attempts you specify)
![](images/3.png)
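
If you prefer the CLI to the console shown above, the TD toolbelt can set the secret as well; a sketch, assuming the project name from step 1 (the command prompts for the API key value):

```
> td workflow secrets --project compare_workflow_task --set td.apikey
```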

#### 5. run the workflow
![](images/4.png)
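
Equivalently from the CLI, assuming the workflow shares the project's name:

```
> td workflow start compare_workflow_task compare_workflow_task --session now
```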


After the workflow runs, you get the following query result.
![](images/5.png)

![](images/6.png)

You can compare each task's duration across multiple attempts.
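
The result is a pivot with one row per task and one duration column (in seconds) per attempt ID. Illustratively, with made-up task names and values (not from a real run):

| task_name | 1201247649 | 1200176632 | 1199185996 |
| --- | --- | --- | --- |
| +my_workflow+load | 35 | 31 | 38 |
| +my_workflow+aggregate | 340 | 298 | 352 |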
41 changes: 41 additions & 0 deletions scenarios/monitoring/app/compare_workflow_task/compare_workflow_task.dig
@@ -0,0 +1,41 @@
_export:
  td:
    database: temporary_${session_id}
    tables:
      tasks: tasks
    api_endpoint: api.treasuredata.com
    workflow_endpoint: api-workflow.treasuredata.com
  attempt_ids:
    - 1201247649
    - 1200176632
    - 1199185996

+create_temporary_db:
  td_ddl>:
  create_databases: ["${td.database}"]

+get_attempt_task:
  py>: scripts.ingest_task.run
  session_unixtime: ${session_unixtime}
  dest_db: ${td.database}
  dest_table: ${td.tables.tasks}
  attempt_ids: ${attempt_ids.join(',')}
  api_endpoint: ${td.api_endpoint}
  workflow_endpoint: ${td.workflow_endpoint}
  docker:
    image: "digdag/digdag-python:3.9"
  _env:
    TD_API_KEY: ${secret:td.apikey}

+gen_query:
  td>: queries/gen_query.sql
  store_last_results: true

+compare_task:
  td>: queries/compare_task.sql

+delete_temporary_db:
  td_ddl>:
  drop_databases: ["${td.database}"]


(Six binary files, the README screenshots images/1.png through images/6.png, are added in this commit; GitHub cannot render them in the diff view.)
6 changes: 6 additions & 0 deletions scenarios/monitoring/app/compare_workflow_task/queries/compare_task.sql
@@ -0,0 +1,6 @@
select
  fullname as task_name,
  ${td.last_results.query}
from ${td.tables.tasks}
group by 1
order by max(id)
5 changes: 5 additions & 0 deletions scenarios/monitoring/app/compare_workflow_task/queries/gen_query.sql
@@ -0,0 +1,5 @@
with temp1 as (
select 'max(If(attemptid=''' || attemptid || ''', DATE_DIFF(''second'', DATE_PARSE(startedat, ''%Y-%m-%dT%H:%i:%sZ''), DATE_PARSE(updatedat, ''%Y-%m-%dT%H:%i:%sZ'')), NULL)) as "' || attemptid || '"' as query_fragment from ${td.tables.tasks}
group by 1
)
select array_join(array_agg(query_fragment), ',') as query from temp1
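
These two queries implement a dynamic pivot: `+gen_query` runs gen_query.sql with `store_last_results: true`, producing one `max(If(...))` expression per attempt ID, and compare_task.sql then splices that stored string in via `${td.last_results.query}`. With the three sample attempt IDs, the query `+compare_task` ultimately executes expands to roughly the following (column order can vary, since `array_agg` gives no ordering guarantee):

```
select
  fullname as task_name,
  max(If(attemptid='1201247649', DATE_DIFF('second', DATE_PARSE(startedat, '%Y-%m-%dT%H:%i:%sZ'), DATE_PARSE(updatedat, '%Y-%m-%dT%H:%i:%sZ')), NULL)) as "1201247649",
  max(If(attemptid='1200176632', DATE_DIFF('second', DATE_PARSE(startedat, '%Y-%m-%dT%H:%i:%sZ'), DATE_PARSE(updatedat, '%Y-%m-%dT%H:%i:%sZ')), NULL)) as "1200176632",
  max(If(attemptid='1199185996', DATE_DIFF('second', DATE_PARSE(startedat, '%Y-%m-%dT%H:%i:%sZ'), DATE_PARSE(updatedat, '%Y-%m-%dT%H:%i:%sZ')), NULL)) as "1199185996"
from tasks
group by 1
order by max(id)
```

Each column is the wall-clock seconds between a task's `startedat` and `updatedat` timestamps for that attempt.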
48 changes: 48 additions & 0 deletions scenarios/monitoring/app/compare_workflow_task/scripts/ingest_task.py
@@ -0,0 +1,48 @@
import requests
import os
import pytd
import pandas as pd
import json

def convert_to_json(s):
    return json.dumps(s)

def get_task_info(base_url, headers, ids):
    # Fetch the task list of each attempt from the workflow REST API and
    # tag every task with the attempt id it came from.
    l = []
    for i in ids:
        url = base_url % i
        print(url)
        res = requests.get(url=url, headers=headers)
        if res.status_code != requests.codes.ok:
            res.raise_for_status()
        tasks = res.json()['tasks']
        for t in tasks:
            t['attemptid'] = i
        l.extend(tasks)
    return l

def insert_task_info(import_unixtime, endpoint, apikey, dest_db, dest_table, tasks):
    # Serialize nested JSON columns to strings so the uploaded table stays flat.
    df = pd.DataFrame(tasks)
    df['time'] = int(import_unixtime)
    df['config'] = df['config'].apply(convert_to_json)
    df['upstreams'] = df['upstreams'].apply(convert_to_json)
    df['exportParams'] = df['exportParams'].apply(convert_to_json)
    df['storeParams'] = df['storeParams'].apply(convert_to_json)
    df['stateParams'] = df['stateParams'].apply(convert_to_json)
    df['error'] = df['error'].apply(convert_to_json)
    client = pytd.Client(apikey=apikey, endpoint=endpoint, database=dest_db)
    client.load_table_from_dataframe(df, dest_table, if_exists='overwrite', fmt='msgpack')

def run(session_unixtime, dest_db, dest_table, attempt_ids,
        api_endpoint='api.treasuredata.com', workflow_endpoint='api-workflow.treasuredata.com'):
    id_list = attempt_ids.split(',')
    if len(id_list) == 0:
        print('no attempt id')
        return

    # GET https://<workflow_endpoint>/api/attempts/<attempt_id>/tasks
    workflow_url = 'https://%s/api/attempts' % workflow_endpoint + '/%s/tasks'
    headers = {'Authorization': 'TD1 %s' % os.environ['TD_API_KEY']}
    l = get_task_info(workflow_url, headers, id_list)
    if len(l) == 0:
        print('no update record')
        return
    insert_task_info(session_unixtime, 'https://%s' % api_endpoint,
                     os.environ['TD_API_KEY'], dest_db, dest_table, l)
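
The REST call the script makes can be reproduced by hand, which is useful for sanity-checking an attempt ID before configuring it; a sketch using the same endpoint and `TD1` authorization header the script constructs:

```
> curl -s -H "Authorization: TD1 $TD_API_KEY" \
    https://api-workflow.treasuredata.com/api/attempts/1201247649/tasks
```

The JSON response carries a `tasks` array whose timestamp fields end up as the `startedat` and `updatedat` columns that the queries above parse into durations.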
