718 integration test #158

Merged
merged 26 commits into from
Jan 27, 2025
26 commits
48216b3
Fix imports
AntonZogk Jan 24, 2025
57a65ef
Fix multi args for strata
AntonZogk Jan 24, 2025
3468145
Fix args in ratio of means and hard coded target
AntonZogk Jan 24, 2025
b6bc3f6
Add missing args current, revision periods
AntonZogk Jan 24, 2025
f19854e
Remove referencename column
AntonZogk Jan 24, 2025
048acc2
Remove convert to str code
AntonZogk Jan 24, 2025
a8632c1
Check for empty derived dfs
AntonZogk Jan 24, 2025
97b4f8c
Fix args for run_live_or_frozen call
AntonZogk Jan 24, 2025
accdb50
Add missing args for back data
AntonZogk Jan 24, 2025
cbd51e2
Add missing arg question_no
AntonZogk Jan 24, 2025
4606ecd
Feat main testing
AntonZogk Jan 24, 2025
4b51932
Fix typo in cp path
AntonZogk Jan 24, 2025
3b9f450
Force install rdsa-utils,boto,raz_client
AntonZogk Jan 24, 2025
a55f7f8
Use home directory as working directory for running test main
AntonZogk Jan 27, 2025
a53b931
Comment out rdsa,boto,raz
AntonZogk Jan 27, 2025
41d43ad
Insstall rdsa utils, raz
AntonZogk Jan 27, 2025
782c86c
Update pre-commit hook job
AntonZogk Jan 27, 2025
1420237
Install all dependencies
AntonZogk Jan 27, 2025
a862c0d
Fix trailing whitespace
AntonZogk Jan 27, 2025
ef959b2
Update workflow, permissions and hooks commit hash
AntonZogk Jan 27, 2025
7dbd18d
try short hash
robertswh Jan 27, 2025
61eb385
full hash
robertswh Jan 27, 2025
97bd10c
Update action hash
AntonZogk Jan 27, 2025
ca50f29
Use v3.0.0 for hooks
AntonZogk Jan 27, 2025
1e450c3
Use hash for hook action
AntonZogk Jan 27, 2025
c90e079
Run hooks
AntonZogk Jan 27, 2025
24 changes: 8 additions & 16 deletions .github/workflows/main.yaml
@@ -1,28 +1,20 @@
name: cml_runtimes

permissions:
contents: read
pull-requests: read
on:
# Triggers the workflow on pull requests to main branch
pull_request:
branches: [ main ]

jobs:
commit-hooks:
runs-on: ubuntu-20.04

pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- uses: actions/setup-python@v3
with:
python-version: 3.10.13

- name: Install Python dependencies
run: |
python -m pip install --upgrade pip
pip install .[dev]
@AntonZogk (Collaborator, Author) commented on Jan 27, 2025:

This step fails when forcing installation of rdsa-utils because the gssapi package is missing from the system (pythongssapi/requests-gssapi#14). The ubuntu-20.04 runner that the hooks used to run on doesn't have gssapi, so rdsa-utils couldn't be installed there.

I've updated the hooks workflow to something more reproducible that no longer requires installing the package in dev mode. We already test building the package in 4 versions of CDP, so the dev install isn't needed in this job.


- name: Check commit hooks
run: |
pre-commit run --all-files
- uses: actions/setup-python@v4
- uses: pre-commit/action@646c83fcd040023954eafda54b4db0192ce70507 # hash for v3.0.0

testing-cml:
runs-on: ubuntu-latest
10 changes: 7 additions & 3 deletions mbs_results/config.json
@@ -17,6 +17,10 @@
"sic_domain_mapping_path": "",
"threshold_filepath":"",

"back_data_type":"response_type",
"imputation_marker_col":"imputation_marker",


"period_selected": 202303,
"current_period" : 202303,
"previous_period" : 202302,
@@ -27,15 +31,15 @@
"calibration_factor": "calibration_factor",
"cell_number": "cell_no",
"design_weight": "design_weight",
"errormarker": "statusencoded",
"status": "statusencoded",
"form_id_idbr": "formtype",
"group": "calibration_group",
"calibration_group": "calibration_group",
"period": "period",
"question_no": "questioncode",
"reference": "reference",
"region": "region",
"sampled": "sampled",
"sampled": "is_sampled",
"state": "frozen",
"strata": "cell_no",
"target": "adjustedresponse",
@@ -154,6 +158,6 @@
"13":"fir"
},

"additional_outputs":["create_imputation_link_output"]
"additional_outputs":[]

}
4 changes: 1 addition & 3 deletions mbs_results/estimation/apply_estimation.py
@@ -100,9 +100,7 @@ def apply_estimation(

estimation_df = pd.concat(estimation_df_list, ignore_index=True)

create_population_count_output(
estimation_df, period, calibration_group, save_output=True, **config
)
create_population_count_output(estimation_df, period, save_output=True, **config)

# validate_estimation(estimation_df, **config)

6 changes: 5 additions & 1 deletion mbs_results/imputation/impute.py
@@ -51,6 +51,9 @@ def impute(dataframe: pd.DataFrame, config: dict) -> pd.DataFrame:
reference=config["reference"],
target=config["target"],
period=config["period"],
current_period=config["current_period"],
revision_period=config["revision_period"],
question_no=config["question_no"],
strata="imputation_class",
auxiliary=config["auxiliary"],
)
@@ -66,12 +69,13 @@ def impute(dataframe: pd.DataFrame, config: dict) -> pd.DataFrame:
question_no=config["question_no"],
spp_form_id=config["form_id_spp"],
)
target = config["target"]

post_constrain["imputed_and_derived_flag"] = post_constrain.apply(
lambda row: (
"d"
if "sum" in str(row["constrain_marker"]).lower()
else row["imputation_flags_adjusted_value"]
else row[f"imputation_flags_{target}"]
),
axis=1,
)
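For context, here is a minimal sketch of what the hunk above changes: the flag column is now looked up from the configured `target` instead of the hard-coded `adjusted_value` suffix. The column names `constrain_marker`, `imputed_and_derived_flag` and `imputation_flags_<target>` come from the diff; the helper name `add_imputed_and_derived_flag` and the sample rows are hypothetical and only illustrate the idea.

```python
import pandas as pd


def add_imputed_and_derived_flag(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper mirroring the lambda in impute.py."""
    # Build the flag column name from the target, as the diff now does.
    flag_col = f"imputation_flags_{target}"
    out = df.copy()
    out["imputed_and_derived_flag"] = out.apply(
        lambda row: (
            # Rows whose constrain marker mentions a sum are marked derived.
            "d"
            if "sum" in str(row["constrain_marker"]).lower()
            else row[flag_col]
        ),
        axis=1,
    )
    return out


# Made-up sample data, not taken from the pipeline.
example = pd.DataFrame(
    {
        "constrain_marker": ["Derived = sum(46, 47)", None, None],
        "imputation_flags_adjustedresponse": ["r", "fir", "c"],
    }
)
result = add_imputed_and_derived_flag(example, target="adjustedresponse")
```

With `"target": "adjustedresponse"` in `config.json`, the old hard-coded `imputation_flags_adjusted_value` lookup would presumably have raised a `KeyError`, which is what the f-string avoids.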
26 changes: 12 additions & 14 deletions mbs_results/imputation/ratio_of_means.py
@@ -341,8 +341,9 @@ def ratio_of_means(
reference: str,
strata: str,
auxiliary: str,
current_period: str,
revision_period: str,
current_period: int,
revision_period: int,
question_no: str,
filters: pd.DataFrame = None,
manual_constructions: pd.DataFrame = None,
imputation_links: Dict[str, str] = {},
@@ -372,6 +373,12 @@ def ratio_of_means(
Column name containing strata information (sic).
auxiliary : str
Column name containing auxiliary information (sic).
current_period: int
Value with current period to be imputed as int.
revision_period: int
Value containing the amount of periods for imputation.
question_no: str
Column name containing question_no
filters : pd.DataFrame, optional
Dataframe with values to exclude from imputation method.
manual_constructions : pd.DataFrame, optional
@@ -429,7 +436,9 @@ def ratio_of_means(

if manual_constructions is not None:
# Need to join mc dataframe to original df
df = join_manual_constructions(df, manual_constructions, reference, period)
df = join_manual_constructions(
df, manual_constructions, reference, period, question_no
)

if f"{target}_man" in df.columns:
# Manual Construction
@@ -510,14 +519,3 @@ def calculate_back_data_period(current_period, revision_period) -> str:
(current_period - pd.DateOffset(months=revision_period)).date().strftime("%Y%m")
)
return back_data_period


if __name__ == "__main__":
from mbs_results.utilities.inputs import load_config

config = load_config()
bdp = calculate_back_data_period(
current_period=config["current_period"],
revision_period=config["revision_period"],
)
print(config["current_period"], bdp)
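Since `current_period` and `revision_period` are now typed as `int`, a rough sketch of the back-data calculation may help when reading this hunk. Only the final `DateOffset` subtraction and `strftime("%Y%m")` are visible in the diff; the conversion from a `YYYYMM` integer and the function name `back_data_period_sketch` are assumptions made for illustration, not the module's actual code.

```python
import pandas as pd


def back_data_period_sketch(current_period: int, revision_period: int) -> str:
    """Hypothetical stand-in for calculate_back_data_period."""
    # Assumption: periods are YYYYMM integers such as 202303.
    current = pd.to_datetime(str(current_period), format="%Y%m")
    # Step back by the number of revision months, as in the visible hunk.
    back = current - pd.DateOffset(months=revision_period)
    return back.strftime("%Y%m")


# Example values chosen for illustration only.
bdp = back_data_period_sketch(current_period=202303, revision_period=6)
```

The month arithmetic is delegated to `pd.DateOffset`, so year boundaries (for example January minus one month) are handled without manual integer arithmetic on the `YYYYMM` value.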
1 change: 0 additions & 1 deletion mbs_results/outputs/produce_additional_outputs.py
@@ -52,7 +52,6 @@ def get_additional_outputs_df(
"response",
"froempment",
"cell_no",
"referencename",
"imputation_flags_adjustedresponse",
"f_link_adjustedresponse",
"b_link_adjustedresponse",
5 changes: 3 additions & 2 deletions mbs_results/outputs/weighted_adj_val_time_series.py
@@ -1,7 +1,8 @@
import numpy as np
import pandas as pd
from staging.merge_domain import merge_domain
from utilities.utils import convert_column_to_datetime

from mbs_results.staging.merge_domain import merge_domain
from mbs_results.utilities.utils import convert_column_to_datetime


def get_weighted_adj_val_time_series(
5 changes: 2 additions & 3 deletions mbs_results/staging/data_cleaning.py
@@ -171,7 +171,6 @@ def load_manual_constructions(
manual_constructions[period] = convert_column_to_datetime(
manual_constructions[period]
)
manual_constructions[reference] = manual_constructions[reference].astype("str")
manual_constructions.set_index([reference, period], inplace=True)

validate_manual_constructions(df, manual_constructions)
@@ -186,7 +185,7 @@ def join_manual_constructions(
manual_constructions: pd.DataFrame,
reference: str,
period: str,
question_no: str = "question_no",
question_no: str,
**config,
):
"""
@@ -205,7 +204,7 @@ def join_manual_constructions(
the name of the reference column
period: str
the name of the period column
period: str
question_no: str
the name of the question number column
**config: Dict
main pipeline configuration. Can be used to input the entire config dictionary
2 changes: 1 addition & 1 deletion mbs_results/staging/stage_dataframe.py
@@ -184,7 +184,7 @@ def stage_dataframe(config: dict) -> pd.DataFrame:
df = run_live_or_frozen(
df,
config["target"],
error_marker=config["errormarker"],
status=config["status"],
state=config["state"],
error_values=[201],
)
47 changes: 30 additions & 17 deletions mbs_results/utilities/constrains.py
@@ -1,4 +1,5 @@
import operator
import warnings
from typing import List

import pandas as pd
@@ -162,14 +163,21 @@ def constrain(
)
pre_derive_df = pre_derive_df[[target]]

derived_values = pd.concat(
[
sum_sub_df(pre_derive_df.loc[form_type], derives["from"])
.assign(**{question_no: derives["derive"]})
.assign(**{spp_form_id: form_type})
for form_type, derives in derive_map.items()
]
)
derived_values_list = [
sum_sub_df(pre_derive_df.loc[form_type], derives["from"])
.assign(**{question_no: derives["derive"]})
.assign(**{spp_form_id: form_type})
for form_type, derives in derive_map.items()
]

if derived_values_list:

derived_values = pd.concat(derived_values_list)

else:
warnings.warn("No derived questions created")
derived_values = pd.DataFrame(columns=["constrain_marker"])

unique_q_numbers = df[question_no].unique()
df.set_index([question_no, period, reference], inplace=True)

@@ -238,15 +246,20 @@ def derive_questions(
# Assuming default value of o-weight is 1
pre_derive_df = pre_derive_df[[target]].fillna(value=0)

derived_values = pd.concat(
[
sum_sub_df(pre_derive_df.loc[form_type], derives["from"])
.assign(**{question_no: derives["derive"]})
.assign(**{spp_form_id: form_type})
# Create a task on Backlog to fix this.
for form_type, derives in derive_map.items()
]
)
derived_values_list = [
sum_sub_df(pre_derive_df.loc[form_type], derives["from"])
.assign(**{question_no: derives["derive"]})
.assign(**{spp_form_id: form_type})
# Create a task on Backlog to fix this.
for form_type, derives in derive_map.items()
]
if derived_values_list:
derived_values = pd.concat(derived_values_list)

else:
warnings.warn("No derived questions created")
derived_values = pd.DataFrame(columns=["constrain_marker"])

unique_q_numbers = df[question_no].unique()

df.set_index([question_no, period, reference], inplace=True)
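Both hunks in this file apply the same guard, which can be sketched in isolation as follows. `pd.concat` raises a `ValueError` when given an empty list, so the list is built first and only concatenated when it has entries. The helper name `concat_or_empty` is hypothetical; the warning text and the `constrain_marker` fallback column are taken from the diff.

```python
import warnings
from typing import List

import pandas as pd


def concat_or_empty(frames: List[pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper showing the empty-list guard used in constrains.py."""
    if frames:
        return pd.concat(frames)
    # Nothing to concatenate: warn and return an empty frame that still
    # carries the column later code expects to find.
    warnings.warn("No derived questions created")
    return pd.DataFrame(columns=["constrain_marker"])


# An empty derive map yields an empty list, which previously broke pd.concat.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_result = concat_or_empty([])
```

Factoring the guard into one helper would also remove the duplication between `constrain` and `derive_questions`, though that refactor is outside the scope of this PR.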
6 changes: 3 additions & 3 deletions setup.cfg
@@ -25,9 +25,9 @@ install_requires =
pyyaml
pandas
numpy
# rdsa-utils
# raz-client
# boto3
rdsa-utils
raz-client
boto3
python_requires = >=3.6
zip_safe = no

Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
cell_no,calibration_group
999,9999
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
classification,question_no,l_value
99999,40,9999999
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
classification,sic_5_digit
99999,999
2 changes: 2 additions & 0 deletions tests/data/test_main/input/test_cp_009_202112.csv
@@ -0,0 +1,2 @@
period,reference,form_type,sic92,error_mkr,response_type
202112,1,ZZZ,45310,O,1
1 change: 1 addition & 0 deletions tests/data/test_main/input/test_finalsel009_202112
@@ -0,0 +1 @@
1::999:999:999:999: 999: 999: 999: 999: 999.99: 999: 999: 999:999:999:ZZZ :ZZZ :ZZZ : 999: 999: 999:999:ZZZ:ZZZ:01/01/1900 :ZZZ ZZ : : :ZZZ ZZ : : :99 ZZZ ZZ :ZZZ :ZZZ ZZZ : : :ZZ9 9ZZ :ZZ ZZZ ZZZ : : :ZZZ ZZZ :9999 :9999 :Z: : 999:9999:* :Z
1 change: 1 addition & 0 deletions tests/data/test_main/input/test_finalsel009_202201
@@ -0,0 +1 @@
1::999:999:999:999: 999: 999: 999: 999: 999.99: 999: 999: 999:999:999:ZZZ :ZZZ :ZZZ : 999: 999: 999:999:ZZZ:ZZZ:01/01/1900 :ZZZ ZZ : : :ZZZ ZZ : : :99 ZZZ ZZ :ZZZ :ZZZ ZZZ : : :ZZ9 9ZZ :ZZ ZZZ ZZZ : : :ZZZ ZZZ :9999 :9999 :Z: : 999:9999:* :Z
1 change: 1 addition & 0 deletions tests/data/test_main/input/test_finalsel009_202202
@@ -0,0 +1 @@
1::999:999:999:999: 999: 999: 999: 999: 999.99: 999: 999: 999:999:999:ZZZ :ZZZ :ZZZ : 999: 999: 999:999:ZZZ:ZZZ:01/01/1900 :ZZZ ZZ : : :ZZZ ZZ : : :99 ZZZ ZZ :ZZZ :ZZZ ZZZ : : :ZZ9 9ZZ :ZZ ZZZ ZZZ : : :ZZZ ZZZ :9999 :9999 :Z: : 999:9999:* :Z
1 change: 1 addition & 0 deletions tests/data/test_main/input/test_finalsel009_202203
@@ -0,0 +1 @@
1::999:999:999:999: 999: 999: 999: 999: 999.99: 999: 999: 999:999:999:ZZZ :ZZZ :ZZZ : 999: 999: 999:999:ZZZ:ZZZ:01/01/1900 :ZZZ ZZ : : :ZZZ ZZ : : :99 ZZZ ZZ :ZZZ :ZZZ ZZZ : : :ZZ9 9ZZ :ZZ ZZZ ZZZ : : :ZZZ ZZZ :9999 :9999 :Z: : 999:9999:* :Z
1 change: 1 addition & 0 deletions tests/data/test_main/input/test_finalsel009_202204
@@ -0,0 +1 @@
1::999:999:999:999: 999: 999: 999: 999: 999.99: 999: 999: 999:999:999:ZZZ :ZZZ :ZZZ : 999: 999: 999:999:ZZZ:ZZZ:01/01/1900 :ZZZ ZZ : : :ZZZ ZZ : : :99 ZZZ ZZ :ZZZ :ZZZ ZZZ : : :ZZ9 9ZZ :ZZ ZZZ ZZZ : : :ZZZ ZZZ :9999 :9999 :Z: : 999:9999:* :Z
1 change: 1 addition & 0 deletions tests/data/test_main/input/test_finalsel009_202205
@@ -0,0 +1 @@
1::999:999:999:999: 999: 999: 999: 999: 999.99: 999: 999: 999:999:999:ZZZ :ZZZ :ZZZ : 999: 999: 999:999:ZZZ:ZZZ:01/01/1900 :ZZZ ZZ : : :ZZZ ZZ : : :99 ZZZ ZZ :ZZZ :ZZZ ZZZ : : :ZZ9 9ZZ :ZZ ZZZ ZZZ : : :ZZZ ZZZ :9999 :9999 :Z: : 999:9999:* :Z
1 change: 1 addition & 0 deletions tests/data/test_main/input/test_finalsel009_202206
@@ -0,0 +1 @@
1::999:999:999:999: 999: 999: 999: 999: 999.99: 999: 999: 999:999:999:ZZZ :ZZZ :ZZZ : 999: 999: 999:999:ZZZ:ZZZ:01/01/1900 :ZZZ ZZ : : :ZZZ ZZ : : :99 ZZZ ZZ :ZZZ :ZZZ ZZZ : : :ZZ9 9ZZ :ZZ ZZZ ZZZ : : :ZZZ ZZZ :9999 :9999 :Z: : 999:9999:* :Z
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
form,domain,threshold,IDBR_form
2 changes: 2 additions & 0 deletions tests/data/test_main/input/test_manual_constructions.csv
@@ -0,0 +1,2 @@
period,reference,questioncode,adjustedresponse
202204,1,40,888
2 changes: 2 additions & 0 deletions tests/data/test_main/input/test_qv_009_202112.csv
@@ -0,0 +1,2 @@
period,reference,question_no,returned_value,adjusted_value,instance
202112,1,40,999,999,999
1 change: 1 addition & 0 deletions tests/data/test_main/input/test_sic_domain_mapping.csv
@@ -0,0 +1 @@
sic_5_digit,domain