Commit
Feature 342 tcdiag tcmpr plotter (#446)
* Issue #383 modifications to support multiple plot types, list_stat_1 values

* issue #383 modifications to support generating more than one plot type

* issue #383 provide supporting plotting multiple plot types with single config file

* issue #383 provide supporting plotting multiple plot types with single config file

* issue #383 provide supporting plotting multiple plot types with single config file

* clean up comments

* issue #383 provide supporting plotting multiple plot types with single config file

* issue #383 provide supporting plotting multiple plot types with single config file

* issue #383 provide supporting plotting multiple plot types with single config file

* issue #383 provide supporting plotting multiple plot types with single config file

* fix colors and legends

* issue #383 add logging to replace printing to stdout

* issue #383 add logging support

* issue #383 added logging support

* issue #383 modifications to support plotting multiple plot types using one config file and logging to replace printing to stdout

* issue #383 modifications to support multiple plot types in one config file, logging support added to replace print statements

* issue #383 modifications to support multiple plot types in a single config file and logging to replace print statements

* issue #383 modifications to support multiple plot types in one config file and logging to replace print statements

* issue #383 replace print statements with logging

* Issue #383 logging support

* issue #383 modifications to support multiple plot types in a single config file and logging replaces printing to stdout

* issue #383 modifications to support plotting multiple plot types defined in a single config file

* issue #383 modifications to support multiple plot types in a single config file and added logging

* issue #383 add more checking and modifications to support multiple plot types defined in a single config file.  Also added logging

* issue #383 modifications to support plotting multiple plot types defined in a single config file and added logging

* issue #383 update the plot name to include the plot type

* issue #383 TCMPR plot documentation initial content

* changed file permissions

* Added tcmpr_plots to the Table of Contents

* issue #383 plot images added for TCMPR plotter

* Updated plot to match updates to User's Guide

* updates to match User's Guide

* Update the config files for box plot and relperf plots to create only the TK_ERR plot so the y-axis can be more specific.

* Delete docs/Users_Guide/figure/RELPERF_SAMPLE_DATA_ABS(AMAX_WIND-BMAX_WIND)_relperf.png

not relevant

* Delete docs/Users_Guide/figure/BOXPLOT_SAMPLE_DATA_ABS(AMAX_WIND-BMAX_WIND)_boxplot.png

not relevant

* sample data for TCMPR plotter

* issue #383 config for all seven plot types

* issue #383 removed unused figure, replaced with TK_ERR figure for boxplot

* issue #383 removed hard-coded paths

* issue #383 rearrange content for clarity

* issue #383 fix incomplete sentences

* issue #383 added the baseline_file and column_info_file

* issue #383 added instructions for the baseline_file and column_info_file settings

* issue #383 basic system tests for TCMPR plotting

* System tests for TCMPR plotting

* Issue #383 include the tcmpr plotting system tests

* issue #383 explicitly set hfip_bsln to 'no'

* explicitly set hfip_bsln to no in testing

* change comparison syntax for hfip_bsln check

* change file size testing

* change file size testing with assert False for mismatch

* change file size testing-check mean line plots

* comment out file size testing, they are not consistent when run inside containers

* issue #383 Explicitly state that the TCMPR data must have all columns labelled

* issue #342 added two more settings to accommodate plotting for TCDiag data

* Add support for creating line plot

* Support for reading in the tcst reformatted file

* replace printing to stdout with logging

* create log directory if one doesn't already exist

* Added two more settings for the point plot to support generating line plot for TCDiag data

* plot_list replaced with plot_type_list

* Added two settings to support line plot for TCDiag data

* modify formatting

* Refactor to allow user to create either a scatter or line plot.  Decrease opacity to enable better visualization of overlapping points

* reformatted TCDiag from TC-Pairs output, to be used for testing

* Config file for generating TCDiag simple time series for all forecasts and a single initialization

* check for identical length of x- and y-values for line plot

* pull the plot-generating code out of main and into its own function to make this more usable for METplus use cases

* Clean up comments and formatting of the create_plot method

* replace existing logic with code that behaves more like unix mkdir -p to create directories for output

* Delete plots/config directory

* Update tcmpr_config.py

Fixing SonarQube complaint

---------

Co-authored-by: Hank Fisher <[email protected]>
bikegeek and hankenstein2 authored May 29, 2024
1 parent c64b0a0 commit 5c3cb76
Showing 7 changed files with 482 additions and 52 deletions.
2 changes: 2 additions & 0 deletions metplotpy/plots/config/tcmpr_defaults.yaml
@@ -145,3 +145,5 @@ subtitle: ''
prefix:
baseline_file: ./hfip_baseline.dat
column_info_file: ./plot_tcmpr_hdr.dat
is_tcdiag_linetype: False
connect_points: False
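
The two new defaults above are booleans that a user config can override. A minimal sketch of how such settings might be merged and interpreted follows; `parse_bool` is a hypothetical helper written for illustration and is not the `_get_bool` method used by `TcmprConfig`, whose behavior is not shown in this commit.

```python
def parse_bool(value) -> bool:
    """Interpret a YAML-style boolean (hypothetical helper, not METplotpy API)."""
    if isinstance(value, bool):
        return value
    # Assumption: strings such as 'True', 'yes' or '1' should count as true.
    return str(value).strip().lower() in ('true', 'yes', '1')


# Defaults as they appear in tcmpr_defaults.yaml after this commit.
defaults = {'is_tcdiag_linetype': False, 'connect_points': False}

# Illustrative user overrides; these values are made up for the example.
user_config = {'is_tcdiag_linetype': 'True', 'connect_points': True}

# User settings take precedence over the defaults.
settings = {**defaults, **user_config}

is_tcdiag = parse_bool(settings['is_tcdiag_linetype'])
connect_points = parse_bool(settings['connect_points'])

# connect_points selects between a line plot and a points-only plot.
mode = 'lines+markers' if connect_points else 'markers'
print(is_tcdiag, connect_points, mode)
```

With both settings left at their defaults, the resulting mode would be `'markers'`, that is, a points-only plot.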
128 changes: 93 additions & 35 deletions metplotpy/plots/tcmpr_plots/box/tcmpr_point.py
@@ -1,37 +1,45 @@
import os
from datetime import datetime

import plotly.graph_objects as go

from metplotpy.plots import util
from metplotpy.plots.tcmpr_plots.box.tcmpr_box_point import TcmprBoxPoint
from metplotpy.plots.tcmpr_plots.tcmpr_series import TcmprSeries


class TcmprPoint(TcmprBoxPoint):
def __init__(self, config_obj, column_info, col, case_data, input_df, baseline_data):
super().__init__(config_obj, column_info, col, case_data, input_df, baseline_data)
print("--------------------------------------------------------")
print(f"Plotting POINT time series by {self.config_obj.series_val_names[0]}")
def __init__(self, config_obj, column_info, col, case_data, input_df, baseline_data, stat_name):
super().__init__(config_obj, column_info, col, case_data, input_df, baseline_data, stat_name)
# Set up Logging
self.point_logger = util.get_common_logger(self.config_obj.log_level, self.config_obj.log_filename)

self._adjust_titles()
self.series_list = self._create_series(self.input_df)
self.point_logger.info("--------------------------------------------------------")
self.point_logger.info(f"Plotting POINT time series by {self.config_obj.series_val_names[0]}")
start = datetime.now()

self._adjust_titles(stat_name)
self.series_list = self._create_series(self.input_df, stat_name)
self.case_data = None
self.cur_baseline = baseline_data['cur_baseline']
self.cur_baseline_data = baseline_data['cur_baseline_data']
self._init_hfip_baseline_for_plot()

if self.config_obj.prefix is None or len(self.config_obj.prefix) == 0:
self.plot_filename = f"{self.config_obj.plot_dir}{os.path.sep}{self.config_obj.list_stat_1[0]}_pointplot.png"
self.plot_filename = f"{self.config_obj.plot_dir}{os.path.sep}{stat_name}_pointplot.png"
else:
self.plot_filename = f"{self.config_obj.plot_dir}{os.path.sep}{self.config_obj.prefix}_pointplot.png"
self.plot_filename = f"{self.config_obj.plot_dir}{os.path.sep}{self.config_obj.prefix}_{stat_name}_pointplot.png"
# remove the old file if it exists

# remove the old file if it exists
if os.path.exists(self.plot_filename):
os.remove(self.plot_filename)
self._create_figure()

def _adjust_titles(self):
self.point_logger.info(f"Finished generating the TCMPR points in {datetime.now() - start}")

def _adjust_titles(self, stat_name):
if self.yaxis_1 is None or len(self.yaxis_1) == 0:
self.yaxis_1 = self.config_obj.list_stat_1[0] + '(' + self.col['units'] + ')'
self.yaxis_1 = stat_name + '(' + self.col['units'] + ')'

if self.title is None or len(self.title) == 0:
self.title = 'Point Plots of ' + self.col['desc'] + ' by ' \
@@ -57,28 +65,78 @@ def _draw_series(self, series: TcmprSeries) -> None:
boxpoints = 'all'

# create a trace
self.figure.add_trace(
go.Box(x=series.series_data['LEAD_HR'],
y=series.series_data['PLOT'],
mean=series.series_points['mean'],
notched=self.config_obj.box_notch,
line=line_color,
fillcolor=fillcolor,
name=series.user_legends,
showlegend=True,
# quartilemethod='linear', #"exclusive", "inclusive", or "linear"
boxmean=self.config_obj.box_avg,
boxpoints=boxpoints, # outliers, all, False
pointpos=0,
marker=dict(size=4,
color=marker_color,

# point plot, or line plot when connect_points is True in the config file
if 'point' in self.config_obj.plot_type_list:
if self.config_obj.connect_points:
# line plot
mode = 'lines+markers'
else:
# points only
mode = 'markers'
# Create a point plot

# Ensure that the size of the list of x and y values
# are the same, or the resulting plot will be incorrect.
# This mismatch occurs when the x_list represents the
# available lead hours in the series data and the
# series_points has None where there isn't data corresponding
# to lead hours in the series_points dataframe.
#
y_list = series.series_points['mean']
x_list = series.series_data['LEAD_HR']
if len(x_list) != len(y_list):
# Clean up None values in the series.series_points['mean'] list
# The None values are assigned by the _create_series_points() method.
y_list = [y_values for y_values in y_list if y_values is not None]

self.figure.add_trace(
go.Scatter(x=x_list,
y=y_list,
showlegend=True,
mode=mode,
name=self.config_obj.user_legends[series.idx],
marker=dict(
color=marker_line_color,
size=8,
opacity=0.7,
line=dict(
width=1,
color=marker_line_color
),
symbol=marker_symbol,
),
jitter=0
),
secondary_y=series.y_axis != 1
)
color=self.config_obj.colors_list[series.idx],
width=1
)
),
),
secondary_y=series.y_axis != 1
)

# When a line plot is requested, connect any gaps
if self.config_obj.connect_points:
self.figure.update_traces(connectgaps=True)

else:
# Boxplot
self.figure.add_trace(
go.Box(x=series.series_data['LEAD_HR'],
y=series.series_data['PLOT'],
mean=series.series_points['mean'],
notched=self.config_obj.box_notch,
line=line_color,
fillcolor=fillcolor,
name=series.user_legends,
showlegend=True,
boxmean=self.config_obj.box_avg,
boxpoints=boxpoints, # outliers, all, False
pointpos=0,
marker=dict(size=4,
color=marker_color,
line=dict(
width=1,
color=marker_line_color
),
symbol=marker_symbol,
),
jitter=0
),
secondary_y=series.y_axis != 1
)
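
The length check in `_draw_series` can be illustrated in isolation. The sketch below mirrors the filtering shown in the diff using plain lists; `reconcile_xy` is a hypothetical name, and the final `ValueError` is an extra safeguard added here as an assumption, not something the commit does.

```python
def reconcile_xy(x_values, y_values):
    """Drop None placeholders from y when the x and y lengths disagree.

    Hypothetical helper mirroring the list comprehension in the diff.
    """
    x_list = list(x_values)
    y_list = list(y_values)
    if len(x_list) != len(y_list):
        # None marks a lead hour with no corresponding mean value.
        y_list = [y for y in y_list if y is not None]
    if len(x_list) != len(y_list):
        # Assumption: fail loudly rather than draw a misaligned plot.
        raise ValueError(
            f"x has {len(x_list)} values but y has {len(y_list)} after cleanup")
    return x_list, y_list


# Made-up sample values: three lead hours, four means with one gap.
lead_hours = [0, 12, 24]
means = [1.5, None, 2.0, 3.1]
print(reconcile_xy(lead_hours, means))
```

Filtering only when the lengths differ keeps any `None` values in place when the lists already align, which is the case in which `connectgaps=True` is useful for a line plot.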

54 changes: 41 additions & 13 deletions metplotpy/plots/tcmpr_plots/tcmpr.py
@@ -420,8 +420,11 @@ def save_to_file(self):

# Create the directory for the output plot if it doesn't already exist
dirname = os.path.dirname(os.path.abspath(self.plot_filename))
if not os.path.exists(dirname):
os.mkdir(dirname)
try:
os.makedirs(dirname, exist_ok=True)
except FileExistsError:
pass

self.logger.info(f'Saving the image file: {self.plot_filename}')
if self.figure:
try:
@@ -511,8 +514,6 @@ def perform_event_equalization(input_df:pd.DataFrame, is_skill:bool, config_obj:
return output_data




def main(config_filename=None):
"""
Generates a sample, default, TCMPR plot using a combination of
@@ -553,6 +554,24 @@ def main(config_filename=None):

config_obj = TcmprConfig(docs)

# Create the requested plot(s)
create_plot(config_obj)


def create_plot(config_obj: dict) -> None:
"""
Generates one or more TCMPR plots. Event equalization is performed if
it is requested by a setting in the yaml configuration file.
Args:
@param config_obj: The config object containing all the necessary information obtained
from the yaml configuration file.
Returns: None, creates one or more plots as specified in the yaml config file
"""

# Find input files, they must have the .tcst extension and filename must have
# the prefix "tc_pairs" (e.g. tc_pairs_gfso_20220401.tcst)
tcst_files = []
# list all .tcst files in tcst_dir
if config_obj.tcst_dir is not None and len(config_obj.tcst_dir) > 0 and os.path.exists(config_obj.tcst_dir):
@@ -566,7 +585,9 @@ def main(config_filename=None):
input_df = orig_input_df.copy(deep=True)

# Define a demo and retro column
# TODO these values never get used comment out for now

# Note: Currently not supported, leave commented out for now.

# input_df = orig_input_df.copy(deep=True)
# if config_obj.demo_yr is not None and config_obj.demo_yr != 'NA':
# demo_yr_obj = datetime.strptime(str(config_obj.demo_yr), '%Y')
@@ -579,18 +600,19 @@ def main(config_filename=None):
quotechar='"', skipinitialspace=True, encoding='utf-8')

logger = util.get_common_logger(config_obj.log_level, config_obj.log_filename)
\
for plot_type in config_obj.plot_type_list:

# Apply event equalization, if requested
# Event equalization is different for the skill_mn and skill_md
is_skill = False
if config_obj.use_ee:
if plot_type == 'skill_mn' or plot_type == 'skill_md':
is_skill = True
# perform event equalization on the skill_mn|skill_md plot type
logger.info(f"Perform event equalization for {plot_type}: {datetime.now()}")
output_result = perform_event_equalization(orig_input_df, is_skill, config_obj)
input_df = output_result
is_skill = True
# perform event equalization on the skill_mn|skill_md plot type
logger.info(f"Perform event equalization for {plot_type}: {datetime.now()}")
output_result = perform_event_equalization(orig_input_df, is_skill, config_obj)
input_df = output_result
else:
logger.info(f"Perform event equalization for {plot_type}: {datetime.now()}")
output_result = perform_event_equalization(orig_input_df, is_skill, config_obj)
@@ -641,7 +663,7 @@ def main(config_filename=None):
elif plot_type == 'skill_mn':
from metplotpy.plots.tcmpr_plots.skill.mean.tcmpr_skill_mean import TcmprSkillMean
plot = TcmprSkillMean(config_obj, column_info, col_to_plot, common_case_data, input_df,
cur_stat, baseline_data)
cur_stat, baseline_data)
elif plot_type == 'skill_md':
from metplotpy.plots.tcmpr_plots.skill.median.tcmpr_skill_median import TcmprSkillMedian
plot = TcmprSkillMedian(config_obj, column_info, col_to_plot, common_case_data, input_df, cur_stat)
@@ -683,7 +705,10 @@ def read_tcst_files(config_obj, tcst_files):
for file in tcst_files:
if os.path.exists(file):
print(f'Reading track data:{file}')
file_df = pd.read_csv(file, sep=r'\s+|;|:', header='infer', engine="python")
if config_obj.is_tcdiag:
file_df = pd.read_csv(file, sep='\t')
else:
file_df = pd.read_csv(file, sep=r'\s+|;|:', header='infer', engine="python")
file_df['LEAD_HR'] = file_df['LEAD'] / 10000
file_df['LEAD_HR'] = file_df['LEAD_HR'].astype('int')
all_filters = []
@@ -704,7 +729,10 @@ def read_tcst_files(config_obj, tcst_files):
# use numpy to select the rows where any record evaluates to True
mask = np.array(all_filters).all(axis=0)

file_df['VALID_TIME'] = pd.to_datetime(file_df['VALID'], format='%Y%m%d_%H%M%S') # 20170417_060000
if config_obj.is_tcdiag:
file_df['VALID_TIME'] = file_df['VALID']
else:
file_df['VALID_TIME'] = pd.to_datetime(file_df['VALID'], format='%Y%m%d_%H%M%S') # 20170417_060000
# Define a case column
file_df['equalize'] = file_df.loc[:, 'BMODEL'].astype(str) \
+ ':' + file_df.loc[:, 'STORM_ID'].astype(str) \
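
The `LEAD_HR` and `VALID_TIME` derivations in `read_tcst_files` can be illustrated without pandas. This is a minimal sketch under the assumption that `LEAD` is encoded as HHMMSS (so 120000 means 12 hours); the `VALID` format is taken from the format string and sample value in the diff. Both function names are hypothetical.

```python
from datetime import datetime


def lead_to_hours(lead: int) -> int:
    """Convert a lead time to whole hours, as LEAD / 10000 cast to int."""
    # Assumption: lead is HHMMSS-encoded, e.g. 120000 for a 12 hour lead.
    return int(lead / 10000)


def parse_valid_time(valid: str) -> datetime:
    """Parse a VALID string such as '20170417_060000'."""
    return datetime.strptime(valid, '%Y%m%d_%H%M%S')


print(lead_to_hours(120000))
print(parse_valid_time('20170417_060000'))
```

For TCDiag input the commit copies `VALID` into `VALID_TIME` unchanged, so this parsing step applies only to the non-TCDiag branch.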
7 changes: 6 additions & 1 deletion metplotpy/plots/tcmpr_plots/tcmpr_config.py
@@ -28,6 +28,7 @@ class TcmprConfig(Config):
Prepares and organises Line plot parameters
"""
SUPPORTED_PLOT_TYPES = ['boxplot', 'point', 'mean', 'median', 'relperf', 'rank', 'skill_mn', 'skill_md']

def __init__(self, parameters: dict) -> None:
""" Reads in the plot settings from a box plot config file.
@@ -37,6 +38,9 @@ def __init__(self, parameters: dict) -> None:
"""
super().__init__(parameters)

self.is_tcdiag = self._get_bool('is_tcdiag_linetype')
self.connect_points = self._get_bool('connect_points')

# Logging
self.log_filename = self.get_config_value('log_filename')
self.log_level = self.get_config_value('log_level')
@@ -251,7 +255,8 @@ def _get_hfip_bsln(self) -> str:
"""

hfip_bsln = str(self.get_config_value('hfip_bsln'))
hfip_bsln = hfip_bsln.lower()
hfip_bsln_lower = hfip_bsln.lower()


# Validate that hfip_bsln is one of the following; (no, 0, 5, 10 year goal)
supported_bsln = ['no', '0', '5', '10']
12 changes: 9 additions & 3 deletions metplotpy/plots/util.py
@@ -13,9 +13,8 @@
__author__ = 'Minna Win'

import argparse
from typing import Tuple
import sys
import getpass
import os
import logging
import gc
import re
@@ -316,6 +315,13 @@ def get_common_logger(log_level, log_filename):
currently in use by a plot type.
'''

# If directory for logfile doesn't exist, create it
log_dir = os.path.dirname(log_filename)
try:
os.makedirs(log_dir, exist_ok=True)
except OSError:
pass

# Supported log levels.
log_level = log_level.upper()
log_levels = {'DEBUG': logging.DEBUG, 'INFO': logging.INFO,
@@ -338,7 +344,7 @@ def get_common_logger(log_level, log_filename):
datefmt='%Y-%m-%d %H:%M:%S',
filename=log_filename,
filemode='w')
mpl_logger = logging.getLogger(name='matplotlib').setLevel(logging.CRITICAL)
logging.getLogger(name='matplotlib').setLevel(logging.CRITICAL)
common_logger = logging.getLogger(__name__)
f = cf()
common_logger.addFilter(f)
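
The `mkdir -p` style directory creation added to `get_common_logger` can be sketched on its own. One detail worth noting: `os.path.dirname` returns an empty string for a bare filename, and `os.makedirs('')` raises an `OSError` subclass, which the commit swallows with `except OSError: pass`. The hypothetical `ensure_log_dir` below skips the call in that case instead; this is an alternative shown for illustration, not the code in the commit.

```python
import os
import tempfile


def ensure_log_dir(log_filename: str) -> str:
    """Create the parent directory of log_filename if needed (hypothetical helper)."""
    log_dir = os.path.dirname(log_filename)
    if log_dir:
        # exist_ok=True makes this behave like `mkdir -p`: missing parents are
        # created and an existing directory is not an error.
        os.makedirs(log_dir, exist_ok=True)
    return log_dir


# A bare filename has no directory component, so nothing is created.
print(repr(ensure_log_dir('tcmpr.log')))

# Nested directories are created in a single call; the path is made up.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'logs', 'tcmpr', 'run.log')
    created = ensure_log_dir(log_path)
    print(os.path.isdir(created))
```

Even with `exist_ok=True`, `os.makedirs` still raises `FileExistsError` when the target path exists but is a regular file, so a caller may prefer to let that error surface rather than suppress it.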