Add extract_plot_data() and fill value to autoplot() for type = "mosaic" #248

Open: wants to merge 1 commit into base: main


@joeycouse joeycouse commented Dec 10, 2021

Resolves part of #240

Just putting this out there to get your thoughts on the interface. I've implemented an extract_plot_data() function for the confusion matrix class which returns a list with the relevant plot data.

Would need to be extended for other autoplot() use cases e.g. roc_curve, etc. If you think this is something worth merging I can put together the other methods. Just wanted to get y'alls take before putting more effort into this. Thanks!

New autoplot(type = 'mosaic'): boxes for correct predictions are filled with light blue. [image]

library(tidyverse)
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
#> 
#> Attaching package: 'yardstick'
#> The following object is masked from 'package:readr':
#> 
#>     spec

hpc_cv %>%
  conf_mat(obs, pred) %>%
  extract_plot_data(type = 'mosaic')
#> $data
#>    pred_type        ymin        ymax     xmin     xmax
#> 1    correct  0.00000000 -0.91577162    0.000 1769.000
#> 2  incorrect -0.92577162 -1.00547767    0.000 1769.000
#> 3  incorrect -1.01547767 -1.01886942    0.000 1769.000
#> 4  incorrect -1.02886942 -1.03000000    0.000 1769.000
#> 5  incorrect  0.00000000 -0.34415584 1786.335 2864.335
#> 6    correct -0.35415584 -0.95434137 1786.335 2864.335
#> 7  incorrect -0.96434137 -0.98660482 1786.335 2864.335
#> 8  incorrect -0.99660482 -1.03000000 1786.335 2864.335
#> 9  incorrect  0.00000000 -0.15533981 2881.670 3293.670
#> 10 incorrect -0.16533981 -0.69689320 2881.670 3293.670
#> 11   correct -0.70689320 -0.89864078 2881.670 3293.670
#> 12 incorrect -0.90864078 -1.03000000 2881.670 3293.670
#> 13 incorrect  0.00000000 -0.04326923 3311.005 3519.005
#> 14 incorrect -0.05326923 -0.34173077 3311.005 3519.005
#> 15 incorrect -0.35173077 -0.48634615 3311.005 3519.005
#> 16   correct -0.49634615 -1.03000000 3311.005 3519.005
#> 
#> $x_breaks
#>       VF        F        M        L 
#>  884.500 2325.335 3087.670 3415.005 
#> 
#> $y_breaks
#> [1] -0.4578858 -0.9656246 -1.0171735 -1.0294347
#> 
#> $tick_labels
#> [1] "VF" "F"  "M"  "L" 
#> 
#> $axis_labels
#> $axis_labels$y
#> [1] "Prediction"
#> 
#> $axis_labels$x
#> [1] "Truth"

Created on 2021-12-10 by the reprex package (v2.0.1)
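
For readers wondering how the returned list could feed a custom plot, here is a minimal sketch. It assumes the list structure printed above ($data, $x_breaks, $y_breaks, $tick_labels, $axis_labels). Because extract_plot_data() exists only in this PR, the list is rebuilt by hand from the printed values. The fill colours and theme are illustrative choices, not what autoplot() uses.

```r
library(ggplot2)

# Values copied from the extract_plot_data() output above.
# The list is rebuilt by hand because the function is not in a released yardstick.
plot_data <- list(
  data = data.frame(
    pred_type = c("correct", "incorrect", "incorrect", "incorrect",
                  "incorrect", "correct", "incorrect", "incorrect",
                  "incorrect", "incorrect", "correct", "incorrect",
                  "incorrect", "incorrect", "incorrect", "correct"),
    ymin = c(0, -0.92577162, -1.01547767, -1.02886942,
             0, -0.35415584, -0.96434137, -0.99660482,
             0, -0.16533981, -0.70689320, -0.90864078,
             0, -0.05326923, -0.35173077, -0.49634615),
    ymax = c(-0.91577162, -1.00547767, -1.01886942, -1.03,
             -0.34415584, -0.95434137, -0.98660482, -1.03,
             -0.15533981, -0.69689320, -0.89864078, -1.03,
             -0.04326923, -0.34173077, -0.48634615, -1.03),
    xmin = rep(c(0, 1786.335, 2881.670, 3311.005), each = 4),
    xmax = rep(c(1769, 2864.335, 3293.670, 3519.005), each = 4)
  ),
  x_breaks = c(VF = 884.500, F = 2325.335, M = 3087.670, L = 3415.005),
  y_breaks = c(-0.4578858, -0.9656246, -1.0171735, -1.0294347),
  tick_labels = c("VF", "F", "M", "L"),
  axis_labels = list(y = "Prediction", x = "Truth")
)

# One rectangle per confusion matrix cell, coloured by correctness.
# The colours below are assumptions chosen for illustration.
p <- ggplot(plot_data$data) +
  geom_rect(
    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = pred_type),
    colour = "white"
  ) +
  scale_x_continuous(
    breaks = unname(plot_data$x_breaks),
    labels = plot_data$tick_labels
  ) +
  scale_y_continuous(
    breaks = plot_data$y_breaks,
    labels = plot_data$tick_labels
  ) +
  scale_fill_manual(values = c(correct = "lightblue", incorrect = "grey70")) +
  labs(x = plot_data$axis_labels$x, y = plot_data$axis_labels$y) +
  theme(panel.background = element_blank())
```

Printing p draws the mosaic. Swapping the values in scale_fill_manual() is the kind of customisation the extracted data is meant to allow.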

@joeycouse joeycouse changed the title Add extract_plot_data() and fill argument to autoplot() for type = "mosiac" Add extract_plot_data() and fill value to autoplot() for type = "mosiac" Dec 10, 2021
@joeycouse joeycouse changed the title Add extract_plot_data() and fill value to autoplot() for type = "mosiac" Add extract_plot_data() and fill value to autoplot() for type = "mosaic" Dec 10, 2021
@juliasilge
Member

Is there some existing generic we should use for this, rather than making a new one? fortify comes to mind, although the docs say not to use it.

@joeycouse
Author

The only function I'm aware of that achieves something similar is ggplot2::ggplot_build(), which accepts a plot object and returns the built plot, including a data frame of the plot data for each layer. However, I don't think the returned data frame is in a format that would adequately address #248.

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

data("two_class_example")

two_class_example %>%
  conf_mat(truth, predicted) %>%
  autoplot() %>%
  ggplot_build() %>%
  pluck(1)
#> [[1]]
#>      fill  xmin  xmax       ymin       ymax PANEL group colour size linetype
#> 1 #4f58bd   0.0 258.0  0.0000000 -0.8798450     1     1     NA  0.5        1
#> 2  grey70   0.0 258.0 -0.8898450 -1.0100000     1     2     NA  0.5        1
#> 3  grey70 260.5 502.5  0.0000000 -0.2066116     1     2     NA  0.5        1
#> 4 #4f58bd 260.5 502.5 -0.2166116 -1.0100000     1     1     NA  0.5        1
#>   alpha
#> 1   0.9
#> 2   0.9
#> 3   0.9
#> 4   0.9

Created on 2021-12-17 by the reprex package (v2.0.1)

@topepo
Member

topepo commented Dec 21, 2021

I don't think that it is a good idea to make a new generic. The tidy() method should translate the object to a tabular data structure that can be used as the substrate for the autoplot() method. The tidy() method for the confusion matrix is maybe not the best (I think that I wrote it) and can be improved to make some of your (and Julia's) code more concise.

That would be enough to get data for the heatmap but the mosaic plot would need some additional, non-tabular data. So, I propose:

  1. I'll update the PR to improve the tidy() method.

  2. @joeycouse can build on their work on cm_mosaic_data() to make a function that we can export to facilitate custom mosaic plots.

I don't think that we need cm_heat_data() nor do we need to export get_axis_labels().

@DavisVaughan and @juliasilge how does that sound?
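
To illustrate the point that tabular data is enough for the heatmap, here is a rough sketch. The improved tidy() output is not shown in this thread, so as a stand-in the sketch converts the `table` element of the conf_mat object with as.data.frame(). That substitution is an assumption made for illustration, not the proposed tidy() interface.

```r
library(yardstick)
library(ggplot2)

data("two_class_example")

cm <- conf_mat(two_class_example, truth, predicted)

# Stand-in for an improved tidy() method: one row per cell,
# with columns Prediction, Truth, and Freq.
cells <- as.data.frame(cm$table)

# A heatmap needs nothing beyond this tabular form.
heat <- ggplot(cells, aes(x = Truth, y = Prediction, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = Freq))
```

The mosaic plot differs because its rectangle coordinates, axis breaks, and tick labels have to be computed from the counts, which is the non-tabular part a helper like cm_mosaic_data() would cover.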

@juliasilge
Member

I think this sounds like a good way to go 👍

3 participants