Add python sampler for residential #308

Open: wants to merge 25 commits into develop

rajeee (Contributor) commented Sep 28, 2022

Pull Request Description

Python version of the sampling script. There is also a helper testing script that can verify whether a given buildstock.csv is correct according to the TSVs.

Use buildstockbatch as usual; it will automatically use the faster Python-based sampler for residential projects.

To use the sampler standalone, run the new resstock_sampler command.

> resstock_sampler --help
Usage: resstock_sampler [OPTIONS] COMMAND [ARGS]...

  Perform sampling or verify existing samples (in buildstock.csv). Type
  `resstock_sampler sample --help` or `resstock_sampler verify --help` to know
  more.

Options:
  --help  Show this message and exit.

Commands:
  sample  Performs sampling for project and writes output csv file.
  verify  Checks the buildstock.csv file (BUILDSTOCK_FILE) for correctness.

Two commands are available: sample, to perform sampling, and verify, to check an existing buildstock.csv for correctness.

> resstock_sampler sample --help
Usage: resstock_sampler sample [OPTIONS]

  Performs sampling for project and writes output csv file.

Options:
  -p, --project TEXT            The path to the project (must have
                                housing_characteristics folder inside)
                                [required]
  -n, --num_datapoints INTEGER  The number of datapoints to sample.
                                [required]
  -o, --output TEXT             The output filename with path.  [required]
  --help                        Show this message and exit.

> resstock_sampler verify --help
  Checks the buildstock.csv file (BUILDSTOCK_FILE) for correctness. BUILDSTOCK_FILE is considered correct if
  the probability distributions in the project TSVs could have produced BUILDSTOCK_FILE through quota sampling.

  In addition to verifying correctness, it also calculates the sample probability distribution error for the
  options in each TSV, comparing BUILDSTOCK_FILE with what one would expect based on the probabilities. It also
  calculates sampling errors for each dependency group in the TSV. An example is provided below to explain the
  error calculation further.
  Consider a project with three TSVs.
  Bedrooms.tsv
  ----------
  Option=1    Option=2    Option=3    Option=4    Option=5    sampling_probability
       0.2         0.2         0.2         0.2         0.2                     1.0
  
  Fan.tsv
  ----------
  Dependency=param1    Option=None    Option=Standard    Option=Premium    sampling_probability
                 1            0.35              0.35                0.3                     0.2
                 2            0.35              0.35                0.3                     0.2
                 3            0.35              0.35                0.3                     0.2
                 4            0.35              0.35                0.3                     0.2
                 5            0.35              0.35                0.3                     0.2
  
  AC.tsv
  ----------
  Dependency=param2    Option=Yes    Option=No    sampling_probability
               None           0.9          0.1                    0.35
           Standard           0.8          0.2                    0.35
            Premium           0.1          0.9                    0.3
  Quota sampling in the above project for 10 samples can generate a buildstock.csv that looks like this:
  buildstock.csv
  -------------
  Building  Bedrooms         Fan            AC
         *         1         None          Yes
         *         1         Standard      Yes
         *         2         None          Yes
         *         2         Standard      Yes
         *         3         None          Yes
         *         3         Standard      Yes
         *         4         None          Yes
         *         4         Standard      Yes
         *         5         None          Yes
         *         5         Standard       No
  
  For nsamples=10, the errors for each TSV are calculated as follows.
  For Bedrooms.tsv, each bedroom option makes up 0.2 of the samples in buildstock.csv, which exactly matches the
  distribution in the TSV. Hence, max_option_error = total_option_error = 0.
  Since there are no dependencies, max_group_error and total_group_error are also 0.
  
  For Fan.tsv, we expect the sample distribution for None, Standard and Premium to be 0.35, 0.35 and 0.3.
  The actual distribution is 0.5, 0.5 and 0.0. This gives absolute distribution errors of 0.15, 0.15 and 0.3.
  Hence, max_option_error is 0.3 and total_option_error is 0.6.
  There are 5 dependency groups [(1,), (2,), (3,), (4,), (5,)] in Fan.tsv with expected sample distribution of
  [0.2, 0.2, 0.2, 0.2, 0.2]. The actual distribution for these groups is the same, so both max_group_error and
  total_group_error are 0.
  
  For AC.tsv, we expect the sample distribution for Yes and No to be 0.625 (0.9 * 0.35 + 0.8 * 0.35 + 0.1 * 0.3)
  and 0.375. The actual sample distribution for Yes and No is 0.9 and 0.1. This gives absolute distribution
  errors of 0.275 and 0.275. Hence, max_option_error = 0.275 and total_option_error = 0.55.
  There are 3 dependency groups [(None,), (Standard,), (Premium,)] in AC.tsv with expected sample distribution of
  0.35, 0.35 and 0.3. The actual sample distribution for [(None,), (Standard,), (Premium,)] is 0.5, 0.5 and 0.
  This gives absolute group distribution errors of 0.15, 0.15 and 0.3. Hence, max_group_error is 0.3 and
  total_group_error is 0.6.

Options:
  -p, --project TEXT  The path to the project (must have
                      housing_characteristics folder inside)  [required]
  -o, --output TEXT   The output filename for error report.
  --help              Show this message and exit.
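The verify help text above describes buildstock.csv in terms of quota sampling. One common way to turn probabilities into whole-number quotas is largest-remainder rounding: every option first receives floor(n * p) samples, and the leftover samples go to the options with the largest fractional parts. The sketch below assumes that rule; it is not necessarily how the new Python sampler rounds, and allocate_quota is a hypothetical helper. With ties broken by option order, it happens to reproduce the counts in the example buildstock.csv.

```python
import math


def allocate_quota(n_samples, probabilities):
    """Split n_samples into integer counts that follow `probabilities`.

    Hypothetical helper using largest-remainder rounding (an assumption,
    not necessarily the rule the new sampler implements). Assumes the
    probabilities sum to 1.
    """
    # Expected (fractional) count per option; rounding strips float noise
    # such as 4.000000000000001 before taking the floor.
    expected = [round(n_samples * p, 9) for p in probabilities]
    counts = [math.floor(e) for e in expected]
    leftover = n_samples - sum(counts)
    # Give leftover samples to the largest fractional parts first.
    # sorted() is stable, so ties go to the option listed first.
    order = sorted(range(len(expected)), key=lambda i: -(expected[i] - counts[i]))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Fan options (None, Standard, Premium) in one 2-sample bedroom group.
print(allocate_quota(2, [0.35, 0.35, 0.3]))  # [1, 1, 0]
# AC options (Yes, No) for the 5 samples with Fan=None, then Fan=Standard.
print(allocate_quota(5, [0.9, 0.1]))  # [5, 0]
print(allocate_quota(5, [0.8, 0.2]))  # [4, 1]
```

Largest-remainder rounding keeps every count within one of its expected value while guaranteeing the counts add up to exactly n_samples.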

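The verify help text defines a correct BUILDSTOCK_FILE as one that quota sampling could have produced from the TSVs. The sketch below is a simplified check of one necessary condition, assuming quota sampling rounds each expected count within a dependency group to its floor or ceiling. It is not the actual verify implementation, and quota_consistent is an illustrative name. On the example AC.tsv and buildstock.csv it accepts the file, and it rejects a copy in which every AC value is Yes (5 Yes in the Standard group, where only 4 are allowed).

```python
import math
from collections import Counter, defaultdict


def quota_consistent(rows, dependency, parameter, table):
    """Check one necessary condition for quota-sampled rows (hypothetical helper).

    Assumes each option count within a dependency group equals the floor or
    ceiling of group_size * probability. `table` maps each dependency value
    to {option: probability}, i.e. one TSV.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[dependency]][row[parameter]] += 1
    for dep_value, option_probs in table.items():
        counts = groups[dep_value]
        group_size = sum(counts.values())
        # An option missing from the TSV row should never be sampled.
        if set(counts) - set(option_probs):
            return False
        for option, p in option_probs.items():
            # Round away floating-point noise such as 4.000000000000001.
            expected = round(group_size * p, 9)
            if not math.floor(expected) <= counts[option] <= math.ceil(expected):
                return False
    return True


# The example buildstock.csv as rows of {column: value}.
bedrooms = [str(b) for b in range(1, 6) for _ in range(2)]
fan = ["None", "Standard"] * 5
ac = ["Yes"] * 9 + ["No"]
rows = [{"Bedrooms": b, "Fan": f, "AC": a} for b, f, a in zip(bedrooms, fan, ac)]

ac_table = {
    "None": {"Yes": 0.9, "No": 0.1},
    "Standard": {"Yes": 0.8, "No": 0.2},
    "Premium": {"Yes": 0.1, "No": 0.9},
}
fan_table = {b: {"None": 0.35, "Standard": 0.35, "Premium": 0.3} for b in set(bedrooms)}

print(quota_consistent(rows, "Fan", "AC", ac_table))  # True
print(quota_consistent(rows, "Bedrooms", "Fan", fan_table))  # True
all_yes = [dict(row, AC="Yes") for row in rows]
print(quota_consistent(all_yes, "Fan", "AC", ac_table))  # False
```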

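To make the error definitions in the verify help text concrete, here is a small Python sketch that recomputes the AC.tsv numbers from the example. The helper names expected_shares and distribution_errors are hypothetical and not taken from the PR's sampling_utils.py; the sketch only mirrors the arithmetic described in the help text, where each error is the absolute difference between the sampled share and the expected share, max_* is the largest difference and total_* is their sum.

```python
from collections import Counter


def expected_shares(table, group_probs):
    """Expected share of each option: sum over groups of P(group) * P(option | group)."""
    shares = {}
    for group, option_probs in table.items():
        for option, p in option_probs.items():
            shares[option] = shares.get(option, 0.0) + group_probs[group] * p
    return shares


def distribution_errors(samples, expected):
    """Return (max_error, total_error) between sampled and expected shares.

    Works for options (expected = option shares) and for dependency groups
    (samples = the dependency column, expected = sampling_probability).
    """
    counts = Counter(samples)
    n = len(samples)
    errors = [abs(counts[key] / n - share) for key, share in expected.items()]
    return max(errors), sum(errors)


# Columns of the example buildstock.csv (10 samples).
fan = ["None", "Standard"] * 5
ac = ["Yes"] * 9 + ["No"]

fan_shares = {"None": 0.35, "Standard": 0.35, "Premium": 0.3}
ac_table = {
    "None": {"Yes": 0.9, "No": 0.1},
    "Standard": {"Yes": 0.8, "No": 0.2},
    "Premium": {"Yes": 0.1, "No": 0.9},
}
ac_shares = expected_shares(ac_table, fan_shares)

print({k: round(v, 3) for k, v in ac_shares.items()})  # {'Yes': 0.625, 'No': 0.375}
print([round(e, 3) for e in distribution_errors(ac, ac_shares)])  # [0.275, 0.55]
# Group errors for AC.tsv compare the Fan column with sampling_probability.
print([round(e, 3) for e in distribution_errors(fan, fan_shares)])  # [0.3, 0.6]
```

Because the group errors for AC.tsv compare the Fan column against the same 0.35/0.35/0.3 shares, the last call also yields the Fan.tsv option errors (0.3 and 0.6).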
Usage examples:
resstock_sampler sample -p /Users/radhikar/Documents/resstock/project_national -n NUM_DATAPOINTS -o resstock_sampler_test.csv

resstock_sampler verify resstock_sampler_test.csv -p /Users/radhikar/Documents/resstock/project_national

Checklist

Not all items may apply.

  • Code changes (must work)
  • Tests exercising your feature/bug fix (check coverage report on Checks -> BuildStockBatch Tests -> Artifacts)
  • Coverage has increased or at least not decreased. Update minimum_coverage in .github/workflows/ci.yml as necessary.
  • All other unit tests passing
  • Update validation for project config yaml file changes
  • Update existing documentation
  • Run a small batch to make sure it all works (local is fine, unless it is an Eagle-specific feature)
  • Add to the changelog_dev.rst file and propose migration text in the pull request

github-actions bot commented Sep 28, 2022

File                                            Coverage
All files                                       82%
base.py                                         81%
eagle.py                                        74%
exc.py                                          57%
localdocker.py                                  27%
postprocessing.py                               84%
utils.py                                        96%
sampler/base.py                                 69%
sampler/downselect.py                           33%
sampler/precomputed.py                          93%
sampler/residential_quota.py                    60%
sampler/residential_sampler/sampler.py          71%
sampler/residential_sampler/sampling_utils.py   99%
test/test_validation.py                         96%
workflow_generator/base.py                      90%
workflow_generator/commercial.py                24%
workflow_generator/residential.py               96%
workflow_generator/residential_hpxml.py         59%

Minimum allowed coverage is 24%

Generated by 🐒 cobertura-action against 0aefb48

rajeee marked this pull request as draft September 29, 2022 14:23
rajeee marked this pull request as ready for review September 29, 2022 22:48
rajeee requested a review from nmerket October 3, 2022 15:32
nmerket (Member) left a comment

I appreciate the thorough testing. Is this ready for primetime?

Rather than replacing the current Ruby-based sampler here, I'd prefer if you added a new sampler file and class. That way people can go back to the old one for a while during a transition period.

- python-version: ['3.8', '3.9', '3.10']
+ python-version: ['3.10']

Why are we only testing Python 3.10 now? I'd like to keep some level of backwards compatibility.
