Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing taxonomy python notebook to script #245

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

park2454
Copy link

@park2454 park2454 commented Feb 6, 2023

missing_taxonomy.py creates a csv file for objects with missing labels based on Brian(@bfhealy)'s jupyter notebook. A classification feature parquet and a dataset mapper json file are required as input and the output are stored in "golden_missing_label.csv".

@bfhealy
Copy link
Collaborator

bfhealy commented Feb 6, 2023

Thanks @park2454! It looks like there are some lint errors related to unused imports in the code. Did you run pre-commit on your changes? If not, try:

pre-commit install
pre-commit run --files tools/missing_taxonomy.py

Then fix the errors, and commit/push again.

@park2454
Copy link
Author

park2454 commented Feb 6, 2023 via email

Copy link
Collaborator

@bfhealy bfhealy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping to make this notebook become part of the codebase! Please see the comments below for recommended changes.

Comment on lines +96 to +109
parser.add_argument(
"-merge_features",
type=bool,
nargs='?',
const=True,
default=False,
help="merge downloaded results with features from Kowalski",
)
parser.add_argument(
"-features_catalog",
type=str,
default='ZTF_source_features_DR5',
help="catalog of features on Kowalski",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These arguments are not used by the code above, so they can be removed.

# Read in golden dataset (downloaded from Fritz), mapper
parquet_path = os.path.join(os.path.dirname(__file__), parquet)
mapper_path = os.path.join(os.path.dirname(__file__), mapper)
output_path = os.path.join(os.path.dirname(__file__), "golden_missing_labels.csv")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best to allow the user to customize the name of this output file using another argument.

gold_map = gold_map.reset_index(drop=False).set_index('fritz_label')
gold_dict = gold_map.transpose().to_dict()

labels_gold = gold_new.set_index('obj_id')[gold_new.columns[1:54]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The columns corresponding to the labels may not always be 1:54. Perhaps we could use the mapper's keys or the config file to generate a list of classifications?

Comment on lines 8 to 9
import matplotlib.pyplot as plt
from collections import Counter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove these imports - while the notebook used them, this code does not.

@@ -0,0 +1,113 @@
import argparse
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above the first import, add this commented line: #!/usr/bin/env python

This allows users to run the script using ./missing_taxonomy.py in addition to python missing_taxonomy.py.

tools/missing_taxonomy.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants