Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set Up MVP Package For Label-Image Creation #5

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

AFg6K7h4fhy2
Copy link
Owner

@AFg6K7h4fhy2 AFg6K7h4fhy2 commented Oct 19, 2024

This PR constitutes a first-pass at adding:

  • An abstract Label class.
  • A Label for collected specimens called CollectionsLabel.
  • A Label for systematics called SystematicsLabel.
  • Utility functions for working with and generating multiple labels.
  • A notebook with label creation examples, including re-creating labels found on the Fossil Forum and other places.

@AFg6K7h4fhy2 AFg6K7h4fhy2 added first-pass A first-pass at a specific task; typically evolves into something more substantial later. feature A new tool or utility being added. labels Oct 19, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 added this to the [October 14, October 25] milestone Oct 19, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 self-assigned this Oct 19, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 linked an issue Oct 19, 2024 that may be closed by this pull request
@AFg6K7h4fhy2
Copy link
Owner Author

The following comment details an issue being experienced.

Presently, Label is an abstract base class for CollectionsLabel. Label has many attributes and CollectionsLabel of course has some other, more specific attributes. When a user creates a CollectionsLabel, there should not be much needed if they want to make use of the default values for the CollectionsLabel specific parameters. Also, they should be able to change the order of the parameters to suit the order in which they want the group titles (e.g. "Collection" to appear).

As an example,

label1 = paleo_utils.CollectionsLabel(
    save_directory="../assets/saved_images/",
    id_number="2131",
    collection="AMNH",
    collector="Dr. Montague",
    location="Europe",
    coordinates=(40.7128, -74.0060),
    date_found="2023-01-01",
)

Should output the following:

ID: 2131
Collection: AMNH
Collector: Dr.Montague
Location: Europe
Coordinates: (40.7128, -74.0060)
Date Found: 2023-01-01

Changing the order in the paleo_utils.CollectionsLabel() call should change the output order. The problem I am having is in trying to determine whether the parameter name e.g. location in location= should be turned in the group title. There are certain parameters that aren't groups, such as coordinates_separate: bool = False which guides where the coordinates occur.

An idea is to have a class for a default label be separate from the class CollectionsLabel. The class CollectionsLabel then receives a blank label instance and can use that as the place to add the label.

Further thoughts will come soon.

@AFg6K7h4fhy2
Copy link
Owner Author

The solution to the aforementioned issue is to probably work with calls like the following:

label2 = paleo_utils.CollectionsLabel(
    collection="AMNH",
    id_number="12345",
    collector="Dr. Larson",
    custom_titles={
        "collection": "Repository: ",
        "id_number": "Catalog: ",
        "collector": "Finder: "
    }
)

@AFg6K7h4fhy2
Copy link
Owner Author

I asked a question on Code Review.

@AFg6K7h4fhy2
Copy link
Owner Author

Re: #5 (comment)

So that the answer to the CR question doesn't become lost, I've reproduced it below:

Code Review Answer

use pathlib

label = paleo_utils.CollectionsLabel(
    save_directory="../assets/saved_images/",
    date_found="2024-01-01",

Rather than str, consider storing that folder name as a Path. Then it will be significantly more convenient for clients to make use of.

Kudos on using ISO8601, so lexical order matches chronological order.

optional values

You consistently admit of None for each attribute. This resembles a relational database table lacking even a single NOT NULL clause in its DDL.

I encourage you to go back and reconsider whether a CollectionsLabel truly could have e.g. no collection and no id_number.

data type

Several date fields curiously use str rather than a datetime.

I imagine chrono_age: str gives you the flexibility to store human-friendly descriptions like "1.2 Bya" or "66 Mya", which would lead to unfortunate lexical ordering. And I'm concerned that someone might try to jam "late Devonian" in there. Consider using one or two numeric fields to store age, and worry about human-friendly formatting when outputting a retrieved record.

Giving size a type of str seems mind boggling to me, but I imagine you have your reasons. Perhaps it is a {small, medium, large} enum? Or some sizes are in square meters while others are in cubic meters?

idiom

Certainly this works,

    ... {key: kwargs[key] for key in kwargs}

but it is more usual to iterate over kwargs.items(). And in this particularly simple case you should use kwargs.copy() to produce the desired shallow copy.

But it's not clear why you'd want to make a copy at all, given that each function call makes a shallow copy already. The OP contains no unit test which would turn Red if we stop copying. If you are trying to conform to some documented attrs advice, then please add a # comment which cites the docs. If there is another concern at work here, please comment on it, and write a unit test which highlights the effect of interest. I claim that any valid calling code one might write wouldn't care about whether a copy had happened, that is, one could not demonstrate a Green test which turns Red when we remove that copying operation. If inheritance plays into this, show that in your unit test. As written the code is obscure -- I cannot think of a good reason for making a copy of the args.

(In an inheritance situation is common enough for a V2 child class to destructively .pop() off V2 args that it knows about, so they won't bother the V1 parent class. But that doesn't appear to be relevant here.)

zero items

You have declared that title_overrides cannot be None, which is a good choice.

    def __attrs_post_init__(self):
        ...
        if self.title_overrides:

No need for the guard, there. Just iterate over zero items in an empty dict, no harm done.

Please phrase the for as for key, value in self.title_overrides.items():, and use similar single-line formatting down in label().

Wouldn't a simple .update() suffice here?

meaningful identifier

        label_attrs = {
            attr.name ...

That looks more like you're creating a new names variable, to me.

Also, rather than merge that code fragment into the main branch, just delete it, along with the half dozen related lines of commented code that follow it.

isinstance()

            if (
                value is not None
                and not isinstance(value, dict)
            ):

Please simplify to

            if not isinstance(value, (dict, NoneType)):

(Your IDE will help you add from types import NoneType.)

embedded newlines

        return "\n".join(parts)

I note in passing that you apparently prohibit certain characters in all those fields, such as NEWLINE, but that wasn't enforced anywhere. If one crept in, it would corrupt this returned value, turning a single record into apparently multiple records.


Overall this seems a reasonable design.

expect to be able to use argument ordering ...

You didn't mention your target environment, but I'm willing to assume it does not include e.g. Jython, and will be restricted to just modern cPython. The python language distinguishes between unordered dict and the OrderedDict class which makes stronger guarantees. The cPython interpreter now imposes ordering on dict. The old promise was "an arbitrary order which is non-random", but that changed several years ago.

Starting with interpreter 3.7, dict iteration promises to preserve insertion order. Also, PEP 468 ensures that kwargs order shall match the arg order seen in the source code.

So, yes, your expectation is reasonable.

@AFg6K7h4fhy2 AFg6K7h4fhy2 added the High Priority A task that is of higher relative priority. label Nov 6, 2024
@AFg6K7h4fhy2
Copy link
Owner Author

Since it would not be nice to provide all the parameters covered in Label each time, having some templates available would be useful.

@AFg6K7h4fhy2
Copy link
Owner Author

AFg6K7h4fhy2 commented Nov 6, 2024

The workflow of Label creation needs to be better established. Does one make a Label, save the class, then reuse it each time they want to make a collections label or systematics label? This would imply that CollectionsLabel and SystematicsLabel might not be best of as their own classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new tool or utility being added. first-pass A first-pass at a specific task; typically evolves into something more substantial later. High Priority A task that is of higher relative priority.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Set Up MVP Package For Label-Image Creation
1 participant