treatment of filenames in ADE is inconsistent #12

Open
davidschlangen opened this issue Jun 27, 2019 · 1 comment

Comments

@davidschlangen
Contributor

Problem: For ADE20k, our usual way of identifying an image by its image_id doesn't work. First, the images sit inside a nested directory structure that cannot be predicted from the image id. E.g., ADE_train_00000994.jpg is found in training/a/abbey/. Second, the same image id may occur in both training/ and testing/.

(Actually, I'm no longer sure whether the image id, which ultimately comes from index_ade20k.mat, is the number in the filename. But I am fairly certain that the image_id we used was non-unique without the split.)

So even knowing the split (which could be encoded into the image_id by adding a constant, so that every id above that constant belongs to split B) is not enough to locate the image; for that you need to know the category as well.

This problem surfaces in two places. During extraction, going through a dataframe like ade_objdf, as it currently stands, is not enough, because it doesn't have the image category. (It does have the split, since that is encoded into the image id.) So to get the image, one would need to load a separate structure that maps image id + split to the fully qualified path.

It is also a problem for our usual encoding of the image in the feature file, where we have only three numerical fields. This constraint makes it necessary to encode the split info (the minimum needed to disambiguate the image_id) into the image id.
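To make the "add a constant" idea concrete, here is a minimal sketch of such an encoding. The offset value, the function names, and the ordering of the splits are all assumptions for illustration; the offset merely has to exceed the largest raw image id in any split.

```python
from typing import Tuple

# Hypothetical constant: must be larger than any raw image id in a split.
SPLIT_OFFSET = 100_000
# Position in this tuple is the split code (assumed ordering).
SPLITS = ("training", "testing")


def encode_image_id(raw_id: int, split: str) -> int:
    """Fold the split into a single numerical image id."""
    if not 0 <= raw_id < SPLIT_OFFSET:
        raise ValueError(f"raw_id {raw_id} does not fit below the offset")
    return SPLITS.index(split) * SPLIT_OFFSET + raw_id


def decode_image_id(encoded_id: int) -> Tuple[int, str]:
    """Recover (raw_id, split) from an encoded image id."""
    split_code, raw_id = divmod(encoded_id, SPLIT_OFFSET)
    return raw_id, SPLITS[split_code]
```

With this scheme the two images that share a raw id get distinct numerical ids, so the feature file can keep its three numerical fields. The category is still not recoverable from the id.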

Possible solution:

  • Add split and category as fields to the ADE dataframes, so that during extraction only the one dataframe needs to be consulted, in the same way as for all other corpora. (Rather than loading another dataframe with the mapping between id+split and full filename.)
  • Encode the split into the image id in the feature file. Then, when one wants to go from a feature row to the corresponding image, e.g. to visualise it, one will need to have this mapping available. But that seems ok, since it is a special case and the mapping dataframe can then be loaded explicitly.
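As a rough sketch of both points: plain dictionaries stand in for dataframe rows here, and the field names, the second row, and the rule "first letter of the category as an intermediate directory" are assumptions (the latter is inferred from the single training/a/abbey/ example above).

```python
from pathlib import PurePosixPath

# Stand-in for rows of an ADE dataframe that carries split and category
# explicitly. The second row is made up purely to show an id collision.
rows = [
    {"image_id": 994, "split": "training", "category": "abbey",
     "filename": "ADE_train_00000994.jpg"},
    {"image_id": 994, "split": "testing", "category": "kitchen",
     "filename": "hypothetical_test_image.jpg"},
]


def relative_path(row: dict) -> PurePosixPath:
    """Build split/<first letter>/<category>/<filename> (assumed layout)."""
    category = row["category"]
    return PurePosixPath(row["split"], category[0], category, row["filename"])


# Mapping from (image id, split) to path, as needed when going back from a
# feature row to the image, e.g. for visualisation.
path_by_key = {(r["image_id"], r["split"]): relative_path(r) for r in rows}

print(path_by_key[(994, "training")])
```

During extraction, `relative_path` only needs the row itself; the `path_by_key` lookup is the separate mapping that would be loaded for the special case.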

(But the API of get_image_filename should be cleaned up in any case, and split and category should be made into keyword arguments that are passed along to get_ade_filename.)
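One possible shape for that cleanup is sketched below. The signatures are hypothetical and not the current ones in the code base; the filename pattern is an assumption based on the ADE_train_00000994.jpg example (and, as noted above, the id may not actually be the number in the filename); the branch for other corpora is a placeholder.

```python
from pathlib import PurePosixPath

# Only this prefix appears in the issue; other splits would need adding.
_ADE_PREFIX = {"training": "ADE_train"}


def get_ade_filename(image_id: int, *, split: str, category: str) -> PurePosixPath:
    """ADE needs split and category, so both are required keyword arguments."""
    if split not in _ADE_PREFIX:
        raise ValueError(f"no filename prefix known for split {split!r}")
    name = f"{_ADE_PREFIX[split]}_{image_id:08d}.jpg"  # assumed pattern
    return PurePosixPath(split, category[0], category, name)


def get_image_filename(corpus: str, image_id: int, **kwargs) -> PurePosixPath:
    """Generic entry point; extra keyword arguments are passed along for ADE."""
    if corpus == "ade":
        return get_ade_filename(image_id, **kwargs)
    if kwargs:
        raise TypeError(f"unexpected keyword arguments for corpus {corpus!r}")
    # Placeholder for corpora where the image id alone determines the file.
    return PurePosixPath(corpus, f"{image_id}.jpg")


print(get_image_filename("ade", 994, split="training", category="abbey"))
```

Because split and category are keyword-only, a call that omits them fails immediately with a TypeError instead of silently building a wrong path.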

@davidschlangen
Contributor Author

So the upshot is: the dataframes should not use the split-into-imageID trick, but should instead carry split and category as explicit fields. (And so, the trick should be removed from preproc.) The feature file will still have to use it.
