Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to overwrite existing directory #558

Open
gibsramen opened this issue May 20, 2022 · 4 comments · May be fixed by #559
Open

Option to overwrite existing directory #558

gibsramen opened this issue May 20, 2022 · 4 comments · May be fixed by #559

Comments

@gibsramen
Copy link
Collaborator

I'd like to use Empress in a Snakemake pipeline (see below) but I'm running into an annoying issue. If my understanding is correct (which it could very well not be), the way Snakemake works is that it sees that the results/empress directory does not exist and thus creates it before the shell command executes. However, when this occurs Empress complains that Output directory already exists!. I think what would be useful is for some sort of --overwrite option to force the creation of the visualization + support files.

In my case, I can work around this by changing some of the filepaths but I think this might be a useful option.

rule empress:
    input:
        tree="results/diversity/metagenomics/phylo_rpca/labeled-phylogeny.nwk",
        feature_md="results/tmp/feature_metadata.tmp.tsv"
    output:
        "results/empress/empress.html"
    shell:
        """
        empress tree-plot \
            --tree {input.tree} \
            --feature-metadata {input.feature_md} \
            --output-dir results/empress
        """
@fedarko
Copy link
Collaborator

fedarko commented May 21, 2022

Eesh, that's annoying! It should be feasible to add this in. Looks like the lines causing the problem are here:

def check_and_process_files(output_dir, tree_file, feature_metadata):
"""Initial checks and processing of files for standalone CLI plotting.
Parameters
----------
output_dir : str
tree_file : str
fm_file : str
Returns
-------
bp.Tree
pd.DataFrame
"""
if os.path.isdir(output_dir):
raise OSError("Output directory already exists!")

... So we can probably just add an --overwrite / --no-overwrite flag (default --no-overwrite, i.e. the current behavior) to the standalone script, and then pass the value of this flag directly to check_and_process_files() to tell it to ignore the output directory already existing.

(Maybe we should add some code to check_and_process_files() to, when overwrite is True, just delete the contents of the output directory? This would limit potential complications with stuff already existing in the output directory, although it might be dangerous if people use this willy-nilly. Not sure what the best practice is.)

@gibsramen
Copy link
Collaborator Author

I think deleting the contents of the output directory might be too dangerous (who knows what other stuff people could have in there). If we could only overwrite empress.html and support_files/* I think that would be sufficient.

Also I wrote this code so it's my own fault this is an annoying issue 😅

@ElDeveloper
Copy link
Member

ElDeveloper commented May 23, 2022 via email

@gibsramen gibsramen self-assigned this May 23, 2022
@gibsramen gibsramen linked a pull request May 23, 2022 that will close this issue
@gibsramen
Copy link
Collaborator Author

So this specific instance can be circumvented with

output:
    directory("results/empress")

but I think it's still worth adding the --overwrite option. Will try to carve out some time to finish up my PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants