Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dcp-631 added instructions for using direct ENA files upload #633

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

prabh-t
Copy link
Contributor

@prabh-t prabh-t commented Feb 7, 2022

No description provided.

Copy link
Contributor

@ke4 ke4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@@ -368,6 +370,27 @@ bsub -J laurenti_upload -M 64000 'singularity run -B /nfs/production/hca/laurent

If running parallel jobs, choose different <file_name> / <job_names> because you will have multiple file-upload-infos and multiple jobs, in this case I’ve used the name of the output bam file as the job_name and file_name.


### Uploading files directly to ENA

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think this is ready to be put in this SOP for archiving. This step is not part of the current DSP integration. We should have the direct archiving full ENA archiving first before putting in the operational Archiving SOP (which still uses DSP).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not ready, than we don't merge it yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or specify that it is not relevant until the task is finished.

@ESapenaVentura
Copy link
Collaborator

ESapenaVentura commented Feb 8, 2022

Agree with @aaclan-ebi above, maybe if we don't want to loose the progress on writing this we can add it to a section at the end (e.g. WIP ENA archiving, not ready yet or something like that)

@@ -99,6 +99,8 @@ Once a submission is ready in ingest (`Archiving` status after hitting submit),

## Step 2 of 3 - Archiving Files to DSP

_Note: When direct archiving is fully functional, the instructions at the end of this step on __Uploading files directly to ENA__ should be followed instead of the following._

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not completely clear on this. Does this mean that direct archiving, which is simpler, is not yet functional, i.e. we cannot follow that step yet? Is there a way to make it more clear which steps cannot be run currently, for example, by highlighting that section in a different colour?

Copy link
Contributor Author

@prabh-t prabh-t Feb 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ami-day sorry for the confusion. you (and @aaclan-ebi - see her comment below) are right, the direct archiving is not fully functional. This is only about testing the direct data file upload to ENA using the new ingest-archiver endpoint. The steps in this PR should not be part of the SOP yet, to avoid confusion. I have updated the ticket with some steps how you could test this. Let me know if this makes sense. To be fair I am not sure if this really requires user testing, perhaps not at this point.

Copy link
Contributor

@ami-day ami-day left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have listed some suggestions below:

  • the example json file shows the file structure for 1 set of fastq files (R1,R2,I1). The script runs with 1 json file. Assuming that json file contains all sequencing files, could the example include 1 other run (an additional R1,R2,I1) for purpose of demonstration?

  • I am unsure why the bam file conversion is mentioned in this SOP. We do not submit bam files to the HCA DCP, and I believe we would not aim to convert the fastq files we upload to ENA to bam format if the dataset is an HCA dataset (which it should always be in our case).

  • It could be more clear who will run what parts of this SOP. Is the full SOP intended for both wranglers and developers, and if some parts are for developers only, which parts are they?

  • The instruction include running docker on EC2 with "docker run". Is the docker set up already so that we can run this command?

Other than these points, the SOP is very clear and readable. Thanks for your hard work!

Copy link
Contributor

@amnonkhen amnonkhen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please name the branches properly. The name should include ticket number in dcp-nnn format and some meaningful title.

@prabh-t prabh-t changed the title added instructions for using direct ENA files upload dcp-631 added instructions for using direct ENA files upload Mar 4, 2022
@@ -368,6 +370,27 @@ bsub -J laurenti_upload -M 64000 'singularity run -B /nfs/production/hca/laurent

If running parallel jobs, choose different <file_name> / <job_names> because you will have multiple file-upload-infos and multiple jobs, in this case I’ve used the name of the output bam file as the job_name and file_name.


### Uploading files directly to ENA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not ready, than we don't merge it yet.

@@ -368,6 +370,27 @@ bsub -J laurenti_upload -M 64000 'singularity run -B /nfs/production/hca/laurent

If running parallel jobs, choose different <file_name> / <job_names> because you will have multiple file-upload-infos and multiple jobs, in this case I’ve used the name of the output bam file as the job_name and file_name.


### Uploading files directly to ENA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or specify that it is not relevant until the task is finished.

@amnonkhen
Copy link
Contributor

Merge this only when the direct archiving is functional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants