dcp-631 added instructions for using direct ENA files upload #633
base: master
Conversation
LGTM 👍
@@ -368,6 +370,27 @@ bsub -J laurenti_upload -M 64000 'singularity run -B /nfs/production/hca/laurent

If running parallel jobs, choose different <file_name> / <job_names> because you will have multiple file-upload-infos and multiple jobs. In this case I’ve used the name of the output bam file as the job_name and file_name.

### Uploading files directly to ENA
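The note about choosing distinct names for parallel jobs can be sketched as a small wrapper. Everything below is illustrative only: the bind path, image path and bam names are hypothetical placeholders, not the real cluster paths from the SOP.

```shell
# Illustrative sketch: derive a unique job_name / file_name per run from the
# output bam name, so parallel bsub jobs get distinct file-upload-infos.
# /path/to/data and /path/to/archiver.sif are hypothetical placeholders.
submit_uploads() {
  for bam in "$@"; do
    job_name="${bam%.bam}"   # e.g. sampleA.bam -> sampleA
    # echo the command instead of submitting it, so it can be reviewed first;
    # remove the leading echo to actually submit via bsub
    echo bsub -J "$job_name" -M 64000 \
      "singularity run -B /path/to/data /path/to/archiver.sif --file-name $job_name"
  done
}

submit_uploads sampleA.bam sampleB.bam
```

Because each job name comes from the bam name, no two parallel jobs share a job_name or file_name.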
Hmm, I don't think this is ready to be put in this SOP for archiving. This step is not part of the current DSP integration. We should have the full direct ENA archiving working first before putting it in the operational Archiving SOP (which still uses DSP).
If it is not ready, then we don't merge it yet.
or specify that it is not relevant until the task is finished.
Agree with @aaclan-ebi above, maybe if we don't want to lose the progress on writing this we can add it to a section at the end (e.g.
@@ -99,6 +99,8 @@ Once a submission is ready in ingest (`Archiving` status after hitting submit),

## Step 2 of 3 - Archiving Files to DSP

_Note: When direct archiving is fully functional, the instructions at the end of this step on __Uploading files directly to ENA__ should be followed instead of the steps below._
I am still not completely clear on this. Does this mean that direct archiving, which is simpler, is not yet functional, i.e. we cannot follow that step yet? Is there a way to make it more clear which steps cannot be run currently, for example, by highlighting that section in a different colour?
@ami-day sorry for the confusion. You (and @aaclan-ebi - see her comment below) are right, the direct archiving is not fully functional. This is only about testing the direct data file upload to ENA using the new ingest-archiver endpoint. The steps in this PR should not be part of the SOP yet, to avoid confusion. I have updated the ticket with some steps on how you could test this. Let me know if this makes sense. To be fair, I am not sure if this really requires user testing, perhaps not at this point.
I have listed some suggestions below:
- The example json file shows the file structure for 1 set of fastq files (R1, R2, I1), and the script runs with 1 json file. Assuming that json file contains all sequencing files, could the example include 1 other run (an additional R1, R2, I1) for the purpose of demonstration?
- I am unsure why the bam file conversion is mentioned in this SOP. We do not submit bam files to the HCA DCP, and I believe we would not aim to convert the fastq files we upload to ENA to bam format if the dataset is an HCA dataset (which it should always be in our case).
- It could be clearer who will run which parts of this SOP. Is the full SOP intended for both wranglers and developers, and if some parts are for developers only, which parts are they?
- The instructions include running docker on EC2 with "docker run". Is docker already set up so that we can run this command?

Other than these points, the SOP is very clear and readable. Thanks for your hard work!
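To make the first suggestion concrete, a json file covering two runs could look something like the sketch below. The field names (`runs`, `run_id`, `files`) are hypothetical illustrations of the shape being asked for, not the actual schema the upload script expects.

```python
import json

# Hypothetical sketch of a file-upload-info json covering two sequencing
# runs, each with its R1/R2/I1 fastq files. Field names are illustrative
# only, not the real schema used by the upload script.
upload_info = {
    "runs": [
        {
            "run_id": "run_1",
            "files": ["run_1_R1.fastq.gz", "run_1_R2.fastq.gz", "run_1_I1.fastq.gz"],
        },
        {
            "run_id": "run_2",
            "files": ["run_2_R1.fastq.gz", "run_2_R2.fastq.gz", "run_2_I1.fastq.gz"],
        },
    ]
}

# Print the structure as it would appear in the json file.
print(json.dumps(upload_info, indent=2))
```

Extending the documented single-run example this way would show readers how additional runs are listed within one json file.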
Please name the branches properly. The name should include the ticket number in dcp-nnn format and a meaningful title.
Merge this only when the direct archiving is functional.