dcp-631 added instructions for using direct ENA files upload #633
base: master
Conversation
LGTM 👍
@@ -368,6 +370,27 @@ bsub -J laurenti_upload -M 64000 'singularity run -B /nfs/production/hca/laurent

If running parallel jobs, choose different <file_name> / <job_names> because you will have multiple file-upload-infos and multiple jobs. In this case I’ve used the name of the output bam file as the job_name and file_name.

### Uploading files directly to ENA
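The note about choosing distinct names for parallel jobs can be sketched as a small wrapper. Everything below is illustrative only: the bind path, image path and bam names are hypothetical placeholders, not the real cluster paths from the SOP.

```shell
# Illustrative sketch: derive a unique job_name / file_name per run from the
# output bam name, so parallel bsub jobs get distinct file-upload-infos.
# /path/to/data and /path/to/archiver.sif are hypothetical placeholders.
submit_uploads() {
  for bam in "$@"; do
    job_name="${bam%.bam}"   # e.g. sampleA.bam -> sampleA
    # echo the command instead of submitting it, so it can be reviewed first;
    # remove the leading echo to actually submit via bsub
    echo bsub -J "$job_name" -M 64000 \
      "singularity run -B /path/to/data /path/to/archiver.sif --file-name $job_name"
  done
}

submit_uploads sampleA.bam sampleB.bam
```

Because each job name comes from the bam name, no two parallel jobs share a job_name or file_name.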
Hmm, I don't think this is ready to be put in this SOP for archiving. This step is not part of the current DSP integration. We should have the full direct ENA archiving working first before putting it in the operational Archiving SOP (which still uses DSP).
If it is not ready, then we don't merge it yet.
or specify that it is not relevant until the task is finished.
Agree with @aaclan-ebi above, maybe if we don't want to lose the progress on writing this we can add it to a section at the end (e.g.
@@ -99,6 +99,8 @@ Once a submission is ready in ingest (`Archiving` status after hitting submit),

## Step 2 of 3 - Archiving Files to DSP

_Note: When direct archiving is fully functional, the instructions at the end of this step on __Uploading files directly to ENA__ should be followed instead of the steps below._
I am still not completely clear on this. Does this mean that direct archiving, which is simpler, is not yet functional, i.e. we cannot follow that step yet? Is there a way to make it more clear which steps cannot be run currently, for example, by highlighting that section in a different colour?
@ami-day sorry for the confusion. You (and @aaclan-ebi - see her comment below) are right, the direct archiving is not fully functional. This is only about testing the direct data file upload to ENA using the new ingest-archiver endpoint. The steps in this PR should not be part of the SOP yet, to avoid confusion. I have updated the ticket with some steps on how you could test this. Let me know if this makes sense. To be fair, I am not sure if this really requires user testing, perhaps not at this point.
I have listed some suggestions below:
- The example json file shows the file structure for 1 set of fastq files (R1, R2, I1), and the script runs with 1 json file. Assuming that json file contains all sequencing files, could the example include 1 other run (an additional R1, R2, I1) for the purpose of demonstration?
- I am unsure why the bam file conversion is mentioned in this SOP. We do not submit bam files to the HCA DCP, and I believe we would not aim to convert the fastq files we upload to ENA to bam format if the dataset is an HCA dataset (which it should always be in our case).
- It could be clearer who will run which parts of this SOP. Is the full SOP intended for both wranglers and developers, and if some parts are for developers only, which parts are they?
- The instructions include running docker on EC2 with "docker run". Is docker already set up so that we can run this command?

Other than these points, the SOP is very clear and readable. Thanks for your hard work!
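To make the first suggestion concrete, a json file covering two runs could look something like the sketch below. The field names (`runs`, `run_id`, `files`) are hypothetical illustrations of the shape being asked for, not the actual schema the upload script expects.

```python
import json

# Hypothetical sketch of a file-upload-info json covering two sequencing
# runs, each with its R1/R2/I1 fastq files. Field names are illustrative
# only, not the real schema used by the upload script.
upload_info = {
    "runs": [
        {
            "run_id": "run_1",
            "files": ["run_1_R1.fastq.gz", "run_1_R2.fastq.gz", "run_1_I1.fastq.gz"],
        },
        {
            "run_id": "run_2",
            "files": ["run_2_R1.fastq.gz", "run_2_R2.fastq.gz", "run_2_I1.fastq.gz"],
        },
    ]
}

# Print the structure as it would appear in the json file.
print(json.dumps(upload_info, indent=2))
```

Extending the documented single-run example this way would show readers how additional runs are listed within one json file.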
Please name the branches properly. The name should include the ticket number in dcp-nnn format and a meaningful title.
Merge this only when the direct archiving is functional.