Bug fix in RNA calling pipeline #475
Comments
Great to hear from you @Gou-29! For the obvious bugs, could you send in a PR so we can evaluate and merge? For the others, please leave them as discussions.
Hi @Gou-29, thank you for pointing out the issues. Re:
Add the Regarding the issue with Please feel free to implement other changes and send in a PR.
I also have a small question about the message:
@Gou-29 Hi, I wonder whether the new version of RNA_Seq is sufficient to accommodate your need for SE RNA processing?
@hsun3163 Thanks a lot for your update! Everything ran smoothly. The only question on my side is that my results for
@Gou-29 For the samples with wrong gct matrices, could you share with us one example of the MultiQC output? We can see whether the read quality is low and RNA-SeQC filtered them out as a result ...
Hi @Gou-29, it is most likely not due to your data. Would you mind pulling the new update and trying again?
@hsun3163 @gaow I have tested the new pipeline and the Further, I may still not be able to use >8 cores on my end by parsing Our server will shut down over Christmas to save energy. I will be able to test updates after the 26th. Thanks a lot for all your efforts and Merry Xmas!
Dear Prof. Wang and other mates:
It has been a long time since I left this project team. I am now at a new lab, and our RNA-seq data is well suited for fully using and testing the existing pipeline.
I am now using the RNA_calling pipeline and have run into these problems:
- `Trimmomatic`: Our data have multiple adaptor sequences in each lane. I tested your recommended tool `fastp`, but it did not work properly. We went back to `Trimmomatic`, and the following new workflow can be used to deal with SE RNA-seq:
- `fastqc`: The line `unzip -o ${_output[0]:n}.zip -d ${cwd}` leads to an error when using SE data, though this may not have any impact on the end result.
- `STAR_align_1` main workflow: `--sjdbOverhang ${sjdbOverhang if sjdbOverhang != 0 else _read_length}`. In some cases, one may not know the read length when constructing the very first `sample.list`; only after QC do you have an idea of the average read length. The `STAR` documentation states that the default value `100` will work as well as the optimal value. If the read length column is not provided at the first stage, an error is raised. This also affects the input definition in the line `input: fastq,group_by = is_paired_end + 1,group_with = {"sample_id","read_length"}`.
- `rm -r ${_output[0]:nnnn}._STARtmp`: It seems that this should be replaced by `rm -r ${_output[0]:nnnn}_STARpass1` (based on our running result).
- `rsem_call_4, rnaseqc_call_4`: In the `R` function `readPicard.alignment_summary_metrics`, the file is read with the line `m <- read.table(files[i], header=TRUE, sep="\t", comment.char="#", stringsAsFactors=FALSE, nrows=2)`. The `nrows = 2` parameter should be corrected to `nrows = 1`, since the function already includes the header line. Otherwise, an error is raised.
- `fasta.fasta`: typo in the minimum example.

Thanks a lot for this wonderful pipeline, which literally freed up a bunch of my time. I hope these bug reports are helpful on your end too!
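
For the `--sjdbOverhang` point above, here is a minimal sketch of one possible fallback when the read length is not yet known. The helper name `pick_sjdb_overhang` and the constant `STAR_DEFAULT_OVERHANG` are hypothetical and not part of the pipeline. The first two branches mirror the expression quoted in the report, and the last branch assumes that falling back to STAR's default of `100` is acceptable.

```python
from typing import Optional

# STAR's default for --sjdbOverhang, as mentioned in the report above.
STAR_DEFAULT_OVERHANG = 100


def pick_sjdb_overhang(sjdb_overhang: int = 0,
                       read_length: Optional[int] = None) -> int:
    """Hypothetical helper: choose a value for --sjdbOverhang.

    Mirrors `sjdbOverhang if sjdbOverhang != 0 else _read_length`,
    but falls back to STAR's default when no read length is given
    (assumption: a missing read length is passed as None or 0).
    """
    if sjdb_overhang != 0:
        # An explicit user setting always wins.
        return sjdb_overhang
    if read_length:
        # Read length is available from the sample list.
        return read_length
    # Read length column missing: use the default instead of failing.
    return STAR_DEFAULT_OVERHANG


if __name__ == "__main__":
    print(pick_sjdb_overhang(0, 150))   # read length known
    print(pick_sjdb_overhang(0, None))  # read length unknown
```

With a fallback like this, the `read_length` column could be treated as optional in the first `sample.list` and filled in after QC.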