Original poster | 2018-03-08 00:00 | Views: 160 | Replies: 2

fastq2fasta by sed

I've been having fun with sed this morning. I'm testing Illumina data as input to Newbler, just to see how it works. Because the data is paired-end, it needs some pre-processing, following the guidelines for Sanger reads in the manual for gsAssembler (a.k.a. Newbler). Yesterday I did this using the method proposed in the manual, with a final cleanup in sed, so the procedure was: fastq --> fasta --> perl script to make .acc file --> use fnafile to produce fasta file --> final tidy using sed. That was messy and horribly slow, so today I decided to do all of those steps using sed.

Here's the end result:

1) Convert fastq2fasta using sed

sed '/^@/!d;s//>/;N' asp5_leaf_read1.fq > asp5_leaf_read1.fna
sed '/^@/!d;s//>/;N' asp5_leaf_read2.fq > asp5_leaf_read2.fna
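
One caveat with matching on /^@/: in FASTQ, @ is also a legal quality character, so a quality line that happens to start with @ would be taken for a header. Below is a minimal sketch of a position-based alternative, assuming strict four-line records (no wrapped sequence or quality lines). The function name fastq_to_fasta and the sample reads are made up for illustration.

```shell
# Hypothetical helper: FASTQ on stdin, FASTA on stdout.
# Assumes every record is exactly four lines: @name, sequence, +, quality.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }  # header line: swap the leading @ for >
        NR % 4 == 2 { print }                    # sequence line: copy unchanged
    '
}

# Made-up two-read sample; the first quality line deliberately starts with @.
printf '@r1#0/1\nACGT\n+r1#0/1\n@hhh\n@r2#0/1\nTTGA\n+r2#0/1\nhhhh\n' | fastq_to_fasta
```

On the real data this would be called as: fastq_to_fasta < asp5_leaf_read1.fq > asp5_leaf_read1.fna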

2) Format fasta header so newbler can match the pairs up

sed -e 's/>\(.*\)#0\/\(.*\)/>\1 template=\1 dir=\2 library=asp5/' -e 's/dir=1/dir=f/' asp5_leaf_read1.fna > asp5_leaf_read1_newbler.fna
sed -e 's/>\(.*\)#0\/\(.*\)/>\1 template=\1 dir=\2 library=asp5/' -e 's/dir=2/dir=r/' asp5_leaf_read2.fna > asp5_leaf_read2_newbler.fna
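
To see what the two expressions do, here they are applied to a single mate pair. The read name is made up, and the sketch assumes headers end in "#0/1" or "#0/2" as the pattern requires.

```shell
# Made-up mate pair in the "#0/1" / "#0/2" header style the expressions expect.
mate1=$(printf '>HWI-EAS1_1:1:1:100:200#0/1\nACGT\n' |
    sed -e 's/>\(.*\)#0\/\(.*\)/>\1 template=\1 dir=\2 library=asp5/' -e 's/dir=1/dir=f/')
mate2=$(printf '>HWI-EAS1_1:1:1:100:200#0/2\nTTGA\n' |
    sed -e 's/>\(.*\)#0\/\(.*\)/>\1 template=\1 dir=\2 library=asp5/' -e 's/dir=2/dir=r/')

printf '%s\n%s\n' "$mate1" "$mate2"
# >HWI-EAS1_1:1:1:100:200 template=HWI-EAS1_1:1:1:100:200 dir=f library=asp5
# ACGT
# >HWI-EAS1_1:1:1:100:200 template=HWI-EAS1_1:1:1:100:200 dir=r library=asp5
# TTGA
```

Both mates get the same template value and differ only in dir. Note that they also end up with the same read name.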

3) Produce qual files with read names matching the fasta files

sed '/^+/!d;s//>/;N' asp5_leaf_read1.fq | sed 's/>\(.*\)#0\/\(.*\)/>\1/' > asp5_leaf_read1_newbler.qual
sed '/^+/!d;s//>/;N' asp5_leaf_read2.fq | sed 's/>\(.*\)#0\/\(.*\)/>\1/' > asp5_leaf_read2_newbler.qual
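
The commands above take the read name from the + line, which only works when that line repeats the name; the FASTQ format also allows a bare + there. Here is a sketch that takes the name from the @ line instead, again assuming four-line records. fastq_to_qual is a hypothetical name and the sample read is made up.

```shell
# Hypothetical helper: FASTQ on stdin, name + ASCII quality string on stdout.
fastq_to_qual() {
    awk '
        NR % 4 == 1 {                 # header line
            sub(/^@/, ">")            # FASTA-style marker
            sub("#0/.*$", "")         # drop the "#0/1" or "#0/2" suffix, as in step 3
            print
        }
        NR % 4 == 0 { print }         # quality line, still ASCII-encoded
    '
}

# Made-up read with a bare + separator line.
printf '@r1#0/1\nACGT\n+\nhhhh\n' | fastq_to_qual
```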

Those qual values will then need rescaling to phred33 (still to do). Steps 1 and 2 can also be piped if you don't want to keep the unmodified .fna files (but I did, for testing Inchworm). I also need to check whether to reverse complement the reads for either paired-end or mate-pair data.
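
For the rescaling step, here is a sketch of one possible approach. It assumes the FASTQ quality strings use the offset-64 encoding (which the need to rescale suggests) and that the goal is the space-separated integer scores conventional in FASTA .qual files. Both are assumptions, as are the name qual_to_numeric and the sample read.

```shell
# Hypothetical helper: .qual-style input (name line, then ASCII quality line)
# on stdin; name line plus space-separated integer scores on stdout.
# $1 = ASCII offset of the input encoding; 64 is an assumption, pass 33 if needed.
qual_to_numeric() {
    awk -v offset="${1:-64}" '
        BEGIN {
            # awk has no ord(), so build a lookup for printable ASCII
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
        }
        NR % 2 == 1 { print; next }   # odd lines are the >name headers
        {
            out = ""
            n = length($0)
            for (j = 1; j <= n; j++) {
                out = out (j > 1 ? " " : "") (ord[substr($0, j, 1)] - offset)
            }
            print out
        }
    '
}

# Made-up read: h is ASCII 104, B is 66, @ is 64.
printf '>r1\nhhhB@\n' | qual_to_numeric 64
# >r1
# 40 40 40 2 0
```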

Those steps took about 15 mins to complete. The attempt yesterday took over six hours. Thank you sed!

Reply 1 | 2011-03-18 23:33
Hi, 'asp5' and 'leaf' are just parts of the filename, with no special meaning.
Reply 2 | 2024-11-30 22:56