用awk和sed快速将fasta格式的序列改成一行显示_生物软件圈_商圈

用awk和sed快速将fasta格式的序列改成一行显示

Some time when you want to change the fasta seq into one line, just as following. Then it will help you to do other process.
I know that perl and other script will do that, however, I would introduce two simple and fast way do achieve that with awk and sed.


> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
> sq3
paxpaxpax
pax


> sq1 foofoofoobarfoofoofoo
> sq2 quxquxquxbarquxquxquxbarquxx
> sq3 paxpaxpaxpax

For awk:
awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0" ":$0 }' YourFile

For sed:
sed -n '1{x;d;x};${H;x;s/\n/ /1;s/\n//g;p;b};/^>/{x;s/\n/ /1;s/\n//g;p;b};H' YourFile

Today, I want to extract contig which is more 500bp from my aseembly result, So I do that as following:
sed -n '1{x;d;x};${H;x;s/\n/ /1;s/\n//g;p;b};/^>/{x;s/\n/ /1;s/\n//g;p;b};H' |awk '{if (length($5)>500 ) print ">contig-"FNR"\n"$5}'

打赏