Original poster | 2018-08-27 00:00 | Views: 107 | Replies: 8

Can Clustal really not be run in batch?

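Clustal is probably a sequence alignment program that everyone here already knows well. ClustalX is the graphical-interface version for Windows and ClustalW is the command-line version. The latter is the one generally used in bioinformatics work, and it is the subject of this post.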

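Most people who have used ClustalW assume it is an interactive program (I thought so myself three years ago): the menu commands are clear, but you can only process one dataset at a time, as shown in the figure below: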

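Recently a junior labmate asked me whether there was a way to batch-align a set of homologous protein sequence groups, because aligning them in ClustalW by hand, one group after another, was too tedious. I recommended several alignment programs to him, for example MAFFT, which supports multithreading. That said, ClustalW itself can perfectly well be automated and run in batch.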

In fact, typing just clustalw on the command line gives you the interactive menu from the figure above. But try appending -HELP, that is, run:

clustalw -HELP

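You will see a long list of parameters (reproduced in the appendix at the end of this post). With these parameters, clustalw can be run directly with command-line arguments instead of through the menus. So if you need to align many groups of protein or nucleotide sequences, you just generate the script or the commands ahead of time with a Perl program and then run them. By now it should be clear how to batch-align sequences with clustalw.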
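To make the "generate the commands ahead of time" step concrete, here is a minimal sketch. It uses Python rather than the Perl mentioned above, purely for illustration. The function names, the .aln output naming and the run_clustalw.sh script name are my own assumptions. The flags (-INFILE, -ALIGN, -TYPE, -OUTFILE) are taken from the help listing in the appendix.

```python
import shlex
from pathlib import Path


def build_clustalw_commands(fasta_files, seq_type="PROTEIN", exe="clustalw"):
    """Return one non-interactive ClustalW command line per input file."""
    commands = []
    for fasta in fasta_files:
        fasta = Path(fasta)
        # Assumption: write each alignment next to its input, with an .aln suffix.
        out = fasta.with_suffix(".aln")
        args = [
            exe,
            f"-INFILE={fasta}",
            "-ALIGN",
            f"-TYPE={seq_type}",
            f"-OUTFILE={out}",
        ]
        # Quote each argument so unusual file names survive the shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


def write_batch_script(commands, script_path="run_clustalw.sh"):
    """Write the commands to a shell script, one command per line."""
    lines = ["#!/bin/sh", *commands]
    Path(script_path).write_text("\n".join(lines) + "\n")
    return script_path
```

Something like write_batch_script(build_clustalw_commands(sorted(Path("groups").glob("*.fa")))) followed by sh run_clustalw.sh would then run every group in turn. The groups directory is also just a placeholder.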

[plain]
CLUSTAL 2.0.12 Multiple Sequence Alignments

DATA (sequences)

-INFILE=file.ext :input sequences.
-PROFILE1=file.ext and -PROFILE2=file.ext :profiles (old alignment).

VERBS (do things)

-OPTIONS :list the command line parameters
-HELP or -CHECK :outline the command line params.
-FULLHELP :output full help content.
-ALIGN :do full multiple alignment.
-TREE :calculate NJ tree.
-PIM :output percent identity matrix (while calculating the tree)
-BOOTSTRAP(=n) :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
-CONVERT :output the input sequences in a different file format.

PARAMETERS (set things)

***General settings:****
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE :use FAST algorithm for the alignment guide tree
-TYPE= :PROTEIN or DNA sequences
-NEGATIVE :protein alignment with negative values in matrix
-OUTFILE= :sequence alignment file name
-OUTPUT= :GCG, GDE, PHYLIP, PIR or NEXUS
-OUTORDER= :INPUT or ALIGNED
-CASE :LOWER or UPPER (for GDE output only)
-SEQNOS= :OFF or ON (for Clustal output only)
-SEQNO_RANGE=:OFF or ON (NEW: for all output formats)
-RANGE=m,n :sequence range to write starting m to m+n
-MAXSEQLEN=n :maximum allowed input sequence length
-QUIET :Reduce console output to minimum
-STATS= :Log some alignents statistics to file

***Fast Pairwise Alignments:***
-KTUPLE=n :word size
-TOPDIAGS=n :number of best diags.
-WINDOW=n :window around best diags.
-PAIRGAP=n :gap penalty
-SCORE :PERCENT or ABSOLUTE

***Slow Pairwise Alignments:***
-PWMATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f :gap opening penalty
-PWGAPEXT=f :gap opening penalty

***Multiple Alignments:***
-NEWTREE= :file for new guide tree
-USETREE= :file for old guide tree
-MATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f :gap opening penalty
-GAPEXT=f :gap extension penalty
-ENDGAPS :no end gap separation pen.
-GAPDIST=n :gap separation pen. range
-NOPGAP :residue-specific gaps off
-NOHGAP :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res.
-MAXDIV=n :% ident. for delay
-TYPE= :PROTEIN or DNA
-TRANSWEIGHT=f :transitions weighting
-ITERATION= :NONE or TREE or ALIGNMENT
-NUMITER=n :maximum number of iterations to perform
-NOWEIGHTS :disable sequence weighting

***Profile Alignments:***
-PROFILE :Merge two alignments by profile alignment
-NEWTREE1= :file for new guide tree for profile1
-NEWTREE2= :file for new guide tree for profile2
-USETREE1= :file for old guide tree for profile1
-USETREE2= :file for old guide tree for profile2

***Sequence to Profile Alignments:***
-SEQUENCES :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE= :file for new guide tree
-USETREE= :file for old guide tree

***Structure Alignments:***
-NOSECSTR1 :do not use secondary structure-gap penalty mask for profile 1
-NOSECSTR2 :do not use secondary structure-gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE :output in alignment file
-HELIXGAP=n :gap penalty for helix core residues
-STRANDGAP=n :gap penalty for strand core residues
-LOOPGAP=n :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n:number of residues outside strand to be treated as terminal

***Trees:***
-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n :seed number for bootstraps.
-KIMURA :use Kimura's correction.
-TOSSGAPS :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display
-CLUSTERING= :NJ or UPGMA
[/plain]
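If you would rather call clustalw directly from a driver script than write a shell file first, the options listed above can be turned into an argument list. The following is only a sketch under assumptions: clustalw_argv and run_alignment are hypothetical helper names, bare switches such as -QUIET are passed as True, and everything else becomes -NAME=value. Only flags that appear in the help listing should be passed in.

```python
import shutil
import subprocess


def clustalw_argv(infile, outfile, exe="clustalw", **options):
    """Build an argument list for a non-interactive full alignment."""
    argv = [exe, f"-INFILE={infile}", "-ALIGN", f"-OUTFILE={outfile}"]
    for name, value in options.items():
        flag = f"-{name.upper()}"
        # True means a bare switch (e.g. -QUIET); anything else is -NAME=value.
        argv.append(flag if value is True else f"{flag}={value}")
    return argv


def run_alignment(infile, outfile, **options):
    """Run one alignment, failing early if the executable is not on PATH."""
    argv = clustalw_argv(infile, outfile, **options)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    # check=True raises CalledProcessError if clustalw exits with an error.
    return subprocess.run(argv, check=True)
```

For example, run_alignment("group1.fa", "group1.phy", type="PROTEIN", output="PHYLIP", quiet=True) would request PHYLIP output with reduced console output.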

Reply 1 | 2025-01-03 23:10
I used to write a Perl script that put all the jobs into a shell script and then ran them as a batch. These days the program I use most is MAFFT.
Reply 2 | 2015-01-11 20:19
Could an expert help me out: how do I batch-process with MAFFT? I only know how to run it on one file at a time.
Reply 3 | 2025-01-03 23:10
Use Perl to generate a shell file that contains everything you need to process, for example:
mafft 1.fa
mafft 2.fa
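A rough sketch of that generator, written in Python instead of Perl purely for illustration. MAFFT prints the alignment to standard output, so each line redirects it to a file. The .mafft.fa output naming and the build_mafft_script name are my own assumptions, not anything MAFFT requires.

```python
import shlex
from pathlib import Path


def build_mafft_script(fasta_files, exe="mafft"):
    """Return shell-script text with one MAFFT call per input file."""
    lines = ["#!/bin/sh"]
    for fasta in fasta_files:
        fasta = Path(fasta)
        # Assumption: 1.fa -> 1.mafft.fa, written next to the input file.
        out = fasta.with_name(fasta.stem + ".mafft.fa")
        # MAFFT writes the alignment to stdout, so redirect it into the output file.
        lines.append(
            f"{shlex.quote(exe)} {shlex.quote(str(fasta))} > {shlex.quote(str(out))}"
        )
    return "\n".join(lines) + "\n"
```

Saving the returned text to a file such as run_mafft.sh and running it with sh would process the inputs one after another.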