MISA,英文全称为MIcroSAtellite identification tool,即微卫星识别工具。虽然前段时间看到PLoB上已经有一篇文章《利用MISA鉴定简单重复序列(SSR)》来介绍MISA脚本的使用,今天在陈连福的博客上发现也有一篇介绍MISA的文章,所以转载过来,与大家一起分享。
MISA是使用 perl 编写的一支程序,能识别出序列中的微卫星和复合微卫星(两个微卫星之间由由不多于100bp的碱基对隔开),并给出其所在位点。
MISA下载网址:http://pgrc.ipk-gatersleben.de/misa/misa.html
MISA用法:
12 $ misa.pl filenamemisa.pl <FASTAfile>
其中,fastfile是序列文件,同时在运行程序的工作目录下必须有一个名称为“misa.ini”的文件。该文件内容为:
12 definition(unit_size,min_repeats): 1-10 2-6 3-5 4-5 5-5 6-5interruptions(max_difference_for_2_SSRs): 100
该文件指定了misa的参数,即1个碱基重复10次及10次以上;2个碱基重复6次及6 次以上; 3个碱基重复5次及5次以上;4个碱基重复5次及5次以上;5个碱基重复5 次及5次以上;6碱 基重复5次及5次以上,这样的碱基重复序列才算是微卫星序列。 同时,两个微卫星之间的距 离小于100bp的时候,两个微卫星组成一个复合微卫星。
MISA的输出结果:
MISA会在 Fastafile 所在的文件夹下生成两个文件,分别是 “<FASTfile>.misa” 和 “<FASTfile>.statistics”
12 "<FASTfile>.misa" :以表格的形式列出微卫星的类型和位点;"<FASTfile>.statistics" :统计微卫星的类型和频数。
在MISA的下载页面中,提供了3个附加的 perl 脚本,分别是:Get_est_trimmer.pl,p3_in.pl 和 p3_out.pl。
Get_est_trimmer.pl
针对EST序列,可以除去EST序列中短的序列和两端不明确的碱基。
p3_in.pl
输入 misa.pl 的输出结果,将引物设计的参数文件(模板,产物长度,目标区域等)导入到一个以“p3in”为后缀的文件中。
1 $ p3_in.pl filename.misa
调用 primer3_core
该软件详细解说见:《primer3引物设计详解》,生成结果文件 filename.p3in。
1 $ primer3_core -default_version=1 -output=filename.p3out filename.p3in
p3_out.pl
对primer3产生的文件进行提取合,得到最后的结果文件 filename.result
1 $ p3_out.pl filename.p3out filename.misa
p2_in.pl 和 p3_out.pl 这两支程序需要修改才能正常使用。
p2_in.pl
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | #!/usr/bin/perl use strict; open (IN,"<$ARGV[0]") || die ("\nError: Couldn't open misa.pl results file (*.misa) !\n\n"); my $filename = $ARGV[0]; $filename =~ s/\.misa//; open (SRC,"<$filename") || die ("\nError: Couldn't open source file containing original FASTA sequences !\n\n"); open (OUT,">$filename.p3in"); my $in = join "", <IN>; study $in; #print "$in\n"; my $count; while (<SRC>) { my $id = $_; $id =~ s/^>(\S*)/$1/; $id =~ s/\s+/_/; $id =~ s/\n//; #print "$id"; my $seq = <SRC>; $seq =~ s/[\d\s>]//g;#remove digits, spaces, line breaks,... #print "$seq\n"; while ($in =~ /$id\t(\d+)\t\S+\t\S+\t(\d+)\t(\d+)/g) { my ($ssr_nr,$size,$start) = ($1,$2,$3); $count++; print OUT "SEQUENCE_ID=$id"."_$ssr_nr\nSEQUENCE_TEMPLATE=$seq\n"; print OUT "PRIMER_PRODUCT_SIZE_RANGE=100-280\n"; print OUT "SEQUENCE_TARGET=",$start-3,",",$size+6,"\n"; print OUT "PRIMER_MAX_END_STABILITY=250\n=\n" }; }; print "\n$count records created.\n"; |
p3_out.pl
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | #!/usr/bin/perl use strict; open P3OUT, '<', "$ARGV[0]" or die ("\nError: Couldn't open Primer3 results file (*.p3out) !\n\n"); my $filename = $ARGV[0]; $filename =~ s/\.p3out//; open IN, '<', "$ARGV[1]" or die ("nError: Couldn't create file !\n\n"); open OUT, '>', "$filename.results" or die ("nError: Couldn't create file !\n\n"); my $count = 0; my $count_failed = 0; print OUT "ID\tSSR nr.\tSSR type\tSSR\tsize\tstart\tend\t"; print OUT "FORWARD PRIMER1 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER1 (5'-3')\tTm(°C)\tsize\tPRODUCT1 size (bp)\tstart (bp)\tend (bp)\t"; print OUT "FORWARD PRIMER2 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER2 (5'-3')\tTm(°C)\tsize\tPRODUCT2 size (bp)\tstart (bp)\tend (bp)\t"; print OUT "FORWARD PRIMER3 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER3 (5'-3')\tTm(°C)\tsize\tPRODUCT3 size (bp)\tstart (bp)\tend (bp)\n"; my $in = join "", <IN>; my $p3out = join "", <P3OUT>; close IN; close P3OUT; study $in; my @p3out = split /=\n/, $p3out; #print $in; #print @p3out; foreach ( @p3out ) { my ($id,$ssr_nr) = (/SEQUENCE_ID=(\S+)_(\d+)\s*?\n/); # print "$id\n$ssr_nr\n" $in =~ /($id\t$ssr_nr\t.*)\n/; my $misa = $1; # print "$misa\n"; /PRIMER_LEFT_0_SEQUENCE=(.*)\n/ || do {$count_failed++;print OUT "$misa\n"; next}; my $info = "$1\t"; /PRIMER_LEFT_0_TM=(.*)/; $info .= "$1\t"; /PRIMER_LEFT_0=\d+,(\d+)/; $info .= "$1\t"; /PRIMER_RIGHT_0_SEQUENCE=(.*)/;$info .= "$1\t"; /PRIMER_RIGHT_0_TM=(.*)/; $info .= "$1\t"; /PRIMER_RIGHT_0=\d+,(\d+)/; $info .= "$1\t"; # print "$info\n"; /PRIMER_PAIR_0_PRODUCT_SIZE=(.*)/; $info .= "$1\t"; /PRIMER_LEFT_0=(\d+),\d+/; $info .= "$1\t"; /PRIMER_RIGHT_0=(\d+),\d+/; $info .= "$1\t"; /PRIMER_LEFT_1_SEQUENCE=(.*)/; $info .= "$1\t"; /PRIMER_LEFT_1_TM=(.*)/; $info .= "$1\t"; /PRIMER_LEFT_1=\d+,(\d+)/; $info .= "$1\t"; /PRIMER_RIGHT_1_SEQUENCE=(.*)/;$info .= "$1\t"; /PRIMER_RIGHT_1_TM=(.*)/; $info .= "$1\t"; /PRIMER_RIGHT_1=\d+,(\d+)/; $info .= "$1\t"; /PRIMER_PAIR_1_PRODUCT_SIZE=(.*)/; $info .= "$1\t"; /PRIMER_LEFT_1=(\d+),\d+/; $info .= "$1\t"; /PRIMER_RIGHT_1=(\d+),\d+/; $info .= "$1\t"; /PRIMER_LEFT_2_SEQUENCE=(.*)/; $info .= "$1\t"; /PRIMER_LEFT_2_TM=(.*)/; $info .= "$1\t"; /PRIMER_LEFT_2=\d+,(\d+)/; $info .= "$1\t"; /PRIMER_RIGHT_2_SEQUENCE=(.*)/;$info .= "$1\t"; /PRIMER_RIGHT_2_TM=(.*)/; $info .= "$1\t"; /PRIMER_RIGHT_2=\d+,(\d+)/; $info .= "$1\t"; /PRIMER_PAIR_2_PRODUCT_SIZE=(.*)/; $info .= "$1\t"; /PRIMER_LEFT_2=(\d+),\d+/; $info .= "$1\t"; /PRIMER_RIGHT_2=(\d+),\d+/; $info .= "$1"; $count++; print OUT "$misa\t$info\n" } print "\nPrimer modelling was successful for $count sequences.\n"; print "Primer modelling failed for $count_failed sequences.\n"; |
本文来自陈连福的博客:http://www.hzaumycology.com/chenlianfu_blog/?p=255