Original post | 2018-09-30 00:00

InterProScan installation and detailed configuration

For the full installation and configuration steps, see the iprscan documentation file Installing_InterProScan.txt.

1. Before installing iprscan, your Linux system needs the following basic components:

Perl, which every common Linux distribution already ships.
Perl alone is not enough; you also need the Perl modules iprscan depends on:
* CGI
* DB_File.pm - the interface to Berkeley DB
* English
* File::Basename
* File::Copy
* File::Path
* File::Spec::Functions
* FileHandle
* IO::Scalar
* IO::String
* Mail::Send
* Sys::Hostname
* URI::Escape
* XML::Parser.pm - needs libexpat (1.95.5 or newer); required by the new implementation of BlastProDom and for parsing XML output.
* XML::Quote

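Installing these Perl modules is easy; I installed them directly with CPAN. A few could not be installed through CPAN (failing tests, conflicts and so on), so I installed those by hand and put them on the system module path.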

2. Once the modules are installed, download the iprscan source or binaries together with the associated databases

Download location: ftp://ftp.ebi.ac.uk/pub/databases/interpro/iprscan
Files to download and their paths:
* RELEASE/latest/iprscan_v4.8.tar.gz - InterProScan itself
* BIN/4.x/iprscan_bin4.x_[PLATFORM].tar.gz - Binaries for the various platforms
* DATA/iprscan_DATA_[LATESTDATAVERSION].tar.gz - databases used by InterProScan (except Panther)
* DATA/iprscan_PTHR_DATA_[LATESTDATAVERSION].tar.gz - panther database and indexes
* DATA/iprscan_MATCH_DATA_[LATESTDATAVERSION].tar.gz - match_complete.xml and interpro.xml files
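If the server still carries these files, the five URLs can be generated and fed to wget in one go. A sketch under assumptions: the FTP layout is exactly the one listed above, InterProScan 4.x has since been retired so the files may no longer be there, and the `[PLATFORM]` and `[LATESTDATAVERSION]` values must be read off the FTP listing yourself. `iprscan_urls` is a made-up helper name:

```bash
#!/usr/bin/env bash
# Base FTP directory quoted in the post.
IPRSCAN_FTP=ftp://ftp.ebi.ac.uk/pub/databases/interpro/iprscan

# Print the download URLs of the five archives listed above.
# $1 = platform string, $2 = data release version (both placeholders:
# take the real values from the FTP directory listing).
iprscan_urls() {
  local platform=$1 data_version=$2
  printf '%s\n' \
    "$IPRSCAN_FTP/RELEASE/latest/iprscan_v4.8.tar.gz" \
    "$IPRSCAN_FTP/BIN/4.x/iprscan_bin4.x_${platform}.tar.gz" \
    "$IPRSCAN_FTP/DATA/iprscan_DATA_${data_version}.tar.gz" \
    "$IPRSCAN_FTP/DATA/iprscan_PTHR_DATA_${data_version}.tar.gz" \
    "$IPRSCAN_FTP/DATA/iprscan_MATCH_DATA_${data_version}.tar.gz"
}
```

`iprscan_urls <platform> <version> | wget -i -` then downloads all of them into the current directory (`-i -` makes wget read the URL list from standard input).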

3. Unpack the archives

Copy all of the archives above into the directory where you want to install iprscan, then unpack each one:
gunzip -c iprscan_DATA_[LATESTDATAVERSION].tar.gz | tar -xvf -
gunzip -c iprscan_PTHR_DATA_[LATESTDATAVERSION].tar.gz | tar -xvf -
gunzip -c iprscan_MATCH_DATA_[LATESTDATAVERSION].tar.gz | tar -xvf -
gunzip -c iprscan_bin4.x_[PLATFORM].tar.gz | tar -xvf -
gunzip -c iprscan_v4.8.tar.gz | tar -xvf -
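The same unpacking can be done in one loop, with a `gzip -t` integrity check first so that a truncated download is caught before tar writes half a directory tree. A minimal sketch, assuming all archives sit in one directory and their names start with `iprscan` as in the list above; `unpack_iprscan` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Check and unpack every iprscan*.tar.gz in directory $1 (default: current
# directory), doing the same gunzip | tar as the commands above.
unpack_iprscan() {
  local dir=${1:-.} f
  for f in "$dir"/iprscan*.tar.gz; do
    # an unmatched glob stays literal, so this catches "no archives found"
    [ -e "$f" ] || { echo "no iprscan*.tar.gz in $dir" >&2; return 1; }
    # gzip -t detects truncated or corrupt downloads without extracting
    gzip -t "$f" || { echo "corrupt archive: $f" >&2; return 1; }
    gunzip -c "$f" | tar -C "$dir" -xf - || return 1
  done
}
```

Run `unpack_iprscan /path/to/install/dir` after copying the archives there.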

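At this point InterProScan is installed. That part is easy, but being installed does not mean it is ready to use; the tedious part is the configuration that follows.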

4. Configure iprscan

Configuration is actually not that hard: the iprscan directory contains a Config.pl script that does it for you.

4.1
Run: perl Config.pl
It asks a long series of questions; answer them according to your own system.
4.2
Reconfigure everything (first time install)? (y|n)
If you installed and configured iprscan before and only need to adjust some settings, answer n; for a fresh install, answer y.
Do you want to set paths? (y|n):
Enter the full path for the InterProScan installation?:
These prompts tell Config.pl where your iprscan installation lives; this step is optional.
Do you want to set another one Perl command in place of [default_PERL]?
Usually answer no, unless your machine has several Perl versions installed.
4.3
Setup chunk size? (y|n)
Enter chunk size (default 10):
This setting matters. If your system has multiple CPUs or threads, answer y; on a single-threaded machine there is no point.
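After answering yes, you set the chunk size. A chunk is one of the small files your FASTA file gets split into, and the value 10 means each of those files contains only 10 sequences. Say you have 100 proteins to scan and a chunk size of 10: the original FASTA file is split into 10 files, and each one is submitted to iprscan separately. With 16 CPUs, all 10 files can be scanned at the same time, which is much faster.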
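To make the idea concrete, this is what splitting into chunks of N sequences looks like with awk. It is only an illustration under my own assumptions (plain FASTA, one `>` header line per sequence, output files named `chunk_0001.fa` and so on); iprscan does its own splitting internally, and `split_fasta` is not part of it:

```bash
#!/usr/bin/env bash
# Illustration only: split FASTA file $1 into files of at most $2 sequences,
# named <prefix>_0001.fa, <prefix>_0002.fa, ... (prefix $3, default "chunk").
split_fasta() {
  local fasta=$1 size=$2 prefix=${3:-chunk}
  awk -v size="$size" -v prefix="$prefix" '
    /^>/ {
      # every `size` headers, close the current file and start a new one
      if (n % size == 0) {
        if (out) close(out)
        out = sprintf("%s_%04d.fa", prefix, n / size + 1)
      }
      n++
    }
    out { print > out }   # lines before the first header are dropped
  ' "$fasta"
}
```

`split_fasta proteins.fa 10` on a 100-protein file produces `chunk_0001.fa` through `chunk_0010.fa`, matching the example above.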
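For genome-scale annotation, however, you often have tens of thousands of protein sequences, and the default of 10 is far too small, because your file gets split into thousands of pieces. Even on a 16-CPU system this is slow, since the results of all the separately scanned pieces still have to be merged at the end.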
My cluster has 900 CPUs, but resources are tight and I can usually only get about 200, so I normally set 200; a file of 40,000 proteins is then split into 200 pieces.
On a small machine (dual-core with four threads and the like), set this value as high as possible, unless you have very little data.
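The arithmetic behind these numbers is a ceiling division: the number of chunks is ⌈sequences / chunk size⌉, and the chunk size that gives roughly one chunk per CPU you can get is ⌈sequences / CPUs⌉. A small sketch; the helper names are my own, and `grep -c '^>'` assumes one header line per sequence:

```bash
#!/usr/bin/env bash
# Ceiling division of two positive integers: ceil_div A B prints ceil(A / B).
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Number of sequences in a FASTA file (counts lines starting with ">").
count_seqs() {
  grep -c '^>' "$1"
}

# Chunk size giving roughly one chunk per CPU: chunk_size_for FASTA CPUS
chunk_size_for() {
  ceil_div "$(count_seqs "$1")" "$2"
}
```

`ceil_div 40000 10` gives 4000 chunks with the default, whereas `ceil_div 40000 200` gives a chunk size of 200 for 200 CPUs, which is the 200-way split described above. On a single machine, `nproc` (GNU coreutils) reports how many CPUs are available.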

4.4 The next batch of settings does not matter much unless you have special requirements
Do you want to configure it? (y|n)
Enter the maximum of input protein sequences (default = 1000):
I usually set this to 1000000; otherwise you lose the advantage of the cluster.
Enter the maximum of input nucleic sequences (default = 100):
I usually set this to 50; I think short domains of 15-20 amino acids are fairly common.
Enter the maximum length (nucleic acids) for a nucleotide sequence (default = 10000):
Enter the minimum length (amino acids) for a protein sequence (default = 5):
Enter the minimum length for the length of a translated orf (default = 50):
Enter the default codon table value to use to translate dna/rna in six frames (default = 0):
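Before a large run it helps to know whether the input fits these limits. A minimal sketch (my own helper, not part of iprscan) that reports the sequence count plus the shortest and longest sequence, and lists sequences below a minimum length; the default of 5 mirrors the protein minimum in the prompt above:

```bash
#!/usr/bin/env bash
# Summarise FASTA file $1: prints "short<TAB>id<TAB>length" for every sequence
# shorter than $2 residues (default 5), then count, shortest and longest.
fasta_stats() {
  local fasta=$1 min=${2:-5}
  awk -v min="$min" '
    # finish the current record: update counters and report if too short
    function record_done() {
      if (id == "") return
      count++
      if (count == 1 || len < shortest) shortest = len
      if (len > longest) longest = len
      if (len < min) printf "short\t%s\t%d\n", id, len
    }
    /^>/ { record_done(); id = substr($1, 2); len = 0; next }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore whitespace and CR
    END {
      record_done()
      printf "count\t%d\nshortest\t%d\nlongest\t%d\n", count, shortest, longest
    }
  ' "$fasta"
}
```

For example, `fasta_stats proteins.fa 5` shows whether the sequence count stays under the maximum you entered and which sequences would fall below the minimum length.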

The following settings mainly concern clusters (HPC)
Do you want to setup applications (if you don't, no applications will be included in InterProScan by default)?(y/n)
Do you wish to use a queue system? (y|n)  Answer yes if you have a cluster
Do you want to use sge6/lsf42/pbs (y|n)  The cluster's queue management system; mine is LSF
Do you want to set a global cluster name for all applications? (Default = 'n') (y|n):  Our cluster has no special name, so no
Do you want to set a global queue name for all applications? (Default = 'n') (y|n):  The name of the cluster queue
Is all this information correct? (y|n) yes
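On the queue-name prompt: the queue name is the name of a queue your cluster administrators have defined (for example, one of the names `bqueues` lists under LSF), not the name of the scheduler itself. A sketch that prints the available queues with whichever client is installed, assuming the standard client commands `bqueues` (LSF), `qconf -sql` (SGE) or `qstat -Q` (PBS/Torque) are on your PATH; `list_queues` is my own name:

```bash
#!/usr/bin/env bash
# Print the queues known to whichever scheduler client is installed.
# SGE is checked before PBS because SGE also provides a qstat command.
list_queues() {
  if command -v bqueues >/dev/null 2>&1; then
    bqueues              # LSF: one line per queue, name in the first column
  elif command -v qconf >/dev/null 2>&1; then
    qconf -sql           # SGE: list of cluster queue names
  elif command -v qstat >/dev/null 2>&1; then
    qstat -Q             # PBS/Torque: per-queue summary
  else
    echo "no LSF, SGE or PBS client command found on PATH" >&2
    return 1
  fi
}
```

Pick one of the listed queue names (ask your admins which one suits long jobs) and enter it at the global queue name prompt.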

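4.5 The last step: choose the databases you want to use
This works much like the above; it is simply a matter of deciding which databases to use and which to skip.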

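4.6 The remaining steps need no explanation. Once you have answered everything, Config.pl carries out the whole configuration for you, which still takes a while.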

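Yesterday I did get iprscan running, but there was still a problem: the profilescan module kept failing. The log showed that it needs libg2c.so, the runtime library of the Fortran compiler (g77) shipped with GCC before version 4, and many recent Linux distributions no longer include it. So I had to install GCC 3.4 as well, build it, and add its lib directory to the LD_LIBRARY_PATH environment variable, which solved the problem.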
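A sketch of that fix, assuming GCC 3.4 was built into a prefix of your choosing (the path `/opt/gcc-3.4/lib` below is only an example): first check whether the dynamic linker already knows libg2c, then put the GCC 3.4 lib directory in front of LD_LIBRARY_PATH. Both helper names are my own:

```bash
#!/usr/bin/env bash
# Succeeds if the dynamic linker cache already lists libg2c (the g77 runtime).
# ldconfig often lives in /sbin, which may not be on a normal user's PATH.
have_libg2c() {
  local ldc
  for ldc in ldconfig /sbin/ldconfig /usr/sbin/ldconfig; do
    if command -v "$ldc" >/dev/null 2>&1; then
      "$ldc" -p 2>/dev/null | grep -q 'libg2c\.so'
      return
    fi
  done
  return 1
}

# Prepend directory $1 to LD_LIBRARY_PATH, but only if it contains libg2c.
add_g2c_libdir() {
  local libdir=$1
  ls "$libdir"/libg2c.so* >/dev/null 2>&1 ||
    { echo "libg2c not found in $libdir" >&2; return 1; }
  export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

`have_libg2c || add_g2c_libdir /opt/gcc-3.4/lib` has to run in the shell that later starts iprscan (or in your shell startup file or job script), because an export made in a subshell does not carry over.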


***1bbsunchen

Reply #1 | 2011-12-30 16:36

What does "the name of the cluster queue" mean? If I use LSF, do I just enter lsf?

***2ybzhao

Reply #2 | 2025-02-28 09:58

Different clusters run different queue systems; some use PBS, others LSF, and so on.

***1boya888

Reply #3 | 2012-06-18 21:43

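Some people say running InterProScan locally is extremely slow and recommend running it online with the interproscan.pl it provides instead; with default parameters a result comes back in roughly 40-50 seconds. Where can interproscan.pl be downloaded? Does it have to be installed first? The documentation says: The main script is the one called “iprscan” in the bin/ directory. It acts as both a command-line script (when the -cli option is used) and as the CGI script if and when a user has installed the web interface to InterProScan.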
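For reference, a local command-line run through that `iprscan` script looks roughly like this. The options after `-cli` are as I remember them from the 4.x documentation (input file, output file, output format, GO terms, InterPro lookup), so check them against the docs shipped with your install; `IPRSCAN_HOME` and its default `/opt/iprscan` are placeholders for the directory you gave Config.pl:

```bash
#!/usr/bin/env bash
# Local InterProScan 4.x run via bin/iprscan in command-line (-cli) mode.
# IPRSCAN_HOME: install directory given to Config.pl (placeholder default).
run_iprscan() {
  local input=$1    # protein FASTA file
  local output=$2   # result file
  "${IPRSCAN_HOME:-/opt/iprscan}/bin/iprscan" -cli -i "$input" -o "$output" \
    -format raw -goterms -iprlookup
}
```

For example, `IPRSCAN_HOME=/data/iprscan; run_iprscan proteins.fa proteins.raw`.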

***Question: what does the messy result below mean, and how should I analyze it? [

Reply #4 | 2025-02-28 09:58

***1boya888

Reply #5 | 2012-06-18 22:00

PANTHER is bundled with InterProScan! ftp://ftp.ebi.ac.uk/pub/databases/interpro/iprscan/5/data/panther-data-7.0.tar.gz [4745537 KB] The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. hmmsearch from the HMM2.3.2 package and blastall from the Blast package.

***PANTHER is a large collectio

Reply #6 | 2025-02-28 09:58

***1boya888

Reply #7 | 2012-06-19 18:32

Found interproscan.pl!
http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan_rest - Representational state transfer (REST): a software architecture style. Clients
http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan_soap - Simple Object Access Protocol (SOAP): a messaging protocol for transporting information. Clients
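Those URLs date from 2012, and EBI has since moved to InterProScan 5 behind its Job Dispatcher REST API. A sketch with curl, assuming the endpoint layout `run`, `status/<job>` and `result/<job>/tsv` under `https://www.ebi.ac.uk/Tools/services/rest/iprscan5` (check EBI's current documentation before relying on it), a real e-mail address, which the service asks for, and one protein sequence per job; `iprscan_rest` and `POLL_SECONDS` are my own names:

```bash
#!/usr/bin/env bash
# Assumed InterProScan 5 REST base URL (override by setting IPRSCAN_REST).
IPRSCAN_REST=${IPRSCAN_REST:-https://www.ebi.ac.uk/Tools/services/rest/iprscan5}

# Submit FASTA file $2 with e-mail $1, poll until done, print the TSV result.
iprscan_rest() {
  local email=$1 fasta=$2 job status
  # POST to /run; the reply body is the job identifier as plain text.
  # "sequence@file" makes curl URL-encode the file content as sequence=...
  job=$(curl -sS -X POST \
    --data-urlencode "email=$email" \
    --data-urlencode "sequence@$fasta" \
    "$IPRSCAN_REST/run") || return 1
  while :; do
    status=$(curl -sS "$IPRSCAN_REST/status/$job") || return 1
    case $status in
      FINISHED) break ;;
      RUNNING|QUEUED|PENDING) sleep "${POLL_SECONDS:-10}" ;;  # still in progress
      *) echo "job $job ended with status: $status" >&2; return 1 ;;
    esac
  done
  curl -sS "$IPRSCAN_REST/result/$job/tsv"
}
```

For example, `iprscan_rest you@example.org one_protein.fa > one_protein.tsv`. Polling every 10 seconds keeps the load on EBI's servers modest; for thousands of proteins a local installation is still the better choice.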

***Could the moderator please delete my duplicate post at reply #4? Why can't posts be edited? :oops

Reply #8 | 2025-02-28 09:58

***2zsl

Reply #9 | 2017-04-16 21:47

Hello, how exactly did you do it? My local configuration gets stuck and never moves on.