Functional classification of proteins is central to comparative genomics. However, for quantitative
comparisons, proteins can be catalogued in multiple ways according to their functional roles in the
cell. Algorithms tuned to enable integrative interpretation of analytical data are needed
globally. The availability of general, automated software with built-in flexibility would
significantly aid this activity.
We have developed ARC (Automated Resource Classifier), open source software
designed to meet users' need for flexibility. The default classification scheme follows a simple
keyword match-based approach using annotation texts. This scheme is agglomerative and directs
each entry into one of 7 basic non-overlapping functional classes:
- Metabolism (M)
- Information (I)
- Signal and communication (S)
- Cell division (D)
- Cell wall and membrane (C)
- Stress (R)
- Translocation (L)
and 2 ancillary classes:
- Others (O)
- Hypothetical (H)
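
The keyword match-based scheme described above can be illustrated with a minimal sketch. It is not ARC's implementation: the keywords are illustrative placeholders rather than entries from ARC's library, and the function name `classify`, the case-insensitive substring matching, the first-match-wins rule, and the choice to check for Hypothetical before the basic classes are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical keyword library: class code -> illustrative keywords.
# These are placeholders, NOT the keywords shipped with ARC.
KEYWORDS: Dict[str, List[str]] = {
    "M": ["dehydrogenase", "synthase", "reductase"],             # Metabolism
    "I": ["polymerase", "ribosomal", "trna"],                    # Information
    "S": ["sensor", "chemotaxis", "response regulator"],         # Signal and communication
    "D": ["cell division", "septum", "chromosome partitioning"], # Cell division
    "C": ["peptidoglycan", "lipoprotein", "outer membrane"],     # Cell wall and membrane
    "R": ["heat shock", "cold shock", "chaperone"],              # Stress
    "L": ["transporter", "permease", "secretion"],               # Translocation
}
# Assumed markers for the ancillary Hypothetical class.
HYPOTHETICAL: List[str] = ["hypothetical", "unknown function"]


def classify(annotation: str) -> str:
    """Return a single class code for one annotation text."""
    text = annotation.lower()
    # Assumption: hypothetical entries are set aside before the basic classes.
    if any(marker in text for marker in HYPOTHETICAL):
        return "H"
    # The first class with a matching keyword wins, so the result is non-overlapping.
    for code, keywords in KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return code
    # Annotated entries that match no keyword fall into Others.
    return "O"


def summarize(annotations: Iterable[str]) -> Counter:
    """Count how many annotations fall into each class."""
    return Counter(classify(a) for a in annotations)
```

Calling `summarize` on the annotation column of a genome gives per-class counts, which is the kind of tally that a quantitative comparison between organisms would start from.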
The keyword library of ARC was built serially, first by drawing keywords from Bacillus subtilis and Escherichia
coli K12, whose sequences have been carefully annotated by large consortia. In subsequent steps,
this library was further enriched by collecting terms from the archaeal representative Archaeoglobus
fulgidus, the Gene Ontology (GO) terms of Cellular Component and of Molecular Function, and
synonyms of Gene Symbols from UniProt. ARC is 94.04% successful in classifying 675,663 annotated
proteins from 348 prokaryotes. Of the 7898 GO terms occurring in prokaryotic annotations, only 10
confounding entries were found. The ARC Web interface can handle data with varied structures,
including simple text files, microarray data and other user-specified forms. Examples using data
from literature pointed towards an important role for secreted proteins in mycobacterial physiology.
These applications allow rapid inference from experimental results for comparative analysis from
an integrative perspective.
ARC is automated, flexible software for agglomerative classification of prokaryotic proteins that
enables integrative interpretation of prokaryotic genomic data. Its user-friendly features and open source
format allow customization by users. Examples of application of ARC have served to illuminate the
current perspectives on mycobacterial physiology and costs of proteins in 333 prokaryotes.
The Web interface of ARC can handle data with varied structures, such as simple text files, microarray data and other forms of user-specified data. These applications allow quick and easy inference from experimental results for comparative analysis from an integrative perspective. ARC also lets users modify the keyword libraries to suit their specific requirements for functional classification. For this purpose, the keyword library is structured in a user-friendly format to enable rapid customization.
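
To make the idea of a user-editable keyword library concrete, here is a minimal sketch. The file format of ARC's library is not described here, so the format below (one tab-separated `CODE<TAB>keyword` pair per line, with `#` comment lines) is purely hypothetical, as are the function names `parse_library` and `merge_libraries`.

```python
from typing import Dict, List


def parse_library(text: str) -> Dict[str, List[str]]:
    """Parse a hypothetical plain-text library: one 'CODE<TAB>keyword' per line."""
    library: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, _, keyword = line.partition("\t")
        if not keyword.strip():
            raise ValueError(f"expected 'CODE<TAB>keyword', got: {raw!r}")
        # Normalise case so that later matching can be case-insensitive.
        library.setdefault(code.strip().upper(), []).append(keyword.strip().lower())
    return library


def merge_libraries(
    default: Dict[str, List[str]], custom: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return a new library with the user's keywords appended to the defaults."""
    merged = {code: list(keywords) for code, keywords in default.items()}
    for code, keywords in custom.items():
        target = merged.setdefault(code, [])
        for keyword in keywords:
            if keyword not in target:  # avoid duplicate keywords
                target.append(keyword)
    return merged
```

Keeping the defaults and the user's additions as separate inputs means a customized library can be rebuilt from the unmodified defaults at any time, which is one way such customization could stay reversible.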
ARC is flexible, automated functional classification software with a Web interface for rapid analysis of genome-scale data from prokaryotes.