About ARC

Functional classification of proteins is central to comparative genomics. However, for quantitative comparisons, proteins can be catalogued in multiple ways according to their functional roles in the cell. There is a widely felt need for algorithms tuned to enable integrative interpretation of analytical data. General, automated software with built-in flexibility would significantly aid this activity.

We have prepared ARC (Automated Resource Classifier), open source software designed to meet users' need for flexibility. The default classification scheme follows a simple keyword match-based approach using annotation texts. This scheme is agglomerative and directs each entry into one of 7 basic non-overlapping functional classes:

  • Metabolism (M)
  • Information (I)
  • Signal and communication (S)
  • Cell division (D)
  • Cell wall and membrane (C)
  • Stress (R)
  • Translocation (L)

and 2 ancillary classes:

  • Others (O)
  • Hypothetical (H)
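
The keyword match-based scheme described above can be illustrated with a minimal Python sketch. This is not ARC's actual code or keyword library: the keywords, the `classify` function and the rule that the first matching class wins are all assumptions made for illustration. Only the class codes come from the lists above.

```python
# Illustrative sketch only: these keywords are hypothetical examples,
# not entries from the real ARC keyword library.
FUNCTIONAL_KEYWORDS = {
    "M": ["dehydrogenase", "synthase"],       # Metabolism
    "I": ["polymerase", "ribosomal"],         # Information
    "S": ["sensor", "chemotaxis"],            # Signal and communication
    "D": ["cell division", "septum"],         # Cell division
    "C": ["peptidoglycan", "lipoprotein"],    # Cell wall and membrane
    "R": ["heat shock", "chaperone"],         # Stress
    "L": ["transporter", "permease"],         # Translocation
}
HYPOTHETICAL_KEYWORDS = ["hypothetical", "unknown function"]


def classify(annotation: str) -> str:
    """Return a one-letter class code for a protein annotation text.

    Assumption: classes are tested in dictionary order and the first
    match wins, which keeps the seven functional classes non-overlapping.
    """
    text = annotation.lower()
    for code, keywords in FUNCTIONAL_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return code
    # No functional keyword matched: fall back to the ancillary classes.
    if any(keyword in text for keyword in HYPOTHETICAL_KEYWORDS):
        return "H"  # Hypothetical
    return "O"      # Others


if __name__ == "__main__":
    for text in ["DNA polymerase III subunit alpha",
                 "ABC transporter permease",
                 "hypothetical protein"]:
        print(classify(text), text)
```

In this sketch an entry reaches Hypothetical (H) or Others (O) only when no functional keyword matches, so an annotation such as "hypothetical transporter" would be assigned to Translocation. Whether ARC resolves such cases the same way is not stated here.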

The keyword library of ARC was built serially, first drawing keywords from Bacillus subtilis and Escherichia coli K12, whose sequences have been carefully annotated by large consortia. In subsequent steps, the library was enriched with terms from the archaeal representative Archaeoglobus fulgidus, Gene Ontology (GO) terms for Cellular Component and Molecular Function, and synonyms of Gene Symbols from UniProt. ARC is 94.04% successful on 675,663 annotated proteins from 348 prokaryotes. Of the 7898 GO terms occurring in prokaryotic annotations, only 10 confounding entries were found. The ARC Web interface can handle data with varied structures, including simple text files, microarray data and other user-specified forms. Examples using data from the literature pointed towards an important role for secreted proteins in mycobacterial physiology. These applications allow rapid inference of experimental results for comparative analysis from an integrative perspective.

ARC is automated, flexible software for agglomerative classification of prokaryotic proteins, enabling integrative interpretation of prokaryotic genomic data. Its user-friendly features and open source format allow users to customize it. Example applications of ARC have illuminated current perspectives on mycobacterial physiology and on the costs of proteins in 333 prokaryotes.

The Web interface of ARC can handle data with varied structures, such as simple text files, microarray data and other forms of user-specified data. These applications allow quick and easy inference of experimental results for comparative analysis from an integrative perspective. ARC also lets users modify the keyword libraries to suit their specific requirements for functional classification. For this purpose, the keyword library is structured in a user-friendly format to enable rapid customization.
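
As a rough illustration of what customizing a keyword library might involve, the Python sketch below loads a small library from text and adds user-defined keywords to one class. The file layout (one class code, a tab, then comma-separated keywords per line) and the function names `load_library` and `add_keywords` are assumptions for this example. The actual format of the ARC keyword library is not described here.

```python
import io

# Hypothetical layout: "<class code><TAB><comma-separated keywords>".
# This is an assumed format for illustration, not ARC's real library format.
SAMPLE_LIBRARY = (
    "# code<TAB>keywords\n"
    "M\tdehydrogenase, synthase\n"
    "L\ttransporter\n"
)


def load_library(handle):
    """Parse a keyword library into {class code: [lowercase keywords]}."""
    library = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, _, words = line.partition("\t")
        keywords = [w.strip().lower() for w in words.split(",") if w.strip()]
        library.setdefault(code, []).extend(keywords)
    return library


def add_keywords(library, code, new_words):
    """Add user-supplied keywords to a class, skipping duplicates."""
    existing = library.setdefault(code, [])
    for word in new_words:
        word = word.strip().lower()
        if word and word not in existing:
            existing.append(word)
    return library


library = load_library(io.StringIO(SAMPLE_LIBRARY))
add_keywords(library, "L", ["Permease", "transporter"])
print(library)
```

Keywords are stored in lower case so that matching against annotation texts can be made case-insensitive. This is a design choice of the sketch, not a documented property of ARC.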

ARC is flexible, automated functional classification software with a Web interface for rapid analysis of genome-scale data from prokaryotes.