Princeton University
Lewis-Sigler Institute for Integrative Genomics

This collection is a subset of the tools, applications, and libraries developed within the Center for Quantitative Biology at Princeton University's Lewis-Sigler Institute. They are provided here within a common framework, the CQB Integrated Tools, which makes it easy to process data and pass it from one tool to another. The framework also supports batch processing of multiple data sets natively within each tool and uses the local computing grid for high throughput.

This set is not limited to tools from within the Center. We anticipate adding several more tools from the Center, as well as other widely used biological applications from external sources.

The tools currently represented within this framework are listed below. A comprehensive list of tools provided by the Center can be found on the Center for Quantitative Biology's Tools and Resources page.

Clustering

The tools in this category have to do with clustering data or handling the files that are associated with clustering.


Iclust is an information-theoretic, model-independent clustering application that can be applied to many different kinds of data. It typically starts by computing a matrix of pairwise mutual information from the input data, then uses this matrix to group the data into separate clusters. The theory, along with several examples, is described in Slonim et al., PNAS, 2005.

A standalone interface for Iclust is also available on this site.
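
The first step described above, building a pairwise mutual information matrix, can be sketched as follows. This is only an illustration and not Iclust's own estimator: it assumes the rows are numeric profiles, bins each one into equal-width bins, and uses a simple plug-in estimate. The function names (discretize, mutual_information, mi_matrix) and the default bin count are hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Map each value to an equal-width bin index in [0, bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in mutual information (in bits) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mi_matrix(rows, bins=3):
    """Symmetric matrix of pairwise mutual information between rows."""
    binned = [discretize(r, bins) for r in rows]
    return [[mutual_information(a, b) for b in binned] for a in binned]


# Two perfectly anti-correlated profiles share one full bit with two bins.
matrix = mi_matrix([[1, 2, 3, 4], [4, 3, 2, 1]], bins=2)
```

A clustering step would then group rows with high mutual information; that part is not shown here.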


KNNImputer takes a PCL file, imputes missing values, and saves the result as a new PCL file. It does this by examining each gene's nearest neighbors (the number of which is adjustable) using one of several distance measures. Genes missing more than 30% of their values (the default threshold) are removed rather than imputed. See KNNImpute's home page and Troyanskaya et al., Bioinformatics, 17:520-5, 2001. KNNImpute is available as part of the Sleipnir library.
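
As a rough illustration of the procedure described above, the sketch below imputes missing values from nearest neighbors. It is not the Sleipnir implementation: it assumes rows are genes, None marks a missing value, and distance is the root-mean-square difference over columns observed in both rows (one of many possible measures). The function name knn_impute and its parameters are hypothetical.

```python
import math


def knn_impute(rows, k=2, max_missing=0.3):
    """rows: list of lists of floats, with None marking a missing value."""
    # Drop rows whose missing fraction exceeds the threshold.
    kept = [r for r in rows if sum(v is None for v in r) / len(r) <= max_missing]

    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = []
    for i, row in enumerate(kept):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows with an observed value in column j.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(kept)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:
                filled[j] = sum(nearest) / len(nearest)
        result.append(filled)
    return result
```

Averaging the neighbors with equal weight keeps the sketch short; a distance-weighted average would be an equally reasonable choice.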


This tool takes a CDT file and its associated GTR file, traverses the tree, and prunes it wherever the correlation exceeds the given threshold. It then outputs the contents of the pruned parts of the tree. The partition file format produces a single file listing each identifier along with a partition number corresponding to the group it belongs to. This file is suitable for use with FIRE and is similar to that produced by Iclust. The node file format produces two files for each group: one listing the identifiers in the group, and one that is a CDT file containing those members.
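
The traversal can be sketched roughly as below. The sketch assumes each GTR row has already been parsed into a (node, left child, right child, correlation) tuple, and it assigns a partition number to every identifier, including ones left on their own, which the actual tool may handle differently. The function name partition_tree and the GENE/NODE identifiers are illustrative only.

```python
def partition_tree(gtr_rows, threshold):
    """Cut the tree at nodes whose correlation meets the threshold.

    gtr_rows: (node_id, left_child, right_child, correlation) tuples.
    Returns a dict mapping each leaf identifier to a partition number.
    """
    nodes = {node: (left, right, corr) for node, left, right, corr in gtr_rows}
    children = {c for left, right, _ in nodes.values() for c in (left, right)}
    roots = [n for n in nodes if n not in children]

    def leaves(item):
        # Anything that is not an internal node is a leaf identifier.
        if item not in nodes:
            return [item]
        left, right, _ = nodes[item]
        return leaves(left) + leaves(right)

    groups = []
    stack = list(roots)
    while stack:
        item = stack.pop()
        if item not in nodes or nodes[item][2] >= threshold:
            groups.append(leaves(item))  # prune here: whole subtree is one group
        else:
            left, right, _ = nodes[item]
            stack.extend([right, left])  # visit the left child first
    return {gene: number for number, group in enumerate(groups) for gene in group}


gtr = [
    ("NODE1X", "GENE1X", "GENE2X", 0.9),
    ("NODE2X", "GENE3X", "GENE4X", 0.8),
    ("NODE3X", "NODE1X", "NODE2X", 0.2),
]
print(partition_tree(gtr, 0.7))
# {'GENE1X': 0, 'GENE2X': 0, 'GENE3X': 1, 'GENE4X': 1}
```

Writing each identifier and its number on one line would give output in the spirit of the partition file format described above.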

Motif discovery

The tools in this category have to do with motif discovery and characterization.


The Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction (COALESCE) uses large collections of genomic data and Bayesian integration to predict coregulated gene modules, the conditions under which they are regulated, and the consensus binding motifs responsible. It combines gene expression biclustering, motif prediction, and data integration (including expression, sequence, nucleosome positioning, and evolutionary conservation). Input data must be in PCL format. This tool is available with the Sleipnir library.

A standalone interface for COALESCE is also available on this site.

Annotation

The tools in this category search for annotations to enrich the input data or deal with annotation data.


The GAF Viewer displays the data within annotation files, such as those provided by the GO consortium, and other formats. It produces two types of tables: one showing all the identifiers and the ontology terms they are directly annotated to, and one showing all the referenced ontology terms and the identifiers annotated to them (both directly and indirectly). The tables also include the organism (by taxon ID), evidence codes, references, and alternate identifiers (synonyms). In addition, a DAG can be produced showing the structure of the ontology and which identifiers are directly annotated to each term.

Data conversion

The simple tools in this category can be used for basic conversions and filtering operations - altering the format, converting data types, and so on.

Map identifiers

This tool takes a delimited text file and maps identifiers in selected columns from one type to another, for example from Agilent IDs to Yeast ORFs.
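
A minimal sketch of this kind of column mapping is shown below. It assumes the mapping is available as an in-memory dictionary and leaves unknown identifiers unchanged; the function name map_identifiers and the identifiers in the example are placeholders, not real probe-to-ORF assignments.

```python
import csv
import io


def map_identifiers(text, mapping, columns, delimiter="\t"):
    """Replace identifiers in the selected columns using a lookup table."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        for c in columns:
            if c < len(row):
                # Identifiers without a known mapping are kept as they are.
                row[c] = mapping.get(row[c], row[c])
        writer.writerow(row)
    return out.getvalue()


# Placeholder identifiers, for illustration only.
table = "PROBE_0001\t1.5\nPROBE_0002\t0.2\n"
print(map_identifiers(table, {"PROBE_0001": "ORF_A"}, columns=[0]), end="")
# ORF_A	1.5
# PROBE_0002	0.2
```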

Data matrix extractor

The Data Matrix Extractor (DME) takes a delimited text file, such as a PCL file, and extracts the embedded matrix. In the case of PCL files, for example, the matrix is the experiment data. DME does this by locating the largest body of numeric data within the text and, for known formats, excluding certain areas (GWEIGHT columns and EWEIGHT rows, for example).

This image illustrates extracting a matrix from a PCL file.
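
The extraction can be approximated in a few lines. The sketch below is a simplification of the "largest body of numeric data" idea, not DME's actual heuristic: it assumes the first row is a header, drops the EWEIGHT row and GWEIGHT column, and keeps the columns whose remaining cells are all numeric or empty. The function names and the sample PCL content are hypothetical.

```python
def is_number(cell):
    """True if the cell parses as a float."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def extract_matrix(text, delimiter="\t"):
    """Return the numeric block of a PCL-like table; None marks empty cells."""
    rows = [line.split(delimiter) for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    # Known PCL exclusions: the EWEIGHT row and the GWEIGHT column.
    body = [r for r in body if r[0].strip().upper() != "EWEIGHT"]
    skip = {i for i, name in enumerate(header) if name.strip().upper() == "GWEIGHT"}
    # Keep columns that hold at least one number and nothing non-numeric.
    numeric_cols = [
        i for i in range(len(header))
        if i not in skip
        and any(is_number(r[i]) for r in body if i < len(r))
        and all(is_number(r[i]) or not r[i].strip() for r in body if i < len(r))
    ]
    return [
        [float(r[i]) if i < len(r) and is_number(r[i]) else None for i in numeric_cols]
        for r in body
    ]


pcl = (
    "ORF\tNAME\tGWEIGHT\texp1\texp2\n"
    "EWEIGHT\t\t\t1\t1\n"
    "GENE_A\tname_a\t1\t0.5\t-1.2\n"
    "GENE_B\tname_b\t1\t\t2.0\n"
)
print(extract_matrix(pcl))
# [[0.5, -1.2], [None, 2.0]]
```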

Delimited text converter

This tool takes a delimited text file (where each line has fields separated by a special character or characters) and performs some basic conversions. One common conversion is the changing of the delimiter, for example from a tab to a space. Other common conversions are removing blank space at the beginning or end of the lines, replacing missing (empty) fields, removing blank lines, and so on.
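
The conversions listed above can be combined in one pass, as in this sketch. It assumes the input delimiter is not a space (so that trimming spaces cannot remove fields); the function name convert_delimited, its parameters, and the "NA" placeholder are illustrative choices.

```python
def convert_delimited(text, old="\t", new=" ", missing="NA"):
    """Change the delimiter, trim lines, fill empty fields, drop blank lines."""
    out = []
    for line in text.splitlines():
        line = line.strip(" ")  # trim spaces only, so tab-delimited fields survive
        if not line.strip():
            continue  # skip blank lines
        fields = [f if f.strip() else missing for f in line.split(old)]
        out.append(new.join(fields))
    return "\n".join(out) + "\n"


print(convert_delimited("  a\tb\t\n\n c\t\td  \n", old="\t", new=","), end="")
# a,b,NA
# c,NA,d
```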

Visualization

The tools in this category help visualize data.

Heatmap generator

This tool generates a heatmap from a file containing a numeric matrix. Each element in the heatmap is colored according to the magnitude of the corresponding matrix element relative to the "center" value (usually the mean). Several options control the coloring.
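
One way to picture coloring relative to a center value is sketched below. The color scheme (white at the center, fading to red above and blue below) and the scaling by the largest deviation are assumptions made for illustration, not the tool's documented options; heatmap_colors is a hypothetical name.

```python
def heatmap_colors(matrix, center=None):
    """Map each matrix element to an (R, G, B) tuple relative to the center."""
    values = [v for row in matrix for v in row]
    if center is None:
        center = sum(values) / len(values)  # default center: the mean
    # Largest deviation from the center; fall back to 1.0 for a constant matrix.
    spread = max(abs(v - center) for v in values) or 1.0

    def color(v):
        t = (v - center) / spread  # signed deviation scaled to [-1, 1]
        fade = round(255 * (1 - abs(t)))
        return (255, fade, fade) if t >= 0 else (fade, fade, 255)

    return [[color(v) for v in row] for row in matrix]


print(heatmap_colors([[0, 5, 10]]))
# [[(0, 0, 255), (255, 255, 255), (255, 0, 0)]]
```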