PruneTree

Background

This script takes clustered files (a CDT file and its associated GTR file) and clips the tree at a given correlation. Each node beyond the cut, that is, each node whose correlation is higher than the cutoff, is then output.
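As an illustration only, the clipping step can be sketched in Python (this is not the Perl script itself). The sketch assumes the usual tab-delimited GTR layout of node ID, left child, right child, correlation on each line. It also assumes that correlations do not increase towards the root, and that "each node beyond the cut" means the top-most nodes at or above the cutoff, so that the resulting groups do not overlap. The function name `prune_gtr` is hypothetical.

```python
def prune_gtr(gtr_text, cutoff):
    """Return one list of leaf IDs per top-most node with correlation >= cutoff.

    Assumes each GTR line is: node ID, left child, right child, correlation,
    separated by tabs.
    """
    children = {}  # node ID -> (left child, right child)
    corr = {}      # node ID -> correlation at which the node was joined
    parent = {}    # child ID -> parent node ID
    for line in gtr_text.splitlines():
        if not line.strip():
            continue
        node, left, right, r = line.split("\t")[:4]
        children[node] = (left, right)
        corr[node] = float(r)
        parent[left] = node
        parent[right] = node

    def leaves(node_id):
        # Iterative depth-first walk; avoids recursion limits on deep trees.
        out, stack = [], [node_id]
        while stack:
            cur = stack.pop()
            if cur in children:
                left, right = children[cur]
                stack.append(right)
                stack.append(left)
            else:
                out.append(cur)
        return out

    clusters = []
    for node in children:
        if corr[node] < cutoff:
            continue  # this node lies below the cut
        up = parent.get(node)
        # Keep the node only if its parent is missing or falls below the cut.
        if up is None or corr[up] < cutoff:
            clusters.append(leaves(node))
    return clusters
```

The leaf IDs returned here are the IDs used in the GTR file; mapping them to the identifiers in the CDT file is left out of the sketch.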

Input format

There are two input modes. In individual file mode, you may provide or select both the CDT file and the GTR file independently. In batch mode, you select the GTR file only (or the set of GTR files). The CDT file name is derived from the GTR file name.
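The exact rule used to derive the CDT file name is not stated here. A minimal Python sketch, assuming the CDT file shares the GTR file's base name and differs only in its extension, might look like this (the function name `cdt_path_for` is hypothetical):

```python
from pathlib import Path


def cdt_path_for(gtr_path):
    """Guess the CDT file that accompanies a GTR file.

    Assumption: both files share a base name, e.g. exp1.gtr and exp1.cdt.
    """
    p = Path(gtr_path)
    if p.suffix.lower() != ".gtr":
        raise ValueError(f"not a GTR file: {gtr_path}")
    return p.with_suffix(".cdt")
```

In batch mode, the same function would simply be applied to each selected GTR file in turn.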

The CDT and GTR file formats are well known. For more information, please see this document produced by SMD.

Setting the parameters

This program requires three parameters: the correlation at which to prune the tree, the desired output formats, and the minimum node size. The output formats are explained below. Output is generated only for nodes that contain at least the specified minimum number of identifiers.
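The minimum node size acts as a simple filter on the clipped nodes. A small Python sketch, assuming each node is represented as a list of its identifiers (the name `filter_by_size` is hypothetical), could be:

```python
def filter_by_size(nodes, min_size):
    """Keep only the nodes that contain at least min_size identifiers."""
    if min_size < 1:
        raise ValueError("minimum node size must be at least 1")
    return [members for members in nodes if len(members) >= min_size]
```

With a minimum size of 2, for example, a node holding a single identifier would produce no output.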

Output

One form of output, the partition file, is a single file containing one line for each identifier in the clipped nodes. Each line gives the identifier followed by the number of its node. Each node is given a unique number, so identifiers in the same node share the same number. In the other form of output, the node files option, two files are produced for each node: one contains the list of the identifiers in the node, and the other is a CDT file containing those members.

Note that, if the node files option is selected, a great many files may be produced (two for each node).
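The partition file layout described above can be sketched in Python as follows. The tab separator and the numbering from 1 are assumptions, as is the function name `write_partition`; the identifiers in the usage example are made up.

```python
import io


def write_partition(nodes, out):
    """Write one 'identifier<TAB>node number' line per identifier.

    nodes is a list of identifier lists, one per clipped node.
    Assumptions: tab-separated columns, node numbers starting at 1.
    """
    for number, members in enumerate(nodes, start=1):
        for identifier in members:
            out.write(f"{identifier}\t{number}\n")


# Usage with an in-memory buffer; a real run would pass an open file instead.
buf = io.StringIO()
write_partition([["geneA", "geneB"], ["geneC"]], buf)
```

Here `geneA` and `geneB` share node number 1, while `geneC` is written with node number 2.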

Run time and complexity issues

This program has very little overhead.

Download source code

The Perl script is freely available. Please check under Downloads for the current version.

Credits

The original script was written by Gavin Sherlock. It has been modified somewhat for local requirements.