Data matrix extractor (DME)


Input format

The input file should be a delimited text file. Usually the delimiters are tabs or spaces, but others may be specified within the form.

Within the file, there should be a body of numerical data occupying some number of consecutive rows and columns, with no other text in between. (Blank fields and the string 'NaN' are ok.) This data may be surrounded by rows and columns containing other information. The extractor locates and outputs this body of data.
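As a rough illustration of what locating the body of data can involve, the sketch below searches a parsed table for the largest rectangle in which every field is a number, blank, or the string 'NaN'. It is not the extractor's actual algorithm, which is not described here; the names is_data_like and find_matrix_bounds are hypothetical.

```python
def is_data_like(field):
    """True for numbers, blank fields, and the string 'NaN'."""
    field = field.strip()
    if field == "" or field == "NaN":
        return True
    try:
        float(field)
        return True
    except ValueError:
        return False


def find_matrix_bounds(rows):
    """Return (top, left, bottom, right), inclusive, of the largest
    rectangle made up only of data-like fields, or None if there is none.
    Assumption: fields missing from short rows count as non-data."""
    if not rows:
        return None
    width = max(len(row) for row in rows)
    heights = [0] * width  # run length of data-like fields ending at this row
    best_area, best = 0, None
    for r, row in enumerate(rows):
        for c in range(width):
            ok = c < len(row) and is_data_like(row[c])
            heights[c] = heights[c] + 1 if ok else 0
        # Try every column span whose bottom edge lies on this row.
        for left in range(width):
            h = heights[left]
            for right in range(left, width):
                h = min(h, heights[right])
                if h == 0:
                    break
                area = h * (right - left + 1)
                if area > best_area:
                    best_area = area
                    best = (r - h + 1, left, r, right)
    return best


table = [
    ["gene", "s1", "s2"],
    ["a", "1.0", "2.0"],
    ["b", "NaN", "4"],
    ["c", "", "6.5"],
]
bounds = find_matrix_bounds(table)  # rows 1-3, columns 1-2
```

Choosing the largest rectangle is only one possible rule; a real implementation might also require that the block contain at least one actual number.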

For some known file formats, rows and columns that may otherwise appear to be part of an embedded matrix are excluded as necessary. For example, EWEIGHT rows and GWEIGHT columns in CDT files, which would appear to be part of the data, are ignored.
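The CDT case can be pictured with the following sketch, which assumes the usual CDT layout: a header row containing a column named GWEIGHT, and a row whose first field is EWEIGHT. The function name strip_cdt_weights is hypothetical, and the extractor's own handling may differ.

```python
def strip_cdt_weights(rows):
    """Drop the EWEIGHT row and the GWEIGHT column from a parsed CDT table.
    Assumption: rows[0] is the header row."""
    header = rows[0]
    drop_cols = {i for i, name in enumerate(header) if name.strip() == "GWEIGHT"}
    kept = []
    for row in rows:
        if row and row[0].strip() == "EWEIGHT":
            continue  # weight row: numeric, but not part of the data
        kept.append([f for i, f in enumerate(row) if i not in drop_cols])
    return kept


cdt = [
    ["UNIQID", "NAME", "GWEIGHT", "s1", "s2"],
    ["EWEIGHT", "", "", "1", "1"],
    ["g1", "gene one", "1", "0.5", "-0.2"],
]
cleaned = strip_cdt_weights(cdt)
```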

Setting the parameters

Specifying the delimiter
If the delimiter is not detected automatically, select the appropriate option. To use a delimiter that is not among the supplied options, enter it in the text box. The special keywords "tab", "space", and "spaces" can be used.
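One way the text-box value could be interpreted is sketched below. The function name resolve_delimiter is hypothetical. How the tool distinguishes "space" from "spaces" is not stated here, so this sketch simply maps both to a single space character.

```python
def resolve_delimiter(text):
    """Map a keyword to its delimiter character; return anything else as typed."""
    keywords = {"tab": "\t", "space": " ", "spaces": " "}  # assumed mapping
    return keywords.get(text.strip().lower(), text)


resolve_delimiter("tab")  # the tab character
resolve_delimiter(",")    # returned unchanged
```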

Treat all consecutive delimiters as one
In rare cases it is necessary not only to specify the delimiter but also to treat runs of consecutive delimiters as a single delimiter. For example, fields that themselves contain spaces may be separated by a varying number of tabs. Without this option, each pair of adjacent tabs would be read as enclosing an empty field.
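The difference between the two behaviours can be seen in a short sketch. The function name split_fields and its merge_consecutive parameter are hypothetical and only illustrate the idea.

```python
import re


def split_fields(line, delimiter, merge_consecutive=False):
    """Split a line into fields, optionally treating a run of
    consecutive delimiters as a single delimiter."""
    if merge_consecutive:
        pattern = "(?:" + re.escape(delimiter) + ")+"
        return re.split(pattern, line)
    return line.split(delimiter)


line = "gene one\t\t\t1.5\t2.0"
split_fields(line, "\t")        # ['gene one', '', '', '1.5', '2.0']
split_fields(line, "\t", True)  # ['gene one', '1.5', '2.0']
```

Note that merging delimiters also hides genuinely empty fields, which is why the option is best left off unless the file requires it.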

Replace missing fields
If the data contain empty fields, it is often useful to replace them with a placeholder value. Iclust, for example, expects missing data to be represented by 'NaN', which is the standard missing-value marker for this system.
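A minimal sketch of the replacement step, assuming that a field counts as empty when it contains nothing but whitespace. The function name replace_missing is hypothetical.

```python
def replace_missing(rows, replacement="NaN"):
    """Return a copy of the matrix with empty fields replaced."""
    return [
        [replacement if field.strip() == "" else field for field in row]
        for row in rows
    ]


matrix = [["1.0", "", "3.0"], ["", "5.0", "6.0"]]
filled = replace_missing(matrix)
# filled == [['1.0', 'NaN', '3.0'], ['NaN', '5.0', '6.0']]
```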

Override matrix bounds
Use this option if the data matrix is not detected correctly, or if you want to extract only a section of it.
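Overriding the bounds amounts to cutting a rectangle out of the parsed table. In the sketch below, the function name extract_block and the zero-based, inclusive row and column indices are assumptions; the form may number rows and columns differently.

```python
def extract_block(rows, top, left, bottom, right):
    """Return the sub-matrix spanning rows top..bottom and
    columns left..right (zero-based, inclusive)."""
    return [row[left:right + 1] for row in rows[top:bottom + 1]]


table = [
    ["gene", "s1", "s2", "s3"],
    ["a", "1", "2", "3"],
    ["b", "4", "5", "6"],
]
block = extract_block(table, 1, 2, 2, 3)
# block == [['2', '3'], ['5', '6']]
```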

Transpose
If this option is checked, the matrix is transposed before it is output.
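Transposing swaps rows and columns. A minimal sketch, assuming a rectangular matrix in which every row has the same number of fields (the function name transpose is hypothetical):

```python
def transpose(matrix):
    """Return the matrix with rows and columns swapped."""
    return [list(column) for column in zip(*matrix)]


transposed = transpose([["1", "2", "3"], ["4", "5", "6"]])
# transposed == [['1', '4'], ['2', '5'], ['3', '6']]
```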


Output format

The output is a tab-delimited numeric matrix. If a replacement for missing fields was specified, that replacement value appears wherever a field was empty.