MTC User's Manual

MTC (Multi-Tier Comparer)

User's Manual

NLKRRG

Overview
Platform
Definitions
Getting Started
User's Manual

Control window
"Set epsilon"
"Compare files"
"Set window size"
"Count bigrams"
"Count co-occurrences"
"Reset default values"
Input file format

Overview

MTC (Multi-Tier Comparer) is a GUI-based tool implemented in Java meant to assist in the analysis of data sets that have been annotated in a program such as Praat (http://www.praat.org). After the labels are extracted to a text file in a specific format using a Praat script, there are several activities for which MTC provides interfaces:

If more than one person annotated a given data set, the "Compare files" function can be used to graphically compare the two sets of intervals and allow the user to select one label for each interval in order to come up with a single, merged file.
Given a set of labels, the "Count bigrams" function produces a list of bigrams that appear within a specified "window" and sorts the list by frequency.
Given two sets of labels from different tiers of the same data set, the "Count co-occurrences" function produces a list of pairs of labels that occur simultaneously and sorts the list by frequency. The first set of labels is interpreted as a set of ranges. For example, a dialog might be divided up into sentences, and each sentence given a label based on its function in the dialog. The second set of labels is interpreted as a set of discrete events that happen during those ranges. For example, the second set of labels might mark the times when a certain word is used. The "Count co-occurrences" function makes a list that correlates each event from the second set of labels with the label of what is going on in the first set of labels.

Platform

MTC runs on any graphics-enabled platform for which a Java Runtime Environment (JRE) is available.

Definitions

bigram: A pair of tokens. For example, given the list:

        opening
        assert
        question
        answer, assert
        question
        answer, assert
        closing

the bigrams and their respective frequencies would be:

        opening<>assert<>1
        assert<>question<>1
        question<>answer, assert<>2
        answer, assert<>question<>1
        answer, assert<>closing<>1

epsilon: The time difference within which two times are considered to be equal (e.g. 3.4 and 3.5 are considered equal within an epsilon of 0.1). The default epsilon value is 0.1.

window size: Sometimes you want to count not only bigrams in terms of consecutive tokens, but also tokens that are within a certain window of each other. For example, given the list:

        opening
        assert
        question
        answer, assert
        closing

the bigrams and their respective frequencies within a window of size 3 would be:

        opening<>assert<>1
        opening<>question<>1
        assert<>question<>1
        assert<>answer, assert<>1
        question<>answer, assert<>1
        question<>closing<>1
        answer, assert<>closing<>1
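
The epsilon definition above can be illustrated with a small Java sketch. It is not taken from MTC's source: the class name EpsilonDemo and the method timesEqual are hypothetical, and treating a difference of exactly epsilon as "equal" is an assumption based on the 3.4 / 3.5 example. The sketch targets a current Java release and uses BigDecimal so that decimal times are compared exactly; with plain double arithmetic, the difference between 3.5 and 3.4 comes out slightly larger than 0.1.

```java
import java.math.BigDecimal;

final class EpsilonDemo {
    // Hypothetical helper: true when two times differ by at most epsilon.
    // Times are passed as decimal strings so the comparison is exact.
    static boolean timesEqual(String a, String b, String epsilon) {
        BigDecimal difference = new BigDecimal(a).subtract(new BigDecimal(b)).abs();
        return difference.compareTo(new BigDecimal(epsilon)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(timesEqual("3.4", "3.5", "0.1")); // prints true
        System.out.println(timesEqual("3.4", "3.6", "0.1")); // prints false
    }
}
```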

Getting Started

There are several run options included in this distribution:

Windows/DOS:
-run the batch file mtc/bin/dos/runMTC.bat
NOTE: for this option, the entire directory tree of the CD must be intact

Linux:
-run the script mtc/bin/linux/runMTC
NOTE: for this option, the entire directory tree of the CD must be intact

Other OS:
If the Java Runtime Environment (JRE) v.1.4 is not already installed, download and configure the appropriate version from http://java.sun.com
Then run the executable jar file by typing a command such as:
java -jar mtc/bin/MTC.jar

The control window, which presents the available options as buttons, should then appear.

User's Manual

Control window: This window is used to start the various tools that are part of the program. To exit the program, click the "Quit" button or close the control window. No warning message is shown if you have not yet written results to a file.
"Set epsilon": This can be used to change the epsilon value that is used when comparing the files. The value entered in the dialog must be numeric; otherwise an error message will be displayed.
"Compare files": This will pop up a dialog asking you to select the files you wish to compare. Select multiple files by holding down the Shift or Ctrl key while clicking on the files, or type in each file name in quotes, separated by commas. Click "Open" to open a new window displaying the comparison table.
Files which did not have an entry for a certain time are labeled "NO_ENTRY" in that row. The last column of the table shows the merged set of labels. If all files consistently label a row the same way, then that label is automatically accepted in the merged column. If there are different labels, then the cell in that row will display "Choose:". Click on the cell to access a drop-down menu of choices. You can then select which column you want to accept a label from, or select "UNKNOWN".

NOTE: All cells in the last column must be resolved before writing the results to a file. If there are still unresolved rows (i.e. "Choose:" is still displayed in the final column) when one of the write options is clicked, a warning message will be displayed and nothing will be written to the file.

There are two options to write to an output file:
-"Write to normal text file": writes the last column to a file in a form that can be accepted as input to this program (see "Input file format")
-"Write to interval tier file": writes the last column to a file in a form that can be read into Praat as an IntervalTier object (a "short text file")
In both cases you will be prompted for the file to which you want to write. If the file already exists, you will be asked to confirm that you want to over-write the file.

NOTE: Intervals for which "NO_ENTRY" was selected are not written to the file.

The epsilon value used for that comparison as well as the number of rows in which differences were found are displayed in the title of the output window.
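
The way the merged column is filled in can be sketched in Java as follows. This is an illustration under assumptions rather than MTC's actual code: the class MergeDemo and its methods are hypothetical, each row is represented simply as the list of labels the files gave for that time, and a "NO_ENTRY" in one file is assumed to count as a disagreement with a real label in another.

```java
import java.util.Arrays;
import java.util.List;

final class MergeDemo {
    static final String CHOOSE = "Choose:";

    // Hypothetical helper: the initial value of the merged column for one row.
    // If every file gives the same label, that label is accepted automatically;
    // otherwise the cell is left for the user to resolve.
    static String initialMergedLabel(List<String> labelsInRow) {
        String first = labelsInRow.get(0);
        for (String label : labelsInRow) {
            if (!label.equals(first)) {
                return CHOOSE;
            }
        }
        return first;
    }

    // Hypothetical helper: writing is only allowed once no row shows "Choose:".
    static boolean readyToWrite(List<String> mergedColumn) {
        return !mergedColumn.contains(CHOOSE);
    }

    public static void main(String[] args) {
        String row1 = initialMergedLabel(Arrays.asList("greeting", "greeting"));
        String row2 = initialMergedLabel(Arrays.asList("greeting", "NO_ENTRY"));
        System.out.println(row1);
        System.out.println(row2);
        System.out.println(readyToWrite(Arrays.asList(row1, row2)));
    }
}
```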
"Set window size": This can be used to set the window size in which to count bigrams. The window size must be an integer value greater than or equal to two, otherwise an error message will be displayed and the window size will retain its previous value.
"Count bigrams": This will pop up a dialog asking you to select one or more input files containing the tokens you wish to analyze. To select more than one file, hold down the Shift or Ctrl key while clicking on the file, or type a list of files in, each surrounded by quotes and separated by commas. The start times will be ignored, only the order will be taken into account. Click "Open" to make a list of the bigrams contained in all of the selected files. A new window will open displaying the results.
The bigrams will be displayed sorted by frequency in the format:
token1<>token2<>count
where count is the number of occurrences in the selected file or files.
The window size used for counting bigrams is displayed in the title bar of the output window.
To write the results to a file, click on "Write to text file". You will be prompted for the file to which you want to write. If the file already exists, you will be asked to confirm that you want to over-write the file.
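
The counting performed by "Count bigrams" can be sketched in Java as follows. This is an illustration under assumptions, not MTC's actual code: the class BigramDemo and its methods are hypothetical, the sketch targets a current Java release, and it handles a single list of tokens. It assumes that a window of size n pairs each token with the next n - 1 tokens, which matches the worked examples in the Definitions section. How MTC orders bigrams with the same count is not documented; this sketch keeps them in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BigramDemo {
    // Counts ordered pairs (tokens[i], tokens[j]) with i < j < i + windowSize.
    // Keys have the form token1<>token2.
    static Map<String, Integer> countBigrams(List<String> tokens, int windowSize) {
        if (windowSize < 2) {
            throw new IllegalArgumentException("window size must be at least 2");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            for (int j = i + 1; j < tokens.size() && j < i + windowSize; j++) {
                counts.merge(tokens.get(i) + "<>" + tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Formats the counts as token1<>token2<>count lines, most frequent first.
    // The sort is stable, so ties stay in first-seen order.
    static List<String> format(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            lines.add(entry.getKey() + "<>" + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("opening", "assert", "question",
                "answer, assert", "question", "answer, assert", "closing");
        for (String line : format(countBigrams(tokens, 2))) {
            System.out.println(line);
        }
    }
}
```

Run on the seven-token list from the Definitions section with a window size of 2, the first line printed is question<>answer, assert<>2, followed by the four bigrams that occur once.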
"Count co-occurrences": This will pop up a dialog asking you to select the first input file, which should be the file to be interpreted as ranges. Click on "Open", then another dialog will pop up asking you to select the second input file, which should be the file to be interpreted as points within those ranges. Click on "Open" to make the list of co-occurrences. A new window will open displaying the results. The pairs will be displayed sorted by frequency in the format:
token1<>token2<>count
where count is the number of occurrences in those two files.

NOTE: Empty entries (entries without a label) in the second ("point") file will be ignored. Entries in the first ("range") file will be considered regardless of their label.

To write the results to a file, click on "Write to text file". You will be prompted for the file to which you want to write. If the file already exists, you will be asked to confirm that you want to over-write the file.
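
The pairing performed by "Count co-occurrences" can be sketched in Java as follows. This is an illustration under several assumptions, not MTC's actual code: the class CooccurrenceDemo is hypothetical, token1 is taken to be the range label and token2 the point label, a point is matched by its start time alone (epsilon is not used), and each range is taken to end where the next one starts, with the last range ending 0.1 units after its start as described under "Input file format". Sorting by frequency is left out to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CooccurrenceDemo {
    // Assumed length of the last range, following the input format notes.
    static final double LAST_RANGE_LENGTH = 0.1;

    // Counts rangeLabel<>pointLabel pairs. rangeStarts must be ascending;
    // range r covers the times from rangeStarts[r] up to (not including)
    // the start of the next range.
    static Map<String, Integer> count(double[] rangeStarts, String[] rangeLabels,
                                      double[] pointTimes, String[] pointLabels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int p = 0; p < pointTimes.length; p++) {
            if (pointLabels[p].isEmpty()) {
                continue; // empty entries in the point file are ignored
            }
            for (int r = 0; r < rangeStarts.length; r++) {
                double end = (r + 1 < rangeStarts.length)
                        ? rangeStarts[r + 1]
                        : rangeStarts[r] + LAST_RANGE_LENGTH;
                if (pointTimes[p] >= rangeStarts[r] && pointTimes[p] < end) {
                    // Ranges are counted regardless of their label, even if empty.
                    counts.merge(rangeLabels[r] + "<>" + pointLabels[p], 1, Integer::sum);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] rangeStarts = {0, 1.234, 3.456};
        String[] rangeLabels = {"", "greeting", "info-request"};
        double[] pointTimes = {1.5, 2.0, 3.5};
        String[] pointLabels = {"hello", "hello", "hello"};
        count(rangeStarts, rangeLabels, pointTimes, pointLabels)
                .forEach((pair, n) -> System.out.println(pair + "<>" + n));
    }
}
```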
"Reset default values": The epsilon value and window size are both reset to their default values (epsilon = 0.1, window size = 2).
Input file format: Must be of the form

        startTime1 %label1%
        startTime2 %label2%

For example:
        0 %%
        1.234 %greeting%
        3.456 %info-request%

Labels can contain spaces and punctuation except for the % character.
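
A line in this format could be read with a Java sketch like the one below. It is an assumption-laden illustration, not MTC's own parser: the class InputLineDemo, the method parseLine, and the regular expression are all hypothetical, and the sketch assumes that leading whitespace is allowed and that the start time and the label are separated by whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InputLineDemo {
    // A start time, whitespace, then a label enclosed in % characters.
    // The label itself may be empty but may not contain %.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(\\S+)\\s+%([^%]*)%\\s*$");

    // Returns {startTime, label}; throws IllegalArgumentException if the
    // line does not match the format or the start time is not a number.
    static String[] parseLine(String line) {
        Matcher matcher = LINE.matcher(line);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        Double.parseDouble(matcher.group(1)); // checks that the time is numeric
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] entry = parseLine("        1.234 %greeting%");
        System.out.println(entry[0] + " -> " + entry[1]);
    }
}
```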

I modified a Praat script by Mietta Lennes to obtain input files directly from Praat TextGrids. A copy of this modified script is in the file mtc/src/saveLabels.praat

It is assumed that the end-time of an interval is the same as the start-time of the following interval. The end-time of the last interval in a file is set at 0.1 units after its start-time (this should probably be changed).

Last Modified: 06 August, 2003