MTC (Multi-Tier Comparer)
User's Manual
NLKRRG
Table of Contents
Overview
Platform
Definitions
Getting Started
User's Manual
- Control window
- "Set epsilon"
- "Compare files"
- "Set window size"
- "Count bigrams"
- "Count co-occurrences"
- "Reset default values"
- Input file format
Overview
MTC (Multi-Tier Comparer) is a GUI-based tool, implemented in Java,
that assists in the analysis of data sets that have been annotated in
a program such as Praat (http://www.praat.org).
Once the labels have been extracted to a text file in a specific
format using a Praat script, MTC provides interfaces for several
activities:
- If more than one person annotated a given data set, the "Compare files"
function can be used to graphically compare the sets of intervals and
let the user select one label for each interval, producing a single,
merged file.
- Given a set of labels, the "Count bigrams" function produces a list of
bigrams that appear within a specified "window" and sorts the list by
frequency.
- Given two sets of labels from different tiers of the same data
set, the "Count co-occurrences" function
produces a list of pairs of labels that occur simultaneously and sorts
the list by frequency. The first set of labels is interpreted as a
set of ranges. For example, a dialog might be divided up into
sentences, and each sentence given a label based on its function in the
dialog. The second set of labels is interpreted as a set of
discrete events that happen during those ranges. For example, the
second set of labels might mark the times when a certain word is used.
The "Count co-occurrences" function makes a list that pairs each event
from the second set of labels with the label of the range from the
first set in which that event occurs.
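The co-occurrence counting described above can be sketched in Java as follows. This is a minimal illustration, not MTC's actual source: the class, record, and method names are hypothetical, and it assumes that both lists are sorted by start time, that each range lasts until the next range starts, that an event falling exactly on a boundary belongs to the later range, and that the range label is printed first in each pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CoOccurrenceSketch {
    // Hypothetical record (not from MTC): a start time and its label.
    record Labeled(double start, String label) {}

    // Counts "rangeLabel<>eventLabel" pairs.
    // Assumes both lists are sorted by start time.
    static Map<String, Integer> count(List<Labeled> ranges, List<Labeled> events) {
        Map<String, Integer> counts = new HashMap<>();
        for (Labeled event : events) {
            if (event.label().isEmpty()) {
                continue; // events without a label are skipped
            }
            // Find the last range that starts at or before the event.
            Labeled enclosing = null;
            for (Labeled range : ranges) {
                if (range.start() > event.start()) {
                    break;
                }
                enclosing = range;
            }
            if (enclosing != null) {
                counts.merge(enclosing.label() + "<>" + event.label(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Formats the counts as "token1<>token2<>count", most frequent first.
    // Ties are broken alphabetically here; that choice is an assumption.
    static List<String> toLines(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            lines.add(entry.getKey() + "<>" + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Labeled> ranges = List.of(
                new Labeled(0.0, "greeting"),
                new Labeled(2.0, "question"),
                new Labeled(5.0, "answer"));
        List<Labeled> events = List.of(
                new Labeled(0.5, "um"),
                new Labeled(2.5, "um"),
                new Labeled(3.0, "um"),
                new Labeled(5.5, "well"));
        toLines(count(ranges, events)).forEach(System.out::println);
    }
}
```

The linear scan over the ranges keeps the sketch short; for long files a binary search over the range start times would do the same job faster.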
Platform
MTC runs on any graphics-enabled platform.
Definitions
- bigram: A pair of tokens. For example, given the list:
opening
assert
question
answer, assert
question
answer, assert
closing
the bigrams and their respective frequencies would be:
opening<>assert<>1
assert<>question<>1
question<>answer, assert<>2
answer, assert<>question<>1
answer, assert<>closing<>1
- epsilon: The time difference within which two times are considered to
be equal (e.g. 3.4 and 3.5 are considered equal within an epsilon of
0.1). The default epsilon value is 0.1.
- window size: Sometimes you want to count not only bigrams in terms of
consecutive tokens, but also tokens that are within a certain window of
each other. For example, given the list:
opening
assert
question
answer, assert
closing
the bigrams and their respective frequencies within a window of size 3
would be:
opening<>assert<>1
opening<>question<>1
assert<>question<>1
assert<>answer, assert<>1
question<>answer, assert<>1
question<>closing<>1
answer, assert<>closing<>1
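The bigram and window-size definitions above can be sketched in Java as follows. This is an illustration only, not MTC's actual source: the class and method names are hypothetical. It assumes that a window of size n pairs each token with the n - 1 tokens that follow it, which reproduces both example listings above. The order of bigrams with equal counts (first-seen order here) is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BigramWindowSketch {
    // Counts ordered pairs "first<>second" where second follows first
    // by at most (windowSize - 1) positions.
    static Map<String, Integer> countBigrams(List<String> tokens, int windowSize) {
        if (windowSize < 2) {
            throw new IllegalArgumentException("window size must be at least 2");
        }
        // LinkedHashMap keeps first-seen order, which the stable sort
        // below uses for bigrams with equal counts.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            int last = Math.min(tokens.size() - 1, i + windowSize - 1);
            for (int j = i + 1; j <= last; j++) {
                counts.merge(tokens.get(i) + "<>" + tokens.get(j), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Formats the counts as "token1<>token2<>count", most frequent first.
    static List<String> sortedLines(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            lines.add(entry.getKey() + "<>" + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
                "opening", "assert", "question", "answer, assert", "closing");
        sortedLines(countBigrams(tokens, 3)).forEach(System.out::println);
    }
}
```

With a window size of 2 the inner loop visits only the immediately following token, so the same method also covers the plain consecutive-token case.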
Getting Started
There are several run options included in this distribution:
Windows/DOS:
- Run the executable .bat file mtc/bin/dos/runMTC.bat
NOTE: For this option, the entire directory tree structure of the CD
must be intact.
Linux:
- Run the script mtc/bin/linux/runMTC
NOTE: For this option, the entire directory tree structure of the CD
must be intact.
Other OS:
- If not already installed, download and configure the appropriate Java
Runtime Environment (JRE) v.1.4 from http://java.sun.com
- Run the executable jar file by typing a command such as:
java -jar mtc/bin/MTC.jar
The control window, which has a button for each option, should pop up.
User's Manual
- Control window: This window is used to start the various tools that
are a part of the program. To exit the program, click the "Quit" button
or close the control window. Note that the program exits without a
warning message even if you have not yet written results to a file.
- "Set epsilon": This can be used to change the epsilon value that is
used when comparing the files. The value entered in the dialog must be
numeric; otherwise, an error message will be displayed.
- "Compare files": This will pop up a dialog asking you to select the
files you wish to compare. Select multiple files by holding down the
Shift or Ctrl key while clicking on the files, or type in each file name
in quotes, separated by commas. Click "Open" to open a new window
displaying the comparison table.
Files which did not have an entry for a certain time are labeled
"NO_ENTRY" in that row. The last column of the table shows the
merged set of labels. If all files consistently label a row the
same way, then that label is automatically accepted in the merged
column. If there are different labels, then the cell in that row
will display "Choose:". Click on the cell to access a drop-down
menu of choices. You can then select which column you want to
accept a label from, or select "UNKNOWN".
NOTE: All cells in the last column must be resolved before writing the
results to a file. If one of the write options is clicked while there
are still unresolved rows (i.e. "Choose:" is still displayed in the
final column), a warning message will be displayed and nothing will be
written to the file.
There are two options to write to an output file:
- "Write to normal text file": writes the last column to a file in a
form that can be accepted as input to this program (see Input file
format).
- "Write to interval tier file": writes the last column to a file in a
form that can be read into Praat as an IntervalTier object (a "short
text file").
In both cases you will be prompted for the file to which you want to
write. If the file already exists, you will be asked to confirm
that you want to overwrite the file.
NOTE: Intervals for which "NO_ENTRY" was selected are not written
to the file.
The epsilon value used for that comparison as well as the number of
rows in which differences were found are displayed in the title of the
output window.
- "Set window size": This can be used to set the window size in which
to count bigrams. The window size must be an integer greater than or
equal to two; otherwise, an error message will be displayed and the
window size will retain its previous value.
- "Count bigrams": This will pop up a dialog asking you to select one
or more input files containing the tokens you wish to analyze. To
select more than one file, hold down the Shift or Ctrl key while
clicking on the files, or type in a list of file names, each surrounded
by quotes and separated by commas. The start times are ignored; only
the order of the tokens is taken into account. Click "Open" to make a
list of the bigrams contained in all of the selected files. A new
window will open displaying the results.
The bigrams will be displayed sorted by frequency in the format:
token1<>token2<>count
where count is the number of occurrences in the selected files.
The window size used for counting bigrams is displayed in the title bar
of the output window.
To write the results to a file, click on "Write to text file".
You will be prompted for the file to which you want to write.
If the file already exists, you will be asked to confirm that you
want to overwrite the file.
- "Count co-occurrences": This will pop up a dialog asking you to
select the first input file, which should be the file to be interpreted
as ranges. Click on "Open"; another dialog will then pop up asking you
to select the second input file, which should be the file to be
interpreted as points within those ranges. Click on "Open" to make the
list of co-occurrences. A new window will open displaying the results.
The pairs will be displayed sorted by frequency in the format:
token1<>token2<>count
where count is the number of occurrences in those two files.
NOTE: Empty entries (entries without a label) in the second ("point")
file will be ignored. Entries in the first ("range") file will be
considered regardless of their label.
To write the results to a file, click on "Write to text file".
You will be prompted for the file to which you want to write.
If the file already exists, you will be asked to confirm that you
want to overwrite the file.
- "Reset default values":
The epsilon value and window size are both reset to their default
values (epsilon = 0.1, window size = 2).
- Input file format: Must be of
the form
startTime1 %label1%
startTime2 %label2%
For example:
0 %%
1.234 %greeting%
3.456 %info-request%
Labels can contain spaces and punctuation except for the % character.
I modified a Praat script by Mietta Lennes to obtain input files
directly from Praat TextGrids. A copy of this modified script is
in the file mtc/src/saveLabels.praat.
It is assumed that the end-time of an interval is the same as the
start-time of the following interval. The end-time of the last
interval in a file is set at 0.1 units after its start-time. (*This
should probably be changed.*)
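The input format and the end-time rule described above can be sketched in Java as follows. This is an illustration under stated assumptions, not MTC's actual parser: the class, record, and method names are hypothetical, the time and the label are assumed to be separated by whitespace, and blank lines are skipped (the manual does not say how MTC treats them).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelFileSketch {
    // Hypothetical record (not from MTC): one labeled interval.
    record Interval(double start, double end, String label) {}

    // A start time, whitespace, then a label between % signs.
    // The label itself may not contain the % character.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(\\S+)\\s+%([^%]*)%\\s*$");

    // Length given to the last interval, as stated in the manual.
    static final double LAST_INTERVAL_LENGTH = 0.1;

    static List<Interval> parse(List<String> lines) {
        List<Double> starts = new ArrayList<>();
        List<String> labels = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are skipped
            }
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            starts.add(Double.parseDouble(m.group(1)));
            labels.add(m.group(2));
        }
        List<Interval> intervals = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            // Each interval ends where the next one starts; the last
            // interval ends LAST_INTERVAL_LENGTH after its own start.
            double end = (i + 1 < starts.size())
                    ? starts.get(i + 1)
                    : starts.get(i) + LAST_INTERVAL_LENGTH;
            intervals.add(new Interval(starts.get(i), end, labels.get(i)));
        }
        return intervals;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "0 %%",
                "1.234 %greeting%",
                "3.456 %info-request%");
        parse(lines).forEach(System.out::println);
    }
}
```

Collecting the start times first and deriving the end times in a second pass keeps the end-time rule in one place, which makes the fixed 0.1 length of the last interval easy to change.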
Last Modified: 06 August, 2003