
Detailed Specifications




Table of Contents:

i. Overall Function
ii. Expected users
iii. Environment


1. Functionality for the users
1.1. Log in screen
1.2. Demographic information
1.3. Main Menu Screen
1.4. Word alignments
1.4.1. Methods of representation
1.4.1.1. Line alignment scheme
1.4.1.2. Grid scheme
1.4.2. Linking words and phrases
1.4.3. Linking to Not Translated
1.4.4. Unlinking words and phrases
1.4.5. Ability to Clear
1.4.6. Previous and Next Buttons
1.4.7. Return to Main Menu Button
1.4.8. Comments Text Area
1.4.9. Timer
1.4.10. Help features
1.4.11. Possible future features
1.5. Finished screen
2. Functionality for the administrators
2.1. Log in screen
2.2. Managing users
2.2.1. Create user or administrator
2.2.2. Modify user settings
2.2.3. Delete user
2.3. Managing groups
2.3.1. Create groups
2.3.2. Modify group settings
2.3.3. Delete groups
2.4. Managing sentences
2.4.1. Uploading files
2.4.2. Preprocessing files
2.5. Statistics
2.5.1. By individual user
2.5.2. By group
2.5.3. Exporting
3. Internal Representation
3.1. User files
3.1.1. 2 Dimensional Array for each sentence
3.1.2. Demographic information
3.1.3. Time information
4. Data analysis and manipulation



i. Overall Function
This program will be an experimental test bed in which translators or bilingual speakers of English and Chinese enter information about word alignments between a sentence and its translation. This information will be stored and made viewable to administrators in a form that is useful to future work.

ii. Expected Users
The people entering the word alignment information (referred to as a “user” or “users” in the rest of this document) will be translators or bilingual speakers of English and Chinese (or other languages as needed). Their part of the program should be as intuitive and easy to use as possible, since they are unlikely to have extensive computer experience. The people setting up this experiment (referred to as an “administrator” or “administrators” in the rest of this document) will be researchers such as Dr. Rebecca Hwa who are studying how annotated word alignments can affect machine translation or machine learning, and what other information can be drawn from the user-provided alignments. The administrators will have more computer experience, but their side of the program should still be easy to use and to configure to obtain whatever information is needed.

iii. Environment
This program will be a Java applet, accessible to anyone on the internet who can run Java applets and display the needed fonts (Chinese, Arabic, etc.).



1. Functionality for the users

1.1. Log in screen
The first thing a user will do is enter a username and password given to them by an administrator. The security of these usernames and passwords will likely be trivial for now, as the main purpose of logging in is to save information for each user separately and allow them to save and return to their own work.

1.2. Demographic information
The first time a user logs into the program, they will be asked to enter some demographic information, such as how proficient they are in both languages, how old they are, and anything else that might be helpful to the researchers. This information will be saved in the user's file, and the user will not have to enter it again.

1.3. Main Menu Screen
This screen will list all the sentences the user needs to complete. They will be marked to indicate whether they are complete or incomplete. The user will be able to select a sentence from the list to align.

1.4. Word alignments
This screen is the main part of the test bed. It has many components to allow the user to easily enter word alignment information.

1.4.1. Methods of Representation
Two displays of the alignments will be available (the line alignment scheme and the grid scheme), and for now the choice between them will be made by the administrator. In the future, if neither shows a universal advantage or disadvantage, users may be able to choose whichever representation is easier for them personally. Both methods will display the sentence and its translation above the representation so that the user can read the sentences normally before thinking about word alignments.

1.4.1.1. Line alignment scheme
This scheme displays the sentences vertically, with each word on its own line. The two sentences will be side by side, and the user will align the sentences by connecting words between the two sentences.

1.4.1.2. Grid scheme
This scheme has one of the sentences written along the x axis and one of the sentences written along the y axis of a grid. If the user wishes to align two words, they would mark the intersection of the row and column that the words were in.

1.4.2. Linking words and phrases
The users will be able to click on two words, one from each sentence, and then click a button marked “Sure” if they are sure about their alignment. They also have the option of selecting two words and clicking a button marked “Unsure” if they are not completely certain that the alignment is correct but it is their best guess. If more than one word has been selected from either sentence, every selected word in one sentence will be linked to every selected word in the other, marking the two groups as a complete phrase translation of each other. In other words, each group of connected words must form a clique across the two sentences.
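
As a rough illustration of the rule above, the sketch below links every selected word in one sentence to every selected word in the other. The class name `PhraseLinker`, the method `linkSelection`, and the codes 1 for “Sure” and 2 for “Unsure” (borrowed from section 3.1.1) are assumptions made for this example, not part of the specification.

```java
// Hypothetical helper: applies one "Sure"/"Unsure" click to the current selection.
class PhraseLinker {
    static final int SURE = 1;    // assumed code for a "Sure" link
    static final int UNSURE = 2;  // assumed code for an "Unsure" link

    // links[s][t] holds the link between word s of one sentence and
    // word t of the other; 0 means "not linked".
    static void linkSelection(int[][] links, int[] selectedSource,
                              int[] selectedTarget, int certainty) {
        // Connect every selected source word to every selected target word,
        // so the selected group is fully connected across the two sentences.
        for (int s : selectedSource) {
            for (int t : selectedTarget) {
                links[s][t] = certainty;
            }
        }
    }
}
```

For instance, selecting words 1 and 2 in one sentence and word 3 in the other, then clicking “Sure”, would set both `links[1][3]` and `links[2][3]` to 1.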

1.4.3. Linking to “Not Translated”
Marking a word as "Not Translated" means that there is no corresponding word in the other sentence. When the users need to mark a word in this way, they will right click on that word and click the Sure or Unsure button. The word's rectangle will be filled in red and outlined in the color meaning "Sure" or "Unsure", depending upon the button they selected. Aligning this word to a word in the other sentence will not be allowed while it is marked "Not Translated".

1.4.4. Unlinking words and phrases
When users click on a group of words that have already been marked "Sure" or "Unsure", the group will become selected again. Clicking on a word that is a member of this group will deselect that rectangle and effectively remove the word from the group. Users will have to click "Sure" or "Unsure" again to re-mark the remaining part of the group.

1.4.5. Ability to clear
There will be a button marked “Clear” that will erase all of the links the user has made so that the user can start over.

1.4.6. “Previous” and “Next” buttons
The “Previous” button will allow the user to return to a sentence they have already seen in order to make corrections or complete it. The “Next” button allows the user to go on to the following sentence, but first checks whether every word in the current sentence has been linked to some other word or to “Not Translated”. It is important for the administrator to get as much information as possible, so words that are not linked to anything are very undesirable. If all words are linked, the program will proceed to the next sentence. If not, the user will be alerted with a pop-up window or special screen offering two options. A button marked “Cancel” cancels the move so that the user can complete the current sentence. A button marked “Save and Continue” saves the work on the current sentence and goes on to the next one, with the intention that the user will return to the current sentence later, either by pressing the “Previous” button or by being shown it again after all the new sentences have been shown.
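
The check performed by the “Next” button could look roughly like the sketch below. It assumes the array layout of section 3.1.1 with index 0 standing for “Not Translated” and words numbered from 1; the class and method names are hypothetical.

```java
// Hypothetical check run before moving to the next sentence.
class CompletenessCheck {
    // links[e][c] != 0 means English word e is linked to Chinese word c.
    // Row 0 and column 0 are assumed to stand for "Not Translated".
    static boolean allWordsLinked(int[][] links) {
        int rows = links.length;
        int cols = links[0].length;
        // Every English word needs a link to a Chinese word or to column 0.
        for (int e = 1; e < rows; e++) {
            boolean linked = false;
            for (int c = 0; c < cols; c++) {
                if (links[e][c] != 0) linked = true;
            }
            if (!linked) return false;
        }
        // Every Chinese word needs a link to an English word or to row 0.
        for (int c = 1; c < cols; c++) {
            boolean linked = false;
            for (int e = 0; e < rows; e++) {
                if (links[e][c] != 0) linked = true;
            }
            if (!linked) return false;
        }
        return true;
    }
}
```

When this returns false, the program would show the “Cancel” / “Save and Continue” choice instead of advancing.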

1.4.7. Return to Main Menu button
The "Return to Main Menu" button will allow the user to return to the main menu screen described above, for example when they wish to jump to a sentence that is not adjacent to the current one.

1.4.8. Comments Text Area
There will be a text area marked “Comments” on either the side or the bottom of the screen where the user can mention any problems, concerns, or explanations related to their alignment. This information could be of use to the administrator when trying to understand why alignments vary from user to user.

1.4.9. Timer
The timer will not be visible to the user, but there will be an internal timer object that will record what order each alignment was made in and the time it took for each alignment to be made. Added together, this will also provide the total time a user spent on a sentence. This information will be useful in determining in another way which alignments the person was sure of (they would do those first and quickest) and which required more thought. This may also be an indicator of accuracy or effort put into the alignments – too quickly and the person may not have been taking the task seriously, and too long and the person may not have the language ability to complete the task easily.
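
One possible shape for the internal timer object is sketched below. The class `AlignmentTimer` and its methods are hypothetical; timestamps are passed in as milliseconds (for example from `System.currentTimeMillis()`) so that the example does not depend on the real clock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical timer: records the order and duration of each alignment.
class AlignmentTimer {
    private final List<Long> durations = new ArrayList<Long>();
    private long lastEventMillis;

    // Start timing when the sentence is first shown to the user.
    AlignmentTimer(long sentenceShownAtMillis) {
        lastEventMillis = sentenceShownAtMillis;
    }

    // Call when the user confirms an alignment.
    // Returns the alignment's position in the order made (1 = first).
    int recordAlignment(long nowMillis) {
        durations.add(nowMillis - lastEventMillis);
        lastEventMillis = nowMillis;
        return durations.size();
    }

    // Time taken by the alignment made at the given position.
    long durationOf(int order) {
        return durations.get(order - 1);
    }

    // Sum of all alignment times: the total time spent on the sentence.
    long totalMillis() {
        long total = 0;
        for (long d : durations) total += d;
        return total;
    }
}
```

This version measures each alignment from the previous confirmation (or from the moment the sentence appeared), which is one of several reasonable definitions of "the time it took".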

1.4.10. Help features
Eventually there will be a tutorial program that users will be required to work through before starting their alignments that will give the users guidelines and rules to follow. These rules should also be available to the user during the alignment process by pressing a button marked “Help”.

1.4.11. Possible future features
After the features already listed are completed and functioning, other features may be added to this program. These include giving the user a computer generated word alignment already started for them to correct, having a separate mode for the user to enter parts of speech of the words in one of the sentences, having a mode for the user to enter how phrases modify each other, and having the ability to easily change which languages are currently being used (the support of different characters).

1.5. Finished screen
When the user has fully completed all sentences the administrator has assigned to them, instead of another sentence, the screen will display a thank you message and any other instructions or announcements the user needs to see.

2. Functionality for the administrators

2.1. Log in screen
The log in screen for the administrators will be the same log in screen that is used for the users, but the program will recognize the username and password as belonging to an administrator. Upon logging in, an administrator will not proceed to sentences to align, but will go to a screen with a menu of all the following administrator functions.

2.2. Managing users

2.2.1. Create users or administrators
This screen will allow the administrator to assign new user names and passwords. This will create a new file for the user. The administrator also has the ability to create new administrator accounts in order for more than one administrator to work on projects. All administrators will probably share the same users and settings for now, but perhaps a future feature will be to allow each administrator to create different groups and use different sentences, but have access to the same users.

2.2.2. Modify user settings
The administrator will have the ability to select a specific user and change that user's settings. These settings include which alignment representation this user will see (line alignment or grid), which sentences they will see, how many sentences they have to align, and other settings that may be deemed necessary later.

2.2.3. Delete User
In the event that a user is no longer needed or is not appropriate for the project, the administrator will be able to delete their file and remove their username and password from the program's access list.

2.3. Managing groups

2.3.1. Create groups
In most cases, it will be more useful to assign the same settings to a group of users than to one individual user. This area will allow the administrator to create groups, assign the groups names, and assign already created users to those groups. A possible future feature is auto blocking: the administrator would choose a large group of users, a large group of sentences, and a blocking scheme (e.g. divide users into 4 groups, have all sentences aligned at least 3 times), and the program would assign users to groups and sentences to groups, then display the results.
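
Since auto blocking is only a possible future feature, the following is just one way it might work, not a chosen design. The sketch deals users into groups round-robin and gives each sentence to a fixed number of consecutive groups; the class `AutoBlocking` and both method names are hypothetical.

```java
// Hypothetical auto-blocking sketch (a rotation scheme chosen for illustration).
class AutoBlocking {
    // Returns groupOfUser[u], the group index for user u, dealt round-robin.
    static int[] assignUsers(int userCount, int groupCount) {
        int[] groupOfUser = new int[userCount];
        for (int u = 0; u < userCount; u++) {
            groupOfUser[u] = u % groupCount;
        }
        return groupOfUser;
    }

    // Returns assigned[g][s], true when group g should align sentence s.
    // Each sentence goes to exactly timesEach different groups.
    static boolean[][] assignSentences(int sentenceCount, int groupCount,
                                       int timesEach) {
        if (timesEach > groupCount) {
            throw new IllegalArgumentException(
                "cannot give a sentence to more groups than exist");
        }
        boolean[][] assigned = new boolean[groupCount][sentenceCount];
        for (int s = 0; s < sentenceCount; s++) {
            for (int r = 0; r < timesEach; r++) {
                assigned[(s + r) % groupCount][s] = true;
            }
        }
        return assigned;
    }
}
```

With 4 groups and `timesEach` set to 3, every sentence is aligned by three of the four groups, so it is aligned at least 3 times as long as no group is empty.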

2.3.2. Modify group settings
The administrator will be able to set for the group as a whole such options as which alignment representation the group will see, which sentences the group will align, and if everyone in the group should see all the sentences or if different users in the group should see a random or predetermined subset of the sentences.

2.3.3. Delete groups
In the event that a group is no longer needed, the administrator will have the ability to disaffiliate all users from a particular group and thus lose the settings that were in place for that group. This will not delete the users in the group.

2.4. Managing sentences

2.4.1. Uploading files
The sentences and their translations will be in a file that the program needs to be made aware of. The administrator will have the ability to upload a file of sentences so that the program can use them. This file will be given a name within the program, and once it is uploaded these sentences will be available to assign to a user or group in those settings.

2.4.2. Preprocessing files
The files must be preprocessed in order to be in the format the program needs. Punctuation marks (e.g. .,?!;”' and others) need to be separated from the adjacent words and treated as words themselves so that punctuation marks can also be aligned across translations. Contractions need to be separated into their original words, so that, for example, the word “don't” can more accurately be aligned separately to the translations of “do” and “not” instead of ambiguously being aligned to both those translations. The possessive 's and possessive apostrophes in English also need to be separated from the nouns they modify so that the idea of ownership can be aligned with whatever indicates ownership in the translation.
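
A minimal sketch of these three preprocessing steps for English follows. The class `SentencePreprocessor`, the tiny contraction table, and the choice to emit the possessive marker as its own `'s` token are all assumptions made for illustration. The sketch handles only the straight apostrophe, lowercases expanded contractions, and cannot tell a possessive “John's” from the contraction “it's”.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical English preprocessor; a real one would need a fuller table.
class SentencePreprocessor {
    private static final Map<String, String> CONTRACTIONS =
        new HashMap<String, String>();
    static {
        CONTRACTIONS.put("don't", "do not");
        CONTRACTIONS.put("can't", "can not");
        CONTRACTIONS.put("won't", "will not");
    }

    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<String>();
        // Surround punctuation with spaces so each mark becomes its own word.
        // Apostrophes are left alone here and handled per word below.
        String spaced = sentence.replaceAll("([.,?!;:\"])", " $1 ");
        for (String word : spaced.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            String expansion = CONTRACTIONS.get(word.toLowerCase());
            if (expansion != null) {
                // Contraction: replace with its original words.
                for (String part : expansion.split(" ")) tokens.add(part);
            } else if (word.length() > 2 && word.endsWith("'s")) {
                // Singular possessive: split off the 's.
                tokens.add(word.substring(0, word.length() - 2));
                tokens.add("'s");
            } else if (word.length() > 2 && word.endsWith("s'")) {
                // Plural possessive: split off the trailing apostrophe.
                tokens.add(word.substring(0, word.length() - 1));
                tokens.add("'");
            } else {
                tokens.add(word);
            }
        }
        return tokens;
    }
}
```

Under these assumptions, `tokenize("I don't like John's dog.")` yields the eight tokens I, do, not, like, John, 's, dog and the final period.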

2.5. Statistics

2.5.1. By individual user
The administrator will be able to select a user and look at all the work they have completed so far. For each sentence, the display will include the alignments the user made, the order in which they were made, how long each alignment took, the total time spent on the sentence, and any comments the user wrote about the sentence.

2.5.2. By group
The administrator will be able to select a named group of users to view statistics about the sentences that more than one user aligned. These statistics will show how similar the alignments were between users, how similarly the users marked links “Sure” and “Unsure”, the average time for working on this sentence, and any other statistics deemed relevant.
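
The specification does not say how similarity between users will be measured. As one possible measure, the sketch below computes the share of links that two users have in common, ignoring the “Sure”/“Unsure” distinction. The class `AlignmentAgreement` and the method `linkOverlap` are hypothetical, and both arrays are assumed to describe the same sentence.

```java
// Hypothetical agreement measure between two users' alignments.
class AlignmentAgreement {
    // Returns (links made by both users) / (links made by either user).
    // A nonzero cell counts as a link; returns 1.0 if neither user made any.
    static double linkOverlap(int[][] a, int[][] b) {
        int both = 0;
        int either = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                boolean inA = a[i][j] != 0;
                boolean inB = b[i][j] != 0;
                if (inA && inB) both++;
                if (inA || inB) either++;
            }
        }
        return either == 0 ? 1.0 : (double) both / either;
    }
}
```

A value of 1.0 means the two users linked exactly the same word pairs, and 0.0 means they share no links. Agreement on “Sure” versus “Unsure” could be measured the same way by comparing the cell values instead of testing for nonzero.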

2.5.3. Exporting
If there is another program that could make use of the data obtained through these sentence alignments, the administrator will have the ability to choose a group of users and export their statistics in a format and file type that program could use. The exact details of this feature will not be specified until more is known about possible uses for the data.

3. Internal representation

3.1. User files

3.1.1. 2 Dimensional Array for each sentence
The alignment information for each sentence will be stored in a 2 dimensional array. One axis of the array will be indexed by the words of one sentence, and the other axis will be indexed by the words of the translation of that sentence. There will also be a row and a column indexed at -1 (or perhaps 0; this is not finalized) to represent “Not Translated”. For example, if the third English word is aligned to the fourth Chinese word and the user was sure about this alignment, the entry in this sentence's array at [3][4] would be a 1. If the user linked these two words but was unsure of this alignment, the entry at [3][4] would be a 2. If these two words are not linked to each other, the entry at [3][4] would be a 0. If the fifth Chinese word was linked to “Not Translated” and the user was sure, the entry at [-1][5] would be 1. Most entries in this array will be 0, so the representation may be slightly inefficient, but since the sentences will not be extremely long, the arrays will not be extremely large. This representation makes it very easy to compare two alignments and to look at the alignments according to either the English words or the Chinese words.
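
The array described above might be wrapped in a small class like the sketch below. Because Java arrays cannot be indexed at -1, this sketch takes the other option mentioned and uses index 0 for “Not Translated”, with words numbered from 1. The class name `AlignmentMatrix` and its methods are hypothetical.

```java
// Hypothetical wrapper around the per-sentence alignment array.
class AlignmentMatrix {
    static final int NONE = 0;            // words are not linked
    static final int SURE = 1;            // linked, user was sure
    static final int UNSURE = 2;          // linked, user was unsure
    static final int NOT_TRANSLATED = 0;  // assumed index for "Not Translated"

    private final int[][] cells;  // cells[english][chinese]

    // One extra row and column (index 0) hold the "Not Translated" links.
    AlignmentMatrix(int englishWords, int chineseWords) {
        cells = new int[englishWords + 1][chineseWords + 1];
    }

    void set(int english, int chinese, int value) {
        cells[english][chinese] = value;
    }

    int get(int english, int chinese) {
        return cells[english][chinese];
    }

    // Counts the cells on which two alignments of the same sentence differ.
    int differences(AlignmentMatrix other) {
        if (cells.length != other.cells.length
                || cells[0].length != other.cells[0].length) {
            throw new IllegalArgumentException(
                "alignments are for different sentences");
        }
        int count = 0;
        for (int e = 0; e < cells.length; e++) {
            for (int c = 0; c < cells[e].length; c++) {
                if (cells[e][c] != other.cells[e][c]) count++;
            }
        }
        return count;
    }
}
```

With this layout, the examples in the text become `set(3, 4, SURE)` for the sure link between the third English word and the fourth Chinese word, and `set(NOT_TRANSLATED, 5, SURE)` for the fifth Chinese word marked “Not Translated”.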

3.1.2. Demographic Information
The information that the user provides about themselves when they log in for the first time will be stored in a file in a way that would be easy to extract at a later date to display in the user statistics or export to another program.

3.1.3. Time information
The information regarding what order and at what time the alignments were made may be stored in the same file as the sentence alignments or it may be stored in a separate file. Either way the information will be stored in such a way that it would be easy to extract at a later date to display in the user statistics, be used in the group statistics, or be exported to another program.

4. Data analysis and manipulation
This section may contain future features such as being able to obtain a dictionary or translation rules from the alignments, computing different statistics, and formatting these statistics in a useful way. More research into how this information will be obtained and used is necessary before this can be implemented.