Journal
Week 1: Preparation
This week was hectic, but I am glad I managed. My first day of work was Monday, May 22. I got to the ArticuLab just on time after learning how to navigate Carnegie Mellon's campus the day before. I first joined the orientation meeting hosted by Lauren Smith, our lab manager. We were seated and given a folder with information about how to start the first week in our lab. Some of the forms addressed the common rules and obligations that we must follow while working in the lab. We went around and did self-introductions. I learned that most of the interns will be working in the lab part time on the Alex project like me. Most of them are students from the University of Pittsburgh taking the internship as a credit course for their Linguistics major. I spent the day reading articles about the research, such as Samantha's dissertation proposal and readings on cognitive learning with adaptive learning technology and how it influences children's learning and development.
The project focuses on measuring and identifying specific factors that affect the science performance, rapport, and socio-cultural interaction of children who speak African American Vernacular English. The main contribution of the project is to illustrate how dialect impacts students' science performance and how technology could be adapted to improve that outcome. My primary focus this summer is to act as the agent that cohesively combines all the interpreted data produced by the students who transcribe the recordings and rate the levels of rapport, reasoning, and science discourse along both the social and science dimensions. Thus, I spent most of the week reviewing statistics topics, learning R's syntax, and coding in it. Some of the things I reviewed were hypothesis tests, p-values, null & alternative hypotheses, examples of hypotheses, effect size, the difference between variance & standard deviation, confidence intervals, what is considered a large enough sample, and between-subjects vs. within-subjects designs.
Aside from the work, I was also able to attend two lab meetings. They helped me become more acquainted with the type of work I will be doing, such as how to analyze and interpret data sets to come up with hypotheses. I learned the value of collaboration and of discussions on how to review a paper for conference submission. I am happy to have had the chance to formally meet everyone in the ArticuLab, especially Dr. Justine Cassell. She is very knowledgeable about writing papers for conferences and about research. I am happy she brought us donuts for the meeting as well! I am grateful that everyone is friendly and easy to talk to. Lastly, I am happy to be working with Samantha, who is very friendly and enthusiastic about the research. I am delighted by the amount of effort she puts into guiding me, corresponding with me, and answering any questions I have about grad school.
Next week I am expected to do more statistics review, come up with 2 hypotheses using the new list of data from Samantha and elaborate on the motivation for each in a paragraph, and read more articles about critical race theory and topics related to the Alex project. I expect that I will be more adjusted to the full-time work schedule and start producing substantial work that helps with analyzing the data for the summer.
Week 2: Getting Started
This week I was more familiar with the campus and had already adjusted to the work schedule. Initially, my goals this week were to do more reading about critical race theory and educational technologies; come up with two hypotheses involving the data collected; and review statistics and R programming.
The Alex group is split into three main teams: transcribers, a social team, and a science team. The transcribers listen to 30-second video clips of the kids interacting with Alex. The social team rates the level of rapport using a random sample of video clips; to make sure the ratings are reliable, they compute the coefficient of variation. Lastly, the science team is responsible for reviewing the transcripts and detecting science discourse and reasoning. An example of this is when a child describes the diet of an animal based on the habitat it lives in.
Good news: I finally got my student ID, which gives me access to "Box", a file manager that allows the lab to upload and share files securely. I also logged my hours for the past week.
The thing that took most of my time was finding an efficient way to help the Social Team select random files to rate for data collection. Previously, they would hand-select the files they had to rate and keep track of the files they had already looked at.
I solved the problem by breaking it into two parts:
1) Write an R script that splits the files into groups
2) Write a bash script that copies the files into their designated folders.
This task took a while because I wanted to take the opportunity to learn how to read and parse files in bash and R. The only thing I have yet to figure out is how to call an R script inside a bash file. I learned that I would have to change the PATH to link to the R executable. This will eliminate the need to run both scripts independently.
The R script runs first. It randomly selects sets of .mp4 files and puts them into separate lists. Each of those lists is written to a separate .csv file. The R script makes sure that all the lists are of equal size. Then, the bash script runs, selects each of the files by name from the directories where they are located, and places them in a single folder. The following image shows the structure of how the files are placed.
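Here is a minimal sketch of the R half of the pipeline (the folder name, group count, and file names are placeholders, not our real setup):

    # split_files.R -- randomly split .mp4 clips into equal-sized groups
    # (illustrative sketch; the real paths and group count differ)
    set.seed(42)                                  # reproducible random selection
    files <- list.files("clips", pattern = "\\.mp4$")
    files <- sample(files)                        # shuffle the file names
    n_grp <- 3                                    # e.g. one group per rater
    keep  <- n_grp * (length(files) %/% n_grp)    # trim so all groups are equal
    groups <- split(files[seq_len(keep)], rep(seq_len(n_grp), each = keep / n_grp))
    for (i in seq_len(n_grp)) {
      write.csv(data.frame(file = groups[[i]]),
                sprintf("group_%d.csv", i), row.names = FALSE)
    }

The bash half then reads each group_i.csv and copies the listed clips into that rater's folder. Once the Rscript command-line executable is on the PATH, the whole pipeline can run from one bash file with a line like Rscript split_files.R.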
On Friday, I discussed my accomplishments and goals with Samantha. We spoke about how I should split my time between analyzing data, practicing R, and reading articles. I am glad we addressed that because I had trouble organizing my focus. We also discussed an upcoming conference, the 2017 AERA (American Educational Research Association) conference, and the opportunity to submit papers to it. This is exciting because she offered to help me prepare a paper for the conference. Ideally, this paper will also become my final report. To do that I will have to come up with hypotheses and interesting questions to test, and write the paper by the beginning of July.
My goals for next week are to:
- Brainstorm ideas, hypotheses, look for AERA papers
- Draft the progress report by Wednesday and review it with Samantha on Friday.
- Attend the R programming class on Tuesday.
- Read more articles and papers.
Week 3: Working with Data & Learning Statistical Mediation
This week I accomplished these things:
1) Attended the first two sessions of Dr. Howard Seltman's PIER statistics course, Statistical Mediation in Education Research (Summer 2017). The class runs for three weeks and meets on Tuesdays & Thursdays from 11:00-12:30.
a) Mediation: in psychology we use mediation to show cause and effect. The model's purpose is to show the relationship between the independent and dependent variable through a third, hypothetical variable, the mediator variable.
b) Causality: the clearest & most concise definition is mean(all of the treated subjects) - mean(all of the control subjects).
c) The Baron and Kenny mediation approach requires three separate regressions. I learned about the difference between partial vs. complete mediation.
d) Moderator vs. mediator variable: a moderator determines the strength of the relationship between two other variables, while a mediator explains the relationship between two other variables.
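As a concrete sketch, the three Baron & Kenny regressions look like this in R (x, m, and y are simulated placeholders, not our study's actual measures):

    # Baron & Kenny's three regressions on placeholder data
    set.seed(1)
    x <- rbinom(100, 1, 0.5)             # placeholder treatment
    m <- 0.6 * x + rnorm(100)            # placeholder mediator
    y <- 0.5 * m + 0.1 * x + rnorm(100)  # placeholder outcome
    d <- data.frame(x, m, y)
    step1 <- lm(y ~ x,     data = d)  # 1: x must predict y (total effect)
    step2 <- lm(m ~ x,     data = d)  # 2: x must predict the mediator m
    step3 <- lm(y ~ x + m, data = d)  # 3: m must predict y, controlling for x
    # Complete mediation: x's coefficient in step3 drops to about 0;
    # partial mediation: it shrinks but remains significant.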
2) Completed the DREU Progress Report and discussed it with my graduate student mentor.
3) Worked on calculating the Dialect Density Measure (DDM) for the annotated data collected this summer. The DDM counts the grammatical dialect speech patterns exhibited by a speaker, for instance the frequency of various parts of speech like pronouns. I had to account for the two scenarios in which the participant spoke, labelled the 'peer' and 'teacher' tasks. We calculated the difference between the two DDMs to retrieve the percentage of context switching, i.e. (peer DDM - teacher DDM).
The percentages are interpreted as follows:
difference > 0%: the participant code-switched more in the peer task than the teacher task;
difference = 0%: the same amount of code-switching in both tasks;
difference < 0%: the participant used more dialect in the teacher task than the peer task.
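A toy version of that calculation in R (the counts below are invented, not real participant data, and a per-token definition of DDM is assumed for illustration):

    # Toy DDM context-switch calculation for one participant
    # DDM = dialect feature tokens / total tokens in the task
    peer_features    <- 12; peer_tokens    <- 80   # invented counts
    teacher_features <- 4;  teacher_tokens <- 90
    peer_ddm    <- peer_features / peer_tokens
    teacher_ddm <- teacher_features / teacher_tokens
    (peer_ddm - teacher_ddm) * 100   # > 0: more dialect in the peer task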
4) Read the book "The Skin That We Speak" by Lisa Delpit. The book is very inspirational and exhibits various perspectives and stories about how people push back against discrimination over the language they speak. One of the most memorable excerpts was about Joanne Kilgour Dowdy. I really relate to her story because she had to overcome the fear of fulfilling two roles in her life and find the proper balance between speaking her native language and the 'correct' British English way. I have struggled with the ability to connect with my heritage and culture, and I struggle to communicate with my parents. Can we sense the disconnect that kids have with their culture today if they grew up being 'Americanized'?
I learned about the impact the 'Ebonics Debate' had on the field of linguistics and the turning point that caused much upheaval in the mass media. It serves as an example of how controversial the matter is for the communities that have had to struggle with discrimination. Today more research is being done to address why African American children should be able to learn in their native language. I find my project powerful because we want to address whether it is better to encourage children to code-switch between dialects in a school setting; they can use code-switching as a skill that improves the way they perceive things and learn. Nowadays, we focus on building artificial intelligence that is more human-like but neglect to address ethical concerns about how it influences social beings. We must study the social aspects of how these agents are designed, visually and interactively.
5) Participated in a reading group meeting. The reading group meetings are meant to help the undergraduate interns. Discussions are built from related literature assigned bi-weekly, and the meetings are held every Thursday from 4-5pm. The first half is for weekly updates about our progress in the projects, our goals, or challenges we faced. The second half consists of hands-on, workshop-style activities on topics such as how to improve our CVs, personal websites, and elevator pitches. This week we discussed an article about racial discrimination in the video game World of Warcraft. This paper relates to the Alex project because we can see how a population of people in a virtual game faces the same problem of racial discrimination. We learned from the discussion that as computer scientists we must acknowledge and address the ethical problems that come with our creations. Virtual simulations allow us to observe and test how people react and interact in a controlled environment. During the elevator speech workshop, I learned that we need to identify 1) the audience, 2) the goal, and 3) where they are situated now and where we want them to be. Identifying these key goals will make it easier for me to deliver an elevator speech.
My goals for week four are to continue my study of Dr. Seltman's Statistical Mediation course by reviewing his notes and reading selected chapters from his book; read The Skin That We Speak; and develop hypotheses that correlate with the data that has been collected. I hope to have a better foundational knowledge of my project so I can deliver a good elevator speech to my peers on Thursday.
1) Learn about regression and the difference between logistic vs. linear regression
2) Go through the ANOVA & regression chapters on Dr. Seltman's book website
3) Read the book + statistics class => hypotheses to test with the data
4) Also, I am looking forward to participating in the 'Virtual Undergraduate Town Hall: Planning Motions for Robotics, Crowds and Proteins' webinar for DREU on Tuesday, June 13.
Week 4: Writing R Scripts to Compute Dialect Density Measures
These are the things I accomplished this week:
1) Uploaded the Week 3 progress post
2) Learned about logistic vs. linear regression
3) Went through Dr. Seltman's book and website on ANOVA & regression
4) Attended the mediation statistics class
5) Modified the script to calculate DDMs for each grammatical feature
6) Miscellaneous tasks, such as exporting data files into the format necessary for data analysis
This week I learned a lot about data processing in R. The reason for processing the data in R is that we are already using R to perform data analysis, so it is a convenient and easy transition for the processing and the analysis to be done on the same platform. I am aware that it can be easier to perform data manipulation in SQL because it is quicker and less RAM-heavy, but I haven't really worked enough with SQL to make further judgments. Despite those pros, these are the things I like about R:
1) Vast packages and libraries that can be easily imported
2) A good interface that displays the imported data
3) Good for doing statistical analysis, with existing libraries created for that purpose
4) Interactive: each line of code is interpreted instantly, line by line, which makes debugging easy
Logistic vs. linear regression: these are two types of statistical models used in data analysis. In data analysis, we want to know the impact that independent variables have on dependent variables (e.g. what impact does the percentage of code-switching have on a child's ability to reason in science?). These two models are used to evaluate that relationship.
Linear regression is a type of general linear model. Its dependent variable is continuous, and the residuals are assumed to be normally distributed. Ex.: illustrating how the frequency of science words mentioned by a child predicts the amount of science reasoning the child uses in a conversation.
Logistic regression, in contrast, is a generalized linear model, the broader family of which the general linear model is a special case. The dependent variable is categorical and non-continuous, and the relationship between the dependent and independent variables is expressed through a 'log' (log-odds) link. Ex.: predicting the odds of a child experiencing high rapport with a virtual agent from the percentage of dialect in an entire dialogue.
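In R, the distinction is just lm() versus glm() with a binomial family. A sketch with simulated stand-ins for our variables (none of this is real study data):

    # Simulated placeholder variables
    set.seed(2)
    dialect_pct   <- runif(60, 0, 40)                     # % dialect in a dialogue
    science_words <- rpois(60, 5)                         # science-word counts
    reasoning_amt <- 2 + 0.8 * science_words + rnorm(60)  # continuous outcome
    high_rapport  <- rbinom(60, 1, plogis(-2 + 0.08 * dialect_pct))  # binary outcome

    linear_fit   <- lm(reasoning_amt ~ science_words)                   # linear regression
    logistic_fit <- glm(high_rapport ~ dialect_pct, family = binomial)  # log-odds link
    summary(linear_fit)
    summary(logistic_fit)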
In Dr. Seltman's class we covered an overview of the following terms:
- Sampling distribution: the distribution generated by running repeated experiments. We can also get the distribution from generated data using bootstrapping methods, which simulate the n experiments in software using a random seed. In the end, the sample statistics are plotted as a curve called the distribution, which is used to make predictions about the population. Ex.: we can predict how much dialect a child speaks in comparison to their peers at their school.
- Null sampling distribution: a distribution that illustrates the probability of a result if the null hypothesis is true. For instance, suppose the average amount of dialect spoken in a classroom at a school in a low-income region is only 10%; the null hypothesis is H0 = 0.10. If we want to show that the average is higher, then we would expect our observed value to fall in the positive right tail of the null distribution in order to reject the null hypothesis of 10%.
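A small sketch of building that null sampling distribution by simulation in R (the 10% follows the example above; the sample size and observed value are invented):

    # Null sampling distribution of a sample proportion under H0: p = 0.10
    set.seed(3)
    n_kids     <- 25
    null_means <- replicate(10000, mean(rbinom(n_kids, 1, 0.10)))
    observed   <- 0.18                 # invented observed proportion
    hist(null_means, main = "Null sampling distribution")
    mean(null_means >= observed)       # estimated one-sided p-value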
- Three methods of mediation analysis: Baron and Kenny, the product estimator, and the causal mediated effect.
Pros:
- Baron and Kenny: can be tested with multiple mediators.
- Product estimator: faster and robust; a built-in bootstrap model relaxes assumptions.
- Causal mediated effect: a formulated, explicit, and concise analysis that can be expressed as an equation.
Cons:
- Baron and Kenny: most likely to have higher error, because we have to perform multiple regressions and look at multiple p-values.
- Product estimator: the existing libraries and bootstrapping methods are limited to only two mediator variables.
- Causal mediated effect: may be complex and convoluted in figuring out the variables to plug in.
- R script example using the three methods (sketched below).
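For the causal mediated effect with bootstrapping, the 'mediation' package is one commonly used option in R. A sketch on the same kind of placeholder x/m/y data as above (not our real data; install.packages("mediation") first):

    # Causal mediation analysis with bootstrap CIs via the 'mediation' package
    library(mediation)
    set.seed(4)
    x <- rbinom(200, 1, 0.5)             # placeholder treatment
    m <- 0.5 * x + rnorm(200)            # placeholder mediator
    y <- 0.4 * m + 0.2 * x + rnorm(200)  # placeholder outcome
    d <- data.frame(x, m, y)
    med_fit <- lm(m ~ x, data = d)       # mediator model
    out_fit <- lm(y ~ x + m, data = d)   # outcome model
    res <- mediate(med_fit, out_fit, treat = "x", mediator = "m",
                   boot = TRUE, sims = 500)
    summary(res)                         # ACME = average causal mediation effect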
- Multiple mediator analysis: mediated analysis, unmediated analysis, and missing data.
The goal of mediation analysis is to calculate the fraction of the effect on the result, using various methods such as confidence intervals, p-values from bootstrap methods, or aggregating the causal effect.
Unmediated analysis is more difficult than mediated analysis because it requires that all the assumptions are corrected for and that there are no confounders; this method is more suited to controlled environments.
In mediated analysis, post-treatment variables cannot be correlated with the mediators, meaning there should be no relationship with a measured mediator variable; otherwise we need to include that mediated value as a dependent variable in the model to remove bias. Ex.: high rapport -> high dialect density -> high science performance. If science performance varies with the amount of dialect, then we must measure that in the model.
- Missing at Random: a scenario where we have missing data in only the independent variables, for example some elements of a column of measurements recording how many science word utterances student A makes in a 30-minute conversation with a virtual agent. In this case we have two options to settle the discrepancies:
1) Ignore the entire column in the results, which lowers the power and precision of the analysis.
2) Impute the missing data by adding error and randomness to the data set, which is better than option (1) but can be a little biased.
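Option (2) is commonly done with multiple imputation, for example with the 'mice' package. A sketch on a toy data frame with missing utterance counts (install.packages("mice") first):

    # Multiple imputation of missing values with the 'mice' package
    library(mice)
    set.seed(5)
    d <- data.frame(science_words = c(4, NA, 7, 3, NA, 6),   # toy counts with gaps
                    minutes       = c(30, 30, 28, 31, 29, 30))
    imp <- mice(d, m = 5, printFlag = FALSE)  # 5 imputed data sets with added randomness
    complete(imp, 1)                          # inspect the first completed data set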
Previously, I manually calculated the Dialect Density Measure, the percentage of dialect a child speaks in two scenarios of talk with a virtual agent. This task is tedious and error prone because I have to perform about 10+ arithmetic and data manipulation operations for each of the 30 participants in the study. There were multiple instances where I made the grad student worry about the outputs: at first they showed significance in both the within-subjects and between-subjects tests, where in one condition the children speak only Standard English with the virtual agent in the two scenarios where it plays the teacher role and the peer role, and in the other condition the children talk to a virtual agent that code-switches between Vernacular and Standard English in both scenarios. After further evaluation, only the between-subjects comparison involving the children speaking with the code-switching agent was significant. Therefore, writing a script that does the computation will ensure that the correct results are generated.
I proposed generating the data results as multiple files by creating a script in R. I succeeded in making it account for all the listed features (the grammatical features, such as pronouns, verbs, etc., that a child utilizes) and having the script generate 30 files with the DDM results inside. The script iterates through an Excel workbook and sums up the columns that correspond to a child's responses tallied under each of the measured features. It then generates what is called a data frame (a type of data structure in R) that is just like a table or matrix. For each iteration, a new data frame is created and written to a .csv file.
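A minimal sketch of that loop, assuming the 'readxl' package and invented file and column names:

    # Sketch of the DDM script's main loop (file/column names are invented)
    library(readxl)
    path   <- "ddm_workbook.xlsx"
    sheets <- excel_sheets(path)              # one sheet per participant
    for (s in sheets) {
      sheet  <- read_excel(path, sheet = s)
      # sum each tallied grammatical-feature column for this child
      totals <- colSums(sheet[, c("pronouns", "verbs")], na.rm = TRUE)
      ddm    <- data.frame(participant = s, t(totals))   # one-row data frame
      write.csv(ddm, paste0("DDM_", s, ".csv"), row.names = FALSE)
    }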
Week 5: Fixing Script & Delivering Elevator Speeches
This week, I continued working on the R script that automatically calculates Dialect Density Measures from our study results. I had a couple of challenges in the process of writing it, but eventually got it to produce a summary of the DDM data for each of the participants. I was also able to talk with my grad student mentor, Samantha, about grad school. She also taught us how to deliver an effective "elevator" speech. The "Articulittles", as they call the interns, had the group meeting outside at a park.
Here are some of the things I accomplished this week:
1) Brainstormed my elevator pitch about the project, the lab, and my interests
2) Validated the script's output against manual calculations
3) Identified formatting issues that prevent the scripts from producing the right output
4) Participated in Thursday Articulittles' meeting
5) Drafted & Uploaded DREU Midterm report
At the last interns' group meeting on Thursday, Samantha discussed how to brainstorm good elevator pitches. I found it helpful because in the past I have always been terrible at explaining to others what my research is about. The reason being that I feel enormous pressure to deliver a good elevator speech, because otherwise I am not doing the research any justice; I want to justify how my research contributes to the research world. She simplified what we need to consider when delivering a pitch into 3 steps:
1) Audience: we need to tailor our speech to the specific audience we are addressing by selecting key words that they empathize with.
2) Goal: we need to identify our goal, what we want to accomplish. In my case, I want to address the need to study how language and culture influence children's success in impoverished school settings.
3) Position: predict where they stand now on the issue and how to win them over to our side.
An elevator pitch should be short and start with a hook. Coming up with a captivating hook was fun; we went around the table and had everyone quickly say their hook. I am amazed at the creative angles everyone took with their hooks. For instance, we have the 'SARA' project, which is currently a virtual agent that recommends movies to people. Its hook was about utilizing the trend that, as technology becomes more advanced, applications that are more interactive with users are becoming popular because people desire companionship with their devices. My hook was that schools are becoming more diversified and should accommodate students who come from different cultural backgrounds.
Writing the script was kind of difficult because I had to go through a series of steps to read in the data. The original data comes from ELAN, software that allows us to annotate complex audio sources; we utilize this software in addition to Audacity to extract annotated dialogues from our experiments. The ELAN files, .eaf, need to be exported to comma-delimited files, .csv files. After figuring that out, I learned that I needed to come up with a convention that allows me to systematically read the data into R by importing all 30 files into a workbook, in order, so that when I compute the DDMs I can easily identify each file by its sheet ID # and the name of the original file. I learned that some of the files, when exported, generate an extra line that shifts the data down a row, a unique case that my program had to identify and fix. In addition, I learned that my manual calculations, which I had to verify and compute 3 times, are not reliable either, because sometimes the 'sum' function will not work on columns of data that are not formatted a certain way. As a result, the results obtained from the script and the manual calculations were inconsistent. Most of the week I spent identifying why the results were inconsistent.
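Both gotchas can be handled defensively at import time. A sketch in R (the file and column names are invented): detect the spurious extra line, and coerce text-formatted numbers before summing, since a spreadsheet SUM silently skips numbers stored as text:

    # Defensive import of one exported .csv (invented names)
    raw <- read.csv("participant_01.csv", stringsAsFactors = FALSE)
    if (all(is.na(raw[1, ]) | raw[1, ] == "")) {
      raw <- raw[-1, ]                         # drop the extra line that shifts rows
    }
    counts <- as.numeric(raw$feature_count)    # force text-formatted numbers to numeric
    sum(counts, na.rm = TRUE)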
This week's intern meeting was fun. We had our meeting at a park near campus, which we refer to as the park with the dinosaur. We quickly delivered our weekly updates and then played a game of Taboo. The twist is that each round is played differently. Everyone first puts three words/phrases into a pile; we play the first round as we normally would in Taboo, and each round gets more interesting: in round 2 we are only allowed to say one word, and round 3 is charades with guessing the word. I am happy that Samantha arranged the intern fun day, because I was able to get to know all the interns on a more personal level.
Lastly, I coordinated with Samantha about the DREU midterm report. She helped me look over the report before submitting it and drafted the mentor report to be sent to Dr. Cassell for review and approval. My only comment about the form where we submit the reports is that I feel limited because each question has a max character count of 1000 characters. I did not consider that when drafting the report, so I had to cut out some of my report to have it fit in the response boxes :(
Week 6: Identifying Inconsistencies in Data & My DREU Buddies
This week I accomplished the following things:
1) Worked on formatting DDM output
2) Identified data files that need to be re-exported to match the script
3) Worked on updating DREU Website for weeks 4,5, and 6
4) Participated in Intern meeting
5) Discussed next week's Alex group meeting with Samantha
Last week, I identified the reason for the inconsistencies between the DDM calculation results from the script and from the manual log. Early in the week, I went through all the data sheets and identified each line and file that has an extra line so that our annotators, Caitrin and Ryan, could correct them. I then emailed Samantha, Caitrin, and Ryan, who are involved with the data, about the issue so that everyone was on the same page, and specified the next steps we need to pursue to finish gathering the DDM results, some of which include:
1) Generalize the script to take in other forms of data
2) Make changes to the output format of the DDM table from the suggestions received from Ryan and Caitrin
3) Identify the features I need to account for in the DDM calculations for the new data
4) Re-run the script with the re-exported data
5) Upload the new DDM data frame.
I spent the remainder of the week summarizing my accomplishments and reviewing what I had done in the last 2-3 weeks to update this journal. I compiled my accomplishments by referring to the daily hours tracking sheet that Lauren Smith, the ArticuLab lab manager, put together. There I record the number of hours I work each day and the times when I arrive and depart; I can also note my accomplishments and the challenges I had throughout the day. Having the hour log allows me to easily recall what I have done each day and enables me to quickly write my journal entries here. I also had to compile the pictures I wanted to upload to the website for the week. I find that the most fun part of writing the journal entries.
This week's intern meeting started with all of us walking to the Gates Center for Computer Science because our conference room was occupied for the hour. The building was a wonderful place to have our meeting. We also had someone take a couple of group photos so that we can participate in the DREU photo contest this year. We are very eager to win, so we all posed for the photos. We have already captured 3 occasions that will be entered in the contest. I am excited about taking group photos because I like to look back on the photos for memory's sake. Hopefully, I can get access to them soon so that I can submit them to the contest. Our group's plan is to take as many pictures as we can so that each one of us can submit to the contest, since there are so many of us DREU students here. We also talked about graduate school and the process of applying, how to choose a mentor, the importance of the GRE, and applying for funding.
Here is a mini introduction to the other DREU students in the lab:
We have Vivian Tsai, a rising Junior in Computer Science from Johns Hopkins University. She is working on the Sara project, identifying ways to improve the app, such as optimizations that speed up how Sara interprets and obtains user input. Sara is a famous figure in our lab, although I was told her name is no longer Sara; I will still call her that until she is officially assigned another name. Sara is everywhere, in the sense that she is supposed to be the embodiment of a personal assistant that gives you recommendations about movies and engages with others to build meaningful relationships to assist them; for instance, the current live version of Sara assists people at the World Economic Forum in Tianjin, China. She utilizes machine learning mechanisms to respond to user requests. Thursday, yesterday, was the first time I witnessed a live demo of Sara interacting with a person in the lab, and I took a selfie with Sarah and Max! Vivian has a vivacious personality and loves to collect books. She also labels herself a "bad cook", but I believe that with practice she will get better. She is a big Harry Potter fan too.
We have Sarah Matthews, a rising Senior in Computer Science from Hunter College. She is working on SCIPR (Sensing Curiosity in Play and Responding), a board game that is designed to have children engage in collaborative learning among themselves and with a virtual agent. The goal is to identify what motivates children's curiosity in learning. Sarah is responsible for working on the AI component of the game, such as coming up with a dictionary of rules for the players to use in the game, in addition to the game strategy approaches that should be considered in the process. Sarah also wants to pursue graduate school in a PhD program in Sustainability. Sarah is a bright individual who I consider to be very charismatic and outspoken. We both love cooking and are food buddies.
Lastly, Max Collins, a graduate of the University of Illinois at Urbana-Champaign. He majored in Psychology and Informatics. Max works on the SCIPR project along with Sarah. He is responsible for annotating the data from the videos recorded of the children's interactions. He also helps with identifying what social cues or behaviors encourage children's curiosity to learn. Despite being a psych and informatics major, Max has some incredible research experience from his past internships and a good amount of knowledge in programming and app design. He is about to start his PhD program at the University of California, Irvine this fall in their Informatics graduate program. Max is a forward-thinking individual who eventually wants to become an entrepreneur and build a startup company. Overall, I am very happy to be part of a lab full of smart and talented individuals. It is very inspiring to get to know everyone here. I think I will be sad when I leave.
Sometime next week, Samantha plans to have a group meeting with the Alex interns to discuss and wrap up ideas on the proposed hypotheses, so that we can start organizing for conference paper submissions. Her ideal goal is to split us into groups of 2 to work on hypotheses to test. I am excited to learn how to propose a hypothesis involving the number of utterances a child makes in the pretest monologue task, where the child was recorded describing a picture without interruptions for about 20-30 minutes: measuring the density of utterances and seeing, across the two different scenarios of talking to Alex, how their engagement changes (their level of shyness, or when the child becomes comfortable enough to talk to the virtual agent). I also want to work on applying a machine learning technique that Samantha spoke about, which one of our past lab interns designed under another project called RAPT (Rapport-Aligning Peer Tutor: virtual peers building rapport to support students in learning), and which detects and does pattern recognition. Once we have access to that code, I want to learn how to utilize and transform it so that we can brainstorm some experiments to design.
Week 7: Wrapping Up the Data Frame & Preparing for the Final Report
This week was odd because of July 4th. It started on Monday, when I woke up cranky because I realized that Monday was not part of the holiday. It was okay, though, because I had some work that I needed to finish.
Here are some of the things I accomplished this week:
1) Finished identifying all annotated files that need to be re-exported
2) Finished importing the re-exported files into a single worksheet on Box.
3) Completed first assigned homework for the Mediation Statistics Class
4) Participated in the interns group meeting about CVs
5) Attended Workshop on NSF Fellowship
6) Started preparing for the final research work (i.e. annotated bibliography and literature reading)
After double-checking the results from the script file, I learned that there were 5 more files that I missed last week. I identified the special case: the problem is that the annotated data, when exported into Excel, has row data that is shifted down a row. I learned that working with data from various formats and sources can be tedious and error prone; it requires a lot of time preparing and checking the data. I decided to import the data into a different program, in this case R. There are limits to how cleanly data can be imported into Excel, so the best way to work with data in Excel is to limit the number of platforms that the data is exported to or re-imported from as much as possible. Some of the hypotheses that I came up with about working with data are: there is a risk of file corruption when data is not handled carefully, such as being exported too many times; confusion about how data is interpreted along the way can lead to wrong results; and one should expect to find formatting issues that need to be addressed and weigh whether it is worth trusting a script or doing certain calculations by hand. I learned not to rely too much on the automatic functions, such as 'sum', that we take for granted, because sometimes something as small as formatting can lead to errors in the results. These factors are things that I learned to work with this week. I also came to a realization about my future research career in human interaction: despite the fun of measuring the effects that a technological device has on human subjects, most of the challenge is worrying about how the data is collected, how to process the data, and how to interpret the data. Interpreting the data requires knowledge beyond my expected background field, Computer Science. I learned how much interdisciplinary labs can contribute to research. For our lab, it is important that we have people who understand statistics, computer science, linguistics, and psychology.
Completing the first homework for the R class was fun and exciting because, for the first time in the class, I finally made peace with the fact that I do understand the material presented, although it is just simply using R's pre-loaded libraries. I really appreciate the fact that I can apply the statistics knowledge I learned from the class to manipulate the data and results. I learned how a confounding variable can influence the mediated effect of some X on some Y; accounting for it leads to more accurate results. Also, with mediation there is no way of finding out with 100% certainty that a measured common cause is correct. In addition to reviewing the homework for the week, I learned how causal relations can be represented in a Directed Acyclic Graph (DAG), which requires that the graph has no cycles. We represent X and Y on the graph along with the causal variables. The approach will only work if we make sure that all variables are represented as a DAG. Causal variables called exogenous variables are represented on the graph with no arrows into them. We can cut all the inputs into X by applying the do-operator, do(X). The backdoor algorithm tells you whether conditioning on a variable yields the correct causal effect for the coefficient of X. This algorithm is best demonstrated using diagrams instead of verbal instructions.
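The 'dagitty' R package automates exactly this backdoor check. A sketch on a made-up three-variable DAG (install.packages("dagitty") first):

    # Backdoor adjustment sets on a toy DAG with the 'dagitty' package
    library(dagitty)
    g <- dagitty("dag { Z -> X  Z -> Y  X -> Y }")   # Z confounds X -> Y
    adjustmentSets(g, exposure = "X", outcome = "Y")
    # returns { Z }: conditioning on Z closes the backdoor path X <- Z -> Y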
During the CV workshop, all the interns paired up with a partner to critique each other's CV/resume. I helped one of the interns critique her resume and taught her the difference between a CV and a resume. I also identified ways to better format her CV to make it look cleaner in communicating her work experience. I offered to send her my CV, which is written in LaTeX, as a reference.
I learned a lot from the NSF fellowship workshop. It inspired me to reason about why I want to go to grad school and how I can prepare to apply for it soon. I started by considering what I want to write about in my proposal. I want to write about a project that involves an application of artificial intelligence technology to understand and interpret one's emotions, particularly the medical potential of using it as a diagnostic or therapeutic device. I researched current projects that exist and asked Samantha to refer me to some potential authors to search for as well. I did this as part of my preparation for my final report.
Next week, we are wrapping up the data frame of all the data we have collected over the summer, to move on to data analysis and testing. I learned that I had to back-track a little because I calculated the DDMs for the individual features wrong. I learned that I should always remind myself what my end goal is and what I am calculating, to see if the output makes sense before proceeding too far. Also, I should decide on variable names that make the most sense to me so that I won't get confused.
Week 8: Preparing Data for Calculating Utopy & Reading Papers
There were a lot of meetings this week. The wonderful thing about them was the free food in the meeting room. We also had two fun activities planned for the week:
1) Interns' luncheon meeting with Dr. Cassell
2) Baseball game at PNC Park (Pirates vs. Cardinals)
This week I mostly focused on understanding a new technical approach to measuring rapport that is used in the lab, called "utopy". The approach was proposed in (Sinha, Tanmay and Cassell, Justine, 2015). It was used to illustrate that rapport changes with time. The paper proposes a mechanism for finding the link between convergence in learning and rapport. Following that research, (Sinha, 2017) provides an in-depth analysis of measuring rapport in longitudinal peer tutoring. With this mechanism, they found that rapport changes over time. Michael Madaio (2017) used utopy to illustrate how indirectness influences rapport among dyadic tutoring partners. Taking a step back, rapport in this context means the likelihood that there is mutual understanding and agreement between two or more subjects. Our lab has multiple studies around measuring these indicators. Rapport is usually hand-annotated by our linguistics majors.
After learning about the purpose of utopy, I worked on modifying my data to fit the setup necessary for calculating utopy. Besides my focus on utopy, I also worked on the following things throughout the week:
1) Updated the DREU website for week 7
2) Attended the R stats class
3) Read papers about utopy
4) Attended the intern meeting on CVs
5) Installed an environment for running Python
6) Attended a workshop about "Writing the Personal Statement"
7) Played the "Outbreak" game with the SCIPR project people
This week we had a short session of the R class. The topic was power in mediation analysis. I had to leave this class early to attend the luncheon with Dr. Cassell. We reviewed some familiar topics. I learned that power is important in determining whether an experiment is worth analyzing. Power relates closely to the null effect and p-values: power is the percentage of experiments in which we would see p < 0.05, given a real effect of a given size. I learned that power calculations should be used exclusively before the experiment is run. Usually with a power calculation we aim for 80% power, i.e. a 20% risk of failing to detect a given meaningful effect size. We can calculate power for ANOVA using the non-central F distribution, and power can also be calculated by simulation. Normally, power is computed for more than one scenario.
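Base R's power.anova.test does the non-central-F calculation directly. A sketch (the variance numbers are invented):

    # Power for a one-way ANOVA via the non-central F distribution
    # (invented between/within variances; solves for n per group)
    power.anova.test(groups = 2, between.var = 1, within.var = 9,
                     power = 0.80)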
Here are some of the notes about utopy I took. The motivation behind using utopy to analyze rapport is that most of the existing research is cross-sectional rather than longitudinal. In the lab we perform experiments that study the correlation between rapport and learning. We conducted user studies measuring rapport by having raters rate 30-second slices of video. Utopy illustrates the probability of predicting higher rapport over a duration of time; for our study, that is over thin slices of 30 seconds, separated by task ('peer' and 'teacher') and by session of interaction, per participant. The data produces 4 files for each thin slice by task and session:
1) A graphical interpretation of utopy as a directed graph
2) A list of rapport averages
3) A line plot illustrating the trend of rapport by slice number
The results we expect to illustrate with utopy are whether rapport is likely to be higher by task in a between-subjects test. We can also visualize the comparison of rapport score trends by task.
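I can't reproduce the lab's actual utopy code here, but as a rough intuition for "probability of rapport increasing across thin slices", here is a toy proxy in R on invented ratings (this is an illustration, not the published method):

    # Toy proxy: how often does rapport rise from one 30s slice to the next?
    # (invented ratings; NOT the lab's actual utopy computation)
    rapport <- c(2.0, 2.5, 2.4, 3.0, 3.2, 3.1, 3.6)  # one rating per thin slice
    ups     <- diff(rapport) > 0                     # slice-to-slice increases
    mean(ups)                                        # fraction of increasing transitions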
During the interns' meeting, we discussed ways to improve and tailor the descriptions of our experiences to match the job/career we pursue. Some of the useful tips I received were about selection and word choice. For instance, echoing a mission statement from the job posting can sometimes hinder your prospects if the job's mission doesn't align with your experience.
I installed the PyCharm IDE to run the script. The process was easy. The only difficulty I had was understanding what the script was doing, because the individual who wrote it has not added much code documentation yet. The other thing I worked on was formatting our data to match the input that the utopy script expects.
During the workshop this week, I learned about the process of writing a personal statement and the importance of 'fit' with the target program. I learned about the STAR story format for describing our research work; the acronym stands for Situation, Task, Action, and Result. Style of delivery is important in captivating the readers.
Lastly, I participated in planning the 'Outbreak' game. The thing I learned from the experiment is that discussion is important in building rapport and curiosity.
Week 9: Wrapping Up & Data Analysis
This week I worked on wrapping up my project. I started off wanting to write up the final report using the traditional method of doing an annotated bibliography, literature review, and abstract, and then outlining my paper. However, I realized that I am running out of time to also prepare my project presentation for the lab. I am most worried about this because I have always been bad at presenting in front of other people. The thing that makes me most nervous is disappointing others and not having anything impressive to present to them. Unlike the last two interns, I felt like I hadn't done any substantial work worth talking about. But then again, I should not compare myself with others. Therefore, I continued to push my way through preparing for the presentation. At first I was going to present a poster, but then I learned from Samantha that it is not an appropriate way to present in front of a group of people.
I was able to schedule a brief meeting with Samantha to get clarification about what I was going to present. It was really helpful because I was overwhelmed by the idea of presenting everything about my project. I thought I had to go back to the beginning and do more literature reading to derive my hypotheses to start writing.
After meeting with her, I learned that I could present the utopy calculations I did last week. Samantha thought it would be interesting to compare utopy with average rapport, because in the past they normally only worked with average rapport. Samantha was able to guide me on ideas of what to test utopy and average rapport against, because we have multiple features/variables to experiment with. In the process of doing the data analysis I learned that I was pretty weak at interpreting graphs, which is embarrassing. However, I continued to work at it. After that, Samantha provided me with helpful tips and an outline of the questions I need to address in my presentation. I also learned that I had to go back and really think about what utopy is and why it is important. Understanding that took a while of reading and conversing with Samantha. I am grateful that she was patient with me through the process.
Coming up with hypotheses was hard because I realized that throughout the process I hadn't understood the project well enough: the methods by which the data are measured, and how they interact, to come up with predictions and hypotheses involving them. After a long while of thinking, I was able to come up with the hypothesis that rapport impacts children's ability to reason. I therefore predict that higher rapport results in higher reasoning, and that utopy is more predictive than average rapport because it considers the change of rapport over time.
Week 10: Drafting Final Presentation
Week 11: Rehearsing Final Presentation
This week I continued to work on my final presentation. I was hoping to finish everything by Wednesday, send the slides to Samantha, and rehearse with her on Thursday. I was able to finish drafting the slides and had Samantha look at my progress. At that point I was overwhelmed by the pressure of not including enough background information in the presentation, and I forgot to think about the essential part of the project, which is to focus on my contribution and the motivation around it. My biggest mistake was not coming up with my own motivations and just relying on the ones that Samantha drafted for me. Also, I still was not comfortable with what the data in my slides meant. Therefore, I did a very terrible job rehearsing it with Samantha. At this point I felt terrible and wanted to crawl under a rock because of the embarrassment of having someone explain to me how to read a graph. Now, looking back, I should have paid more attention to everything I learned. I should have spent more time getting to know the variables I wanted to analyze and contextualizing them in a way that I could easily understand, instead of focusing on the overall completeness of my presentation.
Therefore, I decided to request to postpone my presentation to the next week, which means that my end date in the DREU program will be Aug 11 instead of Aug 6. I spent Friday interpreting my data. It took a long while because I realized I had gone through the experience without applying much of my critical thinking skills to understanding the importance of the data I have, instead focusing on writing scripts to produce or format the data. I regret not taking advantage of the time I had to get a deeper understanding of what rapport means, why we care to learn about it, and how it influences learning in classrooms. Mid-way through this experience I forgot about my purpose for doing research and the whole idea of applying the steps of the scientific method that I was taught to follow since the beginning of the experience. I realized I got caught up waiting too long for someone to tell me what to do instead of independently thinking about what I could have done.
In my data, I looked at average rapport vs. the percentage of reasoning kids choose to do, measured across two conditions of 1-on-1 interaction with a virtual agent (meant to simulate a learning environment). I learned that if we plot that relationship, we see that kids who are placed in an environment that expects them to speak only Standard English have lower average rapport than kids in an environment that allows them to code-switch between two dialects (African American Vernacular English and Standard English). The amount of reasoning across the two conditions doesn't look significantly different. However, if we look at the relationship between average rapport and reasoning in the code-switching condition, we see that these kids not only have higher average rapport than kids exposed to the other condition, but that kids with very high rapport want to reason less, and if we plot reasoning against utopy we see the same negative relationship. This suggests that kids placed in an environment that allows them to code-switch want to reason less as rapport gets very high. However, if we compare the reasoning percentages across conditions, we see kids want to reason about the same amount, if not more, in the code-switching condition: if we zoom in on the left-hand graphs, it is apparent that the range in total reasoning percentage is higher for the code-switching condition than the non-code-switching one. Overall, therefore, it is useful to look at both average rapport and utopy, which contrasts with my prediction. Average rapport allowed us to see that the standard condition's rapport is lower than code-switching's, and that there is no relationship between average rapport and percentage of total reasoning. Average rapport also shows us that the code-switching condition's average rapport is higher than the non-code-switching condition's. Utopy shows us that an increase in rapport is not always good in environments that build really high rapport in a short amount of time: the right-hand graphs show that children reason less when rapport is likely to keep increasing over time. Rapport in the standard condition is still increasing too, but unlike in code-switching, its rapport and reasoning increase together.
I learned that interpreting data is most difficult for me when I haven't spent time thinking about the meaning of the variables and thinking critically about why they are important. This week I had to spend a lot of time looking into the motivations for why rapport matters and how it might influence kids' learning in classrooms.
Week 12: Presenting the Final Presentation
This week I worked on preparing myself to deliver my final presentation to the lab. Originally, I was supposed to present last Friday but had to postpone it to this week. I am glad that I continued to work an extra week, because my presentation went better than I expected. I learned a lot in this last week about my topic, rapport and children's learning. Samantha helped me a lot in preparing the data and helping me interpret it. One of the most important lessons I learned this summer is that I should consider every opportunity as a challenge for improvement. Before, I thought it was in my best interest to show that I am productive and doing a lot of work each week. However, that is not the purpose of the experience. I thought I knew what research was about when I first entered the ArticuLab, but I did not, despite having prior research experience. Every experience is different. I learned to face some of my weaknesses in the technical skills that are required for conducting research. Therefore, this week I spent a lot of time critically analyzing what I learned this summer and learning how to interpret it so that I can easily explain it to other people. I learned from Samantha that in becoming a researcher, it is important that you know how to communicate your findings to others. A lot of dedication is required to prepare a talk, as well as attention to time constraints; in my case it was 10 minutes long with 5 minutes for Q&A.
I had to rehearse and revise my presentation a lot in the span of a week, because I focused too much on telling the background of why rapport is important instead of establishing what part I looked at specifically, which is a way to measure rapport and its relationship with the way children learn. The unique angle of my research was comparing utopy (the probability of increasing rapport) vs. average rapport. The findings were pretty interesting: average rapport helps in determining whether it is productive to look at utopy or not, because utopy doesn't consider the average base rapport, so a high average doesn't always mean high utopy. I learned that for students with already high rapport, there is a negative correlation with the amount they reason, because they might have already maxed out their rapport level. In contrast, students with a moderate amount of rapport benefit from building more rapport; for them, rapport and reasoning increase together. The implication of the study is that we should build education interventions that focus on building high utopy at first, up until we reach a certain amount of average rapport, and then work on maintaining the rapport instead of continuing to build it.
When presenting my research, I found it helpful to limit how much I say and to make sure it is communicated in a concise and straightforward way. I did that by summarizing each slide into a sentence. Overall, I am grateful for this whole experience. It has really put everything I have accomplished thus far into perspective: there is always room for me to improve, and I should always welcome more challenges instead of trying to hide from them, because that is the only way I learn how to grow. I am happy that the presentation went better than I ever thought it would.