17 Aug 2016
I am so excited to report that the results from our study have come in! We had just under 90 participants. Each of them took the survey and then rated the privacy settings of 20 images, so overall I have just under 1,800 content-rating results to work with.
This week, I've begun my analysis of the data, the details of which are below. In order to analyze the data, I've had to export each table from the SQL database into a .csv file and then import that into SPSS. It can be a time-consuming process, but it is so much better than the manual data input that I had to do at the beginning of the summer.
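The export step can be sketched in Java. This is a minimal illustration, not the code used in the project: it assumes the rows have already been read from a table into memory as lists of strings, and the class and method names are hypothetical. It quotes any field containing a comma, a double quote, or a line break, and doubles embedded quotes, following the common CSV convention described in RFC 4180.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns in-memory rows into RFC 4180-style CSV text.
final class CsvExport {

    // Quote a field only when it contains a comma, quote, CR, or LF;
    // embedded double quotes are escaped by doubling them.
    static String escape(String field) {
        if (field == null) {
            return ""; // write SQL NULLs as empty cells
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join one row's escaped fields with commas.
    static String toLine(List<String> row) {
        return row.stream().map(CsvExport::escape).collect(Collectors.joining(","));
    }

    // Header line followed by one line per row, each ending in CRLF.
    static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder(toLine(header)).append("\r\n");
        for (List<String> row : rows) {
            sb.append(toLine(row)).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up example rows.
        List<String> header = List.of("user_id", "image_id", "privacy");
        List<List<String>> rows = List.of(
                List.of("u01", "img3", "0.25"),
                List.of("u02", "img3", "0.75"));
        System.out.print(toCsv(header, rows));
    }
}
```

Quoting only when needed keeps simple numeric columns readable while still protecting free-text survey answers that may contain commas.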
We had 2 albums of images: one considered private and one considered public. We only ran the model on the private images. Users were asked to choose a privacy setting for each image from 4 options, coded as follows:
- .25 Only me
- .5 Select friends
- .75 All friends
- 1 Everyone
We used the model to generate the privacy setting that each user "should" choose, based on their answers to some survey questions about privacy and a preselected privacy setting for the image. Below are statistics for the model-generated choices and for the actual user choices on the public and private images.
Model-generated choice
Frequencies: 78.1% Only me, 18.2% Select friends, 3.6% All friends, .1% Everyone
Final user choice - private
Frequencies: 48.6% Only me, 24.1% Select friends, 20.4% All friends, 6.8% Everyone
Final user choice - public
Frequencies: 11.3% Only me, 12.2% Select friends, 48.2% All friends, 28.3% Everyone
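Because each option is coded as a number, each frequency table above can be summarized as a single mean privacy value: the sum of each option's value times its share. The Java sketch below is a hypothetical illustration (the class name is made up); the percentages are copied from the tables above, so the means it produces are derived from rounded percentages and are only approximate, roughly .31 for the model, .46 for the private final choices, and .73 for the public final choices. As a sanity check, the same formula applied to the no-model frequencies gives about .663 and .642, in line with the means reported for those images.

```java
// Hypothetical sketch: mean privacy value implied by a frequency table.
final class PrivacyMean {

    // Numeric codes for Only me, Select friends, All friends, Everyone.
    static final double[] VALUES = {0.25, 0.50, 0.75, 1.00};

    // Weighted mean of VALUES; percentages are normalized by their total
    // so rounding (e.g. a column summing to 99.9%) does not bias the result.
    static double mean(double[] percents) {
        if (percents.length != VALUES.length) {
            throw new IllegalArgumentException("expected 4 percentages");
        }
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i < VALUES.length; i++) {
            weighted += VALUES[i] * percents[i];
            total += percents[i];
        }
        return weighted / total;
    }

    public static void main(String[] args) {
        System.out.printf("Model-generated:     %.3f%n", mean(new double[] {78.1, 18.2, 3.6, 0.1}));
        System.out.printf("User final, private: %.3f%n", mean(new double[] {48.6, 24.1, 20.4, 6.8}));
        System.out.printf("User final, public:  %.3f%n", mean(new double[] {11.3, 12.2, 48.2, 28.3}));
    }
}
```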
I conducted a paired-samples t-test on the model-generated choice and the private final user choice (p < .001). They are statistically significantly different, which is a departure from our preliminary data: users tend to choose a less private setting than the model recommends. I then ran paired-samples t-tests comparing the first friend choice with the public and private final user choices (private: p < .001, public: p < .001). Both final user choices are statistically significantly different from the friend choice. This is probably a product of the study's design. We decided to hard-code the friends' choices as more public on the private photos and as more private on the public photos in order to measure the effect of peer pressure on the user choices. When the images are private, the users choose a more private option than their friends, and when they are public, they choose a more public option than their friends. I doubt this trend will continue when I analyze the model data in isolation, because there the friends' choices are generated by the model and are often quite close to the recommended choice.
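A paired-samples t-test compares the mean of the per-rating differences with zero. SPSS computes the statistic and its p-value directly; the Java sketch below only illustrates the t statistic, t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom, assuming the two choices are stored as parallel arrays on the 0.25 to 1 scale. The class name and example numbers are hypothetical, and converting t into a p-value requires the t distribution's CDF, which is left to SPSS or a statistics library.

```java
// Hypothetical sketch of the paired-samples t statistic.
final class PairedT {

    // Returns t = mean(d) / (sd(d) / sqrt(n)), where d[i] = a[i] - b[i]
    // and sd uses the sample (n - 1) denominator. Degrees of freedom: n - 1.
    // If every difference is identical, sd is 0 and the result is infinite or NaN.
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays, n >= 2");
        }
        int n = a.length;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += a[i] - b[i];
        }
        double meanDiff = sum / n;
        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - meanDiff;
            ss += dev * dev;
        }
        double sd = Math.sqrt(ss / (n - 1));
        return meanDiff / (sd / Math.sqrt(n));
    }

    public static void main(String[] args) {
        // Made-up example: user final choice vs. model-recommended choice.
        double[] user  = {0.50, 0.25, 0.75, 0.50, 0.25, 1.00};
        double[] model = {0.25, 0.25, 0.25, 0.50, 0.25, 0.50};
        System.out.printf("t = %.3f, df = %d%n", tStatistic(user, model), user.length - 1);
    }
}
```

A positive t here means the first array (the user choices) is, on average, more public than the second.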
Mean: 39.2, Median: 35.5, Min: 22, Max: 70
55.7% Male, 44.3% Female
Number of SNSs in which they actively participate
Mean: 3.56, Median: 3, Min: 0, Max: 30
Mean: 258.99, Median: 160.5, Min: 0, Max: 2200
Mean: .412, Median: .423, Min: .115, Max: .714
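Each summary line above reports the same four descriptive statistics. A small Java sketch of how they can be computed from one column of responses, assuming the values are already loaded as numbers; the class name and example data are hypothetical. When the count is even, the median is the average of the two middle values, which is one way half-values such as 35.5 and 160.5 can arise.

```java
import java.util.Arrays;

// Hypothetical sketch: mean, median, min, and max of one survey column.
final class Describe {

    // Returns {mean, median, min, max}.
    static double[] summarize(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone(); // keep the caller's array unchanged
        Arrays.sort(sorted);
        int n = sorted.length;
        double mean = Arrays.stream(sorted).sum() / n;
        // Even count: average the two middle values; odd count: take the middle one.
        double median = (n % 2 == 0)
                ? (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
                : sorted[n / 2];
        return new double[] {mean, median, sorted[0], sorted[n - 1]};
    }

    public static void main(String[] args) {
        double[] snsCounts = {3, 2, 5, 3, 4, 1}; // made-up example data
        double[] s = summarize(snsCounts);
        System.out.printf("Mean: %.2f, Median: %.1f, Min: %.0f, Max: %.0f%n", s[0], s[1], s[2], s[3]);
    }
}
```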
Today, I've been doing more specific work with the data from the no-model images.
No-model data
Mean of user privacy: .6629
Frequencies: 17.1% Only me, 20.7% Select friends, 42.0% All friends, 20.2% Everyone
Mean of predefined comfort levels (what we set as what the comfort level 'should' be): .6415
Frequencies: 20.5% Only me, 31.9% Select friends, 18.1% All friends, 29.5% Everyone
I broke down user privacy choices by participants' answers to 2 survey questions; the mean privacy value for each group is below. The first question was: "What privacy settings do you currently maintain?"
Most of my profile is not visible to anyone: .7639
Most of my profile is visible only to a selected group of friends: .6118
Most of my profile is visible only to my friends: .6648
Most of my profile is completely public: .6741
The second asked participants to what extent they agree with the statement "I have had concerns about the privacy of my data on social networks."
Strongly agree: .6279
Neither agree nor disagree: .7377
Strongly disagree: .6897
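The group means above come from splitting the privacy choices by each participant's answer and averaging within each group. A minimal Java sketch of that grouping, assuming each rating is paired with the participant's answer as a string; the `Rating` class, its fields, and the example data are hypothetical, not the project's actual schema.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: mean privacy choice for each answer to a survey question.
final class GroupMeans {

    // One content rating: the participant's survey answer and the
    // privacy value (0.25 to 1) they chose for an image.
    static final class Rating {
        final String answer;
        final double privacy;

        Rating(String answer, double privacy) {
            this.answer = answer;
            this.privacy = privacy;
        }
    }

    // Groups ratings by answer and averages the privacy values in each group;
    // a TreeMap keeps the groups in alphabetical order for printing.
    static Map<String, Double> meanByAnswer(List<Rating> ratings) {
        return ratings.stream().collect(Collectors.groupingBy(
                (Rating r) -> r.answer,
                TreeMap::new,
                Collectors.averagingDouble((Rating r) -> r.privacy)));
    }

    public static void main(String[] args) {
        List<Rating> ratings = List.of( // made-up example data
                new Rating("Strongly agree", 0.50),
                new Rating("Strongly agree", 0.75),
                new Rating("Strongly disagree", 1.00));
        meanByAnswer(ratings).forEach((answer, mean) ->
                System.out.printf("%s: %.4f%n", answer, mean));
    }
}
```

Averaging per rating (rather than first per participant) weights participants by how many images they rated; with the same 20 images for everyone, the two approaches give the same result.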
I think the preliminary data is very interesting. These two questions show somewhat illogical trends: people who profess to have the strictest settings actually choose more public settings on average. My theory is that these people post less on social media, so they are not used to evaluating content or engaging with these platforms, and are therefore more likely to select an "incorrect" privacy setting. This suggests that managing content privacy settings is not just a matter of conceptualizing the risks and rewards of sharing and applying logic, but a measurable, learnable skill.
I would like to run some tests to back up my theories, but my preliminary analysis of the users' posting frequency hasn't yielded anything terribly useful.
Beta launch successful
11 Aug 2016
A few days ago, we began launching the study in pieces. We first had 5 participants, then 20, and we plan to add an additional 80 soon. I am beginning the statistical analysis as soon as the results come in.
Before launch, I had to clear all of the test data from our tables. I also wrote some Java code that automatically generates the user accounts. It was surprisingly easy and is only about 50 lines.
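An account generator along these lines might look roughly like the Java sketch below. It is a hypothetical illustration, not the project's code: the username pattern, key lengths, character set, and CSV columns are all assumptions. It uses `java.security.SecureRandom` rather than `java.util.Random`, since keys that participants use for payment should not be predictable.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: generate study accounts with random keys.
final class AccountGenerator {

    // Omits look-alike characters such as 0/O and 1/l/I to reduce typing errors.
    private static final String ALPHABET =
            "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789";
    private static final SecureRandom RNG = new SecureRandom();

    // Random string of the given length drawn uniformly from ALPHABET.
    static String randomKey(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RNG.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Returns CSV lines "username,password,key" for participants 1..count.
    static List<String> generate(int count) {
        List<String> lines = new ArrayList<>();
        lines.add("username,password,key");
        for (int i = 1; i <= count; i++) {
            String username = String.format("user%03d", i);
            lines.add(username + "," + randomKey(10) + "," + randomKey(16));
        }
        return lines;
    }

    public static void main(String[] args) {
        generate(5).forEach(System.out::println);
    }
}
```

Because the generated values never contain commas or quotes, the lines can be written out as CSV without any escaping.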
My preliminary statistics are showing that there is no statistically significant difference between the final choices of the users and the model-generated choices, which means our model is working! Hopefully I'll have a more detailed update on the analysis in a few days.
Preparing to launch the study
4 Aug 2016
This week we are finalizing everything we've been working on, which means a lot of cosmetic fixes like changing the wording of a survey question or the size of an image. I've also been learning the finer points of PHP because, as we test more and more rigorously, little errors come to light. For example, today I learned that the fetch_assoc() method returns only one row of a query result at a time, so you have to call it once for each row (typically in a while loop).
As we enter the final stages, we are preparing for real beta users' participation. One part of this is manually creating the user accounts. We have to do this because we plan to launch the study on Amazon Mechanical Turk, which assigns usernames, passwords, and payment keys using a CSV file that we upload. Today, I created 20 accounts with quasi-random keys and put them in the database. However, I can't help but wonder if this process could be more automated, which I feel would make it both more secure and less work for me.
Finishing up development
26 Jul 2016
We are finally seeing the light at the end of the tunnel! We're entering the final stages of the development of the study. From now on, I will be focusing on normalizing the database and adding security measures to my code.
After I was able to pull the variable I needed from the database, I needed to pull it based on the current user. This proved very tricky, and I only figured out how to do it today. The field that holds the username is only accessible via PHP, but my AJAX code is in JS, so the problem was getting that field into JS. Eventually, I worked out how to write 'multi-line' PHP into a single JS variable, echoing back only the value I needed.
Starting to integrate my model and information from the database
15 Jul 2016
Above are some pictures from the art festival this past weekend. It was a really big celebration, and I loved seeing all of the different vendors!
This week I've been continuing my work with AJAX, which is not its own language but rather a style of web development that lets certain elements of a page update without the whole page being reloaded. Today I managed to pull the variable I need from the database and assign it to a new variable in my JS. I have yet to run my JS model with the new variable, but I plan to work this out tomorrow.
Finishing off the week
15 Jul 2016
An update on my JS problem:
I've worked out all of the errors in my code, and it now successfully runs the model! Now I am working on tweaking it so that it generates the suggested user choices from data in the database instead of randomly generated numbers, which has led me to my next obstacle: running a PHP script when my page loads.
Until now, we have been running our back-end PHP scripts using form actions in HTML. However, I need to run my new getComfort.php script when the page loads so that I can then use the data in my following JS. My preliminary research is suggesting that I may need to learn yet another language called AJAX to do this.
In other news, yesterday I "attended" a CRA-W-sponsored webinar on graduate research and grad school application tips. There was a lot of useful information about what to put in my research proposal and how to tailor it to each school I am applying to.
This weekend there is an art festival in town, and Gloriane and I are about to head out to that. It sounds like a lot of fun!
My week so far
13 Jul 2016
Since we received our deadline for the study launch, Gloriane and I have been working hard to achieve our goal. Unfortunately, some of our challenges are not as easy to surmount as we would like.
After adapting the Java code for the mathematical model to yield data relevant to our research, instead of the pure model output it was originally generating, I began trying to "attach" it to the website we've been developing for the study. At first, I tried exporting it as a runnable .jar file, uploading it to the server we are using, and running it as a Java applet. However, nothing worked across many iterations of the code, even when I could get it running on my desktop. Since I am relatively unfamiliar with Java applets, one of my theories as to why it isn't working is that it lacks a GUI (graphical user interface, i.e., something other than a command line), which, to my knowledge, most applets have. Since the algorithm does not really lend itself to a visual representation, we don't see any benefit in adding a GUI.
In addition to the massive amount of time I spent wrestling with the Java code, I have also been struggling with the structure of the database. I worked out the kinks in my PHP and successfully recorded data from the survey directly in the SQL database! However, the survey has a series of checkboxes which, when left unchecked, left what appeared to be null values for the corresponding dichotomous variables in the database. I could tell that allowing those null values to persist would cause problems in the data analysis later. I tried everything I could to get the unchecked fields to update to the value '0', but I didn't have any luck. Through some testing and troubleshooting, I realized that SQL wasn't actually reading those values as null (although I don't know what it was reading them as), so I changed the condition in my UPDATE statement from "var IS NULL" to "var != 1" (1 was the value assigned to a checkbox variable if it had been checked), and it finally worked! This fits with the values not being true NULLs: in SQL, a comparison like var != 1 evaluates to unknown rather than true for a NULL value, so that condition would not have matched real NULLs either.
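Another way to avoid the problem is to normalize checkbox values before they ever reach the database. Browsers do not submit unchecked checkboxes at all, so a missing field can be treated as 0. The project's fix was the SQL UPDATE described above; the Java sketch below shows the normalization idea instead, assuming the submitted form arrives as a map from field names to values. The class name and field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turn submitted checkbox fields into explicit 0/1 values.
final class CheckboxNormalizer {

    // Unchecked boxes are simply absent from a form submission, so a missing
    // or blank field becomes 0 and any other submitted value becomes 1.
    static Map<String, Integer> normalize(Map<String, String> form, List<String> checkboxNames) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String name : checkboxNames) {
            String value = form.get(name);
            result.put(name, (value == null || value.isEmpty()) ? 0 : 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up submission where only "uses_facebook" was checked.
        Map<String, String> form = Map.of("uses_facebook", "1", "age", "24");
        List<String> boxes = List.of("uses_facebook", "uses_twitter", "uses_instagram");
        System.out.println(normalize(form, boxes)); // {uses_facebook=1, uses_twitter=0, uses_instagram=0}
    }
}
```

Writing explicit 0s at insert time means every dichotomous column is complete, so no cleanup query is needed before analysis.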
On the bright side, this weekend I started a spreadsheet of possible grad schools. I think I am going to apply to about eight in total: three in the US and five in the UK! From my preliminary searches, funding for school overseas is not as scarce as I thought it would be. I am nervous because all of the UK schools require you to contact a potential advisor with a research proposal before you apply. I know I want to pursue an advanced degree in computer science with a mathematical element, but I'm not sure exactly what else I want. I'm hoping that after reading the papers of my potential advisors and checking out their current research, I will have a better idea of what I want my doctoral research to focus on.
I'm about to return to my JS problem, but hopefully I will be able to solve it (or at least make headway on it) by Friday!
Ready for the weekend
8 Jul 2016
On Tuesday we met with Dr. Squicciarini to define our research goals for the week. She set a goal for us to launch the study by Sunday, 24 July!
This week, I was working on reconstructing the Java code so that it can provide privacy setting recommendations for part of the study. I've managed to create a functional .jar file, but it can't take individual inputs at this point. I only got access to the server and database today, so I haven't uploaded it yet. I will need to incorporate it into the HTML as a Java applet, which I have no experience with.
My major accomplishment this week has been recoding the survey in HTML and connecting it to the database. We created a table in the database to record responses while we test the data collection functionality. Then, I wrote a back-end PHP script and added PHP to the survey HTML. I haven't fully error-checked it yet, but with a little tweaking, the PHP should be able to take the data we're collecting and put it directly into the SQL database.
This weekend, Gloriane and I hope to check out the swimming options around here. We've been ready to take a dip in the water all week!
Diving back in
5 Jul 2016
This past week I was in Barcelona on a family vacation, so I am just diving back into things today!
Gloriane has been working on the SNS interface. She has also been working on recording responses to questions on the SNS directly in the SQL database that we are using. This is wonderful because it will streamline both data collection and analysis and minimize errors in the data.
Using the work Gloriane has been doing, today I recoded in HTML the entrance survey I edited the week before last and uploaded it to the SNS.
Tomorrow I plan to create a testing table in the SQL database and do some test runs on the survey to perfect our data collection.
Top image: a tower in Barcelona's Gothic quarter
Bottom image: the fountain outside the National Art Museum of Catalunya
Summing up the week
17 Jun 2016
Yesterday, Gloriane and I went for a nice after-work walk at a local park. The rest of this update is going to be technically focused.
Today I was finally able to finish entering the data from the user study into SPSS. I was able to accurately match most of the surveys with the correct user. We want to examine how long it takes actual humans to reach a consensus on a privacy setting compared with the convergence times that the model shows, as well as look at the differences between first choice privacy settings versus final settings. In other words, how much will the average user compromise their desires for a privacy setting in order to reach a consensus with other users? Today I computed variables to help us with these two specific problems.
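For the second question, one simple way to quantify compromise is the distance between each user's first and final privacy choice, assuming the settings are coded numerically (for example, on the 0.25 to 1 scale used later in the project). The Java sketch below is a hypothetical illustration of such a variable, not necessarily the one computed for the study; the class and method names are made up.

```java
// Hypothetical sketch: how far users moved from their first privacy choice.
final class Compromise {

    // Absolute change per user; 0 means the user kept their first choice.
    static double[] perUser(double[] first, double[] last) {
        if (first.length != last.length) {
            throw new IllegalArgumentException("arrays must be the same length");
        }
        double[] change = new double[first.length];
        for (int i = 0; i < first.length; i++) {
            change[i] = Math.abs(last[i] - first[i]);
        }
        return change;
    }

    // Average amount of compromise across users.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Share of users who changed their setting at all.
    static double shareChanged(double[] change) {
        int changed = 0;
        for (double c : change) {
            if (c > 0) {
                changed++;
            }
        }
        return (double) changed / change.length;
    }

    public static void main(String[] args) {
        // Made-up example data for four users.
        double[] first = {0.25, 0.50, 1.00, 0.75};
        double[] last  = {0.50, 0.50, 0.75, 0.25};
        double[] change = perUser(first, last);
        System.out.printf("Mean change: %.3f, share changed: %.2f%n", mean(change), shareChanged(change));
    }
}
```

Using the absolute difference treats moving toward more privacy and toward less privacy as the same amount of compromise; keeping the signed difference instead would show which direction users tend to move.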
I am anxious to get started testing and modifying the Java code and hope that I will be able to focus on that in the upcoming week. I am familiar with the data structures that the model employs but am still nervous about being able to manipulate a relatively complex model.
Gloriane and I have been brainstorming techniques for improving the appearance of the user interface. We think one of the major factors that might be yielding inaccurate data from the user study is that the fake SNS (social networking site) is relatively unengaging compared to actual SNSs. Because the "fakeness" of the study is so apparent, users don't feel compelled to care about their privacy settings. Therefore, users might be more willing to compromise than they would be in a real-world situation, or might even choose a random privacy setting.
Have a great weekend!
Getting into the swing of things
15 Jun 2016
This past week I continued working on inputting data into SPSS so that we can do the analysis. We've run into a few roadblocks in getting me full access to the lab, so my work has been going slowly, but we're hoping that things will get fully sorted tomorrow.
We have big plans for the summer. We roughly outlined some goals for the project, including attaching the code of the algorithm to a revised user study so that we can test exactly how accurate the model will be when interacting with real users. I am in the process of reading through the Java code now so that I can attach it to the front end when we are ready.
The other girl in the DREU program at Penn State, Gloriane, arrived over the weekend! It has been wonderful getting to know her and having a teammate to work with. Gloriane is currently working on developing a more welcoming user interface than the one used in the first study.
Yesterday, our mentor took us out to lunch in town! We talked a lot about grad school, especially the possibility of going to grad school in Europe (which I am interested in). She is very knowledgeable, and I can tell that she genuinely wants to help us learn about the research process and grad school applications. I am very focused on grad school right now because I will graduate next spring, and I will need to submit applications to schools, as well as for scholarships and grants, in the upcoming months.
The first week
6 Jun 2016
I arrived in State College, PA on Memorial Day. For the past week or so, I've been getting settled in my sublet and exploring the campus and town. I am pleasantly surprised by the public transportation system, CATA; it seems relatively easy to use and convenient :)
At this point, I am waiting on my PSU ID to come in so that I can start working in the lab.
This morning my mentor, Dr. Squicciarini, sent me the data from a study in which participants tried to agree on a privacy setting for a photo on a fake social networking site. I restructured the data and entered it into SPSS so that we can begin to analyze it. Next, I am going to enter the data from the survey that participants took before the study, which measured social network usage and attitudes towards online privacy. This will allow us to compare user attitudes with their performance in the study. We hope to use the analysis to confirm the accuracy of the mathematical model and to suggest revisions or additions to the study.
Overall, it looks like the summer is going to be really fun and challenging! I can't wait to dive into the project in further detail!