Introduction to Statistical Methods: Assignment

Assignment due date: 26 Jan 2022

Introduction

A research team runs a variation of the experiment of Brumby and Zhuang (2015) to replicate their finding that visual grouping in randomly organized menus hinders user performance. In addition, the researchers are interested in investigating whether visual grouping can help users memorize a menu.

The research team tests menus of 12 items and compares two different types:

VISUAL GROUPING. Menu items are visually split into three groups by using two gray lines. Each group contains four menu items.
NO GROUPING. No visual grouping is used.

The research team recruits 36 participants. Each participant performs 30 search trials for each menu type, that is, 60 trials in total.

At the beginning of each trial, the participant is shown a target word, e.g., “wolf”, and is asked to locate it in a drop-down menu. The menu is initially hidden. The trial starts when the participant opens the menu by clicking a button on the screen. The trial ends when the participant clicks the correct item in the menu. The researchers measure the time (in ms) from the first click that opens the menu to the last click that selects the correct target. The participant is asked to complete the trials as fast as possible, avoiding errors.

Participants are divided into two independent groups of 18 participants each. The first group is exposed to a NO MEMORIZATION condition, where menu items differ between trials. The second group is exposed to a MEMORIZATION condition, where menu items (and their order) remain the same across all 30 trials of a given menu type. Menu items are randomly selected from a database of 560 unique words (e.g., lion, jacket, hurricane, actress) and, depending on the condition, are updated either for each trial (under NO MEMORIZATION) or for each type of menu (under MEMORIZATION).

To reduce bias due to ordering effects, the researchers counterbalance the order of presentation of the two types of menus. More specifically, half of the participants of each group start with the VISUAL GROUPING menus, while the other half start with the NO GROUPING menus.
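
As an illustration only, the allocation described above could be sketched in R as follows. The text does not say how participants were actually assigned, so the random assignment, the seed, and the column names (participant, condition, first_menu) are assumptions made for this sketch.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# Two independent groups of 18 participants (random assignment is an assumption)
design <- data.frame(
  participant = 1:36,
  condition   = sample(rep(c("memorization", "no memorization"), each = 18))
)

# Within each group, 9 participants start with each menu type
design$first_menu <- NA_character_
for (g in unique(design$condition)) {
  idx <- which(design$condition == g)
  design$first_menu[idx] <- sample(rep(c("visual grouping", "no grouping"), each = 9))
}

# Every cell of this table should contain 9 participants
print(table(design$condition, design$first_menu))
```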

Hypotheses

The researchers make the following three hypotheses:

H1. Menu selection is faster under MEMORIZATION than under NO MEMORIZATION.

H2. Under NO MEMORIZATION, VISUAL GROUPING results in slower menu selection than NO GROUPING. This hypothesis is based on the results of Brumby and Zhuang (2015).

H3. Under MEMORIZATION, VISUAL GROUPING results in faster menu selection than NO GROUPING. In particular, the researchers suspect that visual grouping can help users memorize the structure of menus, boosting their performance.

Data analysis

Conduct your analysis through the following steps.

Step 1: Data collection

Suppose the researchers have completed the experiment and collected the results. Here, we simulate the data generation process by randomly sampling from some fixed populations. To create your data file, run this R script and use the dataset.csv file it produces as your dataset. Note that each of you will generate a different dataset, so the conclusions of your analyses may differ.

The file should contain five columns with the following information: (i) the participant number, (ii) the condition (memorization vs. no memorization), (iii) the type of menu (visual grouping vs. no grouping), (iv) the trial number, and (v) the observed selection time (in ms).
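
Before analyzing the file, it may help to check that it has the expected structure. The sketch below writes a tiny made-up file and reads it back, so that it runs on its own. The column names (participant, condition, menu, trial, time) and all values are hypothetical. Check the header of your own dataset.csv and adapt the names.

```r
# Tiny made-up file standing in for dataset.csv (hypothetical names and values)
toy <- data.frame(
  participant = rep(1:2, each = 4),
  condition   = rep(c("memorization", "no memorization"), each = 4),
  menu        = rep(rep(c("visual grouping", "no grouping"), each = 2), times = 2),
  trial       = rep(1:2, times = 4),
  time        = c(1510, 1320, 1480, 1290, 2105, 1990, 1875, 1940)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# With the real data, replace `path` by "dataset.csv"
d <- read.csv(path, stringsAsFactors = TRUE)
str(d)

# Each participant should belong to exactly one memorization condition...
conds_per_participant <- tapply(d$condition, d$participant,
                                function(x) length(unique(x)))
# ...and should have tested both menu types
menus_per_participant <- tapply(d$menu, d$participant,
                                function(x) length(unique(x)))
print(conds_per_participant)
print(menus_per_participant)
```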

Step 2: Descriptive statistics

Write an R script to calculate descriptive statistics for your experimental data. The descriptive statistics will include means, medians, and standard deviations of the selection time for each memorization condition and each type of menu. Then, calculate the same descriptive statistics for the three time differences of interest (see hypotheses H1, H2, and H3).
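
A minimal sketch of one possible approach is given below. It uses simulated data in place of dataset.csv, with hypothetical column names and arbitrary values. Times are first aggregated per participant, and the time differences are computed only for the within-participant comparisons (H2 and H3). For H1 the two groups are independent, so there is no per-participant difference to summarize.

```r
set.seed(1)
# Simulated stand-in for dataset.csv (hypothetical column names and values)
d <- expand.grid(trial = 1:30,
                 menu = c("visual grouping", "no grouping"),
                 participant = 1:36)
d$condition <- ifelse(d$participant <= 18, "no memorization", "memorization")
d$time <- rnorm(nrow(d), mean = 2000, sd = 400)

# Aggregate first: one mean time per participant and menu type
per_pm <- aggregate(time ~ participant + condition + menu, data = d, FUN = mean)

describe <- function(x) c(mean = mean(x), median = median(x), sd = sd(x))

# Descriptive statistics per memorization condition and menu type
by_cell <- aggregate(time ~ condition + menu, data = per_pm, FUN = describe)
print(by_cell)

# Within-participant differences: visual grouping minus no grouping
m <- with(per_pm, tapply(time, list(participant, menu), mean))
menu_diff <- m[, "visual grouping"] - m[, "no grouping"]
cond <- per_pm$condition[match(rownames(m), per_pm$participant)]

# Descriptive statistics of the differences, per memorization condition
diff_stats <- tapply(menu_diff, cond, describe)
print(diff_stats)
```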

Step 3: Plots

Produce box-plot diagrams to graphically summarize your observed distributions (i) for the two memorization conditions and (ii) for the two types of menus. Also produce Q-Q plots to visually assess the extent to which your samples deviate from normality. What do you observe?
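
The following sketch shows the base R functions that can produce such plots (boxplot, qqnorm, qqline). The data are simulated from an arbitrary right-skewed distribution and the column names are hypothetical, so the shapes you observe with your own dataset will differ.

```r
set.seed(2)
# Simulated stand-in for dataset.csv (hypothetical column names and values)
d <- expand.grid(trial = 1:30,
                 menu = c("visual grouping", "no grouping"),
                 participant = 1:36)
d$condition <- ifelse(d$participant <= 18, "no memorization", "memorization")
d$time <- rlnorm(nrow(d), meanlog = log(2000), sdlog = 0.3)

# One mean time per participant (per condition, and per menu type)
per_p  <- aggregate(time ~ participant + condition, data = d, FUN = mean)
per_pm <- aggregate(time ~ participant + menu, data = d, FUN = mean)

op <- par(mfrow = c(2, 2))
bp_cond <- boxplot(time ~ condition, data = per_p,
                   ylab = "Mean selection time (ms)",
                   main = "By memorization condition")
bp_menu <- boxplot(time ~ menu, data = per_pm,
                   ylab = "Mean selection time (ms)",
                   main = "By menu type")

# Q-Q plots against the normal distribution, one per memorization condition
for (g in c("memorization", "no memorization")) {
  x <- per_p$time[per_p$condition == g]
  qqnorm(x, main = paste("Q-Q plot:", g))
  qqline(x)
}
par(op)
```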

Step 4: Confidence intervals

Assume populations with normal distributions. Write an R script that estimates the mean selection time for each memorization condition, as well as their mean time difference. Use 95% confidence intervals to estimate those means. The R script should include code that calculates and also graphically plots the confidence intervals. What are your conclusions? Do results support the first hypothesis (H1)?
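
As a hedged sketch, the confidence intervals for two independent groups can be obtained with t.test, as shown below. The data are simulated with an arbitrary effect, the column names are hypothetical, and plot_ci is a small helper defined only for this illustration. The interval for the difference uses Welch's t, which is the default of t.test. Whether to assume equal variances instead is a choice you should justify.

```r
set.seed(3)
# Simulated stand-in for dataset.csv (hypothetical names, arbitrary effect)
d <- expand.grid(trial = 1:30,
                 menu = c("visual grouping", "no grouping"),
                 participant = 1:36)
d$condition <- ifelse(d$participant <= 18, "no memorization", "memorization")
d$time <- rnorm(nrow(d),
                mean = ifelse(d$condition == "memorization", 1700, 2000),
                sd = 400)

# One mean time per participant; the two groups are independent
per_p <- aggregate(time ~ participant + condition, data = d, FUN = mean)
mem   <- per_p$time[per_p$condition == "memorization"]
nomem <- per_p$time[per_p$condition == "no memorization"]

ci_mem   <- t.test(mem)$conf.int         # 95% CI of the mean (default level)
ci_nomem <- t.test(nomem)$conf.int
ci_diff  <- t.test(mem, nomem)$conf.int  # CI of the difference, Welch's t

# Hypothetical helper: point estimates with error bars
plot_ci <- function(est, lo, hi, labels, ylab, ref = NULL) {
  x <- seq_along(est)
  plot(x, est, pch = 19, xaxt = "n", xlab = "", ylab = ylab,
       xlim = c(0.5, length(est) + 0.5), ylim = range(lo, hi, ref))
  axis(1, at = x, labels = labels)
  arrows(x, lo, x, hi, angle = 90, code = 3, length = 0.05)
  if (!is.null(ref)) abline(h = ref, lty = 2)
}

op <- par(mfrow = c(1, 2))
plot_ci(c(mean(mem), mean(nomem)),
        c(ci_mem[1], ci_nomem[1]), c(ci_mem[2], ci_nomem[2]),
        labels = c("memorization", "no memorization"),
        ylab = "Mean selection time (ms)")
plot_ci(mean(mem) - mean(nomem), ci_diff[1], ci_diff[2],
        labels = "mem. - no mem.", ylab = "Difference (ms)", ref = 0)
par(op)
```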

Then, write an R script that estimates the mean selection time for each type of menu and their mean time difference for the NO MEMORIZATION condition. Again, use 95% confidence intervals to estimate those means and include code that calculates and also graphically plots the confidence intervals. What are your conclusions? Do the results support the second hypothesis (H2)?

Repeat the previous step for the MEMORIZATION condition. What are your conclusions now? Do results support the third hypothesis (H3)?
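
Within each memorization condition, every participant tests both menu types, so the difference is a within-participant (paired) quantity. The sketch below illustrates one way to compute its confidence interval with a paired t.test. The data are simulated without any built-in effect, the column names are hypothetical, and paired_ci is a helper defined only for this illustration.

```r
set.seed(4)
# Simulated stand-in for dataset.csv (hypothetical column names and values)
d <- expand.grid(trial = 1:30,
                 menu = c("visual grouping", "no grouping"),
                 participant = 1:36)
d$condition <- ifelse(d$participant <= 18, "no memorization", "memorization")
d$time <- rnorm(nrow(d), mean = 2000, sd = 400)

# One mean time per participant and menu type
per_pm <- aggregate(time ~ participant + condition + menu, data = d, FUN = mean)

# Hypothetical helper: paired CI of visual grouping minus no grouping
paired_ci <- function(cond) {
  s  <- per_pm[per_pm$condition == cond, ]
  s  <- s[order(s$participant), ]  # same participant order in both vectors
  vg <- s$time[s$menu == "visual grouping"]
  ng <- s$time[s$menu == "no grouping"]
  tt <- t.test(vg, ng, paired = TRUE)
  c(mean_vg = mean(vg), mean_ng = mean(ng),
    diff = unname(tt$estimate),
    lower = tt$conf.int[1], upper = tt$conf.int[2])
}

ci_h2 <- paired_ci("no memorization")
ci_h3 <- paired_ci("memorization")
both  <- rbind(`no memorization` = ci_h2, memorization = ci_h3)
print(both)

# Plot the two difference estimates with their 95% CIs
plot(1:2, both[, "diff"], pch = 19, xaxt = "n", xlab = "",
     ylab = "Visual grouping - no grouping (ms)",
     xlim = c(0.5, 2.5), ylim = range(both[, c("lower", "upper")], 0))
axis(1, at = 1:2, labels = rownames(both))
arrows(1:2, both[, "lower"], 1:2, both[, "upper"],
       angle = 90, code = 3, length = 0.05)
abline(h = 0, lty = 2)
```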

Step 5: Significance tests

Write an R script that conducts the appropriate significance tests to test the three hypotheses (H1, H2, and H3). What are your conclusions? Are the results of the significance tests consistent with the confidence intervals that you constructed for Step 4?
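
The sketch below illustrates, on simulated data with hypothetical column names, how the choice of test follows the design: an independent-samples t-test for H1 and paired t-tests for H2 and H3. It uses two-sided tests, which correspond directly to 95% confidence intervals. A one-sided test (argument alternative of t.test) is another option for directional hypotheses, but it is a choice you would need to justify. The last lines check that a two-sided p-value below 0.05 coincides with a 95% interval that excludes zero.

```r
set.seed(5)
# Simulated stand-in for dataset.csv (hypothetical names, arbitrary effect)
d <- expand.grid(trial = 1:30,
                 menu = c("visual grouping", "no grouping"),
                 participant = 1:36)
d$condition <- ifelse(d$participant <= 18, "no memorization", "memorization")
d$time <- rnorm(nrow(d),
                mean = ifelse(d$condition == "memorization", 1700, 2000),
                sd = 400)

per_p  <- aggregate(time ~ participant + condition, data = d, FUN = mean)
per_pm <- aggregate(time ~ participant + condition + menu, data = d, FUN = mean)
per_pm <- per_pm[order(per_pm$participant), ]

# H1: independent groups (Welch's t-test by default)
h1 <- t.test(time ~ condition, data = per_p)

# H2 and H3: same participants under both menu types (paired t-test)
paired_test <- function(cond) {
  s <- per_pm[per_pm$condition == cond, ]
  t.test(s$time[s$menu == "visual grouping"],
         s$time[s$menu == "no grouping"], paired = TRUE)
}
h2 <- paired_test("no memorization")
h3 <- paired_test("memorization")

tests <- list(H1 = h1, H2 = h2, H3 = h3)
print(sapply(tests, function(tt) tt$p.value))

# Consistency with the 95% CIs: p < 0.05 exactly when the CI excludes 0
excludes_zero <- function(tt) tt$conf.int[1] > 0 || tt$conf.int[2] < 0
consistent <- sapply(tests, function(tt) (tt$p.value < 0.05) == excludes_zero(tt))
print(consistent)
```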

Step 6: Statistical power

Based on their pilot tests, the researchers expect that the mean difference in time between the two types of menus can be as low as 130 ms for the NO MEMORIZATION condition. They also estimate the standard deviation of this difference (between participants) to be approximately 300 ms. Write an R script that estimates the power of the above experimental design to detect such a difference.

Note: To simplify the analysis, ignore within-participant variance and the number of trials performed by each participant (despite the fact that, in reality, they also affect power).
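
One possible starting point is the power.t.test function of base R, sketched below. The sketch treats the comparison as a paired design with n = 18 (the 18 participants of the NO MEMORIZATION group each test both menu types) and assumes a two-sided test at a significance level of 0.05. Neither the test direction nor the significance level is stated above, so adjust them to your own analysis.

```r
# Power of a paired t-test: n = 18 pairs, expected difference of 130 ms,
# standard deviation of the differences of 300 ms.
# Assumptions of this sketch: two-sided test, sig.level = 0.05.
pw <- power.t.test(n = 18, delta = 130, sd = 300, sig.level = 0.05,
                   type = "paired", alternative = "two.sided")
print(pw)

# For comparison: number of participants needed to reach a power of 0.8
needed <- power.t.test(power = 0.8, delta = 130, sd = 300, sig.level = 0.05,
                       type = "paired", alternative = "two.sided")
print(ceiling(needed$n))
```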

Step 7: Correlation

Write an R script that tests whether user performance improves over time (i.e., as participants perform more trials) for each individual condition (MEMORIZATION and NO MEMORIZATION). Use correlation measures for your analysis. What do you observe? For the MEMORIZATION condition, also create a chart that shows how user performance evolves over time.
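
A minimal sketch using cor.test is shown below. The data are simulated with an arbitrary learning effect under MEMORIZATION only, and the column names are hypothetical. Here times are averaged per trial number across participants before computing Pearson correlations. Other aggregation choices, or a rank-based measure (method = "spearman"), may be just as defensible.

```r
set.seed(6)
# Simulated stand-in for dataset.csv (hypothetical names, arbitrary trend)
d <- expand.grid(trial = 1:30,
                 menu = c("visual grouping", "no grouping"),
                 participant = 1:36)
d$condition <- ifelse(d$participant <= 18, "no memorization", "memorization")
learning <- ifelse(d$condition == "memorization", -20 * d$trial, 0)
d$time <- rnorm(nrow(d), mean = 2000 + learning, sd = 300)

# Mean time per trial number, separately for each memorization condition
per_trial <- aggregate(time ~ condition + trial, data = d, FUN = mean)

# Pearson correlation between trial number and mean time, per condition
tests <- lapply(split(per_trial, per_trial$condition),
                function(s) cor.test(s$trial, s$time))
print(sapply(tests, function(tt) c(r = unname(tt$estimate), p = tt$p.value)))

# Chart for the MEMORIZATION condition, with a linear trend line
mem <- per_trial[per_trial$condition == "memorization", ]
mem <- mem[order(mem$trial), ]
plot(mem$trial, mem$time, type = "b", pch = 19,
     xlab = "Trial number", ylab = "Mean selection time (ms)",
     main = "MEMORIZATION")
abline(lm(time ~ trial, data = mem), lty = 2)
```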

Step 8: Log-normal distributions (optional, for PhD students)

The researchers suspect that the distribution of selection times can be better modeled with log-normal distributions. Can you conduct the analyses for Steps 4 and 5 by using log-normal distributions? Your statistics of interest are now medians. You also need to compare ratios rather than differences.
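
One common way to approach this, sketched below under stated assumptions, is to analyze log-transformed times and then transform the results back with exp. Under log-normality, the exponentiated mean of the logs (the geometric mean) estimates the median, and an exponentiated difference of log means is a ratio. The data are simulated, the column names are hypothetical, and only the comparison between the two independent groups is shown.

```r
set.seed(7)
# Simulated stand-in for dataset.csv (hypothetical names, arbitrary effect)
d <- expand.grid(trial = 1:30,
                 menu = c("visual grouping", "no grouping"),
                 participant = 1:36)
d$condition <- ifelse(d$participant <= 18, "no memorization", "memorization")
d$time <- rlnorm(nrow(d),
                 meanlog = log(ifelse(d$condition == "memorization", 1700, 2000)),
                 sdlog = 0.3)

# Work on the log scale: one mean log-time per participant
d$log_time <- log(d$time)
per_p <- aggregate(log_time ~ participant + condition, data = d, FUN = mean)
mem   <- per_p$log_time[per_p$condition == "memorization"]
nomem <- per_p$log_time[per_p$condition == "no memorization"]

# Back-transformed estimate and 95% CI for the median under MEMORIZATION
ci <- t.test(mem)$conf.int
median_mem <- exp(c(estimate = mean(mem), lower = ci[1], upper = ci[2]))
print(median_mem)

# Ratio of medians (memorization / no memorization) with its 95% CI;
# a ratio of 1 plays the role that a difference of 0 plays on the raw scale
tt <- t.test(mem, nomem)
ratio <- exp(c(estimate = mean(mem) - mean(nomem),
               lower = tt$conf.int[1], upper = tt$conf.int[2]))
print(ratio)
```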

Report

Write an R Markdown page to describe your solutions. This page will include (i) your R scripts, (ii) their outputs, (iii) sufficient text to explain your steps, and (iv) your conclusions.

TIPS: Make sure that you appropriately aggregate your data before each data analysis. Do not mix independent and dependent measurements together, and use the correct methods to conduct significance tests and construct confidence intervals.

WHAT TO SUBMIT: The data file that you generated and used (dataset.csv), the R Markdown file (.Rmd), and the HTML page generated from your R Markdown code (.html).

NOTE: You are encouraged to discuss the problems and their solutions with your colleagues and with your instructor (e.g., on Slack). However, your final solutions and report are personal.