Study on representing sets

We tested 4 different representations for multiple sets on Amazon Mechanical Turk, using a between-subjects methodology. These 4 links take you to the studies completed by the participants. Each participant took exactly one of the 4 tests, and participants who started a test but did not finish were not allowed to restart it. For each diagram type the questions are the same, apart from some variation in the text of the training questions.

Here the questions are presented in the same order each time, to allow comparison between the diagram types and with the correct answers. In the actual study, the questions were presented in a random order (except the 5 training questions), as explained below. The pages below create an email containing the chosen answers; in the study, these answers were sent to Mechanical Turk in a URL query string.
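The answer-submission mechanism described above can be illustrated with a short sketch. The parameter names (`assignmentId`, `q1`, `q2`, ...) are hypothetical, not taken from the study code:

```python
from urllib.parse import urlencode

def build_submit_url(base_url, assignment_id, answers):
    """Encode chosen answers into a URL query string.

    Parameter names here are illustrative only; the study's actual
    query-string format is not documented above.
    """
    params = {"assignmentId": assignment_id}
    params.update({f"q{i}": a for i, a in enumerate(answers, start=1)})
    return base_url + "?" + urlencode(params)
```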

The first three questions after the training questions and the first spam check are dummy questions, whose data is not analysed. These were included to accustom the participants to the question style, and the order of these 3 questions was randomised. After every 5 questions there is a spam-test question, whose text requires the participant to click on the image rather than the radio button and "Next page" button; the intention is to ensure the participant reads the questions rather than clicking through them as quickly as possible. The order of these spam-test questions was randomised. The 18 data-generating questions were presented on the same set of pages each time, but the assignment of question number to page was randomised.
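The page-ordering scheme above can be sketched as follows. This is a hypothetical reconstruction of the logic, not the study's actual code: training questions stay in a fixed order, while the dummy questions, spam checks, and data-generating questions are each shuffled among themselves, with a spam check inserted before every block of 5 post-training items.

```python
import random

def build_page_order():
    """Sketch of the page ordering described above (hypothetical reconstruction).

    Returns a list of 30 page labels: 5 fixed training pages, then dummy,
    spam, and data questions with a spam check starting every block of 5.
    """
    training = [f"Training {i}" for i in range(1, 6)]    # fixed order
    dummies = [f"Dummy {i}" for i in range(1, 4)]        # shuffled among themselves
    spam = [f"Spam {i}" for i in range(1, 5)]            # shuffled among themselves
    questions = [f"Question {i}" for i in range(1, 19)]  # shuffled among themselves

    random.shuffle(dummies)
    random.shuffle(spam)
    random.shuffle(questions)

    pages = list(training)
    rest = dummies + questions       # dummies first, as in the answer table
    spam_idx = 0
    for i, item in enumerate(rest):
        # A spam check precedes every block of 5 items (until spam runs out).
        if i % 5 == 0 and spam_idx < len(spam):
            pages.append(spam[spam_idx])
            spam_idx += 1
        pages.append(item)
    return pages
```

Running this reproduces the page structure in the answer table below: 30 pages, with spam checks at the positions corresponding to pages 7, 13, 19 and 25.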

The correct answers are:

Page     Question      Answer
Page 2   Training 1    yes
Page 3   Training 2    no
Page 4   Training 3    no
Page 5   Training 4    yes
Page 6   Training 5    yes
Page 7   Spam 1        click on region
Page 8   Dummy 1       no
Page 9   Dummy 2       no
Page 10  Dummy 3       no
Page 11  Question 1    yes
Page 12  Question 2    no
Page 13  Spam 2        click on region
Page 14  Question 3    no
Page 15  Question 4    no
Page 16  Question 5    no
Page 17  Question 6    yes
Page 18  Question 7    yes
Page 19  Spam 3        click on region
Page 20  Question 8    no
Page 21  Question 9    yes
Page 22  Question 10   no
Page 23  Question 11   no
Page 24  Question 12   no
Page 25  Spam 4        click on region
Page 26  Question 13   yes
Page 27  Question 14   no
Page 28  Question 15   no
Page 29  Question 16   no
Page 30  Question 17   yes
Page 31  Question 18   no

Anonymised study data in CSV format. The time column records the time spent on the question page in seconds. The error column gives 1 if the question was answered incorrectly and 0 if it was answered correctly. The size column gives the number of sets in the diagram to which the question related. 440 participants took the test; 16 were considered spammers because they failed to click on the image (anywhere on the image) for more than one of the spam questions, and were not rewarded through Mechanical Turk. The data here contains the 424 non-spammer participants.
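As a minimal sketch of working with this data, the following computes the mean error rate per diagram size. It assumes only the confirmed column names `error` and `size` described above; any further columns (e.g. a participant identifier) are not relied on.

```python
import csv
from collections import defaultdict

def error_rate_by_size(path):
    """Mean error rate per diagram size from the study CSV.

    Assumes columns named 'error' (1 = incorrect, 0 = correct) and
    'size' (number of sets in the diagram), as described above.
    """
    totals = defaultdict(lambda: [0, 0])   # size -> [error count, row count]
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            size = int(row["size"])
            totals[size][0] += int(row["error"])
            totals[size][1] += 1
    return {s: errs / n for s, (errs, n) in sorted(totals.items())}
```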

The 3-set data (Questions 1-6) was devised by the investigators, because there are few 3-set combinations for which the Shaded, Unshaded and Venn diagrams are all different. The rest of the data was adapted from the following real-world diagrams:

Investigators:
Peter Chapman, University of Brighton
Gem Stapleton, University of Brighton
Peter Rodgers, University of Kent
Luana Micallef, University of Kent
Andrew Blake, University of Brighton

The code used in this study is modified from that of INRIA, Paris, http://www.aviz.fr/bayes. The paper describing their study is:
Micallef, L.; Dragicevic, P.; Fekete, J.-D. Assessing the Effect of Visualizations on Bayesian Reasoning through Crowdsourcing. IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 12, pp. 2536-2545, Dec. 2012.