====== Additional Variables in the Data Set ======

The data set contains additional variables before (to the left of) and after (to the right of) your questions' variables. This chapter briefly describes their meanings.

**Note:** Some variables must be enabled explicitly before starting the download.

**Note:** For privacy reasons, recording data from the user's browser (browser, referer, IP address, etc.) must be activated __before__ collecting data.

===== Interview Identification =====

  * **CASE** Unique number of the interview. These numbers are assigned in the order in which the interviews were started (also see PHP function ''[[:en:create:functions:casenumber|caseNumber()]]'').\\ **Note:** A new number is assigned each time someone retrieves the survey. If the person does not click //next// or immediately retrieves the survey again, this results in a void data case. By default, such data cases are deleted. Numbers are also assigned when testing the questionnaire during development. Therefore, case numbers usually do not start with one (1) and may not be consecutive (e.g., 123, 125, 130, 131, 132, ...).\\ **Note:** To ensure unique case numbers within a survey, the counter cannot be reset. Resetting it would not offer any advantage for the analysis anyway.
  * **SERIAL** If the survey was started using a personalized link (authentication code), the participant's code is stored here (also see PHP function ''[[:en:create:functions:caseserial|caseSerial()]]'').
  * **REF** If the questionnaire link contained a reference ([[:en:survey:url|The Questionnaire's URL]]), the reference's text is stored here (also see PHP function ''[[:en:create:functions:reference|reference()]]'').
  * **QUESTNNR** The ID of the questionnaire used in the interview. This ID is set when assembling the questionnaire. A value "del:" means that the questionnaire used to collect the data has been deleted.
  * **MODE** Indicates how the interview was started:
    * "interview" means that someone visited the survey URL
    * "pretest" flags cases from the pretest (with question IDs visible and feedback option)
    * "orgtest" flags cases from the pretest using the final layout
    * "admin" means that the survey was started by the project administrator as a preview ({{:button.run.gif?nolink|Start}})
    * "debug" flags cases started by the project administrator in debug mode ({{:button.debug.gif?nolink|Start in debug mode}})
  * **LANGUAGE** Language of the interview. This variable is included in multi-language surveys **or** if the option //Download variables which have not been used in the questionnaire// was selected. If the language changes during the interview, the variable stores the language used last.
  * **STARTED** Time when the participant started the interview.
  * **MAILSENT** For interviews started via a personalized mailing URL, the time when the mail was sent is stored here, but only if the address entry uses the [[:en:survey:mailing#data protection mode|data protection mode]] "personalized". Otherwise, it may be possible to use the function ''[[:en:create:functions:paneldata]]''.

===== Interview Progress =====

These variables are placed at the end of the data set.

  * **LASTDATA** Time when the participant most recently clicked the "next" button and thereby updated the data case. The interval between STARTED and LASTDATA may differ from the sum of the page dwell times because it also includes the web server's processing times.
  * **FINISHED** Indicates whether the participant reached the goodbye page (1) or not (0).
  * **LASTPAGE** The page most recently answered (and sent via //Next//) by the participant. The number is equivalent to the page number in the questionnaire ([[:en:create:questionnaire|Questionnaire Assembling]]).
  * **MAXPAGE** The highest page number the participant has answered. This number is usually identical to //LASTPAGE//, but it is not reduced (a) if the participant uses the back button (e.g., to check the welcome page for contact details after completing the questionnaire) or (b) if the questionnaire uses backlinks via ''goToPage()''.

===== Completion Times =====

  * **TIMEnnn** The variables ''TIME001'' etc. store how long (in seconds) a participant has spent on a page of the questionnaire. The time is measured from loading the page to submitting it by clicking "Next".
    * These times are only approximate because they also include loading times. The typical inaccuracy is in the range of 1-2 seconds, but it can be higher in individual cases (e.g., with an unstable internet connection). More precise measurements are possible using [[:en:create:javascript|JavaScript]]: [[:en:create:javascript:latencytimer]]
    * The number in the variable name refers to the page number used when assembling the questionnaire: ''TIME007'' always refers to page 7, regardless of where the page was displayed during the interview (e.g., because pages were skipped or presented in a different order due to rotation).
    * If a participant sees a page several times (e.g., by using the back button in the questionnaire), the times are added together.
    * If the ''loopPage()'' or ''loopToPage()'' functions are used, the variable holds the cumulative dwell time for all repetitions of the page.
    * If a respondent leaves the browser window open overnight on page 5 of the questionnaire, fills in page 5 the next day and then clicks "Next", ''TIME005'' may show a dwell time of more than 20,000 seconds (several hours).
    * If several pages of the questionnaire are displayed at the same time (e.g., because one page shows no content or because ''goToPage()'' is used), the dwell time is saved for the first page that shows content. Example: Page 8 contains only PHP code with ''setPageOrder()''. It is followed by page 9 with a question and a ''goToPage()'' to page 10, which also shows a question. The participant then sees questionnaire pages 8+9+10 together on one page (i.e., 2 questions one below the other), and the dwell time is saved in ''TIME009''.
  * **TIME_SUM** The sum of dwell times (in seconds) after correction for breaks. If the participant suspends the interview and returns later, it looks as if he or she stayed on the page for a long time (hours or even days). Such times are replaced by the median dwell time of the other participants on that page. A dwell time is identified as a break if
    * it is longer than 2 hours or
    * it exceeds the page's median dwell time by more than 3 × IQR / 1.34, where IQR is the interquartile range (this corresponds to more than 3 standard deviations in a normally distributed sample)
  * **TIME_RSI** An index that indicates how much faster a participant completed the questionnaire than the typical participant (median). Values above 1 identify faster respondents, values below 1 slower respondents. For details, see below.

**Note:** The parameters ''TIME_SUM'' and ''TIME_RSI'' only contain a value if the downloaded data set contains at least 10 records for the respective questionnaire ([[:en:results:troubleshooting#selection_criteria_filter]]). The more records the download contains, the more accurate the values for ''TIME_SUM'' and ''TIME_RSI'' will be, because the distribution of response times in the sample is used to clean or normalize outliers.

**Note:** The response times are only included in the data set if the option to download the dwell times has been checked in the //variables// selection of the download options. This option is checked by default.

**Note:** Processing times are recorded automatically. To deactivate the recording, please uncheck the option in **Survey Project** → **Project Settings** → tab //Privacy// → //record time and duration during the survey//.

===== Quality Indicators =====

Data quality in online surveys is usually quite good. Data cleaning, however, is necessary in almost every survey. When the option //Variables selection// → //Download data quality parameters// is used, SoSci Survey provides variables to support data cleaning:

  * **MISSING** The percentage of answers omitted by the participant (0 to 100). Only questions and items that have been shown to the participant are counted. Someone who dropped out early may therefore have answered all questions up to that page (0% missing). This variable is useful to identify participants who just viewed the questionnaire.
    * Please note that no click in a checkbox question (multiple selection) is a valid answer. Therefore, even void cases may not reach 100%.
    * "Don't know" options are counted as valid answers as well.
    * For text inputs, a missing answer is counted if the respondent types nothing (or only spaces). Please keep this in mind when asking for optional texts (e.g., when the respondent may leave the text field empty instead of writing a zero).
    * When using [[:en:create:selection-textinput|Free text inputs within a selection]] (single or multiple choice selection), an option's empty text input (e.g., "Other: %%___%%") is only counted as missing data if the corresponding option in the selection was selected.
  * **MISSREL** Percentage of missing answers, weighted by the answering behavior of the other participants. Questions that are rarely answered (e.g., voluntary text questions) are mostly irrelevant for this value, while questions that most participants have answered weigh more heavily. The linear weighting factor for a question/item is the number of answers given to this question/item divided by how often the question/item has been asked.\\ **Note:** This value may vary depending on the subset of data retrieved.
  * **TIME_RSI** This parameter is documented in detail in the article [[https://ojs.ub.uni-konstanz.de/srm/article/view/7403|Too Fast, too Straight, too Weird]], where it is called "relative speed index". Records with a value of 2.0 or above should be examined critically. However, knowledge questions that the participant may need to research can distort the value (participants with good prior knowledge are faster).
  * **Q_VIEWER** In the questionnaire, you can enable the option "Option to view the questionnaire without answering mandatory questions". When this function is active, the variable indicates whether a participant has checked the corresponding box ("I only want to view the questionnaire").

The variables //LASTPAGE// and //FINISHED// can be used to determine whether a questionnaire has been completed in full (see above). The proportion of missing answers (//MISSREL//) is a valuable indicator of the participant's diligence and helps to identify data sets that originate from "just having a look". Although the time invested in completing the questionnaire is not a direct indicator of data quality, very low response times (low //TIME_SUM// and high //TIME_RSI//) suggest that the questions were not even read.
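
The following Python sketch illustrates how the indicators described above might be combined to flag data cases for manual review. It is not part of SoSci Survey. The function name ''flag_case()'', the sample records and the //MISSREL// cut-off of 20 are hypothetical assumptions chosen for illustration. Only the //TIME_RSI// threshold of 2.0 is taken from the description above.

```python
from typing import Dict, List, Optional

# Threshold named in the text above for critically fast respondents.
RSI_LIMIT = 2.0
# Hypothetical cut-off (percent); choose a value that suits your study.
MISSREL_LIMIT = 20.0


def flag_case(case: Dict[str, Optional[float]]) -> List[str]:
    """Return the reasons why a data case deserves a closer look."""
    reasons = []
    # FINISHED is 1 only if the participant reached the goodbye page.
    if case["FINISHED"] != 1:
        reasons.append("not finished")
    # MISSREL is a percentage of missing answers (weighted).
    if case["MISSREL"] is not None and case["MISSREL"] > MISSREL_LIMIT:
        reasons.append("many missing answers")
    # TIME_RSI is empty if the download holds fewer than 10 records.
    rsi = case.get("TIME_RSI")
    if rsi is not None and rsi >= RSI_LIMIT:
        reasons.append("very fast")
    return reasons


# Made-up example records; real data would come from the downloaded data set.
cases = [
    {"CASE": 123, "FINISHED": 1, "MISSREL": 2.0, "TIME_RSI": 0.9},
    {"CASE": 125, "FINISHED": 1, "MISSREL": 1.0, "TIME_RSI": 2.4},
    {"CASE": 130, "FINISHED": 0, "MISSREL": 35.0, "TIME_RSI": None},
]

flags = {c["CASE"]: flag_case(c) for c in cases}
for case_id, reasons in flags.items():
    print(case_id, reasons or "ok")
```

The sketch only flags cases and does not delete them. Whether an unfinished or very fast interview should be excluded remains a decision for the analysis, for example because knowledge questions can distort //TIME_RSI//.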