Differences

This shows you the differences between two versions of the page.

en:create:functions:statistic [11.12.2014 21:31] alexander.ritter
en:create:functions:statistic [05.09.2023 20:41] swissel.uni-mannheim
Line 1: Line 1:
 ====== statistic() ======
  
-''mixed **statistic**(string //statistic//, array|string //variables//, mixed //option//, [boolean //alldata//])''
+''mixed **statistic**(string //statistics//, array|string //variables//, mixed //option//, [boolean //AllData//])''
  
-The function statistic() can determine specific univariate data from the data record (across all previous questionnaires).
+The statistic() function can be used to obtain univariate characteristic values from the data set (across all previous interviews).
  
 +  * //Statistic//\\ Which statistic should be determined?
 +    * ''%%'count'%%'' -- Frequency of the value given as ''//option//''.
 +    * ''%%'percent'%%'' -- Percentage of the value given as ''//option//''.
 +    * ''%%'frequencies'%%'' -- Frequencies of all response codes in the data set (as an array).
 +    * '''crosscount''' -- Frequency of the co-occurrence of two values in two variables. Specify the two variables as an array (or as a comma-separated string), and their values in the same form as ''//option//''.
 +    * '''mode''' -- Most frequently occurring value.
 +    * '''min''' -- Smallest value.
 +    * '''max''' -- Largest value.
 +    * '''mean''' -- Arithmetic mean of the values.
 +    * '''groupmean''' -- Arithmetic mean of the values within a subgroup. The subgroup is defined by ''//option//'', a string consisting of a variable name and the code of the cases to be included, e.g. '''AB01=2'''.
 +    * '''filter''' -- Specifies which cases to use in further calls to the function ''statistic()'' (see [[#teildatensaetze_auswerten|below]] for details).
  
-  * //statistic//\\ Which statistic should be calculated?
-    * '''count''' -- counts the frequency of the value specified as ''//option//''
-    * '''percent''' -- percentage of the value specified as ''//option//''
-    * '''crosscount''' -- counts the frequency of the joint occurrence of two values in two variables. The two variables should be specified as an array (or separated with a comma), as well as their values that are specified as ''//option//''
-    * '''mode''' -- most commonly occurring value.
-    * '''min''' -- lowest value.
-    * '''max''' -- highest value.
-    * '''mean''' -- arithmetic mean of the values.
-  * //variables//\\ Determines which variable(s) the statistic should be calculated for. The IDs of the individual variables can be found in the **Variables Overview**. If the statistic requires multiple variables, these can be given as a comma-separated string or as an array.
-  * //option//\\ Some statistics call for or allow a third entry which is set with this parameter (see below).
-  * //alldata//\\ This entry is optional and determines that all questionnaires be entered into the statistics; not just those that have been completed.
+  * //Variables//\\ Specifies for which variable(s) the statistic is to be calculated. The identifiers of the individual variables can be found in the **Variables overview**. If the statistic requires several variables, these can be specified either as a comma-separated string or as an array.
+  * //Option//\\ Some statistics require or allow a third specification, which is given with this parameter (see below).
+  * //AllData//\\ This optional specification determines that not only the completed interviews but all interviews are included in the statistics.
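
As a rough sketch of how the parameters described above fit together, the following calls pass each kind of //option// once. The variable IDs (''SD01'', ''SD02'', ''BB01_03'', ''AB01'') and the codes are assumptions for illustration only, and the snippet is meant for a questionnaire's PHP code, where ''statistic()'' is available.

<code php>
// Assumed variable IDs; replace them with IDs from your own Variables overview
$nCode1   = statistic('count', 'SD01', 1);                // frequency of code 1 in SD01
$pctCode1 = statistic('percent', 'SD01', 1);              // percentage of code 1 in SD01
$nBoth    = statistic('crosscount', 'SD01,SD02', '1,2');  // cases with SD01=1 and SD02=2
$subMean  = statistic('groupmean', 'BB01_03', 'AB01=2');  // mean of BB01_03 for cases with AB01=2
$nAll     = statistic('count', 'SD01', false, true);      // also count interviews that were not completed
</code>

Only the last call sets the fourth parameter; all other calls are restricted to completed interviews.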
  
-**Note:** If ''true'' is not explicitly specified for the parameter //alldata//, only completed questionnaires are included when calculating the statistical values.
+===== Notes =====
  
-**Note:** Test data collected during the development of the questionnaire and pretesting is only included if the current questionnaire is a part of the test as well. If the questionnaire is being carried out as part of the regular data collection, ''statistic()'' only counts data from the regular data collection.
+**Important:** Unless ''true'' is explicitly specified for the parameter //AllData//, only completed interviews are included in the calculation of statistical values.
  
+**Important:** Test data from questionnaire development and pretest are only counted if the current interview is also part of the test. If the interview is conducted as part of the regular data collection, ''statistic()'' only counts data from the regular data collection.
  
-===== Frequency Count =====
+**Note:** The data from the current interview is not taken into account by ''statistic()''.
  
-When counting the frequency (''count''), a third argument can be specified: which value the frequency should be determined for. If a third value is not given, the number of valid responses is output. Missing data is not counted.
+**Note:** The use of ''statistic()'' may be inefficient. If the questionnaire has to search the whole data set several times in several ''statistic()'' calls, a warning will be displayed first. If there are more than 10 computationally intensive calls, ''statistic()'' will no longer return results. Use ''statistic('load', ...)'' to load the data in advance to avoid this problem.
  
-For example, in the questionnaire there is a question where the respondent selects their gender (1=female, 2=male, -9=no input). The number of women who entered the third value ''1'' can be determined like so:
+**Tip:** The function ''statistic()'' can be used to close the questionnaire after reaching a predefined quota ([[:en:survey:quota]]) and either display a message to further participants or redirect them to the quota stop link of a panel provider.
  
-<code php>
-$numberwomen = statistic('count', 'SD01', 1);  // frequency of women (1)
-$numbermen = statistic('count', 'SD01', 2); // frequency of men (2)
-$numbercompleted = statistic('count', 'SD01');    // number of valid data
-$numberall = statistic('count', 'SD01', false, true); // all data records
-html('
-  <p>So far, '.$numberall.' people
-  specified their gender in this survey, but the questionnaire was
-  only completed in '.$numbercompleted.' cases.</p>
-  <p>The questionnaires completed are made up of '.
-  $numberwomen.' women and '.
-  $numbermen.' men.</p>
-');
-question('SD01');  // question about the respondent's gender
-</code>
+**Tip:** If you do not want to count all completed interviews (e.g. if dropouts were redirected to another page using ''[[:en:create:functions:redirect]]''), it makes sense to copy the variable to be counted into an [[:en:create:questions:internal|internal variable]] further back in the questionnaire.
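
The quota tip above could be sketched roughly as follows. The quota size, the variable ID ''SD01'' and the message text are assumptions for illustration. How the questionnaire is then actually closed or redirected is deliberately left out; see [[:en:survey:quota]] for that part.

<code php>
$quota = 100;  // assumed quota: number of completed interviews with SD01 = 1
$done = statistic('count', 'SD01', 1);  // completed interviews only; the current interview is not included
if ($done >= $quota) {
  // Quota reached: show a message instead of the question
  html('
    <p>Thank you for your interest. The quota for this group has already been filled.</p>
  ');
} else {
  question('SD01');
}
</code>

Because ''statistic()'' only counts completed interviews by default, respondents who are still filling in the questionnaire are not counted toward the quota in this sketch.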
  
  
-===== Multivariate Frequency =====
+===== Frequency Count I =====
  
-The '''crosscount''' statistic counts the cases (like in cross-tabulations) in which multiple variables apply.
+As a third argument in a frequency count ('''count'''), you can specify for which value you want to determine the frequency. If you do not specify a third value, the number of valid answers is output. Missing data are not counted.
  
-Instead of a single variable, two or more variables are specified as an array or separated with a comma ('',''). The values being counted for each variable are specified as the third parameter //option//. Only cases which have specified the first value for the first variable, the second value for the second variable and so on are counted.
+For example, if you have a selection for the gender (1=female, 2=male, -9=not specified), you can determine the number of women by specifying the third value ''1'':
  
 <code php>
-$nYoungFemale = statistic('crosscount', 'SD01,SD02', '2,1');  // variables and values in a list with commas ...
-$nGrownFemale = statistic('crosscount', array('SD01','SD02'), array(2,2));  // ... or in arrays
+$countWomen = statistic('count', 'SD01', 1); // frequency of women (1)
+$countMen = statistic('count', 'SD01', 2); // frequency of men (2)
+$countDone = statistic('count', 'SD01'); // number of valid data
+$countAll = statistic('count', 'SD01', false, true); // all records
 html('
-  <p>So far, '.$nYoungFemale.' people have stated in this survey
-  that they are female and in age group 1 (up to 18 years old).
-  '.$nGrownFemale.' women stated they were older than 19 years old.</p>
+  <p>So far in this survey, '.$countAll.' persons
+  have provided information about their gender, but the
+  interview was completed only in '.$countDone.' cases.</p>
+  <p>The completed interviews include '.
+  $countWomen.' women and '.
+  $countMen.' men.</p>
 ');
-question('SD01');  // question about the respondent's gender
-question('SD02');  // question about the respondent's age
+question('SD01'); // question about the respondent's gender
 </code>
  
  
-===== Valid Percent =====
+===== Frequency Count II =====
  
-The output is the percentage of a value within all valid data. The value to be counted must be given as the third argument.
+The ''%%'frequencies'%%'' statistic returns the frequencies of all response codes with one call.
+
+**Note:** The array only contains entries for response codes that occur at least once in the data set. Therefore, check whether the array key is present. This is possible, for example, with the ''??'' operator.
  
 <code php>
-$numberwomen = statistic('percent', 'SD01', 1); // percentage of women
+$freq = statistic('frequencies', 'SD01'); // frequencies
+$numberWomen = ($freq[1] ?? 0);
+$numberMen = ($freq[2] ?? 0);
 html('
-  <p>So far, '.
-  $numberwomen.' women have taken part in this survey.</p>
+  <p>The completed interviews include '.
+  $numberWomen.' women and '.
+  $numberMen.' men.</p>
 ');
-question('SD01');  // question about the respondent's gender
+question('SD01'); // question about the respondent's gender
 </code>
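
To illustrate why the ''??'' fallback matters, here is a small sketch in plain PHP that turns a frequencies array into percentages. The array literal stands in for a result of ''statistic('frequencies', 'SD01')'' and is purely hypothetical, as is the list of response codes.

<code php>
// Hypothetical result of statistic('frequencies', 'SD01'): response code => count
$freq = array(1 => 12, 2 => 8);
$codes = array(1, 2, 3);     // assumed response codes of the question
$total = array_sum($freq);   // number of counted answers (20 in this sample)
$share = array();
foreach ($codes as $code) {
  $n = ($freq[$code] ?? 0);  // code 3 never occurred, so it falls back to 0
  // Guard against division by zero when no data has been collected yet
  $share[$code] = ($total > 0) ? round(100 * $n / $total, 1) : 0;
}
// $share now maps each code to its percentage of the counted answers
</code>

Without the fallback, reading ''$freq[3]'' would access a key that does not exist, because code 3 is missing from the sample array.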
  
  
-===== Mode: Value that Occurs Most Frequently ===== 
- 
-This returns the value that has been selected most frequently so far. If multiple values have been selected equally often then these are returned separated by a comma.  
- 
-As a third argument (in this instance a Boolean), it is possible to specify if invalid values (no answer etc.) should also be counted. 
- 
-<code php> 
-$mode = statistic('mode', 'AB01_02', true); 
-$modes = explode(',', $mode);  // separate multiple values 
-if (count($modes) > 1) { 
-  // multiple values stated most frequently 
-  html(' 
-    <p>Multiple answers were selected equally often.</p> 
- '); 
-} else { 
-  // answer options text (statistic() only provides the numeric code) 
-  $text = getValuetext('AB01_02', $mode); 
-  html(' 
-    <p>The most common answer for this question was: '.$text.'.</p> 
-  '); 
-} 
-</code> 
- 
- 
-===== Min, Max and Mean of the Valid Data ===== 
- 
-The statistics '''min''', '''mean''' and '''max''' only calculate a correct value if numerical values exist for the question. Data in a text input is ignored if it is not a number -- unless it is specified as the third parameter (''true'') that invalid values should also be entered into the statistics.
-
-If no valid values are available, 0 is returned as the '''mean''', and the value ''false'' as the '''min''' and '''max'''.
- 
-<code php> 
-$min = statistic('min', 'BB01_03'); 
-$max = statistic('max', 'BB01_03'); 
-$mean = statistic('mean', 'BB01_03'); 
-html(' 
-  <p>The participant has given the programme 
-  an average rating of '.$mean.' so far.</p> 
-  <p>The ratings lie between '.$min.' and '.$max.'.</p>
-'); 
-</code> 
en/create/functions/statistic.txt · Last modified: 03.05.2024 09:21 by jdupont
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International