====== Coding for content analysis ======

In a quantitative content analysis, coders classify many different units (e.g. newspaper articles or social media posts) using the same categories. This is traditionally done in an Excel spreadsheet ... but SoSci Survey can also serve as a form tool for coding.

SoSci Survey has some practical advantages:

  * Coders pick values in selection questions and do not have to constantly look up or memorize the codes.
  * Instructions can be added directly to the individual categories (questions).
  * A [[:en:create:filters|Filter]] can be used to skip unnecessary categories.
  * Coding for subordinate units (e.g. detailed coding of actors) is easier to map ([[:en:create:multilevel]]).
  * The content to be coded can be displayed if it is already available in digital form.
  * The assignment of units/contents to coders can be randomized and/or automated.

**Tip:** For categories with a large number of characteristics, use drop-down selection questions or [[:en:create:questions:suggest]], so that coders can enter the classification efficiently.

===== Application case =====

This guide describes the use case in which a small number of coders code a large number of posts from a social media platform. The focus is on the automated delivery of the posts to the coders.

**Tip:** For coding with a large number of coders (crowd working, e.g. via MTurk), you would rather use a [[:en:create:questions:random]] to select the unit.

In the use case described here, the procedure is as follows:

  * The content to be coded is stored in SoSci Survey.
  * Lists are stored that specify which content is to be coded by which coder.
  * When the questionnaire is called up, it looks up the next content for the coder.
  * Once a content item has been coded, the questionnaire is called up again.

===== Database =====

The data for the coding is stored in SoSci Survey as follows.
In addition to the question type [[:en:create:questions:internal]], the function [[:en:create:databank]] is used here. It makes structured content (the posts and the assignment of coders to posts) available in the background.

==== Organizational data ====

A question of the type [[:en:create:questions:internal]] provides four variables for the organizational information in the data set. In the following example, the question uses the identifier CX01.

  * Coder (CX01_01)
  * Consecutive number in the coding (CX01_02)
  * Post ID (CX01_03)
  * Post content (CX01_04)

In the first variable, the //Type of Content// is set to “Numeric codes” and the names of the coders are entered as codes.

  1 = Anna Alpha
  2 = Bernhard Beta
  3 = Christine Gamma
  4 = Daniel Delta

The fourth variable is optional; in this example, it stores the content of the social media post. This can simplify the evaluation slightly, especially when checking that the coding is correct.

==== Contents to be coded ====

In this use case, the social media posts are pure text content (images or other media files would also be possible, referenced by file name).

During the scraping process, each post was assigned a unique ID, which should also appear in the data record. To store the content, a ''P'' is placed in front of this ID. The post with the ID 1 is therefore given the database key ''P1''. This prefix is necessary to distinguish the posts from other categories of entries (see below) in the content database.

**Tip:** If the list is prepared in Excel, the prefix can be added there, for example with the ''CONCATENATE()'' function.

The Excel table for the post content would look like this.

{{:de:create:scr.content-analysis.posts.png?nolink|Excel file with the content to be coded}}

The first column contains (only) the database keys for the [[:en:create:databank]]. The second column contains the texts of the posts that will later be displayed to the coders. Further information (e.g.
the file name of an image) could be added in other columns.

The XLSX file is imported in SoSci Survey under **Special Features** -> **Database for Contents** with //Read file//.

{{:de:create:scr.content-analysis.database-import.png?1000|Import Excel file}}

The first column is selected as the database key.

{{:de:create:scr.content-analysis.posts-db.png?nolink|Excerpt from the database for contents}}

**Tip:** In this example, the posts have numerical, sequential IDs. However, alphanumeric, non-sequential IDs can also be used without any problems.

==== Assignment of posts to coders ====

In the application example, the posts are to be distributed among the coders in such a way that each post is coded by two people. Another Excel table was created for this purpose.

The table consists of two large blocks, one below the other. Each post ID occurs exactly once per block, so every post appears twice in total (because each post is to be coded twice). The IDs of the coders were then added in the second column. In the first block, the coder IDs were entered in regular sequence (1, 2, 3, 4) and this pattern was copied down to all remaining rows of the block.

{{:de:create:scr.content-analysis.assignment-1.png?nolink|Excel table assigning coders to posts}}

In the second block, the coder IDs were entered in a shifted sequence (2, 3, 4, 1, 3, 4, 1, 2) and copied down, so that across the entire table different pairs of coders are responsible for each post.

The entire table was then shuffled randomly (an additional column with the content ''=RAND()'' is helpful for this), sorted by the //Coder ID// column, and finally a consecutive number for the posts of each coder was entered in a third column //Step//.

{{:de:create:scr.content-analysis.assignment-2.png?nolink|Assignment of posts and coders}}

A database key is also created in a fourth column “Key”. It is made up of the prefix “C”, the ID of the coder, a separator (in this case a hyphen) and the consecutive number from the third column, e.g. ''C1-1'' for the first post of coder 1.
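
The spreadsheet steps above can also be scripted. The following Python sketch is only an illustration and not part of SoSci Survey: the function name ''build_assignment'', the fixed seed and the 40 example post IDs are assumptions made for this example. It builds the two blocks with the coder sequences described above, shuffles the rows, sorts them by coder and derives the //Step// and //Key// columns.

```python
import random


def build_assignment(post_ids, seed=None):
    """Return rows (post_id, coder_id, step, key); every post gets two different coders."""
    first = [1, 2, 3, 4]               # regular sequence used in the first block
    second = [2, 3, 4, 1, 3, 4, 1, 2]  # shifted sequence used in the second block
    rows = []
    for i, post in enumerate(post_ids):
        rows.append((post, first[i % len(first)]))
        rows.append((post, second[i % len(second)]))

    rng = random.Random(seed)
    rng.shuffle(rows)                  # random order of the posts
    rows.sort(key=lambda row: row[1])  # stable sort: shuffled order is kept per coder

    table, step = [], {}
    for post, coder in rows:
        step[coder] = step.get(coder, 0) + 1  # consecutive number per coder
        table.append((post, coder, step[coder], f"C{coder}-{step[coder]}"))
    return table


# Hypothetical example: 40 posts with the IDs 1 to 40
table = build_assignment(range(1, 41), seed=1)
for row in table[:3]:
    print(row)
```

The rows could then be written to a spreadsheet with the columns post ID, coder ID, //Step// and //Key//, so that the import works as it does for a manually prepared table.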
This XLSX file is also imported into the database for contents, with the “Key” column again serving as the database key.

{{:de:create:scr.content-analysis.assignment-db.png?nolink|Assignment in the database for contents}}

==== Error messages ====

With this, all the information needed to display the posts one after the other for coding is stored in SoSci Survey.

In addition, three texts are created in the **List of Questions**, which later serve as error messages in the questionnaire. The option “Warning” is selected as //Display// in all three texts.

  * CX02 -- Content: “Invalid call” (displayed if no coder ID was transferred in the link).
  * CX03 -- Content: "There is no entry in the system for coding no. %num%." (displayed if there is no further post in the assignment table or if there is an error in the consecutive numbering).
  * CX04 -- Content: "The post with ID %post% is missing in the system." (displayed if a post cannot be found in the content database).

===== Questionnaire =====

A questionnaire is now created under **Compose questionnaire**. Its first page starts with a longer block of PHP code ([[:en:create:php]]).
<code php>
// Read the coder ID from the URL
$coder = (int)reference();
if (!$coder) {
  // Display error message if no coder ID was transferred
  text('CX02');
  buttonHide();
  pageStop();
}
put('CX01_01', $coder);

// Read consecutive number
$key = 'A'.$coder;
$info = dbGet($key);
if (!$info) {
  $num = 0;
} else {
  $num = (int)$info[0];
}
// We now code the next (consecutive) number
put('CX01_02', $num + 1);

// Determine post ID from the coder/posts table
$key = 'C'.$coder.'-'.value('CX01_02');
$info = dbGet($key);
if (!$info) {
  show('CX03', ['%num%' => $key]);
  buttonHide();
  pageStop();
}
$postID = $info[2];
put('CX01_03', $postID);

// Retrieve post (content)
$info = dbGet('P'.$postID);
if (!$info) {
  show('CX04', ['%post%' => $postID]);
  buttonHide();
  pageStop();
}
$html = $info[0];
put('CX01_04', $html);
</code>

This code does the following:

  * First, the ID of the coder is read from the URL, more on this in a moment.
  * Then it checks the database for contents for an entry with the prefix “A” and the coder ID, e.g. “A1” for coder 1. This entry stores the number of the last post that the coder has already coded; the number corresponds to the consecutive number from the second table (Step). If there is no such entry yet, counting starts at 0.
  * The database key with the C prefix is then built from the ID of the coder and the next consecutive number, and the corresponding row (e.g. “C1-1”) is retrieved from the database for contents. This row records (based on the second table above) which post the coder should process. In the example, “C1-1” would contain the information that post 13 is to be coded (see second table).
  * Finally, the content database is checked to see whether a post with the corresponding ID can be found there. The ID is again preceded by the “P”, so ''dbGet()'' searches for an entry with the key “P13”.

For the sake of simplicity, the questionnaire saves the text of the post in an internal variable in the data set.
This allows the content to be displayed on this and subsequent pages with the following PHP code:

<code php>
// Show posting: the escaped post text, wrapped in a plain <div>
html(
  '<div>'.
  htmlspecialchars(value('CX01_04')).
  '</div>'
);
</code>
Below this PHP code, only the questions that are used for coding need to be added to the questionnaire.

After the pages with the questions (before the “last page”), another page is inserted that contains only the following PHP code. This page automatically calls up the next post once the current coding is complete.

<code php>
// Save incremented consecutive number
$coder = value('CX01_01');
$num = value('CX01_02');
$key = 'A'.$coder;
dbSet($key, $num);
// Redirect to the next coding
redirect('?r='.$coder);
</code>

===== Personalized links =====

The coders now need individual links, so that the questionnaire can recognize which coder is about to code. To do this, a reference is appended to the link to the questionnaire. For example, if the normal link to the questionnaire is ''%%www.soscisurvey.de/coding/%%'', the coders would receive the following links ([[:en:survey:url]]).

  * Coder 1: ''%%www.soscisurvey.de/coding/?r=1%%''
  * Coder 2: ''%%www.soscisurvey.de/coding/?r=2%%''
  * Coder 3: ''%%www.soscisurvey.de/coding/?r=3%%''
  * Coder 4: ''%%www.soscisurvey.de/coding/?r=4%%''

Test the questionnaire with these links. If everything works correctly, delete the entries for the keys “A1” to “A4” (the counters for the coding) under **Special Features** -> **Database for Contents** with //Delete Entries//, so that the links mentioned above start with the first post again.

{{:de:create:scr.content-analysis.database-delete.png?nolink|Delete individual entries from the database for contents}}
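
Before the links are handed out, it can be worth checking the two tables for the conditions that would trigger the error messages CX03 and CX04. The following Python sketch is only an illustration and runs outside SoSci Survey. The function name ''find_problems'' and the simplified dictionary (C keys map directly to a post ID, P keys to the post text) are assumptions made for this example and do not reflect how the database for contents stores its rows.

```python
def find_problems(db, coders):
    """Walk each coder's queue C<coder>-1, C<coder>-2, ... and report broken entries."""
    problems = []
    for coder in coders:
        prefix = f"C{coder}-"
        steps = [int(key[len(prefix):]) for key in db if key.startswith(prefix)]
        for step in range(1, max(steps, default=0) + 1):
            key = f"{prefix}{step}"
            if key not in db:
                # The questionnaire would stop here with CX03
                problems.append(f"{key}: gap in consecutive numbering")
            elif f"P{db[key]}" not in db:
                # The questionnaire would stop here with CX04
                problems.append(f"{key}: post P{db[key]} is missing")
    return problems


# Hypothetical miniature database with two deliberate errors for coder 2
db = {
    "P7": "Text of post 7",
    "P13": "Text of post 13",
    "C1-1": 13,
    "C1-2": 7,
    "C2-1": 13,
    "C2-3": 99,
}
problems = find_problems(db, [1, 2])
for line in problems:
    print(line)
```

An empty result would mean that every coder can work through the assigned posts without running into one of the two error messages.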