Data cleaning helps ensure your results are based on valid, high-quality responses. ReDem’s data cleaning feature automates and streamlines this process to help you detect and exclude unreliable data, providing a standardized and transparent approach grounded in ReDem’s comprehensive evaluation framework.
Every respondent evaluated by ReDem undergoes a series of quality checks. These checks generate data points, classification labels and scores, which form the basis of the cleaning logic. When cleaning, you define what is acceptable or unacceptable by setting thresholds for these elements.
OR Condition:
All cleaning options operate as OR conditions.
You must select at least one score as a cleaning condition. This is usually the ReDem Score (R-Score), since it is a comprehensive metric covering all selected quality checks.
You may also add other scores in an OR condition if needed.
Example: If you set an R-Score threshold of 60, all respondents with an R-Score below 60 are flagged. If you also apply an OR condition for Time Score < 20, then even respondents with an acceptable R-Score (e.g., 70) are flagged if their Time Score is below 20 (speeding).
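The OR logic described above can be sketched in a few lines of Python. This is a minimal illustration, not ReDem's implementation: the function name `is_flagged` and the dictionary keys `r_score` and `time_score` are assumptions made for this example.

```python
def is_flagged(scores, thresholds):
    """Return True if ANY score with an active threshold falls below it.

    Both arguments are plain dicts keyed by hypothetical score names.
    """
    return any(
        scores[name] < limit
        for name, limit in thresholds.items()
        if name in scores
    )


# Thresholds from the example: R-Score < 60 OR Time Score < 20.
thresholds = {"r_score": 60, "time_score": 20}

print(is_flagged({"r_score": 70, "time_score": 15}, thresholds))  # True (speeding)
print(is_flagged({"r_score": 70, "time_score": 45}, thresholds))  # False
print(is_flagged({"r_score": 55, "time_score": 45}, thresholds))  # True (low R-Score)
```

Because the conditions are combined with OR, each added threshold can only widen the set of flagged respondents, never narrow it.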
Data Points:
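Data points indicate the number of measurements used to calculate a score. When you set a minimum number of data points, the score threshold is applied only if the score is based on at least that many measurements.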
Example: If you set the Open-Ended Score threshold to 40 with two data points, then any interview with at least two open-ended responses and an overall Open-Ended Score below 40 is excluded.
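The interplay between a threshold and a minimum number of data points can be sketched as follows. The function and parameter names are hypothetical and chosen only for this illustration.

```python
def fails_score_check(score, data_points, threshold, min_data_points):
    """A check fails only if enough data points exist AND the score is too low."""
    return data_points >= min_data_points and score < threshold


# Open-Ended Score threshold of 40 with two data points, as in the example.
print(fails_score_check(score=35, data_points=2, threshold=40, min_data_points=2))  # True
print(fails_score_check(score=35, data_points=1, threshold=40, min_data_points=2))  # False
print(fails_score_check(score=55, data_points=3, threshold=40, min_data_points=2))  # False
```

In the second call the score is below the threshold, but the interview is not excluded because the score rests on only one open-ended response.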
Default Settings:
To simplify the process, ReDem provides recommended default settings that work well for many projects. You can, however, adjust them to match the specific needs of your study. The next two sections first describe the default settings and then explain how to select your own.
Our default settings apply best-practice thresholds to the following metrics:
ReDem Score (R-Score): Respondents with an R-Score below 60 are excluded.
Open-Ended Score (OES) & Response Categories:
Respondents with an OES below 40 are excluded if they provide at least two open-ended responses.
Respondents flagged in at least two open-ended responses for Wrong Language, Bad Language, AI Generated Answer, Nonsense or Wrong Topic are excluded.
Coherence Score (CHS): Respondents with a CHS below 30 are excluded.
Grid-Question Score (GQS): Respondents with a GQS below 20 and at least two valid grid-question responses are excluded.
Time Score (TS): Respondents with a TS below 20 are excluded.
Behavioral Analytics Score (BAS): Respondents with a BAS below 20 and at least two valid BAS data points are excluded.
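The category-based rule in the default settings can be sketched as a simple count over a respondent's open-ended responses. This sketch assumes that each response carries a set of category labels and that a response counts once if it has any of the listed categories; the source does not spell out the exact counting rule, and all names below are hypothetical.

```python
# Categories used by the default open-ended exclusion rule.
DEFAULT_EXCLUDED_CATEGORIES = frozenset({
    "Wrong Language",
    "Bad Language",
    "AI Generated Answer",
    "Nonsense",
    "Wrong Topic",
})


def exceeds_category_limit(response_labels,
                           excluded=DEFAULT_EXCLUDED_CATEGORIES,
                           min_flagged=2):
    """Return True if at least `min_flagged` responses carry an excluded category.

    `response_labels` is a list with one set of category labels per response.
    """
    flagged = sum(1 for labels in response_labels if excluded & set(labels))
    return flagged >= min_flagged


print(exceeds_category_limit([{"Nonsense"}, {"Wrong Topic"}, set()]))  # True
print(exceeds_category_limit([{"Nonsense"}, {"Generic Answer"}]))      # False
```

In the second call only one response is flagged with a listed category, since Generic Answer is not part of the default rule.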
You can define your own thresholds for each quality metric. This enables fine-tuned control over what qualifies as low-quality data based on your specific needs.
Customizable elements include:
ReDem Score (R-Score): Threshold
Open-Ended Score (OES): Threshold + min. number of open-ended responses + category-based exclusion logic
Open-Ended Response Categories:
Generic Answer:
Bad Language:
No Information:
Duplicate Respondent:
Duplicate Answer:
Nonsense:
Wrong Language:
Wrong Topic:
AI Generated Answer:
Time Score (TS): Threshold + min. number of time data points
Grid-Question Score (GQS): Threshold + min. valid grid questions
Coherence Score (CHS): Threshold + min. number of coherence data points
Behavioral Analytics Score (BAS): Threshold + min. valid BAS data points + category-based exclusion logic
BAS Categories:
Unnatural Typing:
Copy and Paste:
A cleaning settings object can be used to customize the cleaning process via the API.
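As an illustration only, such a settings object might be shaped like the Python dictionary below, which mirrors the default thresholds described earlier. Every key name here is an assumption made for this sketch and is not taken from the ReDem API reference; consult the API documentation for the actual schema.

```python
import json

# Hypothetical structure: key names are illustrative, not the real API schema.
cleaning_settings = {
    "redem_score": {"threshold": 60},
    "open_ended_score": {
        "threshold": 40,
        "min_data_points": 2,
        "categories": {
            "names": [
                "Wrong Language",
                "Bad Language",
                "AI Generated Answer",
                "Nonsense",
                "Wrong Topic",
            ],
            "min_flagged_responses": 2,
        },
    },
    "coherence_score": {"threshold": 30},
    "grid_question_score": {"threshold": 20, "min_data_points": 2},
    "time_score": {"threshold": 20},
    "behavioral_analytics_score": {"threshold": 20, "min_data_points": 2},
}

# Serialize to JSON, the usual format for sending settings in a request body.
payload = json.dumps(cleaning_settings, indent=2)
print(payload)
```

Keeping each metric's threshold and minimum data points together in one nested entry makes it straightforward to enable, adjust, or omit a metric independently.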
Reasons for exclusion are clearly documented and based on whether any of the active criteria are triggered. Only one exclusion condition needs to be met for a respondent to be removed. A respondent can be excluded for any of the following reasons:
ReDem Score below threshold
Open-Ended Score below threshold
Open-Ended categories exceed allowed count
Time Score below threshold
Grid-Question Score below threshold
Coherence Score below threshold
Behavioral Analytics Score below threshold
BAS categories (e.g. Copy and Paste, Unnatural Typing) exceed allowed count
This transparent system ensures your data cleaning is auditable, adjustable, and aligned with your quality requirements.
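One way to make exclusions auditable is to collect every triggered reason per respondent rather than stopping at the first one. The sketch below covers only the score-based reasons; the function name and the shape of the `respondent` and `settings` dictionaries are assumptions made for this example.

```python
def exclusion_reasons(respondent, settings):
    """Return the list of score-based exclusion reasons for one respondent.

    `respondent` maps a score name to {"score": ..., "data_points": ...};
    `settings` maps a score name to {"threshold": ..., "min_data_points": ...}.
    """
    reasons = []
    for name, rule in settings.items():
        result = respondent.get(name)
        if result is None:
            continue  # score not available for this respondent
        enough_data = result["data_points"] >= rule.get("min_data_points", 1)
        if enough_data and result["score"] < rule["threshold"]:
            reasons.append(f"{name} below threshold")
    return reasons


settings = {
    "ReDem Score": {"threshold": 60},
    "Time Score": {"threshold": 20},
    "Grid-Question Score": {"threshold": 20, "min_data_points": 2},
}
respondent = {
    "ReDem Score": {"score": 70, "data_points": 5},
    "Time Score": {"score": 15, "data_points": 3},
    "Grid-Question Score": {"score": 10, "data_points": 1},
}

reasons = exclusion_reasons(respondent, settings)
print(reasons)        # ['Time Score below threshold']
print(bool(reasons))  # True: one triggered condition is enough to exclude
```

The Grid-Question Score is low here but does not contribute a reason, because it is based on fewer than the required two data points.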