Why Data Cleaning Matters
Data cleaning helps ensure your results are based on valid, high-quality responses. ReDem’s data cleaning feature automates and streamlines the detection and exclusion of unreliable data, providing a standardized and transparent approach grounded in ReDem’s comprehensive evaluation framework.
How Cleaning Works
Every respondent evaluated by ReDem undergoes a series of quality checks. These checks generate data points, classification labels and scores, which form the basis of the cleaning logic. When cleaning, you define what is acceptable or unacceptable by setting thresholds for these elements.
OR Condition:
- All cleaning options operate as OR conditions: a respondent who fails any single condition is flagged.
- You must select at least one score as a cleaning condition - usually the ReDem Score, since it is a comprehensive metric covering all selected quality checks.
- You may also add other scores in an OR condition if needed.
Example:
If you set an R-Score threshold of 60, all respondents below 60 are flagged. If you also apply an OR condition for Time Score < 20, then even respondents with a valid R-Score (e.g., 70) are flagged if their Time Score is below 20 (speeding).
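The OR logic in this example can be sketched in a few lines of Python. This is a minimal illustration, not ReDem's implementation: the score keys `r_score` and `time_score` and the `is_flagged` helper are hypothetical names, and only the thresholds (60 and 20) come from the example.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical rule type: a predicate over a respondent's scores; True means "flag".
Rule = Callable[[Mapping[str, float]], bool]


def is_flagged(scores: Mapping[str, float], rules: Sequence[Rule]) -> bool:
    # OR logic: failing any single rule is enough to flag the respondent.
    return any(rule(scores) for rule in rules)


# Hypothetical score keys; thresholds taken from the example above.
rules = [
    lambda s: s["r_score"] < 60,     # R-Score threshold
    lambda s: s["time_score"] < 20,  # Time Score threshold (speeding)
]

print(is_flagged({"r_score": 55, "time_score": 50}, rules))  # True: low R-Score
print(is_flagged({"r_score": 70, "time_score": 15}, rules))  # True: speeding
print(is_flagged({"r_score": 70, "time_score": 50}, rules))  # False: passes both
```

Adding another condition only means appending another predicate to the list; it can never "rescue" a respondent that an existing condition already flags.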
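Data points indicate the number of measurements used to calculate a score. Setting a minimum number of data points means a score threshold only applies once the score is based on at least that many measurements.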
Example:
If you set the Open-Ended Score threshold to 40 and a minimum of two data points, then any interview with at least two open-ended responses and an overall Open-Ended Score below 40 is excluded.
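A minimal sketch of how a threshold combined with a minimum number of data points could be evaluated. The function name `fails_with_min_data_points` is hypothetical; the values (threshold 40, minimum of two data points) come from the example above.

```python
def fails_with_min_data_points(score: float, data_points: int,
                               threshold: float, min_data_points: int) -> bool:
    """Return True if the score should trigger exclusion.

    The threshold only applies once the score rests on at least
    `min_data_points` measurements; with fewer, the score is not used.
    """
    return data_points >= min_data_points and score < threshold


# Open-Ended Score threshold 40, minimum of two data points (as in the example).
print(fails_with_min_data_points(35, 2, 40, 2))  # True: enough responses, score too low
print(fails_with_min_data_points(35, 1, 40, 2))  # False: only one open-ended response
print(fails_with_min_data_points(55, 3, 40, 2))  # False: score above threshold
```

The minimum acts as a guard against judging a respondent on too little evidence: a single weak open-ended answer does not exclude anyone under this setting.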
To simplify the process, ReDem provides recommended default settings that work well for many projects. You can, however, adjust them to match the specific needs of your study. The next two sections first describe the default settings and then explain how to select your own.
1. ReDem Recommended Cleaning Settings
Our default settings apply best-practice thresholds to the following metrics:
- ReDem Score (R-Score): Respondents with an R-Score below 60 are excluded.
- Open-Ended Score (OES) & Response Categories: Respondents with an OES below 40 are excluded if they provide at least two open-ended responses. Respondents flagged in at least two open-ended responses for wrong language, bad language, AI-generated answers, nonsense or wrong topic are excluded.
- Coherence Score (CHS): Respondents with a CHS below 30 are excluded.
- Grid-Question Score (GQS): Respondents with a GQS below 20 and at least two valid grid-question responses are excluded.
- Time Score (TS): Respondents with a TS below 20 are excluded.
- Behavioral Analytics Score (BAS): Respondents with a BAS below 20 and at least two valid BAS data points are excluded.
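The recommended defaults above can be written down as a small rule table and evaluated in one pass. This is an illustrative sketch, not ReDem's internal logic: the metric keys (`r_score`, `oes`, `chs`, `gqs`, `ts`, `bas`), the category keys, and the helper names are hypothetical. It also assumes that open-ended categories are counted separately per category (a respondent needs two flags in the same category), which the defaults do not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ScoreRule:
    threshold: float          # exclude if the score is below this value
    min_data_points: int = 0  # 0 = no minimum stated for this metric


# Hypothetical encoding of the recommended defaults listed above.
DEFAULT_SCORE_RULES: Dict[str, ScoreRule] = {
    "r_score": ScoreRule(60),
    "oes": ScoreRule(40, min_data_points=2),
    "chs": ScoreRule(30),
    "gqs": ScoreRule(20, min_data_points=2),
    "ts": ScoreRule(20),
    "bas": ScoreRule(20, min_data_points=2),
}

# Open-ended categories checked by default; ASSUMED to be counted per category.
DEFAULT_OE_CATEGORIES = ("wrong_language", "bad_language", "ai_generated",
                         "nonsense", "wrong_topic")
DEFAULT_OE_CATEGORY_LIMIT = 2  # excluded when flagged in at least this many responses


def default_exclusion_reasons(scores: Dict[str, float],
                              data_points: Dict[str, int],
                              oe_flags: Dict[str, int]) -> List[str]:
    reasons = []
    for name, rule in DEFAULT_SCORE_RULES.items():
        score = scores.get(name)
        if score is None:
            continue  # metric not measured for this respondent
        if data_points.get(name, 0) >= rule.min_data_points and score < rule.threshold:
            reasons.append(f"{name} below {rule.threshold}")
    for category in DEFAULT_OE_CATEGORIES:
        if oe_flags.get(category, 0) >= DEFAULT_OE_CATEGORY_LIMIT:
            reasons.append(f"{category} flagged in {oe_flags[category]} responses")
    return reasons  # an empty list means the respondent is kept


print(default_exclusion_reasons({"r_score": 72, "oes": 35, "ts": 45},
                                {"oes": 3}, {"ai_generated": 1}))
# ['oes below 40']
```

Returning a list of reasons rather than a single boolean keeps the OR logic visible: the respondent is excluded as soon as the list is non-empty, and every failed condition is still reported.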
2. Custom Cleaning Settings
You can define your own thresholds for each quality metric. This enables fine-tuned control over what qualifies as low-quality data based on your specific needs. Customizable elements include:
- ReDem Score (R-Score): Threshold
- Open-Ended Score (OES): Threshold + min. number of open-ended responses + category-based exclusion logic
  - Open-Ended Response Categories:
    - Generic Answer
    - Bad Language
    - No Information
    - Duplicate Respondent
    - Duplicate Answer
    - Nonsense
    - Wrong Language
    - Wrong Topic
    - AI Generated Answer
- Time Score (TS): Threshold + min. number of time data points
- Grid-Question Score (GQS): Threshold + min. valid grid questions
- Coherence Score (CHS): Threshold + min. number of coherence data points
- Behavioral Analytics Score (BAS): Threshold + min. valid BAS data points + category-based exclusion logic
  - BAS Categories:
    - Unnatural Typing
    - Copy and Paste
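One way to model the customizable elements listed above is a small configuration object that also enforces the rule that at least one score must be selected as a cleaning condition. This is a hedged sketch: the class names `MetricSetting` and `CleaningConfig`, the metric and category keys, and the interpretation of a category limit as "exclude once flagged this many times" are all assumptions for illustration, not ReDem's actual settings format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MetricSetting:
    threshold: float
    min_data_points: int = 0
    # ASSUMED semantics: exclude once a category is flagged this many times.
    category_limits: Dict[str, int] = field(default_factory=dict)


@dataclass
class CleaningConfig:
    metrics: Dict[str, MetricSetting]

    def __post_init__(self) -> None:
        # At least one score must act as a cleaning condition.
        if not self.metrics:
            raise ValueError("select at least one score as a cleaning condition")
        for name, setting in self.metrics.items():
            if setting.min_data_points < 0:
                raise ValueError(f"{name}: min_data_points must be >= 0")
            for category, limit in setting.category_limits.items():
                if limit < 1:
                    raise ValueError(f"{name}/{category}: limit must be >= 1")


# Hypothetical custom configuration with per-category limits for OES and BAS.
custom = CleaningConfig(metrics={
    "r_score": MetricSetting(threshold=50),
    "oes": MetricSetting(threshold=40, min_data_points=2,
                         category_limits={"generic_answer": 2,
                                          "ai_generated_answer": 1}),
    "bas": MetricSetting(threshold=20, min_data_points=2,
                         category_limits={"copy_and_paste": 2,
                                          "unnatural_typing": 2}),
})
print(sorted(custom.metrics))  # ['bas', 'oes', 'r_score']
```

Validating in `__post_init__` rejects an invalid configuration at construction time, before any respondent is evaluated against it.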
What Is the Outcome of the Cleaning Process?
Each respondent is classified as either:
- Included
- Excluded, due to:
  - ReDem Score below threshold
  - Open-Ended Score below threshold
  - Open-Ended categories exceed allowed count
  - Time Score below threshold
  - Grid-Question Score below threshold
  - Coherence Score below threshold
  - Behavioral Analytics Score below threshold
  - BAS categories (e.g., Copy and Paste, Unnatural Typing) exceed allowed count
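The two possible outcomes and the exclusion reasons listed above map naturally onto an enum plus a small result record. The names `ExclusionReason` and `CleaningResult` and the respondent IDs are hypothetical; the reason texts are copied from the list above. The sketch assumes a result may carry more than one reason, since any single reason is enough to exclude under OR logic.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ExclusionReason(Enum):
    R_SCORE = "ReDem Score below threshold"
    OES = "Open-Ended Score below threshold"
    OE_CATEGORIES = "Open-Ended categories exceed allowed count"
    TIME_SCORE = "Time Score below threshold"
    GQS = "Grid-Question Score below threshold"
    CHS = "Coherence Score below threshold"
    BAS = "Behavioral Analytics Score below threshold"
    BAS_CATEGORIES = "BAS categories exceed allowed count"


@dataclass
class CleaningResult:
    respondent_id: str
    reasons: List[ExclusionReason] = field(default_factory=list)

    @property
    def status(self) -> str:
        # Under OR logic, a single reason is enough to exclude.
        return "Excluded" if self.reasons else "Included"


results = [
    CleaningResult("r1"),
    CleaningResult("r2", [ExclusionReason.TIME_SCORE]),
    CleaningResult("r3", [ExclusionReason.R_SCORE, ExclusionReason.OE_CATEGORIES]),
]
for r in results:
    print(r.respondent_id, r.status, [reason.value for reason in r.reasons])
```

Deriving the status from the reasons list means "Included" and "Excluded" can never disagree with the recorded reasons.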
Example Case: Cleaning in Practice
Here’s a sample configuration and its impact on a respondent. A respondent is excluded if they:
- score below 60 on the ReDem Score
- have 2+ generic answers
- have 2+ “No Information” responses
- or have even just 1 flagged AI-generated response (due to its lower tolerance threshold)
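The sample configuration can be checked with a short function. The category keys and the `is_excluded` helper are hypothetical names for illustration; the thresholds (R-Score below 60, two generic answers, two "No Information" responses, one AI-generated response) are the ones from the example.

```python
from typing import Dict

# Hypothetical encoding of the sample configuration above.
R_SCORE_THRESHOLD = 60
CATEGORY_LIMITS: Dict[str, int] = {
    "generic_answer": 2,
    "no_information": 2,
    "ai_generated_answer": 1,  # lower tolerance: a single flag is enough
}


def is_excluded(r_score: float, category_counts: Dict[str, int]) -> bool:
    # OR logic: a low R-Score or any category reaching its limit excludes.
    if r_score < R_SCORE_THRESHOLD:
        return True
    return any(category_counts.get(cat, 0) >= limit
               for cat, limit in CATEGORY_LIMITS.items())


print(is_excluded(72, {"generic_answer": 1, "no_information": 1}))  # False
print(is_excluded(72, {"no_information": 2}))                        # True
print(is_excluded(72, {"ai_generated_answer": 1}))                   # True
print(is_excluded(55, {}))                                           # True
```

The first respondent survives because no single category reaches its limit, even though two different categories were flagged once each.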

