Data cleaning helps ensure your results are based on valid, high-quality responses. ReDem’s data cleaning feature automates and streamlines this process, helping you detect and exclude unreliable data through a standardized, transparent approach grounded in ReDem’s comprehensive evaluation framework.
Every respondent evaluated by ReDem goes through a series of quality checks. These checks produce scores and classification labels, which form the basis of the cleaning logic.
ReDem offers two cleaning approaches: default settings and custom thresholds.
Our default settings apply best-practice thresholds to the following metrics:
ReDem Score (R-Score): Respondents with an R-Score below 60 are excluded.
Open-Ended Score (OES) & Response Categories:
Respondents with an OES below 40 are excluded if they provide at least two open-ended responses.
Respondents flagged in at least two open-ended responses for Wrong Language, Bad Language, AI Generated Answer, Nonsense, or Wrong Topic are excluded.
Coherence Score (CHS): Respondents with a CHS below 30 are excluded.
Grid-Question Score (GQS): Respondents with a GQS below 20 and at least two valid grid-question responses are excluded.
Time Score (TS): Respondents with a TS below 20 are excluded.
Behavioral Analytics Score (BAS): Respondents with a BAS below 20 and at least two valid BAS data points are excluded.
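The default rules above can be sketched as a small function. This is only an illustration under stated assumptions, not ReDem's implementation: the `Respondent` fields, the category strings, and the function name are hypothetical, and the sketch assumes that flagged open-ended responses are counted together across the listed categories.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Categories that count toward the default category-based exclusion.
# The string spellings are hypothetical, not ReDem API values.
FLAGGED_CATEGORIES = {
    "wrong_language", "bad_language", "ai_generated_answer",
    "nonsense", "wrong_topic",
}


@dataclass
class Respondent:
    # All field names are hypothetical; None means the score is unavailable.
    r_score: Optional[float] = None
    oes: Optional[float] = None
    # One label per open-ended response; None means the response is unflagged.
    open_ended_labels: List[Optional[str]] = field(default_factory=list)
    chs: Optional[float] = None
    gqs: Optional[float] = None
    valid_grid_questions: int = 0
    ts: Optional[float] = None
    bas: Optional[float] = None
    valid_bas_points: int = 0


def below(score: Optional[float], threshold: float) -> bool:
    """True only if the score exists and is strictly below the threshold."""
    return score is not None and score < threshold


def default_exclusion_reasons(r: Respondent) -> List[str]:
    """Return every default rule the respondent triggers (empty list = keep)."""
    reasons = []
    if below(r.r_score, 60):
        reasons.append("ReDem Score below threshold")
    if below(r.oes, 40) and len(r.open_ended_labels) >= 2:
        reasons.append("Open-Ended Score below threshold")
    # Assumption: flagged responses are counted across all listed categories.
    flagged = sum(1 for c in r.open_ended_labels if c in FLAGGED_CATEGORIES)
    if flagged >= 2:
        reasons.append("Open-Ended categories exceed allowed count")
    if below(r.chs, 30):
        reasons.append("Coherence Score below threshold")
    if below(r.gqs, 20) and r.valid_grid_questions >= 2:
        reasons.append("Grid-Question Score below threshold")
    if below(r.ts, 20):
        reasons.append("Time Score below threshold")
    if below(r.bas, 20) and r.valid_bas_points >= 2:
        reasons.append("Behavioral Analytics Score below threshold")
    return reasons
```

For instance, a respondent with an OES of 35 and only one open-ended response would not trigger the OES rule in this sketch, because the minimum of two responses is not met.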
You can define your own thresholds for each quality metric. This enables fine-tuned control over what qualifies as low-quality data based on your specific needs.
Customizable elements include:
ReDem Score (R-Score): Threshold
Open-Ended Score (OES): Threshold + min. number of open-ended responses + category-based exclusion logic
Open-Ended Response Categories:
Generic Answer:
Bad Language:
No Information:
Duplicate Respondent:
Duplicate Answer:
Nonsense:
Wrong Language:
Wrong Topic:
AI Generated Answer:
Time Score (TS): Threshold + min. number of time data points
Grid-Question Score (GQS): Threshold + min. valid grid questions
Coherence Score (CHS): Threshold + min. number of coherence data points
Behavioral Analytics Score (BAS): Threshold + min. valid BAS data points + category-based exclusion logic
BAS Categories:
Unnatural Typing:
Copy and Paste:
Here’s an example of a cleaning settings object that can be used to customize the cleaning process via the API:
Reasons for exclusion are clearly documented and based on whether any of the active criteria are triggered. Only one exclusion condition needs to be met for a respondent to be removed. Possible exclusion reasons are:
ReDem Score below threshold
Open-Ended Score below threshold
Open-Ended categories exceed allowed count
Time Score below threshold
Grid-Question Score below threshold
Coherence Score below threshold
Behavioral Analytics Score below threshold
BAS categories (e.g. Copy and Paste, Unnatural Typing) exceed allowed count
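The "any active criterion triggers exclusion" logic, combined with custom thresholds, might look like the following sketch. The `Criterion` structure, the metric keys, and the function names are hypothetical and do not reflect the ReDem API settings schema. Category-based criteria are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Criterion:
    # Hypothetical structure; not the ReDem API settings schema.
    label: str                # exclusion reason reported when triggered
    threshold: float          # scores strictly below this value trigger it
    min_data_points: int = 0  # data points required before the rule applies
    active: bool = True       # inactive criteria are never evaluated


def exclusion_reasons(
    scores: Dict[str, float],
    data_points: Dict[str, int],
    criteria: Dict[str, Criterion],
) -> List[str]:
    """Collect the label of every active criterion the respondent triggers."""
    reasons = []
    for metric, criterion in criteria.items():
        if not criterion.active or metric not in scores:
            continue
        if data_points.get(metric, 0) < criterion.min_data_points:
            continue
        if scores[metric] < criterion.threshold:
            reasons.append(criterion.label)
    return reasons


def is_excluded(reasons: List[str]) -> bool:
    """One triggered criterion is enough to remove the respondent."""
    return len(reasons) > 0


# Example: a stricter Time Score rule and a disabled Coherence Score rule.
criteria = {
    "ts": Criterion("Time Score below threshold", 25, min_data_points=3),
    "chs": Criterion("Coherence Score below threshold", 30, active=False),
}
reasons = exclusion_reasons({"ts": 22, "chs": 10}, {"ts": 4}, criteria)
```

Returning the list of triggered labels rather than a single boolean keeps every exclusion traceable to the criteria that caused it.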
This transparent system ensures your data cleaning is auditable, adjustable, and aligned with your quality requirements.