Dataset Cleaner

Remove duplicates, filter offensive content, and prepare clean datasets for fine-tuning

Input Dataset

Cleaned Dataset

Length Filters

Minimum length: 10
Maximum length: 1000
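A minimal sketch of the length check, using the page's defaults of 10 and 1000. Counting characters rather than tokens, and treating both bounds as inclusive, are assumptions; the function name `within_length` is hypothetical.

```python
def within_length(text: str, min_len: int = 10, max_len: int = 1000) -> bool:
    """True if the character count of `text` lies in [min_len, max_len].

    ASSUMPTION: length is measured in characters (len()), not tokens,
    and both bounds are inclusive.
    """
    return min_len <= len(text) <= max_len


texts = ["too short", "just long enough", "x" * 1001]
# "too short" has 9 characters and "x" * 1001 has 1001, so only the middle one survives.
print([t for t in texts if within_length(t)])
```

If the target model's context limit matters more than raw size, a token count from the model's own tokenizer would be the more faithful measure.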

Cleaning Statistics

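Total Lines: number of lines in the input dataset
Cleaned Lines: number of lines kept after cleaning
Retention Rate: cleaned lines as a percentage of total lines
Removed: total lines minus cleaned lines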
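The four statistics can be derived from two counts. This sketch assumes Retention Rate is cleaned lines divided by total lines and that an empty input reports 0% (which matches the 0% shown when no data is loaded); `cleaning_stats` is a hypothetical helper name.

```python
def cleaning_stats(total_lines: int, cleaned_lines: int) -> dict:
    """Compute Total Lines, Cleaned Lines, Retention Rate (%) and Removed."""
    removed = total_lines - cleaned_lines
    # Guard against division by zero when the input is empty; report 0%.
    retention_rate = 100.0 * cleaned_lines / total_lines if total_lines else 0.0
    return {
        "total_lines": total_lines,
        "cleaned_lines": cleaned_lines,
        "retention_rate": retention_rate,
        "removed": removed,
    }


stats = cleaning_stats(total_lines=200, cleaned_lines=150)
print(f"Retention Rate: {stats['retention_rate']:.1f}%")  # Retention Rate: 75.0%
print(f"Removed: {stats['removed']}")  # Removed: 50
```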

Cleaning Rules

Remove Duplicates: remove exact duplicate lines.
Remove Empty Lines: remove lines whose text field is empty.
Remove Too Short: remove lines whose text is shorter than the minimum length.
Remove Too Long: remove lines whose text is longer than the maximum length.
Remove Offensive Content: filter out lines containing potentially offensive language.
Remove Invalid JSON: remove lines that are not valid JSON.
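The rules above can be sketched as a single pass over JSON Lines input. Several details are assumptions rather than documented behavior: the text lives under a field named `"text"`, lengths are character counts with inclusive bounds, the rules run in the order shown in the code, and the offensive-content check is a naive word blocklist with placeholder entries (`BLOCKLIST`, `is_offensive` and `clean_lines` are all hypothetical names).

```python
import json
import re

# HYPOTHETICAL placeholder blocklist: substitute a curated word list or a
# trained toxicity classifier in practice.
BLOCKLIST = {"badword1", "badword2"}


def is_offensive(text: str) -> bool:
    """Naive check: True if any lowercase word token appears in BLOCKLIST."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return any(word in BLOCKLIST for word in words)


def clean_lines(lines, min_len=10, max_len=1000, text_field="text"):
    """Return the input lines that pass every cleaning rule, in original order.

    ASSUMPTION: JSON Lines input with the text stored under `text_field`.
    """
    seen = set()
    kept = []
    for line in lines:
        # Remove Duplicates: skip a line identical to one already kept.
        if line in seen:
            continue
        # Remove Invalid JSON: skip lines that fail to parse.
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        text = record.get(text_field) if isinstance(record, dict) else None
        # Remove Empty Lines: skip missing, non-string, or whitespace-only text.
        if not isinstance(text, str) or not text.strip():
            continue
        # Remove Too Short / Remove Too Long: character counts, bounds inclusive.
        if len(text) < min_len or len(text) > max_len:
            continue
        # Remove Offensive Content (naive blocklist match).
        if is_offensive(text):
            continue
        seen.add(line)
        kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [
        '{"text": "Hello there, this is fine."}',
        '{"text": "Hello there, this is fine."}',  # exact duplicate
        "not json",                                # invalid JSON
        '{"text": "   "}',                         # empty text
        '{"text": "short"}',                       # under 10 characters
        '{"text": "this contains badword1 somewhere"}',
    ]
    print(clean_lines(sample))
```

Checking duplicates first is cheap, and only kept lines enter `seen`; a repeat of a rejected line would fail the same deterministic checks anyway. Deduplicating on the raw line means records that differ only in key order or whitespace are treated as distinct, which matches "exact duplicate" but may be stricter than some pipelines want.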