Dataset Cleaner
Remove duplicates, filter offensive content, and prepare clean datasets for fine-tuning
Input Dataset: 0 lines
Cleaned Dataset: 0 lines
Length Filters
Minimum length: 10
Maximum length: 1000
Cleaning Statistics
Total Lines: 0
Cleaned Lines: 0
Retention Rate: 0%
Removed: 0
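The statistics above can be derived from two counts. The sketch below assumes that the retention rate is the share of input lines that survive cleaning, expressed as a percentage, and that an empty input reports 0% (matching the empty state shown). The function name `cleaning_stats` and the dictionary keys are hypothetical, not taken from the tool.

```python
def cleaning_stats(total_lines: int, cleaned_lines: int) -> dict:
    """Compute summary statistics for a cleaning run.

    Assumption: retention rate = cleaned / total * 100, and 0.0 when
    there is no input (avoids division by zero).
    """
    removed = total_lines - cleaned_lines
    retention = (cleaned_lines / total_lines * 100) if total_lines else 0.0
    return {
        "total_lines": total_lines,
        "cleaned_lines": cleaned_lines,
        "retention_rate": retention,
        "removed": removed,
    }


# Example: 200 input lines, 150 kept after cleaning.
stats = cleaning_stats(200, 150)
print(f"{stats['retention_rate']:.0f}% retained, {stats['removed']} removed")
```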
Cleaning Rules
Remove Duplicates: Remove exact duplicate lines
Remove Empty Lines: Remove lines with empty text fields
Remove Too Short: Remove text shorter than minimum length
Remove Too Long: Remove text longer than maximum length
Remove Offensive Content: Filter out potentially offensive language
Remove Invalid JSON: Remove lines that are not valid JSON
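The rules above could be combined into a single pass over the input. The following is a minimal sketch, not the tool's actual implementation. It assumes that each line is a JSON object with a `"text"` field, that length is measured in characters, that the rules run in the order shown in the code, and that offensive content is detected with a simple word blocklist. The `BLOCKLIST` entries are placeholders, and the function name `clean_dataset` is hypothetical.

```python
import json
import re

# Placeholder terms for illustration only; a real filter would need a
# curated lexicon or a classifier.
BLOCKLIST = {"badword", "slur"}


def clean_dataset(lines, min_len=10, max_len=1000, blocklist=BLOCKLIST):
    """Apply the cleaning rules to an iterable of JSONL lines.

    Returns (kept_lines, removed_counts). The rule order and the
    "text" field name are assumptions of this sketch.
    """
    seen = set()
    kept = []
    removed = {
        "invalid_json": 0, "empty": 0, "too_short": 0,
        "too_long": 0, "offensive": 0, "duplicate": 0,
    }
    for raw in lines:
        line = raw.strip()
        if not line:
            removed["empty"] += 1
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            removed["invalid_json"] += 1
            continue
        # Valid JSON without a usable text field counts as empty.
        text = record.get("text") if isinstance(record, dict) else None
        if not isinstance(text, str) or not text.strip():
            removed["empty"] += 1
            continue
        if len(text) < min_len:
            removed["too_short"] += 1
            continue
        if len(text) > max_len:
            removed["too_long"] += 1
            continue
        # Whole-word match against the blocklist, case-insensitive.
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & blocklist:
            removed["offensive"] += 1
            continue
        # Exact duplicates are detected on the stripped line.
        if line in seen:
            removed["duplicate"] += 1
            continue
        seen.add(line)
        kept.append(line)
    return kept, removed
```

Checking duplicates last means a line is only remembered once it has passed every other rule, so the `seen` set never grows with lines that would be discarded anyway. A whole-word blocklist is crude: it misses obfuscated spellings and can flag harmless uses, so the threshold for what counts as offensive would need review for any real dataset.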