Intro to cleaning data
Suggested workflow
This page is a synopsis of the previous sections, in the order that I prefer to work.
- If importing CSV, retain leading zeros
- Make sure data was imported properly and column data matches the column headers
- Save a copy immediately
- Delete blank rows within the data
- Consolidate column headers into a single row
- Format all integer columns to eliminate commas (thousands separators)
- Find and replace special characters (&, !, ~, etc.). When in doubt, replace with a space, underscore, or hyphen
- Check spelling (especially in columns you plan to Join)
- Check capitalization (especially in columns you plan to Join)
- Check abbreviations (especially in columns you plan to Join)
- Keep totals from the original data to check against. Comparing them after cleaning helps confirm that no data was lost.
- Manipulate data in columns last
- Delete what you don’t need
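
For readers who clean data with a script rather than a spreadsheet, a few of the steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a complete workflow: the sample data, the column names (`zip`, `city`, `sales`), and the `clean_rows` helper are all hypothetical, and the set of special characters is only the three named in the list.

```python
import csv
import io
import re

# Hypothetical sample data: a leading-zero ZIP code, a blank row,
# a comma-formatted integer, and a special character in a text column.
RAW_CSV = """zip,city,sales
02134,Boston,"1,200"
,,
08540,princeton & area,300
"""


def clean_rows(text):
    """Read CSV text and apply a few of the checklist steps."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cleaned = []
    for row in reader:
        # Delete blank rows: skip rows where every cell is empty.
        if not any(cell.strip() for cell in row):
            continue
        record = dict(zip(header, row))
        # csv.reader returns every cell as a string, so the leading
        # zero in "02134" is retained as long as we do not call int().
        # Remove commas from the integer column before converting it.
        record["sales"] = int(record["sales"].replace(",", ""))
        # Replace special characters with a space, then collapse repeats.
        city = re.sub(r"[&!~]", " ", record["city"])
        city = re.sub(r"\s+", " ", city).strip()
        # Make capitalization consistent in a column that may be joined.
        record["city"] = city.title()
        cleaned.append(record)
    return cleaned


rows = clean_rows(RAW_CSV)

# Check against a total noted from the original data (assumed here).
original_total = 1200 + 300
assert sum(r["sales"] for r in rows) == original_total
```

Reading every cell as text first and converting only the columns that are truly numeric is what keeps the leading zeros intact; the final assertion plays the role of the "keep totals" check.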

