Intro to cleaning data

Suggested workflow

This page summarizes the previous sections, in the order that I prefer to work.

  • If importing a CSV file, retain leading zeros (import those columns as text)
  • Make sure data was imported properly and column data matches the column headers
  • Save a copy immediately
  • Delete blank rows within the data
  • Consolidate column headers to a single row
  • Format all integer columns to remove comma separators (1,200 becomes 1200)
  • Find and replace special characters (&, !, ~, etc.). When in doubt, replace with a space, underscore, or hyphen
  • Check spelling (especially in columns you plan to Join)
  • Check capitalization (especially in columns you plan to Join)
  • Check abbreviations (especially in columns you plan to Join)
  • Keep totals from the original data to check against, so you can confirm that no data was lost.
  • Manipulate data in columns last.
  • Delete what you don’t need
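
Several of the steps above can also be scripted. The following is a minimal sketch in Python using only the standard library, not a definitive method. The sample data, the column names (zip, city, population), and the function names clean_rows and column_total are all hypothetical, made up for illustration. It keeps leading zeros by leaving values as text, skips blank rows, removes commas from an integer column, replaces special characters with a space, normalizes capitalization in a column intended for a join, and checks a total against the original data.

```python
import csv
import io
import re

# Hypothetical sample data standing in for an imported CSV file.
RAW = """zip,city,population
02134,boston & allston,"1,200"
,,
00501,Holtsville!,"3,450"
"""


def clean_rows(text, int_columns=("population",), join_columns=("city",)):
    """Return cleaned rows as a list of dicts.

    The csv module reads every value as a string, so leading zeros
    (for example in the zip column) are retained unless we convert them.
    """
    reader = csv.DictReader(io.StringIO(text))
    cleaned = []
    for row in reader:
        # Skip blank rows: every cell is empty or whitespace.
        if not any((value or "").strip() for value in row.values()):
            continue
        for col in int_columns:
            # Remove comma separators, then convert to an integer.
            row[col] = int(row[col].replace(",", ""))
        for col in join_columns:
            # Replace special characters with a space, collapse repeated
            # whitespace, and use one capitalization style for joins.
            value = re.sub(r"[^A-Za-z0-9 _-]", " ", row[col])
            row[col] = re.sub(r"\s+", " ", value).strip().title()
        cleaned.append(row)
    return cleaned


def column_total(text, column):
    """Total of an integer column in the original text, to check against."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(
        int(r[column].replace(",", ""))
        for r in reader
        if (r[column] or "").strip()
    )


rows = clean_rows(RAW)
original_total = column_total(RAW, "population")
cleaned_total = sum(r["population"] for r in rows)
print(rows)
print("totals match:", original_total == cleaned_total)
```

Comparing the total before and after cleaning is the scripted version of keeping totals from the original data: if the two numbers differ, a row or value was lost somewhere along the way. Spelling and abbreviation checks still need a human eye, so they are left out of this sketch.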