Chihuahua syndrome

Chihuahua syndrome refers to messy data caused by variations in spelling or input: Chihuahua is easy to misspell. The quality of your data matters, and errors can creep in anywhere, particularly when people enter data by hand. Garbage in, garbage out.

Here’s Chris Groskopf, quoted in Seeing with Fresh Eyes: Meaning, Space, Data, Truth by Edward Tufte:

“There is no worse way to screw up data than to let a single human type it in, without validation. I acquired a complete dog licensing database. Instead of requiring people registering their dog to choose a breed from a list, the system gave dog owners a text field to type into, so this database had 250 spellings of Chihuahua. Even the best tools can’t save messy data. Beware of human-entered data.”

Chris Groskopf

Capitals, spaces, misspellings, hyphens, numbers stored as text, digits entered as letters (I for 1, O for 0), accents, straight or curly apostrophes, dates out of order, languages, dialects, abbreviations, and more can all mislead your analysis.
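
To make this concrete, here is a minimal Python sketch of one way to tame a free-text breed field: normalize each entry, then match it against a known list. It is an illustration under assumptions, not a recommended pipeline. The breed list, the function names, the choice to treat hyphens as spaces, and the 0.75 similarity cutoff are all made up for this example, and a real dataset would need its own rules.

```python
import difflib
import unicodedata
from typing import Optional

# Hypothetical canonical list; in practice this comes from your own reference data.
KNOWN_BREEDS = ["chihuahua", "labrador retriever", "poodle"]


def normalize(text: str) -> str:
    """Fold case, accents, curly apostrophes, hyphens, and extra spaces."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Straighten curly apostrophes; treating hyphens as spaces is an assumption.
    text = text.replace("\u2019", "'").replace("\u2018", "'").replace("-", " ")
    # Lowercase aggressively and collapse runs of whitespace.
    return " ".join(text.casefold().split())


def match_breed(raw: str, cutoff: float = 0.75) -> Optional[str]:
    """Return the closest known breed, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        normalize(raw), KNOWN_BREEDS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


for entry in ["  Chi-Huahua ", "Chiuahua", "Chihuahuá", "POODLE", "xyz"]:
    print(repr(entry), "->", match_breed(entry))
```

Entries that come back as None are the ones to review by hand. Fuzzy matching only narrows the pile, and a cutoff that is too loose will quietly merge breeds that are actually different.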

Spend time with your data.

The name "Chihuahua syndrome" comes from Edward Tufte.

Originally posted on Sketchplanations