miedvied wrote: BTW, wtf is up with data messiness? I mean, really, why the hell can't people ever put things in in something resembling a clean format? I consulted at an academic MD's office and their access DB looked like cocaine-fueled apes threw feces at the keyboard.
The basic answer is that the data you receive is never really designed for you.
On one end, you have data designed for accounting, tracking, or other purposes. For instance, data at a doctor's office might be designed only for the purpose of looking up accounts and forwarding information to insurance companies. Things like missing values, inconsistent coding methods, etc., might not matter for those purposes. In my experience, data that we get is usually for basic recording purposes, and we have to figure out a way to clean and format it to run several different tests (and you never know which ones will pan out until you get to the end, meaning a lot of dead-ends in the meantime).
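To make that concrete, here's a minimal sketch of the kind of cleanup step involved. The column, codes, and missing-value sentinels are all made up for illustration; real files will have their own quirks:

```python
# Hypothetical example: a "sex" column entered inconsistently by hand.
# The sentinel values and code mappings below are assumptions, not from
# any real dataset.
MISSING = {"", "na", "n/a", ".", "-9", "unknown"}

SEX_CODES = {"m": "male", "male": "male", "1": "male",
             "f": "female", "female": "female", "2": "female"}

def clean_sex(raw):
    """Map one messy 'sex' entry to 'male', 'female', or None."""
    value = str(raw).strip().lower()
    if value in MISSING:
        return None                      # treat sentinels as missing
    return SEX_CODES.get(value)          # unrecognized codes also become None

messy = ["M", " male ", "2", "NA", "-9", "Female"]
cleaned = [clean_sex(v) for v in messy]
# cleaned == ['male', 'male', 'female', None, None, 'female']
```

And that's one column; multiply by every field in the database, each with its own ad hoc conventions.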
On the other end, even data designed for statisticians is going to involve a lot of cleaning (e.g., data provided by the US Census Bureau, BLS, CDC, etc.). This is because the data has to be designed so that anyone can use it, regardless of the software, operating system, encoding method (like ASCII, EBCDIC, Unicode), etc. Some places are pretty experienced and provide, say, the raw data plus programs to encode it in a dozen different ways, but usually all you'll have is the codebook and data.
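"Codebook and data" in practice often means a fixed-width text file plus a document telling you which columns hold which variable. A rough sketch of the parsing step, with made-up field names and positions standing in for a real codebook entry:

```python
# Hypothetical codebook: "age in columns 1-2, state in 3-4, income in 5-10".
# Field names and column positions are invented for illustration.
CODEBOOK = [("age", 0, 2), ("state", 2, 4), ("income", 4, 10)]

def parse_record(line):
    """Slice one fixed-width record into a dict per the codebook."""
    return {name: line[start:end].strip() for name, start, end in CODEBOOK}

record = parse_record("42CA005500")
# record == {"age": "42", "state": "CA", "income": "005500"}
```

Plain text sliced by position works on any OS and in any stats package, which is exactly why agencies distribute data this way; the cost is that you do the decoding yourself.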
The kicker is that the more useful your analysis, the more work is gonna be involved in reshaping the data. If the data is already nice and clean, chances are it's because there are a ton of people who've already done all the analyses possible and squeezed a bunch of info out of the data (which is what you usually deal with while you're a student). If it's some new problem that people haven't tackled before, or if it's all-new data that not many people have checked out yet, then you're more likely to be on your own.
Phew. That was more detail than I thought I'd get into... KF