Even More Data Horror Stories

In a world full of corrupted drives and malware wolves, it’s nice to know you aren’t alone. There are other colleagues around you who also know the true meaning of terror. Read on for more stories selected as chilling cautionary tales by our Data Horror Story Contest judges.

Be sure to check out this year’s Data Horror Story Contest first-place story here.

A one-sentence horror story

One time I discovered that a coworker had pasted an image of data in an Excel spreadsheet into an Excel spreadsheet rather than entering the data, thus rendering the spreadsheet unusable. – Anonymous

Saved by the backup

Many years ago, during my master’s project, I was working with RNA-seq data generated by a former student. The files were stored on a server, so I had to get around using bash. For some reason that is still a mystery to me, the very first thing I tried was renaming the files. Unfortunately, I typed rm instead of mv and deleted everything. We all panicked, of course, but just when it seemed like everything was lost, someone discovered a backup on another computer a few days later. Since then, I never miss a chance to remind people about the importance of backups. – Anonymous

Check your data TWICE, analyze ONCE

My thesis, published in the University library, was written, approved, and bound based on a major data FAIL. Little did I know, the fail happened before data analysis had even begun!

My research tested an interaction effect between parent symptomology and non-parent adult social support on the maladaptive behavior and psychological symptoms of adolescents across two separate ethnic backgrounds, over a baseline and a follow-up survey administration. The dataset included multiple predictor and control variables for over 250 families – mothers, fathers, sons and daughters. After cartwheels of data manipulation, path analyses of moderation revealed significant interactive effects. Many people, including the thesis committee whose dataset provided primary data, had oversight throughout the research. After the thesis defense, I moved on with my doctoral program and the thesis was tucked neatly into the University library.

After a few months of data analytic recovery, it made sense to publish. Many more reviews by co-authors were followed by a long phone call with the Editor of the accepting journal. A retired academic, that Editor spent two hours on the phone with me, instructing me on how to improve the quality of the manuscript. It was in the moments before submitting the “final final” version of the manuscript to the editorial staff for publication that it occurred to me to double-check that my core data (e.g., means, standard deviations) matched two other publications that had emerged asking different questions of the same dataset. Wouldn’t it be embarrassing if the data didn’t match?

You can imagine the horror I felt upon learning that the datasets did NOT match; the variable means and standard deviations that I had as baseline data, the other publications had for follow-up data. What I had for follow-up was their baseline. It took less than an hour to discern from emails that the original data sent to me had been MIS-IDENTIFIED by the sender!! Hundreds of hours analyzing and plotting (in the days before SPSS did it for you, circa v. 8) graphic representations of data vanished before my eyes. Now, both my thesis and prepped manuscript had the order of predictor and criterion variables wrong. – ConfirmTWICEAnalyzeONCE