How to Lie with Statistics


I really try to avoid complaining about work here, because I’ve got a job I love, working with interesting and qualified people on an interesting product, and there’s little worth complaining about. But today’s story is just surreal enough to share.

So our project manager stops me and asks me to pull statistics out of the ticket-tracking system we’ve thrown together. The system is an 80/20 solution that got about 30% done, so it’s pretty bare-bones: it lets support follow their tickets, and that’s about it. Aside from queue, subject and priority, it lets people record whatever they feel they need. There isn’t even anything to enforce usernames, so one guy’s “pkn”, another’s “Chris”, and so on.

I point out that there isn’t very much we can pull out of that. He settles for the number of tickets since 5.1.2 shipped, and of those, the number that were spam/duplicate/garbage, the number that were about versions prior to 5.1.2, the number that were about 5.1.2, and the number that were about the Mitel Networks 6000, our hardware bundle.

The first bits are easy, but the last bit isn’t.

I point out that we don’t record the version anywhere unless someone happens to write it down, and that even then, I can’t programmatically tell the difference between someone asking a customer “Are you using 5.1.2?” and a customer writing “I’m using 5.1.2”. So he asks me to estimate. I point out that the number will be entirely unreliable and that I’ll essentially have to pull it out of the air, and he says that’s OK, someone somewhere needs this number.

Now, my academic background is in the social sciences, and generating statistics out of thin air isn’t particularly ethically sound, and I point this out to him. He’s sort of wavering, so I ask him to run it past my manager, which he has to do anyhow if he wants some of my time. Manager says exactly what I did — that we don’t record that, and that it’ll be a fabricated number, but if you want a fabricated number, sure.

Still bugging me.

So I tell the project manager that I’ll give him his number, but first I want him to send me email noting the requirements and acknowledging that it is impossible to determine the version statistics he wants, and that he wants me to estimate it. I don’t want to get into a situation where someone comes back and says “This number is wrong, how did you come up with it” and have to answer “Well, see, I had these dice” and then get blamed for fabricating the numbers.

He does. “I fully realize that the percentages that you are going to provide are pure guesses”, he writes. And in my response, I write “Based on an unreliably-distributed sample…” and “one might erroneously conclude a distribution like…”. And this generates no complaints. If things go pear-shaped and someone starts wondering about the numbers, there’s an audit trail of why they were made up. They’re probably reasonably accurate guesses, but they could also be absolutely wrong, and it’s not my problem.

This is a pretty good example of why I like my job.


2 responses to “How to Lie with Statistics”

  1. Wow. That’s kind of like you said “would you please cover my ass for me”, and he said “Hot damn! Sure will! There’s nothing I like better than COVERING ASS.”

    That never happens to me…

  2. HA HA HA. “Cover your ass? Why didn’t you say I could do that before?! That’s my very favoritist thing in the world to do!”