Facebook Data Science?

via The Formation of Love

An Atlantic post from 2014 on some Facebook findings (link just above) was in my Refind.com feed today, presumably because of my proclaimed interest in data science. The “Formation of Love” post reported an analysis of a group of Facebook users who changed their relationship status (from single to “in a relationship”) AND who proclaimed an anniversary one year later. The author noted that before the change in relationship status, the daily average number of timeline posts reached a “peak” of 1.67, and 85 days after the change, it reached a nadir of 1.53. The author accompanied this with some nice graphs (courtesy of ggplot2 in R, so props!) showing…well…it’s certainly a blip. Or a bump.

Is it a significant change? Who knows? There’s no significance testing reported. The Y-axis was considerably truncated so that we could see the blip. The difference was 0.14 posts per day. I think that equates to maybe 5 thumb movements on a cell phone. Facebook did a whole series on this sort of analysis, and some of the findings were kind of interesting…
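To put a number on how much a trimmed Y-axis can inflate a blip, here is a rough Python sketch. The 1.67 and 1.53 figures come from the post; the axis limits are my own hypothetical values, picked only for illustration and not read off the actual graphs.

```python
def share_of_axis(low, high, axis_min, axis_max):
    """Fraction of the plotted vertical range taken up by the change from low to high."""
    return (high - low) / (axis_max - axis_min)

# Reported in the post: average timeline posts per day
peak, nadir = 1.67, 1.53

difference = peak - nadir           # absolute change in posts per day
relative_drop = difference / peak   # change as a fraction of the peak

# HYPOTHETICAL axis limits (assumptions, not taken from the original charts)
truncated = share_of_axis(nadir, peak, axis_min=1.50, axis_max=1.70)
zero_based = share_of_axis(nadir, peak, axis_min=0.0, axis_max=1.70)

print(f"difference: {difference:.2f} posts/day")
print(f"relative drop from peak: {relative_drop:.1%}")
print(f"share of a truncated axis: {truncated:.0%}")
print(f"share of a zero-based axis: {zero_based:.0%}")
```

Under those assumed limits, the same 0.14 fills most of the truncated chart and only a sliver of the zero-based one. Whether that is fair depends on whether zero is a meaningful baseline for the quantity, which is exactly the sort of thing the author should tell us.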

I had started this post as a cautionary tale about the use of big data and the fact that we’ll be able to see minute differences that are statistically significant but otherwise meaningless.  That’s probably the case with this analysis, but, counting the data points in the graph, I’m not sure this qualifies as big data.  Also, I have ended up uncertain and with many questions.  And THAT is the real point.  Data and science (the two principal terms in “data science”) are supposed to be used to illuminate and explain.  The feeling of uncertainty is not the ideal target emotion to cultivate in your audience.  And it comes down to explicitness – we need to know how this study was done.  The details.
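For anyone who wants to see the "significant but meaningless" problem in action, here is a minimal sketch in Python. Everything except the 0.14 difference is an assumption on my part: the standard deviation of 2.0 posts per day is made up, the sample sizes are arbitrary, and I treat the two time points as independent groups (the real data are the same users measured twice, so a paired analysis would be more appropriate). The post reported none of these details, which is the whole complaint.

```python
import math

def two_sample_z(diff, sd, n):
    """z statistic and two-sided p-value for a difference in means,
    assuming two independent groups of size n with a common SD."""
    se = sd * math.sqrt(2.0 / n)             # standard error of the difference
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return z, p

diff = 1.67 - 1.53     # reported difference in posts per day
assumed_sd = 2.0       # HYPOTHETICAL: the post did not report any spread
effect_size = diff / assumed_sd   # standardized difference, independent of n

for n in (100, 10_000, 1_000_000):   # HYPOTHETICAL sample sizes
    z, p = two_sample_z(diff, assumed_sd, n)
    print(f"n={n:>9,}  z={z:6.2f}  p={p:.2g}  d={effect_size:.2f}")
```

The standardized effect size never moves, while the p-value collapses as n grows. That is why a p-value alone, even if one had been reported, would not settle whether the blip matters.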

“Come now,” you may be thinking, “this is a Facebook post…you can’t demand scientific rigor in a Facebook post!”  On the contrary.  In for a penny, in for a pound…If you’re going to name yourselves data scientists and publish findings for public consumption, you must make the methodology available to the audience.  “But the people who read Facebook posts don’t want to see all that stuff…they just want the conclusions and graphs.”

Sigh…now we’ve arrived at the inevitable conclusion of our root cause analysis…we’ve met the enemy, and he is us. Facebook is a company, and will only respond to consumer demand (I know, I know…recent events…but that’s how it should work…). So, we consumers must demand explicitness and rigor. A quick review of the comments on this post reveals that we are…NOT doing that. The most recent comment on the science of this post was at least topical, if a little rude:

Start your Y axes at zero, idiots

The most appropriate comment overall was:

Don’t you see: the difference between 1.53 and 1.67 posts per day is NOTHING??!!!!! This means NOTHING. You just look at the layout of the chart and skim the text and buy right in!

That commenter may as well have added, “Wake up, sheeple!” The rest of the comments are, well, what we’ve come to expect from Facebook comments – an indecipherable cacophony of non sequiturs, trolling, tagging, and at least one person hoping that Russian women looking for love will direct message him.

We must demand explicitness and rigor from people purporting to use science.  We need to elevate the conversation.  Making that sort of demand, however, requires reasonable training in numeracy and critical thinking.  So…a post that started as a critique of lazy uses of big data ends up as an argument for statistics education in the public schools.  I guess that’s what you get from a teacher.