Data Science

Data: Is it really Cataclysmic?

Rudder, C. (2014). Dataclysm: Who We Are (When We Think No One’s Looking). New York: Crown Publishers.

Some people find the sheer amount of data now available terrifying, or boring to look at. What others find mundane or scary about it, I find exciting. Aggregating a large set of data and making sense of it in a clean and colorful graph brings me joy! This book, Dataclysm: Who We Are (When We Think No One’s Looking), brings out the human side of data by aggregating people's behavior, primarily as it relates to the world of dating.

The author, Christian Rudder, is the co-founder and president of the popular dating website OkCupid, so it makes sense that his primary dataset relates to dating. Using data mined from the site, he showcases people's hidden preferences. Graphs compare users' self-reported age preferences for a date with their actual behavior on the site. Similarities and differences in word usage by gender and sexual orientation are contrasted in both simple lists and charts. Outside of OkCupid, the rapid spread of a hashtag is examined through an area chart, with millions of uses on the Y axis and hours since the hashtag's creation on the X axis. One particularly fun display is a map of the United States showing the most frequently listed location in “Missed Connections” ads by region, with Walmart taking over three large swathes of the country.

My only criticism while reading the book was the absence of sample sizes, or any other evidence that the results were statistically significant. The author explains this decision toward the end of the book: “Mathematical wonkiness wasn’t what I wanted to get across” (Rudder, 2014, p. 243). He goes on to explain how the data was gathered and diligently verified. I did not doubt that his data was valid; I am just someone who likes to see the details. Including this explanation was a good example of a data scientist anticipating the needs of his audience. My only recommendation would have been to place it earlier in the book.

Is the boom in data availability cataclysmic, as the title suggests? It depends on your expectation of privacy. When I plan a Saturday evening of “Netflix and Chill,” I go in fully aware that my preference for Sci-Fi shows is most likely being tabulated, along with my age and gender, in some colorful chart by a Netflix analyst or consulting firm somewhere. But what do I have to fear? Being part of that aggregated dataset, represented by a dot on a scatter plot, has worked to my advantage: in the past year alone, many more comic-based shows (through the Marvel partnership) and Netflix-created Sci-Fi shows (e.g., The OA, Sense8) have emerged.

Colette Molteni
