In this post, 2012 hackNY Fellow Daniel Alabi describes the hackNY Summer Series lecture by Dr. Mike Dewar.

On June 12, Dr. Dewar, a data scientist at bitly with a PhD in systems engineering, talked to the hackNY Fellows on “Measuring Attention.” For those unfamiliar with the work bitly does, bitly is a URL shortening service that’s very popular on micro-blogging sites like Twitter. A first impression might be that bitly provides URL shortening and nothing more. But that would be too superficial a description, if not false.

Offline (and some online) surveys for research and analysis are usually conducted through questionnaires, opinion polls, and manual observations. But this tedious and slow process of data collection can be scaled up, sped up, and made more accurate by making and sharing bitly links. Did you know that bitly collects detailed information, necessary for rigorous analysis and accurate redirection, when a user clicks on a bitly link? The information bitly receives includes the location of the user, the IP address of the user’s computer, the probability that the user who clicked on the link is not a robot (cool, right?), and other useful details. You might ask: what does bitly do with all the information collected from users who click on bitly links? That’s where the data scientists at bitly come in. All the data that bitly collects every second, or every fraction of a second, is mined, analyzed, and visualized by the data science team or a subset of it (bitly has a 7-member data science team out of ~40 employees).

Dr. Dewar began his talk with an idea that he believes has revolutionized the way we mine and use data: the 4th paradigm of scientific discovery. The 4th paradigm was proposed by Jim Gray and described in the book “The Fourth Paradigm: Data-Intensive Scientific Discovery.” Gray predicted that scientific innovation will, in the future, be data driven. To support the idea of this paradigm, Dr. Dewar showed us a Venn diagram originally by Drew Conway that illuminates the relationship among Hacking Skills, Math & Stats Knowledge, and Substantive Expertise.

Dr. Dewar also noted that the task of “Measuring Attention” is twofold: Tracking Attention, and Being Responsible. The data collected from clicks on bitly links can be used to make histograms, line graphs, or “spike trains” with the d3.js library or any other data visualization library. These plots clearly show trends at a particular time and in a given geographic region. For example, just before the Egyptian revolution, the State Security Intelligence Service of Egypt blocked Egyptian residents from accessing the Internet by shutting down a crucial data center in Cairo. Dr. Dewar showed us a plot of Internet usage over time in Cairo before, during, and after the shutdown. The “spike train” graph had a deep and isolated valley immediately after the shutdown. After Internet access was restored, the valley disappeared. Suppose we didn’t know about the shutdown in Egypt. We could work backwards from such an abnormal point in our data set, investigate the circumstances behind it, and eventually find out about the shutdown. In other words, data science helps us investigate irregular occurrences. This is just a small glimpse into how visualization of data sets can yield interesting and sometimes crucial results. For more data visualization plots, check out Dr. Dewar’s GitHub repository.
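To make the "work backwards from an abnormal point" idea concrete, here is a minimal sketch in Python. It is not the method Dr. Dewar or bitly uses. The function name `find_dips`, the window size, the threshold, and the click counts are all made up for illustration. It flags any hour whose click count falls well below the median of the hours just before it, which is roughly what the eye does when it spots a valley in a plot.

```python
from statistics import median


def find_dips(counts, window=6, threshold=0.5):
    """Return indices where a count falls below `threshold` times
    the median of the preceding `window` counts.

    Hypothetical helper for illustration only.
    """
    dips = []
    for i in range(window, len(counts)):
        # Baseline: typical activity over the previous `window` hours.
        baseline = median(counts[i - window:i])
        if baseline > 0 and counts[i] < threshold * baseline:
            dips.append(i)
    return dips


# Made-up hourly click counts for one region (not real data).
hourly_clicks = [120, 118, 125, 130, 122, 127, 4, 3, 5, 2, 119, 124]

print(find_dips(hourly_clicks))  # indices of the hours inside the valley
```

A flagged index only says that something unusual happened at that hour. Finding out what happened, as with the Cairo shutdown, still takes a person investigating the circumstances.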

The plots data scientists make speak louder than words and can be, and in fact are meant to be, very convincing. As a result, data scientists are saddled with the responsibility of depicting information as accurately as possible, without deceiving viewers. In addition, Dr. Dewar believes that the data made available to data scientists shouldn’t be exploited for unsavory ends, such as selling obtrusive ads.

Dr. Dewar is a renowned data scientist (and a really humble guy too). It was a pleasure listening to him (and, of course, asking him questions), and we hope to see him again.