I read the original article that tries to draw this conclusion, and I'm more than a little disappointed in the quality of the methodology. This was a pilot study at best. There are problems with sample selection, sample size, data collection, and data analysis -- basically all aspects of the study.
By the authors' own admission, the coffee shops were not selected at random, and there were only eight of them. I'm guessing that no power calculations were conducted beforehand to determine an adequate sample size, for either the number of coffee shops or the number of customers. (It certainly wasn't mentioned.) While customer selection was somewhat random, I would have liked to see that explored and verified a bit more as well.
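To give a sense of what such a power calculation looks like, here is a minimal sketch for a plain two-group comparison of mean wait times. The effect sizes, the 80% power target, and the function name are my own assumptions, not anything taken from the study. It uses the normal approximation and ignores clustering of customers within coffee shops, which would push the required numbers higher.

```python
from math import ceil
from statistics import NormalDist


def customers_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate customers needed per group for a two-sided,
    two-sample comparison of means (normal approximation).

    effect_size is the standardized mean difference (Cohen's d).
    Hypothetical helper: it ignores clustering within coffee shops.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Assumed small, medium, and large standardized effects.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {customers_per_group(d)} customers per group")
```

Even this crude version makes the point: the smaller the difference you expect, the more customers you need, and that is before accounting for having only eight shops.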
There was no mention of inter-rater reliability being assessed among the enumerators. Also not reported was the race or gender of the enumerators. It's entirely possible that they had a desired outcome in mind and were unconsciously (or consciously) adding some lag time to the stopwatch times. Of course, we can't know this, because it was not reported or analyzed.
Servers within a given coffee shop are likely to behave similarly, so observations from the same shop are not independent, but no variable identifying the coffee shop was included. An even better design would have been to model the data hierarchically, with customers nested within coffee shops; clearly that wasn't done either. Also not reported was what type of regression was used to fit the six(!) separate models. I presume plain old linear regression, but there's no mention of it, and no mention of whether the appropriate assumptions were met. Also, why six models? Of course, if you fit enough models, some results will be significant at the alpha = 0.05 level purely by chance. Why not use a simple backward-elimination procedure to get a nice parsimonious model, while keeping in any variables that the researcher feels are 'clinically' relevant?
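To put a rough number on the "significant purely by chance" concern, here is a small sketch. It assumes one test per model and that the six tests are independent, neither of which is true of the actual study (the models share data and each has several coefficients), so treat it as an illustration only. The function names are mine.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests
    independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


n_models = 6  # the six separate regression models
print(round(familywise_error_rate(n_models), 3))  # 0.265
print(round(bonferroni_alpha(n_models), 4))       # 0.0083
```

Under those assumptions, six tests at alpha = 0.05 carry roughly a one-in-four chance of at least one spurious "significant" result, and a Bonferroni correction would require p < 0.0083 for each.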
I'm deeply disappointed in this study. I'm even more deeply disappointed in Mr. Harford for taking it this seriously. It's an interesting pilot study at best; it could easily be used to design a much, much better study, but it in no way warrants this kind of attention.