You're right, this study was terribly designed and executed. Claiming significance for a term with a p-value of .31 is just embarrassing.
I would say the primary flaw in this study was an inability to tie the response variables back to the actual order. Categorizing the order as fancy or not-fancy collapses way too many categories to be meaningful, particularly since that variable, fancy or not-fancy, turns out to be the primary term in every model that included it.
Looking at the distributions given in figure 1, I am at a loss to see how the reported results were calculated. The distributions of serving time by sex and drink type look almost exactly the same. If anything, women who ordered plain drinks were served faster.
As a side note, it's really cool that the wait times for non-fancy drinks appear to be exponentially distributed, while the wait times for fancy drinks appear to be normally distributed.
That is exactly what one would expect if a non-fancy drink required a single step performed whenever the server got around to it, while a fancy drink required several steps, each with a duration independent of the others. By the central limit theorem, the total of several independent step durations tends toward a normal distribution.
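To illustrate that reasoning, here is a small simulation sketch. It does not use the study's data. It assumes a hypothetical model in which every step's duration is exponentially distributed with the same mean, so a one-step order has an exponential wait, and a k-step order's wait is a sum of k exponentials (an Erlang distribution). The skewness of that sum is 2/sqrt(k), so the distribution gets more symmetric and bell-shaped as steps are added. The function names and parameters are made up for this example.

```python
import random
import statistics


def sample_skewness(xs):
    """Third standardized moment of the sample (population form)."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs, mu=mean)
    return sum((x - mean) ** 3 for x in xs) / (len(xs) * sd ** 3)


def simulate_wait_times(n_steps, n_orders=20000, mean_step=1.0, seed=0):
    """Hypothetical model: each order needs n_steps independent steps,
    each step's duration drawn from an exponential with mean mean_step."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0 / mean_step) for _ in range(n_steps))
        for _ in range(n_orders)
    ]


if __name__ == "__main__":
    for k in (1, 4, 16):
        waits = simulate_wait_times(k)
        print(f"{k:2d} step(s): skewness ~ {sample_skewness(waits):.2f} "
              f"(theory {2 / k ** 0.5:.2f})")
```

With one step, the sample skewness should come out near 2, which is the long right tail of an exponential. With 16 steps, it should fall to roughly 0.5, much closer to the symmetric shape of a normal distribution.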