Algorithms, Statistical Rigor, and WhatsApp wisdom

 4 April 2021

Coffee - Good or Bad?

My mother is one of those charming, spiritual souls who is, however, a bit gullible when it comes to the "wisdom" that gets shared in her many WhatsApp groups.  One week it would be about the goodness of drinking milk with honey, and the next week it would be about the healthy way to drink water (always sitting, never standing).  Despite my frequent protests, she would diligently forward this WhatsApp wisdom to me - and I would spend half my time cleaning my inbox.

But gullible as she is, even she was a bit taken aback when on a particular day she got 2 contradictory messages about the benefits or harmfulness of coffee.  Both claimed to be from scientifically conducted research.  

She vaguely knows that my profession has some "research" in it, and she promptly asked me the other day why research wisdom is so contradictory.  So here is my simplistic explanation....

Firstly, much of nutritional research is observational, i.e. researchers study a subset of people (e.g. people who drink coffee) and then compare their data with that of the general population.  So if coffee drinkers have a higher cancer incidence than the general population, the researchers conclude that coffee is the cause of the disease (refer to this brilliant NYT article by Kim Tingley).

The second aspect is that very often there is poor interpretation - correlation is often confused with causation.  In the case of coffee, a few years later (after coffee had already been given a bad reputation by this research) another group of researchers discovered that there is a high correlation between coffee consumption and cigarette smoking.  When they separated the cigarette smokers from the rest of the coffee drinkers, they realised that it was the cigarette smoking that was the real cause of cancer.  So all of a sudden coffee was not the villain - in fact they found that people who drank coffee (and did not smoke) actually had better health than the general population.
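For the curious, the coffee-and-cigarettes trap can be reproduced with a small simulation. This is only a sketch with numbers I have made up: 30% of people smoke, smokers are twice as likely to drink coffee, and cancer risk depends on smoking alone (coffee is assumed to have no effect at all). None of these figures come from the actual studies.

```python
import random

def simulate(n=200_000, seed=42):
    """Build a made-up population of (smoker, coffee, cancer) records."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.30                        # assumed share of smokers
        coffee = rng.random() < (0.80 if smoker else 0.40)  # smokers drink more coffee
        cancer = rng.random() < (0.15 if smoker else 0.05)  # risk depends on smoking only
        people.append((smoker, coffee, cancer))
    return people

def cancer_rate(people, coffee, smoker=None):
    """Cancer rate among coffee drinkers or abstainers, optionally within one smoking group."""
    group = [p for p in people
             if p[1] == coffee and (smoker is None or p[0] == smoker)]
    return sum(p[2] for p in group) / len(group)

people = simulate()

# Naive comparison: coffee drinkers versus everyone else.
print("all people :", cancer_rate(people, True), "vs", cancer_rate(people, False))

# Same comparison, but made separately for smokers and non-smokers.
for smoker in (True, False):
    label = "smokers    " if smoker else "non-smokers"
    print(label, ":", cancer_rate(people, True, smoker),
          "vs", cancer_rate(people, False, smoker))
```

In the naive comparison coffee drinkers show a noticeably higher cancer rate, simply because more of them smoke. Once smokers and non-smokers are compared separately, the gap between coffee drinkers and abstainers all but disappears.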

But the damage was already done - the misconception about coffee being bad for health kept getting circulated, often alongside the opposing, correct view that contradicted it. Hence nutrition research - and research generally - got a bad name.

Can these mistakes be avoided?  This is where research design and correct statistical interpretation come in.  In terms of design, a randomised control panel design is usually more rigorous - this means matched panels of consumers with similar characteristics but following different regimens (one drinking coffee and the other avoiding it). But this requires enormous time (e.g. a minimum of 5 - 10 years to study effects on health).  It is also a field agency's nightmare: organising 2 perfectly matched consumer panels. Moreover there may be ethical objections - especially if some panelists later sue the agency for inducing them to follow an unhealthy practice.   So in Consumer Market Research, randomised control panels (though more robust) are largely used only for simple studies where the influencing variables can be controlled and the result can be known in a short period of time, e.g. to determine which detergent formulation removes stains better.
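Why does random assignment help? A rough sketch, again with made-up numbers (30% smokers, cancer risk driven by smoking only): when a coin flip rather than personal habit decides who lands in the coffee panel, smokers end up spread almost evenly across both panels, so smoking can no longer masquerade as a coffee effect. The panel names and probabilities here are my own assumptions for illustration.

```python
import random

def run_trial(n=200_000, seed=7):
    """Assign each person to a panel by coin flip, regardless of smoking habit."""
    rng = random.Random(seed)
    panels = {"coffee": [], "no_coffee": []}
    for _ in range(n):
        smoker = rng.random() < 0.30                        # assumed share of smokers
        panel = "coffee" if rng.random() < 0.5 else "no_coffee"
        cancer = rng.random() < (0.15 if smoker else 0.05)  # coffee assumed harmless
        panels[panel].append((smoker, cancer))
    return panels

def share(panel, index):
    """Fraction of a panel that is a smoker (index 0) or has cancer (index 1)."""
    return sum(person[index] for person in panel) / len(panel)

panels = run_trial()
for name, panel in panels.items():
    print(name, "smokers:", share(panel, 0), "cancer:", share(panel, 1))
```

Both panels come out with nearly the same share of smokers and nearly the same cancer rate, which is what one would hope for when coffee truly makes no difference. In real life, of course, nobody can be assigned a coffee habit for 10 years by the toss of a coin, which is exactly the practical problem described above.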

Likewise for interpretation - one needs to think very carefully, even at the time of the research design, about all the possible hypotheses (e.g. reasons for poor health) and reflect them in the design.  But can we ever be 100% sure that all the possible variables have been thought of and reflected in the design? For example, relatively poor health in the coffee drinking panel could just be because that panel had a higher number of people working in high stress occupations - and may have nothing to do with what they ate or drank.  And if this variable was not thought of - and hence not matched - the interpretation could be significantly wrong. This is where experience and rigour start weighing in.

Today, thanks to technological progress, the task is simplified as it is possible to examine a great many variables simultaneously, and thereby increase the chances of discovering the real determinants.  With machine learning, one can train machines to analyse a lot of data on their own - much beyond what humans are capable of managing. Much of the data can be fed into algorithms which then generate the results on certain pre-determined metrics.

But while technology and the use of machine learning do help improve scale, timeliness and accuracy, they also lead to the creation of "black boxes" which are not re-examined or refined often enough.  These sometimes achieve "divine" status within organisations.  One can only hope the increasing reliance on algorithms and machine learning does not lead to the good old principles of sound interpretation, re-validation, and statistical rigor taking a back seat.

So mom, NOT all the dodgy WhatsApp "research" wisdom is because of your son's profession 😃 - and please do drink your coffee before it gets cold.
