Data science teams tend to pull in two competing directions. On one side are the data engineers, who value highly reliable, robust code that carries low technical debt. On the other are the data scientists, who value rapid prototyping of ideas and algorithms in Proof-of-Concept-like settings.


How an uninformed market gambled on data science to help combat the rise of tech-enabled disruptors. Why it failed, and how it’s finally paying off.

Image by mohamed Hassan from Pixabay

Disruption and the rise of data science

Financial crises act as accelerators of both creative destruction and technological accumulation. In the years following the 2008 financial crisis, widespread tech-enabled disruption produced a metamorphosis in the established order of America’s mega-corporations. …

Image by Pexels from Pixabay

What are the longest words spelled out from the first letters of names in football tables?

Given infinite time, a monkey typing at random will almost surely write out the full works of, say, Shakespeare.

In our problem set up, we don’t have infinite time, nor infinite monkeys, but we do have football tables, and plenty of them.

If still unclear, then rephrased as a more…

Image by Maria_Domnina from Pixabay

At-home projects should be free. Here’s how.

The ROI for a hobbyist is rarely measured in dollar bills. Instead, people start hobby projects because they’re interested; maybe to learn, maybe for fun, maybe to build a PoC for an idea which could eventually yield riches.

Consequently, people are generally unwilling to pay for services to get their…

Image by TheAndrasBarta, license.

In 7 simple steps

All innovative technologies will eventually pass over a ‘peak of inflated expectations’, a phenomenon especially true when said technology is backed by heavy marketing war-chests, which serve chiefly to fuel hype and exacerbate public expectations.

The landscape of Machine Learning is slowly maturing beyond its ‘I’m a hammer and everything…

An apt summary of public discourse about A.I. — by Tabor (license).

The public debate around A.I. is consequential for funding, research, regulation, and the extent of its malign misuse. Our discourse is failing because we collectively flaunt several definitions of the term.

The shiny hype train

Ever since the overworked and cringeworthy remark that data science is the sexiest job of the 21st century, and the resulting hype train that catapulted the modern conception of Machine Learning (Breiman’s conception) from the academic fringes to the dizzying lights of the mainstream labour market, the frantic…

Disclaimer: not all data scientists do, or even should have to, write production-grade code. Whether they should is ultimately down to context. But if they could, it would make the field a much better place.

Common knowledge would have you think a data scientist spends the majority of their…

Linear regression. It’s the first type of regression analysis ever to be studied intensely, the foundation of any supervised learning course, the cornerstone of… you get the picture. Well, it sucks.

In real-world settings, Linear Regression (GLS) underperforms for multiple reasons:

  • It is sensitive to outliers and poor quality…

The main benefit of using the median, as opposed to other averaging aggregates like the mean, is that it is less skewed by extremely large or small values. It offers a better approximation of what is called a typical value of the data.
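As a small illustration of that robustness, here is a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only to include one extreme outlier; they are not taken from the article.

```python
from statistics import mean, median

# Hypothetical sample: five typical values plus one extreme outlier.
values = [10, 12, 11, 13, 12, 1000]

sample_mean = mean(values)      # pulled far upwards by the single outlier
sample_median = median(values)  # stays close to the bulk of the data

print(f"mean={sample_mean:.1f}, median={sample_median}")
```

With this sample the mean lands far above every typical value, while the median stays among them, which is the sense in which the median is less skewed.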

To estimate the median of a…

One of the most crucial pieces of any data science puzzle is perhaps also the least glamorous: feature engineering. It can be protracted and frustrating, but if it’s not done right, it can spell disaster for any modelling or analysis that follows. …

Andy Greatorex

London-based data scientist @Revolut. Formerly in NYC @Barclays. Building stuff for the fun of it.

