Premier League Points – How to Get a Heap of Analysis From Even the Smallest of Data

This blog post started as one thing and ended up another, but I’m actually glad it did.  I set out with the intention of using a range of Premier League data to examine Qlik Sense Chart chart types and best practices.  However, after coming up with a list of around 20 different measures off the back of just the first 4 columns I decided it would be interesting to see what Qlik Sense can do with even the smallest of data sets.  With just 4 columns; Club Name, Season, Position and Points, there’s enough to take you wherever your mind dares and Qlik Sense provides the tools to put your thoughts onto screen.  The tools are not just the visualisations but the ability to rearrange and aggregate data within your front end calculations; in the forms of set analysis, aggr() and firstsortedvalue() functions.

The aforementioned columns were loaded into Qlik Sense Cloud and a flag was added to mark any Premier League seasons from when the league had 22 teams.  For ease of use in calculations the season has been changed to the end year of the season (e.g. 1992/93 becomes 1993).

premiership-data

From here on all calculations are done in the front end (dashboard) with the aim of producing something like this:

key-stats

Now is a good place to mention that best practice would actually be to create more flags in the load script rather than relying on the front end.  For example, later on we’ll need to calculate the highest finishing position for every club.  In the front end this requires a set analysis modifier inside a min() function and on larger data sets will eat into the calculation speed that occurs with every user selection.   It’s therefore, in most cases, preferable to put such calculations in the load script because even if the script takes a little longer to run, it’s not going to impact end user experience.  However, in this case we’re testing what Qlik Sense allows you to do quickly with small data sets so we don’t need to worry much about the load script.

Let’s start with the first chart that sprang to mind, a league table position history of every club.  Using End_Year and Club as dimensions and Sum(Points) as the measure you end up with the colourful mess in the first chart below:

full-league-history

Even with a key, no one is going to be able to track a team or see any trends.  It’s not that you can’t see the woods for the trees, it’s that someone has set fire to the woods!  What we want is to only ever see a few clubs at a time.  In this situation some developers choose to hide the chart until a selection is made and use an error message to inform users they need to make a selection first.  My preference is to avoid hiding charts (empty space is never attractive) and instead offer a default view as per the second chart above.  Changing the measure calculation to the below one means that when no club is selected the default view is shown (the traditional top 6 teams) but when a selection is made only the selected clubs are shown.

league-history-calculation

As an example I’ve selected the 3 North East clubs – including my beloved Middlesbrough.

north-east-selected

Next we can create some key/headline figures.  These calculations are useful when creating KPIs and embedding calculated statistics into websites.

kpis

The first, biggest winning margin, is perhaps the most difficult.  It is made up of 3 different calculations:biggest-winning-margin-calculation

It starts by working out the name of the club that achieved the biggest winning margin.  Firstsortedvalue() allows you to sort a dimension by another field and extract the top/bottom value.  In this case we sort Club by winning margin which itself is calculated by subtracting the number of points achieved by the 2nd placed team from the points achieved by the 1st placed team.  The second part of the calculation works out the highest (max) winning margin and the last part does much the same as the first but it’s the End_Year that is sorted rather than Club.  All of them use the Aggr() function to group the raw data by a given dimension.  For more information on these functions you can type them into Qlik and press F1.  There’s also a wealth of information on https://community.qlik.com/welcome

To work out that Chelsea amassed the most points in a season we can make use of modifiers (<>) to group calculations by a dimension and then extract the min/max value:

highest-finish-calculation

To work out the average number of points required to stay up, i.e. finish in 17th place and narrowly avoid relegation, we need to use some set analysis to narrow down the data we want to work with:

average-survival-points-calculation

So using you usual aggregations (sum/count/min/max/avg) and combining them with aggr(), firstsortedvalue() and grouping via modifiers we’re able to rearrange data with every click an end user makes and uncover all kinds of insight from even just a small data set.  Plus, we can do it all in the front end if we want.

key-stats

Another example is to examine club’s record (highest) finish.  The below table can be achieved with the formula that follows it:

highest-finishhighest-points-calculation

You could go on analysing the Premier League in this way all day (believe me I did!).  The same techniques can be applied to your sales tables, call centre performance, survey responses, the list goes on.  Basically, if you want to examine it, Qlik Sense allows you to do it and you don’t always have to rely on a hefty load script to do all the work.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s