Data Analytics and Data Visualization for Soccer/Football: World Cup, Champions League, UEFA, AFCON, Copa America, Europa League, Premier League, La Liga, Serie A, Bundesliga, AFC Asian Cup, Major League Soccer, CONCACAF, and more

1- Visualizing Football Passes: 880k passes in 890 matches

Tool: Tableau

Interactive visualization

Data: Here

This viz aggregates all passes on a grid with a 1-meter step. That is, all the passes in each square meter of the football pitch are represented by a single line showing their average length and direction. So this viz is an 'averaged' picture.

The data includes:

- Champions League 1999 - 2019

- FA Women's Super League 2018 - 2020

- FIFA World Cup 2018

- La Liga 2004 - 2020

- NWSL 2018

- Premier League 2003 - 2004

- Women's World Cup 2019
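
The grid aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the author's Tableau workflow: the function name `average_passes_on_grid`, the choice to bin each pass by its starting point, and the choice to take the direction from the mean displacement vector are all assumptions made for the example.

```python
import numpy as np

def average_passes_on_grid(x0, y0, x1, y1, cell=1.0):
    """Average pass per grid cell (hypothetical helper, not the author's code).

    Assumption: a pass belongs to the cell that contains its starting point.
    Coordinates are in meters; `cell` is the grid step.
    """
    x0, y0, x1, y1 = (np.asarray(a, dtype=float) for a in (x0, y0, x1, y1))
    # Cell index of each pass origin.
    ix = np.floor(x0 / cell).astype(int)
    iy = np.floor(y0 / cell).astype(int)
    # Displacement vector and length of each pass.
    dx, dy = x1 - x0, y1 - y0
    lengths = np.hypot(dx, dy)

    cells = {}
    for key in set(zip(ix.tolist(), iy.tolist())):
        mask = (ix == key[0]) & (iy == key[1])
        mean_dx, mean_dy = dx[mask].mean(), dy[mask].mean()
        cells[key] = {
            "count": int(mask.sum()),
            "mean_length": float(lengths[mask].mean()),
            # Assumption: direction is the angle of the mean displacement vector.
            "direction_deg": float(np.degrees(np.arctan2(mean_dy, mean_dx))),
        }
    return cells
```

Each entry in the returned dictionary corresponds to one line in the 'averaged' picture: its anchor cell, its length, and its angle.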

2- Chances of winning a football (soccer) game using the Elo rating.

Tool: Python with Seaborn
Source: Elo Ratings and Here

Important to consider: Elo ratings have never worked particularly well in football, because too few matches are played and a national team changes over the years; in 8 or even 4 years it will be a different team. The Elo system is great for games like chess or StarCraft, where players often play tens of matches every month and a single match sometimes consists of multiple rounds. Additionally, in those games the rating belongs to the one player who actually plays. In football there is very little data to start with. Over the course of 2 years a team will play qualifiers for a major cup, a few friendly matches, and the cup itself. That's fewer than 20 matches over 2 years, probably half of them irrelevant (friendly matches and matches against small, semi-professional teams). On top of that, things like injuries or club matches mean the team that earns the Elo points doesn't fully correspond to the players who will play in the actual game.
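
For reference, the standard Elo expected-score formula behind this kind of chart can be sketched as below. The formula itself is the usual logistic curve on a 400-point scale; the `home_advantage` and `k` parameters and their default values are illustrative assumptions, not the settings used by the linked sources. Note that the expected score counts a draw as half a win, so it is not strictly a win probability.

```python
def elo_expected_score(rating_a, rating_b, home_advantage=0.0):
    """Expected score of team A against team B (win = 1, draw = 0.5, loss = 0).

    `home_advantage` is a hypothetical rating bonus added to team A.
    """
    diff = rating_a + home_advantage - rating_b
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_update(rating, expected, actual, k=20.0):
    """New rating after one match; `k` (illustrative value) sets how fast ratings move."""
    return rating + k * (actual - expected)
```

With so few international matches per year, each update carries a lot of weight, which is one way to see the small-sample problem described above.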

3- Women & Men in Football/Soccer

TOOLS: Tableau, Python

DATA: Statsbomb

INTERACTIVE VIZ: here

4- The 100 Best Footballers (Soccer Players) in the World in 2020-21

The 100 Best Footballers in the World, according to transfermarkt.us.

Data retrieved via transfermarkt.us

Visualization service: Tableau

5- The Impact of Fans on Home Team Performance: Evidence from European Soccer

Interactive version here.

Tool: Tableau

Source: fbref.com

6- Most Scoring Soccer/ Football National Teams 1900 - 2020

The author made a visualization based on all games played by national soccer teams since 1872. In this video, he ranked each nation by average goals scored per game. The value shown is always the cumulative average over all games the nation has played up to that point. Nations are only included if they have at least 100 games in the dataset, and they first appear after 10 games so that the average is reliable.
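
The ranking rule above amounts to a running average with two cut-offs. A minimal sketch, assuming a nation's goals are given as a chronological list (the function name `running_goal_average` and its parameters are hypothetical, not taken from the author's workflow):

```python
def running_goal_average(goals_per_game, show_after=10, min_total=100):
    """Cumulative goals-per-game after each match.

    Returns an empty list if the nation has fewer than `min_total` games
    (excluded entirely), and None for matches before the `show_after`-th
    game (not yet displayed).
    """
    if len(goals_per_game) < min_total:
        return []
    averages, total = [], 0
    for n, goals in enumerate(goals_per_game, start=1):
        total += goals
        averages.append(total / n if n >= show_after else None)
    return averages
```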

Data source: Kaggle, list of national soccer games played since 1872. Flourish (flourish.com) was used for the data visualization.

7- Eras of dominance in top flight English football (soccer)

Source: Wiki

Tools: Excel and Paint

From the author:
There's an awful lot of information to be gleaned from this plot, so bear with me while I describe some of the important stuff:

An "Era of Dominance" is classified by a team scoring 7 "points" or more in a 7 year period. Winning the league earns 2 points. Runners-up earn 1 point. 3rd-4th each earn 0.5 points. In theory it's possible for a team to earn this status by getting nothing but runner-up (never happened), but in reality it always required at least 2 wins in a rolling period.

The x-axis is in years. For the purpose of this plot, this represents the calendar year the season ended. The y-axis is in "score", with a maximum of 14 points which would mean winning every available title in the rolling period.

When two or more teams share an era of dominance then the team with the higher "score" is shown on top. For example: 2008 with Man U > Arsenal > Chelsea.

The bars under the team logos are provided to show the length of the era, as it can get a little confusing in the top plot.
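
The scoring rule can be sketched as a rolling sum over a team's finishing positions. This is only an illustration of the rule as described (the author used Excel); it assumes the input is a list of consecutive seasons' league positions for one team, and the names `POINTS`, `rolling_scores`, and `dominant_seasons` are hypothetical.

```python
# Points per finishing position, as described: champion 2, runner-up 1, 3rd-4th 0.5.
POINTS = {1: 2.0, 2: 1.0, 3: 0.5, 4: 0.5}

def rolling_scores(positions, window=7):
    """Score at each season: points summed over that season and the previous window-1."""
    pts = [POINTS.get(p, 0.0) for p in positions]
    return [sum(pts[max(0, i - window + 1): i + 1]) for i in range(len(pts))]

def dominant_seasons(positions, threshold=7.0, window=7):
    """Indices of seasons at which the rolling score reaches the threshold."""
    scores = rolling_scores(positions, window)
    return [i for i, score in enumerate(scores) if score >= threshold]
```

Seven straight titles give the maximum score of 14, and seven straight runner-up finishes reach exactly 7, which matches the theoretical case mentioned above.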

Some fun stats:

Liverpool had the longest uncontested era of dominance: 14 years, between 1976 and 1990. The nearest was Arsenal with 10 years in the 1930s.

Manchester United had the longest continuous era of dominance: 26 years, between 1990 and 2016. The nearest was Liverpool with 23 years, just prior to Man U.

A difference between the Liverpool and Manchester United reigns is that Liverpool spent 22 of their 23 seasons as the premier team, while Manchester United spent only 20 of their 26, being usurped at times by Liverpool, Arsenal, and Chelsea.

Only one 7-year period in the history of English football saw three teams establish dominance: the period ending in 2008.

No football was played during either of the World Wars.

Lastly, for fun, here is the all-time table for top flight English Football.

8- When to score the first goal in a soccer match in order to win the game

Green indicates a victory, orange a draw, and red a loss.

The opacity of each bar is proportional to how often the first goal of the match is scored on that minute (more transparent means less often).
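
The quantities behind such a chart can be sketched as below. This is not the author's notebook code: it assumes each match is reduced to a pair of (minute of the first goal, outcome for the team that scored first), and the function name `outcome_shares_by_minute` is hypothetical. Opacity is taken relative to the minute with the most first goals.

```python
from collections import Counter, defaultdict

def outcome_shares_by_minute(matches):
    """Win/draw/loss shares and relative frequency per first-goal minute.

    `matches` is an iterable of (first_goal_minute, outcome) pairs, with
    outcome in {'W', 'D', 'L'} from the first scorer's point of view.
    """
    counts = defaultdict(Counter)
    for minute, outcome in matches:
        counts[minute][outcome] += 1
    if not counts:
        return {}
    busiest = max(sum(c.values()) for c in counts.values())
    table = {}
    for minute, c in sorted(counts.items()):
        n = sum(c.values())
        table[minute] = {
            "win": c["W"] / n,
            "draw": c["D"] / n,
            "loss": c["L"] / n,
            # More transparent (lower value) means fewer first goals on that minute.
            "opacity": n / busiest,
        }
    return table
```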

Dataset used from Kaggle.

Blogpost with more details on the analysis and results, as well as link to github code.

Done using Jupyter Notebook and matplotlib.

9- Average Height and Weight of Soccer Players by Country

Data source: FIFA19 database

Tools: python + matplotlib
Weight and height are obviously highly correlated (not just in soccer players).

Here's an interesting idea: plot the deviation from the average height-weight relationship against some measure of success in soccer. Then we would know whether it pays off for your team to be relatively underweight for your height (China) or relatively overweight for your height (Paraguay).
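
One simple way to measure that deviation is the residual from a straight-line fit of weight on height. The sketch below assumes per-country averages are already available as two equal-length sequences; the function name `weight_residuals` is hypothetical, and a linear fit is an assumption chosen for simplicity.

```python
import numpy as np

def weight_residuals(height_cm, weight_kg):
    """Deviation of each weight from a least-squares line fitted on height.

    Positive values mean heavier than expected for that height,
    negative values mean lighter than expected.
    """
    h = np.asarray(height_cm, dtype=float)
    w = np.asarray(weight_kg, dtype=float)
    slope, intercept = np.polyfit(h, w, 1)  # degree-1 (linear) fit
    return w - (slope * h + intercept)
```

The residuals could then be plotted against a success measure of choice (for example a team rating) to explore the question raised above.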

10- Distribution of Soccer Power Index for the top 10 leagues (according to revenue)

Author
Data is from FiveThirtyEight's SPI

Ranking based on revenue sourced from Wikipedia

Plotted in python with Seaborn violin plots and edited in Inkscape to add the logos sourced from Wikimedia.

11- Referee Activity in the Big 5 European Soccer Leagues

Author: Here
Data Source: fbref.com
Tool: R

12- European Soccer Dynasties

This sports-themed viz shows the periods of dominance by European soccer clubs over time. I define a "dynasty" as a rolling 7-year window in which a team accumulates at least 3 "points," with a title being worth 1 pt and a runner-up finish 0.5 pt. You can hover over each dynasty for more details, choose a team to highlight, or highlight teams based upon the dynasty index (3 pts, 3.5 pts, 4 pts). Enjoy!

Tool: Tableau

Link:

Data: Wikipedia pages for each soccer league ex.

13- The top goal scorers in 40 years of elite football (soccer)


Source: Here
Tool: Tableau

Interactive visualization: Here

Data: Here

You can compare goals vs age, vs minutes played and more for over 100 players including Messi, both Ronaldos, Zlatan, Totti, plus the legends of yesteryear like Shearer and Baggio, and the next generation like Mbappé.

There are 191 players to compare in total, going right back to 1970 (every player who scored 20 or more league goals in a single season of a top European league).

14- Football (Soccer)'s home-grown talent by country - how many players were club-trained?

Legend (All numbers from here)

Dark green: Number of players who played for their current club for at least three years/seasons between the ages of 15 and 21.

Light green: Number of players who trained at that club but currently play for another top-division team.

15- How soccer lineup formations changed over time

The author works at a soccer livescore app company (Forza Football) and dug into the data. Using some simple SQL in BigQuery, and then visualising the results in Mode Analytics (where the company builds all its dashboards), he produced this one.

This is based on around 12,000 lineups from all kinds of leagues. The majority are from European leagues, but the company covers competitions all over the world.

16- Economic prosperity and soccer prowess

The author made the visualization with ggplot2, with data from eloratings.net and the World Bank.