MRKT Insights is a new company that has drawn together data scientists, scouts, and analysts. We offer bespoke services to each client depending on their needs. The work can be extremely complicated. But technical scouting doesn’t have to be complex.
The blog posts below were written over the course of the 2018/19 season and show some examples of what is possible in technical scouting with minimal access to data.
My belief is that data scouting works. You can narrow a massive field to a very small field of potential players by identifying players with the attributes you want. You can identify these attributes by looking at how players perform the relevant tasks, which is reflected in their performance statistics.
But how many statistics do you need to look at to see if a player is actually any good?
Maybe fewer than you think.
One very basic statistic alone has been enough to identify pretty much the cream of Europe’s defensive talent.
As regular readers will know, I think Ligue 2 is the best value league around, for many reasons, so we will base this experiment on it.
What I want to do is use the data from previous years to look at young players who showed up well in a category and follow their career from that point.
I’m choosing the Wyscout category of “progressive passes”, which counts passes that move the ball towards the opponent’s goal and normally shows up good passers from deep positions.
The PL top 5 from the previous season were:
Looks fairly accurate to me.
If I restrict it (as I will be doing for France) to players who are 22 or younger:
I don’t think anybody would look at that list and think it was ridiculous. All of these are young players getting regular minutes who can pass the ball well from deep positions on the pitch.
The great thing now is that data has been around for a few years. My Wyscout data goes back to 2015/16, so what would happen if I looked at the good passing players from Ligue 2 in that season who were then <22? Would they have all disappeared into obscurity?
Lenglet of Barcelona; Gbamin (now Everton) and Niakhate of Mainz; Ait-Bennasser of Monaco; Cyprien of Nice. The most obscure is Cisse of Olympiakos, but he is extremely highly rated and likely to move to a big 5 league club soon. If we take (along with a pinch of salt) the values from Wyscout (adding in £10m for Cisse), then that is around £124m of players picked up by a single “meaningless stat”.
OK, you say, what if that was a one-off? What happened in the next season?
Top of the list is a player I’m constantly promoting, Jean-Kevin Duverne, who had a bad injury but is now back and looking good. Niakhate and Cisse are both in the list, showing the repeatability of the data. Then we have lots of new arrivals.
Ferland Mendy, of Lyon, now of Real Madrid. Briancon of Nimes who has anchored the midfield of the 7th placed team in Ligue 1. Florian Miguel too, who is now also at Nimes.
Junior Sambia, defensive midfielder for 5th placed Montpellier. Fulgini of Angers, now a French U21 international with 30 Ligue 1 games this season. Dylan Bronn? Player of the season for Gent and linked with Lyon.
Every single player over two seasons of data has gone on to success at a much higher level.
So it works for defenders? What about attackers? Can very basic technical scouting show us anything at all of use?
Let’s take the most basic statistic of “goals and assists”, filtered by age.
The French 3rd division in 2015/16 sorted by goals and assists of players 23 and under:
JP Mateta, who has just scored 14 goals in the Bundesliga in his first season.
Nicolas Pépé, 22 goals with Lille and now a £70m player.
Wissa who is now with Lorient near the top of Ligue 2.
Bouanga, who was at Nimes (now St Etienne), with 10 G+A this season in Ligue 1.
Kamara of Fulham.
Sada Thioub also of Nimes.
This probably just shows again that there is lots of talent in France. But it also shows that simple age filtering is a powerful tool.
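The age filter plus a sort on goals and assists can be sketched in a few lines. This is only an illustration: the `PlayerSeason` type, the function name and the sample rows are all invented here, and nothing below uses a real Wyscout export or real player numbers.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    # Hypothetical record layout, not a Wyscout format.
    name: str
    age: int
    goals: int
    assists: int


def top_by_goals_and_assists(players, max_age, limit=10):
    """Keep players aged max_age or under, ranked by goals + assists."""
    eligible = [p for p in players if p.age <= max_age]
    ranked = sorted(eligible, key=lambda p: p.goals + p.assists, reverse=True)
    return ranked[:limit]


# Invented sample rows, purely for illustration.
sample = [
    PlayerSeason("Player A", 21, 12, 5),
    PlayerSeason("Player B", 29, 20, 8),
    PlayerSeason("Player C", 23, 9, 10),
    PlayerSeason("Player D", 19, 4, 2),
]

for p in top_by_goals_and_assists(sample, max_age=23):
    print(p.name, p.goals + p.assists)
```

Player B has the most goals and assists in the invented data but drops out at the age filter, which is the whole point of the exercise.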
Progressive passes, as defined by Wyscout, are forward passes that are at least 30m long when the pass starts in the team’s own half, or at least 10m long when it starts in the opponent’s half.
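To make that definition concrete, here is a minimal sketch of it as a test on a single pass. It follows the wording above rather than any official Wyscout code, and it rests on several assumptions of mine: coordinates are in metres, the team attacks towards increasing x, the pitch is 105m long, "forward" means the pass ends closer to the opponent's goal line than it started, the 30m figure is treated as a minimum, and length is the straight-line distance of the pass.

```python
import math

PITCH_LENGTH_M = 105.0  # assumed pitch length; real pitches vary


def is_progressive(start, end, pitch_length=PITCH_LENGTH_M):
    """Return True if a pass from start to end counts as progressive.

    start and end are (x, y) positions in metres, with the team
    attacking towards increasing x (an assumption of this sketch).
    """
    (x0, y0), (x1, y1) = start, end
    if x1 <= x0:
        return False  # not a forward pass
    length = math.hypot(x1 - x0, y1 - y0)
    # 30m threshold for passes starting in the own half, 10m otherwise.
    min_length = 30.0 if x0 < pitch_length / 2 else 10.0
    return length >= min_length


print(is_progressive((20, 30), (55, 30)))  # 35m forward from own half
print(is_progressive((20, 30), (40, 30)))  # 20m forward from own half
print(is_progressive((70, 30), (82, 30)))  # 12m forward in opponent's half
```

Counting the passes that pass this test gives the volume figure, and dividing the completed ones by the attempted ones gives the accuracy figure.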
So far I’ve only looked at the volume of progressive passing. This is because I cannot filter the data on the player ranking section of Wyscout other than by age. If I choose per 90 statistics, a lot of players with a tiny number of minutes show up and push other players off the results screen (limited to 30).
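The small-sample problem with per 90 figures is easy to show with made-up numbers. In this sketch the row layout, the function names and the minutes threshold are my own choices for illustration, not anything taken from Wyscout.

```python
def per_90(total, minutes):
    """Scale a season total to a per-90-minutes rate."""
    return total * 90.0 / minutes


def rank_per_90(rows, min_minutes=0):
    """Rank (name, minutes, progressive_passes) rows by passes per 90.

    Players below min_minutes, or with no minutes at all, are dropped.
    """
    qualified = [
        (name, per_90(passes, minutes))
        for name, minutes, passes in rows
        if minutes > 0 and minutes >= min_minutes
    ]
    return sorted(qualified, key=lambda row: row[1], reverse=True)


# Invented sample rows: Player B has played half a match.
sample = [
    ("Player A", 2700, 300),
    ("Player B", 45, 9),
    ("Player C", 1800, 160),
]

print(rank_per_90(sample))                    # Player B tops the list
print(rank_per_90(sample, min_minutes=1000))  # Player B is filtered out
```

Nine passes in 45 minutes scales to 18 per 90, which beats a regular starter on 10 per 90, so without a minutes threshold the cameo player leads the ranking.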
However, my suspicion has been that by using the volume of passes I am effectively looking at young players who played a lot. Young players who play a lot of games in lower divisions will get picked up by bigger teams. So maybe progressive passing isn’t the great stat that I keep saying it is?
How about I look instead at the accuracy of progressive passing and restrict it to midfielders and forwards?
I’ll look for any player with over 1000 minutes played who is classed as a midfielder or attacker, is now <24 years old, and appears in the top 20 of their league by progressive pass accuracy.
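Written as code, that shortlist might look something like the sketch below. The record layout, the position labels and the sample rows are invented for illustration, accuracy is assumed to be completed divided by attempted progressive passes, and I have assumed the filters are applied first and the top 20 taken afterwards, which may not match how the Wyscout ranking screen orders things.

```python
from dataclasses import dataclass


@dataclass
class PassingRecord:
    # Hypothetical record layout, not a Wyscout format.
    name: str
    age: int
    position: str  # "defender", "midfielder" or "attacker" (assumed labels)
    minutes: int
    attempted: int  # progressive passes attempted
    completed: int  # progressive passes completed

    @property
    def accuracy(self):
        return self.completed / self.attempted if self.attempted else 0.0


def accuracy_shortlist(records, min_minutes=1000, age_below=24, limit=20):
    """Midfielders and attackers over min_minutes and under age_below,
    ranked by progressive pass accuracy."""
    eligible = [
        r for r in records
        if r.minutes > min_minutes
        and r.age < age_below
        and r.position in {"midfielder", "attacker"}
    ]
    return sorted(eligible, key=lambda r: r.accuracy, reverse=True)[:limit]


# Invented sample rows, purely for illustration.
sample = [
    PassingRecord("Player A", 22, "midfielder", 1500, 100, 80),
    PassingRecord("Player B", 21, "defender", 2000, 200, 190),
    PassingRecord("Player C", 23, "attacker", 1200, 50, 45),
    PassingRecord("Player D", 20, "midfielder", 400, 20, 20),
    PassingRecord("Player E", 27, "midfielder", 2500, 150, 140),
]

for r in accuracy_shortlist(sample):
    print(r.name, round(r.accuracy, 2))
```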
I also only looked at France in my first attempt. How about Italy and Spain?
ITALY SERIE B 2015/16
1. Lucas Torreira
2. Franck Kessié
3. Alexis Zapata
4. Stefano Sensi
5. Rolando Mandragora
6. Luca Mazzitelli
7. Mattia Aramu
8. Moses Odjer
9. Bruno Petkovic
10. Nicolo Barella
A very good start. Of the players listed only Zapata and Aramu have dropped down a level. Odjer has remained in Serie B and everyone else has moved up to top tier or international football.
But is the problem still there that I am restricting it by age too much and therefore just showing all the players who were young and playing lots?
How about looking at Spain and increasing the age range to players 25 and below?
SPAIN SEGUNDA 2016/17
1. Enzo Capilla
2. Santi Comesaña
3. Iñigo Ruiz de Galarreta
4. José Pozo
5. Amath Diédhiou
6. Gonzalo Melero
7. Iban Salvador
8. Joan Jordan
9. Damia Sabater
10. Sasa Zdjelar
12. Ager Aketxe
Less successful. The main difference is that there are more older players appearing in the list. In the case of Joan Jordan this has picked up another excellent player. Also we have far more wide attackers appearing rather than deeper midfielders.
Could this be showing that stylistic differences between leagues have an impact, with more space out wide in second tier Spain than in second tier Italy?
So maybe in Spain we should forget about accuracy and look for players in central areas who still attempt to progress the ball, and try the per 90 stat but manually filter the results.
So, with the same year and the same restrictions, we’ll look at per 90 progressive passes from midfielders and attackers.
1. Fabian Ruiz
2. Joan Jordan
3. Sasa Zdjelar
4. Damia Sabater
Only 4 players make the list, which is dominated by defenders with few minutes. However, top of the list by a mile was a young midfielder called Fabian Ruiz, currently valued around £50m. The top 2 were miles ahead in terms of progressive passes per 90 and minutes played.
So what about the last 2 seasons of data? I can filter that by minutes and progressive passes per 90.
First of all, the test data for a league I know: all players with over 1500 mins who play in midfield or attack, ordered by progressive passes per 90.
Now I want to relax the minutes restriction so that anyone who has played at all appears.
Shelvey and De Bruyne have now made the list.
And restrict to 23 or under:
Some new names now, including Bacuna, Pereira, Anguissa, Quina and Winks.
At the bottom of the list are players you might expect, such as Calvert-Lewin and Solanke, who are less involved in build-up play.
So if we apply this* to the Segunda and Serie B what names come up as good, young, progressive passers?
* I had to fiddle slightly with the age/mins settings, as the data was picking up too many tiny minute samples, but these are roughly the same as the Prem data.
And for a test let’s choose, at random, Switzerland and Romania, although we can already see that data-driven Brighton have signed the number one progressor in the Romanian league.
It will be interesting to check back and see if these players go on to have good careers.