Dota2 is one of the video games played in e-Sport. It is a multiplayer online battle arena with two teams (the Dire and the Radiant) of 5 players each. During a match every player collects gold, items, and experience points for one of the 111 available Heroes. A team wins when it destroys the other team’s Ancient building. If you go to the live broadcast page, you may see an actual game being played at the moment.
Would it be possible to predict the outcome based on the first 5 minutes of gameplay? The variables used to predict the winner include levels, gold, experience, kills, items, and hero for each of the 10 participants, statistics of the buildings, and match aggregates such as duration, first blood team, first blood time, and so on.
Python was the technology used, and the task was to predict the probability that the Radiant team wins a particular match. The quality of the predictions is measured by the area under the ROC curve (AUC-ROC) summary statistic. It varies from 0 to 1, where 1 means a perfect predictor and 0.5 is no better than random guessing. The highest score my models reached was 0.75499.
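To make the AUC-ROC statistic concrete, here is a minimal sketch in plain Python. It uses the pairwise interpretation of the metric: the share of (Radiant win, Radiant loss) pairs of matches in which the win received the higher predicted probability. The function name `auc_roc` and the sample data are illustrative assumptions, not the code or data used for the assignment.

```python
def auc_roc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win, which matches the rank-based definition of AUC.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one match of each outcome")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = Radiant won, scores = predicted P(Radiant wins).
radiant_win = [1, 0, 1, 1, 0, 0]
predicted = [0.8, 0.3, 0.6, 0.4, 0.5, 0.2]
score = auc_roc(radiant_win, predicted)  # 8 of 9 pairs are ordered correctly
```

The double loop is quadratic in the number of matches, so for a real data set a library routine or a rank-based formula would be the more practical choice.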
I went down several paths to arrive at the result. They included understanding the gameplay and the data, data preparation, multiple data transformations, composing my own variables (a sort of karma points ranking system), encoding some variables, and calibrating multiple parameters of the machine learning algorithms.
It is fun to try to predict e-Sport outcomes such as Dota2 matches, but the same techniques and methods can be used in real-world applications. (E-Sport is also a highly paid professional sport, so I am sure they have applications in the gaming industry as well.)
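As an illustration of what composing variables can look like, the sketch below collapses the ten per-player rows of a match into a few Radiant-minus-Dire differences. It is not the karma points system mentioned above; the field names (`team`, `gold`, `xp`, `kills`), the function `team_features`, and the numbers are all hypothetical.

```python
def team_features(match):
    """Collapse ten per-player rows into Radiant-minus-Dire differences."""
    features = {}
    for stat in ("gold", "xp", "kills"):
        radiant = [p[stat] for p in match["players"] if p["team"] == "radiant"]
        dire = [p[stat] for p in match["players"] if p["team"] == "dire"]
        # Team totals, plus each team's strongest and weakest player.
        features[stat + "_diff"] = sum(radiant) - sum(dire)
        features[stat + "_best_diff"] = max(radiant) - max(dire)
        features[stat + "_worst_diff"] = min(radiant) - min(dire)
    return features


def make_players(team, rows):
    """Build player dicts from (gold, xp, kills) tuples (made-up values)."""
    return [{"team": team, "gold": g, "xp": x, "kills": k} for g, x, k in rows]


# Hypothetical snapshot of one match after the first 5 minutes.
match = {
    "players": make_players(
        "radiant",
        [(1200, 900, 1), (1500, 1100, 2), (800, 700, 0), (1000, 950, 0), (1300, 1000, 1)],
    )
    + make_players(
        "dire",
        [(1100, 1000, 0), (900, 800, 1), (1400, 1200, 1), (700, 600, 0), (1000, 900, 0)],
    )
}
features = team_features(match)
```

Using differences rather than raw values makes the features symmetric between the two teams, and the best and worst player columns give a model a chance to learn whether the strongest or the weakest player matters more.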
The example is advantageous as a demonstrator for a number of reasons:
- Similar applications can be found in e-commerce and online behavior (“Will a client make a purchase?”), churn prediction models (“Will a client go to another provider?”), winner prediction in team competitions where each player is unique, etc.
- Creativity in formulating the assignment, possibly to match business requirements. Alternative problem formulations could have been: “What is the minimum number of minutes necessary to predict the outcome with 90% certainty?”, “Which players best predict a win: the best ones on a team, or the worst?”, “How important is gold in determining the winner?”, “How crucial is the choice of hero?”, or other questions.
- Being able to work with different data types (structured, such as CSV/JSON, and unstructured, such as text), different fields (an online video game in this case), and different variable types (categorical, discrete, continuous, text).
- Feature engineering (modifying, composing, transforming, dealing with outliers, aggregating) and selection.
- Testing different machine learning algorithms (starting from simpler regression and moving towards complex neural network algorithms).
- Testing different machine learning tools (the course requirement was Python, but I have rewritten the assignment to test Google’s TensorFlow).
- The final assignment requires peer review: you get to see inspiring examples of how other students have tackled the problem.
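To illustrate the handling of categorical variables mentioned in the list above, here is a minimal sketch of one way to encode the hero picks of a match as numbers. It assumes hero ids run from 1 to 111 and that each hero appears at most once per match; the function `encode_heroes` and the sample ids are hypothetical, and this is not necessarily the encoding used in the assignment.

```python
N_HEROES = 111  # number of heroes stated in the article


def encode_heroes(radiant_heroes, dire_heroes, n_heroes=N_HEROES):
    """Return a vector with +1 for Radiant picks, -1 for Dire picks, 0 otherwise.

    Hero ids are assumed to run from 1 to n_heroes (an assumption of this sketch).
    """
    vector = [0] * n_heroes
    for hero_id in radiant_heroes:
        vector[hero_id - 1] = 1
    for hero_id in dire_heroes:
        vector[hero_id - 1] = -1
    return vector


# Hypothetical picks for one match: five hero ids per team.
hero_vector = encode_heroes([1, 5, 20, 33, 111], [2, 7, 40, 50, 99])
```

Encoding the picks this way avoids treating a hero id as a quantity, which would wrongly suggest that hero 100 is somehow "more" than hero 10.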
The demonstrators we develop as a data science team help us explore new ideas and challenges. They improve our own capabilities and provide excellent ways to explain what is possible in layman’s terms. They give us a fresh perspective, new data sets, and an opportunity to test our skills and challenge ourselves. Constant learning and tackling unusual, hard problems are part of our culture.
By Sergej Obzigailov, Machine Learning expert