Warning! This post gets into Dota 2, a complex, real-time, multiplayer strategy game that I have talked about on numerous occasions on this blog, but whose weeds I have always avoided getting into, since the game features a lot of specialist jargon that is impossible for a layperson to follow. Unfortunately, this time I’m going to get into the weeds a little bit, and I won’t be explaining all that jargon, because doing so would make this post two to three times longer. You have my apologies, and if the going gets weird and confusing, feel free to skip down a bit; there’s no shame!

This Sunday I watched as 5 bots beat a team of Dota players in the 99.95th percentile. This was not the first demonstration of machine learning triumphing over human professionals in Dota: at last year’s International, an OpenAI bot was featured in a 1v1 mid matchup against Na’Vi’s famed mid player Dendi, who lost handily. Even so, it was difficult to evaluate how profound the implications were. Winning that matchup against a professional player was certainly impressive, but it was such a constrained experience compared to a regular match of Dota that it would have been impossible to say OpenAI had developed bots that had mastered this complex game. The demonstration on Sunday showed OpenAI much closer to that goal.

Some observations about the matches that I found interesting:

  1. OpenAI clearly went out of their way to demonstrate that their bots had developed superior strategies and tactics compared to their human opponents. They limited the hero pool to heroes that only require controlling a single unit (i.e. no heroes with illusions like Phantom Lancer or Terrorblade, no heroes with clones or multiple instances like Arc Warden or Meepo, and no heroes with summonable units like Nature’s Prophet or Enigma), and they also banned items that produce extra units, like Manta Style, Necronomicon, or Helm of the Dominator. By doing this OpenAI nullified what would have been an obvious objection to the match against human opponents: that a computer can control multiple units effortlessly, unlike its human counterparts. However, this constraint also avoids a rather interesting phenomenological critique of AI.

  2. Similarly, the OpenAI bots were programmed with a 200ms reaction time, which puts them in league with their highly skilled human opponents. After the matches I saw some commentators who were unaware of this constraint dismiss the results by citing a machine’s superhuman reaction times. That was not the case: the bots won through superior drafting, strategy, and tactics.

  3. When computer AIs started to become good enough to defeat Chess pros, you would sometimes see bizarre moves that no human player would ever make. Humans are limited by our capacity to think about how to win only in a logical manner. That is, we won’t try out an alternative strategy that makes no sense to us, because it is difficult to think up an illogical strategy, and we won’t keep applying it if it doesn’t immediately produce results. A computer aided by machine learning has no such constraints. It can try out seemingly pointless strategies, and after enough simulations it will gather enough data to determine whether an “arbitrary” move is worth executing or not (see the sketch after this list). So I was interested to see what sort of odd behavior the AI might display in Dota. The bots’ map movements generally resembled the way pro teams move around the map; however, one curious behavior was how they used consumable items. The bots understood the value of wards, but they would sometimes inefficiently place sentry wards right next to each other, or in other bizarre spots. They would also use smoke and dust at seemingly random times. The OpenAI team speculated this might just have to do with how the bots prioritize resources: a bot knows to buy a Smoke of Deceit, but then needs the inventory slot for some other item it is prioritizing.

  4. Another limitation of the bots is that they appeared to rely heavily on hero build guides for their itemization decisions, without always understanding the value of those items. For instance, we saw Crystal Maiden purchase a Hand of Midas without using it once. This is very strange if you think about it: if the bots are optimized to acquire resources efficiently, it seems odd that they wouldn’t learn to use this item.

  5. One consumable item the bots were excellent at abusing was the healing salve. They constantly ferried salves to themselves and would use them in the middle of a fight; even when a salve was canceled by an attack, the extra 50 or so HP regenerated before the cancel could make the difference between winning and losing the fight, making the salve a valuable investment.

  6. Of course, the reason they were able to abuse salves was an artificial crutch placed on the game: each bot received its own invincible courier. I didn’t see an explanation for why this constraint existed, but it strongly suggests that the bots are unable to figure out how to efficiently use a limited, shared resource.

  7. One cool thing to watch was how the bots didn’t care about the preconceptions of the 1 to 5 role system as a human player would understand it. The bots seemed happy to place any hero in any lane, and they weren’t worried about having 3 cores in a single lane because they knew they could acquire resources around the map efficiently enough in the post-laning phase that whatever inefficiencies this lane configuration created wouldn’t matter. But given their assumed limitations with shared resources, it would be interesting to observe what happens if the bots are forced to play on a “scarcity” patch where the resources they can acquire on the map are much more constrained. How would they respond to that selection pressure?

  8. Another interesting detail that came up during the post-match Q&A was that OpenAI needed to nudge the bots into taking Roshan fights by making Roshan’s HP variable. Only by sometimes encountering a weaker version of Roshan were the bots able to learn his value. Although it is difficult to evaluate what specific limitations this implies for the machine learning techniques OpenAI used, it does point to real, concrete limitations.

  9. By far the biggest disappointment, in my view, was how the bots responded to playing from behind. In the third match between the humans and the bots, the audience and Twitch chat drafted for the bots. Unsurprisingly, they picked the worst possible draft, and the bots estimated their chances of winning at about 2.5% (at the start of the previous matches they had estimated their win probability at between 90% and 99%). The question then became: how would the bots respond to playing at such a severe disadvantage? Although they did try a novel quad-lane configuration, they ended up reverting to their usual strategy. A human team at such a disadvantage would try some high-risk, high-reward strategies, like sneaking in a Roshan fight when the enemy team wasn’t expecting it, but the bots didn’t attempt anything like that.
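
Item 3 above described how a machine-learning agent can keep or discard a seemingly illogical move based purely on accumulated simulation data. Here is a minimal Python sketch of that idea: estimate a candidate strategy’s win rate over many simulated games and keep it only if it beats the baseline. The strategy names, win rates, and `simulate_game` stub are purely illustrative assumptions; OpenAI’s actual system learns through large-scale self-play reinforcement learning, not a toy loop like this.

```python
import random

def simulate_game(strategy, rng):
    """Stand-in for one full simulated game: returns 1 for a win, 0 for a loss.
    A real system would roll out the policy inside the game engine instead."""
    # Hypothetical win rates, chosen only so the example runs.
    win_rate = {"baseline": 0.50, "odd_ward_spot": 0.53}
    return 1 if rng.random() < win_rate[strategy] else 0

def estimate_win_rate(strategy, n_games=100_000, seed=0):
    """Monte Carlo estimate of a strategy's win rate from many simulated games."""
    rng = random.Random(seed)
    wins = sum(simulate_game(strategy, rng) for _ in range(n_games))
    return wins / n_games

# Keep the "arbitrary" move only if the data says it outperforms the baseline.
baseline = estimate_win_rate("baseline")
candidate = estimate_win_rate("odd_ward_spot")
print(f"baseline={baseline:.3f}, candidate={candidate:.3f}")
if candidate > baseline:
    print("The seemingly illogical move pays off often enough to keep it.")
```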

Among these observations I include a lot of criticisms of the bots to highlight that much work remains, but also that despite these limitations they were able to beat extremely capable players, and that is very impressive. The dream of “general AI” has been on the backburner for some time now, and efforts have instead been directed towards the more fruitful research on “limited AI” (see this blog post for a simple explanation). Observing this match I still see some important limitations of “limited AI”, but those limitations do not imply a lack of value. At the Q&A the OpenAI team mentioned that their machine learning process had resulted in an AI that can control a mechanical hand analogous to a human hand. They attribute this success to domain randomization techniques that allow their AI to learn from certain kinds of experiences via simulation and then successfully carry that learning over to physical interactions.
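
To make the domain randomization idea concrete, here is a minimal sketch of what such a training loop can look like: each simulated episode draws its own physical parameters (object mass, friction, actuator latency, sensor noise), so the policy cannot overfit to a single simulator configuration and has a better chance of transferring to the physical world. The parameter names, ranges, and function stubs are my own illustrative assumptions, not OpenAI’s actual code.

```python
import random

def sample_sim_params(rng):
    """Draw a fresh set of simulator parameters for one training episode.
    Names and ranges are illustrative assumptions, not real calibrated values."""
    return {
        "object_mass_kg": rng.uniform(0.03, 0.30),
        "friction_coeff": rng.uniform(0.5, 1.5),
        "actuator_delay_ms": rng.uniform(0.0, 40.0),
        "sensor_noise_std": rng.uniform(0.0, 0.05),
    }

def run_training_episode(policy, params):
    """Placeholder for one simulated episode under the sampled parameters.
    A real implementation would step a physics simulator and update the policy."""
    return policy

def train(policy=None, n_episodes=5, seed=0):
    rng = random.Random(seed)
    for episode in range(n_episodes):
        params = sample_sim_params(rng)   # randomize the world every episode
        policy = run_training_episode(policy, params)
        print(f"episode {episode}: {params}")
    return policy

if __name__ == "__main__":
    train()
```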

If an AI can gather a lot of valuable insight in a limited period of time, it can then teach humans insights that would otherwise take years or decades to arrive at. After the OpenAI 1v1 mid demonstration last year, the OpenAI team provided a version of their bot to a number of pro players who practiced against it. Over time, many of these pros developed strategies they could exploit to defeat the bot. In other words, humans were able to quickly advance their skill at the game through these interactions, raising the overall skill level at the task of playing Dota.

The question then becomes: in what domains can AI simulations be used to teach humans? Could an AI assist doctors in arriving at a correct prognosis for a patient? That is a far more complicated problem than even a complex strategy game. The difference is that when you are manipulating objects or playing a game of Dota, it is relatively easy to create well-defined models that map to the real world. Can the same be said for treating diseases, where there will be instances of known unknowns? What about developing tools for better economic forecasting, where you are dealing with a highly complex, dynamic system? OpenAI’s current research offers both promise and reasons for hesitation.