The NetHack Challenge at NeurIPS 2021

I’m a big fan of NetHack from way back, and I’ve often thought about implementing an AI agent (not a bot) to play the game. In particular, I thought of NetHack when this thread started discussing implementation of a 2D ASCII world for doing object recognition.

Well, now it seems that Facebook AI is sponsoring the NetHack Challenge competition in collaboration with the AIcrowd platform.

There are multiple aspects of the game world that present interesting challenges to a typical reinforcement learning approach.

  • The world is procedurally generated. When the player/agent dies, the world resets. However, the behavior of the entities within the world is consistent between runs. An RL agent cannot take advantage of any policy information obtained for a specific map layout; it must be able to generalize.

  • Information about the world is provided in the form of a partially-observable map, a status bar, and frequent plain-text messages describing what happened on the previous time step or as a result of the previous action. These text messages are essential for understanding what is actually happening within the game. This is especially important when the graphical map does not update in any visually distinguishable way (e.g., combat that lasts more than one turn, an unseen entity throwing something at you, or a sound heard in the distance). So there is an opportunity for at least a little bit of NLP as well.

  • The game is very long, and there are many different ways to play it (mostly determined by the character race, class, and alignment selected at startup). However, there are many opportunities for accomplishing sub-goals, some of which are defined in the modified NetHack Learning Environment.

  • The game is a turn-based RPG, meaning that the game remains paused until the user/agent selects their next move. Also, the graphics for each level are essentially ASCII art, and the response of the game to each move is nearly instantaneous. Thus, the majority of the computational resources can be dedicated to the agent rather than to updating the environment.
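To give a feel for the text-message point above: the NetHack Learning Environment exposes the game's top-line message not as a Python string but as a fixed-length, NUL-padded array of ASCII byte values inside the observation dict. A minimal decoding sketch, assuming that observation layout (the `obs` dict below is synthetic, standing in for what `env.step()` would return):

```python
def decode_message(message):
    """Decode a NUL-padded array of ASCII byte values into a string.

    Accepts a plain list of ints or a numpy uint8 array, which is how
    NLE-style observations typically carry the message text.
    """
    raw = bytes(bytearray(message))
    # Keep everything up to the first NUL byte; the rest is padding.
    return raw.split(b"\x00", 1)[0].decode("ascii")


# Synthetic observation standing in for the dict returned by env.step();
# the key name "message" and the 256-byte length follow NLE's convention.
obs = {"message": list(b"You hit the newt!") + [0] * 239}
print(decode_message(obs["message"]))  # prints "You hit the newt!"
```

An agent that wants to notice events with no visual map change (an unseen thrower, a distant sound) would run text like this through whatever NLP component it uses before choosing its next action.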

The competition runs until October. If anyone else would like to team up to work on this challenge, please let me know. I’ll try to post updates on the Forum if I make any interesting progress.