Training a Reinforcement Learning agent to fish in Stardew Valley

Thiago Lira
Jan 21, 2022
Because actually learning to play a simple minigame wouldn’t make for a cool blog post

This is a project I have been grinding on and off for quite some time. I had never tried to mod a game, and I had never successfully trained a “real” reinforcement learning agent (I gave up on Flappy Bird). The challenge for me was understanding exactly what the “state space” of my problem was. When you code with some easy RL framework that hands you the agent, the environment and the reward, you don’t have to think at all about the modeling part of the problem. Within a game you have to decide which state the model will read each frame, which inputs the model will feed back to the game, and how to collect a suitable reward from all of this. You also have to make sure the model has the correct affordances inside the game (it sees only what the player sees); otherwise it could learn to exploit a bug, break the game, or simply fail to converge.

My goal was to code an agent that would read the state of the fishing minigame and play it perfectly. The end result is a mod for Stardew Valley, written in C# using the official Stardew Valley modding API. The mod loads a serialized DQN model trained in Python. I first collected data from the game, then used this data to train a simple DQN with PyTorch. After some iterations I export a serialized model with ONNX and load it from the C# side; on every frame the model receives the state of the fishing minigame as input and (hopefully) outputs the correct action.

The Fishing Minigame

This agent is written with the help of the fantastic SMAPI, the official Stardew Valley modding API. The API gives access to the game’s memory at runtime and provides everything I needed to make an agent that reads the game state and feeds inputs back to the game in real time.

In the fishing minigame you have to keep the “hook” (a green bar) aligned with the moving fish; you give the bar a little nudge upwards by clicking the left mouse button. The fish moves erratically along this vertical bar. Every frame that you manage to keep the hook aligned with the fish, the green bar fills up a little bit, and whenever the fish escapes it starts to empty. You catch the fish when you fill the green bar and lose the fish when it empties.

The Reinforcement Learning problem

So, I just need to read a few specific attributes from the game’s memory every frame and save them as my state for frame t. For this problem I need the 4 variables the game keeps track of during the fishing minigame: the position of the center of the “hook”, the position of the fish, the speed of the hook, and how much the green bar is filled (this is our reward!). The names the game uses internally are a little odd; the sketch below shows, in this order, how they can be read.
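This is a minimal sketch of that read, assuming the fields are accessed through SMAPI’s reflection helper; the internal field names (bobberBarPos, bobberPosition, bobberBarSpeed, distanceFromCatching) and their types are my best guess at the game’s internals and may differ between game versions:

```csharp
// Sketch: read the minigame state from the BobberBar menu via SMAPI reflection.
// Field names and types are assumptions based on the game's internals.
using StardewModdingAPI;
using StardewValley;
using StardewValley.Menus;

private float[] ReadFishingState(IModHelper helper)
{
    var bobberBar = (BobberBar)Game1.activeClickableMenu;

    float barPos   = helper.Reflection.GetField<float>(bobberBar, "bobberBarPos").GetValue();         // hook (green bar) position
    float fishPos  = helper.Reflection.GetField<float>(bobberBar, "bobberPosition").GetValue();       // fish position
    float barSpeed = helper.Reflection.GetField<float>(bobberBar, "bobberBarSpeed").GetValue();       // hook speed
    float progress = helper.Reflection.GetField<float>(bobberBar, "distanceFromCatching").GetValue(); // green bar fill, 0..1

    return new float[] { barPos, fishPos, barSpeed, progress };
}
```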

The first three of these variables define our state.

This is what the model can use on each frame to decide what to do. To set this up as a Reinforcement Learning problem we also need a reward to guide training. The reward will be how much the green bar is filled, which is the variable distanceFromCatching. It goes from 0 to 1, which makes it perfect for a reward. The RL agent will hopefully learn to carry out the actions that maximize its future reward (i.e. catch the fish).

Replay Memory

Replay Memory is a technique used in Q-learning to de-correlate training from any specific “episode”. We do this by storing state transitions in a buffer and training the model on random batches sampled from this memory instead of training directly on the latest data. Each transition needs 4 things: the current state, the next state, the action we took, and the reward.
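As a sketch, a transition could be represented on the C# side with something like the record below (the type and field names are illustrative, not the mod’s exact ones):

```csharp
// One entry of the replay memory: (state, action, reward, next state).
public record Transition(
    float[] State,      // state at frame t
    int Action,         // action taken at frame t (e.g. 0 = do nothing, 1 = click)
    float Reward,       // reward observed after the action (how filled the green bar is)
    float[] NextState   // state at frame t + 1
);
```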

The crux of Q-learning is knowing which state you were in, which state you ended up in, which action took you there, and what reward you got for performing it. With thousands and thousands of these examples we can use a simple algorithm like Value Iteration (a dynamic programming algorithm) and propagate rewards from terminal states (winning states) back to the states that lead to them, so for every possible state the model knows the direction that maximizes its future reward. I will not train the model with Value Iteration, because real problems tend to have too many states and dynamic programming takes forever, but this is the idea of what is happening.
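Just to make that backward propagation concrete, here is a toy value-iteration sketch (not part of the mod) on a 5-state chain where only the right-most state pays a reward; after a few sweeps the value of every state reflects how close it is to the goal:

```csharp
using System;

// Toy illustration: value iteration on a 5-state chain where only reaching
// the right-most, terminal state pays a reward of 1. Each sweep propagates
// value one step further back from the goal.
public static class ValueIterationToy
{
    public static double[] Run(int nStates = 5, double gamma = 0.9, int sweeps = 50)
    {
        var value = new double[nStates];              // V(s), all zeros at the start

        for (int sweep = 0; sweep < sweeps; sweep++)
        {
            for (int s = 0; s < nStates - 1; s++)     // the last state is terminal
            {
                int left = Math.Max(s - 1, 0);
                int right = s + 1;
                double reward = (right == nStates - 1) ? 1.0 : 0.0;

                // Bellman backup: keep the value of the best action from s.
                double qLeft  = gamma * value[left];
                double qRight = reward + gamma * value[right];
                value[s] = Math.Max(qLeft, qRight);
            }
        }

        return value;  // values grow the closer a state is to the goal
    }
}
```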

Just to illustrate, this is roughly how I save each entry of my Replay Memory in C#. I use buffers to keep the state and action from the last frame and store them together with the state and the reward from the present frame.
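A minimal sketch of that buffering, with illustrative names and an assumed CSV layout (the real mod’s columns may differ):

```csharp
using System.Globalization;
using System.IO;
using System.Linq;

// Sketch: buffer last frame's state/action and append one CSV row per transition.
public class ReplayMemoryWriter
{
    private readonly string path;
    private float[] lastState;   // state read on the previous frame
    private int lastAction;      // action chosen on the previous frame

    public ReplayMemoryWriter(string path) => this.path = path;

    // Called once per frame with the freshly read state, the current reward
    // (how filled the green bar is) and the action the agent just chose.
    public void Record(float[] state, float reward, int action)
    {
        if (this.lastState != null)
        {
            // One row = previous state, action taken, reward, resulting state.
            string Fmt(float v) => v.ToString(CultureInfo.InvariantCulture);
            string row = string.Join(",",
                this.lastState.Select(Fmt)
                    .Append(this.lastAction.ToString(CultureInfo.InvariantCulture))
                    .Append(Fmt(reward))
                    .Concat(state.Select(Fmt)));
            File.AppendAllText(this.path, row + "\n");
        }

        this.lastState = state;
        this.lastAction = action;
    }
}
```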

All of this data ends up in a huge CSV file that I load in Python and use to train the DQN model.

The Neural Network

Q-learning that uses Neural Networks to estimate the Q-table is called Deep Q-Learning. The idea is very well explained in this PyTorch tutorial, from which I mercilessly stole all the code and changed it a bit for my problem. The main idea is that we use two neural networks: one estimates the value of Q(s,a) (the Policy Net), the other estimates the future Q-values (the Target Net). We then backpropagate on the difference between their estimates.

Here is the basic equation for the Q-learning algorithm. We use one network to estimate the value of the current state-action pair Q(s,a), while the other estimates the maximum possible value for the next state. Both networks are initialized with random weights, and every few iterations we copy the Policy Net weights into the Target Net. The only network whose weights we update via backpropagation is the Policy Net! It is also the one we use when we want to test the model later. The idea is that by backpropagating on this difference the Policy Net eventually learns to estimate both values.
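In its standard tabular form the update looks like this (the Policy Net supplies the Q(s_t, a_t) term and the Target Net supplies the max term):

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + 𝛾 · max_a′ Q(s_{t+1}, a′) − Q(s_t, a_t) ]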

α is our Learning Rate and 𝛾 is the Discount Rate, which controls how much importance we give to future values of Q.

Reinforcement Learning is quite hard, so I’ll leave a bunch of links that will do a better job than I ever could of explaining all of these details. [1][2][3]

Training Procedure

The training process was “bootstrapped”: first I played the game myself, collected state and reward data, trained a shitty model, and made it play Stardew Valley. I wrote code so that the game continually generates new data, which I then manually use to train new models on the Python side and export a new ONNX artifact; the mod reloads that artifact every 1000 frames or so and keeps playing with the new model, generating data to train even newer ones. I made it this way b̶e̶c̶a̶u̶s̶e̶ ̶I̶ ̶s̶u̶c̶k̶ ̶a̶t̶ ̶C̶# because I had to compile the mod and pack it in a Windows DLL compatible with the game’s executable, and I had some trouble finding a .NET machine learning framework that could generate the correct binaries (Stardew Valley is compiled in .NET 5), so I just gave up and coded the training part in Python.
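The reload step could look roughly like the sketch below (assuming Microsoft.ML.OnnxRuntime on the C# side; the interval, path and member names are illustrative, not the mod’s exact ones):

```csharp
using Microsoft.ML.OnnxRuntime;

// Sketch: swap in the latest ONNX artifact every ~1000 frames so freshly
// trained weights are picked up without restarting the game.
private InferenceSession session;
private int framesSinceReload;
private const int ReloadInterval = 1000;        // illustrative value
private const string ModelPath = "model.onnx";  // illustrative path

private void MaybeReloadModel()
{
    if (this.session == null || ++this.framesSinceReload >= ReloadInterval)
    {
        this.session?.Dispose();                 // drop the old session
        this.session = new InferenceSession(ModelPath);
        this.framesSinceReload = 0;
    }
}
```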

As you can see, one important insight I had is that the model doesn’t need to be trained online. Q-learning is all about finding a good approximation of the function Q(s,a), i.e. the function that estimates the value of performing a certain action a in a certain state s. The important thing is to make your data explore this state space thoroughly; it doesn’t matter that much whether you or the model played the game, although I will admit that it is way more impressive when the model does it on its own.

Reading the ONNX model from C#

The only true ML code on the C# side is the ONNX Runtime library, which provides a Tensor type and a session object that can feed tensor inputs into a serialized ONNX model and collect its tensor outputs. The code is very straightforward: an Update function runs every frame and queries the trained model for an action, using the current state as input; the last few lines just take the argmax of the model’s output, which is the index of the action with the highest predicted reward. I’m quite proud that the serialized model weighs only about 120 KB, so it is all very light to run.
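A hedged sketch of that per-frame query (not the mod’s exact Update handler); the tensor name “input” and the shape of the state are assumptions for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Query the serialized DQN for the action with the highest predicted Q-value.
private int ChooseAction(InferenceSession session, float[] state)
{
    // Shape [1, 3]: one batch entry with (bar position, fish position, bar speed).
    var input = new DenseTensor<float>(state, new[] { 1, state.Length });
    var inputs = new List<NamedOnnxValue>
    {
        NamedOnnxValue.CreateFromTensor("input", input)  // "input" is an assumed tensor name
    };

    using var results = session.Run(inputs);
    float[] qValues = results.First().AsEnumerable<float>().ToArray();

    // argmax: index of the action with the highest predicted Q-value.
    int best = 0;
    for (int i = 1; i < qValues.Length; i++)
        if (qValues[i] > qValues[best]) best = i;
    return best;
}
```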

Using Harmony to HACK inputs into the game

One thing that is missing from SMAPI is the ability to feed inputs into the game, and I think 99.999% of mods don’t need something like that. Mine did, so I found some code using a C# library called Harmony that changes an internal function of the game’s code at runtime, so I could trick the game into thinking it had received a mouse input. This is what I use to let the mod play the game on its own. Huge thanks to Drynwynn, author of the mod FishingAutomaton, from which I borrowed a lot of code to set up my mod.
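Conceptually, the Harmony part looks something like the sketch below. The patch target string is a hypothetical stand-in (the real target comes from FishingAutomaton’s code); the point here is just the mechanics of patching a method and overriding its result:

```csharp
using HarmonyLib;
using Microsoft.Xna.Framework.Input;

// Sketch of the Harmony trick: patch a game input method at runtime and rewrite
// its result so the game believes the left mouse button is pressed.
public static class FakeClickPatch
{
    // Set to true by the agent whenever it decides to "click".
    public static bool SimulateClick;

    public static void Apply(string modId)
    {
        var harmony = new Harmony(modId);

        // Hypothetical patch target for illustration only; the actual mod
        // patches the game's own internal input routine.
        var original = AccessTools.Method("StardewValley.InputState:GetMouseState");
        harmony.Patch(original, postfix: new HarmonyMethod(typeof(FakeClickPatch), nameof(Postfix)));
    }

    // Runs after the original method and overrides the returned mouse state.
    private static void Postfix(ref MouseState __result)
    {
        if (SimulateClick)
        {
            __result = new MouseState(
                __result.X, __result.Y, __result.ScrollWheelValue,
                ButtonState.Pressed,          // left button forced down
                __result.MiddleButton, __result.RightButton,
                __result.XButton1, __result.XButton2);
        }
    }
}
```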

Final Results

Currently, the model manages to catch all of the easy and “intermediate” fish. I haven’t yet been able to train it to catch legendary fish.

Here is the model before any training.
Here is the model after thousands and thousands of iterations.

All the code both from the C# mod and the Python training can be found here!

Huge thanks to the Stardew Valley’s modding community for the help in coding the mod and making me understand the game better :)

You can find the link for the Discord server here!
