Reinventing Jakob Junis Using Q-Learning

Junis strikes out Lowrie with slider

Okay, I admit it… I have a little crush on Jakob Junis. I’ve gotten the opportunity to watch him pitch a few times when the Royals have come to the Coliseum, and I can’t help but think this guy has the tools for big league success, but for one reason or another he can’t connect the dots. For the past few years Jakob Junis has been a consistently below average starter on a consistently below average Royals team. His woes primarily stem from his ability, or lack thereof, to control his pitches, especially his four-seam fastball, which was his second most used pitch in 2019 and his worst in terms of barrel percentage (9.8%) and pitch value (-23 fastball runs below average). Any analysis of Junis from the past two years points to this as his Achilles heel. Despite these seemingly indelible negatives, Junis has one ace in the hole: his slider, which may be the difference between him wearing a Royals uniform and any given Triple-A uniform. The pitch is about as filthy as filthy can get, and he hasn’t been shy about using it; it’s equally effective against lefties as it is against righties. Junis has slowly regressed in his time in the majors, to say the least, winning nine games in each of his three seasons but accruing twelve and fourteen losses in his last two seasons, respectively. This is most likely a symptom of scouting reports catching up to him, which happens quickly in the Major Leagues. Nonetheless, Junis is still a big league pitcher with a big league slider in an era of baseball in which players are being treated as works in progress rather than fixed values. Maybe it’s because I’m an A’s fan and I have a unique soft spot for players who have been historically unproductive and in need of reinventing, but I truly believe that Junis is capable of making a significant turnaround.
Obviously, if Junis is able to turn his career around, his individual progression will be orthogonal to the struggling Royals, who will most likely be in the rebuilding stage for at least a few more years. So the question must be asked: how can the Royals rebuild Jakob Junis? To answer this, I decided to use Q-learning, a type of reinforcement learning in which an agent takes actions in an environment to maximize rewards across a given state space. Before I go further, I should add that this was inspired by the paper MONEYBaRL: Exploiting Pitcher Decision-Making Using Reinforcement Learning. If you aren’t familiar with Q-learning, here is a great video that is clear and doesn’t take too long to digest. When applied to baseball, the states become the counts in an at-bat, the agent is Jakob Junis, and the actions are the five pitches in Junis’ arsenal: four-seam fastball, sinker, slider, curve, and changeup. In a 0–0 count, Junis can throw any one of his pitches and receives a reward; if no terminal state (a state that ends the at-bat) is reached after the action is performed, the process starts over from the new count. The goal of the agent is to learn a strategy that maximizes its reward based on a reward function; this strategy is called the policy, or the optimal policy. There’s a somewhat tedious formula behind the reward function, so I’ll spare you the details and just tell you why I chose the rewards that I did.
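To make the setup concrete, here’s a minimal sketch of the at-bat as a Markov decision process. The state and outcome names are my own illustrative encoding, not taken from the original analysis:

```python
from itertools import product

# Actions: the five pitches in Junis' arsenal.
PITCHES = ["four-seam", "sinker", "slider", "curve", "changeup"]

# Non-terminal states: all twelve legal ball-strike counts.
COUNTS = [(balls, strikes) for balls, strikes in product(range(4), range(3))]

def next_state(count, outcome):
    """Map a count plus a pitch outcome to the next state.

    Terminal states (strings like "walk" or "strikeout") end the at-bat;
    otherwise the at-bat continues from the new count.
    """
    balls, strikes = count
    if outcome == "ball":
        return "walk" if balls == 3 else (balls + 1, strikes)
    if outcome in ("called_strike", "swinging_strike"):
        return "strikeout" if strikes == 2 else (balls, strikes + 1)
    if outcome == "foul":
        # A foul with two strikes leaves the count unchanged.
        return (balls, min(strikes + 1, 2))
    return outcome  # ball in play ("single", "out_in_play", etc.): terminal
```

A full implementation would sample these outcomes from Junis’ pitch-by-pitch data rather than a hand-written transition function, but the state space itself really is this small: twelve counts and five actions.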

Reward Function

The rewards mirror something called ‘linear weights’ in baseball. Linear weights estimate the run value of each outcome in baseball by year. For example, a home run wasn’t worth the same in 1964 as it was in 2016; heck, a home run isn’t even worth the same in 2010 as it is in 2019. This may sound obvious at face value, but it’s important to acknowledge baseball as an environment that changes radically through time, and any machine learning endeavor into baseball should keep this in mind (if it’s relevant to the project, of course). The goal of having the reward function reflect linear weights is to give the most accurate representation of the run environment of baseball at a given time. It’s not ideal, but the linear weights I used for this project are from the 2015 MLB season. It’s possible that these weights have changed dramatically since then, and that could be an experiment for another day, but here’s what the reward function looks like.
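As a sketch, a linear-weights reward function might look like the following. The numbers here are placeholders chosen only to show the shape of the idea; they are not the actual 2015 weights used in the analysis:

```python
# Placeholder run values per outcome (illustrative, NOT the real 2015 weights).
LINEAR_WEIGHTS = {
    "single": 0.89,
    "double": 1.27,
    "triple": 1.62,
    "home_run": 2.10,
    "walk": 0.69,
    "out_in_play": -0.27,
    "strikeout": -0.27,
}

def reward(outcome):
    """Reward from the pitcher's perspective: runs allowed count against him,
    outs count for him. Non-terminal outcomes (balls, strikes, fouls) earn 0."""
    return -LINEAR_WEIGHTS.get(outcome, 0.0)
```

The sign flip is the key design choice: since the agent is the pitcher, outcomes that score runs for the batter must be negative rewards, and outs must be positive.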

Tuning Hyperparameters

The Q-learning algorithm was trained on Junis’ 2018 season data along with two hyperparameters: alpha (the learning rate) and gamma (the discount factor). In order for the agent to learn quickly enough, I found that an alpha of 0.25 was the most justifiable. An alpha set too high will cause the agent to consider only the newest information and ignore everything learned before, whereas an alpha of 0.25 causes the agent to value old information slightly more than newly acquired information. Baseball players are famous for having good memories and learning from their mistakes as well as their successes. Pitchers usually won’t change their approach significantly based on one bad pitch or one bad game, sometimes not even a bad year, which seems congruent with a learning rate of 0.25.

The discount factor determines the importance of future rewards. A gamma of 1 means the agent values rewards that arrive later in an at-bat (i.e. strikeouts) just as much as quick rewards (outs that required minimal pitches to achieve), whereas a gamma of 0 means the agent is shortsighted and cares only about immediate rewards. I set gamma at 0.5 as a middle ground between the two. For Junis, a strikeout shouldn’t be more or less valuable than a quick two-pitch out; ultimately, keeping runs off the board is a pitcher’s main priority. One could argue that the process I took to get these values was less scientific than ideal. Again, more work needs to be done to iron out the kinks in this method of player analysis.
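With those two hyperparameters in hand, a single tabular Q-learning update looks like this. The state and reward values below are made up to show the mechanics, not drawn from Junis’ data:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.25, 0.5  # learning rate and discount factor from the text

def q_update(Q, state, action, r, next_state, terminal):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    future = 0.0 if terminal else max(Q[next_state].values(), default=0.0)
    Q[state][action] += ALPHA * (r + GAMMA * future - Q[state][action])
    return Q[state][action]

# Q-table: count -> pitch -> value, initialized to zero.
Q = defaultdict(lambda: defaultdict(float))

# Example: a slider in a 1-2 count ends the at-bat in a strikeout worth +0.27.
q_update(Q, (1, 2), "slider", 0.27, "strikeout", terminal=True)
```

Because terminal states have no future value, the strikeout update simply moves Q((1,2), slider) a quarter of the way toward the terminal reward: 0.25 × 0.27 = 0.0675.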


After running the algorithm on Junis’ 2018 season data, here is the optimal policy produced. The key takeaway isn’t that Junis should follow this policy literally (throwing only sliders in 0–0 counts, changeups in 0–1 counts, curveballs in 2–0 counts, etc.), but rather what the optimal policy suggests about the nature of Junis’ behavior. It suggests that Junis should mostly remove the four-seam fastball from his repertoire, making him almost exclusively a sinkerball pitcher. This could be the refreshing and feasible change that Junis needs. He’s a relatively average-velocity pitcher with a solid ability to keep the ball on the ground with his sinker (his best groundball pitch at 56.8% in 2019). This is not to say it’ll be an easy transition, but statistically speaking Junis should have more success and induce more ground balls with this strategy, if keeping the ball on the ground is in fact the antidote to Junis’ struggles (which I believe it is). When this optimal policy is applied to Junis’ 2019 season data, here’s what the suggested pitch distribution looks like in comparison to Junis’ actual pitch distribution.
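Extracting that optimal policy from a trained Q-table is just a greedy argmax over pitches in each count. The Q-values below are invented purely to illustrate the step:

```python
# Hypothetical learned Q-values: count -> pitch -> value (illustrative only).
Q = {
    (0, 0): {"four-seam": -0.08, "sinker": 0.03, "slider": 0.11},
    (0, 1): {"slider": 0.09, "changeup": 0.14},
    (2, 0): {"sinker": 0.05, "curve": 0.07},
}

# Greedy policy: in each count, throw the pitch with the highest Q-value.
policy = {count: max(pitches, key=pitches.get) for count, pitches in Q.items()}
```

Reading a policy like this literally would make the pitcher fully predictable, which is exactly why the text above treats it as a directional suggestion about pitch mix rather than a script.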

Player Development Analysis

This is a time in which players are constantly reinventing themselves, and it’s up to the Royals organization to surround Junis with the right people to coach him through developing better pitches. One could argue that the suggested pitch mix is a tad slider-heavy, which is fair. However, pitchers like Lance McCullers, Patrick Corbin, and Rich Hill (among others) have historically relied heavily on their breaking balls as their primary pitch. This doesn’t necessarily have to be Junis’ fate, but his slider continues to be his best pitch, and that shouldn’t change anytime soon. In the past, Jakob Junis has struggled with hanging changeups to left-handed hitters, as he almost exclusively uses the pitch against lefties; the silver lining is that there’s proof Junis can throw an effective changeup, as it was his second best pitch behind his slider in terms of wOBA (0.378) and xwOBA (0.327) in 2019. It’s an issue of consistency rather than provability. After pitching a few innings in spring training in late February, Junis said, “I got a strikeout with my curveball, which is something I’ve worked on this winter. I got a swing and miss on my changeup, that was great. I commanded my fastball decent, and threw my slider well too”. In addition to working on his changeup, Junis also tweaked the grip on his curveball, which should produce a sharper break and possibly help him leave fewer hangers in the zone.

Poor changeup execution
Ideal changeup execution


The Future of Q-Learning Applications

As far as player development goes, this is where guys like me get off the bus. It’s best to give players the information they can use to whatever extent they see fit, and then get the heck out of their way. These are suggestions to help pitchers improve their behavior and pitch mixes, not strict guidelines. There are certainly improvements to be made to this method of player analysis. However, its strength is its versatility in combining pitcher behavior with respective pitch value. The algorithm is blind to statistics like PITCHf/x, barrel percentage, spin rate, velocity, etc., all things that are important in assessing pitchers but largely unused or unavailable at the college level. The algorithm doesn’t need these metrics to evaluate what a ‘bad’ pitch is versus what a ‘good’ pitch is; these things are more or less implicit within the data. For that reason, I can see future applications being most effective at the college level, where programs not only don’t use the same metrics as professionals but are filled with pitchers who are still figuring themselves out and need a way to assess their pitches beyond just the ‘eye test’. This isn’t to dispute the importance of evaluating a pitcher in person, but it can’t hurt to have a strategy informed by data. There are a few noteworthy obstacles to overcome. One is that an open-source database of college-level pitch-by-pitch data doesn’t exist. Pitch-by-pitch data is the driving force behind this method of reinforcement learning, and college programs likely keep track of it themselves; if they don’t, it’s never too late to start. Fortunately, thanks to great work done by Robert Frey, collegiate linear weights exist and could be used in a hypothetical collegiate application of the Q-learning algorithm.
