(Before we start, one commenter reminded me that it can be very helpful to use an IDE when coding. Integrated development environments, like RStudio, work similarly to the basic R console but provide helpful features like code autocompletion and better-integrated documentation. I’ll keep taking screenshots in the R console for consistency, but feel free to try out an IDE and see if it works for you.)

Look At That Data

We’ll be using the same set of 2013-14 batter data that we did last time, so download that (if you haven’t already) and load it back up in R:

    fgdata = read.csv("FGdat.csv")

Possibly my favorite thing about R is how often all it takes is a very short function to create something pretty cool. Let’s say you want to make a histogram, a chart that plots the frequency counts of a given variable. You might think you have to run a bunch of different commands to name the type of chart, load your data into the chart, plot all the points, and so on. Nope:

    hist(fgdata$wRC)

This Instant Histogram(™) displays how many players have a wRC+ in the range a given bar takes up on the x-axis. This histogram looks like a pretty normal, bell-curveish distribution, with an average a bit over 100, which makes sense, since the players with a below-average wRC+ won’t get enough playing time to qualify for our data set. (You can confirm this quantitatively by using a function like summary(fgdata$wRC).)

The hist() function, right out of the box, displays the data and does it quickly, but it doesn’t look that great. You can spend endless amounts of time customizing charts in R, but let’s add a few parameters to make this look nicer.

    hist(fgdata$wRC, breaks=25, main="Distribution of wRC+, 2013 - 2014", xlab="wRC+", ylab= NULL, col="darkorange2")

In this command, ‘breaks’ is the number of bars in the chart, ‘main’ is the chart title, ‘xlab’ and ‘ylab’ are the axis titles, and ‘col’ is the color. R recognizes a pretty wide range of color names, though you can use RGB, hex, etc. if you’re more familiar with them.

A bit better, right? The distribution doesn’t look quite as normal now, but it’s still pretty close. We can actually add a bell curve to eyeball how far off it is.

    hist(fgdata$wRC, breaks=25, freq = FALSE, main="Distribution of wRC+, 2013 - 2014", xlab="wRC+", ylab= NULL, col="darkorange2")
    curve(dnorm(x, mean=mean(fgdata$wRC), sd=sd(fgdata$wRC)), add=TRUE, col="darkblue", lwd=2)

(In the first line above, “freq = FALSE” indicates that the y-axis will be a probability density rather than a frequency count; the second line creates a normal curve with the same mean and standard deviation as your data set.)

You can also plot multiple charts at the same time. Use the par(mfrow) function with the preferred number of rows and columns:

    par(mfrow=c(2,2))
    hist(fgdata$wOBA, breaks=25)
    hist(fgdata$wRC, breaks=25)
    hist(fgdata$Off, breaks=25)
    hist(fgdata$BABIP, breaks=25)

When you want to save your plots, you can copy them to your clipboard, or create and save an image file directly from R:

    png(file="whatisitgoodfor.png",width=400,height=350)
    hist(fgdata$WAR, breaks=25)
    dev.off()

(It’ll show up in the same directory you’re loading your data set from.)

You can create bar charts, pie charts, and all of that, but you’re probably more interested in everyone’s favorite, the scatterplot. At its most basic, the plot function is literally plot() with the two variables you want to compare. Unsurprisingly, slugging percentage and ISO are fairly well-correlated.

Results-wise, we’re starting to push against the limits of our data set: too many of these stats are directly connected to find anything interesting. So let’s take a different tack and look at year-over-year trends. There are several ways you could do this in R, but we’ll use a fairly straightforward one. Subset your data into 2013 and 2014 sets:

    fg13 = subset(fgdata, Season == "2013")
    fg14 = subset(fgdata, Season == "2014")

Then merge the two sets by player name. This will create one large dataset with two sets of columns: one with a player’s 2013 stats and one with their 2014 stats. (Players who only appeared in one season will be omitted automatically.)

    yby = merge(fg13, fg14, by=("Name"))
    head(yby)

As you can see, 2013 stats have an .x suffix and 2014 stats have a .y suffix. So instead of comparing ISO to SLG, let’s see how ISO holds up year-to-year:

    plot(yby$ISO.x, yby$ISO.y, pch=20, col="red", main="ISO year-over-year trends", xlab="ISO 2013", ylab="ISO 2014")

(The ‘pch’ argument sets the shape of the data points; ‘xlim’ and ‘ylim’ set the extremes of each axis.)

Again, a decent correlation, but just *how* decent? Let’s turn to the numbers. If you’re a frequent FanGraphs reader, you’re probably familiar with at least one statistical metric: r², the square of the correlation coefficient.
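
Here is a minimal sketch of one way to put a number on the year-over-year ISO relationship: it computes r² with cor() and cross-checks it against lm(). The tiny data frame and its player names are made up so the example runs without FGdat.csv; with the real file you would start from read.csv() instead. This is only an illustration under those assumptions, not necessarily how the post goes on to do it.

```r
# Toy stand-in for the FanGraphs export; every name and value is made up.
fgdata <- data.frame(
  Name   = c("Player A", "Player B", "Player C", "Player D",
             "Player A", "Player B", "Player C", "Player E"),
  Season = c(2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014),
  ISO    = c(0.150, 0.210, 0.095, 0.180, 0.160, 0.190, 0.110, 0.140)
)

# Note the double equals sign: with a single "=" subset() filters nothing.
fg13 <- subset(fgdata, Season == 2013)
fg14 <- subset(fgdata, Season == 2014)

# merge() keeps only names found in both seasons and tags the duplicated
# columns with .x (first data frame) and .y (second data frame).
yby <- merge(fg13, fg14, by = "Name")

# Pearson correlation between the two seasons, then its square.
r <- cor(yby$ISO.x, yby$ISO.y)
r_squared <- r^2

# The same r-squared falls out of a one-variable linear model.
fit <- lm(ISO.y ~ ISO.x, data = yby)
print(c(r = r, r_squared = r_squared, lm_r_squared = summary(fit)$r.squared))
```

Squaring cor() and reading r.squared off summary(lm()) agree for a single predictor, so either route works; the lm() version is handy if you later want to draw the fitted line on the scatterplot with abline(fit).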