I started writing the Money Ballers column on IL Indoor in January 2012. While writing the articles that year (or while writing other stat-related articles for this blog), I frequently came across interesting statistics about the previous weekend’s games, and tweeted them with the hashtag #NLLStatOfTheDay (or something similar). I found that a number of people responded to them, whether to ask about them, mention a similar one, or just RT them. I figured that if I worked a little harder, I could probably come up with one of these every day, and if I could find a way to schedule them automatically, this would be a cool thing. So in April 2012 I created @NLLFactOfTheDay (originally @NLLStatOfTheDay, but I changed it so I could post things that weren’t stats).
Since then I have tweeted more than 200 facts, and collected well over 600 followers. This is far more than I have on my own Twitter account (recently passed 400! Woo!), which I created three years earlier. I originally did this for NLL fans/stats geeks like me, but there are a lot of NLL players and executives following now, and most tweets are RT’ed at least once. It even got me interviewed on Teddy Jenner’s Off The Crossebar radio show (mine is the Feb 24, 2013 one). It’s safe to say this has become a fair bit more popular than I imagined.
I thought some people might be interested to see where the facts come from. So today we’re going behind the scenes and I’ll let you in on some of the secrets. Note that much of this qualifies as “Behind the scenes at the Money Ballers” as well. This article started to get kind of long, so I’ve broken it up into two. The first part describes where the information comes from, the second describes how I mine the information for the actual facts and post them.
I basically have four sources of information that I use for these facts: three databases and a web site. The contents of the three databases come from three different sources. The web site is Wikipedia, and many of the lacrosse articles are ones that I’ve written or contributed to myself over the years. I’ll start with an overview of the databases (a little bit technical) and then describe where I got the information stored in them.
The Technical Stuff
I have been a software developer for more than twenty years. I’ve been working at Sybase (now part of SAP) for over fifteen years, as part of the SQL Anywhere database server development team. So of course when I needed to create a database, SQL Anywhere was the obvious choice – and not only because I’m very familiar with it. SQL Anywhere has a built-in HTTP server (which I helped patent), which means that I can write procedures in SQL that directly access the data, and then easily create web pages that display the data in any way I want. When I started the Money Ballers, I decided that the best way to compile the points (i.e. easiest, fastest, and most accurate) would be to write a program to download the information from the NLL and then crunch the numbers. I wrote a Python script to download the game sheet and parse it, then insert the resulting information into the database.
Extreme Game Detail
Here’s what I get from each game sheet:
- Date/time of game
- Home team, away team
- Final score, winning team, losing team
- Attendance
- For each goal: who scored it, who assisted, what quarter, what time, whether it was power play, shorthanded, empty net, or penalty shot
- For each penalty: who got it, what quarter, what time, what class of penalty (major, minor, misconduct, match, etc.), and what type (holding, high-sticking, fighting, etc.). This includes bench penalties
- For each player: number of goals, assists, points, power play goals, shorthanded goals, loose balls, face offs won, face offs attempted
- For each goalie: number of goals against, saves, minutes, shots on goal
All of this information is pushed into a database, and then I can view one of the many web pages that summarize it. With this information about each game, I can calculate pretty much anything: league standings, league or team scoring leaders, yearly records like most goals/assists/loose balls/etc. in one game, number of power play or shorthanded goals per team, attendance records, and of course the Money Ballers numbers.
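To make the idea concrete, here is a minimal sketch of how per-goal rows can answer questions like "who are the scoring leaders?" or "how many power play goals does each team have?". It is not the real schema: the real database is SQL Anywhere, while this sketch uses Python's built-in sqlite3 module as a stand-in, and every table name, column name, and function name in it is made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per game and one row per goal.
SCHEMA = """
CREATE TABLE games (
    game_id    INTEGER PRIMARY KEY,
    played_at  TEXT,
    home_team  TEXT,
    away_team  TEXT,
    home_score INTEGER,
    away_score INTEGER,
    attendance INTEGER
);
CREATE TABLE goals (
    game_id INTEGER REFERENCES games(game_id),
    team    TEXT,
    scorer  TEXT,
    assist1 TEXT,      -- NULL when the goal was unassisted
    assist2 TEXT,
    quarter INTEGER,
    clock   TEXT,
    special TEXT       -- 'PP', 'SH', 'EN', 'PS', or NULL
);
"""


def create_database(path=":memory:"):
    """Open a database and create the (hypothetical) tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def scoring_leaders(conn, limit=10):
    """Return (player, goals, assists, points), where points = goals + assists."""
    query = """
        SELECT player, SUM(g) AS goals, SUM(a) AS assists,
               SUM(g) + SUM(a) AS points
        FROM (
            SELECT scorer AS player, 1 AS g, 0 AS a FROM goals
            UNION ALL
            SELECT assist1, 0, 1 FROM goals WHERE assist1 IS NOT NULL
            UNION ALL
            SELECT assist2, 0, 1 FROM goals WHERE assist2 IS NOT NULL
        )
        GROUP BY player
        ORDER BY points DESC, goals DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def power_play_goals_by_team(conn):
    """Count power play goals per team, using the 'special' marker on each goal."""
    query = """
        SELECT team, COUNT(*) FROM goals
        WHERE special = 'PP'
        GROUP BY team
        ORDER BY team
    """
    return conn.execute(query).fetchall()
```

The point of the sketch is that nothing beyond the goal rows is needed for these two reports; the per-player totals from the game sheet are then mostly a cross-check.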
There are a few things that the game sheet does not include: for example, penalty shots that are not successful, time spent on the floor, goaltender win/loss (or who started the game), forced turnovers, stuff like that. Another notable thing that I do not have is power play efficiency. I know when PP goals are scored, but I have not put in the work to determine how many power play opportunities there were. There’s more to it than just “a team is on the PP for two minutes after every opposing minor penalty”, since we can have coincident penalties for each team, penalties that overlap, penalties that end early because of goals scored, that sort of thing. I believe I have all the information to calculate it; I just haven’t done it.
Some game sheets are available from 2011 and previous seasons, but they are missing enough information that I don’t use them, so I only have this level of detail for the 2012 and 2013 seasons.
An example
Here’s the “Game summary” section of the page for Philadelphia’s 10-8 win over Rochester on Feb 23, 2013. It lists each goal in order, separated by quarter. The pages are fairly plain; I’ve made no attempt to make them pretty. I don’t care what they look like as long as they’re functional. Note also that none of these pages are available on the internet – they are only on my computer. Click on the image to see a full-size version that might be easier to read.
The goals shaded in dark green are go-ahead goals, light green are tying goals, and red is the game-winner. The “special” column lists those three events as well as power play, shorthanded, empty net or penalty shot. You can see the current score, how far up the winning team is, how many goals in a row the team has scored, how long it’s been since the last goal, and how long it’s been since the last goal by the same team. The last four columns can be calculated given the rest of the information (if the score is 7-5, I know the difference is 2; I don’t really need the program to calculate it for me), but having it there makes patterns and extremes much easier to spot.
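For the curious, the derived columns can be computed in a single pass over the goals. The following is only a sketch under a few assumptions, not the code behind the actual page: quarters are taken to be 15 minutes long, the clock value is taken to be time elapsed in the quarter, and "game-winner" is defined hockey-style as the winning team's goal number (loser's final total + 1). The function and field names are invented for the example.

```python
QUARTER_SECONDS = 15 * 60  # assumption: 15-minute quarters


def elapsed(quarter, clock):
    """Seconds since the start of the game; assumes `clock` is elapsed time ("m:ss")."""
    minutes, seconds = clock.split(":")
    return (quarter - 1) * QUARTER_SECONDS + int(minutes) * 60 + int(seconds)


def annotate(goals, team_a, team_b):
    """goals: list of (team, quarter, clock) in the order they were scored."""
    final = {team_a: 0, team_b: 0}
    for team, _, _ in goals:
        final[team] += 1
    winner = max(final, key=final.get)
    loser = team_b if winner == team_a else team_a
    winning_goal_number = final[loser] + 1  # assumed definition of game-winner

    score = {team_a: 0, team_b: 0}
    last_any = None
    last_by_team = {team_a: None, team_b: None}
    streak_team, streak = None, 0
    rows = []
    for team, quarter, clock in goals:
        other = team_b if team == team_a else team_a
        now = elapsed(quarter, clock)
        was_tied = score[team] == score[other]
        score[team] += 1
        streak = streak + 1 if team == streak_team else 1
        streak_team = team

        tags = []
        if was_tied:
            tags.append("go-ahead")
        elif score[team] == score[other]:
            tags.append("tying")
        if team == winner and score[team] == winning_goal_number:
            tags.append("game-winner")

        rows.append({
            "team": team,
            "score": f"{score[team_a]}-{score[team_b]}",
            "lead": abs(score[team_a] - score[team_b]),
            "streak": streak,
            "since_last": None if last_any is None else now - last_any,
            "since_last_same": (None if last_by_team[team] is None
                                else now - last_by_team[team]),
            "tags": tags,
        })
        last_any = now
        last_by_team[team] = now
    return rows
```

Under these definitions a single goal can carry two tags (a go-ahead goal that also turns out to be the game-winner), and the two "since last goal" values are in seconds, left as None for the first goal of the game or the first goal by a team.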
In this case, you can see the patterns in the goals – Rochester scored 4, then Philly scored 4, then Rochester scored 2, and so on. You can see that Rochester went almost 19 minutes between their 4th and 5th goal. You can see that Rochester didn’t score in the third and Philly didn’t score in the first, though I do have another chart that lists the number of goals in each quarter.
The idea here isn’t just to give me all the numbers, it’s to give me the numbers in such a way that I can see patterns or outliers. If I look at a game report like this and there are lots of dark and light green rows, then I know it was a close game. If there’s a green line at the top and a red line at the bottom, then one team led throughout the game. If the Special column is filled with PP and SH, then there were a lot of penalties.
All The Games
About a year ago, a reader of this blog sent me an email saying that he had a spreadsheet containing every game ever played in the NLL. He asked if I wanted it, and of course I said yes. I immediately put that into another database, and started creating web pages for that as well. With that, I can reconstruct the final standings of any season, show the entire win-loss history of any team, the head-to-head matchups of any two teams, all kinds of attendance figures, game score records, goal differentials, and even things like a team’s record on a particular day of the week.
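As an illustration of how far a plain list of games goes, here is a small sketch that rebuilds a season's standings and a team's record by day of the week. The layout of each game record is hypothetical (the spreadsheet's real columns aren't described here), and the ordering is a simplification: teams are sorted by winning percentage and then goal differential, not by the league's actual tiebreakers.

```python
from collections import defaultdict
from datetime import date
from typing import Tuple

# Hypothetical record layout: (season, date, home, away, home_score, away_score)
Game = Tuple[int, date, str, str, int, int]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def standings(games, season):
    """Return [(team, {"W", "L", "GF", "GA"}), ...] for one season, best first."""
    table = defaultdict(lambda: {"W": 0, "L": 0, "GF": 0, "GA": 0})
    for game_season, _, home, away, home_score, away_score in games:
        if game_season != season:
            continue
        for team, scored, allowed in ((home, home_score, away_score),
                                      (away, away_score, home_score)):
            row = table[team]
            row["W" if scored > allowed else "L"] += 1
            row["GF"] += scored
            row["GA"] += allowed

    def sort_key(item):
        row = item[1]
        win_pct = row["W"] / (row["W"] + row["L"])
        return (-win_pct, -(row["GF"] - row["GA"]))

    return sorted(table.items(), key=sort_key)


def record_by_weekday(games, team):
    """Return {weekday name: [wins, losses]} for one team across all games given."""
    record = defaultdict(lambda: [0, 0])
    for _, played_on, home, away, home_score, away_score in games:
        if team not in (home, away):
            continue
        if team == home:
            team_score, other_score = home_score, away_score
        else:
            team_score, other_score = away_score, home_score
        day = WEEKDAYS[played_on.weekday()]
        record[day][0 if team_score > other_score else 1] += 1
    return dict(record)
```

Head-to-head records and attendance figures follow the same pattern: filter the list, then count or sum.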
All The Players
Between the 2012 and 2013 seasons, the NLL did some work on their web site. For a while, when you clicked on a player, there was a link to “Career stats”, but rather than a page listing the stats for that player, the link gave you an Excel spreadsheet with the yearly stats for every player who has ever played in the NLL. I downloaded this spreadsheet and put that in a database as well. Not only can I search for any player and get their overall stats for any season (regular season and playoffs), but I can list all the players with their career stats, list all the players who’ve ever played for a certain team, display the final scoring stats for any season (Gary Gait led with 48 points in 1995), list the best seasons in any category, or combine seasons (e.g. who scored the most from 1990-1999? Don’t fall over, but it was Gary Gait again, with 536).
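Combining seasons is just a filtered sum over the yearly rows. A minimal sketch, assuming each row is a hypothetical (player, season, goals, assists) tuple and that points are goals plus assists:

```python
from collections import Counter


def points_leaders(rows, first_season, last_season, top=5):
    """Total points (goals + assists) per player over an inclusive season range."""
    totals = Counter()
    for player, season, goals, assists in rows:
        if first_season <= season <= last_season:
            totals[player] += goals + assists
    return totals.most_common(top)
```

Calling something like points_leaders(rows, 1990, 1999) on the full spreadsheet is the kind of query behind the decade question above; swapping goals + assists for another column gives the leaders in any other category.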
I have since found that some of the stats are incorrect and some from the very early years of the league are missing. For example, there are no numbers from the 1987-1989 seasons, and apparently only five players were in the league in 1990. In the 2008 playoffs, did Cory Bomberry really take 360 face-offs in 3 games and only win 8 of them?
I looked at the top 20 single-season loose ball records and saw that Devin Dalep had 193 in 2002 and Erik Miller had 186 in 2003. I happen to know that those guys were goalies, but I notice that no other goalies are listed. In particular, there’s no Watson, no Eliuk, no Dietrich, no O’Toole, no Vinc. None of these elite goalies have had great loose-ball seasons since 2003? Strange. So I looked up their numbers. The same year Dalep had 193, Watson had 10. O’Toole and Eliuk had 11. Steve Dietrich had 0. I have a feeling Dalep’s numbers may not be correct.
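Checks like these can be automated, at least roughly. The sketch below flags a value that is far above the median of its peers, and a face-off line that is impossible or implausible. The thresholds (five times the median, 50 face-offs per game) are arbitrary guesses chosen for illustration, and the function names are made up.

```python
from statistics import median


def flag_outliers(values, factor=5.0):
    """Return names whose value exceeds `factor` times the median of everyone else."""
    flagged = []
    for name, value in values.items():
        others = [v for n, v in values.items() if n != name]
        if not others:
            continue
        # Use at least 1 as the baseline so a median of 0 doesn't flag everything.
        baseline = max(median(others), 1)
        if value > factor * baseline:
            flagged.append(name)
    return flagged


def faceoffs_suspicious(won, attempted, games, max_per_game=50):
    """True if wins exceed attempts, or attempts per game look implausibly high."""
    if won > attempted:
        return True
    return games > 0 and attempted / games > max_per_game


# Loose balls for the goalies mentioned above, same season as Dalep's 193.
goalie_loose_balls = {"Dalep": 193, "Watson": 10, "O'Toole": 11,
                      "Eliuk": 11, "Dietrich": 0}
suspect = flag_outliers(goalie_loose_balls)
```

With the numbers quoted above, Dalep is the only goalie flagged, and a line of 360 face-offs in 3 games trips the per-game check.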
Wikipedia
A lot of people have a bad impression of Wikipedia. Since anyone can make changes to it, it’s got to be full of misinformation and crap, right? Well yes, there are a lot of pages that have rather non-encyclopedic content, and there are always pages with incorrect data, but for the most part I find things to be quite accurate. There are lots of people who edit Wikipedia as a hobby, and if any page they are interested in gets edited, they will know about it very quickly and correct any errors or vandalism within minutes. I used to edit Wikipedia quite a bit, and have created many pages on lacrosse players and teams. There is a page on every NLL team that has ever existed, as well as each NLL season since 1987 (and some pages for individual teams’ seasons), each NLL award, the Hall of Fame, the expansion, entry, and dispersal drafts, and so on. I created most of them.
There used to be a web site called The Outsider’s Guide to the NLL which had tons of press releases, game summaries, stats, and so on from games back in the ’90s and early 2000s. That site simply vanished one day without a trace, a huge loss to the lacrosse world. But before that happened, I got a fair bit of information from it for the Wikipedia pages, so at least some of the information wasn’t lost. Some of the player pages are unfortunately outdated since I no longer have time to maintain them, but there are others who are doing a great job of keeping things as up-to-date as they can.
Even without the Outsider’s Guide, there’s still a fair bit of information in those pages, so some of the non-stat-related facts come from there.
So now I have a ton of information including lots of historical data as well as incredibly detailed stats on every game in the past two seasons. Tomorrow, I’ll describe what I do with all this information.