Behind the scenes at @NLLFactOfTheDay, Pt. 2

Yesterday, I described the sources of information I use for the @NLLFactOfTheDay twitter account. In this (much shorter) article, I discuss how I use all that information to create the facts that I publish.


Putting it all together

Now that I have all of this information, I start combing through for the actual facts. When Teddy Jenner interviewed me on the Off the Crossebar radio show, he referred to me as some kind of stats guru, and others have said similar things. While I greatly appreciate these compliments, let me set the record straight – I’m no savant. I don’t just know all of this stuff. I don’t have millions of stats memorized and running through my head for instant retrieval. At the risk of sounding immodest, I do have a pretty good memory for trivia and such, and I do have a Bachelor of Math degree (though in computer science, not statistics) so I understand the stats, but what I am good at is hunting for anomalies.

If I see a column of numbers that looks like “1, 2, 2, 1, 3, 0, 8, 1, 2”, I want to find out about that 8 and see what it means and whether it’s interesting. I’ll sort one list by different values (e.g. a list of teams sorted by PP goals per game and then by SH goals per game) and see if a team stays in the same position, or moves from top to bottom or vice versa. Maybe the league-leading team is in last place in some statistic, or the last-place team is leading in something. I look over the season records for repeated names. Maybe a player set two season records in the same season, or one person holds 8 of the top 10 records in some stat.
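To make that kind of hunting concrete, here is a minimal sketch in Python. It is only an illustration, not the code behind the account: the function names, the median-based cut-off and the sample team numbers are all invented for the example.

```python
from statistics import median


def find_outliers(values, factor=3.0):
    """Return (index, value) pairs that sit far from the median.

    Uses the median absolute deviation (MAD). `factor` is an arbitrary
    cut-off chosen for this example, not a value from the article.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical: flag anything different.
        return [(i, v) for i, v in enumerate(values) if v != med]
    return [(i, v) for i, v in enumerate(values) if abs(v - med) / mad > factor]


def rank_shifts(teams, stat_a, stat_b):
    """Rank teams on two stats (1 = highest) and return each team's rank change."""
    def ranks(stat):
        ordered = sorted(teams, key=lambda t: teams[t][stat], reverse=True)
        return {team: pos for pos, team in enumerate(ordered, start=1)}

    a, b = ranks(stat_a), ranks(stat_b)
    return {team: b[team] - a[team] for team in teams}


# The column of numbers from the paragraph above.
column = [1, 2, 2, 1, 3, 0, 8, 1, 2]
outliers = find_outliers(column)

# Made-up per-game numbers for three placeholder teams.
sample = {
    "Team A": {"pp": 1.5, "sh": 0.1},
    "Team B": {"pp": 1.0, "sh": 0.3},
    "Team C": {"pp": 0.5, "sh": 0.6},
}
shifts = rank_shifts(sample, "pp", "sh")
```

With the column above, only the 8 is flagged. In the made-up table, Team A drops from first to last and Team C does the reverse, which is exactly the kind of move worth a second look.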

I look at the dates on which significant events happened to see if there are any coincidences there. I look at the best seasons in different stats (e.g. most goals in a season in the league – Athan Iannucci, 71) and for individual teams (e.g. most penalty minutes in a season for the Roughnecks – Geoff Snider, 74). I look at career stats for the league (e.g. most career goals in the league – John Tavares, 778) or a particular team (e.g. most career assists on the Knighthawks – Shawn Williams, 491). I look at team stats (e.g. most goals scored in home games – Philadelphia, 2151) and franchise records (e.g. fewest goals scored by the Roughnecks in a game – 6, twice). I look at game, team, season, and league attendance records, both total and average.

Occasionally I think about a particular obscure statistic and look deeper into that. Recently, it was how often teams played two overtime games in a single weekend. I did some digging and came up with a few facts about that.
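A question like the overtime one boils down to grouping games by team and weekend. The sketch below is a hedged illustration in Python, not the query actually used: the tuple layout, the function name and the choice of Friday to Sunday as “a weekend” are assumptions, and the sample games are made up.

```python
from collections import Counter
from datetime import date


def double_ot_weekends(games):
    """games: iterable of (date, home, away, went_to_ot).

    Returns sorted (team, (iso_year, iso_week)) keys for every team that
    played at least two overtime games between Friday and Sunday of the
    same ISO week. Treating Fri-Sun as "a weekend" is an assumption.
    """
    counts = Counter()
    for played_on, home, away, went_to_ot in games:
        if went_to_ot and played_on.weekday() >= 4:  # Fri=4, Sat=5, Sun=6
            week = tuple(played_on.isocalendar())[:2]
            for team in (home, away):
                counts[(team, week)] += 1
    return sorted(key for key, n in counts.items() if n >= 2)


# Made-up games with placeholder teams.
sample_games = [
    (date(2013, 2, 22), "A", "B", True),
    (date(2013, 2, 24), "C", "A", True),
    (date(2013, 2, 23), "B", "D", False),
]
hits = double_ot_weekends(sample_games)
```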

Once I’ve got a few facts I want to publish, I use a web app called HootSuite to schedule them. It allows me to write up the tweet, pick which account to use (mine or @NLLFactOfTheDay), and pick a date/time for the tweet to be sent out. I originally started tweeting at 11:00am, but then, for some reason I have since forgotten, I changed to 3:00pm.


Now that I’ve written it all down, it sounds like a lot of work, but it’s really not – now that I have the infrastructure in place. Whenever I come across an interesting stat, I fire up HootSuite and add it to the list of scheduled tweets – that takes a minute or two, tops. Once or twice a week, I make a point of spending 15-20 minutes adding new ones, so I usually have about a week’s worth done ahead of time. If I went off the grid right now, you’d still be seeing a tweet every day at 3pm for another 8 or 9 days. If something interesting comes up in the meantime, I can reschedule upcoming tweets to add in a more timely one. The hardest parts now are (a) trying not to repeat any facts I’ve published before, and (b) squashing all the information down into 140 characters.
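Both of those checks are easy to automate, at least roughly. This is a small Python sketch under stated assumptions, not part of the actual workflow: the function name is made up, `len()` only approximates how Twitter counts characters, and the duplicate check catches identical wording only, not the same fact phrased differently.

```python
def check_tweet(text, already_posted, limit=140):
    """Return a list of problems with a draft tweet (an empty list means OK)."""
    problems = []
    if len(text) > limit:
        problems.append(f"{len(text) - limit} characters over the limit")

    def normalise(s):
        # Ignore case and spacing when comparing wording.
        return " ".join(s.lower().split())

    if normalise(text) in {normalise(t) for t in already_posted}:
        problems.append("already posted")
    return problems


# Placeholder wording, not real tweets.
posted = ["Some fact that was already tweeted."]
ok = check_tweet("A brand new fact.", posted)
repeat = check_tweet("some fact  that was already tweeted.", posted)
too_long = check_tweet("x" * 150, posted)
```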

So that’s pretty much it – that’s how the magic happens. I say that facetiously, because there’s no magic here. I’m just a stats geek who enjoys sharing the facts that I find with others who might also be interested. For the cynics among you, I’m not doing this for the money – I haven’t made a dime off of this little project. In fact, I’m not sure how I could monetize it even if I wanted to. It’s just fun to come up with these things, fun to get replies from people asking questions about them, and fun to see that others enjoyed them enough to retweet them. If you follow, thanks, and I hope you enjoy reading these facts as much as I enjoy finding them.

Behind the scenes at @NLLFactOfTheDay, Pt. 1

I started writing the Money Ballers column on IL Indoor in January 2012. While writing the articles that year (or while writing other stat-related articles for this blog), I frequently came across interesting statistics about the previous weekend’s games, and tweeted them with the hash tag #NLLStatOfTheDay (or something similar). I found that a number of people responded to them, whether to ask about them, or mention a similar one, or just to RT them. I figured if I worked a little harder, I could probably come up with one of these every day and if I could find a way to schedule them automatically, this would be a cool thing. So I created @NLLFactOfTheDay (originally @NLLStatOfTheDay but I changed it so I could post things that weren’t stats) in April, 2012.

Since then I have tweeted more than 200 facts, and collected well over 600 followers. This is far more than I have on my own twitter account (recently passed 400! Woo!), which I created three years earlier. I originally did this for NLL fans/stats geeks like me, but there are a lot of NLL players and executives following now, and most tweets are RT’ed at least once. It even got me interviewed on Teddy Jenner’s Off The Crossebar radio show (mine is the Feb 24, 2013 one). It’s safe to say this has become a fair bit more popular than I imagined.

I thought some people might be interested to see where the facts come from. So today we’re going behind the scenes and I’ll let you in on some of the secrets. Note that much of this qualifies as “Behind the scenes at the Money Ballers” as well. This article started to get kind of long, so I’ve broken it up into two. The first part describes where the information comes from, the second describes how I mine the information for the actual facts and post them.


I basically have four sources of information that I use for these facts: three databases and a web site. The contents of the three databases come from three different sources. The web site is Wikipedia, and many of the lacrosse articles are those that I’ve written or contributed to myself over the years. I’ll start with an overview of the databases (a little bit technical) and then describe where I got the information stored in them.

The Technical Stuff

I have been a software developer for more than twenty years. I’ve been working at Sybase (now part of SAP) for over fifteen years, as part of the SQL Anywhere database server development team. So of course when I needed to create a database, SQL Anywhere was the obvious choice – and not only because I’m very familiar with it. SQL Anywhere has a built-in HTTP server (which I helped patent), which means that I can write procedures in SQL that directly access the data, and then easily create web pages that display the data in any way I want. When I started the Money Ballers, I decided that the best way to compile the points (i.e. easiest, fastest, and most accurate) would be to write a program to download the information from the NLL and then crunch the numbers. I wrote a Python script to download the game sheet and parse it, then insert the resulting information into the database.
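For readers curious what the parse-and-insert step might look like, here is a rough sketch. It is not the actual script. The line format and the EN/PS abbreviations are invented for the example (the real game sheet layout is not described here), the download step is left out for the same reason, and Python’s built-in sqlite3 stands in for SQL Anywhere so the example runs anywhere.

```python
import re
import sqlite3

# Hypothetical line format, invented for this sketch:
#   "Q1 03:15 PHI Player A (Player B, Player C) PP"
GOAL_LINE = re.compile(
    r"Q(?P<quarter>\d) (?P<time>\d{2}:\d{2}) (?P<team>[A-Z]{3}) "
    r"(?P<scorer>[^()]+?)(?: \((?P<assists>[^)]*)\))?"
    r"(?: (?P<special>PP|SH|EN|PS))?$"
)


def parse_goal(line):
    """Turn one goal line into a dict, or return None if it doesn't match."""
    m = GOAL_LINE.match(line.strip())
    if m is None:
        return None
    minutes, seconds = map(int, m["time"].split(":"))
    assists = [a.strip() for a in (m["assists"] or "").split(",") if a.strip()]
    return {
        "quarter": int(m["quarter"]),
        "seconds": minutes * 60 + seconds,
        "team": m["team"],
        "scorer": m["scorer"].strip(),
        "assists": assists,
        "special": m["special"],
    }


def load_goals(conn, game_id, sheet_text):
    """Parse every goal line in the text and insert the goals into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS goal (game_id INTEGER, quarter INTEGER, "
        "seconds INTEGER, team TEXT, scorer TEXT, assists TEXT, special TEXT)"
    )
    goals = [g for g in map(parse_goal, sheet_text.splitlines()) if g]
    conn.executemany(
        "INSERT INTO goal VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(game_id, g["quarter"], g["seconds"], g["team"], g["scorer"],
          ", ".join(g["assists"]), g["special"]) for g in goals],
    )
    conn.commit()
    return len(goals)


# In-memory database and a made-up sheet with placeholder player names.
conn = sqlite3.connect(":memory:")
sheet = (
    "Q1 03:15 PHI Player A (Player B, Player C) PP\n"
    "not a goal line\n"
    "Q3 10:10 ROC Player D SH"
)
inserted = load_goals(conn, 1, sheet)
```

Lines that do not match the pattern are skipped rather than raising an error, which is a design choice for the sketch; a real loader would probably want to log them.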

Extreme Game Detail

Here’s what I get from each game sheet:

  • Date/time of game
  • Home team, away team
  • Final score, winning team, losing team
  • Attendance
  • For each goal: who scored it, who assisted, what quarter, what time, whether it was power play, shorthanded, empty net, or penalty shot
  • For each penalty: who got it, what quarter, what time, what class of penalty (major, minor, misconduct, match, etc.), and what type (holding, high-sticking, fighting, etc.). This includes bench penalties
  • For each player: number of goals, assists, points, power play goals, shorthanded goals, loose balls, face offs won, face offs attempted
  • For each goalie: number of goals against, saves, minutes, shots on goal

All of this information is pushed into a database, and then I can view one of the many web pages that summarize it. With this information about each game, I can calculate pretty much anything: league standings, league or team scoring leaders, yearly records like most goals/assists/loose balls/etc. in one game, number of power play or shorthanded goals per team, attendance records, and of course the Money Ballers numbers.
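As one example of “pretty much anything”, league standings fall straight out of the final scores. The sketch below is an illustration in Python with made-up results and placeholder teams. The ordering rule (win percentage, then goal differential) is an assumption for the example and is not the NLL’s official tiebreaker procedure.

```python
from collections import defaultdict


def standings(games):
    """games: iterable of (home, away, home_goals, away_goals).

    Assumes no ties, since games are decided in overtime.
    Returns a list of (team, record) sorted best to worst.
    """
    table = defaultdict(lambda: {"W": 0, "L": 0, "GF": 0, "GA": 0})
    for home, away, home_goals, away_goals in games:
        table[home]["GF"] += home_goals
        table[home]["GA"] += away_goals
        table[away]["GF"] += away_goals
        table[away]["GA"] += home_goals
        winner, loser = (home, away) if home_goals > away_goals else (away, home)
        table[winner]["W"] += 1
        table[loser]["L"] += 1

    def sort_key(item):
        team, rec = item
        played = rec["W"] + rec["L"]
        # Higher win percentage first, then better goal differential.
        return (-rec["W"] / played, -(rec["GF"] - rec["GA"]), team)

    return sorted(table.items(), key=sort_key)


# Made-up results for three placeholder teams.
results = [("A", "B", 10, 8), ("B", "C", 12, 11), ("C", "A", 9, 13)]
table = standings(results)
```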

There are a few things that the game sheet does not include. For example, penalty shots that are not successful, time spent on the floor, goaltender win/loss (or who started the game), forced turnovers, stuff like that. Another notable thing that I do not have is power play efficiency. I know when PP goals are scored, but I have not put in the work to determine how many power play opportunities there were. There’s more to it than just “a team is on the PP for two minutes after every opposing minor penalty”, since we can have coincident penalties for each team, penalties that overlap, penalties that end early because of goals scored, that sort of thing. I believe I have all the information to calculate it, I just haven’t done it.

Some game sheets are available from 2011 and previous seasons, but they are missing enough information that I don’t use them, so I only have this level of detail for the 2012 and 2013 seasons.

An example

Here’s the “Game summary” section of the page for Philadelphia’s 10-8 win over Rochester on Feb 23, 2013. It lists each goal in order, separated by quarter. The pages are fairly plain; I’ve made no attempt to make them pretty. I don’t care what they look like as long as they’re functional. Note also that none of these pages are available on the internet – they are only on my computer. Click on the image to see a full-size version that might be easier to read.


The goals shaded in dark green are go-ahead goals, light green are tying goals, and red is the game-winner. The “special” column lists those three events as well as power play, shorthanded, empty net or penalty shot. You can see the current score, how far up the winning team is, how many goals in a row the team has scored, how long it’s been since the last goal, and how long it’s been since the last goal by the same team. The last four columns can be calculated given the rest of the information (if the score is 7-5, I know the difference is 2; I don’t really need the program to calculate it for me), but having it there makes patterns and extremes much easier to spot.

In this case, you can see the patterns in the goals – Rochester scored 4, then Philly scored 4, then Rochester scored 2, and so on. You can see that Rochester went almost 19 minutes between their 4th and 5th goal. You can see that Rochester didn’t score in the third and Philly didn’t score in the first, though I do have another chart that lists the number of goals in each quarter.

The idea here isn’t just to give me all the numbers, it’s to give me the numbers in such a way that I can see patterns or outliers. If I look at a game report like this and there are lots of dark and light green rows, then I know it was a close game. If there’s a green line at the top and a red line at the bottom, then one team led throughout the game. If the Special column is filled with PP and SH, then there were a lot of penalties.
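The derived columns on that page (running score, lead, streak, time since the last goal) can all be computed in one pass over the goals. Here is a hedged Python sketch of how that might work. It is not the SQL behind the real page, and the field names are invented. The game-winner rule used here, the goal that gives the winning team one more than the loser’s final total, is an assumption borrowed from the usual hockey definition.

```python
def annotate_goals(goals, teams):
    """goals: list of (elapsed_seconds, team) in scoring order; teams: (a, b)."""
    a, b = teams
    score = {a: 0, b: 0}
    last_goal = {a: 0, b: 0}  # time of each team's previous goal (0 = game start)
    last_any = 0
    streak_team, streak = None, 0
    rows = []
    for t, team in goals:
        other = b if team == a else a
        margin_before = score[team] - score[other]
        score[team] += 1
        if margin_before == -1:
            special = "tie"
        elif margin_before == 0:
            special = "go-ahead"
        else:
            special = ""
        streak = streak + 1 if team == streak_team else 1
        streak_team = team
        rows.append({
            "time": t,
            "team": team,
            "score": (score[a], score[b]),
            "special": special,
            "lead": abs(score[a] - score[b]),
            "streak": streak,
            "since_last": t - last_any,
            "since_team_last": t - last_goal[team],
            "game_winner": False,
        })
        last_any = t
        last_goal[team] = t

    # Assumed rule: the winner's goal number (loser's final total + 1).
    if score[a] != score[b]:
        winner, loser = (a, b) if score[a] > score[b] else (b, a)
        seen = 0
        for row in rows:
            if row["team"] == winner:
                seen += 1
                if seen == score[loser] + 1:
                    row["game_winner"] = True
                    break
    return rows


# A made-up six-goal game, times in seconds from the opening face-off.
sample = [(60, "ROC"), (120, "PHI"), (200, "PHI"), (300, "ROC"), (400, "PHI"), (500, "PHI")]
rows = annotate_goals(sample, ("ROC", "PHI"))
```

In the made-up game, the fifth goal is both a go-ahead goal and the game-winner, so the sketch keeps the game-winner flag separate from the “special” label rather than overwriting it.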


All The Games

About a year ago, a reader of this blog sent me an email saying that he had a spreadsheet containing every game ever played in the NLL. He asked if I wanted it, and of course I said yes. I immediately put that into another database, and started creating web pages for that as well. With that, I can reconstruct the final standings of any season, show the entire win-loss history of any team, the head-to-head matchups of any two teams, all kinds of attendance figures, game score records, goal differentials, and even things like a team’s record on a particular day of the week.
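The day-of-the-week record is a good example of how little code such a question needs once every game is in one table. This is a minimal Python sketch with made-up games and placeholder teams; the tuple layout and function name are assumptions, not the structure of the actual spreadsheet or database.

```python
from collections import defaultdict
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def record_by_weekday(games, team):
    """games: iterable of (date, home, away, home_goals, away_goals).

    Returns {weekday name: (wins, losses)} for the given team.
    """
    record = defaultdict(lambda: [0, 0])
    for played_on, home, away, home_goals, away_goals in games:
        if team not in (home, away):
            continue
        if team == home:
            team_goals, opp_goals = home_goals, away_goals
        else:
            team_goals, opp_goals = away_goals, home_goals
        slot = 0 if team_goals > opp_goals else 1  # 0 = win, 1 = loss
        record[DAYS[played_on.weekday()]][slot] += 1
    return {day: tuple(wl) for day, wl in record.items()}


# Made-up games for placeholder teams.
sample_games = [
    (date(2013, 2, 23), "A", "B", 10, 8),
    (date(2013, 2, 24), "C", "A", 12, 9),
    (date(2013, 3, 2), "B", "A", 7, 11),
    (date(2013, 3, 2), "B", "C", 9, 8),
]
team_a = record_by_weekday(sample_games, "A")
```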


All The Players

Between the 2012 and 2013 seasons, the NLL did some work on their web site. For a while, when you clicked on a player, there was a link to “Career stats” but rather than a page listing the stats for that player, the link gave you an Excel spreadsheet with the yearly stats for every player that has ever played in the NLL. I downloaded this spreadsheet and put that in a database as well. Not only can I search for any player and get their overall stats for any season (regular season and playoffs), but I can list all the players with their career stats, or list all the players who’ve ever played for a certain team, or display the final scoring stats for any season (Gary Gait led with 48 points in 1995), list the best seasons in any category, or combine seasons (e.g. who scored the most from 1990-1999? Don’t fall over, but it was Gary Gait again, with 536).
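Combining seasons is just a filtered sum over the yearly rows. Here is a small Python sketch of the idea. The row layout and function name are assumptions, and the players and numbers are placeholders rather than real stats.

```python
from collections import Counter


def top_scorers(rows, first_season, last_season, n=3):
    """rows: iterable of (player, season, points). Returns the top n totals."""
    totals = Counter()
    for player, season, points in rows:
        if first_season <= season <= last_season:
            totals[player] += points
    return totals.most_common(n)


# Placeholder yearly rows, not real numbers.
yearly = [
    ("Player A", 1990, 30),
    ("Player A", 1991, 45),
    ("Player B", 1991, 50),
    ("Player B", 2000, 90),
    ("Player C", 1995, 20),
]
nineties = top_scorers(yearly, 1990, 1999, n=2)
```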

I have since found that some of the stats are incorrect and some from the very early years of the league are missing. For example, there are no numbers from the 1987-1989 seasons, and apparently only five players were in the league in 1990. In the 2008 playoffs, did Cory Bomberry really take 360 face-offs in 3 games and only win 8 of them?

I looked at the top 20 single-season loose ball records and saw that Devin Dalep had 193 in 2002 and Erik Miller had 186 in 2003. I happen to know that those guys were goalies, but I notice that no other goalies are listed. In particular, there’s no Watson, no Eliuk, no Dietrich, no O’Toole, no Vinc. None of these elite goalies have had great loose-ball seasons since 2003? Strange. So I looked up their numbers. The same year Dalep had 193, Watson had 10. O’Toole and Eliuk had 11. Steve Dietrich had 0. I have a feeling Dalep’s numbers may not be correct.


Wikipedia

A lot of people have a bad impression of Wikipedia. Since anyone can make changes to it, it’s got to be full of misinformation and crap, right? Well yes, there are a lot of pages that have rather non-encyclopedic content, and there are always pages with incorrect data, but for the most part I find things to be quite accurate. There are lots of people who edit Wikipedia as a hobby, and if any page they are interested in gets edited, they will know about it very quickly and correct any errors or vandalism within minutes. I used to edit Wikipedia quite a bit, and have created many pages on lacrosse players and teams. There is a page on every NLL team that has ever existed, as well as each NLL season since 1987 (and some pages for individual teams’ seasons), each NLL award, the Hall of Fame, the expansion, entry, and dispersal drafts, and so on. I created most of them.

There used to be a web site called The Outsider’s Guide to the NLL which had tons of press releases, game summaries, stats, and so on from games back in the ’90s and early 2000s. That site simply vanished one day without a trace, a huge loss to the lacrosse world. But before that happened, I got a fair bit of information from it for the Wikipedia pages, so at least some of the information wasn’t lost. Some of the player pages are unfortunately outdated since I no longer have time to maintain them, but there are others who are doing a great job of keeping things as up-to-date as they can.

Even without the Outsider’s Guide, there’s still a fair bit of information in those pages, so some of the non-stat-related facts come from there.


So now I have a ton of information including lots of historical data as well as incredibly detailed stats on every game in the past two seasons. Tomorrow, I’ll describe what I do with all this information.

Have your say

Hey NLL Chatter readers! I am looking for feedback on this blog, and this is your chance to let me know what you think. Are articles too long or short? Are there too many or too few stats-based articles? I make no apologies for being a Rock fan, but am I too biased in that direction?

I’ve created a small survey over at SurveyMonkey.com and I would greatly appreciate it if you would head on over there and fill it out. There are only 10 questions, all but one are multiple choice, and the one that’s not is optional. All responses are completely anonymous.

It’ll take you 30 seconds and it would really help me out, so please click here to go to the survey.

Thank you for your help!

NLL Chatter now on Facebook

You can now follow your favourite NLL blog on Facebook! If you’re a Facebook user, just head on over to https://www.facebook.com/NLLChatter, click Like, and join the party. I’ll post links to articles when I write them, questions, polls, that kind of stuff.

Since I’m talking about social networking anyway, let’s keep going. I haven’t set up a blog-specific twitter account but if you’re on twitter, you can follow me at @GraemePerrow. I’m on Google+ as well, though I’m less active there.

Lacrosse memories: 2012

One of the advantages to having a personal lacrosse blog is that I can write an article about myself. It’s rare, but here’s one of them.

It’s standard at the end of a year to go over the year and think about special things that happened during the previous 12 months. I had a few, the best of which were my wife’s graduation from teacher’s college and the birth of my niece Elizabeth. But this is a lacrosse blog, so let’s take a look at some of my more memorable lacrosse-related events from 2012.

January 16: My first article for IL Indoor was published. It was very exciting to see my name on an article on that site, which I’ve been reading since it was NLL Insider. IL Indoor has a +/- voting system on each article, and this article has a +4 rating with 14 votes. After crunching the numbers (half the difference between the number of votes and the rating is the number of down-votes), we find that nine people read the article and then clicked plus while five people clicked minus. To the nine: thanks! To the five: Sorry to disappoint, and hopefully you felt that things improved throughout the season. Note that I have no idea how many people read the article without clicking anything but I would guess that it’s the vast majority.
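For anyone who wants to check the vote arithmetic: the rating is up-votes minus down-votes, and the vote count is up-votes plus down-votes, so the two unknowns follow directly. A tiny Python sketch, with a made-up function name:

```python
def split_votes(rating, total_votes):
    """Given rating = ups - downs and total_votes = ups + downs, return (ups, downs)."""
    if (total_votes - rating) % 2 or abs(rating) > total_votes:
        raise ValueError("rating and vote count are inconsistent")
    downs = (total_votes - rating) // 2
    return total_votes - downs, downs


ups, downs = split_votes(4, 14)  # a +4 rating from 14 votes
```

For a +4 rating on 14 votes this gives nine up-votes and five down-votes, matching the numbers above.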

February 24: My first interview, done over email with Teddy Jenner, is published. With apologies to Delta Airlines, Teddy loves to talk lacrosse, and it shows.

February 26: The All-Star Game in Buffalo. This was my second NLL All-Star game (I was also at the one in Toronto in 2006), and despite the fact that it wasn’t a “normal” lacrosse game (almost no penalties, very little defense, pretty much no hitting), I enjoyed it.

April 27: My interview with future NLL Hall of Famer Steve Toll is published. This was my first “real” interview as it was done over the phone, and not over email like Teddy’s.

September 7: I attended Game 1 of the Mann Cup. The Langley Thunder defeated the Peterborough Lakers 13-12 in a bit of a surprising game. I don’t really follow the MSL or WLA that closely, but everything I’d read said “Lakers in a cakewalk”. That first game (and the second game which Langley also won) proved them wrong. The Lakers did win the next four to win it all, but it wasn’t as easy as many said it would be. We were sitting only a couple of rows back from the glass, in the row behind John Grant’s sister and her family. At one point, someone (not Grant) tried an over-the-shoulder shot but missed the net. I said “Who do you think you are, John Grant?”. She smiled.

October 1: The NLL Entry Draft. This was my first time at an NLL event that wasn’t a game, and I found it fascinating to see bits of the “inner workings” of the league.

October 31: My only non-Moneyballers article on IL Indoor was published, a story about Calgary’s Scott Ranger and how he deals with Type 1 diabetes. I’m very proud of this article, which received 43 votes, 41 of which were pluses.

December 18: The Toronto Rock’s first-ever town hall meeting. Checking out the Rock dressing room at the new Toronto Rock Athletic Complex was very cool. I also asked a question which both Colin Doyle and Garrett Billings answered.

NLL Chatter: The First Season

So here we are at the end of the first season of NLL Chatter. For those of you who have been reading for the whole season, a very big thank you! I hope it has been exciting and informative! Well perhaps not, but I hope it’s been entertaining. OK, so I hope it didn’t suck too badly. Today I’m going to take a quick look back over the first season of this blog, so please forgive my short journey into narcissism.

My first posting was December 14, 2011, and the article you are reading is the 80th one posted. Many are game reports and there was one “picks” article per week. There was some stats analysis, some opinions, a couple of interviews, and I tried to be funny a few times.

A few of my personal favourite postings, in no particular order:

The most viewed articles over the last six months:

Are you seeing a trend here? My most popular articles are the ones that other people helped to promote. This should not surprise anyone – that’s how the web works. But I think I need to engage my readers a little more. I got a few comments and even a few RT’s on twitter about some articles, but after the 79 postings, there are a grand total of six comments on the blog itself. I didn’t even get any spam comments that I had to delete. Obviously, this means that everyone must have agreed with all of my opinions and comments. Note to self: be more controversial next year!

Anyway, thanks again for joining me throughout this season. I don’t really follow the summer leagues much, nor the MLL, so don’t expect too many new articles over the next few months. I will probably post something when big trades are made, or free agent signings, or at the entry draft, and then again when training camp begins next season. I also have a couple of stats-analysis things in the queue. This season is barely over, and I’m already looking forward to the next one.

Until then, you can follow me on twitter (@graemeperrow), and don’t forget to follow @NLLFactOfTheDay for a daily dose of fun and sometimes goofy NLL trivia (once the NLL season begins again). Enjoy your summer!

The best lacrosse writers in the world… and me

This coming NLL season, I will be writing for the two best lacrosse blogs in the world! This one (obviously) and now I can announce that I will be joining the staff at ILIndoor.com. I will be writing a weekly column called The Moneyballers every Monday starting on January 16.

ILIndoor.com has some of the best-known names in indoor lacrosse, including the aforementioned Teddy Jenner, a former player and current blogger, radio show host, and in-game announcer for the Washington Stealth; Ty Pilson, sports editor for the Calgary Sun and former Tom Borrelli winner (that’s the NLL’s award for the best writer of the year); Brian Shanahan, another former NLL player who has done colour for many lacrosse TV broadcasts (and yes, he’s Brendan’s brother); Marty O’Neill, the former GM of the Minnesota Swarm; and other great writers like Bob Chavez, Stephen Stamp and Casey Vock.

The Moneyballers will be a weekly look at the clutch players in the league from a statistical point of view. We have a system that assigns points to players for goals and assists that either tie a game or put their team ahead. Goals later in the game count for more than goals earlier, and OT goals count the most. Each week, I will tally up the points for that week’s games, and keep track of the league leaders as the season goes on. Here’s a link to last year’s season-ending article.
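To give a feel for how such a system could be tallied, here is a rough Python sketch. The weights are invented for illustration only: the actual Moneyballers point values are not given here, so the per-period weights, the overtime weight and the half share for assists are all assumptions, as are the field names.

```python
from collections import defaultdict

# Hypothetical weights, invented for this example. Later periods count for
# more and overtime counts the most, as described above, but the real
# values used by the column are not stated in this article.
PERIOD_WEIGHT = {1: 1, 2: 2, 3: 3, 4: 4, "OT": 6}
ASSIST_SHARE = 0.5  # assumed: an assist earns half of the goal's weight


def clutch_points(goals, teams):
    """goals: list of dicts (period, team, scorer, assists) in scoring order."""
    a, b = teams
    score = {a: 0, b: 0}
    points = defaultdict(float)
    for g in goals:
        other = b if g["team"] == a else a
        margin_before = score[g["team"]] - score[other]
        score[g["team"]] += 1
        if margin_before in (-1, 0):  # tying goal or go-ahead goal
            weight = PERIOD_WEIGHT[g["period"]]
            points[g["scorer"]] += weight
            for helper in g["assists"]:
                points[helper] += weight * ASSIST_SHARE
    return dict(points)


# A made-up game with placeholder players.
game = [
    {"period": 1, "team": "X", "scorer": "P1", "assists": ["P2"]},
    {"period": 2, "team": "X", "scorer": "P1", "assists": []},
    {"period": 4, "team": "Y", "scorer": "P3", "assists": []},
    {"period": 4, "team": "Y", "scorer": "P3", "assists": ["P4"]},
    {"period": "OT", "team": "Y", "scorer": "P4", "assists": ["P3"]},
]
totals = clutch_points(game, ("X", "Y"))
```

In this made-up game, the second and third goals earn nothing because they neither tie the game nor put a team ahead, while the overtime winner is worth the most.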

I am very excited about this opportunity, but very nervous as well. The Moneyballers is a series that has been on ILIndoor.com for a few years, and up to this year, it was written by another legendary lacrosse writer, Paul Tutka. Tutka won three straight Tom Borrelli awards, so that’s a pretty tough act to follow. However, I am up to the challenge. But if you call me on a Sunday evening during NLL season, don’t expect me to answer the phone.