Open source release! Generalized linear modeling with glum and tabmat.

I’m super excited to announce the release of glum and tabmat. These are the first two open source projects that QuantCo has released! Hopefully there will be many more. glum is an efficient and featureful Python-first library for generalized linear model (GLM) estimation built with an sklearn-style API. We focused a ton on correctness, performance and support for a wide range of feature requirements.

While working on this project, my coworkers and I heard repeatedly from folks on other data science or economics teams that they either struggled with the same GLM software problems we had or they had built their own internal GLM tool similar to glum. I’m really happy to be able to help rectify this situation and release something that the whole community can use.

We started working on glum in March 2020 and it’s been in heavy use within QuantCo since July 2020. During that same timeframe, tabmat grew out of our efforts to make glum as fast as possible when we realized that a key performance issue was efficiently handling a mix of dense, sparse and categorical subcomponents. You can read a lot more about the story behind glum.

Performance against glmnet and h2o

glum is at least as feature-complete as glmnet and h2o, two of the most popular existing GLM tools. On top of the capabilities that you know and love from those packages, such as elastic net regularization, regularization paths and automatic cross validation, we have some additional features that might get people excited:

If you’re interested in using glum, check out the getting started guide and the tutorials!
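For anyone who hasn't run into elastic net regularization before, here is a minimal pure-Python sketch of the penalty term it adds to a model's loss. It uses the sklearn-style parameter names `alpha` (overall strength) and `l1_ratio` (mix between the L1 and L2 parts); the function name and the coefficient values are made up for illustration, and this is not glum's actual implementation.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Illustrative elastic net penalty (sklearn-style parameterization).

    penalty = alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)
    """
    l1 = sum(abs(c) for c in coefs)      # L1 norm: encourages sparse coefficients
    l2_sq = sum(c * c for c in coefs)    # squared L2 norm: shrinks coefficients smoothly
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_sq)


# Hypothetical coefficient vector, for illustration only
coefs = [0.5, -2.0, 0.0]

lasso = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=1.0)  # pure L1 (lasso)
ridge = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.0)  # pure L2 (ridge)
mixed = elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5)  # an even mix of both

print(lasso, ridge, mixed)
```

A regularization path is just the same model refit over a decreasing sequence of `alpha` values, and cross validation is one common way to pick a value from that sequence.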

Tikhonov smoothing with zip codes: different levels of smoothing between adjacent zip codes when predicting home prices. There’s a full tutorial showing how to do this!
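To give a rough idea of what smoothing between adjacent zip codes means, here is a small pure-Python sketch of a generalized Tikhonov penalty built from an adjacency list. It assumes one coefficient per zip code and penalizes squared differences between neighbors. The four-zip-code layout, the function names and the numbers are all hypothetical, and the tutorial may set things up differently.

```python
def adjacency_penalty_matrix(n, edges):
    """Build P (a graph Laplacian) so that w' P w = sum over edges of (w_i - w_j)^2."""
    P = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        P[i][i] += 1.0
        P[j][j] += 1.0
        P[i][j] -= 1.0
        P[j][i] -= 1.0
    return P


def tikhonov_penalty(w, P, strength=1.0):
    """Quadratic penalty strength * w' P w for a coefficient vector w."""
    n = len(w)
    return strength * sum(w[i] * P[i][j] * w[j] for i in range(n) for j in range(n))


# Hypothetical layout: four zip codes in a row, each adjacent to the next one
edges = [(0, 1), (1, 2), (2, 3)]
P = adjacency_penalty_matrix(4, edges)

smooth = [1.0, 1.1, 1.2, 1.3]  # neighboring effects change gradually
rough = [1.0, 2.0, 1.0, 2.0]   # neighboring effects jump around

print(tikhonov_penalty(smooth, P))  # small penalty
print(tikhonov_penalty(rough, P))   # much larger penalty
```

Raising `strength` pulls neighboring coefficients closer together, which is what produces the different levels of smoothing.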
