Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.

135 votes
6 answers

Run an OLS regression with Pandas Data Frame

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example: import pandas as pd df = pd.DataFrame({"A": [10,20,30,...
122 votes
7 answers

Weighted standard deviation in NumPy

numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?
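One possible workaround, sketched here as an illustration and not taken from any of the answers: compute the weighted mean with numpy.average(), then take the weighted average of the squared deviations from it. The helper name weighted_std is hypothetical, and this is the biased (population) form, with no correction for degrees of freedom.

```python
import numpy as np


def weighted_std(values, weights):
    """Weighted population standard deviation (hypothetical helper, biased form)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted mean, using the weights option that numpy.average() already has.
    mean = np.average(values, weights=weights)
    # Weighted average of squared deviations from that mean.
    variance = np.average((values - mean) ** 2, weights=weights)
    return np.sqrt(variance)


data = [1.0, 2.0, 3.0, 4.0]
# With equal weights the result should agree with numpy.std().
equal_weight_std = weighted_std(data, [1, 1, 1, 1])
print(equal_weight_std, np.std(data))
```

With equal weights this reduces to numpy.std() with its default ddof=0, which is a quick sanity check for the helper.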
89 votes
13 answers

ValueError: numpy.dtype has the wrong size, try recompiling

I just installed the pandas and statsmodels packages on my Python 2.7. When I tried "import pandas as pd", this error message came up. Can anyone help? Thanks!!! numpy.dtype has the wrong size, try ...
88 votes
10 answers

auto.arima() equivalent for python

I am trying to predict weekly sales using ARMA/ARIMA models. I could not find a function for tuning the order (p,d,q) in statsmodels. Currently R has a function forecast::auto.arima() which will tune ...
65 votes
5 answers

Pythonic way of detecting outliers in one dimensional observation data

For the given data, I want to set the outlier values (defined by 95% confidence level or 95% quantile function or anything that is required) as nan values. Following is my data and code that I am ...
65 votes
9 answers

Variance Inflation Factor in Python

I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in python: a b c d 1 2 4 4 1 2 6 3 2 3 7 4 3 2 8 5 4 1 9 4 I have already done this in R using the ...
63 votes
7 answers

confidence and prediction intervals with StatsModels

I do this linear regression with StatsModels: import numpy as np import statsmodels.api as sm from statsmodels.sandbox.regression.predstd import wls_prediction_std n = 100 x = np.linspace(0, 10, n) ...
57 votes
6 answers

Why do I get only one parameter from a statsmodels OLS fit

Here is what I am doing: $ python Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin >>> import statsmodels.api as sm >>>...
50 votes
5 answers

Building multi-regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`

I have a pandas dataframe with some categorical predictors (i.e. variables) as 0 & 1, and some numeric variables. When I fit that to a statsmodels model like: est = sm.OLS(y, X).fit() It throws: Pandas ...
47 votes
11 answers

Where can I find mad (mean absolute deviation) in scipy?

It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers: However, ...
46 votes
5 answers

Print 'std err' value from statsmodels OLS results

(Sorry to ask but is currently down and I can't access the docs) I'm doing a linear regression using statsmodels, basically: import statsmodels.api as sm model = ...
46 votes
5 answers

How to extract the regression coefficient from statsmodels.api?

result = sm.OLS(gold_lookback, silver_lookback).fit() After I get the result, how can I get the coefficient and the constant? In other words, if y = ax + c, how do I get the values a and c?
44 votes
9 answers

Converting statsmodels summary object to Pandas Dataframe

I am doing multiple linear regression with statsmodels.formula.api (ver 0.9.0) on Windows 10. After fitting the model and getting the summary with the following lines, I get the summary in a summary object ...
44 votes
7 answers

ImportError: No module named statsmodels

I downloaded the StatsModels source from this location. Then untarred to /usr/local/lib/python2.7/dist-packages and per this documentation, did this sudo python install It installed but ...
42 votes
3 answers

What's the difference between pandas ACF and statsmodel ACF?

I'm calculating the Autocorrelation Function for a stock's returns. To do so I tested two functions, the autocorr function built into Pandas, and the acf function supplied by statsmodels.tsa. This is ...
41 votes
7 answers

Highest Posterior Density Region and Central Credible Region

Given a posterior p(Θ|D) over some parameters Θ, one can define the following: Highest Posterior Density Region: The Highest Posterior Density Region is the set of most probable values of Θ that, in ...
39 votes
4 answers

Using statsmodel estimations with scikit-learn cross validation, is it possible?

I am looking for a way I can use the fit object (result) obtained from python statsmodel to feed into cross_val_score of scikit-learn cross_validation method? The attached link suggests that it may be ...
38 votes
1 answer

ANOVA in python using pandas dataframe with statsmodels or scipy?

I want to use the Pandas dataframe to break down the variance in one variable. For example, if I have a column called 'Degrees', and I have this indexed for various dates, cities, and night vs. day, I ...
36 votes
1 answer

How to silence in python

When I want to fit some model in Python, I often use the fit() method in statsmodels. And in some cases I write a script to automate fitting: import statsmodels.formula.api as smf import pandas as pd df =...
35 votes
2 answers

Pandas rolling regression: alternatives to looping

I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run rolling OLS ...
33 votes
3 answers

What are the pitfalls of using Dill to serialise scikit-learn/statsmodels models?

I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact and this artefact can be used to initialise the model and make predictions. ...
32 votes
3 answers

Confidence interval for LOWESS in Python

How would I calculate the confidence intervals for a LOWESS regression in Python? I would like to add these as a shaded region to the LOESS plot created with the following code (other packages than ...
31 votes
2 answers

Why am I getting "LinAlgError: Singular matrix" from grangercausalitytests?

I am trying to run grangercausalitytests on two time series: import numpy as np import pandas as pd from statsmodels.tsa.stattools import grangercausalitytests n = 1000 ls = np.linspace(0, 2*np.pi, ...
31 votes
3 answers

OLS Regression: Scikit vs. Statsmodels? [closed]

Short version: I was using the scikit LinearRegression on some data, but I'm used to p-values so put the data into the statsmodels OLS, and although the R^2 is about the same the variable coefficients ...
30 votes
3 answers

statsmodels linear regression - patsy formula to include all predictors in model

Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include ...
30 votes
2 answers

How to plot statsmodels linear regression (OLS) cleanly

Problem Statement: I have some nice data in a pandas dataframe. I'd like to run simple linear regression on it: Using statsmodels, I perform my regression. Now, how do I get my plot? I've tried ...
29 votes
2 answers

Capturing high multi-collinearity in statsmodels

Say I fit a model in statsmodels mod = smf.ols('dependent ~ first_category + second_category + other', data=df).fit() When I do mod.summary() I may see the following: Warnings: [1] The condition ...
29 votes
3 answers

Python statistics package: difference between statsmodel and scipy.stats [closed]

I need some advice on selecting a statistics package for Python. I've done quite a bit of searching, but I'm not sure if I got everything right, specifically on the differences between statsmodels and scipy.stats. ...
27 votes
3 answers

python stats models - quadratic term in regression

I have the following linear regression: import statsmodels.formula.api as sm model = sm.ols(formula = 'a ~ b + c', data = data).fit() I want to add a quadratic term for b in this model. Is there a ...
27 votes
3 answers

How to get the P Value in a Variable from OLSResults in Python?

The OLSResults of df2 = pd.read_csv("MultipleRegression.csv") X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']] Y = df2['Price'] X = add_constant(X) fit = sm.OLS(Y, X).fit() print(fit....
26 votes
2 answers

Error: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting

So I have a CSV file with two columns: date and price, but when I tried to use ARIMA on that time series I encountered this error: ValueWarning: A date index has been provided, but it has no ...
25 votes
3 answers

Fixed effect in Pandas or Statsmodels

Is there an existing function to estimate fixed effects (one-way or two-way) in Pandas or Statsmodels? There used to be a function in Statsmodels but it seems discontinued. And in Pandas, there is ...
25 votes
1 answer

Python statsmodels ARIMA LinAlgError: SVD did not converge

Background: I'm developing a program using statsmodels that fits 27 arima models (p,d,q=0,1,2) to over 100 variables and chooses the model with the lowest aic and statistically significant t-...
24 votes
4 answers

ImportError: cannot import name 'factorial'

I want to use a logit model and am trying to import the statsmodels library. My version: Python 3.6.8. The best suggestion I got is to downgrade scipy, but it is unclear how to and to what version should I ...
24 votes
2 answers

Python statsmodels OLS: how to save learned model to file

I am trying to learn an ordinary least squares model using Python's statsmodels library, as described here. returns the learned model. Is there a way to save it to the file and reload it?...
24 votes
2 answers

What statistics module for python supports one way ANOVA with post hoc tests (Tukey, Scheffe or other)?

I have tried looking through multiple statistics modules for Python but can't seem to find any that support one-way ANOVA post hoc tests.
24 votes
3 answers

Any Python Library Produces Publication Style Regression Tables

I've been using Python for regression analysis. After getting the regression results, I need to summarize all the results into one single table and convert them to LaTeX (for publication). Is there ...
24 votes
2 answers

How to get the regression intercept using Statsmodels.api

I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use the library: import statsmodels.api as sm It prints all the regression analysis ...
24 votes
7 answers

Predicting on new data using locally weighted regression (LOESS/LOWESS)

How to fit a locally weighted regression in python so that it can be used to predict on new data? There is statsmodels.nonparametric.smoothers_lowess.lowess, but it returns the estimates only for the ...
22 votes
5 answers

Changing fig size with statsmodel

I am trying to make QQ-plots using the statsmodel package. However, the resolution of the figure is so low that I could not possibly use the results in a presentation. I know that to make networkX ...
22 votes
5 answers

Decomposing trend, seasonal and residual time series elements

I have a DataFrame with a few time series: divida movav12 var varmovav12 Date 2004-01 0 NaN NaN NaN 2004-02 ...
22 votes
3 answers

logit regression and singular Matrix error in Python

I am trying to run a logit regression for the German credit data. To test the code, I have used only numerical variables and tried regressing it with ...
22 votes
2 answers

Statsmodels ARIMA - Different results using predict() and forecast()

I use ARIMA from statsmodels package in order to predict values from a series: plt.plot(ind, final_results.predict(start=0 ,end=26)) plt.plot(ind, forecast.values) I thought that I would ...
22 votes
2 answers

Difference in Python statsmodels OLS and R's lm

I'm not sure why I'm getting slightly different results for a simple OLS, depending on whether I go through pandas' experimental rpy interface to do the regression in R or whether I use statsmodels in ...
22 votes
3 answers

Understanding output from statsmodels grangercausalitytests

I'm new to Granger Causality and would appreciate any advice on understanding/interpreting the results of the python statsmodels output. I've constructed two data sets (sine functions shifted in time ...
21 votes
2 answers

Holt-Winters time series forecasting with statsmodels

I tried forecasting with holt-winters model as shown below but I keep getting a prediction that is not consistent with what I expect. I also showed a visualization of the plot Train = Airline[:130] ...
21 votes
2 answers

Linear regression with dummy/categorical variables

I have a set of data. I have used pandas to convert them into dummy and categorical variables respectively. So, now I want to know how to run a multiple linear regression (I am using statsmodels) in ...
21 votes
3 answers

Specifying which category to treat as the base with 'statsmodels'

I understand that when I have a categorical variable in a model passed to a statsmodels fit, dummy variables will automatically be generated for the categories. For example if I have a variable '...
21 votes
2 answers

Poisson Regression in statsmodels and R

Given some randomly generated data with 2 columns, 50 rows and an integer range between 0-100. With R, the Poisson glm and diagnostic plots can be achieved as such: > col=2 > row=50 > ...
20 votes
1 answer

ValueWarning: No frequency information was provided, so inferred frequency MS will be used

I am trying to fit an autoregression with sm.tsa.statespace.SARIMAX, but I get a warning, so I want to set the frequency information for this model. Has anyone run into this and can help me? fit1 = sm.tsa....
