Problem :

I have a pandas DataFrame in which some of the categorical predictors are coded as 0 and 1 and the remaining variables are numeric. When I fit it with a statsmodels model like below:

import statsmodels.api as sm
est = sm.OLS(y, X).fit()

It throws the following error:

Pandas data cast to numpy dtype of object. Check input data with np.asarray(data). 

I tried to convert all of the dtypes of the DataFrame using the code below:


After this, all the dtypes of the DataFrame variables appeared as int32 or int64. But the output still ends with dtype: object, as shown below:

5516        int32
5523        int32
5525        int32
5531        int32
5533        int32
5542        int32
5562        int32
sex         int64
race        int64
dispstd     int64
age_days    int64
dtype: object

Here 5516, 5523 are variable labels.

Any clue? I just need to build a multiple regression model on hundreds of variables. For that, I concatenated three pandas DataFrames to produce the final DataFrame used for model building.

1 Answer

Solution :

If X is your DataFrame, use the .astype method to convert it to float when fitting the model, as shown below:

est = sm.OLS(y, X.astype(float)).fit()
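
To see why the cast helps, it can be useful to check what NumPy produces from the frame, as the error message suggests. The sketch below uses a small hypothetical frame (the values and the stray object column are assumptions made for illustration, not the asker's data). It also shows that the "dtype: object" footer printed under X.dtypes describes the Series of dtypes itself, not the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two integer columns plus one column stored as object
X = pd.DataFrame({
    "5516": np.array([0, 1, 0, 1], dtype="int32"),
    "sex": np.array([1, 0, 1, 1], dtype="int64"),
    "age_days": pd.Series([100, 250, 310, 420], dtype="object"),
})

# X.dtypes is a Series holding dtype objects, so its own dtype is object
# regardless of what the columns contain
dtypes = X.dtypes
print(dtypes)

# Columns that are really stored as object
object_cols = X.select_dtypes(include=[object]).columns.tolist()
print(object_cols)

# What a consumer sees after converting the frame to a NumPy array
array_before = np.asarray(X)               # mixed columns fall back to object
array_after = np.asarray(X.astype(float))  # a uniform float64 array
print(array_before.dtype, array_after.dtype)
```

If object_cols comes back empty for X, the same check is worth running on y.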


If both y (the dependent variable) and X are taken from a DataFrame, cast both, as shown below:

est = sm.OLS(y.astype(float), X.astype(float)).fit()
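
If .astype(float) itself raises because a column holds text that is not a number, one possible way to locate the offending values is pd.to_numeric with errors="coerce". This is only a sketch on made-up data; the column names and the "n/a" entry are assumptions.

```python
import pandas as pd

# Hypothetical data: "age_days" contains a string that cannot be parsed as a number
X = pd.DataFrame({
    "sex": [1, 0, 1],
    "age_days": ["100", "n/a", "310"],
})

# errors="coerce" turns unparseable entries into NaN instead of raising
X_numeric = X.apply(pd.to_numeric, errors="coerce")

# Number of entries per column that failed to convert
bad_counts = X_numeric.isna().sum()
print(bad_counts)

# Drop the affected rows and cast what remains to float
X_clean = X_numeric.dropna().astype(float)
print(X_clean.dtypes)
```

Dropping rows from X means the matching rows of y have to be dropped as well, for example with y.loc[X_clean.index], before fitting.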
