
Problem:

I have a pandas DataFrame in which some of the predictors are categorical variables coded as 0 and 1, and others are numeric. When I fit it with a statsmodels model like below:

est = sm.OLS(y, X).fit()

It throws the following error:

Pandas data cast to numpy dtype of object. Check input data with np.asarray(data). 

I tried to convert all of the dtypes of the DataFrame using the code below:


After this, all the dtypes of the DataFrame variables appeared as int32 or int64. But the end of the output still shows dtype: object, as below:

5516        int32
5523        int32
5525        int32
5531        int32
5533        int32
5542        int32
5562        int32
sex         int64
race        int64
dispstd     int64
age_days    int64
dtype: object

Here 5516, 5523, and so on are variable labels.

Any clue? I just need to build a multiple regression model on hundreds of variables. To do that, I concatenated three pandas DataFrames into the final DataFrame used to build the model.


1 Answer


Solution:

If X is your DataFrame, use the .astype method to convert it to float when fitting the model, as shown below:

est = sm.OLS(y, X.astype(float)).fit()
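To see why the cast can help, it may be worth checking what the frame turns into when converted to a NumPy array, as the error message suggests. The sketch below uses a small made-up frame (the column values and the object-typed "race" column are assumptions for illustration, not the asker's data). It also shows that the trailing "dtype: object" in a dtypes listing is the dtype of the summary Series itself, not of any column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the concatenated frame; values are made up.
X = pd.DataFrame({
    "5516": np.array([0, 1, 1], dtype="int32"),
    "sex": np.array([1, 0, 1], dtype="int64"),
    # Numbers stored as generic Python objects (assumed here for illustration).
    "race": pd.Series([1, 2, 3], dtype=object),
})

# X.dtypes is a Series holding one dtype per column. Its own dtype is
# object, which is what the last line of the printed listing reports.
dtypes = X.dtypes
print(dtypes)
print(dtypes.dtype)

# This is the check the error message recommends: a single object column
# is enough to make the whole array object-typed.
print(np.asarray(X).dtype)

# List the columns that are still stored as object.
object_cols = [col for col in X.columns if X[col].dtype == object]
print(object_cols)
```

If the list of object columns is empty and the array dtype is already numeric, the problem probably lies elsewhere, for example in y.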


If both y (the dependent variable) and X are taken from a DataFrame, cast both, as shown below:

est = sm.OLS(y.astype(float), X.astype(float)).fit()
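A minimal sketch of the same fix wrapped in helpers, assuming the data contains only values that can be parsed as numbers. The names to_float and fit_ols are hypothetical, and the sample data is made up; the statsmodels import is placed inside fit_ols so the conversion can be checked on its own.

```python
import numpy as np
import pandas as pd


def to_float(data):
    """Cast a DataFrame or Series to float64.

    The cast fails loudly if a value cannot be interpreted as a number,
    which helps locate a stray non-numeric entry.
    """
    return data.astype(float)


def fit_ols(y, X):
    """Fit OLS after casting both inputs to float (hypothetical helper)."""
    # Imported here so to_float can be used without statsmodels installed.
    import statsmodels.api as sm

    return sm.OLS(to_float(y), to_float(X)).fit()


# Made-up example data with object-typed inputs.
X = pd.DataFrame({
    "sex": pd.Series([0, 1, 1, 0], dtype=object),
    "age_days": np.array([10, 20, 30, 40], dtype="int64"),
})
y = pd.Series([1, 2, 3, 4], dtype=object)

X_float = to_float(X)
y_float = to_float(y)

# Both now convert to plain float64 arrays.
print(np.asarray(X_float).dtype, y_float.dtype)
```

With statsmodels installed, est = fit_ols(y, X) would then replace the direct sm.OLS call.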
