
Problem:

I am new to the RandomForest model. When I predict on my test data with it, I often get the following ValueError:

Input contains NaN, infinity or a value too large for dtype('float64')
I have spent more than two days on this error without managing to fix it. Can somebody help me fix it?

2 Answers


Solution:

This has bothered me in the past too. In most cases, removing all the infinite and null values solves the problem.

First, convert every infinite value to NaN:

df.replace([np.inf, -np.inf], np.nan, inplace=True)
Then replace the null values in whatever way suits you: with a specific value such as 999, with the column means, or with your own imputation method.
df.fillna(999, inplace=True)

or, to fill with the column means:

df.fillna(df.mean(numeric_only=True), inplace=True)
You can also use numpy.nan_to_num, which replaces NaN with zero and infinity with very large finite numbers:
df[:] = np.nan_to_num(df)
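The steps above can be put together as a minimal sketch. The toy DataFrame and its column names are made up for illustration, and the variable names `cleaned` and `clipped` are my own. Non-inplace calls are used here only so the two approaches can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real feature table.
df = pd.DataFrame({
    "a": [1.0, np.inf, 3.0, np.nan],
    "b": [10.0, 20.0, -np.inf, 40.0],
})

# Step 1: turn +inf and -inf into NaN so all bad values are handled together.
cleaned = df.replace([np.inf, -np.inf], np.nan)

# Step 2: fill every NaN with the mean of its column (NaN is ignored
# when the mean is computed).
cleaned = cleaned.fillna(cleaned.mean(numeric_only=True))

# Alternative: NaN becomes 0.0 and +/-inf become the largest/smallest
# finite float64 values.
clipped = pd.DataFrame(
    np.nan_to_num(df.to_numpy()), index=df.index, columns=df.columns
)

print(cleaned)
print(np.isfinite(cleaned.to_numpy()).all())
```

Note that nan_to_num leaves very large finite numbers in the data, so it may need to be combined with scaling before the data is passed to a model.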
If some of your values are too large to be represented as float32, try running a scaler on the data first.
I hope these solutions help you fix your issue.

This can happen inside scikit-learn, and the cause depends on what you are doing. Read the documentation of the functions you are using. Some of them rely on assumptions about the input, for example that your matrix is positive definite, and fail when the input does not fulfil them.

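Code Example:

Suppose you check your data with code like this:

np.isnan(mat.any())   # returns False

np.isfinite(mat.all()) # returns True

Both checks look fine, yet the error still appears, because they test the single value returned by any() and all() rather than the elements of mat.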

Solution:

Write the checks like this instead:

np.any(np.isnan(mat))

And

np.all(np.isfinite(mat))

I think you want to check whether any of the elements is NaN, not whether the return value of the any() function is NaN.
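A small sketch makes the difference visible. The matrix `mat` below is a made-up example that contains both a NaN and an infinity.

```python
import numpy as np

# Hypothetical matrix containing one NaN and one infinity.
mat = np.array([[1.0, np.nan], [np.inf, 4.0]])

# Misleading: mat.any() and mat.all() first collapse the matrix to one
# boolean, so isnan/isfinite only inspect that boolean.
wrong_nan = np.isnan(mat.any())
wrong_finite = np.isfinite(mat.all())

# Correct: test every element first, then reduce the boolean mask.
has_nan = np.any(np.isnan(mat))
all_finite = np.all(np.isfinite(mat))

print(wrong_nan, wrong_finite)
print(has_nan, all_finite)
```

The first pair of checks reports nothing wrong, while the second pair correctly reports that the matrix contains a NaN and is not entirely finite.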

Sklearn with pandas:

You can run into the same issue when using sklearn with pandas after removing some entries from the data frame, for example:

df = df[df.label == 'desired_one']

The solution is to reset the index of the data frame df before running any sklearn code (pass drop=True if you do not want the old index kept as a column):

df = df.reset_index()
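The answer does not say why the index matters. One plausible mechanism, shown here purely as an assumption, is that pandas aligns on index labels, so combining a filtered frame with freshly created data can silently introduce NaN. The column names and the `extra` Series are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["other", "desired_one", "other", "desired_one"],
    "x": [1.0, 2.0, 3.0, 4.0],
})

# Filtering keeps the original row labels, so the index is now [1, 3].
filtered = df[df.label == "desired_one"]

# Hypothetical new feature with a default index of [0, 1].
extra = pd.Series([10.0, 20.0])

# Assignment aligns on index labels: row 1 receives 20.0 and row 3 has
# no matching label, so it becomes NaN.
misaligned = filtered.assign(extra=extra)

# With a fresh 0..n-1 index, the labels match and no NaN appears.
fixed = filtered.reset_index(drop=True).assign(extra=extra)

print(misaligned)
print(fixed)
```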

Infinite and null values:

In most cases, getting rid of infinite and null values solves this problem.

Getting rid of infinite values:

You can convert infinite values to NaN with code like this:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

Getting rid of null values:

Replace the null values however you like: use a specific value such as 999, or write your own function to impute the missing values.

df.fillna(999, inplace=True)
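Before choosing how to impute, it can help to see which columns actually contain the bad values. The following is a small diagnostic sketch. The data and the `report` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaN in one column and infinities in another.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.inf, 2.0, -np.inf],
    "c": [1.0, 2.0, 3.0],
})

# Only numeric columns can hold NaN/inf in the sense the error refers to.
numeric = df.select_dtypes(include="number")

# Count NaN and infinite entries per column.
report = pd.DataFrame({
    "n_nan": numeric.isna().sum(),
    "n_inf": np.isinf(numeric).sum(),
})

print(report)
```

Columns with non-zero counts are the ones that need cleaning.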

 

