
Problem :

I encountered the following error while running "process.py":

python tools/process.py --input_dir data --operation resize --output_dir data2/resize

data/0.jpg -> data2/resize/0.png

Traceback (most recent call last):
  File "tools/process.py", line 235, in <module>
    main()
  File "tools/process.py", line 167, in main
    src = load(src_path)
  File "tools/process.py", line 113, in load
    contents = open(path).read()
  File "/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What may be the cause of the error? I am using Python 3.5.2.


2 Answers


Solution :

Python is trying to decode a bytes object, which it assumes holds UTF-8-encoded text, into a Unicode string (str). Decoding follows the UTF-8 rules, and here it hits a byte that is never allowed in UTF-8-encoded text (namely 0xff at position 0). The file being read, data/0.jpg, is an image, and JPEG files begin with the bytes 0xff 0xd8, so it is binary data rather than text.

As you did not provide any code that we could look at, we can only guess at the rest.

From the stack trace we can tell that the error was triggered while reading a file (contents = open(path).read()). Please rewrite that line as shown below:

with open(path, 'rb') as f:
    contents = f.read()

The b in the mode argument of open() says that the file must be treated as binary, so contents stays a bytes object and no decoding is attempted.
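To see the difference between the two modes, here is a minimal sketch. It assumes the input is a JPEG, as the data/0.jpg in the log suggests; the temporary file and the header bytes are made up for illustration and only stand in for the real image.

```python
import os
import tempfile

# Hypothetical stand-in for data/0.jpg: JPEG files start with the bytes FF D8.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16

with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(fake_jpeg)
    path = tmp.name

try:
    # Text mode: Python tries to decode the bytes and fails on 0xff.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode: no decoding happens, the bytes come back unchanged.
    with open(path, "rb") as f:
        contents = f.read()
finally:
    os.remove(path)

print(text_mode_error)
print(contents[:4])
```

If the rest of the script expects image data, the bytes in contents can be handed to whatever image decoder it already uses.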


Solution:

Python attempts to convert a byte array (a bytes object that it assumes to be UTF-8-encoded text) to a Unicode string (str). This conversion is a decoding according to the UTF-8 rules. When it attempts this, it encounters a byte sequence that is not allowed in UTF-8-encoded text (here, 0xff at position 0).

Because you did not provide any code we could look at, we can only guess at the rest.

From the stack trace we can assume that the triggering action was reading from a file (contents = open(path).read()). I suggest rewriting it like this:

with open(path, 'rb') as f:
  contents = f.read()

The b in the mode argument of open() means that the file is opened as binary, so contents stays a bytes object. No decoding is attempted this way.

The next solution strips out (ignores) the undecodable characters and returns the string without them. Use it only if you need to strip them rather than convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    contents = f.read()

With errors='ignore' you only lose the characters that cannot be decoded. If you don't care about them (in my case they were extra characters produced by badly formatted data from the clients connecting to my socket server), it is a simple, direct solution.
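As a quick comparison of the error handlers, here is a small sketch on a made-up byte string (not data from the question). It contrasts errors='ignore' with errors='replace', which keeps a visible marker where a byte was lost.

```python
raw = b"abc\xffdef"  # made-up bytes with one invalid byte in the middle

ignored = raw.decode("utf-8", errors="ignore")    # silently drops the bad byte
replaced = raw.decode("utf-8", errors="replace")  # marks it with U+FFFD

print(ignored)
print(replaced)
```

The replace handler can be the safer choice while debugging, because the marker shows where data went missing.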

I had a similar problem and ended up decoding the file as UTF-16. My code is below.

with open(path_to_file, 'rb') as f:
    contents = f.read()
# Decode first: rstrip("\n") with a str argument fails on a bytes object.
contents = contents.decode("utf-16").rstrip("\n")
contents = contents.split("\r\n")

This reads the file contents as raw bytes, decodes them as UTF-16, and then separates the text into lines.
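A byte 0xff at position 0 is also what a UTF-16 little-endian byte order mark (FF FE) looks like, so it can help to check for a BOM before choosing a codec. The sketch below is one possible way to do that; guess_bom_encoding is a hypothetical helper name, and the sample text is made up.

```python
import codecs


def guess_bom_encoding(data):
    """Return a codec name if data starts with a known BOM, else None."""
    # UTF-32 is checked first: its little-endian BOM begins with the UTF-16 one.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None


sample = "line one\r\nline two\r\n".encode("utf-16")  # the codec writes a BOM
encoding = guess_bom_encoding(sample)
lines = sample.decode(encoding).splitlines()
print(encoding, lines)
```

A file without a BOM returns None here, so this only narrows things down; it does not prove which encoding was used.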

If the data comes from base64 and the decoded result is binary, use only

base64.b64decode(a) 

instead of

base64.b64decode(a).decode('utf-8')
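A short sketch of why the extra .decode('utf-8') fails, assuming the base64 payload is binary. The value of a here is made up: it is the base64 text for the first three bytes of a JPEG.

```python
import base64

a = "/9j/"  # base64 text for the bytes FF D8 FF (the start of a JPEG)

payload = base64.b64decode(a)  # already bytes; keep it that way for binary data
print(payload)

try:
    payload.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)
```

Calling .decode('utf-8') afterwards is only appropriate when the original payload was UTF-8 text.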

Check the path of the file being read. My code kept giving me errors until I changed the path to the current working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

It simply means that the wrong encoding was chosen to read the file.

On Mac, use file -I file.txt to find the actual encoding. On Linux, use file -i file.txt.

You need to read this file with the latin1 encoding, as it contains a few special characters. Use the code snippet below to read the file:

import pandas as pd

data=pd.read_csv("C:\\Users\\akashkumar\\Downloads\\Customers.csv",encoding='latin1')

print(data.head())
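One caveat worth knowing: latin1 maps every possible byte to a character, so it never raises, even when it is the wrong choice. The sketch below uses made-up data to show both effects; it is an illustration, not a check of the Customers.csv file above.

```python
every_byte = bytes(range(256))
text = every_byte.decode("latin1")  # never raises: one character per byte

# If the data was really UTF-8, latin1 still "works" but garbles the text.
garbled = "é".encode("utf-8").decode("latin1")
print(len(text), garbled)
```

So a successful read with latin1 is worth a visual check of any accented characters in the result.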

The exact error is here:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

You can't fix this yourself by tinkering with the code. It is a bug which, in my opinion, would be quite simple to fix from the developer's side (by changing the encoding of the file). At present, the only way to remove the package is to force it, which I don't recommend in any case.

I see that /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here appears to be a dummy file, and it is possibly the cause of the problem. Check with file /usr/share/ubuntu-drivers-common/quirks/* whether any of the files there are not UTF-8, like this:

$ file /mnt/usr/share/ubuntu-drivers-common/quirks/*
/mnt/usr/share/ubuntu-drivers-common/quirks/dell_latitude:        ASCII text
/mnt/usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad:      ASCII text
/mnt/usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here: empty

If any of those files are not ASCII text, consider removing them, then try to remove the package again.
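The same check can be sketched in Python by trying to decode each file as UTF-8. find_non_utf8_files is a hypothetical helper, and the demo below runs on a throwaway directory with made-up file names rather than on the real quirks directory.

```python
import os
import tempfile


def find_non_utf8_files(directory):
    """Return names of regular files in directory that fail UTF-8 decoding."""
    bad = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if not os.path.isfile(full):
            continue
        with open(full, "rb") as f:
            data = f.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "good_quirk"), "wb") as f:
        f.write(b"plain ASCII text\n")
    with open(os.path.join(demo_dir, "bad_quirk"), "wb") as f:
        f.write(b"\xff\xfe broken")
    with open(os.path.join(demo_dir, "empty_quirk"), "wb") as f:
        pass  # an empty file decodes fine
    result = find_non_utf8_files(demo_dir)

print(result)
```

ASCII files pass this check as well, since ASCII is a subset of UTF-8.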
