
Problem :

I encountered the following error while running "process.py":

python tools/process.py --input_dir data --operation resize --output_dir data2/resize

data/0.jpg -> data2/resize/0.png
Traceback (most recent call last):
  File "tools/process.py", line 235, in <module>
    main()
  File "tools/process.py", line 167, in main
    src = load(src_path)
  File "tools/process.py", line 113, in load
    contents = open(path).read()
  File "/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What could be causing this error? I am using Python 3.5.2.


2 Answers


Solution :

Python is trying to convert a byte array (a bytes object that it assumes holds a UTF-8-encoded string) into a Unicode string (str). Decoding follows the UTF-8 rules, and here it hits a byte sequence that is not allowed in UTF-8-encoded text (namely the 0xff at position 0).
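The traceback shows the input is data/0.jpg, and JPEG files begin with the bytes 0xFF 0xD8, so reading one as text would fail on the very first byte. A minimal sketch that reproduces the same message; the sample bytes are made up and only imitate the start of a JPEG:

```python
# JPEG files start with the bytes 0xFF 0xD8 (the start-of-image marker).
jpeg_like = b"\xff\xd8\xff\xe0" + b"\x00" * 8  # made-up stand-in, not a real image

try:
    jpeg_like.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
    error_position = exc.start  # index of the first undecodable byte

print(error_message)
```

The position reported in the exception (0 here) is the offset of the offending byte, which is a useful hint that the file is not text at all rather than text with one bad character.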

Since you did not post any code, we can only guess at the rest.

From the stack trace, the failing operation appears to be a read from a file (contents = open(path).read()). Try rewriting it like this:

with open(path, 'rb') as f:
    contents = f.read()

The b in the mode argument of open() says the file should be treated as binary, so contents stays a bytes object and no decoding is attempted.
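As a rough sketch of what a binary-mode load could look like: the name load is borrowed from the traceback, but its real body in tools/process.py is unknown, so everything inside it is an assumption. The temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile


def load(path):
    """Return the raw bytes of a file without attempting any text decoding."""
    # Hypothetical replacement for the load() seen in the traceback.
    with open(path, "rb") as f:
        return f.read()


# Write a few made-up bytes that start like a JPEG, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "0.jpg")
    with open(sample_path, "wb") as f:
        f.write(b"\xff\xd8\xff\xe0 not really an image")
    contents = load(sample_path)

print(type(contents).__name__, contents[:2])
```

Any code that later consumes contents has to accept bytes, for example an image decoder, rather than a str.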


Solution:

Python attempts to convert a byte array (a bytes object that it assumes to be a UTF-8-encoded string) to a Unicode string (str). This conversion is a decoding according to the UTF-8 rules. When it attempts this, it encounters a byte sequence that is not allowed in UTF-8-encoded strings (here, the 0xff at position 0).

Because you did not provide any code we could look at, we can only guess at the rest.

From the stack trace we can assume that the triggering action was reading from a file (contents = open(path).read()). I suggest rewriting it like this:

with open(path, 'rb') as f:
  contents = f.read()

The b in the mode specifier of open() states that the file is to be treated as binary, so contents remains a bytes object. No decoding attempt happens this way.

The following approach strips out (ignores) the undecodable characters and returns the string without them. Use it only if you need to drop them rather than convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    contents = f.read()

With errors='ignore' you only lose some characters. If you don't care about them (in my case they appeared to be stray characters produced by badly formatted data from the clients connecting to my socket server), this is a simple, direct solution.
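To see what is actually lost, here is a small sketch comparing errors='ignore' with errors='replace' on a made-up byte string. The same error handlers apply whether they are passed to bytes.decode() or to open().

```python
raw = b"ab\xffcd"  # made-up bytes; 0xff never appears in valid UTF-8

ignored = raw.decode("utf-8", errors="ignore")    # invalid byte is silently dropped
replaced = raw.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD

print(repr(ignored))
print(repr(replaced))
```

errors='replace' keeps a visible marker where data was dropped, which can make silent corruption easier to spot later. Neither option is appropriate for a file that is binary to begin with, such as an image.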

I had a similar problem and ended up decoding with UTF-16. My code is below.

with open(path_to_file, 'rb') as f:
    contents = f.read()
# Decode first, then strip: rstrip("\n") on bytes raises TypeError in Python 3,
# and stripping raw bytes could cut a UTF-16 code unit in half.
contents = contents.decode("utf-16").rstrip("\n")
contents = contents.split("\r\n")

This reads the file contents as raw bytes, decodes them as UTF-16, and then splits the result into lines.
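One reason UTF-16 may be the right guess: a little-endian UTF-16 file that starts with a byte order mark begins with 0xFF 0xFE, which matches the "byte 0xff in position 0" in the error. A minimal sketch with made-up text, building the bytes explicitly so the result does not depend on the platform's byte order:

```python
import codecs

text = "first line\r\nsecond line\n"  # made-up sample content

# Little-endian UTF-16 with an explicit BOM (0xFF 0xFE).
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

starts_with_ff = data[0] == 0xFF   # the byte that UTF-8 rejects at position 0
decoded = data.decode("utf-16")    # the "utf-16" codec reads and removes the BOM
lines = decoded.rstrip("\n").split("\r\n")

print(starts_with_ff, lines)
```

This only applies when the file really is UTF-16 text. In the question, the input is a .jpg, where no text encoding will help.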

If the error comes from base64-decoded data that is binary rather than text, use only

base64.b64decode(a) 

instead of

base64.b64decode(a).decode('utf-8')
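A short sketch of why the extra .decode('utf-8') can fail: base64.b64decode() already returns bytes, and those bytes are only valid UTF-8 if the original payload was UTF-8 text. The payload below is made up.

```python
import base64

payload = b"\xff\xd8\xff\xe0"      # made-up binary payload, not UTF-8 text
a = base64.b64encode(payload)      # what might arrive over the wire

decoded = base64.b64decode(a)      # bytes, identical to the original payload

try:
    decoded.decode("utf-8")
    text_decode_failed = False
except UnicodeDecodeError:
    text_decode_failed = True

print(decoded == payload, text_decode_failed)
```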

Check the path of the file being read. My code kept raising errors until I changed the path to the current working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

It simply means that the wrong encoding was chosen to read the file.

On Mac, use file -I file.txt to find the actual encoding. On Linux, use file -i file.txt.
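If the file command is not available, a rough first check can be done in Python by looking for a byte order mark at the start of the data. The helper name sniff_bom is hypothetical, and this is only a heuristic: files without a BOM return None, and it does not replace a real detection tool.

```python
import codecs

# Checked in this order because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name suggested by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xff\xfeh\x00i\x00"))  # made-up UTF-16 LE sample
print(sniff_bom(b"plain ascii"))
```

The returned names are the Python codecs that consume the BOM themselves, so the result can be passed straight to open(path, encoding=...).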

You need to read this file with the latin1 encoding because it contains a few special characters. Use the code snippet below to read the file:

import pandas as pd

data=pd.read_csv("C:\\Users\\akashkumar\\Downloads\\Customers.csv",encoding='latin1')

print(data.head())
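A caveat worth illustrating: latin1 maps every possible byte to a character, so it never raises UnicodeDecodeError, but that does not mean the result is right. A small sketch in plain Python, with no pandas needed and made-up sample values:

```python
every_byte = bytes(range(256))
as_latin1 = every_byte.decode("latin-1")   # never raises: one byte -> one code point

utf8_bytes = "é".encode("utf-8")           # b'\xc3\xa9'
misread = utf8_bytes.decode("latin-1")     # no error, but the text is wrong

print(len(as_latin1), repr(misread))
```

So latin1 makes the error go away in every case. Whether the decoded text is correct still depends on the file actually having been written in that encoding.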

The exact error is here:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

You can't solve this by tinkering with the code. It is a bug that, in my opinion, should be quite simple to fix from the developer's side (by changing the encoding of the file). At present, the only way to remove the package is to force it, which I don't recommend in any case.

I see that /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here appears to be a dummy file, and it is possibly the cause of the problem. Use file /usr/share/ubuntu-drivers-common/quirks/* to check whether any of these files are not UTF-8, like this:

$ file /mnt/usr/share/ubuntu-drivers-common/quirks/*
/mnt/usr/share/ubuntu-drivers-common/quirks/dell_latitude:        ASCII text
/mnt/usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad:      ASCII text
/mnt/usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here: empty

If those files are not ASCII text, consider removing them all, then try to remove the package again.
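The same check can be sketched in Python by trying to decode each file as UTF-8. The function name find_non_utf8_files and the sample file names are hypothetical. The demo runs against a temporary directory rather than the real quirks directory.

```python
import os
import tempfile


def find_non_utf8_files(directory):
    """Return sorted names of regular files in `directory` that fail UTF-8 decoding."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            data = f.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Made-up sample files: one ASCII, one empty, one that is not valid UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    samples = [("ascii_quirk", b"ok\n"), ("empty_quirk", b""), ("bad_quirk", b"\xff\xfe")]
    for name, data in samples:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    result = find_non_utf8_files(tmp)

print(result)
```

ASCII and empty files pass, since both are valid UTF-8. Only files containing bytes that UTF-8 forbids are reported.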
