
Problem:

I encountered the following error while running "process.py":

python tools/process.py --input_dir data --operation resize --output_dir data2/resize

data/0.jpg -> data2/resize/0.png

Traceback (most recent call last):
  File "tools/process.py", line 235, in <module>
    main()
  File "tools/process.py", line 167, in main
    src = load(src_path)
  File "tools/process.py", line 113, in load
    contents = open(path).read()
  File "/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What may be the cause of the error? I am using Python 3.5.2.


2 Answers


Solution:

Python is trying to convert a byte array (a bytes object that it assumes holds UTF-8-encoded text) into a Unicode string (str). This decoding follows UTF-8 rules, and it fails when it meets a byte sequence that is not allowed in UTF-8 (here, the byte 0xff at position 0). The file being read appears to be data/0.jpg, and JPEG files begin with the bytes 0xff 0xd8, so this is image data rather than text.

Since you did not post the code itself, the rest is a guess.

The stack trace shows that the error was triggered while reading a file (contents = open(path).read()). Try rewriting that line as follows:

with open(path, 'rb') as f:
    contents = f.read()

The b in the mode specifier tells open() to treat the file as binary, so contents stays a bytes object and no decoding is attempted.
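
To make the difference between text mode and binary mode concrete, here is a minimal sketch. The file name fake.jpg and the helper read_both_ways are made up for illustration, and the bytes written only mimic the start of a JPEG file; this is not the code from process.py.

```python
import os
import tempfile


def read_both_ways(path):
    """Return (text_error, raw_bytes) for the file at `path` (hypothetical helper)."""
    text_error = None
    try:
        # Text mode: Python decodes the bytes, here explicitly as UTF-8.
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        text_error = exc

    # Binary mode: no decoding happens, the result is a bytes object.
    with open(path, "rb") as f:
        raw = f.read()
    return text_error, raw


# Write a few bytes that look like the start of a JPEG file (assumed sample data).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "fake.jpg")
with open(demo_path, "wb") as f:
    f.write(b"\xff\xd8\xff\xe0")

error, data = read_both_ways(demo_path)
print(type(error).__name__, "|", data[:2])
```

The text-mode read fails on the very first byte, while the binary read returns the bytes untouched, which is what an image-processing step would normally want.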


Solution:

Python tries to convert a byte array (a bytes object that it assumes to be a UTF-8-encoded string) into a Unicode string (str). This is a decoding according to UTF-8 rules. When it tries this, it encounters a byte sequence that is not allowed in UTF-8-encoded strings (namely the 0xff at position 0).

Since you did not provide any code we could look at, we can only guess at the rest.

From the stack trace we can assume that the triggering action was reading from a file (contents = open(path).read()). I propose rewriting it like this:

with open(path, 'rb') as f:
  contents = f.read()

The b in the mode specifier of open() states that the file shall be treated as binary, so contents will remain a bytes object. No decoding attempt will happen this way.

Alternatively, the following strips out (ignores) the undecodable bytes and returns the string without them. Use it only if you want to drop those bytes rather than convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    contents = f.read()

With errors='ignore' you simply lose some characters. If you don't care about them (in my case they seemed to be extra characters caused by badly written clients connecting to my socket server), it is an easy, direct solution.
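
The effect of errors='ignore' is easiest to see on an in-memory bytes object. This is a small sketch with invented sample bytes; errors='replace' is shown only for comparison, since it substitutes U+FFFD for each bad byte instead of silently dropping it.

```python
# Invented sample: two bytes (0xff, 0xfe) that can never appear in valid UTF-8.
raw = b"\xffhello \xfeworld"

ignored = raw.decode("utf-8", errors="ignore")    # undecodable bytes are dropped
replaced = raw.decode("utf-8", errors="replace")  # each bad byte becomes U+FFFD

print(repr(ignored))
print(repr(replaced))
```

If you need to know afterwards that something was lost, 'replace' leaves a visible marker in the text, whereas 'ignore' leaves no trace.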

I had a similar issue and ended up decoding the file as UTF-16. My code is below.

with open(path_to_file, 'rb') as f:
    contents = f.read()
# Decode the bytes to str, then drop the trailing line ending.
contents = contents.decode("utf-16").rstrip("\r\n")
contents = contents.split("\r\n")

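This reads the file contents as raw bytes, decodes them as UTF-16, and splits the result into lines. A leading 0xff 0xfe pair is the UTF-16 little-endian byte order mark, which is why a UTF-16 file produces exactly this error when it is read as UTF-8.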
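
If you are not sure whether a file is UTF-16, one option is to look at the byte order mark before decoding. The helper decode_by_bom below is a hypothetical sketch, not part of any library; it relies only on the BOM constants in the standard codecs module and does not handle UTF-32, whose little-endian BOM starts with the same two bytes.

```python
import codecs


def decode_by_bom(data, default="utf-8"):
    """Pick a codec from a leading byte order mark (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")   # strips the UTF-8 BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")      # reads the BOM to choose the byte order
    return data.decode(default)           # no BOM: fall back to the default codec


# Sample data: a UTF-16 little-endian BOM (FF FE) followed by two lines.
sample = codecs.BOM_UTF16_LE + "first\r\nsecond\r\n".encode("utf-16-le")
lines = decode_by_bom(sample).splitlines()
print(lines)
```

Files without a BOM still fall through to the default codec, so this only helps for the case where the first bytes give the encoding away.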

If the Base64 payload is binary data rather than text, use only

base64.b64decode(a) 

instead of

base64.b64decode(a).decode('utf-8')
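
A short sketch of why the extra .decode('utf-8') fails: the payload below is an invented example in which the variable a holds Base64 text that encodes binary data.

```python
import base64

# Hypothetical payload: Base64 text that encodes binary data, not text.
a = base64.b64encode(b"\xff\xd8\xff\xe0")

raw = base64.b64decode(a)  # bytes, no text decoding involved

try:
    raw.decode("utf-8")    # only valid when the payload really is UTF-8 text
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(raw, decode_failed)
```

base64.b64decode() already returns the original bytes, so the second decode is needed only when those bytes are known to be UTF-8 text.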

Check the path of the file to be read. My code kept giving me errors until I changed the path to the present working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

It simply means that the wrong encoding was chosen to read the file.

On Mac, use file -I file.txt to find the correct encoding. On Linux, use file -i file.txt.
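
If the file command is not available, a rough substitute in Python is to try a few candidate encodings in order. The helper first_working_encoding and its candidate list are assumptions made for this sketch. A successful decode does not prove that the encoding is the right one, so treat the result as a guess.

```python
def first_working_encoding(data, candidates=("ascii", "utf-8", "cp1252")):
    """Return the first candidate that decodes `data` strictly, else None.

    Hypothetical helper: the candidate list is an assumption, so adjust it
    to the encodings you actually expect.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(first_working_encoding("café".encode("utf-8")))
print(first_working_encoding("café".encode("cp1252")))
```

The order matters: strict codecs such as ascii and utf-8 go first, because permissive single-byte codecs accept almost any input.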

You need to read this file with the latin1 encoding because it contains some special characters. Use the code snippet below to read the file:

import pandas as pd

data=pd.read_csv("C:\\Users\\akashkumar\\Downloads\\Customers.csv",encoding='latin1')

print(data.head())
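
The same idea can be checked without pandas. The sketch below uses invented CSV bytes to show that data written as Latin-1 fails to decode as UTF-8 but reads cleanly as latin1. Because latin1 maps every possible byte to a character, it never raises an error, which also means it cannot tell you when it is the wrong choice.

```python
import csv
import io

# Hypothetical CSV bytes, as a Latin-1 export would produce them.
raw = "id,name\n1,José\n2,Zoë\n".encode("latin1")

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as latin1 always succeeds; parse the resulting text as CSV.
rows = list(csv.reader(io.StringIO(raw.decode("latin1"))))
print(utf8_ok, rows)
```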

The exact error is here:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

You can't fix this by meddling with the code. It is a bug that, in my opinion, would be quite easy for the developers to fix (by changing the encoding of the file). For now, the only way to remove the package is to force it, which I don't recommend in any case.

I see that /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here seems to be a dummy file, and it is probably the cause of the problem. You should check with file /usr/share/ubuntu-drivers-common/quirks/* whether any of these files are not UTF-8, like this:

$ file /mnt/usr/share/ubuntu-drivers-common/quirks/*
/mnt/usr/share/ubuntu-drivers-common/quirks/dell_latitude:        ASCII text
/mnt/usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad:      ASCII text
/mnt/usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here: empty

If those files are not ASCII text, consider removing them all, then try to remove the package again.
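
The same check can be sketched in Python. The helper non_utf8_files is hypothetical, and the demo directory and file names are invented so that the example runs anywhere; on the affected system you would pass the quirks directory instead.

```python
import tempfile
from pathlib import Path


def non_utf8_files(directory):
    """Return names of regular files in `directory` that are not valid UTF-8."""
    bad = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path.name)
    return bad


# Demo with invented files: plain ASCII text, an empty file, and raw non-UTF-8 bytes.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "good").write_bytes(b"ASCII text\n")
(demo_dir / "empty").write_bytes(b"")
(demo_dir / "bad").write_bytes(b"\xff\xfe")

print(non_utf8_files(demo_dir))
```

ASCII text and empty files both count as valid UTF-8 here, so only files containing bytes that UTF-8 forbids are reported.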
