Problem :

Please find below my code for your reference.

import os
for root, dirs, files in os.walk('Path'):
    for file in files:
        if file.endswith('.c'):
            with open(os.path.join(root, file)) as f:
                for line in f:
                    if 'word' in line:
                        print(line)


Currently I am getting the error below:

“UnicodeDecodeError: 'cp932' codec can't decode byte 0xfc in position 6616: illegal multibyte sequence”

I think my files need the Shift JIS encoding. Can I set the encoding once at the start? I have already tried setting it with open(os.path.join(root, file), 'r', encoding='cp932') as f: but got the same error as above.


1 Answer


Solution :

You could pass errors='ignore', as shown below, but you should still check what the encoding of your files actually is.

open(os.path.join(root, file),'r', encoding='cp932', errors='ignore')
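To see what errors='ignore' does, here is a minimal sketch that decodes a small byte string in memory instead of a real file. The sample bytes are made up for illustration: ASCII text with a single stray 0xfc byte, the byte reported in the error message. The errors='replace' variant is not part of the answer above; it is shown only for comparison, because it leaves a visible marker where a byte was dropped.

```python
# Hypothetical sample data: ASCII text containing one stray byte (0xfc)
# that does not form a valid cp932 sequence with the byte after it.
raw = b"int word = 1; \xfc /* comment */\n"

# Default (strict) decoding raises, as in the question.
try:
    raw.decode("cp932")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decoding failed:", exc)

# errors='ignore' silently drops the bytes that cannot be decoded.
ignored = raw.decode("cp932", errors="ignore")

# errors='replace' substitutes U+FFFD so the damage stays visible.
replaced = raw.decode("cp932", errors="replace")

print(repr(ignored))
print(repr(replaced))
```

Because 'ignore' hides the problem, it is probably best treated as a way to keep the search running, not as proof that the files really are cp932.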

This will not skip a file completely; it only drops the characters that cannot be decoded. It may be that only a few files or lines are incorrectly encoded. You can check how many such errors you have by catching the exception and printing the filename.
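The check described above could look roughly like the sketch below. The function name find_undecodable_files and its parameters are hypothetical, chosen for illustration; 'Path', the '.c' suffix and 'cp932' are taken from the question.

```python
import os

def find_undecodable_files(top, suffix=".c", encoding="cp932"):
    """Return (path, error) pairs for files under `top` that fail to decode."""
    failures = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(suffix):
                continue
            path = os.path.join(root, name)
            try:
                with open(path, "r", encoding=encoding) as f:
                    for _ in f:  # iterating over the lines forces the decode
                        pass
            except UnicodeDecodeError as exc:
                # Record the file and stop reading it at the first bad byte.
                failures.append((path, exc))
    return failures

# 'Path' is the placeholder directory from the question.
for path, exc in find_undecodable_files("Path"):
    print(path, "->", exc)
```

If only a handful of files show up, they may simply be saved in a different encoding from the rest and can be inspected individually.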

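OR

You can also try the io library, as shown below. Note that in Python 3, io.open is an alias for the built-in open, so this call behaves exactly like the one you already tried and will raise the same error. It only makes a difference on Python 2, where the built-in open does not accept an encoding argument.

io.open(os.path.join(root, file), mode='r', encoding='cp932')

Passing errors='ignore' should get the script running while you work out the real encoding of the affected files.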
