
Problem :

I am getting the following error while trying to read a CSV file in R:
in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : eof within quoted string

2 Answers

0 votes

Solution :

read.csv() can stumble on files with free-text content, because quote characters inside a field are treated as string delimiters. Disabling quote handling with quote="" is only a workaround: it helps when the file contains stray, unpaired quotation marks, but it does not address other causes of the warning, such as certain special characters.

The more durable fix, if you stay with read.csv(), is to find out which special characters cause the problem and remove them with a regular expression before parsing.
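A minimal sketch of that cleanup, assuming the offending characters are stray single or double quotes. The sample lines and column names below are made up for illustration; with a real file you would start from readLines() on your own path.

```r
# Hypothetical sample data: the second record contains a stray double quote.
lines <- c("id,title",
           "1,Plain title",
           "2,A 5\" ruler",
           "3,Another title")

# Strip every single and double quote character with a regular expression.
cleaned <- gsub("[\"']", "", lines)

# Parse the cleaned text instead of the original file.
df <- read.csv(text = cleaned, stringsAsFactors = FALSE)
str(df)
```

Removing the characters changes the data, so this only suits cases where the quotes carry no meaning you need to keep.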

Alternatively, install the data.table package and read the file with fread(). It is much faster and usually does not raise this EOF warning. Note that the result is a data.table rather than a plain data.frame. The data.table class has many good features, but you can convert the result with as.data.frame() if needed.
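A short sketch of the fread() route, assuming data.table is already installed. The temporary file and its contents are placeholders standing in for your own CSV.

```r
library(data.table)

# Write a small placeholder CSV so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("id,title", "1,First", "2,Second"), path)

# fread() returns a data.table.
dt <- fread(path)

# Convert to a plain data.frame if downstream code expects one.
df <- as.data.frame(dt)
class(df)

unlink(path)
```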

0 votes

Solution:

STEP 1: download and unzip the file

# download the file
site <- "http://www.informatics.jax.org/downloads/mgigff"
file <- "MGI.20170803.gff3.gz"
url <- paste0(site, "/", file)
if(!file.exists(file)) download.file(url, file)

# unzip to a temporary file
file <- sub("\\.gz$", "", file)  # escape the dot so only a literal ".gz" suffix is stripped
tmpfile <- tempfile()
remove_tmpfile <- FALSE
if(!file.exists(file)) { # need to unzip
    system(paste0("gunzip -c ", file, ".gz > ", tmpfile))
    remove_tmpfile <- TRUE
    file <- tmpfile
}

 

STEP 2: read it into R with read.table()

tab <- read.table(file, sep="\t", header=FALSE, comment.char="#",
                  na.strings=".", stringsAsFactors=FALSE)

This gives a warning message:

Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

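read.delim() vs read.table()

read.delim() is a thin wrapper around read.table() with different defaults:

> read.delim
function (file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
    dec = dec, fill = fill, comment.char = comment.char, ...)

By default read.table() treats both " and ' as quote characters, so a stray apostrophe or double quote in a text field can open a quoted string that never closes. Turning quoting off with quote="" reads the file without the warning:

tab <- read.table(file, sep="\t", header=FALSE, comment.char="#",
                  na.strings=".", stringsAsFactors=FALSE,
                  quote="", fill=FALSE)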

For a CSV file, you likewise need to disable quoting:

cit <- read.csv("citations.CSV", quote = "", 
                 row.names = NULL, 
                 stringsAsFactors = FALSE)

str(cit)
## 'data.frame':    112543 obs. of  13 variables:
##  $ row.names    : chr  "10.2307/675394" "10.2307/30007362" "10.2307/4254931" "10.2307/20537934" ...
##  $ id           : chr  "10.2307/675394\t" "10.2307/30007362\t" "10.2307/4254931\t" "10.2307/20537934\t" ...
##  $ doi          : chr  "Archaeological Inference and Inductive Confirmation\t" "Sound and Sense in Cath Almaine\t" "Oak Galls Preserved by the Eruption of Mount Vesuvius in A.D. 79_ and Their Probable Use\t" "The Arts Four Thousand Years Ago\t" ...
##  $ title        : chr  "Bruce D. Smith\t" "Tomás Ó Cathasaigh\t" "Hiram G. Larew\t" "\t" ...
##  $ author       : chr  "American Anthropologist\t" "Ériu\t" "Economic Botany\t" "The Illustrated Magazine of Art\t" ...
##  $ journaltitle : chr  "79\t" "54\t" "41\t" "1\t" ...
##  $ volume       : chr  "3\t" "\t" "1\t" "3\t" ...
##  $ issue        : chr  "1977-09-01T00:00:00Z\t" "2004-01-01T00:00:00Z\t" "1987-01-01T00:00:00Z\t" "1853-01-01T00:00:00Z\t" ...
##  $ pubdate      : chr  "pp. 598-617\t" "pp. 41-47\t" "pp. 33-40\t" "pp. 171-172\t" ...
##  $ pagerange    : chr  "American Anthropological Association\tWiley\t" "Royal Irish Academy\t" "New York Botanical Garden Press\tSpringer\t" "\t" ...
##  $ publisher    : chr  "fla\t" "fla\t" "fla\t" "fla\t" ...
##  $ type         : logi  NA NA NA NA NA NA ...
##  $ reviewed.work: logi  NA NA NA NA NA NA ...

I think it is because of lines like this one (note the quoted "Thorn" and "Minus"):

readLines("citations.CSV")[82]
[1] "10.2307/3642839,10.2307/3642839\t,\"Thorn\" and \"Minus\" in Hieroglyphic Luvian Orthography\t,H. Craig Melchert\t,Anatolian Studies\t,38\t,\t,1988-01-01T00:00:00Z\t,pp. 29-42\t,British Institute at Ankara\t,fla\t,\t,"

As pointed out above, and as noted in the R help, you can simply disable quoting altogether by adding:

    quote = "" 
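As a small illustration of what quote = "" does, here is a sketch with two made-up records modelled on the citation data above (the column names are assumptions). With quoting disabled, the double quotes are kept as ordinary characters in the field.

```r
# Hypothetical records; the first title contains embedded double quotes.
lines <- c("id,title,author",
           "1,\"Thorn\" and \"Minus\" in Hieroglyphic Luvian Orthography,H. Craig Melchert",
           "2,Archaeological Inference and Inductive Confirmation,Bruce D. Smith")

# quote = "" tells read.csv() not to treat any character as a quote.
cit <- read.csv(text = lines, quote = "", stringsAsFactors = FALSE)

# The quotes survive as literal characters in the title.
cit$title[1]
```

The trade-off is that a field which legitimately wraps a comma in quotes would now be split at that comma.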

I also ran into this problem, and was able to work around a similar EOF error using:

read.table("....csv", sep=",", ...)

The readr package can also read such files without this error:

install.packages('readr')
library(readr)
readr::read_csv('yourfile.csv')

 
