1 vote

Problem :

I am getting the following warning while trying to read a CSV file in R:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string

2 Answers

0 votes

Solution :

Using read.csv() to read a file that contains free text is fragile. Disabling quoting with quote="" is only a stopgap: it helps when the cause is a stray (unpaired) quotation mark. Other things, such as certain special characters, can trigger the same warning.

The permanent solution, if you want to stay with read.csv(), is to find out which characters cause the problem and remove them with a regular expression.
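One way to locate the offending lines is to count the double quotes on each line and flag the lines with an odd count. This is a minimal base-R sketch; the four sample lines are invented for illustration, and dropping the quotes on the flagged lines is just one possible cleanup.

```r
# Hypothetical sample data: the third line has a stray (unpaired) double quote.
lines <- c('id,title',
           '1,"A quoted title"',
           '2,5" floppy disks',
           '3,plain text')

# Count the double quotes on each line; an odd count marks a suspect line.
n_quotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))
suspect <- which(n_quotes %% 2 == 1)
print(suspect)

# One possible cleanup: remove the quotes on the suspect lines only.
cleaned <- lines
cleaned[suspect] <- gsub('"', "", cleaned[suspect], fixed = TRUE)

# Parse the cleaned text with the default quoting rules.
tab <- read.csv(text = cleaned, stringsAsFactors = FALSE)
print(tab)
```

For a real file, replace the sample vector with readLines() on the file path. Fields that legitimately span several lines inside quotes would also be flagged by this check, so inspect the suspect lines before changing them.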

Have you tried the data.table package and its fread() function? It is much faster than read.csv() and, in my experience, does not raise this EOF warning. Note that fread() returns a data.table rather than a plain data.frame. The data.table class has many good features, but you can convert the result with as.data.frame() if needed.
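A minimal sketch of that workflow, assuming the data.table package is installed. The small file written here is made up for illustration and contains no problematic quotes; it only shows the call and the conversion.

```r
# Runs only if data.table is available; install.packages("data.table") otherwise.
if (requireNamespace("data.table", quietly = TRUE)) {
  # Hypothetical sample file, written to a temporary location.
  path <- tempfile(fileext = ".csv")
  writeLines(c("id,title", "1,Sound and Sense", "2,Oak Galls"), path)

  dt <- data.table::fread(path)
  print(class(dt))          # "data.table" "data.frame"

  # Convert to a plain data.frame if downstream code expects one.
  df <- as.data.frame(dt)
  print(class(df))          # "data.frame"
}
```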

0 votes


STEP 1: download and unzip the file

# download the file
site <- "http://www.informatics.jax.org/downloads/mgigff"
file <- "MGI.20170803.gff3.gz"
url <- paste0(site, "/", file)
if(!file.exists(file)) download.file(url, file)

# unzip to a temporary file
file <- sub("\\.gz$", "", file)   # escape the dot so only a literal ".gz" suffix is removed
tmpfile <- tempfile()
remove_tmpfile <- FALSE
if(!file.exists(file)) { # need to unzip
    system(paste0("gunzip -c ", file, ".gz > ", tmpfile))
    remove_tmpfile <- TRUE
    file <- tmpfile
}

STEP 2:  read it into R with read.table().

tab <- read.table(file, sep="\t", header=FALSE, comment.char="#",
                  na.strings=".", stringsAsFactors=FALSE)

This gives a warning message:

Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

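read.delim() vs read.table()

read.delim() is a thin wrapper around read.table() with different defaults:

> read.delim
function (file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
    dec = dec, fill = fill, comment.char = comment.char, ...)

read.table() itself defaults to quote = "\"'", so an apostrophe inside a text field opens a quoted string that may never be closed. Turning quoting off avoids the warning:

tab <- read.table(file, sep="\t", header=FALSE, comment.char="#",
                  na.strings=".", stringsAsFactors=FALSE,
                  quote="", fill=FALSE)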
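The effect of quote="" can be tried on a tiny file. This is a sketch only: the three tab-separated lines below are invented for illustration (they are not taken from the MGI file), and the first one contains an unpaired apostrophe.

```r
# Hypothetical tab-separated sample with an unpaired apostrophe on line 1.
path <- tempfile(fileext = ".tsv")
writeLines(c("chr1\tgene\tNote=5' UTR",
             "chr2\tgene\tNote=exon",
             "chr3\tgene\tNote=intron"), path)

# Default quoting: record whatever read.table() reports instead of printing it.
msg <- tryCatch({
  read.table(path, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
  "no problem reported"
}, warning = function(w) conditionMessage(w),
   error = function(e) conditionMessage(e))
print(msg)

# Quoting disabled: the apostrophe is treated as an ordinary character.
tab <- read.table(path, sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE, quote = "")
print(dim(tab))
print(tab$V3[1])
```

With quoting disabled, all three rows and three columns are read, and the apostrophe stays in the third field of the first row.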

You need to disable quoting:

cit <- read.csv("citations.CSV", quote = "", 
                 row.names = NULL, 
                 stringsAsFactors = FALSE)
str(cit)

## 'data.frame':    112543 obs. of  13 variables:
##  $ row.names    : chr  "10.2307/675394" "10.2307/30007362" "10.2307/4254931" "10.2307/20537934" ...
##  $ id           : chr  "10.2307/675394\t" "10.2307/30007362\t" "10.2307/4254931\t" "10.2307/20537934\t" ...
##  $ doi          : chr  "Archaeological Inference and Inductive Confirmation\t" "Sound and Sense in Cath Almaine\t" "Oak Galls Preserved by the Eruption of Mount Vesuvius in A.D. 79_ and Their Probable Use\t" "The Arts Four Thousand Years Ago\t" ...
##  $ title        : chr  "Bruce D. Smith\t" "Tomás Ó Cathasaigh\t" "Hiram G. Larew\t" "\t" ...
##  $ author       : chr  "American Anthropologist\t" "Ériu\t" "Economic Botany\t" "The Illustrated Magazine of Art\t" ...
##  $ journaltitle : chr  "79\t" "54\t" "41\t" "1\t" ...
##  $ volume       : chr  "3\t" "\t" "1\t" "3\t" ...
##  $ issue        : chr  "1977-09-01T00:00:00Z\t" "2004-01-01T00:00:00Z\t" "1987-01-01T00:00:00Z\t" "1853-01-01T00:00:00Z\t" ...
##  $ pubdate      : chr  "pp. 598-617\t" "pp. 41-47\t" "pp. 33-40\t" "pp. 171-172\t" ...
##  $ pagerange    : chr  "American Anthropological Association\tWiley\t" "Royal Irish Academy\t" "New York Botanical Garden Press\tSpringer\t" "\t" ...
##  $ publisher    : chr  "fla\t" "fla\t" "fla\t" "fla\t" ...
##  $ type         : logi  NA NA NA NA NA NA ...
##  $ reviewed.work: logi  NA NA NA NA NA NA ...

I think the warning is caused by lines like this one (note the quotes around "Thorn" and "Minus"):

[1] "10.2307/3642839,10.2307/3642839\t,\"Thorn\" and \"Minus\" in Hieroglyphic Luvian Orthography\t,H. Craig Melchert\t,Anatolian Studies\t,38\t,\t,1988-01-01T00:00:00Z\t,pp. 29-42\t,British Institute at Ankara\t,fla\t,\t,"

As pointed out above and in the R help, you can disable quoting altogether by adding:

    quote = "" 

I also ran into this problem, and was able to work around a similar EOF error using:

read.table("....csv", sep=",", ...)

The readr package, whose read_csv() uses its own parser rather than scan(), may also read the file without this warning.


