
Problem:

I am getting the following error while trying to read a CSV file with R:
in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : eof within quoted string


2 Answers


Solution:

Using read.csv() to read a file with free-text content is not a good idea. Disabling quoting by setting quote = "" is only a temporary fix: it only helps when the problem is caused by stray (unpaired) quotation marks. Other things, such as certain special characters, can also cause the warning.
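A minimal sketch of what quote = "" changes, assuming a small throwaway CSV (the file contents and column names below are made up for illustration) in which one title contains a single stray double quote:

```r
# Hypothetical 3-row CSV; the title in row 1 has an unmatched double quote
path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,Thorn\" and minus",
             "2,Second title",
             "3,Third title"), path)

# Default quoting: the stray quote opens a quoted string that never closes,
# so the rows after it are typically swallowed into that one field and R
# warns "EOF within quoted string"; guard against an error just in case
n_default <- tryCatch(nrow(suppressWarnings(read.csv(path))),
                      error = function(e) NA_integer_)

# quote = "" treats '"' as an ordinary character, so every line is one row
no_quote <- read.csv(path, quote = "", stringsAsFactors = FALSE)
n_no_quote <- nrow(no_quote)

cat("default:", n_default, "row(s); quote = \"\":", n_no_quote, "rows\n")
```

The trade-off mentioned above shows up here too: with quote = "", a properly quoted field that contains a comma would be split into two fields.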

The permanent solution (if you stick with read.csv()) is to find out what those special characters are and remove them with a regular expression.
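One way to follow this advice, sketched under the assumption that the "special characters" are unbalanced double quotes (your file may have a different culprit): flag lines with an odd number of `"` characters, strip the quotes from just those lines with a regular expression, and parse the cleaned text. The file and the cleaning rule are hypothetical.

```r
# Hypothetical CSV: row 1 is properly quoted (it contains a comma),
# row 2 has a single unbalanced quote
path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,\"Smith, J.\"",
             "2,Thorn\" and minus",
             "3,Third title"), path)

raw <- readLines(path, warn = FALSE)

# Count double quotes per line; an odd count means an unbalanced quote
n_quotes <- lengths(regmatches(raw, gregexpr("\"", raw, fixed = TRUE)))
bad <- which(n_quotes %% 2 == 1)
print(raw[bad])  # inspect the offending lines before changing anything

# Hypothetical cleaning rule: drop the double quotes on offending lines only,
# so properly quoted fields elsewhere still work with the default quote
raw[bad] <- gsub("\"", "", raw[bad], fixed = TRUE)
cleaned <- read.csv(text = raw, stringsAsFactors = FALSE)
print(cleaned)
```

Unlike quote = "", this keeps the quoted "Smith, J." field intact, because quoting is only removed where it is broken.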

Have you tried installing the {data.table} package and reading the file with fread()? It is much faster and should not bother you with this EOF warning. Note that the file is loaded as a data.table object rather than a plain data.frame. The data.table class has many useful features, but you can convert the result with as.data.frame() if needed.
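A short sketch of the fread() route, assuming {data.table} is installed (the block skips itself otherwise) and using a small made-up CSV with a stray quote. As far as I know, fread() only treats a quote as special when it starts a field, so a quote in the middle of a title is normally kept as plain text.

```r
# Hypothetical CSV with an unmatched quote in the middle of a field
path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,Thorn\" and minus",
             "2,Second title",
             "3,Third title"), path)

if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(path)   # returns a data.table
  print(class(dt))                # "data.table" "data.frame"
  df <- as.data.frame(dt)         # plain data.frame, if you need one
  print(df)
}
```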


Solution:

STEP 1: download and unzip the file

# download the file
site <- "http://www.informatics.jax.org/downloads/mgigff"
file <- "MGI.20170803.gff3.gz"
url <- paste0(site, "/", file)
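if(!file.exists(file)) download.file(url, file, mode = "wb")  # binary mode matters on Windows

# unzip to a temporary file
file <- sub("\\.gz$", "", file)   # strip the .gz extension (dot escaped)
tmpfile <- tempfile()
remove_tmpfile <- FALSE
if(!file.exists(file)) { # need to unzip
    # uses the gunzip command-line tool (Unix-like systems)
    system(paste0("gunzip -c ", file, ".gz > ", tmpfile))
    remove_tmpfile <- TRUE   # remember to delete tmpfile when done
    file <- tmpfile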
}
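If gunzip is not available (for example on Windows), a possible alternative is to skip the unzip step: read.table() accepts a gzfile() connection, which decompresses on the fly. A self-contained sketch with a small, made-up gzipped tab-separated file (quote = "" is the quoting fix discussed further down):

```r
# Hypothetical small GFF3-like file, written gzip-compressed
gz_path <- tempfile(fileext = ".gff3.gz")
con <- gzfile(gz_path, "w")
writeLines(c("##gff-version 3",
             "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1",
             "chr1\tsrc\tgene\t300\t400\t.\t-\t.\tID=gene2"), con)
close(con)

# gzfile() decompresses while reading; no temporary file or system() call
tab <- read.table(gzfile(gz_path), sep = "\t", header = FALSE,
                  comment.char = "#", na.strings = ".",
                  stringsAsFactors = FALSE, quote = "")
str(tab)
```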

 

STEP 2: read it into R with read.table().

tab <- read.table(file, sep="\t", header=FALSE, comment.char="#",
                  na.strings=".", stringsAsFactors=FALSE)

This gives a warning message:

Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

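read.delim() vs read.table()

> read.delim
function (file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
    dec = dec, fill = fill, comment.char = comment.char, ...)

read.delim() only treats the double quote (") as a quote character, while read.table() defaults to quote = "\"'", so a single apostrophe in a field opens a quoted string that is never closed. Setting quote = "" disables quoting altogether:

tab <- read.table(file, sep="\t", header=FALSE, comment.char="#",
                  na.strings=".", stringsAsFactors=FALSE,
                  quote="", fill=FALSE)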
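A minimal illustration of the apostrophe problem, using a few made-up tab-separated rows. The default call is wrapped in tryCatch() because, depending on the file, it may warn, stop early, or fail outright.

```r
# Hypothetical rows; the first one contains an apostrophe
path <- tempfile(fileext = ".tsv")
writeLines(c("gene1\tNote=Parkinson's disease\t.",
             "gene2\tNote=kinase\t5",
             "gene3\tNote=other\t7"), path)

# Default quote = "\"'": the apostrophe opens a quoted string that is never
# closed, so this call warns and/or fails
default_tab <- tryCatch(suppressWarnings(read.table(path, sep = "\t")),
                        error = function(e) NULL)
cat("default quoting:",
    if (is.null(default_tab)) "error" else paste(nrow(default_tab), "row(s)"),
    "\n")

# With quoting disabled, each line is one row again
tab <- read.table(path, sep = "\t", header = FALSE, na.strings = ".",
                  stringsAsFactors = FALSE, quote = "")
print(tab)
```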

The same fix, disabling quoting, works for read.csv():

cit <- read.csv("citations.CSV", quote = "", 
                 row.names = NULL, 
                 stringsAsFactors = FALSE)

str(cit)
## 'data.frame':    112543 obs. of  13 variables:
##  $ row.names    : chr  "10.2307/675394" "10.2307/30007362" "10.2307/4254931" "10.2307/20537934" ...
##  $ id           : chr  "10.2307/675394\t" "10.2307/30007362\t" "10.2307/4254931\t" "10.2307/20537934\t" ...
##  $ doi          : chr  "Archaeological Inference and Inductive Confirmation\t" "Sound and Sense in Cath Almaine\t" "Oak Galls Preserved by the Eruption of Mount Vesuvius in A.D. 79_ and Their Probable Use\t" "The Arts Four Thousand Years Ago\t" ...
##  $ title        : chr  "Bruce D. Smith\t" "Tomás Ó Cathasaigh\t" "Hiram G. Larew\t" "\t" ...
##  $ author       : chr  "American Anthropologist\t" "Ériu\t" "Economic Botany\t" "The Illustrated Magazine of Art\t" ...
##  $ journaltitle : chr  "79\t" "54\t" "41\t" "1\t" ...
##  $ volume       : chr  "3\t" "\t" "1\t" "3\t" ...
##  $ issue        : chr  "1977-09-01T00:00:00Z\t" "2004-01-01T00:00:00Z\t" "1987-01-01T00:00:00Z\t" "1853-01-01T00:00:00Z\t" ...
##  $ pubdate      : chr  "pp. 598-617\t" "pp. 41-47\t" "pp. 33-40\t" "pp. 171-172\t" ...
##  $ pagerange    : chr  "American Anthropological Association\tWiley\t" "Royal Irish Academy\t" "New York Botanical Garden Press\tSpringer\t" "\t" ...
##  $ publisher    : chr  "fla\t" "fla\t" "fla\t" "fla\t" ...
##  $ type         : logi  NA NA NA NA NA NA ...
##  $ reviewed.work: logi  NA NA NA NA NA NA ...
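The str() output above shows that every character field still ends in a tab ("\t"). A small cleanup sketch, assuming the only residue is a single trailing tab per field; the data frame below is a made-up stand-in for cit:

```r
# Hypothetical stand-in for cit: character columns with trailing tabs
cit <- data.frame(id = c("10.2307/675394\t", "10.2307/30007362\t"),
                  volume = c("3\t", "\t"),
                  type = c(NA, NA),
                  stringsAsFactors = FALSE)

# Strip one trailing tab from every character column; leave others untouched
cit[] <- lapply(cit, function(x) if (is.character(x)) sub("\t$", "", x) else x)

# Fields that were only a tab are now "", so turn them into NA
cit[] <- lapply(cit, function(x) {
  if (is.character(x)) replace(x, which(x == ""), NA) else x
})
str(cit)
```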

I think it is because of lines like this one (note "Thorn" and "Minus"):

readLines("citations.CSV")[82]
[1] "10.2307/3642839,10.2307/3642839\t,\"Thorn\" and \"Minus\" in Hieroglyphic Luvian Orthography\t,H. Craig Melchert\t,Anatolian Studies\t,38\t,\t,1988-01-01T00:00:00Z\t,pp. 29-42\t,British Institute at Ankara\t,fla\t,\t,"
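To check whether lines like this are the culprit, count.fields() reports how many fields each line splits into under a given quote setting, and lines whose count differs from the header's are worth inspecting. A sketch on a hypothetical file modelled loosely on the line above (including its trailing comma):

```r
# Hypothetical CSV: two data rows end with a trailing comma, like line 82
path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,\"Thorn\" and \"Minus\" in Orthography,",
             "2,Sound and Sense,",
             "3,Oak Galls"), path)

# Fields per line with quoting disabled (as in the quote = "" fix)
n_fields <- count.fields(path, sep = ",", quote = "")
print(n_fields)

# Lines whose field count differs from the header's
odd_lines <- which(n_fields != n_fields[1])
print(readLines(path)[odd_lines])
```

A trailing comma on data rows but not on the header would also explain the shifted column names in the str() output, since read.table() then sees one more data field than there are header names.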

As the R help page for read.table() puts it, to disable quoting altogether, use:

    quote = "" 

I also ran into this problem, and was able to work around a similar EOF error using:

read.table("....csv", sep=",", ...)

The readr package's read_csv() can also get around this issue:

install.packages('readr')
library(readr)
readr::read_csv('yourfile.csv')

 


Related questions

Problem: I have a CSV file (24.1 MB) that I cannot fully read into my R session. When I open the file in a spreadsheet program I can see 112,544 rows. When I read it into R with read.csv I only get 56,952 rows and this warning: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : EOF within quoted string
asked Apr 23 ummeshani 9.5k points
Problem: in scan(file = file, what = what, sep = sep, quote = quote, dec = dec : eof within quoted string
asked Feb 23 Muneeb Saadii 130k points
Problem: How to fix this error: eof while scanning triple-quoted string literal HELP! I need help understanding this error: eof while scanning triple-quoted string literal Can someone please help fix this, I am new in python() : python plot line with slope and intercept
asked Mar 1 Mashhoodch 13k points
Problem: How to use EOF to run through a text file in C? PLease suggest a perfect answer. Your help will be appreciated?
asked Apr 10 Sana8989 8.9k points
Problem: Hi there! I am a new learner and I have written a program that is meant to add the corresponding items of two integer lists. When I run this program, I get the following error: SyntaxError: unexpected EOF while parsing I know this must be something very basic but I am ... var, "+", var2, "=", var + var2 Please give the solution to this error and also explain what does EOF mean? Thank you!
asked Apr 19 Code Learner 9.8k points
Problem: Hi please help me with this error. Syntaxerror: unexpected eof while parsing python
asked Mar 20 PkGuy 23.5k points
Problem: I am getting error while running this part of the code. tried some of the existing solutions, none of them helped elec_and_weather = pd.read_csv(r'C:\HOUR.csv', parse_dates=True,index_col=0) # Add historic DEMAND to each X vector for i in ... 24 for k in range(n_hours_advance,n_hours_advance+n_hours_window): elec_and_weather['DEMAND_t-%i'% k] = np.zeros(len(elec_and_weather['DEMAND']))'
asked Feb 17 Mashhoodch 13k points
Problem: Hello guys, I am facing an error from my Python program that says python unexpected EOF while parsing. I am a newbie in python. So, it has been tough to understand and decode the problem for me alone. Could any of you please explain this in simple ... this problem and got some solutions but I couldn’t crack them at all. I am looking forward to reading your answers. Many thanks.
asked Jun 28, 2020 adamSw 11.3k points
Problem: Can someone tell me what this error is? Error in gzfile(file, "wb") : cannot open the connection.
asked Apr 27 PkGuy 23.5k points
Problem: I'm new to R, and after researching this error extensively, I'm still not able to find a solution for it. Here's the code. I've checked my working directory and made sure the files are in the right directory. The error I'm getting is below: Error in file(file, "rt") : cannot open the connection Thanks
asked Mar 30 ummesalma 29.2k points