
Problem:

I get the following warning when I try to read a CSV file in R:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string

2 Answers


Solution:

read.csv() is a poor fit for files with free-text content. Disabling quoting with quote="" is only a partial workaround: it helps when the cause is stray quotation marks, but other things, such as certain special characters, can trigger the same warning.
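
A minimal sketch of that workaround, assuming the cause is a single unbalanced double quote. The file contents and the names path, broken and fixed are invented for illustration:

```r
# Write a tiny CSV whose second data row contains one unbalanced double quote.
path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,Plain title",
             "2,5\" floppy disks",
             "3,Another title"), path)

# Default quoting: the lone quote opens a string that is never closed, which is
# the situation that can produce "EOF within quoted string". Warnings and errors
# are swallowed here so the script keeps running either way.
broken <- tryCatch(
  suppressWarnings(read.csv(path, stringsAsFactors = FALSE)),
  error = function(e) NULL
)

# quote = "" makes the parser treat the quote character as ordinary text.
fixed <- read.csv(path, quote = "", stringsAsFactors = FALSE)

print(nrow(fixed))      # one row per data line
print(fixed$title[2])   # the quote character is kept in the value
unlink(path)
```

Note that with quote="" a comma inside a quoted field is no longer protected, so this only suits files whose fields never contain the separator.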

The permanent solution with read.csv() is to find out which special characters are causing the problem and remove them with a regular expression.
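
One way this could look, as a sketch only: read the raw lines, strip the offending character, then parse the cleaned text. Here the offending character is assumed to be the double quote, and the sample data is made up:

```r
# Sample file with a stray double quote in one field (invented data).
path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,Plain title",
             "2,5\" floppy disks",
             "3,Another title"), path)

# Read the raw text, drop every double quote, then parse the cleaned lines.
raw_lines   <- readLines(path)
clean_lines <- gsub("\"", "", raw_lines, fixed = TRUE)
cleaned     <- read.csv(text = clean_lines, stringsAsFactors = FALSE)

print(cleaned)
unlink(path)
```

Removing every quote is only safe when no field relies on quoting to protect an embedded comma; for other characters, adjust the pattern passed to gsub().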

Alternatively, install the {data.table} package and read the file with fread(). It is much faster and typically avoids this EOF warning. Note that the result is a data.table object (a class that extends data.frame) rather than a plain data.frame. The data.table class has many good features, but you can convert the result with as.data.frame() if needed.
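
A short sketch of the fread() route, assuming data.table is installed (the code is skipped otherwise). The sample file is invented, and quote="" is passed here as an extra precaution rather than something the answer above requires:

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # Sample file with a stray double quote in one field (invented data).
  path <- tempfile(fileext = ".csv")
  writeLines(c("id,title",
               "1,Plain title",
               "2,5\" floppy disks",
               "3,Another title"), path)

  # fread() returns a data.table; quote = "" disables quote handling.
  dt <- data.table::fread(path, quote = "")

  # Convert to a plain data.frame if downstream code expects one.
  df <- as.data.frame(dt)

  print(class(dt))
  print(class(df))
  unlink(path)
}
```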


Solution:

STEP 1: download and unzip the file

# download the file
site <- "http://www.informatics.jax.org/downloads/mgigff"
file <- "MGI.20170803.gff3.gz"
url <- paste0(site, "/", file)
if(!file.exists(file)) download.file(url, file)

# unzip to a temporary file
file <- sub("\\.gz$", "", file)  # escape the dot so only a literal ".gz" suffix is removed
tmpfile <- tempfile()
remove_tmpfile <- FALSE
if(!file.exists(file)) { # need to unzip
    system(paste0("gunzip -c ", file, ".gz > ", tmpfile))
    remove_tmpfile <- TRUE
    file <- tmpfile
}
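
As an aside, base R can read gzip-compressed text through a gzfile() connection, so the external gunzip call is optional. A minimal sketch with a tiny stand-in file (the GFF3 rows below are invented, not taken from the MGI download):

```r
# Create a small gzip-compressed, tab-separated file to stand in for the download.
gz_path <- tempfile(fileext = ".gff3.gz")
con <- gzfile(gz_path, "wt")
writeLines(c("##gff-version 3",
             "chr1\tMGI\tgene\t100\t200\t.\t+\t.\tID=gene1",
             "chr1\tMGI\tgene\t300\t400\t.\t-\t.\tID=gene2"), con)
close(con)

# gzfile() decompresses on the fly; read.table() opens and closes the connection.
tab_gz <- read.table(gzfile(gz_path), sep = "\t", header = FALSE,
                     comment.char = "#", na.strings = ".",
                     quote = "", stringsAsFactors = FALSE)

print(dim(tab_gz))
unlink(gz_path)
```

This avoids depending on a gunzip binary being on the PATH, which matters on Windows.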

 

STEP 2: read it into R with read.table()

tab <- read.table(file, sep="\t", header=FALSE, comment.char="#",
                  na.strings=".", stringsAsFactors=FALSE)

This gives a warning message:

Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

read.delim() vs read.table()

read.delim() is a thin wrapper around read.table() with different defaults:

> read.delim
function (file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
    dec = dec, fill = fill, comment.char = comment.char, ...)

Adding quote="" (and fill=FALSE) to the read.table() call disables quoting:

tab <- read.table(file, sep="\t", header=FALSE, comment.char="#",
                  na.strings=".", stringsAsFactors=FALSE,
                  quote="", fill=FALSE)
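
To compare the two functions' defaults without reading the source, formals() can be used. A small sketch (the helper name defaults is made up):

```r
# formals() returns a function's arguments together with their default values.
defaults <- function(f) formals(f)[c("sep", "quote", "fill", "comment.char")]

str(defaults(read.table))
str(defaults(read.delim))

# read.table() treats both " and ' as quote characters by default,
# so an apostrophe in free text can also open an unterminated string.
print(formals(read.table)$quote)
print(formals(read.delim)$quote)
```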

The same fix applies to a CSV file: you need to disable quoting.

cit <- read.csv("citations.CSV", quote = "", 
                 row.names = NULL, 
                 stringsAsFactors = FALSE)

str(cit)
## 'data.frame':    112543 obs. of  13 variables:
##  $ row.names    : chr  "10.2307/675394" "10.2307/30007362" "10.2307/4254931" "10.2307/20537934" ...
##  $ id           : chr  "10.2307/675394\t" "10.2307/30007362\t" "10.2307/4254931\t" "10.2307/20537934\t" ...
##  $ doi          : chr  "Archaeological Inference and Inductive Confirmation\t" "Sound and Sense in Cath Almaine\t" "Oak Galls Preserved by the Eruption of Mount Vesuvius in A.D. 79_ and Their Probable Use\t" "The Arts Four Thousand Years Ago\t" ...
##  $ title        : chr  "Bruce D. Smith\t" "Tomás Ó Cathasaigh\t" "Hiram G. Larew\t" "\t" ...
##  $ author       : chr  "American Anthropologist\t" "Ériu\t" "Economic Botany\t" "The Illustrated Magazine of Art\t" ...
##  $ journaltitle : chr  "79\t" "54\t" "41\t" "1\t" ...
##  $ volume       : chr  "3\t" "\t" "1\t" "3\t" ...
##  $ issue        : chr  "1977-09-01T00:00:00Z\t" "2004-01-01T00:00:00Z\t" "1987-01-01T00:00:00Z\t" "1853-01-01T00:00:00Z\t" ...
##  $ pubdate      : chr  "pp. 598-617\t" "pp. 41-47\t" "pp. 33-40\t" "pp. 171-172\t" ...
##  $ pagerange    : chr  "American Anthropological Association\tWiley\t" "Royal Irish Academy\t" "New York Botanical Garden Press\tSpringer\t" "\t" ...
##  $ publisher    : chr  "fla\t" "fla\t" "fla\t" "fla\t" ...
##  $ type         : logi  NA NA NA NA NA NA ...
##  $ reviewed.work: logi  NA NA NA NA NA NA ...
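
The values in the output above still end in a tab character. One possible cleanup, sketched on a toy data frame (cit_demo and its contents are invented; the real citations file is not reproduced here):

```r
# Toy data frame mimicking the trailing-tab values shown above.
cit_demo <- data.frame(
  id    = c("10.2307/1\t", "10.2307/2\t"),
  title = c("First Title\t", "Second\tTitle\t"),
  n     = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Strip trailing tabs from character columns only; other columns are untouched,
# and tabs in the middle of a value are kept.
is_chr <- vapply(cit_demo, is.character, logical(1))
cit_demo[is_chr] <- lapply(cit_demo[is_chr], function(x) sub("\t+$", "", x))

str(cit_demo)
```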

I think this is because of lines like the following (note "Thorn" and "Minus"):

readLines("citations.CSV")[82]
[1] "10.2307/3642839,10.2307/3642839\t,\"Thorn\" and \"Minus\" in Hieroglyphic Luvian Orthography\t,H. Craig Melchert\t,Anatolian Studies\t,38\t,\t,1988-01-01T00:00:00Z\t,pp. 29-42\t,British Institute at Ankara\t,fla\t,\t,"
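
To find such lines in a larger file, counting quote characters per line is a quick check. A sketch on an invented sample file:

```r
# Sample file: one line with balanced quotes, one with a single stray quote.
path <- tempfile(fileext = ".csv")
writeLines(c("id,title",
             "1,Plain title",
             "2,\"Thorn\" and \"Minus\"",
             "3,5\" floppy disks"), path)

lines <- readLines(path)

# Count the double quotes on each line.
n_quotes <- lengths(regmatches(lines, gregexpr("\"", lines, fixed = TRUE)))

has_quote  <- which(n_quotes > 0)        # lines containing any double quote
unbalanced <- which(n_quotes %% 2 == 1)  # lines with an odd number of quotes

print(lines[has_quote])
print(lines[unbalanced])
unlink(path)
```

Lines with an odd count are the most likely to leave a quoted string open until the end of the file.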

As pointed out above and in the R help, you can disable quoting altogether by simply adding:

    quote = "" 

I also ran into this problem and was able to work around a similar EOF error by calling read.table() with an explicit separator:

read.table("....csv", sep=",", ...)

The readr package can also avoid this issue:

install.packages('readr')
library(readr)
readr::read_csv('yourfile.csv')

 
