CSV.jl fails to parse a file that DuckDB is fine with #1143

Open
asinghvi17 opened this issue Sep 30, 2024 · 1 comment

@asinghvi17

MWE:

import CSV, QuackIO
using DataFrames

file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv")

# try QuackIO first
dataset = QuackIO.read_csv(DataFrame, file) # works

# now try CSV
CSV.read(file, DataFrame) # errors

The error:

ERROR: TaskFailedException

    nested task error: thread = 7 fatal error, encountered an invalidly quoted field while parsing around row = 4573, col = 12: ""03.10.2018: 2330 UTC: Posn: 38:49.2N – 118:14.5E, Tianjin Anchorage, China.
    ", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself
    
    Stacktrace:
     [1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:590
     [2] parsevalue!(::Type{…}, buf::Vector{…}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:798
     [3] parserow
       @ ~/.julia/packages/CSV/cwX2w/src/file.jl:640 [inlined]
     [4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{…}, ::Type{…})
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:550
     [5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{…}, rowchunkguess::Int64, i::Int64, rows::Vector{…}, wholecolumnslock::ReentrantLock)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:360
     [6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
       @ CSV ~/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:455
 [2] macro expansion
   @ ./task.jl:487 [inlined]
 [3] CSV.File(ctx::CSV.Context, chunking::Bool)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:240
 [4] File
   @ ~/.julia/packages/CSV/cwX2w/src/file.jl:227 [inlined]
 [5] #File#32
   @ ~/.julia/packages/CSV/cwX2w/src/file.jl:223 [inlined]
 [6] CSV.File(source::String)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:162
 [7] read(source::String, sink::Type; copycols::Bool, kwargs::@Kwargs{})
   @ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:117
 [8] read(source::String, sink::Type)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:113
 [9] top-level scope
   @ REPL[223]:1
Some type information was truncated. Use `show(err)` to see complete types.

I tried tracking down the error, but that area of the file looked fine, both at the reported row and when searching for the quoted text.

@AmeroIL

AmeroIL commented Oct 20, 2024

Hi @asinghvi17

I ran the code above and got no errors; the final output was the DataFrame itself.
Perhaps the issue is related to your Julia installation?
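
One way to narrow down why the two runs differ: the stack trace above goes through `multithreadparse`, so it may be worth comparing the thread count of each session and retrying with a single parsing task. The sketch below is only a diagnostic under that assumption; it does not establish the cause. `read_single_task` is a hypothetical helper name, while `ntasks` is a documented `CSV.read` / `CSV.File` keyword.

```julia
import CSV
using DataFrames

# Hypothetical helper (not part of CSV.jl): parse `path` with a single task,
# so CSV.jl does not split the file into chunks for multithreaded parsing.
read_single_task(path) = CSV.read(path, DataFrame; ntasks=1)

# Report how many threads this Julia session has. Sessions started with
# different thread counts may take different parsing paths for the same file.
println("Julia threads: ", Threads.nthreads())
```

With `file` from the MWE, `read_single_task(file)` can be compared against the plain `CSV.read(file, DataFrame)` call. If only the multithreaded call fails, that points toward how the file is chunked rather than toward the Julia installation.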
