Invalid argument error when saving a loaded dataframe #170

Closed
Datseris opened this issue Feb 13, 2018 · 13 comments

Comments

@Datseris

(I am sure that this problem comes just because I am a noob but I don't really know what to do with it so please bear with me...)

    ws = [10.0, 20.0, 30.0]
    pottype = "linear"

    simulations = CSV.read("pnjunction_simulations.tsv"; delim='\t') # returns a 54x8 DataFrame

    for w in ws
        parameters = Dict(
            :E=>E, :leng=>leng, :wid=>wid, :w=>w, :B=>B,
            :Dk=>Dk, :V0=>V0, :pottype=>pottype)

        dataframe = DataFrame(; parameters...)
        simulations = vcat(simulations, dataframe)
    end

    CSV.write("pnjunction_simulations.tsv", simulations; delim='\t')

The last line gives me error:
SystemError: opening file pnjunction_simulations.tsv: Invalid argument

(P.S.: If there is a more "standard" way to add rows to a dataframe besides the vcat I am doing, you are welcome to let me know.)

@Datseris
Author

Weird: changing the file extension to .csv worked, even when using \t as a delimiter.

@nalimilan
Member

Sounds really weird. What happens if you try with different file names, different extensions, different paths?

@Datseris
Author

I won't have time to debug it, sorry :(

It worked after I changed the file extension, so I just went with it.

@Datseris
Author

Hi,

the problem is even worse actually.

  1. Why was this issue closed? The issue was not a mistake. It clearly happened to me, I wasn't lying.
  2. It now happened again, with .csv ending instead.

Could it be that somehow CSV.read "occupies the file" so it can't be saved at the same name?

I've noticed that it doesn't matter what file type I have. If I do:

data = CSV.read("somefile.ext")

and then modify data and try to do

CSV.write("somefile.ext", data)

it gives me the error

SystemError: opening file pnjunction_simulations.tsv: Invalid argument
in  at base\<missing>
in #write#57 at CSV\src\Sink.jl:139
in close! at CSV\src\Sink.jl:71
in open at base\iostream.jl:132
in open at base\iostream.jl:104
in systemerror at base\error.jl:64
in #systemerror#44 at base\error.jl:64 

I am on Windows 10, Julia 0.6.2. All packages on their latest stable version.

The file I read is inside a syncing folder like Google Drive or ownCloud. I cannot imagine how that would be related, however.

@nalimilan
Member

I closed the issue because you said you didn't have time to investigate it, and since we had no way to reproduce it...

It looks like the file isn't properly closed by CSV.read, which prevents further modifications. Indeed the function does not call close on the file. @quinnj Is this intentional?

As a workaround, you can open the file manually using open, pass the resulting stream to CSV.read, and close it manually.

nalimilan reopened this Feb 13, 2018

@quinnj
Member

quinnj commented Feb 13, 2018

This is a Windows-only issue; you can do CSV.read(file; use_mmap=false) and should be able to write to the same filename afterwards just fine. Alternatively, I think you can also do df = CSV.read(file); gc(); gc(); CSV.write(file, df) and that should work too.

@Datseris
Author

Datseris commented Feb 13, 2018

You are right, using

    f = open(datadir*"/pnjunction_simulations.csv")
    simulations = CSV.read(f; delim='\t')
    close(f)

instead worked.

Is there any chance you can "fix" this, in the sense that I actually won't have to use the suggestions of quinnj or the fix by nalimilan?

From a user's perspective, any function that has "read" in its name should leave the file alone after reading it. Maybe modify CSV.read to not use use_mmap by default on Windows?

@nalimilan
Member

@quinnj Is there any reason not to call close from inside CSV.read?

@quinnj
Member

quinnj commented Feb 13, 2018

I'm not sure what you mean. The error here is that when you Mmap.mmap(file) on Windows, it's invalid to try to write to that file until its "mapping" is completely closed (i.e. the mmapped array is finalized). This is in contrast to Unix, where writing to an mmapped file is allowed. So there's nothing to close. I guess maybe we could try to call finalize on the array at the end of CSV.read? But then we might run into corruption issues if people are counting on accessing that byte array or anything.
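
To make the point about finalizing the mapping concrete, here is a minimal sketch that uses only Julia's Mmap standard library, with no CSV.jl involved. It assumes Julia 1.x rather than the 0.6 used in this thread, and the temporary file and its contents are made up for illustration. On Unix the final write would succeed even without the finalize call; that it is required on Windows is the claim made in the comment above, not something this sketch proves.

```julia
using Mmap

# Hypothetical scratch file standing in for the TSV from the report.
path = tempname()
write(path, "a\tb\n1\t2\n")

# Map the file; `bytes` is a Vector{UInt8} backed by the file on disk.
bytes = Mmap.mmap(path)

# Copy the data out so that nothing we keep refers to the mapping.
text = String(copy(bytes))

# Run the array's finalizer now, which releases the mapping.
# `bytes` must not be accessed after this point.
finalize(bytes)
bytes = nothing

# With the mapping released, overwrite the same file.
write(path, text * "3\t4\n")
```

The copy is what avoids the corruption concern raised above: anything that still needs the data holds its own bytes, not a view into the released mapping.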

@nalimilan
Member

Ah, I forgot that the columns are mmapped. I think by default we should do something safe and convenient, so if that means not using mmap on Windows or copying the array once it's been parsed, let's do that. Parsing a CSV file shouldn't require so much thought, or we're going to look bad compared with other languages.
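
A "safe by default" policy of this kind could look roughly like the sketch below. The helper name load_bytes and its use_mmap keyword are hypothetical and are not part of CSV.jl; the sketch assumes Julia 1.x and only shows the idea of choosing a plain in-memory read on Windows and memory mapping elsewhere.

```julia
using Mmap

# Hypothetical helper (not a CSV.jl function): return the file contents as bytes,
# avoiding a memory mapping on Windows by default.
function load_bytes(path::AbstractString; use_mmap::Bool = !Sys.iswindows())
    if use_mmap
        # Zero-copy: the returned array is backed by the file, which stays mapped
        # until the array is finalized.
        return Mmap.mmap(path)
    else
        # Plain copy into memory: the file is opened, read, and closed before returning.
        return read(path)
    end
end

# Usage: with use_mmap=false the file can be rewritten while `data` is still alive.
path = tempname()
write(path, "x\ty\n")
data = load_bytes(path; use_mmap=false)
write(path, "overwritten\n")
```

The trade-off is memory: the copying branch holds the whole file in RAM, which is the price of not keeping the file mapped.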

@rasmushenningsson

+1 for making sure this doesn't require a workaround on Windows 10.
It is a very common use case - open CSV, modify, save. And it is difficult to figure out the solution from the error message.

@quinnj
Member

quinnj commented Aug 29, 2018

This is fixed on master with the switch to CSV.File (CSV.read still relies on the old CSV.Source, but there are plans to switch it over soon). use_mmap is now false by default on Windows.

Note that for now, you can get a NamedTuple of Vectors on master by doing: using Tables; table = CSV.File(file; kwargs...) |> columntable

@BenSiv

BenSiv commented Jan 30, 2022

In my case, the problem was the location of the output directory. As in the case of @Datseris, it was a Google Drive directory. Once I changed the target directory, the problem was fixed.
