crash on wine after building system image #4213

Closed
papamarkou opened this issue Sep 5, 2013 · 22 comments
Labels: io (Involving the I/O subsystem: libuv, read, write, etc.); regression (Regression in behavior compared to a previous version); system:windows (Affects only Windows)

@papamarkou
Contributor

I thought it would be better to submit this as a separate issue in order to bring it to the attention of the wider community. Although a temporary fix for #3420 has been achieved and Julia compilation on Windows was working fine, I get a new error after the latest update. Compiling and linking work fine; the following error appears towards the end of make:

fixme:ntdll:NtFlushInstructionCache 0xffffffffffffffff 0xc80000 524288

abnormal program termination
*** This error is usually fixed by running 'make clean'. If the error persists, try 'make cleanall'. ***
make[1]: *** [/home/theodore/opt/julia-win64/usr/lib/julia/sys.ji] Error 1
make: *** [release] Error 2

This is apparently a Windows-specific error, since Julia updates fine on Linux.

@vtjnash
Sponsor Member

vtjnash commented Sep 6, 2013

This probably happens during the atexit callback, since the sys.ji file has already been successfully written (on my machine). I'm having trouble acquiring direct information on this because (a) it doesn't happen on the native build, (b) I haven't managed to run winedbg on julia in months, and (c) I only see it with win32, not win64, which is strange since your report appears to be for win64.

All that notwithstanding, I believe this occurs because it is illegal on Windows and Linux to finalize an IOStream object while a libuv object pointing to the same file still exists (or vice versa). Yet this can happen when we have both a libuv handle and an IOStream handle to a file. Reverting STDIO Files to return an FS.File instead of an IOStream (in init_stdio at string.jl:207) seems to resolve this crash. @JeffBezanson or @loladiro, can you evaluate the merits of ensuring we never open an IOStream for a handle under libuv's control, or vice versa? That is, can we use a call to dup() here to make a safe file handle before using it a second time?
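
To illustrate the dup() idea raised above, here is a minimal sketch and not Julia's or libuv's actual code. The helper names `borrow_fd` and `demo_independent_close` and the `fd_*` macros are made up for this example. It assumes the usual CRT spellings on Windows (`_dup`, `_close`, `_write`, `_fileno` from `<io.h>`) and the POSIX ones elsewhere. Each owner gets its own descriptor, so finalizing one does not invalidate the other:

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <io.h>
#define fd_of    _fileno
#define fd_dup   _dup
#define fd_close _close
#define fd_write _write
#else
#include <unistd.h>
#define fd_of    fileno
#define fd_dup   dup
#define fd_close close
#define fd_write write
#endif

/* Hypothetical helper: hand a second owner its own descriptor so each owner
 * can close independently. Returns the new descriptor, or -1 on failure. */
int borrow_fd(int fd)
{
    assert(fd >= 0);
    return fd_dup(fd);
}

/* Two owners share one temporary file. The second owner closes its copy;
 * the first owner's descriptor must still accept a write.
 * Returns 0 on success, -1 on any failure. */
int demo_independent_close(void)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    int original = fd_of(tmp);
    int copy = borrow_fd(original);
    int ok = copy >= 0
          && fd_close(copy) == 0               /* second owner is done */
          && fd_write(original, "x", 1) == 1;  /* first owner unaffected */
    fclose(tmp);
    return ok ? 0 : -1;
}
```

The duplicate refers to the same open file, so both owners still share the file position; only the lifetime of each descriptor becomes independent.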

@papamarkou
Contributor Author

Yes @vtjnash, I can confirm that in my case this happened with win64. Thanks for trying to sort it out.

@ghost ghost assigned JeffBezanson Sep 7, 2013
@papamarkou
Contributor Author

P.S. I wanted also to confirm that this doesn't happen on the native build (I just built Julia on Windows natively without problems), so it appears only when cross-compiling.

@JeffBezanson
Sponsor Member

The IOStream created in init_stdio is set up not to call close on the descriptor, so I'm not sure what's happening.

@JeffBezanson
Sponsor Member

@vtjnash is there something other than calling close that is also problematic?

@vtjnash
Sponsor Member

vtjnash commented Sep 10, 2013

what else is IOStream potentially also calling?

make seems to pass a rather deranged file descriptor, so various normal operations on it sometimes behave strangely.

@vtjnash
Sponsor Member

vtjnash commented Sep 10, 2013

would IOStream care if fd == -1?

@JeffBezanson
Sponsor Member

I believe IOStream works with fd==-1, and will simply avoid doing any operations on the fd. It doesn't call anything stranger than read, write, and close. It might also call lseek and ftruncate.

@vtjnash
Sponsor Member

vtjnash commented Sep 11, 2013

I see you use lseek. This function exists on Windows, but the name has been deprecated for a very long time (documentation for it doesn't appear to exist). It should be using the ISO-conformant name _lseek.

@vtjnash
Sponsor Member

vtjnash commented Sep 11, 2013

I think ftruncate has already been replaced by _chsize

@JeffBezanson
Sponsor Member

Ok that sounds like a reasonable change.

@vtjnash
Sponsor Member

vtjnash commented Sep 11, 2013

actually, make that lseek64 (posix name) or _lseeki64 (windows name). they should point to the same function, but this will be necessary for supporting win64.
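
A minimal sketch of the 64-bit seek suggested above, and not the change that actually went into ios.c. The names `seek64` and `demo_seek_past_4gib` are made up for this example. On Windows it calls `_lseeki64` from `<io.h>`. On POSIX it assumes plain `lseek` with `_FILE_OFFSET_BITS=64` rather than `lseek64`, which is a substitution on my part:

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   /* makes off_t 64-bit on 32-bit POSIX builds */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#ifdef _WIN32
#include <io.h>
#define fd_of _fileno
#else
#include <sys/types.h>
#include <unistd.h>
#define fd_of fileno
#endif

/* Hypothetical wrapper: seek with a 64-bit offset on either platform.
 * Returns the new position, or -1 on error. */
int64_t seek64(int fd, int64_t offset, int whence)
{
    assert(fd >= 0);
#ifdef _WIN32
    return (int64_t)_lseeki64(fd, offset, whence);
#else
    return (int64_t)lseek(fd, (off_t)offset, whence);
#endif
}

/* Seeks 5 GiB into an empty temporary file, an offset that does not fit in
 * 32 bits, and returns the position reported by seek64. */
int64_t demo_seek_past_4gib(void)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    int64_t pos = seek64(fd_of(tmp), (int64_t)5 << 30, SEEK_SET);
    fclose(tmp);
    return pos;
}
```

With a 32-bit offset type the 5 GiB position would be truncated, which is why the wider variant matters for large files.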

@papamarkou
Contributor Author

I tried to cross-compile after your recent fixes and it's not there yet - it still gives the same error.

@vtjnash
Sponsor Member

vtjnash commented Sep 12, 2013

I know, I was making other corrections, not fixing this.

@vtjnash
Sponsor Member

vtjnash commented Sep 16, 2013

i see this on windows now -- make used to swallow it

@vtjnash
Sponsor Member

vtjnash commented Sep 18, 2013

here's the partial backtrace:

#0  0x000007fefe0d12bf in msvcrt!memcmp () from C:\Windows\system32\msvcrt.dll
#1  0x0000000069ef0312 in ios_write (s=0xc5edc0, data=0xc5ecf7 "\003", n=1) at ios.c:387
#2  0x0000000069ef1afc in ios_putc (c=3, s=0xc5edc0) at ios.c:892
#3  0x0000000069ec609e in jl_serialize_value_ ()
   from C:\users\julia\desktop\julia\usr\bin\libjulia.dll
#4  0x0000000000c5edc0 in ?? ()
#1  0x0000000069ef0312 in ios_write (s=0xc5edc0, data=0xc5ecf7 "\003", n=1) at ios.c:387
387             memcpy(s->buf + s->bpos, data, n);
(gdb) print *s
$7 = {buf = 0x179bc960 "pD-", bm = 12971720, errcode = 0, state = bst_wr, maxsize = 12971720,
  size = 12971632, bpos = 1776879983, ndirty = 12971760, fpos = 8791765356744,
  lineno = 1179665, fd = -1, readonly = 0 '\000', ownbuf = 0 '\000', ownfd = 0 '\000',
  _eof = 0 '\000', rereadable = 0 '\000', userdata = 520893120,
  local = "°?\020\037\000\000\000\000\003\000\000\000\000\000\000\000"îÅ\000\000\000\000\000\030îÅ\000\000\000\000\000`â+\000\000\000\000\000°3\020\037\000\000\000\000\000      éi\000"}

@vtjnash
Sponsor Member

vtjnash commented Sep 18, 2013

more complete backtrace:

(gdb) up
#1  0x0000000065e39c32 in ios_write (s=0xe2e5a0, data=0xe2e3f7 "\003", n=1) at ios.c:387
387             memcpy(s->buf + s->bpos, data, n);
(gdb)
#2  0x0000000065e3b41c in ios_putc (c=3, s=0xe2e5a0) at ios.c:892
892         return (int)ios_write(s, &ch, 1);
(gdb)
#3  0x0000000065e0c838 in writetag (s=0xe2e5a0, v=0x37afd0) at dump.c:79
79          write_uint8(s, (uint8_t)(ptrint_t)ptrhash_get(&ser_tag, v));
(gdb)
#4  0x0000000065e0dbaf in jl_serialize_value_ (s=0xe2e5a0, v=0x399ff0) at dump.c:372
372                 writetag(s, (jl_value_t*)jl_datatype_type);
(gdb)
#5  0x0000000065e0f2dc in jl_save_system_image (
    fname=0x1ffdcf80 "C:\\users\\julia\\desktop\\julia\\usr\\bin/../lib/julia/sys.ji")
    at dump.c:768
768         jl_serialize_value(&f, jl_array_type->env);
(gdb)
#6  0x0000000002b0ffc9 in ?? ()
(gdb)

@JeffBezanson can you offer any help?

@JeffBezanson
Sponsor Member

The ios_t object here is clearly invalid. It's supposed to be a file, but fd is -1, so my guess is that ios_file returned NULL in jl_save_system_image (which we should check for). This would mean the call to open failed.
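
The missing check described above can be sketched as follows. This is a stand-in and not the code in dump.c: it uses stdio's `fopen` in place of `ios_file`, and the function name `save_image` is made up for this example. The point is only that a failed open is reported before anything is written through the stream:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a save routine; stdio replaces ios_t here.
 * Returns 0 on success, -1 if the file could not be opened or written. */
int save_image(const char *fname, const void *data, size_t n)
{
    assert(fname != NULL && data != NULL);
    FILE *f = fopen(fname, "wb");
    if (f == NULL) {
        perror(fname);   /* report the failed open instead of writing */
        return -1;
    }
    size_t written = fwrite(data, 1, n, f);
    if (fclose(f) != 0 || written != n)
        return -1;
    return 0;
}
```

Called with a path whose directory does not exist, this returns -1 and prints the reason, instead of crashing later inside the write path as in the backtrace.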

@vtjnash
Sponsor Member

vtjnash commented Sep 18, 2013

cool. now i can stop tracking down that unrelated issue (forgetting to set JL_PRIVATE_LIBDIR) and get back to the real one (TTY throws a method error when exiting) :/

Base.MethodError(f=_uv_hook_writecb, args=(Base.TTY, 0x0ee8f2c8, 0))

@vtjnash
Sponsor Member

vtjnash commented Sep 18, 2013

well, the good news is that this is a bug in wine's emulation of windows. the bad news is that it is libuv's fault for hitting this bug (it closes the TTY handle for stdout, then thinks it can still use/close it for stderr)

@loladiro can you help me edit libuv to duplicate all handles it gets from outside sources which it will not fully own (e.g. after it calls handle=_get_osfhandle(fd) on something)?

@vtjnash
Sponsor Member

vtjnash commented Sep 19, 2013

moved to 0.3, since this doesn't seem like it should be release-blocking

@vtjnash
Sponsor Member

vtjnash commented Nov 29, 2013

apparently this was nothing more than a foolish double-close
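
For readers unfamiliar with the failure mode, here is a POSIX-only sketch of a double close and one common guard against it. It is not the fix that was applied in libuv or Julia. The `owned_fd` type and all function names are made up for this example:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper that remembers whether its descriptor is still open. */
typedef struct {
    int fd;   /* -1 once closed */
} owned_fd;

/* Closes the descriptor at most once; later calls are no-ops returning 0. */
int owned_close(owned_fd *h)
{
    assert(h != NULL);
    if (h->fd < 0)
        return 0;
    int rc = close(h->fd);
    h->fd = -1;
    return rc;
}

/* Closes a raw descriptor twice and returns the errno of the second call,
 * or 0 if the second call unexpectedly succeeded. */
int raw_double_close_errno(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    close(p[0]);
    close(p[1]);
    errno = 0;
    return close(p[1]) == -1 ? errno : 0;
}

/* Same sequence through the wrapper: both calls report success. */
int guarded_double_close(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    close(p[0]);
    owned_fd h = { p[1] };
    int first = owned_close(&h);
    int second = owned_close(&h);
    return (first == 0 && second == 0 && h.fd == -1) ? 0 : -1;
}
```

In a single-threaded POSIX program the second raw close fails with EBADF. In a multithreaded one the descriptor number may already have been reused, so the stray close can hit an unrelated file, which is what makes this class of bug hard to track down.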
