Failed to open file with Cyrillic filename [msys2/clang64] #2229

Closed
eddiezato opened this issue May 1, 2022 · 19 comments
Labels
bug i18n Internationalisation windows Windows Specific issues

Comments

@eddiezato

eddiezato commented May 1, 2022

Describe the bug
$ exiv2 "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg: Failed to open the file
To Reproduce

Build the latest version of exiv2 from main in the msys2/clang64 environment. My build script:

export CC=clang CXX=clang++
git clone --depth 1 https://github.com/Exiv2/exiv2.git
cd exiv2
cmake -B build -G Ninja -S ./ \
    -DBUILD_SHARED_LIBS=OFF \
    -DEXIV2_ENABLE_BMFF=ON \
    -DEXIV2_BUILD_SAMPLES=OFF \
    -DCMAKE_C_FLAGS='-ffunction-sections -fdata-sections -march=native -mtune=native -O3 -pipe' \
    -DCMAKE_CXX_FLAGS='-ffunction-sections -fdata-sections -march=native -mtune=native -O3 -pipe' \
    -DCMAKE_EXE_LINKER_FLAGS='-Wl,--gc-sections -static -liconv -lexpat -lz'
ninja -C build
Expected behavior

The program must be able to open a file with a Cyrillic filename.

$ exiv2 "Darth-Vader-SW-2304085.jpg"
File name       : Darth-Vader-SW-2304085.jpg
File size       : 324653 Bytes
MIME type       : image/jpeg
Image size      : 1920 x 1080
Darth-Vader-SW-2304085.jpg: No Exif data found in the file
Desktop (please complete the following information):
  • OS and version: Windows 11
  • Compiler and version: Clang 14.0.0
  • Compilation mode and/or compiler flags: see in "To Reproduce"
Additional context
@piponazo
Collaborator

piponazo commented May 2, 2022

Hi @eddiezato. I just tried to copy & paste the filename you gave as an example (Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg) and I could open the file without problems. I tried the following terminals on Windows 11:

  • normal CMD
  • bash terminal
  • MSYS UCRT64

I worked on solving this issue a few months ago. You can see the work done in this PR: #2090.

You have mentioned that you are using msys2/clang64 and this makes me think that you probably need to use the UCRT64 MSYS2 environment to handle filepaths with UTF-8 characters properly.

Please let us know if you manage to handle your file correctly after the feedback provided.

@eddiezato
Author

@piponazo thanks for your reply.

You can see the work done in this PR: #2090.

Yeah, I've read this and a couple of other topics. I use msys2/clang64 to avoid mixing files built by different compilers since I build everything with clang anyway, and this environment also uses the ucrt library. I also build other apps in the same way, such as flac, libjxl, libwebp, mozjpeg, etc. And they don't have this problem. Only exiv2 and also qimgv.

I've tried to open the file, without success, on Windows 11 inside:

  • cmd
  • pwsh
  • msys2 clang64
  • msys2 ucrt64
  • and all of the above inside wt (windows terminal)

@eddiezato
Author

eddiezato commented May 2, 2022

Building v0.27.5 with EXIV2_ENABLE_WIN_UNICODE=ON gives this:

msys2/clang64 terminal

$ exiv2/build/bin/exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg: Failed to open the file

cmd or pwsh

D:\msys2\home\user\exiv2\build\bin>exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
File name       : Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg
File size       : 324653 Bytes
MIME type       : image/jpeg
Image size      : 1920 x 1080
Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg: No Exif data found in the file

The file has no Exif data, but at least it's loaded.

@piponazo
Collaborator

piponazo commented May 2, 2022

Alright, let's try to hunt this issue together then 😇 .

Just as a piece of proof that I was not lying before, here is a screenshot showing that I can open a JPEG file which I renamed on my system to the filename you provided:
image

At the moment, I think we are not using clang on Windows in CI, and I suspect this might be one possible reason for this problem. Up to now I have always compiled on Windows using either Visual Studio or the GCC compiler provided with MSYS2 or Cygwin. I can try to generate a clang build and check if I have the problem you are describing. Did you try to use a Windows version generated with Visual Studio? I guess you could directly try with the nightly release available here:
https://github.com/Exiv2/exiv2/releases/tag/nightly

By the way, in your last comment you mentioned v0.27.5. Please note that #2090 was only merged into the main branch. But I guess you just mentioned that for comparison purposes 😉

@kmilos
Collaborator

kmilos commented May 2, 2022

Did you build main inside MINGW64 environment/terminal (no unicode support) or UCRT64/CLANG64 for sure? Just setting CC=clang from MINGW64 will not work...

I have the CLANG64 environment set up here and will try to reproduce shortly...

@eddiezato
Author

Previously I was building with clang in the default msys2/mingw64 environment without any problems. But a couple of days ago I just asked myself why use the dependencies created with gcc when I have the clang environment for me. Just for my inner perfectionist. 😜

Did you try to use a windows version generated with Visual Studio?

It works fine:

D:\Downloads\exiv2-1.0.0.9-2019msvc64\bin>exiv2.exe "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
File name       : Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
File size       : 324653 Bytes
MIME type       : image/jpeg
Image size      : 1920 x 1080
Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg: No Exif data found in the file

Please note that #2090 was only merged into the main branch. But I guess you just mentioned that for comparison purposes

Yep. 😉

@kmilos

Did you build main inside MINGW64 environment/terminal (no unicode support) or UCRT64/CLANG64 for sure?

Yeah, that's for sure:

2.9G    ./clang64
1.4G    ./home
0       ./ucrt64
0       ./mingw64
0       ./mingw32
0       ./clangarm64
0       ./clang32

@kmilos
Collaborator

kmilos commented May 2, 2022

Indeed, it (main 19dc566) doesn't work for me either currently:

image

Same w/ UCRT64 build 😕

I haven't tried setting the system locale to UTF-8 yet.

@piponazo
Collaborator

piponazo commented May 2, 2022

I'll try to dig into this ASAP, but lately I am pretty busy with work and holidays. It will probably take weeks before I can make some progress on this topic.

@kmilos
Collaborator

kmilos commented May 2, 2022

Interestingly, UCRT64 does work from the default mintty+bash (i.e. what is used in CI):

image

but doesn't from CLANG64 although it supposedly also links to ucrt...

image

This will take time to figure out, could be missing flags to Clang on our part, could also be due to some MSYS2 CLANG64 libc++ configuration issue...

@kmilos
Collaborator

kmilos commented May 2, 2022

@eddiezato Can you try removing -municode from app/CMakeLists.txt please?

@eddiezato
Author

removing -municode from app/CMakeLists.txt

cmd

D:\msys2\home\user\exiv2\build\bin>exiv2.exe "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
Exiv2 exception in print action for file Darth-Vader-SW-:
Darth-Vader-SW-

msys2/clang64

user@host CLANG64 ~
$ exiv2/build/bin/exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
Darth-Vader-SW-: Failed to open the file

@kmilos
Collaborator

kmilos commented May 2, 2022

Thanks. How about -DUNICODE -D_UNICODE instead of -municode?

@eddiezato
Author

-DUNICODE -D_UNICODE instead of -municode

Same result.

@kmilos
Collaborator

kmilos commented May 2, 2022

Thanks for checking, the search continues...

I also tried removing -municode and removing linking in this app/wmain.c hack, still a no-go...

@kmilos kmilos added the i18n Internationalisation label May 3, 2022
@eddiezato
Author

I'm not an expert in C++. I just poked around in the code. 😜

So I found this:
fs::exists(path) - can't find Unicode path,
fs::exists(fs::u8path(path)) - can.

@piponazo
Collaborator

piponazo commented May 6, 2022

This might actually be the answer to the issue, so even though you might not be a C++ expert, you did a good investigation! 😁

When I have some spare time, I would like to:
1- First reproduce the issue on my own.
2- Set up a new CI job to reproduce the issue in the cloud (if possible).
3- Apply a fix (possibly using fs::u8path).

@eddiezato
Author

eddiezato commented May 6, 2022

I made a simple program for testing:

#include <clocale>
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;
using namespace std;

template <typename T> string f(T in) {
    if (fs::exists(in))
        return "exists          ";
    else
        return "doesn't exists  ";
}

int main() {
    setlocale(LC_CTYPE, ".utf8");
    
    string ss = "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg";
    fs::path ps = ss;
    fs::path pu = fs::u8path(ss);
    
    wstring ws = L"Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg";
    fs::path pw = ws;

    cout << "string              " << f(ss) << ss << endl;
    cout << "path from string    " << f(ps) << ps << endl;
    cout << "u8path from string  " << f(pu) << pu << endl << endl;

    cout << "wstring             " << f(ws); wcout << ws; cout << endl;
    cout << "path from wstring   " << f(pw) << pw << endl;
}

Then compiled it in clang64 and ucrt64 environments:

user@host CLANG64 ~/123
$ clang++ -Wall -std=c++17 123.cpp -o 123clang

user@host UCRT64 ~/123
$ clang++ -Wall -std=c++17 123.cpp -o 123ucrt

Output:

D:\msys2\home\user\123>123clang
string              doesn't exists  Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from string    doesn't exists  "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
u8path from string  exists          "Darth-Vader-SW-

wstring             exists          Darth-Vader-SW-
path from wstring   exists          "Darth-Vader-SW-

D:\msys2\home\user\123>123ucrt
string              exists          Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from string    exists          "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
u8path from string  exists          "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"

wstring             exists          Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from wstring   exists          "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"

@eddiezato
Author

I opened a discussion in the msys2/mingw repo. The problem seems to be in libc++ itself, so I don't know whether exiv2 should have any adjustments specifically for the msys2/clang64 environment.

@kmilos kmilos added the windows Windows Specific issues label May 16, 2022
@neheb
Collaborator

neheb commented Jul 20, 2023

It seems this was fixed upstream, though there is no release yet. I'll close this here since the bug is in libc++.

@neheb neheb closed this as completed Jul 20, 2023