Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #10487 - Paths containing characters other than 7-bit ASCII cause failure on windows #10619

Merged
merged 10 commits into from
Jul 25, 2024

Conversation

jmarrec
Copy link
Contributor

@jmarrec jmarrec commented Jul 23, 2024

Pull request overview

I have made a test release at https://github.com/jmarrec/EnergyPlus/releases/tag/v24.2.0-windows_non_ascii and @DaveInCaz tried it.

(namespace fs = std::filesystem)

As a reminder, Windows behaves differently than the rest of the world (Shocking, I know). Anyways, on all platforms we target, fs::path::value_type is char, except on windows where it's wchar_t.

With a char that has a 2 bytes unicode such as Б (Unicode 0411), it just plain crashes everytime you call fs::path::string().

The fix here is to avoid ever doing that. I narrow the path manually when either printing or passing it to for eg sqlite3 is needed.

Pull Request Author

Add to this list or remove from it as applicable. This is a simple templated set of guidelines.

  • Title of PR should be user-synopsis style (clearly understandable in a standalone changelog context)
  • Label the PR with at least one of: Defect, Refactoring, NewFeature, Performance, and/or DoNoPublish
  • Pull requests that impact EnergyPlus code must also include unit tests to cover enhancement or defect repair
  • Author should provide a "walkthrough" of relevant code changes using a GitHub code review comment process
  • If any diffs are expected, author must demonstrate they are justified using plots and descriptions
  • If changes fix a defect, the fix should be demonstrated in plots and descriptions
  • If any defect files are updated to a more recent version, upload new versions here or on DevSupport
  • If IDD requires transition, transition source, rules, ExpandObjects, and IDFs must be updated, and add IDDChange label
  • If structural output changes, add to output rules file and add OutputChange label
  • If adding/removing any LaTeX docs or figures, update that document's CMakeLists file dependencies

Reviewer

This will not be exhaustively relevant to every PR.

  • Perform a Code Review on GitHub
  • If branch is behind develop, merge develop and build locally to check for side effects of the merge
  • If defect, verify by running develop branch and reproducing defect, then running PR and reproducing fix
  • If feature, test running new feature, try creative ways to break it
  • CI status: all green or justified
  • Check that performance is not impacted (CI Linux results include performance check)
  • Run Unit Test(s) locally
  • Check any new function arguments for performance impacts
  • Verify IDF naming conventions and styles, memos and notes and defaults
  • If new idf included, locally check the err file and other outputs

@jmarrec jmarrec added the Defect Includes code to repair a defect in EnergyPlus label Jul 23, 2024
@jmarrec jmarrec self-assigned this Jul 23, 2024
Comment on lines 316 to 320
std::string toString(fs::path const &p);

std::string toGenericString(fs::path const &p);

fs::path appendSuffixToPath(fs::path const& outputFilePrefixFullPath, const std::string &suffix);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New helpers

Comment on lines +325 to +353
// Add a custom formatter for fmt
template <> struct fmt::formatter<fs::path>
{
// Presentation format: 's' - string, 'g' - generic_string.
char presentation = 's';

// Parses format specifications of the form ['s' | 'g'].
constexpr auto parse(format_parse_context &ctx) -> decltype(ctx.begin())
{
// Parse the presentation format and store it in the formatter:
auto it = ctx.begin(), end = ctx.end();
if (it != end && (*it == 's' || *it == 'g')) {
presentation = *it++;
}

// Check if reached the end of the range:
if (it != end && *it != '}') {
throw format_error("invalid format");
};

// Return an iterator past the end of the parsed range:
return it;
}

template <typename FormatContext> auto format(const fs::path &p, FormatContext &ctx) -> decltype(ctx.out())
{
return format_to(ctx.out(), "{}", presentation == 'g' ? EnergyPlus::FileSystem::toGenericString(p) : EnergyPlus::FileSystem::toString(p));
}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new formatter for fs::path, instead of using the std::string one.

Two presentation types are accepted: s (default), and g.

s uses toString (and is similar to fs::path::string() but with narrowing) and g uses toGenericString (and is sililar to fs::path::generic_string())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. I'm like 23% surprised you had to make your own formatter for this. So 77% not surprised because of wide char nonsense.

Comment on lines +104 to +106
#ifdef _WIN32
result.make_preferred();
#else
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in makeNativePath, avoid going to a string() if not needed (and will potentially crash)

Comment on lines +123 to +130
#ifdef _WIN32
std::wstring pathStr = path.native();
if (!pathStr.empty()) {
while ((pathStr.back() == DataStringGlobals::pathChar) || (pathStr.back() == DataStringGlobals::altpathChar)) {
pathStr.erase(pathStr.size() - 1);
}
}
#else
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same in getParentDirectoryPath...

Comment on lines +359 to +365
const std::uintmax_t file_size = fs::file_size(filePath);
std::ifstream file(filePath, mode);
if (!file.is_open()) {
throw FatalError(fmt::format("Could not open file: {}", filePath));
}
std::string result(file_size, '\0');
file.read(result.data(), file_size);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change readFile and readJSON to use ifstream (that handles fs::path) + avoid using path.string()

Comment on lines +406 to +431
std::string toString(fs::path const &p)
{
if constexpr (std::is_same_v<typename fs::path::value_type, wchar_t>) {
return CLI::narrow(p.wstring());
} else {
return p.string();
}
}

std::string toGenericString(fs::path const &p)
{
if constexpr (std::is_same_v<typename fs::path::value_type, wchar_t>) {
return CLI::narrow(p.generic_wstring());
} else {
return p.generic_string();
}
}

fs::path appendSuffixToPath(fs::path const &outputFilePrefixFullPath, const std::string &suffix)
{
if constexpr (std::is_same_v<typename fs::path::value_type, wchar_t>) {
return {outputFilePrefixFullPath.wstring() + CLI::widen(suffix)};
} else {
return {outputFilePrefixFullPath.string() + suffix};
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New helpers definitions

Comment on lines 2617 to 2623
if constexpr (debug) {
// std::cout << "errorStream=" << errorStream << ", dbName=" << dbName << std::endl;
// std::cout << "dbName.string()=" << dbName.string() << std::endl;
// std::cout << "dbName.generic_string()=" << dbName.generic_string() << std::endl;
std::wcout << "dbName.generic_wstring()=" << dbName.generic_wstring() << std::endl;
std::cout << "narrow(dbName.generic_wstring())=" << FileSystem::toGenericString(dbName) << std::endl;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to clean that up...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 48eb2eb

Comment on lines +2683 to +2690
if (fs::is_regular_file(dbName)) {
std::error_code ec;
if (!fs::remove(dbName, ec)) {
// File operation failed. SQLite connection is not in an error state.
*m_errorStream << "SQLite3 message, can't remove old database. code=" << ec.value() << ", error: " << ec.message()
<< std::endl;
ok = false;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of calling the C stdio.h remove, use the fs::remove one

@@ -2665,7 +2696,7 @@ SQLiteProcedures::SQLiteProcedures(std::shared_ptr<std::ostream> const &errorStr

if (ok) {
// Now open the output db for the duration of the simulation
rc = sqlite3_open_v2(dbName.string().c_str(), &m_connection, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, nullptr);
rc = sqlite3_open_v2(dbName_utf8.c_str(), &m_connection, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, nullptr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When opening the sqlite database, same thing: don't call string(), but our narrowing one.

@@ -490,7 +490,7 @@ namespace ScheduleManager {
ShowSevereError(state, error);
}
}
ShowContinueError(state, fmt::format("Error Occurred in {}", state.files.TempFullFilePath.filePath.string()));
ShowContinueError(state, fmt::format("Error Occurred in {}", state.files.TempFullFilePath.filePath));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the new fmt::formatterfs::path everywhere, so we don't call fs::path::string()

@jmarrec jmarrec requested a review from Myoldmopar July 23, 2024 04:59
Copy link
Member

@Myoldmopar Myoldmopar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I'll pull develop and test things out. :) :g :)

Comment on lines +325 to +353
// Add a custom formatter for fmt
template <> struct fmt::formatter<fs::path>
{
// Presentation format: 's' - string, 'g' - generic_string.
char presentation = 's';

// Parses format specifications of the form ['s' | 'g'].
constexpr auto parse(format_parse_context &ctx) -> decltype(ctx.begin())
{
// Parse the presentation format and store it in the formatter:
auto it = ctx.begin(), end = ctx.end();
if (it != end && (*it == 's' || *it == 'g')) {
presentation = *it++;
}

// Check if reached the end of the range:
if (it != end && *it != '}') {
throw format_error("invalid format");
};

// Return an iterator past the end of the parsed range:
return it;
}

template <typename FormatContext> auto format(const fs::path &p, FormatContext &ctx) -> decltype(ctx.out())
{
return format_to(ctx.out(), "{}", presentation == 'g' ? EnergyPlus::FileSystem::toGenericString(p) : EnergyPlus::FileSystem::toString(p));
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. I'm like 23% surprised you had to make your own formatter for this. So 77% not surprised because of wide char nonsense.

@Myoldmopar
Copy link
Member

Everything is totally happy locally, but I had to apply Clang Format. I'll push that up real quick and let CI confirm that, and then merge.

@Myoldmopar
Copy link
Member

CI seems all happy with it, and the only thing I did was apply Clang Format, so I'm not waiting on Decent to finish. Merging. Thanks @jmarrec

@Myoldmopar Myoldmopar merged commit f2cf440 into develop Jul 25, 2024
10 of 15 checks passed
@Myoldmopar Myoldmopar deleted the 10487_Windows_NonASCII_paths branch July 25, 2024 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Defect Includes code to repair a defect in EnergyPlus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Paths containing characters other than 7-bit ASCII cause failure
7 participants