Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crazy hack and problems with building on Windows with MSys2 #2

Closed
wyldckat opened this issue Aug 14, 2016 · 24 comments
Closed

Crazy hack and problems with building on Windows with MSys2 #2

wyldckat opened this issue Aug 14, 2016 · 24 comments
Assignees
Labels

Comments

@wyldckat
Copy link
Member

wyldckat commented Aug 14, 2016

For the builds of OpenFOAM 4.x, we had to implement a workaround for the following reason:

It's a really crazy error message that is happening with {{surfaceAutoPatch}}:

  collect2.exe: error: ld returned 5 exit status
  /home/ofuser/blueCFD/OpenFOAM-4.x/wmake/makefiles/general:155: recipe for target '/home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/bin/surfaceAutoPatch.exe' failed
  make[1]: *** [/home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/bin/surfaceAutoPatch.exe] Error 1
  /home/ofuser/blueCFD/OpenFOAM-4.x/wmake/makefiles/apps:39: recipe for target 'surfaceAutoPatch' failed

This problem seems to occur possibly due to the anti-virus butting in while the linker is still updating the executable file with new objects and features. For example, the manifest file object is added after most of the objects have already been added to the final binary; nonetheless, the executable booting header section is only added at the very end of the linking process.


The workaround was this:

Status: A very crazy hack was needed to sort this out, which is apparently a bug in GNU ld. Details provided in the file applications/utilities/surface/surfaceAutoPatch/Make/options:

  #if defined(WIN32) || defined(WIN64)

  /*
     This hack was needed due to a weird LD limitation in GNU Binutils
     2.26.20160125, as provided in the GCC build:
        "(Rev2, Built by MSYS2 project) 5.3.0
  */
  EXE_LIBS += -L/mingw64/lib/gcc/x86_64-w64-mingw32/5.3.0

  #endif

This possibly will be solved in future versions of GNU ld that are provided with MSys2, therefore we should check back on this after updating and/or using the latest MSys2 software stack.

@wyldckat
Copy link
Member Author

This crazy hack will be removed and replaced with the strategy used is the one described on the wiki page: https://github.com/blueCFD/Core/wiki/Quick-notes-on-how-to-update-build#updating-the-build-of-openfoam-4x-in-bluecfd-core-2016

@wyldckat
Copy link
Member Author

It seems like we can't undo this hack for blueCFD-Core-2016-2.

It also looks like this issue is related to the same issue as reported here: arduino/Arduino#2989 - therefore this is something that affects everyone.

@wyldckat
Copy link
Member Author

Not including the version object seems to help...

@wyldckat
Copy link
Member Author

OK, it seems that I've figured it out. By padding out with some semi-random bytes (a string table), it gives a larger stack for ld to process and therefore be able to handle the existing binary payload.

This has worked with surfaceAutoPatch, but I still need to do a full rebuild again to double-check.

But for future reference, the sizes of the binary objects for the version_of_build.o are mostly under the 1500 byte count, with a few above this mark and also another few below the 1400 byte count. With the hack used for padding out this file, the count goes over 2200 byte.

@wyldckat wyldckat self-assigned this Oct 16, 2016
@wyldckat
Copy link
Member Author

OK, it seems that the padding needs to be more controlled. There were a couple of utilities that failed when the object file was sized 2238. This gives off the vibe that we're padding out for a specific size, but I haven't figured out what should be the magical size.

Using the new padding should have fixed the issue, but instead it lead to other utilities not being linked in the first run, instead of the ones we had at first.

@wyldckat
Copy link
Member Author

Partial fix done in this commit: blueCFD/OpenFOAM-dev@419b22b

@wyldckat
Copy link
Member Author

Fixed in this commit: blueCFD/OpenFOAM-dev@6d8da04

@wyldckat
Copy link
Member Author

After running Alltest, a few issues were spotted that are documented in issue #27, among which was that the following applications: surfaceClean, surfaceCoarsen, surfaceHookUp, surfacePointMerge, writeCentres, autoRefineMesh, equilibriumFlameT, foamUpgradeCyclics, patchSummary... give the error:

bash: /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/bin/insert_name_here.exe: Accessing a corrupted shared library

where insert_name_here is any of the names above.

@wyldckat
Copy link
Member Author

wyldckat commented Nov 6, 2016

Here is the command we can use for checking the binaries:

cd $FOAM_APPBIN
for a in *.exe; do ldd $a 1>/dev/null 2>&1 || echo $a; done

Here is the command that rebuilds only these broken ones:

cd $FOAM_APPBIN
for a in *.exe; do ldd $a 1>/dev/null 2>&1 || echo ${a%%.exe}; done | \
  xargs -I {} find $FOAM_APP -name {} | while read line
do
    wclean $line
    wmake $line
    wmakeVerifyExeDependencies $line || wmake $line
done

@wyldckat
Copy link
Member Author

wyldckat commented Nov 20, 2016

OK, so at least writeCellCentres is broken even when we rebuild it manually. This means that we can use it as a test case for trying to fix the issue through re-balanced padding.

Shortening the padding by 20 bytes solved the issue... sizes of the version object file:

  • original padding: 2486
  • with less 20 bytes of padding: 2446
    • OK, so what happens is that the 20 bytes we injected are actually being interpreted using Unicode 16, hence the 40 byte difference.
  • with more 11 (22) bytes of padding it also builds: 2502
    • which effectively also added another entry to the table...
  • This odd... removed only one byte (2 Unicode 16) and the size when down from 2486 to 2478...

Each padding table row has 31 bytes (62 in Unicode 16) in the previous commits... and it appears that technically it also accounts for the fact that it has to fill up to 8 bytes chunks.

Using a more strict padding strategy, namely to use 24 bytes (48 Unicode 16) per row, it gets us a bit more fine control about the final byte size, making it actual multiples of 8 for the total object file size.

  • This kind of padding seems to have solved the issue for writeCellCentres, but it still needs further testing, given that other applications may still have issues with this...
  • Also fixed for surface utilities...
  • Also fixed for equilibriumFlameT ones...
  • Currently ongoing test with rebuilding only the utilities, so see if it spurs any problems...
    • More details after the comment below...

@wyldckat
Copy link
Member Author

For detailed reference:

$ find /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/ -name version_of_build.o | xargs ls -l | awk '{print $6 " " $10}' | sort -n | grep -e surfaceClean -e surfaceCoarsen -e surfaceHookUp -e surfacePointMerge -e writeCentres -e autoRefineMesh -e equilibriumFlameT -e foamUpgradeCyclics -e patchSummary

2462 /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/miscellaneous/patchSummary/version_of_build.o
2462 /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/surface/surfaceClean/version_of_build.o
2462 /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/surface/surfaceHookUp/version_of_build.o
2470 /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/mesh/advanced/autoRefineMesh/version_of_build.o
2470 /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/surface/surfaceCoarsen/version_of_build.o
2486 /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/surface/surfacePointMerge/version_of_build.o
2486 /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/thermophysical/equilibriumFlameT/version_of_build.o
2494 /home/ofuser/blueCFD/OpenFOAM-4.x/platforms/mingw_w64GccDPInt32Opt/applications/utilities/preProcessing/foamUpgradeCyclics/version_of_build.o

@wyldckat
Copy link
Member Author

wyldckat commented Nov 20, 2016

After using the following padding block:

STRINGTABLE
BEGIN
   0x00000001, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x00000002, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x00000003, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x00000004, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x00000005, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x00000006, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x00000007, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x00000008, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x00000009, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x0000000A, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x0000000B, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x0000000C, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x0000000D, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x0000000E, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
   0x0000000F, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
END

the following utilities failed to build:

postChannel
kivaToFoam
moveDynamicMesh
writeMeshObj
refineMesh
zipUpMesh

which have the following sizes:

2222  kivaToFoam
2222  postChannel
2246  moveDynamicMesh
2238  writeMeshObj
2222  refineMesh
2214  zipUpMesh

What seems to be common is that these are not multiples of 8... all ending in .75 when dividing by 8...

@wyldckat
Copy link
Member Author

wyldckat commented Nov 20, 2016

The names changed in the template resource file are:

  • tmpintname (10/20 bytes) x2
    • plus 47/94 bytes for the first one, for text and zero-byte end line.
    • plus 1/2 for the second one for the zero-byte end line.
  • tmporigname (11/22 bytes) x1
    • plus 1/2 for the zero-byte end line.

This means that when the 3 are replaced with kivaToFoam.exe (14/28) bytes, it resulted in an object file with 2222 bytes, of which:

  1. 28+94 = 122 <- missing 6 for 128
  2. 28 + 2 = 30 <- missing 2 for 32
  3. 28 + 2 = 30 <- missing 2 for 32

But this still doesn't justify why padding the end sorts it out in some cases... specially because without the last 4(8) bytes on the padding, kivaToFoam builds just fine... even if the object file is 2214 bytes, which is not a multiple of 8.

@wyldckat
Copy link
Member Author

Interesting... even after padding out each entry for moveDynamicMesh for filling all entries to the brim, it still would not build.

@wyldckat
Copy link
Member Author

Nope, I'm unable to find a proper pattern. The issue can be triggered either:

  1. when building in parallel;
  2. or it can be triggered when building in serial mode;
  3. or it might only be triggered when trying to run the application, i.e. the binary was built but was corrupted.

It could be because the total object map has a very small chunk of it occupied... but there is no clear pattern.

The best I can do is to use a specific environment variable that is set per application that is affected, so that it knows ahead of time what is the padding size needed.

@wyldckat
Copy link
Member Author

wyldckat commented Nov 20, 2016

OK, after changing the padding mechanism a bit and added the environment variable WM_BLUECFDCORE_PADDING for Allwmake scripts to use, it at least cleared up the list of problematic utilities in mesh/manipulation... at least it seemed like it...

See commits below for more details about the implementation.

The Allwmake script should have something like this:

#!/bin/sh
cd ${0%/*} || exit 1    # Run from this directory

if onWin
then
    : #export WM_BLUECFDCORE_PADDING=22
fi

wmake

#------------------------------------------------------------------------------

wyldckat added a commit to blueCFD/OpenFOAM-dev that referenced this issue Nov 20, 2016
…ed for building an application.

This is directly related to blueCFD/Core#2
@wyldckat wyldckat reopened this Nov 20, 2016
@wyldckat
Copy link
Member Author

wyldckat commented Nov 20, 2016

OK, using the new padding method, the currently broken utilities are:

adiabaticFlameT
foamyQuadMesh
surfaceClean
surfaceCoarsen
surfaceFeatureConvert
surfaceToPatch

Will need to use the Allwmake script to re-pad these.

Problems:

  • adiabaticFlameT will not reproduce the compilation problem when we only rebuild the thermophysical folder.
    • Actually, it does, but instead of giving a compiler error, it makes a mess with the linking.
    • It even triggered equilibriumFlameT to also break...
  • On the other hand, rebuilding on the surface utilities did not have problems... unable to reproduce errors with them.
    • surfaceCoarsen is triggered if we use wmakeVerifyExeDependencies and run wmake again.
  • It's not reproducible... after attempting to reproduce the build, the list is different:
    adiabaticFlameT
    changeDictionary
    dsmcInitialise
    equilibriumFlameT
    foamyHexMesh
    plot3dToFoam
    viewFactorsGen
    
    The advantage was that none crashed while building... so there is at least that...
  • After using use wmakeVerifyExeDependencies and running wmake again, the following were broken:
    adiabaticFlameT
    autoRefineMesh
    dsmcInitialise
    equilibriumFlameT
    patchSummary
    surfaceCoarsen
    

@wyldckat
Copy link
Member Author

wyldckat commented Nov 21, 2016

So my suspicion is that accessing the default manifest object is giving us these problems.

Namely the one mentioned here: https://sourceforge.net/p/mingw-w64/wiki2/default_manifest/

We're not the only ones with problems with this: msys2/MSYS2-packages#454

@wyldckat
Copy link
Member Author

wyldckat commented Nov 21, 2016

Padding out the whole 1348 bytes that /mingw64/x86_64-w64-mingw32/lib/default-manifest.o might be a plausible way to sort this out... or reverse padding towards it... it would be 28x48 + 4 bytes...

Currently trying with a padding of 2256, just to be sure if this isn't enough... because from what I inspected at the end of the damaged executable, the padding is actually prefixed to the version information... and I have confirmed that the manifest is appended at the very end of the binary.

OK it seems like the massive padding did the trick, without the need for extra Allwmake scripts.

wyldckat added a commit to blueCFD/OpenFOAM-dev that referenced this issue Nov 21, 2016
…ing needed for building an application."

This is no longer needed, as documented in blueCFD/Core#2

This reverts commit 2009f48.
@wyldckat
Copy link
Member Author

wyldckat commented Nov 22, 2016

The massive padding seems to have failed with only one application: createExternalCoupledPatchGeometry

Apparently it was because the padding when overboard:

3906 ./postProcessing/noise/version_of_build.o
...
4026 ./surface/surfaceSplitNonManifolds/version_of_build.o
4082 ./preProcessing/createExternalCoupledPatchGeometry/version_of_build.o

But it's still too soon to tell, because this happened after there were some building problems, because I was messing with the git tree while it was trying to build things... will do another clean rebuild to confirm if it's working or not.

The next rebuild gave an error with icoUncoupledKinematicParcelFoam... during the first pass, with an object file sized 4058. It worked just fine on the second pass, after it forcefully added all dependency libraries.

There are a few other solvers with long names, but they didn't trigger any linking errors and the final builds can be monitored with ldd as intended.

So, maybe we should at least trim out 1 row or 2 from the padding...

Nope, trimming out 2 rows resulted in extrudeToRegionMesh not building in the first pass and it was the only one. Well, whatever... we will have to live with this bug for now. The trick is to build with the -k option in Allwmake or run wmNonStop before doing a full build.

@wyldckat wyldckat reopened this Nov 22, 2016
@wyldckat wyldckat modified the milestones: blueCFD-Core-2016-3, blueCFD-Core-2016-2 Dec 7, 2016
@wyldckat
Copy link
Member Author

wyldckat commented Dec 7, 2016

Sigh... now it's icoUncoupledKinematicParcelDyMFoam that isn't building after running wmakeVerifyExeDependencies...

@wyldckat
Copy link
Member Author

OK, looks like that using /bin/dash as the default shell for GNU Make, fixes this issue, because it's not encumbered by a lot of bash'ian stuff. This worked at least for icoUncoupledKinematicParcelDyMFoam, therefore I'll do another full build to test if this is working as intended or not.

wyldckat added a commit to blueCFD/OpenFOAM-dev that referenced this issue Dec 22, 2016
@wyldckat
Copy link
Member Author

Next time we upgrade to the latest MSys2 stack, namely for blueCFD-Core 2017-1, we should have this fixed automatically, because the patch has been included on their stack. So all that will be left for this task will later on to remove the hacks that have been introduced above.

@wyldckat
Copy link
Member Author

Have reverted the padding in commit blueCFD/OpenFOAM-dev@4eb2a93 for blueCFD-Core 2017-1.

For people trying to compile OpenFOAM 5.x on blueCFD-Core 2016-1 stack, please run the following commands to undo this change:

foam
git revert 4eb2a93149a5fbc509166be897df4d1532ca6bcb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant