Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libgc: update libgc to commit 6d372272 #21822

Merged
merged 6 commits into from
Aug 4, 2024
Merged

Conversation

kimshrier
Copy link
Contributor

I am looking in to why the vlib/v/slow_tests/keep_args_alive_test.c.v is flaky.

On FreeBSD, on amd64 CPUs, this test fails approximately 80% of the time. FreeBSD on arm64 does not have this problem and this test always succeeds.

I have also tested with macos on a m1 processor and this test always succeeds.

The keep_args_alive_test on amd64 does not actually fail, it just never finishes and I have to kill it. To get a feel for what is going on, I collected some user-level stack traces at 997 Hz, using dtrace, in order to generate a flamegraph so I could see where it was spending time. The resulting image is:
flame keep_args_alive_test before

Comparing the version in thirdparty/libgc with the version in the master branch at https://github.com/ivmai/bdwgc, I saw that there were some changes in the marking logic. This pull request updates libgc to the latest version of the master branch.

Unfortunately, this did not improve the situation on amd64 processors. However, it does not appear to have harmed anything. Running v test-full on FreeBSD arm64 and macos m1 still function properly. On FreeBSD amd64, the keep_args_alive_test still fails about 80% of the time and everything else succeeds.

A flamegraph after the update looks like the same problem still exists.

flame keep_args_alive_test after

While this update does not fix the problem I was originally looking in to, I thought it might be a good idea to do the update so as to get more real-world exercise of the current libgc code.

I used the amalgamator.js script mentioned in thirdparty/libgc/amalgamation.txt to generate the gc.c file. I had to drop 'include/extra' from the search_dirs list in order to run it against the latest version from master.

As an aside, is there interest in having a v script to do this amalgamation?

@kimshrier
Copy link
Contributor Author

Looks like I need to make sure I built gc.c properly.

@spytheman
Copy link
Member

As an aside, is there interest in having a v script to do this amalgamation?

In my personal opinion, automation for building things, is always welcomed.
If done well, requiring little to no human input, it will ease the testing process, and allow us to make future libgc updates more frequent.

@spytheman
Copy link
Member

Unfortunately, this did not improve the situation on amd64 processors.

It is better for amd64 too, just may be not for that particular case.

Me and @larpon tested with his https://github.com/larpon/shy examples 2-3 weeks ago, and the new upstream bgwgc/libgc version, does work better for it.

@kimshrier
Copy link
Contributor Author

Just to make sure I am doing the right thing, what were the parameters, if any, that were passed in to the configure script for libgc. I'm currently doing the following:

./configure --enable-gc-debug

@spytheman
Copy link
Member

The last time, that I compiled a local clone of bdwgc, to test with Shy's examples, according to my history, I've used:

git clone --depth 1 https://github.com/ivmai/bdwgc
cd bdwgc/
./autogen.sh
CC=gcc-11 CFLAGS='-Os -mtune=generic -fPIC' LDFLAGS='-Os -fPIC' \
./configure \
--enable-handle-fork=yes \
--enable-gc-debug \
--enable-sigrt-signals \
--enable-rwlock \
--enable-threads=pthreads \
--enable-static \
--enable-shared=no \
--enable-single-obj-compilation \
--enable-parallel-mark
make
sudo make install

@spytheman
Copy link
Member

spytheman commented Jul 10, 2024

Then, to use it with V, I added -d dynamic_boehm to my V program compilation options, so that it would link the produced executables, to the newly compiled and installed in /usr/local, static .a file.

@kimshrier kimshrier changed the title libgc: update libgc to commit 09fc015b78 libgc: update libgc to commit 865c593 Jul 21, 2024
@kimshrier
Copy link
Contributor Author

I see that I am having problems with tcc in CI. Previously, I was able to see details but now, the only thing that shows up when I click on the Details link is just the word Error: with no additional information.

I have 3 environments I can test on, FreeBSD 14.1 on AMD64, FreeBSD Current on aarch64, and macos on M1, none of which have the sorts of errors I was seeing a few weeks ago when I clicked on the Details link. I do see other problems in my environment and I have been attributing them to the fact that tccbin hasn't been updated in a while for these OS/architecture combinations.

I am a little confused about what is going on with respect to libgc. I notice build scripts in the different OS/architecture branches of tccbin for libgc.a. I am building up-to-date tcc for the environments I have and I was wondering if I should also be building libgc.a in addition to the build artifacts from the build.sh scripts in tccbin. Even though the build scripts for libgc.a finish by copying the libgc.a file into thirdparty/tcc/lib, is this the same libgc.a library that clang or gcc would link against? I have /usr/local/lib/libgc.a already installed on my machines as some of the other software I have installed, uses it.

It's not my intent to create a bigger problem than what I was originally trying to solve. It does seem that libgc and tcc are intertwined in ways that I was not aware of and perhaps we should just close this merge request until I get more familiar with the interconnections. If/When that happens, I can put forward a better merge request.

As an aside, I did write a v program that can handle the amalgamate functionality that amalgamator.js did and the one I wrote is not specific to libgc but can be controlled with command line parameters. Although, it does take several parameters to accomplish the same results that amalgamator.js produces. And, I am unsure of the reasoning behind some of the decisions made in amalgamator.js.

If I pursue this further, I probably need to have some discussion with people more knowledgeable than me on what is actually desired.

And, one last thing. I notice source files with a .vv suffix. I don't see any documentation on what the intent of that suffix is. Enlightenment would be appreciated.

@JalonSolov
Copy link
Contributor

tcc and gcc/clang sometimes output different objects, especially debug info. That's why it is recommend that if you want to debug with gdb, that you ensure you compiled with gcc rather than tcc. Just as one example.

I was under the impression that libgc.a was built separately for each compiler used, for this very purpose. However, @spytheman would know for certain.

As for .vv files - they are regular V files, but named that way so they are NOT picked up by normal builds. They are used during testing.

@kimshrier kimshrier changed the title libgc: update libgc to commit 865c593 libgc: update libgc to commit 6d372272 Aug 1, 2024
@spytheman
Copy link
Member

I think the most recent failure here https://github.com/vlang/v/actions/runs/10230615523/job/28305629335?pr=21822 is due to thirdparty/tcc/lib/libgc.a being out of date with the updated ./thirdparty/libgc/include/gc/gc.h (which does have a reference to GC_noop1_ptr, which the older libgc.a lacks).

@spytheman
Copy link
Member

If I do the following, I can get it to compile and run locally, even with the outdated libgc.a:

#0 13:26:48 ^ update-libgc /v/oo>./v -showcc -cc tcc -no-retry-compilation -stats vlib/v/slow_tests/keep_args_alive_test.c.v
> C compiler cmd: '/space/v/oo/thirdparty/tcc/tcc.exe' '@/tmp/v_1000/keep_args_alive_test.01J4EETXRYAE2W52SBFY7VD11F.tmp.c.rsp'
> C compiler response file "/tmp/v_1000/keep_args_alive_test.01J4EETXRYAE2W52SBFY7VD11F.tmp.c.rsp":
  -fwrapv -o "/space/v/oo/vlib/v/slow_tests/keep_args_alive_test" -D GC_BUILTIN_ATOMIC=1 -D GC_THREADS=1 -I "/space/v/oo/thirdparty/libgc/include" -I "/space/v/oo/thirdparty/stdatomic/nix" -L "/usr/lib/gcc/x86_64-linux-gnu/6" -L "/usr/lib/gcc/x86_64-linux-gnu/7" -L "/usr/lib/gcc/x86_64-linux-gnu/8" -L "/usr/lib/gcc/x86_64-linux-gnu/9" -L "/usr/lib/gcc/x86_64-linux-gnu/10" -L "/usr/lib/gcc/x86_64-linux-gnu/11" -L "/usr/lib/gcc/x86_64-linux-gnu/12" -L "/usr/lib/gcc/x86_64-linux-gnu/13" -L "/usr/lib/gcc/x86_64-linux-gnu/14" -L "/usr/lib/gcc/x86_64-redhat-linux/6" -L "/usr/lib/gcc/x86_64-redhat-linux/7" -L "/usr/lib/gcc/x86_64-redhat-linux/8" -L "/usr/lib/gcc/x86_64-redhat-linux/9" -L "/usr/lib/gcc/x86_64-redhat-linux/10" -L "/usr/lib/gcc/x86_64-redhat-linux/11" -L "/usr/lib/gcc/x86_64-redhat-linux/12" -L "/usr/lib/gcc/x86_64-redhat-linux/13" -L "/usr/lib/gcc/x86_64-redhat-linux/14" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/6" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/7" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/8" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/9" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/10" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/11" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/12" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/13" -L "/usr/lib/gcc/x86_64-pc-linux-gnu/14" -I "/space/v/oo/vlib/v/slow_tests" "/tmp/v_1000/keep_args_alive_test.01J4EETXRYAE2W52SBFY7VD11F.tmp.c" -std=gnu99 -D_DEFAULT_SOURCE -bt25 "/space/v/oo/thirdparty/tcc/lib/libgc.a" -ldl -lpthread -latomic
        V  source  code size:      29495 lines,     789510 bytes,   269 types,    21 modules,   147 files
generated  target  code size:      26666 lines,     970685 bytes
compilation took: 460.776 ms, compilation speed: 64011 vlines/s
running tests in: vlib/v/slow_tests/keep_args_alive_test.c.v
      OK     363.705 ms     1 assert  | main.test_keep_args_alive_attribute()
     Summary for running V tests in "keep_args_alive_test.c.v": 1 passed, 1 total. Elapsed time: 363 ms.
#0 13:26:51 ^ update-libgc /v/oo>
#0 13:26:51 ^ update-libgc /v/oo>git diff
diff --git vlib/v/slow_tests/keep_args_alive_test_c.h vlib/v/slow_tests/keep_args_alive_test_c.h
index 262e32624..b3e4a0f42 100644
--- vlib/v/slow_tests/keep_args_alive_test_c.h
+++ vlib/v/slow_tests/keep_args_alive_test_c.h
@@ -42,3 +42,17 @@ static int calc_expr_after_delay(int* a, int b, int* c) {
        free(keep);
        return z;
 }
+
+
+
+GC_API void GC_CALL GC_noop1_ptr(volatile void *p) {
+   # if CPP_PTRSZ > CPP_WORDSZ
+   #   if defined(AO_HAVE_store) && defined(THREAD_SANITIZER)
+         GC_cptr_store(&GC_noop_sink_ptr, (ptr_t)CAST_AWAY_VOLATILE_PVOID(p));
+   #   else
+         GC_noop_sink_ptr = (ptr_t)CAST_AWAY_VOLATILE_PVOID(p);
+   #   endif
+   # else
+       GC_noop1((u64)p);
+   # endif
+}
#0 13:26:53 ^ update-libgc /v/oo>

@spytheman
Copy link
Member

spytheman commented Aug 4, 2024

The reason that GC_noop1_ptr is needed with the newest libgc, is that the GC_reachable_here macro expands to GC_noop1_ptr in it.

A call to GC_reachable_here() is generated by V in
vlib/v/gen/c/fn.v:2595:g.writeln('GC_reachable_here(__tmp_arg_${tmp_cnt_save + i});')

and the test vlib/v/slow_tests/keep_args_alive_test.c.v is the one, that tests that codegen situation.

@spytheman
Copy link
Member

spytheman commented Aug 4, 2024

Something like the following, also works.
It is also not specific to vlib/v/slow_tests/keep_args_alive_test.c.v.

#0 13:50:18 ^ update-libgc /v/oo>git diff
diff --git thirdparty/libgc/include/gc.h thirdparty/libgc/include/gc.h
index 568f4222d..1ec6ada7b 100644
--- thirdparty/libgc/include/gc.h
+++ thirdparty/libgc/include/gc.h
@@ -1,2 +1,6 @@
 /* This file is installed for backward compatibility. */
 #include "gc/gc.h"
+
+__attribute__ ((weak)) GC_API void GC_CALL GC_noop1_ptr(volatile void *p) {
+   GC_noop1((u64)p);
+}
#0 13:50:19 ^ update-libgc /v/oo>

@spytheman
Copy link
Member

Another option would be to update the libgc.a copy in the tccbin repo .

@spytheman spytheman merged commit 49f5ebf into vlang:master Aug 4, 2024
56 checks passed
@spytheman
Copy link
Member

I did both, and I intend to gradually update all precompiled libgc.a files in the tccbin repos (currently only https://github.com/vlang/tccbin/tree/thirdparty-linux-amd64 is updated).

@kimshrier
Copy link
Contributor Author

I was in the process of picking up more commits from https://github.com/ivmai/bdwgc, there have been about 20 since 6d372272. This is because my original intent of this exercise, was to make the vlib/v/slow_tests/keep_args_alive_test.c.v test less flaky. I am continuing to test with newer versions of bdwgc and looking at that code to see if I can track down the problem.

When I think I have something working on the machine/os combinations I have access to, it is obvious that I need to coordinate with you. I don't have any access to machines running Windows and the machines I do have access to that run Linux are not owned by me so I am hesitant to use them for personal projects. As a result, I would need to rely on the kindness of others to provide the precompiled libraries for libgc.a on machine/os combinations I don't have.

Going forward, what is the best way to coordinate?

@spytheman
Copy link
Member

Going forward, what is the best way to coordinate?

Not sure about the best way, since currently, there are too many moving parts, that have to be adjusted manually, which require coordination indeed, and is not easily amenable to automated tests and tries.

The ideal variant for frequent testing of new versions of bdwgc, would perhaps be, to have a bunch of build scripts, in the main V repo, in .github/workflows/, for the supported platforms, that can prepare libgc_CI.a corresponding to the current platform, that could be ran on the CI, then modifying vlib/builtin/builtin_d_gcboehm.c.v to use that new libgc_CI.a if available, in preference to the one in thirdparty/tcc/lib/libgc.a . Then testing with a new version, will boil down to just updating the subtree hash, then making a PR, and waiting for the CI to run. It will be also nice, to have a separate vlang/libgc repo, with successfully prebuilt binaries for that library, for the supported platforms, so that it can be independent from thirdparty/tcc.

That however is complex, and I do not want to do it, while changing the version at the same time, since there are many moving parts, and thus things to break.

Perhaps for now, it is enough to just make a PR when you are ready with the updated version, then I can test it on my personal machines, running macos 14.5, Windows 11 and Ubuntu 20.04 , just like I did the one here.

@spytheman
Copy link
Member

I've started work on that in #21996 (it contains only the script for building TCC for thirdparty-linux-amd64 for now).

@Ekopalypse
Copy link
Contributor

Ekopalypse commented Aug 22, 2024

@spytheman

__attribute__ is not known to msvc, so I assume it should have a guard like

/* This file is installed for backward compatibility. */
#include "gc/gc.h"

#ifdef _MSC_VER
// I'm not sure there isn't something to do here!
#else
__attribute__ ((weak)) GC_API void GC_CALL GC_noop1_ptr(volatile void *p) {
   GC_noop1((u64)p);
}
#endif

@spytheman
Copy link
Member

@Ekopalypse thanks, you are right. It should be fixed in latest V.

I've tested with v -gc boehm -cc msvc -showcc examples/hello_world.v .

@Ekopalypse
Copy link
Contributor

Yes, can confirm that my project is compiling again. Internal tests passed. Thank you for the quick fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants