Symbol pollution for pr8132 #8262
@bosilca Last I recall, "treematch" is an import from another code project, yes? I seem to recall that we are not supposed to make changes here for it, and there is no symbol "leakage" issue since it isn't OMPI code.
Yes, treematch is an import, we should avoid or minimize the changes that might put our version at odds with the original.
@@ -45,7 +45,7 @@ BEGIN_C_DECLS
extern int mca_common_monitoring_output_stream_id;
extern int mca_common_monitoring_enabled;
extern int mca_common_monitoring_current_state;
extern opal_hash_table_t *common_monitoring_translation_ht;
Another solution would have been to move the mca_common_monitoring_get_world_rank function into the .c file (it is too large to be inlined anyway). This symbol could then be restricted to the scope of the .c file and removed from the header file entirely.
@@ -179,14 +179,14 @@ void ADIOI_GPFS_ReadStridedColl(ADIO_File fd, void *buf, int count,
/* One-sided aggregation needs the amount of data per rank as well because the difference in
* starting and ending offsets for 1 byte is 0 the same as 0 bytes so it cannot be distiguished.
*/
if ((gpfsmpio_read_aggmethod == 1) || (gpfsmpio_read_aggmethod == 2)) {
This provides little convenience while increasing the burden on the maintainer of the ROMIO integration. I would carefully assess the cost.
@@ -37,13 +37,13 @@ long bglocklessmpio_f_type;
int gpfsmpio_bg_nagg_pset;
int gpfsmpio_pthreadio;
int gpfsmpio_p2pcontig;
int gpfsmpio_write_aggmethod;
It is confusing that some of these symbols now have a different prefix than the rest.
@@ -47,7 +47,7 @@ struct mca_mtl_request_t;
* These are called internally by the library when the send
* is completed from its perspective.
*/
extern void (*send_completion_callbacks[])
I don't think this array is necessary anymore.
@@ -2,7 +2,7 @@
#include <stdio.h>
#include "IntConstantInitializedVector.h"

int intCIV_isInitialized(int_CIVector * v, int i)
As indicated by @rhc54, this is imported code that should mostly be left as is. How is this symbol even spilling out? It seems to be used only in this file, and it is not declared as extern in the header!?
@@ -183,13 +183,13 @@ static int mca_vprotocol_pessimist_component_finalize(void)
int mca_vprotocol_pessimist_enable(bool enable) {
if(enable) {
int ret;
if((ret = vprotocol_pessimist_sender_based_init(_mmap_file_name,
I really don't understand why this symbol is globally visible.
Thanks for the review. If treematch isn't to be modified, then I think PR #8132 has to leave it out of libmpi.so, since it's by far the largest offender in terms of globally exported symbols with no reasonable prefix in front of them. I'll back out some of these changes, especially treematch and ROMIO, and see where the test stands then. I don't really object to allowing "gpfsmpio_", for example, as one of our acceptable prefixes. Let me see what remains after reverting some of these.
Why? Someone is linking against the "treematch" library (even if it is through us), so of course the "treematch" symbols are exposed. Your criteria seem somewhat arbitrary here - "gpfsmpio" is okay because it is an IBM symbol? Or is there some other reason why that prefix should be considered okay? Perhaps the better approach would be for the community to decide which symbols represent an issue and which don't, and then generate a PR to deal with them? I don't recall such a discussion taking place - perhaps I missed it?
If treematch was only exporting names like I have another PR opened for a testcase, which is my proposal for what exported symbol names should be allowed/disallowed. If we're just being practical, then I'd allow a pretty long list of acceptable prefixes. If we're trying to be really clean and easy to document, I'd shorten the list more.
You're missing my point, Mark - my point is that this PR is making rather arbitrary delineations between what is and isn't an acceptable prefix. This is something the community needs to discuss in one of its meetings, and not on a low-bandwidth PR. It needs to factor in how we are handling 3rd-party code bases in general. We understand how to deal with our own internal code - but treematch and ROMIO aren't in that category.
I agree that those types of symbols are too generic to be exporting, and should be cleaned up. Maybe we can target just those symbols and minimize the PR?
I don't disagree with that assessment - but I would defer further work until the community can figure out how they want such things addressed. Again, this is a 3rd-party package - it isn't part of the OMPI code base, and we aren't free to edit it at will. Let's put it on the schedule for next Tuesday and see what we can figure out.
Why not take a drastically different approach and let the linker export only the MPI API? As an example, libtool has -export-symbols to provide such a capability.
Interesting idea - how would it impact the DLLs? I thought one reason symbols remain visible is so that the components can access symbols in the core library. I suppose if we are going to "slurp" all the components into the core lib, then maybe all that goes away and it won't really matter - but that would (I should think) mean that all components have to be absorbed.
I understand the objection to changing external code; that's why I said I intend to remove all the treematch and ROMIO changes from this PR. The part I was objecting to with my example symbols is that it sounded like you were saying that, since we're linking against treematch, a user should expect all the treematch symbols to be exposed and should thus know not to conflict with them. @bosilca, how does -export-symbols work with symbols used internally but across multiple .c files? If the exports can be controlled that strictly, that would be great.
It is a point of concern, I agree - a user should expect to see 3rd-party symbols, but may not realize the full roster of packages being pulled in by OMPI. We've run into this before, particularly with the conflict between libev and libevent. Not sure how one goes about fully advertising to users "here are ALL the symbols you might see, depending upon which plugins are active". I don't know the right answer, frankly. Historically, our approach has been pretty simple:
Lately, we have been moving away from embedding 3rd-party packages. I'd expect to no longer see libevent, hwloc, or PMIx embedded in v6. PRRTE might stick around for another major release, though it seems to be getting picked up by downstream packagers, so we may no longer need to embed it after v5. This is why I'm advocating for a discussion. If we are going to "un-embed" all 3rd-party packages, then many of these problems go away. It should at least be factored into our near-term approach.
I'm not sure about the portability of this, but I at least like the functionality of the GCC 4.0 stream's
E.g., I made a play model with:
Anyway, the above only worked if
That's exactly what our
Symbol name pollution fix: adding "ompi_" or similar prefixes to a bunch of symbols, or making some symbols static if they were very isolated.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
8d75629 to 83bf160
Right now any function that isn't static is globally visible, so if it is used across multiple .c files and therefore can't be static, it is exposed to potential conflicts with an app. For many of the symbols above where you asked how they were being accessed from outside their module: they weren't really accessed from outside the module. They were just used in two different .c files within the module and therefore aren't static, and at least with our current default build, that means they are globally visible. I kind of like an approach that's more explicit and just exporting

I just repushed with treematch and ROMIO removed. For now it still has the same approach in
We are discussing this in the context of #8132.
There are a few issues at play here:
@jsquyres, regarding how many libfoo.so instances there are, I believe it's still one. I made a toy program to check empirically, and that's what I saw. You can also get some messes in the symbol binding there, e.g., suppose
If you just have this situation:
But as soon as libmpi.so, or even componentA.so, is linked against a libsomething.so that has generic symbols, collisions become possible.
@markalle Is this still an issue on master?
Yes, I just built on master to see the current state of the world, and now that PR #8132 is in, libmpi.so is indeed filled with global symbols with generic names that can too easily conflict with user apps. This specific PR, I think, inspired a handful of fixes in treematch, but many conflicts remain.

How to see the problem:

*** For a simple by-hand first pass to see approximately what's there:

This shows all the global symbols and filters out some basics like MPI_Foo, ADIOI_Foo, and ompi_foo that should be considered acceptable as exports. Just the first few lines of global symbol name pollution:
*** Making a testcase that fails, to demonstrate why pollution is a problem

I think any global export lacking some kind of prefix should be considered a bug. Here's an example of why they matter, in case it's not clear why symbol name pollution causes bugs (i.e., I'm not raising an issue about aesthetics here; it's about failing code). If I browse the list of exported functions and pick "append_frag_to_ordered_list", there's no reason a user writing an application wouldn't use that exact function name, perhaps in an app that looks like this:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

/* Print a message, then attach gdb to this process and print a backtrace
 * so we can see who called the application's function. */
static void
myprint(const char *mesg)
{
    char str[4096];
    printf("*** %s -- in App's function\n", mesg);
    fflush(stdout);
    snprintf(str, sizeof str,
        "gdb -q -batch -ex bt -p %d < "
        "/dev/null 2>&1 | grep -v "
        " -e '^Missing separate debuginfo for ' "
        " -e '^Warning:.*No such file or directory' "
        " -e 'Inferior .*process .*detached' "
        " -e ' zypper install -C ' | "
        "sed -e 's/^/[p%d] /'", (int)getpid(), (int)getpid());
    system(str);
    fflush(stdout);
}

/* Same name as an unprefixed function exported by libmpi.so. If OMPI's
 * internal calls resolve to this definition, the backtrace above shows
 * OMPI code calling into the application. */
void append_frag_to_ordered_list(void) { myprint("append_frag_to_ordered_list"); }

int
main(int argc, char *argv[])
{
    int myrank, nranks;
    char myhost[MPI_MAX_PROCESSOR_NAME];
    int len;
    char *sbuf, *rbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    MPI_Get_processor_name(myhost, &len);
    sbuf = malloc(10000000);
    rbuf = malloc(10000000);

    /* Exercise a range of message sizes so that a variety of OMPI's
     * internal messaging code is likely to run. */
    MPI_Bcast(sbuf, 100, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Bcast(sbuf, 1000, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Bcast(sbuf, 10000, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Bcast(sbuf, 100000, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Bcast(sbuf, 1000000, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Bcast(sbuf, 10000000, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Sendrecv(sbuf, 10, MPI_CHAR, (myrank+1)%nranks, 99,
                 rbuf, 10, MPI_CHAR, (myrank+nranks-1)%nranks, 99,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(sbuf, 1000, MPI_CHAR, (myrank+1)%nranks, 99,
                 rbuf, 1000, MPI_CHAR, (myrank+nranks-1)%nranks, 99,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(sbuf, 100000, MPI_CHAR, (myrank+1)%nranks, 99,
                 rbuf, 100000, MPI_CHAR, (myrank+nranks-1)%nranks, 99,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(sbuf, 10000000, MPI_CHAR, (myrank+1)%nranks, 99,
                 rbuf, 10000000, MPI_CHAR, (myrank+nranks-1)%nranks, 99,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    free(sbuf);
    free(rbuf);
    printf("past finalize at rank %d/%d (on %s)\n", myrank, nranks, myhost);
    exit(0);
}
```

And run with something like
The stack trace shows that, deep in OMPI code, what should be a call to OMPI's own "append_frag_to_ordered_list" actually lands in the user application's redefinition.
There are multiple ways to solve it, but I think the simplest is to make sure all the globally exported symbols have some prefix like "ompi_" in front of them, and also to have a testcase running that looks for unprefixed exports. I checked in a testcase at open-mpi/ompi-tests-public#3 (which I believe isn't actually being run) that would report errors against the state of today's OMPI master.
@open-mpi/ompi-gatekeeper-5-0-x The issues discussed on this PR have implications for the v5.0.0 release. Bottom line: since components are now, by default, slurped into the base libraries, we are seeing a lot of symbol pollution. Technically, this has likely been around for a long time, but it may not have been noticed since the default was to build components as DSOs. This should probably be resolved, or at least documented.
See also #10708.
Closing this - thanks for opening #10708. @markalle, if you see anything from this PR missing in @drwootton's efforts to clean up these symbols, please port it over from this PR. Also: I fixed the labels to correctly show the target branch.
There's an incoming PR #8132 that builds more MCA components into the main library, and that brings more potential symbol name pollution.
My testcase open-mpi/ompi-tests-public#3 identified a bunch of symbols without recognized prefixes when I built #8132, so this PR adds prefixes to many symbols and/or makes symbols static if they didn't look like they were used anywhere else.
I think 2/3 of these changes are in treematch, but they span other files too.