parallelism in Sage: just use value of 'MAKE' #12016

Closed
jhpalmieri opened this issue Nov 12, 2011 · 67 comments

Comments

@jhpalmieri
Member

The various parallel aspects of Sage should be controlled by setting the -j (possibly also -l) flags in MAKE or MAKEFLAGS. That is, if MAKE='make -j16', then

  • running make will build spkgs in parallel, using 16 processes (this was done in Remove the necessity to set SAGE_PARALLEL_SPKG_BUILD #11959). This is standard make behaviour, but we need to patch spkg/standard/deps to ensure that make recognizes that we are doing a recursive make.

  • running make ptestlong or sage -tp 0 <files> will doctest in parallel using 16 threads. If the -j flag in MAKE is not set, then determine the number of threads as before: min(8, cpu_count()).

  • running ./sage -b will build the Sage library using 16 threads. If the -j flag in MAKE is not set, then use only 1 thread.

Testing this ticket: you can set the environment variable SAGE_NUM_CORES to the number of cores you want to pretend to have. For example, running

SAGE_NUM_CORES=24 make ptestlong

should run 8 threads (see sage-num-threads.py; this is undocumented because the only purpose I see is for testing this ticket).
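
The fallback rule described above can be sketched roughly as follows. This is only an illustration, not the code in sage-num-threads.py: the function names `jobs_from_make` and `doctest_threads` are hypothetical, and the sketch assumes the flag is spelled `-jN`, `-j N` or `--jobs=N` (combined short flags such as `-kj16` are not handled).

```python
import re

def jobs_from_make(make):
    """Return N from a '-jN', '-j N' or '--jobs=N' flag, or None if absent.

    A bare '-j' (no number) also returns None.
    """
    m = re.search(r'(?:^|\s)(?:-j\s*|--jobs[=\s])(\d+)', make)
    return int(m.group(1)) if m else None

def doctest_threads(make, cores):
    """Use the -j value if given; otherwise fall back to min(8, cores)."""
    jobs = jobs_from_make(make)
    return jobs if jobs is not None else min(8, cores)

# Pretending to have 24 cores with no -j flag gives the cap of 8 threads.
print(doctest_threads('make', 24))
print(doctest_threads('make -j16', 24))
```

With 24 pretended cores and no -j flag, the sketch caps the count at 8, which is the result expected from the SAGE_NUM_CORES=24 test above.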

Notes:
With the patches applied, building spkgs in parallel works well, except for race conditions in:

Apply:

  1. attachment: 12016-root.patch to the SAGE_ROOT repository.
  2. attachment: 12016-base.patch to spkg/base.
  3. attachment: 12016-scripts.patch and attachment: trac_12016-scripts-ref.patch to the SCRIPTS repository.
  4. attachment: 12016-sage.patch to the Sage library.

See also: #6495 to implement the same behavior for doc building.

Dependencies: sage-4.8.alpha4

CC: @jdemeyer @nexttime

Component: build

Author: John Palmieri, Jeroen Demeyer

Reviewer: John Palmieri, Jeroen Demeyer

Merged: sage-4.8.alpha5

Issue created by migration from https://trac.sagemath.org/ticket/12016

@jdemeyer

comment:1

We should remove NUM_THREADS from the top-level Makefile.

@jdemeyer

Changed author from John Palmieri to John Palmieri, Jeroen Demeyer

@jdemeyer

comment:6

John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in sage-sage or sage-env to determine the number of threads and saving it in an environment variable SAGE_NUM_PROCESSES (which the user could set by hand; if not set, the value comes from MAKE or MAKEFLAGS; if no -j option is given, set to 1).

@jhpalmieri
Member Author

comment:7

Replying to @jdemeyer:

John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in sage-sage or sage-env to determine the number of threads and saving it in an environment variable SAGE_NUM_PROCESSES

Sounds okay.

(which the user could set by hand; if not set, the value comes from MAKE or MAKEFLAGS; if no -j option is given, set to 1).

If you run "sage -tp ", should you use 1 process or more than 1? The "-tp" option means "parallel", so perhaps the default should be more than 1 in this case. In other cases (like docbuilding, for example), the default should be 1.

@jhpalmieri
Member Author

comment:8

For something like make -j16 ptestlong, how do we recover the number 16? If I execute this command (with MAKE unset), I see

MAKEFLAGS= --jobserver-fds=3,4 -j
MFLAGS=- --jobserver-fds=3,4 -j

but I don't see "16" anywhere in the listing of the environment variables.

@jdemeyer

comment:9

Replying to @jhpalmieri:

Replying to @jdemeyer:

John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in sage-sage or sage-env to determine the number of threads and saving it in an environment variable SAGE_NUM_PROCESSES

Sounds okay.

(which the user could set by hand; if not set, the value comes from MAKE or MAKEFLAGS; if no -j option is given, set to 1).

If you run "sage -tp ", should you use 1 process or more than 1? The "-tp" option means "parallel", so perhaps the default should be more than 1 in this case. In other cases (like docbuilding, for example), the default should be 1.

Sure, that is what I meant. We should compute the value once, but in sage -tp we can still decide to use the number of processes.

@jdemeyer

comment:10

Replying to @jhpalmieri:

For something like make -j16 ptestlong, how do we recover the number 16? If I execute this command (with MAKE unset), I see

MAKEFLAGS= --jobserver-fds=3,4 -j
MFLAGS=- --jobserver-fds=3,4 -j

but I don't see "16" anywhere in the listing of the environment variables.

You are right. I had not tried this before. So let's scrap that idea.

@jhpalmieri
Member Author

Attachment: trac_12016-root.v2.patch.gz

Attachment: trac_12016-sage.v2.patch.gz

@jhpalmieri
Member Author

comment:11

Here are new patches. These use SAGE_NUM_THREADS if it is set, and otherwise try to extract a number from MAKE. (My method for doing this is probably not ideal, but the options [...]) This is done in sage-env. Running sage -b should now use this setting as well.

I don't know how to get the number of threads from

make -j16 ptestlong

so I removed that from the "to do" list in the ticket description.

In the file sage-ptest, I removed the "FIXME" comment in

    try:
        # FIXME: Nice, but <NUMTHREADS> should immediately follow '-tp' etc.,
        #        i.e., be the next argument. We might have file or directory
        #        names that properly convert to an int...
        numthreads = int(argv[1])
        infiles = argv[2:]
    except ValueError: # can't convert first arg to an integer: arg was probably omitted
        numthreads = 1

The script sage-ptest doesn't get a "tp" argument; it is instead called by sage-sage, and the way it is called, the first argument to sage-ptest is precisely whatever came after "-tp". So I don't think anything needs fixing. If we ever rewrite sage-sage (#21) to properly parse arguments, we can make sure that "-tp" has a default numerical argument of zero.
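
To illustrate the argument handling discussed above, here is a minimal sketch of how the leading thread count might be separated from the file names. It is not the code in sage-ptest: `parse_ptest_args` is a hypothetical helper, and the default of 0 (meaning "choose automatically") follows the suggestion about a future default rather than the current script, which falls back to 1.

```python
def parse_ptest_args(args):
    """Split args into (numthreads, infiles).

    If the first argument converts to an int, it is taken as the thread
    count. Otherwise the count was probably omitted, so every argument is
    treated as a file name and the count defaults to 0 (automatic).
    """
    try:
        numthreads = int(args[0])
        infiles = list(args[1:])
    except (IndexError, ValueError):
        numthreads = 0
        infiles = list(args)
    return numthreads, infiles

print(parse_ptest_args(['4', 'a.py', 'b.py']))
print(parse_ptest_args(['a.py', 'b.py']))
```

The caveat from the removed FIXME still applies to this sketch: a file or directory whose name happens to be an integer would be read as the thread count.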

@jdemeyer

comment:12

Attachment: trac_12016-scripts.v2.patch.gz

  1. If you are going to use the string "auto" for automatic, you might as well use "infinite" for infinite, instead of zero.

1b) Alternatively: use 0 for automatic (as in sage -tp 0) and 999999 for unlimited. This would mean less special-case code, since a value like 999999 is more than what a user would normally specify (for the foreseeable future).

  2. In sage-ptest, unlimited really should be unlimited. Not max(8, # of cpus).

  3. We should also do the following long-needed fix here: setting MAKE to make -j16 is very standard in Sage circles, but not actually the preferred way according to the GNU make folks. One really should use MAKEFLAGS instead (similar to the distinction between CC and CFLAGS). This is why you often see an error like "make -jN forced in sub-make. Disabling job server mode" (quoted freely from memory). So, when MAKEFLAGS exists, assume that make understands the flags and do not pass flags in MAKE.

  4. Why did you change

sage-build "$@" || exit $?

to

sage-build "$@"

in the sage_build() function in sage-sage?

  5. You reverted a lot of changes that I made to doc/en/developer/doctesting.rst. Why? I actually tried all the examples in the documentation and pasted the exact output I got (on sage.math.washington.edu). Surely, this is better than keeping the outdated (and in many cases totally wrong) output.

I am planning to work further on this, so don't change any code yet. But please give your opinion.
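
As a rough illustration of the "0 for automatic, 999999 for unlimited" convention in item 1b, the sketch below shows why a large sentinel needs no special case. The function name `effective_workers` and the idea of capping by the number of files to test are assumptions made for this example, not taken from the patches.

```python
def effective_workers(requested, cores, num_files):
    """Resolve a requested thread count into an actual worker count.

    0 means "automatic" and becomes min(8, cores). Any other value is
    capped by the number of files, so a huge value such as 999999 simply
    means one worker per file without a dedicated code path.
    """
    if requested == 0:
        requested = min(8, cores)
    return max(1, min(requested, num_files))

print(effective_workers(0, 24, 100))
print(effective_workers(999999, 2, 100))
```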

@jhpalmieri
Member Author

comment:13

Replying to @jdemeyer:

  1. If you are going to use the string "auto" for automatic, you might as well use "infinite" for infinite, instead of zero.

1b) Alternatively: use 0 for automatic (as in sage -tp 0) and 999999 for unlimited. This would mean less special-case code, since a value like 999999 is more than what a user would normally specify (for the foreseeable future).

Sounds good to me.

  2. In sage-ptest, unlimited really should be unlimited. Not max(8, # of cpus).

Okay.

  3. We should also do the following long-needed fix here: setting MAKE to make -j16 is very standard in Sage circles, but not actually the preferred way according to the GNU make folks. One really should use MAKEFLAGS instead (similar to the distinction between CC and CFLAGS). This is why you often see an error like "make -jN forced in sub-make. Disabling job server mode" (quoted freely from memory). So, when MAKEFLAGS exists, assume that make understands the flags and do not pass flags in MAKE.

I'm willing to try that, especially if you write the patch instead of me :)

  4. Why did you change

sage-build "$@" || exit $?

to

sage-build "$@"

in the sage_build() function in sage-sage?

That was a mistake.

  5. You reverted a lot of changes that I made to doc/en/developer/doctesting.rst. Why? I actually tried all the examples in the documentation and pasted the exact output I got (on sage.math.washington.edu). Surely, this is better than keeping the outdated (and in many cases totally wrong) output.

Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library". So I started from scratch, at which point I just put in the changes that I felt were relevant to the ticket or easy for me to change. Probably I should have started with your patch and added the section (with modifications) back in.

It looks like #9739 broke doctesting of .sage files. We should fix that (not on this ticket).

@jhpalmieri
Member Author

comment:14

Replying to @jhpalmieri:

It looks like #9739 broke doctesting of .sage files. We should fix that (not on this ticket).

See #12069.

@jdemeyer

comment:15

Replying to @jhpalmieri:

Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library".

I removed that because it totally didn't work. But this is probably #12069. How about we leave the last section of the documentation alone in this ticket but then change the documentation in #12069?

@jhpalmieri
Member Author

comment:16

Replying to @jdemeyer:

Replying to @jhpalmieri:

Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library".

I removed that because it totally didn't work. But this is probably #12069. How about we leave the last section of the documentation alone in this ticket but then change the documentation in #12069?

Okay, sounds fine to me.

@jdemeyer

jdemeyer commented Dec 9, 2011

comment:36

Replying to @jhpalmieri:

  • zlib on OS X (2 cores) fails most of the time with MAKE='make -j -l3'. Here's a log.
  • singular on sage.math fails all of the time, I think, with MAKE='make -j -l30'. Here's a log.

Since my patches properly implement parallel building, it also means that more packages are actually being built in parallel. So I think we are simply triggering bugs in the various packages. For example, I never had problems with Python before, but I did have problems with this patch (fixed in #12096).

I cannot explain why "make -j -lN" would fail but "make -jN" would work.

@jdemeyer

jdemeyer commented Dec 9, 2011

comment:37

Replying to @jhpalmieri:

  • singular on sage.math fails all of the time, I think, with MAKE='make -j -l30'. Here's a log.

Well, singular is in the list of fishy packages, see the ticket description.

@jhpalmieri
Copy link
Member Author

comment:38

Just for fun, I modified deps so singular would build all by itself in the build process (I made it depend on linbox and scipy, so it was the last package to be built before the sage package). Then it built fine using make -j -l30.

@jdemeyer

jdemeyer commented Dec 9, 2011

comment:39

I think it is truly a coincidence that "make -j -lN" fails. I managed to make singular fail with just "make -jN", hopefully fixed by #12138.

@jdemeyer

jdemeyer commented Dec 9, 2011

comment:40

Replying to @jhpalmieri:

This looks good to me. Is it ready for review? Am I allowed to review it since I wrote early drafts of some of the patches?

For a future ticket, it would be nice if you could set MAKE='make -j -lN', for some reasonable choice of N, and have it work. When I try this, I have problems with the following spkgs, and I'm not sure why:

  • zlib on OS X (2 cores) fails most of the time with MAKE='make -j -l3'. Here's a log.
  • singular on sage.math fails all of the time, I think, with MAKE='make -j -l30'. Here's a log.

Are these about builds of the total Sage source in which these fail, or are these separate installs like sage -f ...?

@jdemeyer

jdemeyer commented Dec 9, 2011

comment:41

When testing with sage -f, the proper way to test is using

MAKEFLAGS="j50" ./sage -f ...

@jdemeyer

Changed dependencies from sage-4.8.alpha3 + #12096 to sage-4.8.alpha3 + #12096, #12137, #12138

@jhpalmieri
Member Author

comment:43

In the following lines from sage-spkg

# Handle -n, -t, -q options for recursive make 
# See Trac #12016. 
if echo "$MAKE $MAKEFLAGS -$MAKEFLAGS" |grep -e ' -[A-Za-z]*[qnt]' >/dev/null; then 
    if echo "$MAKE $MAKEFLAGS -$MAKEFLAGS" |grep -e ' -[A-Za-z]*q' >/dev/null; then 
        exit 1 
    else 
        exit 0 
    fi 
fi 

do we also need to handle the long versions? (I don't think so, but I thought I would ask.)

More importantly, on OpenSolaris, or at least on David Kirkby's machine hawk, the default 'grep' command doesn't take a -e option. Can we just omit it? The command still seems to function on sage.math, on OS X, and on OpenSolaris.

@jhpalmieri
Member Author

comment:44

I cannot explain why "make -j -lN" would fail but "make -jN" would work.

One reason is that make -j -lN puts a limit on starting new processes, and that might be what's causing the problems. I could force the old zlib spkg to fail on sage.math by running MAKEFLAGS='j -l2' ./sage -f ... but not with MAKEFLAGS='j -l30' .... I don't know if setting MAKE="$MAKE -j1 -l" in spkg-install is the right way to fix this for problematic spkgs (like singular?), but it might be worth trying.
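
For experimenting with different load limits, a small helper that reads the limit back out of a MAKE or MAKEFLAGS string may be handy. The sketch below is an assumption-laden illustration: `load_limit` is a hypothetical name, and it only recognizes the GNU make spellings `-lN`, `-l N`, `--load-average=N` and `--max-load=N`.

```python
import re

def load_limit(flags):
    """Return the load limit given by -l, --load-average or --max-load.

    Returns None when no limit is present, including a bare '-l' with
    no number.
    """
    pattern = r'(?:^|\s)(?:-l\s*|--(?:load-average|max-load)[=\s])(\d+(?:\.\d+)?)'
    m = re.search(pattern, flags)
    return float(m.group(1)) if m else None

print(load_limit('j -l2'))
print(load_limit('make -j -l30'))
print(load_limit('make -j16'))
```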

@jdemeyer

comment:45

Replying to @jhpalmieri:

In the following lines from sage-spkg

# Handle -n, -t, -q options for recursive make 
# See Trac #12016. 
if echo "$MAKE $MAKEFLAGS -$MAKEFLAGS" |grep -e ' -[A-Za-z]*[qnt]' >/dev/null; then 
    if echo "$MAKE $MAKEFLAGS -$MAKEFLAGS" |grep -e ' -[A-Za-z]*q' >/dev/null; then 
        exit 1 
    else 
        exit 0 
    fi 
fi 

do we also need to handle the long versions? (I don't think so, but I thought I would ask.)

Well, this would only be needed if the user does something very silly like

MAKE="make --dry-run" ./sage -f ...

More importantly, on OpenSolaris, or at least on David Kirkby's machine hawk, the default 'grep' command doesn't take a -e option. Can we just omit it?

Probably yes, but it might be safer to replace the leading space by a [ ].
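
For anyone who wants to check the pattern without a shell, the test in sage-spkg can be mirrored in Python roughly as follows. This is a sketch for experimentation only: `recursive_make_exit_code` is a hypothetical name, the leading space is written as `[ ]` as suggested above, and long options such as --dry-run are deliberately not handled.

```python
import re

def recursive_make_exit_code(make, makeflags):
    """Mimic the sage-spkg check for the -n, -t and -q make options.

    Returns 1 if -q is present, 0 if only -n or -t is present, and None
    if none of them is found (the script would then carry on as usual).
    """
    # Same string the shell snippet pipes into grep.
    flags = f"{make} {makeflags} -{makeflags}"
    if re.search(r"[ ]-[A-Za-z]*[qnt]", flags):
        return 1 if re.search(r"[ ]-[A-Za-z]*q", flags) else 0
    return None

print(recursive_make_exit_code('make', 'n'))
print(recursive_make_exit_code('make', 'q'))
print(recursive_make_exit_code('make -j4', ''))
```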

@jdemeyer

Attachment: 12016-scripts.patch.gz

@jdemeyer

comment:46

Attachment: 12016-base.patch.gz

@jdemeyer

Changed dependencies from sage-4.8.alpha3 + #12096, #12137, #12138 to sage-4.8.alpha3 + #12096, #12137, #12138, #12139

@jhpalmieri
Member Author

comment:49

I'm happy with this except for a few small changes I want to make in sage-num-threads.py: we should use subprocess instead of popen, since popen has been deprecated. Also, we should catch errors if sysctl fails to run -- it's not present on all platforms. Finally, we might as well search for max-load in addition to load-average. See the referee patch. If you're happy with that, the whole thing can get a positive review.
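
The first two points can be illustrated with a short sketch. It is not the referee patch itself: `cpu_count_from_sysctl` is a hypothetical name, the `hw.ncpu` key is assumed (it exists on OS X and the BSDs), and the code uses the modern Python 3 `subprocess.run` interface rather than whatever was available to Sage at the time.

```python
import subprocess

def cpu_count_from_sysctl(command=("sysctl", "-n", "hw.ncpu")):
    """Return the CPU count reported by sysctl, or None if unavailable.

    Covers three failure modes: the program is missing (OSError), it
    exits with an error (CalledProcessError), or its output is not an
    integer (ValueError).
    """
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, check=True
        )
        return int(result.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None

# On platforms without sysctl (or without hw.ncpu) this prints None.
print(cpu_count_from_sysctl())
```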

@jhpalmieri
Member Author

Attachment: trac_12016-scripts-ref.patch.gz

scripts repo

@jdemeyer

comment:50

Looks good to me.

I am still slightly worried about the intermittent sage0.py doctest failures though...

@jdemeyer

Changed dependencies from sage-4.8.alpha3 + #12096, #12137, #12138, #12139 to sage-4.8.alpha4

@jdemeyer

Merged: sage-4.8.alpha5
