Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99127: Allow some features of syslog to the main interpreter only #99128

Merged
merged 23 commits into from
Nov 29, 2022

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Nov 5, 2022

@corona10
Copy link
Member Author

corona10 commented Nov 5, 2022

(.oss) ➜  cpython git:(gh-99127) ✗ ./python.exe -m test test_syslog -R 3:3
Raised RLIMIT_NOFILE: 256 -> 1024
0:00:00 load avg: 3.65 Run tests sequentially
0:00:00 load avg: 3.65 [1/1] test_syslog
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 704 ms
Tests result: SUCCESS

Copy link
Contributor

@ronaldoussoren ronaldoussoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to consider: what happens when multiple sub interpreters use the syslog extension. The current implementation always calls openlog(3) at least once before calling syslog(3), see the check in syslog_syslog_impl.

That results in interference between sub interpreters when in one of them the python code explicitly calls syslog.openlog and the other doesn't.

IMHO the check in syslog_syslog_impl is not necessary because it results in a call to openlog(3) with parameters that match the default behaviour of the syslog library. Removing it is a fairly minor behaviour change though (syslog.syslog will currently override any calls to openlog that were done in C before the first call to syslog.syslog and will stop doing that when removing the check in syslog_syslog_impl).

Modules/syslogmodule.c Outdated Show resolved Hide resolved
@corona10 corona10 restored the gh-99127 branch November 5, 2022 13:57
@corona10 corona10 reopened this Nov 5, 2022
@corona10
Copy link
Member Author

corona10 commented Nov 5, 2022

Something to consider: what happens when multiple sub interpreters use the syslog extension. The current implementation always calls openlog(3) at least once before calling syslog(3), see the check in syslog_syslog_impl.

Thanks that's why I got a headache which makes me close the issue but thanks to suggest the solution.

IMHO the check in syslog_syslog_impl is not necessary because it results in a call to openlog(3) with parameters that match the default behaviour of the syslog library.

Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right?
Please let me know if I understood it wrongly.

@corona10
Copy link
Member Author

corona10 commented Nov 5, 2022

Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right?

Or did you intended implicit call of openlog(3) while calling syslog(3)?

@corona10 corona10 requested a review from vstinner November 6, 2022 03:17
@corona10 corona10 added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 6, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @corona10 for commit 952ff46 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 6, 2022
@corona10
Copy link
Member Author

corona10 commented Nov 6, 2022

Failure tests from buildbot/PPC64LE Fedora were not related to this change.

@erlend-aasland erlend-aasland removed their request for review November 6, 2022 19:09
@erlend-aasland
Copy link
Contributor

Unfortunately I don't have time to review this right now. Maybe @serhiy-storchaka can have a look? He recently did some work on this module.

@ronaldoussoren
Copy link
Contributor

Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right?
Please let me know if I understood it wrongly.

At the C level calling openlog(3) before calling syslog(3) is not necessary, although the spec is not entirely clear about this. The linux manual page does clearly mention that openlog(3) is optional, and matches what I remember from older Unix systems.

The call to openlog in syslog_syslog_impl basically resets the default system state (use the process name for the ident and log using the LOG_USER facility). Leaving that out should therefore be harmless, but this is technically a user-visible change of behaviour:

  • C code calls openlog(...) with same parameters
  • Python code calls syslog.syslog

Currently the Python code resets the values set in the first step, with my proposal it would no longer do this. I'd consider this a good change, but there's bound to be someone affected by this.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stateless syslog C API doesn't seem to be designed to be consumed by multiple interpreters. I'm not sure that making S_ident_o and S_log_open global varibles per interpreter is safe.

Modules/syslogmodule.c Outdated Show resolved Hide resolved
Modules/syslogmodule.c Outdated Show resolved Hide resolved
Modules/syslogmodule.c Outdated Show resolved Hide resolved
{
_syslog_state *state = get_syslog_state(module);
if (state->S_log_open) {
closelog();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if interpreter A calls closelog() and interpreter B (same process!) still uses the Python syslog module, so after closelog() has been called?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POSIX syslog will transparently call openlog() with default values.

@corona10
Copy link
Member Author

This limitation seems to be new in Python 3.12: can you document this incompatible change?
Also, can you please document in the Python syslog module documentation the behavior/limitations of sub-interpreters?

Oh I missed this, I will push the documentation ASAP.

@corona10
Copy link
Member Author

@vstinner Updated! PTAL

@ericsnowcurrently
Would you like to take a look at the documentation update?
I am not sure it will be natural for native speakers :)

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm not good to review English, but the code LGTM :-)


* :func:`syslog.openlog` and :func:`syslog.closelog` are only available from the main interpreter not subinterpreter.
:func:`syslog.syslog` is only available to subinterpreters if :func:`syslog.openlog` was already called from the main interpreter.
(Contributed by Dong-hee Na in :gh:`99127`.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, I think that the "Porting to Python 3.12" is a better section for these changes. The "syslog" section is more for new features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can syslog.syslog only be used when the main interpreter has called syslog.openlog? That's a restriction that is not present in the underlying library.

Copy link
Member Author

@corona10 corona10 Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronaldoussoren cc @ericsnowcurrently
This might be the discussion part for this change, from the view of managing for open/close state, subinterpreter should not intervene in changing the state of open/close. And this is why syslog.openlog and syslog.closelog are not available to the subinterpreter side. so if the subinterpreter can call syslog.syslog() without involving the main interpreter, it means that subinterpreter intervenes in the state of open/close so I think that it should be prevented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling openlog (and closelog) is optional in syslog(3) with sensible defaults. There is no need to call openlog at all unless a program wants to change some of the default settings, and that could be done outside of the Python world.

For a use-case like embedding Python in a web server (like mod_python it is more likely that the embedding program does the syslog setup than that this is done in Python code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling openlog (and closelog) is optional in syslog(3) with sensible defaults. There is no need to call openlog at all unless a program wants to change some of the default settings, and that could be done outside of the Python world.

In summary for your suggestion might be:

  • syslog.syslog(): Allow for both the main interpreter and subinterpreters at any condition.
  • syslog.openlog()/closelog(): Allow only for the main interpreter

But if the subinterpreter calls syslog.syslog() after the main interpreter calls the syslog.closelog(), the subinterpreter will neutralize what the main interpreter did. Who will close the syslog.syslog() that is opened by subinterpreter?

cc @ericsnowcurrently

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two factors here:

  1. most process-global resources should be managed by the app
  2. we are storing a str object in a static variable (S_ident_o), which is tricky when multiple interpreters are involved

Regarding the first one, this is perhaps something that we need not be so strict about. In Python, "app" means the __main__ module, running in the main interpreter (starting in the main thread). This implies that the global resources modifed via syslog.openlog() and syslog.closelog() should only be managed in the main interpreter. This would be similar to how signal handlers are supported only in the main thread of the main interpreter. However, this could be an overly strict interpretation for syslog. It makes sense to me, but "consenting adults", etc.

Regarding the second factor, we have been (and still are) working hard to get interpreters isolated from one another, especially to avoid problems that arise when they step on each others' toes. [1] (Isolation also provides new implementation opportunities.) For example, an object created in one interpreter should never be modified (even just the refcount) by another interpreter. [2] In the case of syslog, the object in the static variable (set in syslog.openlog()) should be managed only by the interpreter that owns it. Here are ways to enforce this:

  • the interpreter that calls openlog() should be the only one allowed to call closelog() (simplest way is to restrict to the main interpreter)
  • if another interpreter calls closelog(), then it would call Py_AddPendingCall() to have the owning interpreter decref the string

Obviously we went with the simplest form of the first one.

On top of the existing isolation concerns, there are additional ones that become more complicated under a per-interpreter GIL due to thread-safety. These are a non-issue if we restrict to a per-interpreter GIL.

[1] Such problems would be amplified by a per-interpreter GIL (see PEP 684, which has not been accepted yet), but they still exist even now.
[2] Except for a small set of objects managed by the process-global runtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, if the concern is backward compatibility, we can easily apply the restriction only if the subinterpreter was created via Py_NewInterpreter() (or, under a per-interpreter GIL, is otherwise sharing the GIL with the main interpreter).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling openlog (and closelog) is optional in syslog(3) with sensible defaults. There is no need to call openlog at all unless a program wants to change some of the default settings, and that could be done outside of the Python world.

For a use-case like embedding Python in a web server (like mod_python it is more likely that the embedding program does the syslog setup than that this is done in Python code.

FWIW, both concerns are valid, but pre-date this PR.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some recommendations for the docs.

Doc/library/syslog.rst Outdated Show resolved Hide resolved
Doc/library/syslog.rst Outdated Show resolved Hide resolved
Doc/library/syslog.rst Outdated Show resolved Hide resolved
Doc/whatsnew/3.12.rst Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
@corona10
Copy link
Member Author

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@ericsnowcurrently, @vstinner: please review the changes made to this pull request.

@vstinner
Copy link
Member

@ronaldoussoren:

Why can syslog.syslog only be used when the main interpreter has called syslog.openlog? That's a restriction that is not present in the underlying library.

The limitation comes from Python which stores a Python object in a global variable:

/*  only one instance, only one syslog, so globals should be ok  */
static PyObject *S_ident_o = NULL;                      /*  identifier, held by openlog()  */
static char S_log_open = 0;

We must hold a strong reference because we pass a pointer into the internal UTF-8 encoded flavor of the Unicode string, PyUnicode_AsUTF8():

    /* At this point, ident should be INCREF()ed.  openlog(3) does not
     * make a copy, and syslog(3) later uses it.  We can't garbagecollect it.
     * If NULL, just let openlog figure it out (probably using C argv[0]).
     */
    if (ident) {
        ident_str = PyUnicode_AsUTF8(ident);
        if (ident_str == NULL) {
            Py_DECREF(ident);
            return NULL;
        }
    }

I understand that openlog() doesn't copy the ident string. Maybe this assumption is wrong and the whole change is not needed. I don't know. I don't want to check the implementation of openlog() on all supported platforms, and run a real test.

S_ident_o global variable was added in 1998 by commit ae94cf2. Previously, it was a static PyObject *ident_o = NULL; in openlog() added in 1995 by commit c1822a4.

Ok, I'm curious and I looked into the GNU glibc: https://sourceware.org/git/?p=glibc.git;a=blob;f=misc/syslog.c;h=f67d4b58a448cabdf63f629f72a9df6e009286cf;hb=HEAD#l295

The ident string is not copied, only the pointer is copied (LogTag = ident;). So if the pointer becomes a dangling pointer, syslog() will just crash when reading ident (LogTag)

@vstinner
Copy link
Member

One problem with sub-interpreters is that Python objects should not travel from one interpreter to another. But static PyObject *S_ident_o can become a memory copy of the UTF-8 encoded string instead, so we don't pass objects anymore.

For closelog(), I proposed earlier a counter increased by syslog.openlog() and decreased by syslog.closelog(). The C function closelog() would only be called when the counter reach zero. One problem is if two interepreters use different identifiers, the latest call to syslog.openlog() wins: it overrides the previous ident (and log level).

os.chdir() has a similar problem: it's a process-wide parameter affecting all threads and all interpreters. But it's a simpler problem. The current working directory is stored in the kernel, not in Python, and no resource have to be released explicitly at exit :-)

@serhiy-storchaka
Copy link
Member

Agree, keeping a copy of ident in the raw dynamically allocated array will resolve the issue.

But I do not think we need a reference counter. Just call closelog() and clear the saved ident string. If your program calls syslog.closelog() multiple times, it perhaps has a trouble. But at least this will not cause a crash.

@ericsnowcurrently
Copy link
Member

What about keeping a static buffer and copying the str object's C string into it, then giving the buffer to openlog()? Then there's no deallocation to fuss with, we can drop S_log_open and both syslog.syslog() and syslog.closelog() become simpler. I'm pretty sure the only downside is that we would introduce a hard-limit on the length of the identifier.

#define MAX_IDENT 256
static char S_ident[MAX_IDENT+1];

...

static PyObject *
syslog_openlog_impl(...)
{
    ....
    const char *ident_str = NULL;
    if (ident != NULL) {
        Py_ssize_t size = PyUnicode_GET_LENGTH(ident);
        if (size > MAX_IDENT) {
            PyErr_SetString(...);
            return NULL;
        }
        else if (size > 0) {
            strncpy(S_ident, PyUnicode_AsUTF8(ident), size);
            ident_str = S_ident;
        }
    }
    openlog(ident_str, logopt, facility);
    Py_RETURN_NONE;
}

static PyObject *
syslog_syslog_impl(...)
{
    if (PySys_Audit("syslog.syslog", "is", priority, message) < 0) {
        ...
    }

#ifdef __APPLE__
    // gh-98178: On macOS, libc syslog() is not thread-safe
    syslog(priority, "%s", message);
#else
    Py_BEGIN_ALLOW_THREADS;
    syslog(priority, "%s", message);
    Py_END_ALLOW_THREADS;
#endif
    Py_RETURN_NONE;
}

static PyObject *
syslog_closelog_impl(PyObject *module)
{
    ...
    closelog();
    Py_RETURN_NONE;
}

There would still be a race under per-interpreter GIL, though, if subinterpreters are allowed to call syslog.openlog() and we don't have a process-global lock for the module.

@corona10
Copy link
Member Author

@ronaldoussoren @vstinner @ericsnowcurrently

As the first step for the subinterpreter project, I would like to suggest maintaining restrictions for subinterpreter as the current PR.
If there are requests which require allowing all of the features from subinterpreter side, let's discuss the way to support it.
Eric's suggestion might be a good way to solve it but let's delay the decision for this moment.

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This is a good baseline. I'm okay with circling back with improvements (or loosening restrictions) afterward.

@corona10 corona10 merged commit 8bb2303 into python:main Nov 29, 2022
@corona10 corona10 deleted the gh-99127 branch November 29, 2022 22:58
carljm added a commit to carljm/cpython that referenced this pull request Dec 1, 2022
* main: (112 commits)
  pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895)
  pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613)
  Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917)
  pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916)
  pythongh-89189: More compact range iterator (pythonGH-27986)
  bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491)
  pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906)
  pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850)
  pythongh-99845: Use size_t type in __sizeof__() methods (python#99846)
  pythonGH-99877)
  Fix typo in exception message in `multiprocessing.pool` (python#99900)
  pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869)
  pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893)
  pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825)
  pythonGH-81057: remove static state from suggestions.c (python#99411)
  Improve zip64 limit error message (python#95892)
  pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591)
  pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128)
  pythongh-82836: fix private network check (python#97733)
  Docs: improve accuracy of socketserver reference (python#24767)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants