Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do NOT use logger .disabled attribute until python 3.8 #389

Open
Mityuha opened this issue Jul 17, 2020 · 2 comments
Open

Do NOT use logger .disabled attribute until python 3.8 #389

Mityuha opened this issue Jul 17, 2020 · 2 comments

Comments

@Mityuha
Copy link

Mityuha commented Jul 17, 2020

Describe the issue

There is a method called isEnabledFor() inside python logging package's Logger class. Since python 3.8 this method takes into account self.disabled attrubute:
https://github.com/python/cpython/blob/3.8/Lib/logging/__init__.py#L1681
But before python 3.8 (e.g. 3.6) it does not:
https://github.com/python/cpython/blob/3.6/Lib/logging/__init__.py#L1681
What all written above does mean for vertica-python library?
Cursor iterate() and as a result fetchall() methods use fetchone() method under the hood. So the latter calls connection's read_message() method.
By default, there are no options to connection options, so the next line inside connection ctor executes:

self._logger.disabled = True

And the next line inside read_message() method, that simple does log, works good since python 3.8 (I suggest):

self._logger.debug('<= %s', message)

because it almost immediately return.
But! We still remember, that before python 3.8 Logger's disable attribute does not make sense inside Logger's isEnabledFor().
It does take into account in other place, e.g. for python 3.6:
https://github.com/python/cpython/blob/3.6/Lib/logging/__init__.py#L1453
But there is a huge overhead before this line executes. The cProfile log (sorted by cumtime)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4164489   11.793    0.000  167.207    0.000 connection.py:597(read_message)
        1    0.710    0.710  160.348  160.348 cursor.py:302(fetchall)
  4164483    1.966    0.000  159.638    0.000 cursor.py:364(iterate)
  4164483    5.852    0.000  157.672    0.000 cursor.py:262(fetchone)
  4164513    4.173    0.000   83.062    0.000 __init__.py:1286(debug)
  4164519    4.590    0.000   75.693    0.000 __init__.py:1421(_log)
  4164519    2.885    0.000   58.682    0.000 __init__.py:1406(makeRecord)
  4164519   22.484    0.000   55.797    0.000 __init__.py:251(__init__)
 12493494   11.139    0.000   52.267    0.000 connection.py:639(read_bytes)
...

We see that logging takes 83 seconds, albeit it disabled.
And if I pass log_level=logging.WARNING (best with log_path=None) to Connection options, profiling now looks like this:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4164489    9.334    0.000   70.271    0.000 connection.py:597(read_message)
        1    0.521    0.521   65.378   65.378 cursor.py:302(fetchall)
  4164483    1.508    0.000   64.857    0.000 cursor.py:364(iterate)
  4164483    4.869    0.000   63.349    0.000 cursor.py:262(fetchone)
 12493494    8.735    0.000   42.143    0.000 connection.py:639(read_bytes)
  4164501   24.965    0.000   24.965    0.000 {method 'recv' of '_socket.socket' objects}
...

As we see, no logging overhead. It looks nice.

To reproduce

with vertica_python.connect(**conn_info) as conn:
    cursor = conn.cursor()
    cursor.execute(sql)
    all_data = cursor.fetchall()

Workaround (before python 3.8)

with vertica_python.connect(**conn_info, log_level=logging.WARNING, log_path=None) as conn:
    cursor = conn.cursor()
    cursor.execute(sql)
    all_data = cursor.fetchall()

Your version of the Python

python3 --version
Python 3.6.9

Your version of vertica-python

python3 -m pip show vertica-python
Name: vertica-python
Version: 0.10.4
Summary: A native Python client for the Vertica database.
Home-page: https://github.com/vertica/vertica-python
Author: Justin Berka, Alex Kim, Siting Ren
Author-email: justin.berka@gmail.com, alex.kim@uber.com, sitingren@hotmail.com
License: Apache License 2.0
Location: /home/dmitry/projects/cron-jobs/env/lib/python3.6/site-packages
Requires: python-dateutil, six
Required-by:

Additional context

Logging disable attribute
https://bugs.python.org/issue36318
Possibly connected with #146

@sitingren
Copy link
Member

@Mityuha Thanks for your report. I did some performance tests as you suggested on Py2.7, Py3.5, Py3.6, Py3.7, Py3.8. What I see is the performance improvement comes from the usage of cache since Py3.7. Not checking self.disabled attrubute in Py3.8.

cumulative time of logging/__init__.py(debug) for Py2.7, Py3.5, Py3.6, Py3.7, Py3.8 = 2 : 2 : 2 : 1 : 1

Passing log_level=logging.WARNING (best with log_path=None) to Connection options will reduce the cumulative time for about 10%. So Py2.7, Py3.5, Py3.6 still take much longer time than Py3.7 and Py3.8.

I also tried other two solutions for disable logging mentioned in https://bugs.python.org/issue36318. The performance (for all python versions) is even worse than using disable attribute.

So I think vertica-python is doing the right thing so far. There is no performance enhancement we can do.

@Mityuha
Copy link
Author

Mityuha commented Jul 29, 2020

@sitingren Thank you for your answer. It seems you are right about performance improvements since py3.7, but not since py3.8. Can you clarify some things?

  1. You say that "Passing log_level=logging.WARNING (best with log_path=None) to Connection options will reduce the cumulative time for about 10%". But my performance tests show >50% time reducing with 4164513 ncalls to logging. Can you attach some prove (source code, database scheme, record number to fetch, actual number of ncalls to logging etc)?
  2. You say "So Py2.7, Py3.5, Py3.6 still take much longer time than Py3.7 and Py3.8". Perhaps, you are right, because, there is nothing faster that just simple return call just like in Py3.7 and Py3.8. But everything is relative and I think that it is a thing. What I try to say is maybe setting logging level to logging.WARNING (e.g.) is still takes much longer that simple .disable=True since Py3.7, but it takes much less time than, for example, read_bytes() function from connection.py for all python version before Py3.7. See my logs. And as for point 1, can you show your performance tests' results?
  3. "The performance (for all python versions) is even worse than using disable attribute". Please, attach source code.
  4. "So I think vertica-python is doing the right thing so far". If I understood you right, you say something like "Yes, there are performance issues before Py3.7, just use python versions since 3.7". Correct me please if I'm wrong.

So the last but not least thing I wanted to mention is very similar to pp. 1 and 2. The idea is that vertica is performed "as storage platform designed to handle large volumes of data". The key is large volumes. You mentioned, that there is no performance improvements you can do. I totally agree that it might be for not large volume of data. Is it correct about such amount of data for which vertica stated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants