
Text plugin markdown encoding is slow #5369

Closed
stephanwlee opened this issue Oct 9, 2021 · 4 comments
Labels: plugin:text, theme:performance (Performance, scalability, large data sizes, slowness, etc.), type:bug

Comments

stephanwlee (Contributor) commented Oct 9, 2021

TensorBoard currently, in its Python process, encodes a text tensor into Markdown (and sometimes into a table).

Recently, we noticed that this is notably slower on large tensors, which we initially attributed largely to the markdown parser. While debugging, we found one particular hotspot. Hint: it is not the markdown parser!

Test script

Faster

import time
from tensorboard import plugin_util

for i in range(5000, 50000, 5000):
    start = time.time()
    # Pass a `str` to exercise the fast path.
    plugin_util.markdown_to_safe_html(
        ("[" + " ".join([str(i) for i in range(i)]) + "]")
    )
    end = time.time()
    print("Converting %d length string took %f" % (i, end - start))

# 
# Converting 5000 length string took 0.016349
# Converting 10000 length string took 0.033403
# Converting 15000 length string took 0.053684
# Converting 20000 length string took 0.070949
# Converting 25000 length string took 0.090817
# Converting 30000 length string took 0.108568
# Converting 35000 length string took 0.129561
# Converting 40000 length string took 0.147728
# Converting 45000 length string took 0.169784

Slower

import time
from tensorboard import plugin_util

for i in range(5000, 50000, 5000):
    start = time.time()
    # Pass `bytes` to exercise the slow path.
    plugin_util.markdown_to_safe_html(
        ("[" + " ".join([str(i) for i in range(i)]) + "]").encode("utf8")
    )
    end = time.time()
    print("Converting %d length string took %f" % (i, end - start))

# 
# Converting 5000 length string took 0.110732
# Converting 10000 length string took 0.428144
# Converting 15000 length string took 0.969050
# Converting 20000 length string took 1.694072
# Converting 25000 length string took 2.639599
# Converting 30000 length string took 3.787034
# Converting 35000 length string took 5.172249
# Converting 40000 length string took 6.771216
# Converting 45000 length string took 8.655072
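
One way to dig into where that time goes is the standard-library profiler. Here is a minimal sketch (illustrative only; not necessarily how the per-stage numbers below were collected), assuming the same bytes payload as the "Slower" script:

import cProfile
import pstats

from tensorboard import plugin_util

# Build the same kind of payload as the "Slower" script, at the 45000 mark,
# and pass it as bytes to exercise the slow path.
payload = ("[" + " ".join(str(n) for n in range(45000)) + "]").encode("utf8")

profiler = cProfile.Profile()
profiler.enable()
plugin_util.markdown_to_safe_html(payload)
profiler.disable()

# Print the 15 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)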

When inspected further, it looks like one innocuous line contributes most of the time spent in the bytes mode.

source = source_decoded.replace("\x00", "")

For the 45000-length string specifically, about 7.8s was spent on just the replacement, which is quite surprising to me.

Decode: 0.140072
Plus Replace: 7.920270
Plus Convert: 7.945538
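
As a rough sketch of how a cumulative breakdown like this can be collected outside of plugin_util (assuming the bytes path is: decode to UTF-8, strip null characters as in the quoted line, then convert the cleaned string), each step can be timed directly; the absolute numbers will depend heavily on the installed markdown version:

import time

from tensorboard import plugin_util

payload = ("[" + " ".join(str(n) for n in range(45000)) + "]").encode("utf8")

t0 = time.time()
source_decoded = payload.decode("utf-8")     # decode bytes -> str
t1 = time.time()
source = source_decoded.replace("\x00", "")  # the quoted hotspot line
t2 = time.time()
plugin_util.markdown_to_safe_html(source)    # approximates the convert step
t3 = time.time()

# Cumulative timings, in the same format as the breakdown above.
print("Decode: %f" % (t1 - t0))
print("Plus Replace: %f" % (t2 - t0))
print("Plus Convert: %f" % (t3 - t0))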

In fact, when I removed the replace line, I was able to restore comparable performance.

Now, this is NOT the behavior I see when testing against the hosted service, so there might be something I am missing with this test setup, or it may be due to version skew in the markdown library we are using.

Interestingly, when I pinned to markdown==2.6.9, it exhibited wildly different behavior: the markdown conversion itself took a very long time while virtually no time was spent on the replace.

Converting 5000 length string took 6.921765
Converting 10000 length string took 29.544734
Converting 15000 length string took 77.178752
Converting 20000 length string took 147.837926
Converting 25000 length string took 229.252741
Converting 30000 length string took 338.251870
Converting 35000 length string took 470.788789
Converting 40000 length string took 624.641989
Converting 45000 length string took 835.322498

At this point, I suspect that the old version of the markdown library has a performance problem of its own, and that it may be patching the replace function in a way that caused these unwanted side effects; more updates to follow after further experiments.

Now, I can no longer reproduce the issue where Python took a long time replacing null characters in the string.

At this point, perhaps we can simply conclude that the performance issue can be solved by upgrading the markdown parser library internally.
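
Separately from upgrading the parser, the str-vs-bytes comparison above suggests a simple caller-side mitigation: decode the tensor bytes up front so markdown_to_safe_html only ever sees str, and its bytes-handling path (including the null-byte replace) is never exercised. A minimal sketch with a hypothetical helper name:

from tensorboard import plugin_util

def render_text_summary(raw):
    """Render a text summary payload (bytes or str) to safe HTML.

    Decoding up front keeps markdown_to_safe_html on the fast `str` path
    measured in the "Faster" script above.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return plugin_util.markdown_to_safe_html(raw)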

UsharaniPagadala commented

@stephanwlee
Closing this issue as the PR is already merged. Thanks.

nfelt (Contributor) commented Oct 27, 2021

This issue was intentionally left open because the PR doesn't fully address the issue; it is only a workaround.

@nfelt reopened this Oct 27, 2021
@nfelt added the plugin:text, theme:performance (Performance, scalability, large data sizes, slowness, etc.), and type:bug labels and removed the type:feature label Oct 27, 2021
stephanwlee (Contributor, Author) commented

FWIW, I am quite okay with closing the issue since it is less severe in OSS when using the newer version of the python-markdown package.

nfelt (Contributor) commented Oct 27, 2021

Well, we don't actually require a newer version of the package though, so this still seems like an issue for OSS users. But OK, if you are fine with closing it, we can close it.

@nfelt closed this as completed Oct 27, 2021