Ignore markdown codeblocks for tags #66

Merged
dullage merged 3 commits into dullage:master on May 21, 2023

Conversation

elmodor
Contributor

@elmodor elmodor commented May 8, 2023

Hello,

I took a quick look at how to exclude tags inside markdown codeblocks.
This should help with #35.

I'm using a second regex to remove all markdown codeblocks.
However, this will also remove the markdown from the return value of extract_tags.

I tested the code with various markdown codeblocks. The tag #TagIgnore does not show up when searching for tags:tagignore, while tags:tagok does show up.
Tested with the following file:

Testing

Testing more
#TagIgnore me please
and this one too #TagIgnore

#TagOk

More to #TagIgnore
One more to #TagIgnore
#TagIgnore

What is going on? #TagOk
EOF

@dullage
Owner

dullage commented May 9, 2023

Thanks for giving this a go. When I get some time I'll take a look and run some tests 👍

@dullage
Owner

dullage commented May 13, 2023

Hi @elmodor, I've had a look at this this morning and have some thoughts.

My first thought is that MARKDOWN_RE should probably be named CODEBLOCK_RE and content_ex_markdown should be content_ex_codeblocks. A small point, but it will help with the diagram below.

I also think that codeblock content should still be indexed and therefore searchable, just not as tags. So I think we need to do something along these lines:

[image: diagram of the proposed flow, not preserved]

@elmodor
Contributor Author

elmodor commented May 13, 2023

I agree, I should have named it codeblock obviously.

Regarding your diagram, I think there should not be a line from tags to content_ex_tags?
As far as I can see, we would use content_ex_codeblocks to obtain the tags but use content to obtain content_ex_tags, just as before?

content_ex_codeblock = re.sub(cls.CODEBLOCK_RE, '', content)
_, tags = re_extract(cls.TAGS_RE, content_ex_codeblock)
content_ex_tags, _ = re_extract(cls.TAGS_RE, content)
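
For anyone following along, here is a minimal, self-contained sketch of that three-step flow. The two patterns and the `re_extract` stand-in below are simplified assumptions made for illustration; they are not the regexes or the helper from the flatnotes codebase. Lower-casing the tags is also an assumption.

```python
import re

# Assumed, simplified patterns (NOT the project's real regexes):
# fenced blocks delimited by three backticks, or single-backtick inline code.
CODEBLOCK_RE = re.compile(r"`{3}.*?`{3}|`[^`\n]*`", re.DOTALL)
# A hash at the start of the text or after whitespace, followed by word characters.
TAGS_RE = re.compile(r"(?<!\S)#\w+")


def re_extract(pattern, string):
    """Hypothetical stand-in: return (string with matches removed, list of matches)."""
    matches = []

    def collect(match):
        matches.append(match.group())
        return ""  # drop the matched text from the output

    return pattern.sub(collect, string), matches


def extract_tags(content):
    # 1. Strip codeblocks so hashtags inside them are never seen as tags.
    content_ex_codeblock = CODEBLOCK_RE.sub("", content)
    # 2. Collect tags from the codeblock-free text only.
    _, tags = re_extract(TAGS_RE, content_ex_codeblock)
    # 3. Remove tag-like tokens from the ORIGINAL content, so the rest of
    #    each codeblock is still part of the indexed content.
    content_ex_tags, _ = re_extract(TAGS_RE, content)
    return content_ex_tags, {tag[1:].lower() for tag in tags}


fence = "`" * 3  # built this way to avoid a literal fence inside this example
note = f"Intro #TagOk\n{fence}\n#TagIgnore in code\n{fence}\nEnd"
content_ex_tags, tags = extract_tags(note)
print(tags)  # {'tagok'}
```

In this sketch the codeblock text "in code" stays in `content_ex_tags`, but the `#TagIgnore` token itself is removed in step 3, so it ends up neither in the content nor in the tags.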

@dullage
Owner

dullage commented May 13, 2023

I had in mind to use tags to know what to remove from content to get content_ex_tags. We could, as you propose, just use the TAGS_RE again but this would mean that comments in codeblocks wouldn't get indexed at all (either as content or tags).

Of course, I imagine the most efficient way of doing all of this would be to update TAGS_RE to simply ignore anything within a codeblock. Then we'd only need to run a regex once (as it is now). My concern is that TAGS_RE would become unwieldy (even more so than it is now). So splitting up the process as proposed above may be the better (more understandable) way of doing things, even if it is not the most efficient.
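
To make the single-pass idea concrete, one common trick is an alternation whose first branches consume codeblocks without capturing anything, and whose last branch captures the tag. The sketch below uses a deliberately simplified pattern that is an assumption for illustration, not the project's TAGS_RE, and it only finds tags; it does not produce content_ex_tags.

```python
import re

# Hypothetical combined pattern (NOT the project's TAGS_RE):
#   branch 1: fenced codeblock (three backticks), consumed, nothing captured
#   branch 2: inline code (single backticks), consumed, nothing captured
#   branch 3: a hashtag outside code, tag name captured in group 1
COMBINED_RE = re.compile(r"`{3}.*?`{3}|`[^`\n]*`|(?<!\S)#(\w+)", re.DOTALL)


def find_tags_single_pass(content):
    # Matches that came from a codeblock branch have group(1) == None; skip them.
    return [
        match.group(1)
        for match in COMBINED_RE.finditer(content)
        if match.group(1) is not None
    ]


fence = "`" * 3  # built this way to avoid a literal fence inside this example
note = f"Intro #TagOk\n{fence}\n#TagIgnore\n{fence}\nuse `#TagIgnore` inline #Another"
print(find_tags_single_pass(note))  # ['TagOk', 'Another']
```

Because the regex engine scans left to right and tries the branches in order, a codeblock is swallowed whole before the tag branch can look inside it. The cost is exactly the readability concern raised above: every extra rule for what counts as a tag has to live in the same pattern.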

@elmodor
Contributor Author

elmodor commented May 20, 2023

I wasn't able to get a one-line TAGS_RE to work, so I updated the PR to my last suggested code change.
If someone can wrap their head around the regex maybe they will be able to figure it out :D

@dullage
Owner

dullage commented May 21, 2023

The changes look good. When I get a chance I just want to run a couple of tests on my dataset. I'm interested to see if there's much of a change to the indexing speed and I'm also keen to see the difference in tags before and after this change.

Owner

@dullage dullage left a comment

Please bump the INDEX_SCHEMA_VERSION constant to 4. This will force an index rebuild when flatnotes is updated.
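
As a generic illustration of why bumping a version constant forces a rebuild, here is a minimal sketch. The marker file, the function names, and the storage format are all assumptions made for this example; how flatnotes actually stores and checks its schema version is not shown in this thread.

```python
import tempfile
from pathlib import Path

INDEX_SCHEMA_VERSION = "4"  # bumped whenever the indexed fields change meaning
MARKER_NAME = "schema_version"  # hypothetical marker file name


def needs_rebuild(index_dir: Path) -> bool:
    """Rebuild if no version was recorded or the recorded one is stale."""
    marker = index_dir / MARKER_NAME
    if not marker.exists():
        return True
    return marker.read_text().strip() != INDEX_SCHEMA_VERSION


def record_schema_version(index_dir: Path) -> None:
    """Call after a successful rebuild so the next start-up can skip it."""
    (index_dir / MARKER_NAME).write_text(INDEX_SCHEMA_VERSION)


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp)
    first_run = needs_rebuild(index_dir)   # True: nothing recorded yet
    record_schema_version(index_dir)
    second_run = needs_rebuild(index_dir)  # False: recorded version matches
```

An index written by an older release would carry the previous version, so the comparison fails once and the tags get re-extracted with the new codeblock-aware logic.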

@dullage
Owner

dullage commented May 21, 2023

Index rebuild times seem unchanged and having compared my list of indexed tags before and after the changes, I can confirm the hashtags found in code blocks are no longer indexed. Just one change requested and then this is ok to merge.

@elmodor elmodor requested a review from dullage May 21, 2023 16:08
@dullage dullage merged commit 9d24d42 into dullage:master May 21, 2023
@elmodor elmodor deleted the tags_ignore_markdown branch May 25, 2023 16:41
Gedulis12 pushed a commit to Gedulis12/flatnotes that referenced this pull request Aug 7, 2023
* Ignore markdown codeblocks for tags

* Changed `content_ex_tags` to use `content` instead of `content_ex_codeblock`

* Bumped `INDEX_SCHEMA_VERSION`