📦[WIP]: generate index from HTML meta #3

Merged: 7 commits into master from html-parse-meta, Apr 20, 2020

hypervillain (Collaborator) commented Apr 13, 2020

This PR replaces the currently generated searchable text with an array of weighted searchable items. It adds these keys to each item:

  • title, extracted from the <title> tag
  • description, extracted from the description <meta> tag
  • keywords, extracted from the keywords <meta> tag and transformed to an array
  • headings, an array of headings found in document
  • text, a lightweight version of previous html-to-text output
  • image, image url extracted from og:image meta
  • from, an array of folders and subfolders the file is stored in

These keys are then ordered by weight (see Fuse weighted search) and passed to the generated function. Note that these are default values and should be overridable by the user.
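
To make the item shape concrete, here is a rough sketch of one index item plus a Fuse-style weighted key list. Everything in it is an assumption for illustration: the `buildItem` and `mergeKeys` helpers are hypothetical, and the weights are made up, not the defaults this PR ships. The only thing taken from Fuse itself is the `{ name, weight }` key format.

```javascript
// Hypothetical helper: assembles one searchable item from values that were
// already extracted from an HTML file. Field names follow the list above.
function buildItem({ title, description, keywords, headings, text, image, from }) {
  return {
    title,
    description,
    // Keywords arrive as one comma-separated string and become an array.
    keywords: (keywords || '')
      .split(',')
      .map((k) => k.trim())
      .filter(Boolean),
    headings,
    text,
    image,
    from,
  };
}

// Illustrative weights only (NOT the plugin's real defaults).
// Fuse.js accepts weighted keys as { name, weight } objects.
const defaultKeys = [
  { name: 'title', weight: 0.3 },
  { name: 'description', weight: 0.2 },
  { name: 'keywords', weight: 0.2 },
  { name: 'headings', weight: 0.2 },
  { name: 'text', weight: 0.1 },
];

// Assumed override strategy: a user-supplied key replaces the default
// entry that has the same name; unknown names are appended.
function mergeKeys(defaults, overrides = []) {
  const byName = new Map(defaults.map((k) => [k.name, k]));
  for (const k of overrides) byName.set(k.name, k);
  return [...byName.values()];
}
```

With Fuse.js installed, the merged list would be passed as the `keys` option, along the lines of `new Fuse(items, { keys: mergeKeys(defaultKeys, userKeys) })`.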

Added a config option: an ignore list of paths to be excluded from search. I'm also planning to use the from key to filter results, so that users can build features like "search in [...]".
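
A minimal sketch of how the from key could work, assuming it is derived from each file's path relative to the published directory. Both function names are hypothetical and the filter is just one possible way to build a "search in [...]" feature, not something this PR implements.

```javascript
// Hypothetical: derive the `from` array from a relative file path,
// e.g. 'blog/2020/post.html' becomes ['blog', '2020'].
function foldersFromPath(relativePath) {
  // Accept both '/' and '\' separators and ignore empty segments.
  const segments = relativePath.split(/[\\/]+/).filter(Boolean);
  return segments.slice(0, -1); // drop the file name itself
}

// Hypothetical "search in [...]" filter: keep only the items stored
// somewhere under the given folder name.
function searchIn(items, folder) {
  return items.filter((item) => item.from.includes(folder));
}
```

A file at the root of the site ends up with an empty from array, so it never matches a folder filter.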

Directory read is now handled by globby.
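
For illustration, the ignore list could be folded into the globby call as negated patterns. This is a sketch under assumptions: `buildPatterns` is a hypothetical helper, `'**/*.html'` is an assumed base pattern, and `publishDir` is a placeholder name. What it relies on from globby itself is that patterns starting with `!` are treated as negations and that `cwd` is an accepted option.

```javascript
// Hypothetical helper: turn the ignore list into glob patterns.
// globby treats patterns that start with '!' as negations.
function buildPatterns(ignore = []) {
  return ['**/*.html', ...ignore.map((p) => `!${p}`)];
}

// With globby installed (a CommonJS release), the directory read could
// then look roughly like this:
//
//   const globby = require('globby');
//   const files = await globby(buildPatterns(['drafts/**']), { cwd: publishDir });
```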

About replacing html-to-text with retext: I mainly used it to keep the unifiedJS processor approach, but also because it could open the plugin to more features, e.g. retext-keywords. I don't know exactly what the pros of html-to-text are, so maybe it does things better. Let me know!

Not everything has been carefully tested yet, but you can preview the result right away by running the (light) test suite 😊

swyxio (Owner) commented Apr 13, 2020

super fancy! i like it a lot.

is this good to merge or you still have some stuff todo? pls feel free to merge on your own but happy to give any opinions u want. its your project now as well as mine. you can also publish on npm now

hypervillain (Collaborator, Author) commented

Awesome, that's really nice of you 🙌
I'll clean things up and hit merge.

Also, 2 last small things:

  • having some issue with 2FA atm, so I'll (kindly) ask you to publish on NPM
  • do you know what's up with the Netlify build issue?

swyxio (Owner) commented Apr 13, 2020

ah. in order to get this to work i had to patch read-dir using patch-package

(i documented this in the readme i think, somewhere in the project)

read-dir was a dep of copy-template-dir. which isnt well maintained. i was already leaning towards ripping it out but it does exactly what i needed it to do. up to you on how you want to treat this dependency.. we might just need a better alternative

hypervillain (Collaborator, Author) commented

Hey 👋

I think I'm done with the PR, @sw-yx!
Since your last read, I

  • replaced read-dir with globby
  • got rid of patch-package
  • renamed the ignore argument to exclude
  • added an image field to each item created
  • found a bug with Netlify build and reported it here

Feel free to review and merge if you will!
Hugo

@swyxio swyxio merged commit 49f6e70 into master Apr 20, 2020
@swyxio swyxio deleted the html-parse-meta branch April 20, 2020 08:02
swyxio (Owner) commented Apr 20, 2020

published as v0.1.0! havent tested myself
