Make org format customizable #8

Open
ag91 opened this issue Oct 1, 2017 · 3 comments

@ag91

ag91 commented Oct 1, 2017

Hello,

Thanks very much for the great package: I use it with org-feed and finally getting feeds content is much more reliable!

About the issue: currently downloading a link results in something like

    * [[link][title]] :website:
    timestamp
    ** Article
    contents

For my use case I do not need the ** Article heading.
This is enforced in org-web-tools--url-as-readable-org in this bit here:

...
    (with-temp-buffer
      (org-mode)
      ;; Insert article text
      (insert converted)
      ;; Demote in-article headings
      (org-web-tools--demote-headings-below 2)
      ;; Insert headings at top
      (goto-char (point-min))
      (insert "* " link " :website:" "\n\n"
              timestamp "\n\n"
              "** Article" "\n\n")
      (buffer-string))))

I have the feeling that this can be abstracted in a function format article-contents which defaults to your template, but that can be configured by the user. Something along the lines of:

...
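    (my/format-article link timestamp converted))))

;; Named `my/format-article' (a placeholder name) because `format' is an
;; Emacs built-in and must not be redefined.  LINK and TIMESTAMP are passed
;; in explicitly because this function cannot see the caller's local variables.
(defun my/format-article (link timestamp contents)
  "Format article CONTENTS with title LINK, TIMESTAMP, and article heading."
  (with-temp-buffer
    (org-mode)
    ;; Insert article text
    (insert contents)
    ;; Demote in-article headings
    (org-web-tools--demote-headings-below 2)
    ;; Insert headings at top
    (goto-char (point-min))
    (insert "* " link " :website:" "\n\n"
            timestamp "\n\n"
            "** Article" "\n\n")
    (buffer-string)))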

Would that make sense? For now I am using a modified version of org-web-tools--url-as-readable-org, but I really would like to not miss any future enhancement of this nice package :)
Thanks very much for the time spent in this!

@alphapapa
Owner

Hi there,

> Thanks very much for the great package: I use it with org-feed and finally getting feeds content is much more reliable!

That's very interesting! I wasn't aware of org-feed. So you use it as a feed reader, instead of elfeed or something else? I'd never thought of that.

> I have the feeling that this can be abstracted in a function format article-contents which defaults to your template, but that can be configured by the user.

Yeah, that makes sense. The code that manipulates the contents after insertion can be moved into a function and called with a hook.
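
A minimal sketch of that idea, not actual org-web-tools code. Every `my/...` name below is hypothetical, and the sketch leaves out the `(org-mode)` call and the heading demotion that the real function performs. It assumes the formatting step becomes an abnormal hook whose functions run in the buffer holding the converted article and receive the link and timestamp as arguments:

```elisp
(defvar my/article-format-functions '(my/article-insert-default-headings)
  "Abnormal hook run in a buffer that holds the converted article.
Each function is called with two arguments, LINK and TIMESTAMP.
Hypothetical variable for illustration; not part of org-web-tools.")

(defun my/article-insert-default-headings (link timestamp)
  "Insert the current default template at the top of the buffer."
  (goto-char (point-min))
  (insert "* " link " :website:" "\n\n"
          timestamp "\n\n"
          "** Article" "\n\n"))

(defun my/article-insert-plain-heading (link timestamp)
  "Insert only the LINK heading and TIMESTAMP, without \"** Article\"."
  (goto-char (point-min))
  (insert "* " link " :website:" "\n\n"
          timestamp "\n\n"))

(defun my/article-format (link timestamp converted)
  "Return CONVERTED as formatted by `my/article-format-functions'."
  (with-temp-buffer
    ;; Insert article text
    (insert converted)
    ;; Let the configured functions add whatever headings they want
    (run-hook-with-args 'my/article-format-functions link timestamp)
    (buffer-string)))
```

With something like this, a user who does not want the "** Article" heading could put `(setq my/article-format-functions '(my/article-insert-plain-heading))` in their init file instead of keeping a modified copy of the whole function.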

> Thanks very much for the time spent in this!

Thanks for your feedback! I will try to get to this soon. :)

@alphapapa alphapapa self-assigned this Oct 2, 2017
@ag91
Author

ag91 commented Oct 2, 2017

Hi,
yes, I do. I like to read on my ereader and with a bit of setup you can convert the org-feed file into an epub (or whatever you like).
I very much appreciate elfeed, but I found it easier to hack/extend org-feed with what I needed.

Thanks again for the work on this!

P.S:

The bit of my init that sets up org-feed (it is hacky: I changed guid to be the article weblink):

(defun my/org-feed-parse-rss-feed (buffer)
    "Parse BUFFER for RSS feed entries.
     Returns a list of entries, with each entry a property list,
     containing the properties `:guid' and `:item-full-text'."
    (require 'xml)
    (let ((case-fold-search t)
          entries beg end item guid entry)
      (with-current-buffer buffer
        (widen)
        (goto-char (point-min))
        (while (re-search-forward "<item\\>.*?>" nil t)
          (setq beg (point)
                end (and (re-search-forward "</item>" nil t)
                         (match-beginning 0)))
          (setq item (buffer-substring beg end)
                guid (if (string-match "<link\\>.*?>\\(.*?\\)</link>" item) ;; we use the link instead as guid
                         (xml-substitute-special (match-string-no-properties 1 item))))
          (message "%s" (concat "the guid-link is:" guid))
          (setq entry (list :guid guid :item-full-text item))
          (push entry entries)
          (widen)
          (goto-char end))
        (nreverse entries))))
(defun my/org-feed-parse-rss-entry (entry)
  "Parse the `:item-full-text' field for xml tags and create new properties."
  (require 'xml)
  (let ((guid (plist-get entry :guid)))
    (with-temp-buffer
      (insert (plist-get entry :item-full-text))
      (goto-char (point-min))
      (while (re-search-forward "<\\([a-zA-Z]+\\>\\).*?>\\([^\000]*?\\)</\\1>"
                                nil t)
        (setq entry (plist-put entry
                               (intern (concat ":" (match-string 1)))
                               (xml-substitute-special (match-string 2))))
        ;; Keep the link-based guid even if the item has its own <guid> tag
        (setq entry (plist-put entry :guid guid)))))
  entry)
(defun my/org-get-content-html-as-org (url)
  "Return the contents of URL as Org text without the top-level heading."
  (require 's) ; for `s-join' and `s-lines'
  (if (not (string-equal (file-name-extension url) "pdf")) ;; skip PDFs: we do not need them
      (condition-case err
          ;; Drop the first two lines (the heading and the blank line after it)
          (s-join "\n" (cdr (cdr (s-lines (org-web-tools--url-as-readable-org url)))))
        (error (concat "Org-web-tools failed with: " (error-message-string err))))
    "This was not a html page."))
(defun my/get-feed-content (new)
  "Add the contents of each article to the description of its feed entry.
Each page is fetched and converted to Org.  NEW is the list of new entries."
  (let ((new-formatted
         (mapcar
          (lambda (e)
            (let* ((article-contents
                    (my/org-get-content-html-as-org (plist-get e :link)))
                   (e1 (plist-put e :description article-contents)))
              ;; `my-for-org-feed/tag-template' must be defined elsewhere
              ;; (it is not shown in this snippet)
              (org-feed-format-entry e1 my-for-org-feed/tag-template nil)))
          new)))
    (org-feed-add-items (point) new-formatted)))

(setq org-feed-alist
      '(("Hacker News"
         "https://news.ycombinator.com/rss"
         "/tmp/Feeds.org" "Hacker News"
         :parse-feed my/org-feed-parse-rss-feed
         :parse-entry my/org-feed-parse-rss-entry
         :new-handler my/get-feed-content)))

@alphapapa alphapapa removed their assignment Dec 20, 2023
@alphapapa alphapapa added this to the Future milestone Dec 20, 2023
@alphapapa
Owner

I don't plan to work on this myself, but if someone else is interested in contributing it, I'll be glad to consider merging it.


2 participants