Roadmap

This file holds notes, ideas, sketches, and brainstorming output related to the project. It is the «personal playground» of Karl for his ideas on lazyblorg.

NOTE: Most headings that are marked as DONE or CANCELED are moved to the corresponding archive file Roadmap.org_archive, which can’t be rendered on GitHub because GitHub does not recognize the file extension as Org-mode yet. To read it, you have to check out the Git source of the Wiki. This is a good idea anyway: because of the limitations of the Org-mode rendering of the GitHub Wiki, this file is best viewed directly in GNU/Emacs (after downloading it), where TODO keywords, tags, drawers, … are displayed properly.

Roadmap of lazyblorg

spike: time-ordered-index data structure

State “DONE” from “STARTED” [2013-08-20 Tue 15:02]

problem:
- keep a sorted list of elements like [ [<time-stamp>,<id>], […] ]
http://wiki.python.org/moin/HowTo/Sorting/
- sorting by index or named attribute!

Sorting by age using tuples:

>>> student_tuples = [
        ('john', 'A', 15),
        ('jane', 'B', 12),
        ('dave', 'B', 10),
]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

Sorting by attribute of a class:

>>> class Student:
        def __init__(self, name, grade, age):
                self.name = name
                self.grade = grade
                self.age = age
        def __repr__(self):
                return repr((self.name, self.grade, self.age))

>>> student_objects = [
        Student('john', 'A', 15),
        Student('jane', 'B', 12),
        Student('dave', 'B', 10),
]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
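
For the original problem, a list of [<time-stamp>, <id>] pairs that stays sorted, the standard library's bisect module may already be enough. This is a minimal sketch, assuming time-stamps are datetime objects and IDs are strings; the names add_entry and index are made up for illustration and are not taken from lazyblorg:

```python
import bisect
from datetime import datetime


def add_entry(index, timestamp, entry_id):
    """Insert (timestamp, entry_id) so that index stays sorted, oldest first."""
    bisect.insort(index, (timestamp, entry_id))


index = []
add_entry(index, datetime(2013, 8, 20, 15, 2), 'id-b')
add_entry(index, datetime(2011, 10, 7, 15, 40), 'id-a')
add_entry(index, datetime(2014, 2, 1, 15, 2), 'id-c')

# Tuples compare by time-stamp first, so reversing yields newest first.
newest_first = list(reversed(index))
```

Inserting into the sorted position avoids re-sorting the whole list after every new entry.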

paper: sketch main page

State “DONE” from “STARTED” [2014-02-01 Sat 15:02]

top: public voit banner (as usual)
main content: 7 most recent blog entries
- only up to first HR or heading
  - if HR/heading is found, add “read whole article…” as link below
side-bar
1. “about” (persistent page)
  - about this blog
    - SW being used
    - how to follow
    - link to FEED
  - about Karl Voit
    - Twitter
    - github
2. “tags” (persistent page)
  - explaining why I am using tags
  - auto-tags
  - overview on tags
3. 2011, 2012, 2013, …: yearly overview pages
  - of all years that contain blog articles
4. “follow me”: get updates via FEED (persistent page)
  - explaining the methods I provide

Defining the content in a special template heading:

main page as “** Mainpage”

side-bar as “*** Sidebar”
- list of elements
- HR as separator (as shown above)

paper: sketch overview pages

State “DONE” from “STARTED” [2014-02-01 Sat 15:03]

see paper from [2014-01-31 Fri]

First list is an implicit article update list showing a changelog GH-issue

test links in captions

Not scheduled, was “[2018-04-14 Sat]” on [2019-02-15 Fri 15:17]
Rescheduled from “[2018-04-07 Sat]” on [2018-04-08 Sun 21:49]
Rescheduled from “[2018-04-01 Sun]” on [2018-04-03 Tue 21:39]
Rescheduled from “[2018-03-18 Sun]” on [2018-03-18 Sun 20:21]

[2018-03-15 Thu 08:00]

Wiki: document the auto-tags feature

jpegoptim - optimizing image files

[2018-03-28 Wed 13:01]

https://github.com/tjko/jpegoptim
https://www.garron.me/en/bits/optimize-compress-jpg-png.html
- How JPEG and PNG could be optimized

publish an Install setup Screencast

[2018-09-25 Tue 20:16]

Redesign CSS/layout to use CSS Grid

awesome talk https://www.youtube.com/watch?v=7kVeCqQCxlk&feature=youtu.be
simple example: https://codepen.io/mor10/pen/QvmLpd
https://gridbyexample.com/

warn alternative filename: only print debug message as long as file is found

Not scheduled, was “[2018-05-21 Mon]” on [2018-11-12 Mon 14:26]
Rescheduled from “[2018-05-20 Sun]” on [2018-05-20 Sun 23:33]
Rescheduled from “[2018-05-13 Sun]” on [2018-05-13 Sun 17:53]
Rescheduled from “[2018-04-14 Sat]” on [2018-05-10 Thu 12:28]
Rescheduled from “[2018-04-07 Sat]” on [2018-04-08 Sun 21:50]
Rescheduled from “[2018-04-01 Sun]” on [2018-04-03 Tue 21:39]
Rescheduled from “[2018-03-18 Sun]” on [2018-03-18 Sun 20:21]
Rescheduled from “[2018-03-14 Wed]” on [2018-03-13 Tue 23:21]
Rescheduled from “[2018-03-03 Sat]” on [2018-03-03 Sat 20:41]
Rescheduled from “[2018-02-25 Sun]” on [2018-02-25 Sun 21:31]
Rescheduled from “[2018-02-11 Sun]” on [2018-02-11 Sun 21:06]
Rescheduled from “[2018-02-04 Sun]” on [2018-02-04 Sun 10:09]
Rescheduled from “[2018-01-20 Sat]” on [2018-01-21 Sun 11:47]
Rescheduled from “[2018-01-14 Sun]” on [2018-01-14 Sun 19:22]
Rescheduled from “[2018-01-07 Sun]” on [2018-01-07 Sun 18:55]
Rescheduled from “[2018-01-01 Mon]” on [2018-01-03 Wed 17:16]
Rescheduled from “[2017-12-30 Sat]” on [2017-12-30 Sat 22:50]

[2017-12-27 Wed 22:36]

Search for an alternative filename that consists only of YYYY-MM-DD + name

Search in a loop, starting from the front, until
- exactly one unique hit is found or
- no hit is found any more or
- the end of the string is reached
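
The search loop described above could be sketched as follows. This is only an illustration under assumptions: the function name find_alternative_filename is hypothetical, the candidates are plain file names from one directory, and the loop starts at the length of the YYYY-MM-DD date stamp so that a single unrelated file cannot match by accident:

```python
DATESTAMP_LENGTH = len('YYYY-MM-DD')


def find_alternative_filename(wanted, candidates):
    """Grow a prefix of `wanted` until exactly one candidate matches.

    Returns the unique match, or None if no candidate matches any more or
    the end of `wanted` is reached while the result is still ambiguous.
    """
    matches = list(candidates)
    for end in range(DATESTAMP_LENGTH, len(wanted) + 1):
        prefix = wanted[:end]
        matches = [name for name in matches if name.startswith(prefix)]
        if len(matches) == 1:
            return matches[0]   # exactly one unique hit
        if not matches:
            return None         # no hit any more
    return None                 # string exhausted, still ambiguous


files = ['2017-12-27 sunset -- vacation.jpg',
         '2017-12-27 sunrise.jpg',
         '2017-12-28 sunset.jpg']
hit = find_alternative_filename('2017-12-27 sunset.jpg', files)
```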

move or copy “Archive” link to “published on …”

suppress multiple tags of one blog entry

issue: when an article is tagged twice with “foo”, it appears twice in the tag page and so forth
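
One possible fix is to remove duplicate tags right after parsing. A minimal sketch, assuming the tags of an entry are available as a list of strings and Python 3.7 or newer (where dicts keep insertion order); unique_tags is a hypothetical helper name:

```python
def unique_tags(tags):
    """Return the tags without duplicates, keeping the first-seen order."""
    return list(dict.fromkeys(tags))


tags = unique_tags(['foo', 'bar', 'foo'])
```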

fix: internal links do not get sanitized in quote blocks

file:~/src/lazyblorg/testdata/end_to_end_test/result/2017/09/30/link-test/index.html

move feed generator from htmlizer in own module

tag pages: Atom-feed for each tag

link-only feeds

selected tag-based feeds (english, german, bicycle,…)

add “read more articles with tag FOOBAR” to bottom

[2014-10-25 Sat 09:22]

fix images in Atom feeds

Not scheduled, was “[2017-08-04 Fri]” on [2017-08-04 Fri 11:13]
Rescheduled from “[2017-08-01 Tue]” on [2017-08-02 Wed 09:08]
Rescheduled from “[2017-07-26 Wed]” on [2017-07-28 Fri 08:50]
Rescheduled from “[2017-07-21 Fri]” on [2017-07-22 Sat 08:28]
Rescheduled from “[2017-07-20 Thu]” on [2017-07-21 Fri 08:02]

Working examples

https://www.baty.net/index.xml

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Jack Baty&#39;s Blog</title>
    <link>https://www.baty.net/</link>
    <description>Recent content on Jack Baty&#39;s Blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Mon, 22 Oct 2018 07:34:52 -0400</lastBuildDate>
        <atom:link href="https://www.baty.net/index.xml" rel="self" type="application/rss+xml" />


    <item>
      <title>Selecting a content directory with Easy-Hugo</title>
      <link>https://www.baty.net/2018/selecting-a-content-directory-with-easy-hugo/</link>
      <pubDate>Mon, 22 Oct 2018 07:34:52 -0400</pubDate>

      <guid>https://www.baty.net/2018/selecting-a-content-directory-with-easy-hugo/</guid>

        <description>&lt;p&gt;&lt;a href=&#34;https://github.com/masasam/emacs-easy-hugo&#34;&gt;Easy Hugo&lt;/a&gt; is a handy Emacs mode for posting to Hugo-based blogs. One difficulty I had was that I have many content directories, and easy hugo only included methods for moving through directories one at a time.&lt;/p&gt;

&lt;p&gt;In a related bug report, I suggested that being able to quickly select the content directory would be useful, and the author, &lt;a href=&#34;https://github.com/masasam&#34;&gt;masasam&lt;/a&gt;, just &lt;a href=&#34;https://github.com/masasam/emacs-easy-hugo/issues/42#issuecomment-431795287&#34;&gt;added the feature&lt;/a&gt;. Works great.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://www.baty.net/img/2018/2018-10-22_easy-hugo-select-postdir.png&#34; alt=&#34;screenshot&#34; /&gt;&lt;/p&gt;
</description>

    </item>

  </channel>
</rss>

How is Hugo doing things?

https://github.com/gohugoio/hugo/issues/3473
- it’s better to drop RSS support and use Atom → Arguments listed!
https://github.com/lingxz/er
- a Hugo theme implementing Atom feeds

fix: Blog entries can not start with a list

https://github.com/novoid/lazyblorg/issues/10

generate tagtrees of level 2 (and more)

current situation
- DOMAIN/tags/tag1
- DOMAIN/tags/tag2
- DOMAIN/tags/tag3
this feature provides:
- DOMAIN/tags/tag1/tag2
- DOMAIN/tags/tag1/tag3
- DOMAIN/tags/tag2/tag1
- DOMAIN/tags/tag3/tag1
- … when there are:
  - articles tagged with tag1, tag2, and tag3
  - no articles tagged with tag2 and tag3 combined
tag cloud with links whose names are: “tag1+tag2” “tag3+tag2” “tag4+tag2” … on page of tag2 (and so on)
list of all articles that have (at least) the matching tags
[ ] set level of depth in config.py; default = 1
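
The tag paths listed above could be derived with itertools.permutations. This is a rough sketch and not existing lazyblorg code: tagtree_paths is a hypothetical name, articles is assumed to map an article ID to its list of tags, and depth mirrors the planned config.py setting:

```python
from itertools import permutations


def tagtree_paths(articles, depth=2):
    """Map tag paths such as 'tag1/tag2' to the IDs of all articles that
    carry (at least) every tag of the path."""
    paths = {}
    for article_id, tags in articles.items():
        unique = sorted(set(tags))
        for level in range(1, depth + 1):
            for combo in permutations(unique, level):
                paths.setdefault('/'.join(combo), []).append(article_id)
    return paths


articles = {
    'a1': ['tag1', 'tag2'],
    'a2': ['tag1', 'tag3'],
}
paths = tagtree_paths(articles)
```

Because no article combines tag2 and tag3, no path tag2/tag3 is generated, which matches the example above. Note that the number of paths grows quickly with the depth.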

tagstore proof of concept: tagstore_add_entry.py

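## generate all permutations of a string (or list)
def generate_permutation(items):
    if len(items) <= 1:
        yield items
    else:
        # insert the first element at every position of each permutation of the rest
        for perm in generate_permutation(items[1:]):
            for i in range(len(perm) + 1):
                yield perm[:i] + items[0:1] + perm[i:]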

tagstore > store.py

def __build_store_navigation(self, link_name, tag_list, current_path):
    """
    builds the whole directory and link-structure (describing & categorising nav path) inside a stores filesystem
    """
    link_source = self.__watcher_path + "/" + link_name

    for tag in tag_list:
        self.__file_system.create_dir(current_path + "/" + tag)
        self.__file_system.create_link(link_source, current_path + "/" + tag + "/" + link_name)
        recursive_list = [] + tag_list
        recursive_list.remove(tag)
        self.__build_store_navigation(link_name, recursive_list, current_path + "/" + tag)

blog-format.org: explain all replacement strings at top

[#C] remove orgmode-id from entry-page-header

It holds a random ID. No need for the orgmode-id on the entry page.

fix generating publishing time for empty tag pages

orgparser: refactor so that spaces before drawer lines are OK

link autotag pages

“language:english” should point to tag page of “language” (and not “language:english”)

README: write «How to upgrade lazyblorg»

document: what elements are rendered/converted via pypandoc?

Not scheduled, was “[2016-08-06 Sat]” on [2016-11-18 Fri 20:22]
Rescheduled from “[2016-02-08 Mon]” on [2016-07-23 Sat 12:20]

id:implemented-org-elements

refactor blog data and metadata into a single dict

why
- list of data structures is getting longer and longer
- adding a new data structure is getting tedious: unit tests, …
- simplification?
data structures to add to single data structure-dict:
- blog_data
- metadata
- FIXXME: more?

analyze and steal CSS for columns from URL

State “DONE” from “NEXT” [2015-06-27 Sat 19:06]

http://endlessparentheses.com/endless.css

interesting part is marked with /* Responsiveness */

responsive design: different settings depending on height/width:

/* Responsiveness */
@media (max-height: 34rem) {
    .left-sidebar-ad {display:none;}
    .se-flair {display:none;}
}
@media (min-height: 34.1rem) {
    .left-sidebar-ad {display:initial;}
    .se-flair {display:initial;}
}

@media (max-width: 62.2rem) {
    .post-ad-mobile {
        display: initial;
        width: 320px;
        height: 100px;
        margin-left: auto;
        margin-right: auto;
    }

    .post-ad {display: none;}
    .left-sidebar-ad {display:none;}

    .masthead-links {
        margin-top: .3rem;
        margin-left: 0;
        margin-right: 0;
    }
    .masthead-links li {
        margin:0;
        margin-bottom:.3rem;
        text-align: center;
        font-size:100%;
        width:32%;
        display:none;
    }
    /* Only 3 links fit in mobile. */
    .masthead-links li:nth-child(1),
    .masthead-links li:nth-child(2),
    .masthead-links li:nth-child(3) {
        display:inline-block;
    }

    .pagination {
        width: 99%;
    }
}
@media (min-width: 62.3rem) {

    .container {
        /* overflow:hidden; */
        max-width: 52rem;
        /* width:34rem; */
        margin-left:  auto;
        margin-right: auto;
        /* background-color:red; */
    }

    .masthead {
        margin-top:2rem;
        margin-right:11rem;
        margin-bottom: 0;
        /* text-align:right; */
        position:fixed;
        top:0;
        right:50%;
        display:block;
        width:13rem;
        height:100%;
    }
    .post-with-comments {
        /* float:left; */
        margin-left:20rem;
        /* width:34rem; */
        display:inline-block;
        /* overflow:scroll; */
    }

    .post-ad-mobile {display: none;}
}

@media (min-width: 76.3rem) {
    .container {
        /* overflow:hidden; */
        max-width: 31rem;
        /* width:34rem; */
        margin-left:  auto;
        margin-right: auto;
        /* background-color:red; */
    }

    .post-with-comments {
        margin-left: 0;
        margin-right: 0;
    }

    .right-sidebar {
        font-family: "Droid Serif", serif;
        padding-bottom: 0;
        margin-right: 0rem;
        margin-top:1.3rem;
        margin-left:20rem;
        margin-bottom: 0;
        /* text-align:right; */
        position:fixed;
        top:0;
        left:50%;
        display:inline-block;
        width:15rem;
        height:100%;
        /* background-color:blue; */
    }

    .masthead {
        /* background-color:blue; */
        margin-right:20rem;
        /* text-align:right; */
    }
}

steal font from https://kungsgeten.github.io/yankpad.html

only if the font is not loaded from a third party server (privacy!)
[X] find out how to set this font
- https://kungsgeten.github.io/static_about.html
  
  This blog is created with Emacs using the amazing org-mode. Most of the text is set in Alegreya — except for code snippets and other monospace text, which is set in Cousine. Much of the typography — the use of sidenotes and sections separated by whitespace, starting the sentence with small caps — is inspired by Edward Tufte’s work.
  - https://www.google.com/fonts/specimen/Alegreya
[ ] Is it possible to host this font on my own server?
- https://www.google.com/fonts#UsePlace:use/Collection:Alegreya
  - this uses Google servers
- [ ] https://developers.google.com/fonts/
- [ ] http://michaelboeke.com/blog/2013/09/10/Self-hosting-Google-web-fonts/
- [ ] https://github.com/majodev/google-webfonts-helper

add “estimated time to read” at top (+ autotag?)

https://www.phase2technology.com/blog/implementing-an-estimated-read-time-on-articles/
http://cs.stackexchange.com/questions/57285/how-to-calculate-an-accurate-estimated-reading-time-of-text
- algorithm
http://niram.org/read/
- 200 words/min
http://marketingland.com/estimated-reading-times-increase-engagement-79830
- «Research varies, but generally, the average adult reads 200-250 words in one minute.»
- arguments against using this feature (makes people angry)
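
A minimal sketch of such an estimate, assuming the plain text of an article is available as a string and using the 200 words/min figure from the links above; the function name and the rounding up to whole minutes are assumptions:

```python
import math

WORDS_PER_MINUTE = 200  # lower end of the 200-250 range quoted above


def estimated_minutes_to_read(text, words_per_minute=WORDS_PER_MINUTE):
    """Return the estimated reading time in whole minutes (at least 1)."""
    word_count = len(text.split())
    return max(1, math.ceil(word_count / words_per_minute))


minutes = estimated_minutes_to_read('word ' * 450)
```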

remove list-itemize (and mytable?) from code and docu

it was replaced by pypandoc

make show-sidebar-text work: show sidebar on small displays

Not scheduled, was “[2016-02-26 Fri]” on [2016-02-26 Fri 18:49]
Rescheduled from “[2015-10-24 Sat]” on [2016-02-14 Sun 08:49]
Rescheduled from “[2015-09-25 Fri]” on [2015-09-24 Thu 19:58]
Rescheduled from “[2015-07-25 Sat]” on [2015-07-25 Sat 18:37]
Rescheduled from “<2015-07-19 Sun>” on [2015-07-20 Mon 19:19]
Rescheduled from “<2015-07-17 Fri>” on [2015-07-17 Fri 18:57]
Rescheduled from “<2015-06-28 Sun>” on [2015-07-11 Sat 11:51]

[#A] branch: replace htmlizer with pypandoc

why?
- Org-syntax elements like lists or tables are hard to parse and htmlize correctly
- third party Org-mode parser and htmlizer would be great
  - need to check third party library using unit tests!
this attempt:
- keep control of the basic parsing/htmlizing process
- convert blocks using pypandoc library
- reduce complexity of current parser/htmlizer
process
1. [X] get a rough overview what needs to be changed
2. [X] write unit-test for basic test of pypandoc!
3. [X] document pypandoc requirement and its test
4. [X] include test_pypandoc.py in testall
5. [X] thinking of: keeping parser/htmlizer and using pypandoc only for hard to parse/htmlize blocks?
  - [X] test naïve pypandoc lazyblorg for public-voit before
  - decision: I stick to my own parser so far and might convert some Org-mode syntax elements to pypandoc later on
6. [ ] data model (file_blog_data)
  - simplified: instead of blocks and so on, there will only be meta-data from the headings and blobs of Org-mode blocks
  - decision: not feasible because I need too much insider information on the blocks in order to do all the magic
7. [X] implement exception handling when pypandoc is not found/installed
8. [X] implement pypandoc_test.py testcase with all used Org-mode syntax elements
9. [-] implement tables using pypandoc
  - in order to get experience of the possibilities
  - don’t forget sanitizing
  - [X] write parser
  - [X] write parser tests
  - [X] write htmlizer
  - [X] write htmlizer tests
  - [ ] add CSS for tables
10. [-] implement lists using pypandoc
  - don’t forget sanitizing
  - [X] write parser
  - [ ] write parser tests
  - [X] write htmlizer
  - [X] write htmlizer tests with complicated lists
  - [ ] add CSS for lists
11. [X] pypandoc as fall-back for any content which has no special treatment
12. lazyblorg.py
  - [ ] generate_output()
    - using a different output module!
      - pandocizer.py (or similar)
  - [ ] keep old orgparser for templates?
13. htmlizer.py -copy-> pandocizer.py (or similar)
  - [ ] feed generator definitions
    - feedentry += '\n'.join(blog_data_entry['content'])
    - feedentry += '\n'.join(blog_data_entry['htmlteaser'])
  - [ ] generate_entry_page()
  - [ ] sanitize_and_htmlize_blog_content() -> completely obsolete?
  - [ ] htmlize_simple_text_formatting() -> completely obsolete?
  - [ ] sanitize_html_characters() -> completely obsolete?
  - [ ] _generate_temporal_article()
  - [ ] _generate_persistent_article()

CSS: blocks like src: remove lines from right/top/bottom & add color gradient to right

fix: ~-escaping

https://github.com/novoid/lazyblorg/issues/5
see id:2014-05-09-managing-digital-photographs

All portrait photographs are rotated using [[http://www.sentex.net/~mwandel/jhead/][jhead]]. Also
with jhead, I generate file-name time-stamps from the Exif header
time-stamps. Using [[https://github.com/novoid/date2name][date2name]] I add time-stamps also to the movie
files. After processing all those files, they get moved to the
destination folder for new digicam files: ~$HOME/tmp/digicam/tmp/~.

… will be transformed into:

… which is wrong

get ordered lists of blog entries

[X] time-ordered by last modification (for FEED and main page)
- Newest entry of entry['finished-timestamp-history'] is the time-stamp of the last update
- for each entry in entries
  - get newest entry of entry['finished-timestamp-history']
  - store to a sorted list (newest first or last)
[ ] time-ordered by issue day (for overview pages)
- Oldest entry of entry['finished-timestamp-history'] is the publication time-stamp!
- for each entry in entries
  - get oldest entry of entry['finished-timestamp-history']
  - store to a sorted list (newest first or last)
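
Both orderings could be sketched with sorted() and a key function. This assumes each entry is a dict whose 'finished-timestamp-history' value is a non-empty list of datetime objects; the function names and the sample entries are made up for illustration:

```python
from datetime import datetime

HISTORY = 'finished-timestamp-history'


def sort_by_last_update(entries):
    """Newest update first: uses the newest time-stamp of each entry."""
    return sorted(entries, key=lambda entry: max(entry[HISTORY]), reverse=True)


def sort_by_publication(entries):
    """Newest publication first: uses the oldest time-stamp of each entry."""
    return sorted(entries, key=lambda entry: min(entry[HISTORY]), reverse=True)


entries = [
    {'id': 'old-but-updated',
     HISTORY: [datetime(2013, 1, 1), datetime(2014, 6, 1)]},
    {'id': 'new',
     HISTORY: [datetime(2014, 2, 1)]},
]
feed_order = [entry['id'] for entry in sort_by_last_update(entries)]
overview_order = [entry['id'] for entry in sort_by_publication(entries)]
```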

add: --include-archived-entries

[ ] define what “archived entries” means
- tag :ARCHIVE:
- file.org_archive
- FIXXME
[ ] check if archived tag gets removed
[ ] add command line parameter for adding archived entries
- by default, archived entries do not get added to the blog

fix tilde in URL

https://github.com/novoid/lazyblorg/issues/5

http://sd.wareonearth.com/~phil/xdu/examp1.gif

… gets messed up to:

http://sd.wareonearth.com/</code>phil/xdu/examp1.gif

on https://karl-voit.at/2014/03/25/xdu

[ ] add unit test to htmlizer
[ ] fix bug
[ ] test
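
A possible direction for the fix and its unit test: only treat a tilde as a code delimiter when it is surrounded like Org-mode emphasis markers. The regular expression below is a simplified assumption and not the actual htmlizer code: it accepts an opening tilde only at the start of the string or after whitespace or an opening parenthesis, and a closing tilde only before the end of the string, whitespace, or common punctuation:

```python
import re

# Simplified emphasis borders (assumption, not the full Org-mode rule set).
CODE_REGEX = re.compile(r'(^|[\s(])~(\S|\S.*?\S)~(?=$|[\s.,:;!?)])')


def htmlize_code(text):
    """Convert ~code~ to <code>code</code> without touching tildes in URLs."""
    return CODE_REGEX.sub(r'\1<code>\2</code>', text)


url = 'http://sd.wareonearth.com/~phil/xdu/examp1.gif'
folder = 'digicam files: ~$HOME/tmp/digicam/tmp/~.'
```

The tilde in the URL follows a slash, so it is never taken as an opening delimiter.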

escape <> in blocks

https://github.com/novoid/lazyblorg/issues/6

mark integration points with “## INTEGRATION: ”

Tag statistics page

10 most frequently used tags (with occurrence)
10 least frequently used tags (with occurrence; if occurrence == 1 → show link to article)
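
Both lists could be computed with collections.Counter. A minimal sketch, assuming articles maps an article ID to its list of tags; tag_statistics is a hypothetical name, and ties among the least used tags are broken alphabetically:

```python
from collections import Counter


def tag_statistics(articles, limit=10):
    """Return (most used, least used) lists of (tag, occurrence) pairs."""
    counts = Counter(tag for tags in articles.values() for tag in set(tags))
    most = counts.most_common(limit)
    least = sorted(counts.items(), key=lambda item: (item[1], item[0]))[:limit]
    return most, least


articles = {
    'a1': ['emacs', 'python'],
    'a2': ['emacs'],
    'a3': ['emacs', 'bicycle'],
}
most, least = tag_statistics(articles)
```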

create pull-request on https://github.com/hober/planet.emacsen.org/ for my emacs/english feed

Rescheduled from “[2016-05-22 Sun]” on [2016-05-22 Sun 18:58]
Rescheduled from “[2016-05-15 Sun]” on [2016-05-19 Thu 13:42]
Rescheduled from “[2016-05-13 Fri]” on [2016-05-14 Sat 10:34]

HTML: manually create archive overview page

[[Public Voit]] > Archive

[from year of oldest entry to year of newest entry]

|     | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
| Jan |      |      |      |      |      |      |
| Feb |      |      |      |      |      |      |
| Mar |[[1]] |      |      |      |      |      |
| Apr |      |      |      |      |      |      |
| May |[[5]] |      |      |      |      |      |
| Jun |      |      |      |      |      |      |
| Jul |      |      |      |      |      |      |
| Aug |[[2]] |      |      |      |      |      |
| Sep |      |      |      |      |      |      |
| Oct |      |      |      |      |      |      |
| Nov |      |      |      |      |      |      |
| Dec |      |      |      |      |      |      |



|      | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
| 2009 |     |     |[[1]]|     |[[5]]|     |     |[[2]]|     |     |     |     |
| 2010 |     |     |     |     |     |     |     |     |     |     |     |     |
| 2011 |     |     |     |     |     |     |     |     |     |     |     |     |
| 2012 |     |     |     |     |     |     |     |     |     |     |     |     |
| 2013 |     |     |     |     |     |     |     |     |     |     |     |     |
| 2014 |     |     |     |     |     |     |     |     |     |     |     |     |

tasks
- [ ] create blog-format.org entries with HTML source and replacement entities
- [ ] implement in Python
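
The second table layout (one row per year, one column per month) could be filled from per-month counts. This is a sketch under assumptions: the publication dates are available as date objects, and both function names are hypothetical:

```python
from collections import Counter
from datetime import date


def count_entries_per_month(publication_dates):
    """Return a Counter mapping (year, month) to the number of entries."""
    return Counter((day.year, day.month) for day in publication_dates)


def archive_rows(counts):
    """Yield one row per year from the oldest to the newest year:
    [year, count_jan, ..., count_dec]."""
    if not counts:
        return
    years = [year for year, _ in counts]
    for year in range(min(years), max(years) + 1):
        yield [year] + [counts.get((year, month), 0) for month in range(1, 13)]


dates = [date(2009, 3, 10), date(2009, 8, 1), date(2009, 8, 15), date(2011, 1, 5)]
rows = list(archive_rows(count_entries_per_month(dates)))
```

Years without any entry still get a row, so the table has no gaps.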

HTML: manually create year overview page

[[Public Voit]] > [[Archive]]: 2014

January: 2 [is link to monthly overview]
February:
March: 4
...

tasks
- [ ] create blog-format.org entries with HTML source and replacement entities
- [ ] implement in Python

HTML: manually create month overview page

Not scheduled, was “2014-03-01 Sat” on [2014-03-01 Sat 21:01]

[[Public Voit]] > [[2014]] - 01

- 2014-01-17: Title of the blog article
- 2014-01-21: Another title

tasks
- [X] create blog-format.org entries with HTML source and replacement entities
- [ ] implement in Python

HTML: manually create day overview page

Not scheduled, was “2014-03-01 Sat” on [2014-03-01 Sat 21:01]

like monthly overview but only for the day

tasks
- [ ] create blog-format.org entries with HTML source and replacement entities
- [ ] implement in Python
https://docs.python.org/2/library/calendar.html

redirection to other ID

why?
- to re-direct an old ID/entry when there is a new one
- to enable short jump-pages like id:aproject-about -> id:2016-10-04-foo-bar-project-about-page
idea
- entry with title «redirect: id:foo-bar» results in a simple (minimal) redirect page
[X] how to generate a simple redirect page?
- http://stackoverflow.com/questions/5411538/redirect-from-an-html-page
- http://karl-voit.at/test.html

[ ] implement in lazyblorg
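
A minimal redirect page could be generated from a small template using a meta refresh plus a visible fallback link. The template, the function name, and the example URL are assumptions for illustration; resolving an id: to its URL is left out here:

```python
from html import escape

REDIRECT_TEMPLATE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url={url}">
    <link rel="canonical" href="{url}">
    <title>Redirecting</title>
  </head>
  <body>
    <p>This page has moved to <a href="{url}">{url}</a>.</p>
  </body>
</html>
"""


def generate_redirect_page(target_url):
    """Return the HTML source of a page that redirects to target_url."""
    return REDIRECT_TEMPLATE.format(url=escape(target_url, quote=True))


page = generate_redirect_page('https://example.com/2016/10/04/foo?a=1&b=2')
```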

fix issue where an article can’t end with a list item

https://github.com/novoid/lazyblorg/issues/4
example: id:2015-05-24-browser-keywords

catch all exceptions and create a log entry

Glossary: variable names and so on

Refactor: unify all variables according to glossary

[#C] move blog-tag(s) to CLI parameter

filter Org-mode articles with parameter of one or more tags
allows for generating different blogs (or sub-blogs) just with different commands

[#C] CSS: if page is less than one page, place footer at bottom

[#C] research: Python Jinja as template system

[#C] source code: replace pre with suitable environment

htmlizer.py -> “## FIXXME: replace pre with suitable source code environment!”

ID of sub-headings get stored and processed to anchors

sub-headings within blog articles can have ID-property as well
parser indexes those IDs
HTML template adds anchor-ID
sanitize internal links resolves those links as well
I want to
- refer to any ID of any blog article heading or blog article sub-heading using the same method:

[[id:any-id][anchor text]]

lazyblorg has to be able to derive following according to any ID:
- get the URL of a blog entry
- get the ID/HREF of any sub-heading of any blog entry
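
One way to resolve both kinds of IDs with the same method is a single lookup table. In this sketch the entry fields 'id', 'url', and 'subheading-ids' are hypothetical and do not reflect the actual lazyblorg data model:

```python
def build_id_index(entries):
    """Map every heading ID to a URL; sub-headings get an anchor suffix."""
    index = {}
    for entry in entries:
        index[entry['id']] = entry['url']
        for sub_id in entry.get('subheading-ids', []):
            index[sub_id] = entry['url'] + '#' + sub_id
    return index


entries = [
    {'id': '2014-03-25-xdu',
     'url': '/2014/03/25/xdu/',
     'subheading-ids': ['xdu-install']},
]
index = build_id_index(entries)
```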

mark updates on entries

heading gets a blog entry with a unique :ID:
setting “Update 1/2/3/…” for each one of those:

:LOGBOOK:
- State "DONE"       from "NEXT"       [2011-10-07 Fri 15:40]
:END:

ALTERNATIVELY: set “Update YYYY-MM-DD” for last one of those (from above)
heading with known unique ID and no state DONE
- should stay the same until state changes back to DONE
- this requires something which remembers states
- this requires keeping old entries
body:
- manual section:
  - Updates:
    1. YYYY-MM-DD: short description
    2. YYYY-MM-DD: short description

add option to tweet title/url for new/updated articles

e.g., publish new stuff on a “public-voit”-Twitter-account
- probably there is a cloud service that translates RSS to Twitter?
- probably more RSS-to-something-translators?

link to day in Wikipedia

historic context
YYYY-MM-DD -> links to Wikipedia-entries of days
- https://en.wikipedia.org/wiki/Portal:Current_events/2010_August_26
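
Deriving such a link from a date could look like this. The sketch follows the URL pattern of the example above and assumes that the day of the month is not zero-padded; the month names are spelled out so that the result does not depend on the locale:

```python
from datetime import date

MONTHS = ('January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December')


def wikipedia_day_url(day):
    """Return the URL of the Wikipedia current-events page for a date."""
    return 'https://en.wikipedia.org/wiki/Portal:Current_events/{}_{}_{}'.format(
        day.year, MONTHS[day.month - 1], day.day)


url = wikipedia_day_url(date(2010, 8, 26))
```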

auto-tag entries [0/3]

auto-tags are visually separated from manual tags to make it clear that they are automatically generated (and might sometimes be bogus)
[ ] add to about-page
[ ] add to documentation (README, …)
[ ] syntax of auto-tags
- or only highlight with different background color of “tags”?
feeds for auto-tags
- [ ] feeds/lazyblorg-shorts.*
- [ ] feeds/lazyblorg-deutsch.*
- [ ] feeds/lazyblorg-english.*

Language

State “DONE” from [2016-11-16 Wed 22:08]

lang-de, de, en, us, … ?
language tag is automatically derived
- by guessing language based on common stopwords or external library

Research: search for stop words

State “DONE” from “” [2015-05-06 Wed 11:26]

http://stackoverflow.com/questions/19560498/faster-way-to-remove-stop-words-in-python

from nltk.corpus import stopwords

cachedStopWords = stopwords.words("english")

def testFuncNew():
    text = 'hello bye the the hi'
    text = ' '.join([word for word in text.split() if word not in cachedStopWords])

if __name__ == "__main__":
    testFuncNew()

[ ] return percentage of stopwords for list of known languages
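
The percentage idea could be sketched without nltk by using small built-in stopword sets. The two sets below are tiny hand-picked samples and only an assumption for illustration; the function names are hypothetical as well:

```python
STOPWORDS = {
    'english': {'the', 'and', 'is', 'of', 'to', 'with', 'this', 'that'},
    'german': {'der', 'die', 'das', 'und', 'ist', 'nicht', 'mit', 'ein'},
}


def stopword_percentages(text, stopwords=STOPWORDS):
    """Return the percentage of words that are stopwords, per language."""
    words = text.lower().split()
    if not words:
        return {language: 0.0 for language in stopwords}
    return {language: 100.0 * sum(word in known for word in words) / len(words)
            for language, known in stopwords.items()}


def guess_language(text):
    """Return the language with the highest stopword percentage."""
    percentages = stopword_percentages(text)
    return max(percentages, key=percentages.get)


english_guess = guess_language('this is the end of the story')
german_guess = guess_language('das ist nicht der Hund')
```

Words that are stopwords in both languages should be left out of the sets, since they do not help to tell the languages apart.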

install and test nltk.corpus

State “DONE” from “NEXT” [2015-05-09 Sat 10:42]

http://www.nltk.org/api/nltk.corpus.html
- http://www.nltk.org/install.html
- http://www.nltk.org/data.html

installing nltk:

root@gary ~ #  pip install -U nltk
Downloading/unpacking nltk
  Downloading nltk-3.0.2.tar.gz (991Kb): 991Kb downloaded
  Running setup.py egg_info for package nltk

    warning: no files found matching 'Makefile' under directory '*.txt'
    warning: no previously-included files matching '*~' found anywhere in distribution
Installing collected packages: nltk
  Running setup.py install for nltk

    warning: no files found matching 'Makefile' under directory '*.txt'
    warning: no previously-included files matching '*~' found anywhere in distribution
Successfully installed nltk
Cleaning up...
root@gary ~ #

installing nltk.corpus > stopwords:

ipython
1. import nltk
2. nltk.download()
3. manually selecting corpus
  - second or third tab > stopwords
4. done

… way too complicated for other lazyblorg-users for just the stopwords!

cachedStopWords = stopwords.words("english")
len(cachedStopWords)  ## -> 127

For German, it’s 231 stopwords :-O

Note to myself: use this as argument on the broader variety of the German language compared to English :-)

Extract a sub-set of those stopwords and store it directly.

Determine words that occur as English and German stopwords:

In [16]: [x for x in cachedStopWordsde if x in cachedStopWords]
Out[16]: [u'am', u'an', u'in', u'so', u'was', u'will']

English stopwords without common German ones:

[u'I',
u'me',
u'my',
u'myself',
u'we',
u'our',
u'ours',
u'ourselves',
u'you',
u'your',
u'yours',
u'yourself',
u'yourselves',
u'he',
u'him',
u'his',
u'himself',
u'she',
u'her',
u'hers',
u'herself',
u'it',
u'its',
u'itself',
u'they',
u'them',
u'their',
u'theirs',
u'themselves',
u'what',
u'which',
u'who',
u'whom',
u'this',
u'that',
u'these',
u'those',
u'is',
u'are',
u'were',
u'be',
u'been',
u'being',
u'have',
u'has',
u'had',
u'having',
u'do',
u'does',
u'did',
u'doing',
u'a',
u'the',
u'and',
u'but',
u'if',
u'or',
u'because',
u'as',
u'until',
u'while',
u'of',
u'at',
u'by',
u'for',
u'with',
u'about',
u'against',
u'between',
u'into',
u'through',
u'during',
u'before',
u'after',
u'above',
u'below',
u'to',
u'from',
u'up',
u'down',
u'on',
u'off',
u'over',
u'under',
u'again',
u'further',
u'then',
u'once',
u'here',
u'there',
u'when',
u'where',
u'why',
u'how',
u'all',
u'any',
u'both',
u'each',
u'few',
u'more',
u'most',
u'other',
u'some',
u'such',
u'no',
u'nor',
u'not',
u'only',
u'own',
u'same',
u'than',
u'too',
u'very',
u'can',
u'just',
u'don',
u'should',
u'now']

German stopwords without common English ones:

[u'aber',
 u'alle',
 u'allem',
 u'allen',
 u'aller',
 u'alles',
 u'als',
 u'also',
 u'ander',
 u'andere',
 u'anderem',
 u'anderen',
 u'anderer',
 u'anderes',
 u'anderm',
 u'andern',
 u'anderr',
 u'anders',
 u'auch',
 u'auf',
 u'aus',
 u'bei',
 u'bin',
 u'bis',
 u'bist',
 u'da',
 u'damit',
 u'dann',
 u'der',
 u'den',
 u'des',
 u'dem',
 u'die',
 u'das',
 u'da\xdf',
 u'derselbe',
 u'derselben',
 u'denselben',
 u'desselben',
 u'demselben',
 u'dieselbe',
 u'dieselben',
 u'dasselbe',
 u'dazu',
 u'dein',
 u'deine',
 u'deinem',
 u'deinen',
 u'deiner',
 u'deines',
 u'denn',
 u'derer',
 u'dessen',
 u'dich',
 u'dir',
 u'du',
 u'dies',
 u'diese',
 u'diesem',
 u'diesen',
 u'dieser',
 u'dieses',
 u'doch',
 u'dort',
 u'durch',
 u'ein',
 u'eine',
 u'einem',
 u'einen',
 u'einer',
 u'eines',
 u'einig',
 u'einige',
 u'einigem',
 u'einigen',
 u'einiger',
 u'einiges',
 u'einmal',
 u'er',
 u'ihn',
 u'ihm',
 u'es',
 u'etwas',
 u'euer',
 u'eure',
 u'eurem',
 u'euren',
 u'eurer',
 u'eures',
 u'f\xfcr',
 u'gegen',
 u'gewesen',
 u'hab',
 u'habe',
 u'haben',
 u'hat',
 u'hatte',
 u'hatten',
 u'hier',
 u'hin',
 u'hinter',
 u'ich',
 u'mich',
 u'mir',
 u'ihr',
 u'ihre',
 u'ihrem',
 u'ihren',
 u'ihrer',
 u'ihres',
 u'euch',
 u'im',
 u'indem',
 u'ins',
 u'ist',
 u'jede',
 u'jedem',
 u'jeden',
 u'jeder',
 u'jedes',
 u'jene',
 u'jenem',
 u'jenen',
 u'jener',
 u'jenes',
 u'jetzt',
 u'kann',
 u'kein',
 u'keine',
 u'keinem',
 u'keinen',
 u'keiner',
 u'keines',
 u'k\xf6nnen',
 u'k\xf6nnte',
 u'machen',
 u'man',
 u'manche',
 u'manchem',
 u'manchen',
 u'mancher',
 u'manches',
 u'mein',
 u'meine',
 u'meinem',
 u'meinen',
 u'meiner',
 u'meines',
 u'mit',
 u'muss',
 u'musste',
 u'nach',
 u'nicht',
 u'nichts',
 u'noch',
 u'nun',
 u'nur',
 u'ob',
 u'oder',
 u'ohne',
 u'sehr',
 u'sein',
 u'seine',
 u'seinem',
 u'seinen',
 u'seiner',
 u'seines',
 u'selbst',
 u'sich',
 u'sie',
 u'ihnen',
 u'sind',
 u'solche',
 u'solchem',
 u'solchen',
 u'solcher',
 u'solches',
 u'soll',
 u'sollte',
 u'sondern',
 u'sonst',
 u'\xfcber',
 u'um',
 u'und',
 u'uns',
 u'unse',
 u'unsem',
 u'unsen',
 u'unser',
 u'unses',
 u'unter',
 u'viel',
 u'vom',
 u'von',
 u'vor',
 u'w\xe4hrend',
 u'war',
 u'waren',
 u'warst',
 u'weg',
 u'weil',
 u'weiter',
 u'welche',
 u'welchem',
 u'welchen',
 u'welcher',
 u'welches',
 u'wenn',
 u'werde',
 u'werden',
 u'wie',
 u'wieder',
 u'wir',
 u'wird',
 u'wirst',
 u'wo',
 u'wollen',
 u'wollte',
 u'w\xfcrde',
 u'w\xfcrden',
 u'zu',
 u'zum',
 u'zur',
 u'zwar',
 u'zwischen']

implement: basic usage of nltk.corpus > stopwords

State “CANCELLED” from “NEXT” [2015-05-09 Sat 10:42]
nltk install overhead too complicated just for the stopword lists

implement: exception handling if nltk is not installed yet

State “CANCELLED” from “NEXT” [2015-05-09 Sat 10:43]
nltk install overhead too complicated just for the stopword lists

get license of stopword list from nltk

State “DONE” from “NEXT” [2015-05-09 Sat 10:49]

https://github.com/nltk/nltk/wiki/FAQ
- “The corpora are distributed under various licenses, as documented in their respective README files.”
  - locate: file:~/nltk_data/corpora/stopwords/README
    - “They were obtained from: http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/”

include list of stopwords in source

State “DONE” from “NEXT” [2015-05-09 Sat 12:17]

see id:2015-05-09-test-nltk for stopwords and extracting the lists

implement: return percentage of stopwords for list of known languages

State “DONE” from “NEXT” [2015-05-09 Sat 12:17]
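
A minimal sketch of the percentage idea, assuming the stopword lists are available as Python sets keyed by language name. The names `STOPWORDS`, `stopword_percentages` and `guess_language` as well as the tiny word lists are made up for illustration; the real lists are the full ones included in the source.

```python
import re

# Hypothetical, heavily shortened stopword lists; the real ones are much longer.
STOPWORDS = {
    'english': {'the', 'and', 'is', 'of', 'to', 'in', 'that', 'it'},
    'deutsch': {'der', 'die', 'das', 'und', 'ist', 'nicht', 'mit', 'von'},
}


def stopword_percentages(text):
    """Return {language: percentage of the words in text that are stopwords}."""
    words = re.findall(r'\w+', text.lower())
    if not words:
        return {language: 0.0 for language in STOPWORDS}
    return {
        language: 100.0 * sum(1 for word in words if word in stopwords) / len(words)
        for language, stopwords in STOPWORDS.items()
    }


def guess_language(text):
    """Return the language with the highest stopword percentage, or None."""
    percentages = stopword_percentages(text)
    language = max(percentages, key=percentages.get)
    return language if percentages[language] > 0 else None
```

Returning all percentages (not only the winner) keeps the door open for a threshold below which no language auto-tag is assigned at all.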

implement: enable language guessing auto-tag using command line argument

State “DONE” from “NEXT” [2015-05-09 Sat 18:47]

implement: save auto-tag to object

State “DONE” from “NEXT” [2015-05-09 Sat 18:47]

Length

e.g., articles whose preview is equal to the whole article (i.e.: no sub-heading, no horizontal line) are marked with autotag-shorty (or similar)
oneliners
- below a certain threshold
middlesize(sic?)
- between oneliners and fullsizeentries
fullsize(sic?)

compare: id:autotag-estimated-time-to-read
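
A rough sketch of the three length classes. The notes do not fix any numbers, so both thresholds and the function name `length_autotag` are pure assumptions.

```python
# Hypothetical thresholds in words; the notes above do not define any numbers.
ONELINER_MAX_WORDS = 30
MIDDLESIZE_MAX_WORDS = 300


def length_autotag(content):
    """Classify article content as 'oneliner', 'middlesize' or 'fullsize'."""
    word_count = len(content.split())
    if word_count <= ONELINER_MAX_WORDS:
        return 'oneliner'
    if word_count <= MIDDLESIZE_MAX_WORDS:
        return 'middlesize'
    return 'fullsize'
```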

Auto-Disclaimer

hooks for :TAGS: (can be optional) or case-sensitive keywords in headings
if found:
- link to a special pre-defined page
Example: if “What The World Needs”|”WTWN:” is found, link to a page where WTWN-series is described in general.
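
The keyword hook could be sketched like this. The mapping `DISCLAIMER_PAGES`, the target URL and the function name are assumptions; only the two keywords come from the example above.

```python
# Hypothetical mapping from case-sensitive heading keywords to pre-defined pages.
DISCLAIMER_PAGES = {
    'What The World Needs': '/series/wtwn/',
    'WTWN:': '/series/wtwn/',
}


def disclaimer_link(heading):
    """Return the page for the first keyword found in heading, else None."""
    for keyword, page in DISCLAIMER_PAGES.items():
        if keyword in heading:  # substring test on str is case-sensitive
            return page
    return None
```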

implement: display auto-tags (differently)

Not scheduled, was “<2015-05-09 Sat>” on [2015-05-09 Sat 22:49]

[X] write class information
[ ] different CSS format for the two classes

add tree of headings on each article with sub-headings

probably in sidebar?

handle Org-mode priorities in heading

Like “[#A]”.

possible ideas
- ignore priorities
  - suppress!
- convert into given tags (“important”, …)
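
Both ideas (suppressing and converting) fit into one small helper. This is only a sketch: the regular expression assumes single-letter priorities, and `PRIORITY_TAGS` is a hypothetical mapping.

```python
import re

# Org-mode priority cookie such as "[#A]" at the start of a heading title.
PRIORITY_REGEX = re.compile(r'^\[#([A-Z])\]\s*')

# Hypothetical mapping of priorities to tags; unmapped priorities are just dropped.
PRIORITY_TAGS = {'A': 'important'}


def split_priority(title):
    """Return (title without priority cookie, derived tag or None)."""
    match = PRIORITY_REGEX.match(title)
    if not match:
        return title, None
    return title[match.end():], PRIORITY_TAGS.get(match.group(1))
```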

[#B] CSS generated using Org/babel

css.org with Comments and css-blocks

example-CSS content

automatically extracting CSS code from that Org-mode file
example: http://www.tbray.org/ongoing/When/201x/2011/04/21/Reflowing

backward compatibility for old browsers:

section, article, header, footer, nav, aside, hgroup {
display: block;
}
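
Extracting the CSS from css.org could be sketched as follows, assuming the extraction is done in Python (rather than by tangling with Org/babel) and that the CSS lives in `#+BEGIN_SRC css` blocks. The function name `extract_css` is made up.

```python
import re

# Matches "#+BEGIN_SRC css ... #+END_SRC" blocks, case-insensitively.
CSS_BLOCK_REGEX = re.compile(
    r'^[ \t]*#\+BEGIN_SRC[ \t]+css\b[^\n]*\n(.*?)^[ \t]*#\+END_SRC[ \t]*$',
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def extract_css(org_text):
    """Return the concatenated bodies of all CSS source blocks in org_text."""
    return ''.join(CSS_BLOCK_REGEX.findall(org_text))
```

Everything outside the source blocks (headings, comments) is ignored, so the Org-mode file can document the CSS freely.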

[#B] format tables

add JavaScript to be able to sort by column
possible candidates for methods
- http://tablesorter.com/docs/
- HTML5 (?)
- CSS: http://www.cssjuice.com/16-sortable-table-techniques/
for lists
- http://farhadi.ir/projects/html5sortable/

[#C] add frames to images

example: http://www.tbray.org/ongoing/When/200x/2006/04/08/Picture-Frames
- caution: that’s Java

micro-blogging; Twitter-replacement; Mastodon

State “SOMEDAY” from [2018-09-30 Sun 19:55]

motivation
- use lazyblorg as a replacement for Twitter/Mastodon
ideas:
- generate ID and title/URL from time
  - e.g., 2018-09-23T15:49
    - ID: 2018-09-23T15:49
    - URL: http://karl-voit.at/2018/09/23/15.49/
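
Deriving both values from a timestamp is a one-liner each. A sketch, with the base URL taken from the example above and a made-up function name:

```python
from datetime import datetime

# Base URL taken from the example above.
BASE_URL = 'http://karl-voit.at'


def microblog_id_and_url(timestamp):
    """Return (ID, URL) of a micro-blog entry created at timestamp."""
    entry_id = timestamp.strftime('%Y-%m-%dT%H:%M')
    url = f'{BASE_URL}/{timestamp:%Y/%m/%d/%H.%M}/'
    return entry_id, url
```

With minute resolution, two entries written within the same minute would collide; that case is not handled here.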

Think about a new parser approach using Pandoc filters

State “SOMEDAY” from [2018-02-16 Fri 15:33]

Source
- Using Emacs 41 Pandoc: https://pandoc.org/filters.html
  - AST (Abstract Syntax Tree) of the internal format of pandoc usable for external tools like lazyblorg
- Could be done even in Python! :-)
Advantages
- Parsing would be “outsourced” to pandoc
Disadvantages
- Large dependency on pandoc and its internal structure and code
- New approach using filters on the already parsed AST
  - Not sure if everything from the current approach could be mapped to the new one: misc “layers” of replacements are done in different stages of org-parser and htmlizer.
- lazyblorg would be an add-on to pandoc(?)
  - Not sure about that though.

Atom-feeds for link combinations

State “SOMEDAY” from [2017-07-26 Wed 00:18]

link-only feeds
tag combinations always alphabetically ordered
- not foo_bar but bar_foo
- so that any user can guess them
  - instead of
    - finding the links
    - generating multiple feeds (bar_foo and foo_bar) with the same content
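
The canonical name of a tag combination is simply the sorted tags joined by an underscore. A sketch with a hypothetical function name:

```python
def feed_name(tags):
    """Return the canonical feed name for a combination of tags."""
    # Sorting makes the name independent of the order the tags were given in;
    # set() drops accidental duplicates.
    return '_'.join(sorted(set(tags)))
```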

DOI for blog entries?

State “SOMEDAY” from “NEXT” [2017-07-26 Wed 13:09]

http://www.doi.org/faq.html
- «register via a DOI Registration Agency (RA)»
- annual fees: example by crossref
  - no free service
Wordpress-plugin which provides DOIs

add? Shariff: c’t further develops privacy-friendly social media buttons

State “SOMEDAY” from “NEXT” [2017-07-26 Wed 13:09]

Explanation: http://www.heise.de/newsticker/foren/S-Re-Find-ich-schon-Re-Keine-gute-Idee/forum-288901/msg-26149283/read/
- http://www.heise.de/ct/ausgabe/2014-26-Social-Media-Buttons-datenschutzkonform-nutzen-2463330.html
https://github.com/heiseonline/shariff

switch to Bootstrap CSS/HTML framework

State “SOMEDAY” from “” [2015-07-14 Tue 17:22]

Bootstrap: outsourcing my poor CSS knowledge to experts
- https://en.wikipedia.org/wiki/Bootstrap_%28front-end_framework%29
  - http://getbootstrap.com/
    - http://getbootstrap.com/examples/blog/
- http://prideparrot.com/blog/archive/2014/4/blog_template_using_twitter_bootstrap3_part1
  - verbose how-to from ground up!
    - almost too verbose :-(
  - looks great!
- http://erjjones.github.io/blog/How-I-built-my-blog-in-one-day/
  - with Jekyll

downsides:

add an external dependency
add complexity (I may not need?)
current HTML has to be re-designed to fit Bootstrap
I still need to understand something in order to adapt it to my needs

bridge to Diaspora or similar Twitter-like services

State “SOMEDAY” from “STARTED” [2017-01-06 Fri 22:59]

include image files via `lbimg` custom link

State “SOMEDAY” from “NEXT” [2017-06-05 Mon 16:07]

“lbimg:image.png”
- works in Orgmode using custom link to valid folder
- lazyblorg recognizes it and translates it to img
show a fixed maximum width/height image
- probably with a magnifying glass and a plus symbol in its lower right corner
show the big version when clicking on it
see Kröner2011 p.140ff for HTML5 and figure/caption
handle old HTTP-ATTR lines and new Org-mode HTTP attributes
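
The translation step could look roughly like this. It is a sketch under assumptions: the link is written in Org-mode bracket syntax, the images are served from a single URL prefix, and the names `LBIMG_REGEX`, `lbimg_to_html` and `image_url_prefix` are invented. Thumbnails, click-to-enlarge and figure/caption are left out.

```python
import html
import re

# Matches [[lbimg:FILE]] and [[lbimg:FILE][DESCRIPTION]].
LBIMG_REGEX = re.compile(r'\[\[lbimg:([^\]]+)\](?:\[([^\]]*)\])?\]')


def lbimg_to_html(text, image_url_prefix='/images/'):
    """Replace lbimg links in text with HTML img elements."""
    def replace(match):
        filename = match.group(1)
        description = match.group(2) or ''
        return '<img src="{}" alt="{}" />'.format(
            html.escape(image_url_prefix + filename),
            html.escape(description),
        )
    return LBIMG_REGEX.sub(replace, text)
```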

research different Org-mode ways of including images

just link a file, do not show image
show the linked image directly

simplify HTML, omit tags

State “SOMEDAY” from [2016-11-02 Wed 16:58]

https://twitter.com/stefan2904/status/793832217940140033
- IMPORTANT: https://google.github.io/styleguide/htmlcssguide.xml?showone=Optional_Tags#Optional_Tags

include bookmarks to lazyblorg

State “SOMEDAY” from “NEXT” [2015-06-21 Sun 11:39]

I store bookmarks according to Managing web bookmarks with Org-mode
Idea: create short (minimal) pages per bookmark

show (small) links to tag-matching bookmarks on articles and tag-pages

State “SOMEDAY” from “NEXT” [2015-06-21 Sun 11:40]

add diff to previous version in case of update

State “SOMEDAY” from “” [2014-02-28 Fr 09:27]

should be possible because lazyblorg stores old raw content and gets new one
[ ] what happens to old diffs when the blog is re-generated?
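
Given the old and the new raw content as strings, the diff itself could come from the standard library. A sketch; the function name and the file labels are assumptions:

```python
import difflib


def content_diff(old_raw, new_raw):
    """Return a unified diff between two versions of an article's raw content."""
    return ''.join(difflib.unified_diff(
        old_raw.splitlines(keepends=True),
        new_raw.splitlines(keepends=True),
        fromfile='previous',
        tofile='current',
    ))
```

An empty result means the content did not change, so no diff needs to be shown.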

[#C] Pre-search for new blog articles before invoking lazyblorg

State “SOMEDAY” from “TODO” [2014-02-01 Sat 15:36]

do a “egrep ‘^\*+ .*:blog:’ | wc -l” and compare with last number
- if changed, run lazyblorg
- if not changed, do nothing
does not work when the same number of blog articles got deleted as created in between
probably add this to best practice or FAQs
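
The same check as the egrep call above, sketched in Python. How the last count gets stored is left open; the function names are made up.

```python
import re

# Same pattern as in the egrep call: a heading that carries the blog tag.
BLOG_HEADING_REGEX = re.compile(r'^\*+ .*:blog:')


def count_blog_headings(lines):
    """Count the Org-mode headings tagged with :blog: in an iterable of lines."""
    return sum(1 for line in lines if BLOG_HEADING_REGEX.search(line))


def needs_run(lines, last_count):
    """Return True if the number of blog headings differs from last_count."""
    return count_blog_headings(lines) != last_count
```

The known limitation stays the same: an equal number of deleted and created articles goes unnoticed.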

add/create/include/handle short URLs for each entry

State “SOMEDAY” from “” [2013-08-20 Tue 10:56]

e.g.:
- www.example.com/blog/i/aB3 ->
- www.example.com/i/aB3 ->
generate short URL as hash from ID?
- is it possible without getting a high chance of conflicts?
  - YES:
    - use 4-letter-part of sha1-hash
    - before storing, check on conflict with existing one
      - use creation-date as first-come-first-serve
      - in case of conflict: add more sha1-letters to short-URL
[2013-08-29 Thu]: idea: www.example.com/s(.html)#ID
- one (long) HTML page with links to all pages
  - large space between entries so that they cannot be mixed up (no two entries visible at the same time)
  - disadvantage: user has to click on the URL of the article
- working: /index.shtml#realcontent
  - www.example.com/s.html#ID
- working: /#realcontent
  - www.example.com/s/#ID
  - shorter!
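
The sha1 idea from above (4 letters, longer on conflict, first come first served) could be sketched like this. The function name and the `min_length` parameter are assumptions; the caller has to pass the IDs ordered by creation date.

```python
import hashlib


def assign_short_ids(entry_ids, min_length=4):
    """Map each entry ID to a unique prefix of its sha1 hash.

    entry_ids must be ordered by creation date so that older entries
    keep the shortest prefix (first come, first served).
    """
    short_ids = {}
    taken = set()
    for entry_id in entry_ids:
        digest = hashlib.sha1(entry_id.encode('utf-8')).hexdigest()
        length = min_length
        # On conflict, add more sha1 letters until the prefix is unused.
        while digest[:length] in taken and length < len(digest):
            length += 1
        short_ids[entry_id] = digest[:length]
        taken.add(digest[:length])
    return short_ids
```

Because the result depends on the order of all entries, a short URL stays stable only as long as no older entry is added later on.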

fixed entries by using a tag

State “SOMEDAY” from “” [2013-01-08 Tue 14:46]

outside of YYYY/MM/DD-hierarchy
e.g.
- tools I use
- books I read
- …

publish (only) free/busy times (in multiple formats)

State “SOMEDAY” from “” [2013-01-08 Tue 14:48]

CSS: round corners of images

State “SOMEDAY” from “NEXT” [2013-01-08 Tue 14:53]

probably steal from http://www.tbray.org/ongoing/

Tasker-script: share URL and send to my lazyblorg

State “SOMEDAY” from “” [2013-07-20 Sat 10:58]

open questions
- encryption
  - necessary? in the end, it gets public anyway :-)
- prevent “content injection”
  - PKI: signing with private GnuPG-key of phone device?
    - DoS-attack still possible
      - sending a lot of fake messages
  - synchronous password?
  - ?

re-generate only necessary entries/pages

State “SOMEDAY” from “” [2013-08-22 Thu 21:19]

switch from “delete everything and re-generate everything on every run” to “delete and re-generate only necessary entries/pages”

[ ] adapt docstring of compare_blog_metadata()

[#C] in order not to parse whole content, split up parsing

State “SOMEDAY” from “” [2013-08-21 Wed 11:58]

For optimizing performance and RAM usage: use two parsing processes:

find new or updated articles
- parse for used ID-links
- collect and store metadata of these (everything except content)
- print out warnings for all IDs that are broken links
- create creative 404-page for all broken links in the meantime
parse everything again and store only new or updated article contents
- match with ID-links

[#C] do not parse HTML template file if unchanged

State “SOMEDAY” from “” [2013-08-26 Mon 19:41]

not much of a performance difference
only a nice-to-have

implement bookmark RSS in lazyblorg

State “SOMEDAY” from “” [2014-01-20 Mon 19:33]

[ ] handle public/private tags accordingly (or: noexport?)
[ ] migrate delicious private field to private tag

test alternative methods to generate ATOM feeds

State “SOMEDAY” from [2016-11-06 Sun 17:44]

[ ] test cgi.escape http://stackoverflow.com/questions/1061697/whats-the-easiest-way-to-escape-html-in-python
- escapes <, >, and &
  cgi.escape is fine. It escapes:
  
  < to &lt;, > to &gt;, & to &amp;
  
  That is enough for all HTML.
  
  EDIT: If you have non-ascii chars you also want to escape, for inclusion in another encoded document that uses a different encoding, like Craig says, just use:
```
data.encode('ascii', 'xmlcharrefreplace')
          
```
  Don’t forget to decode data to unicode first, using whatever encoding it was encoded.
  
  However in my experience that kind of encoding is useless if you just work with unicode all the time from start. Just encode at the end to the encoding specified in the document header (utf-8 for maximum compatibility).
  
  Example:
  
  >>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
  '&lt;a&gt;b&#225;&lt;/a&gt;'
  
  Also worth of note (thanks Greg) is the extra quote parameter cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (") so you can use the resulting value in a XML/HTML attribute.
  
  EDIT: Note that cgi.escape has been deprecated in Python 3.2 in favor of html.escape, which does the same except that quote defaults to True.
Debian/Python-modules to generate feed altogether (outsourcing feed generation)
- python-feedgenerator: https://github.com/dmdm/feedgenerator-py3k
  - moved to a different repository and has few contributors
- [ ] check out https://github.com/lkiesow/python-feedgen

from feedgen.feed import FeedGenerator
fg = FeedGenerator()
fg.id('http://lernfunk.de/media/654321')
fg.title('Some Testfeed')
fg.author( {'name':'John Doe','email':'john@example.de'} )
fg.link( href='http://example.com', rel='alternate' )
fg.logo('http://ex.com/logo.jpg')
fg.subtitle('This is a cool feed!')
fg.link( href='http://larskiesow.de/test.atom', rel='self' )
fg.language('en')

fe = fg.add_entry()
fe.id('http://lernfunk.de/media/654321/1')
fe.title('The First Episode')

atomfeed = fg.atom_str(pretty=True) # Get the ATOM feed as string
rssfeed  = fg.rss_str(pretty=True) # Get the RSS feed as string
fg.atom_file('atom.xml') # Write the ATOM feed to a file
fg.rss_file('rss.xml') # Write the RSS feed to a file

[ ] test http://stackoverflow.com/questions/174890/how-to-output-cdata-using-elementtree
[ ] test http://stackoverflow.com/questions/13694143/parsing-cdata-in-xml-with-python
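
For the cgi.escape item above: a Python 3 sketch of the same escaping with html.escape, which the quoted answer names as the replacement. The function name `escape_for_feed` is made up.

```python
import html


def escape_for_feed(text):
    """Escape <, >, & and quotes; turn non-ASCII chars into numeric references."""
    escaped = html.escape(text)  # quote=True is the default
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')
```

If the feed is written as UTF-8 anyway, the xmlcharrefreplace step should not be needed and html.escape alone would do.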

lazyblorg: omit org.txt in search

Not scheduled, was “[2016-03-30 Wed]” on [2016-03-30 Wed 20:06]

Syntax: https://duck.co/help/results/syntax
instead of: https://duckduckgo.com/?q=foobar+site%3Akarl-voit.at
should be: https://duckduckgo.com/?q=foobar+site%3Akarl-voit.at+-filetype%3Aorg.txt
search: &site=karl-voit.at&
replace: &site=karl-voit.at+-filetype=org.txt&

This does not work very well:

HTML results that are shown without this filter get hidden with -filetype…
there is a weird difference between -filetype:txt, -filetype:.txt and -filetype:org.txt
