Roadmap
This file holds notes, ideas, sketches, brainstorming-output related to the project. This is the «personal playground» of Karl and his ideas on lazyblorg.
NOTE: Most headings that are marked as DONE or CANCELED are moved to
the corresponding archive-file Roadmap.org_archive
which can’t be
rendered at GitHub because GitHub does not recognize the file extension
as Org-mode yet. You have to check out the Git source of the Wiki.
This is a great idea anyway because of the limitations of the orgmode
visualization of GitHub Wiki, this file is best viewed in GNU/Emacs
directly (after downloading it): TODO
keywords, tags, drawers,…
- State “DONE” from “STARTED” [2013-08-20 Tue 15:02]
- problem:
- keep a sorted list of elements like [ [<time-stamp>,<id>], […] ]
- http://wiki.python.org/moin/HowTo/Sorting/
- sorting by index or named attribute!
Sorting by age using tuples:
>>> student_tuples = [
('john', 'A', 15),
('jane', 'B', 12),
('dave', 'B', 10),
]
>>> sorted(student_tuples, key=lambda student: student[2]) # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Sorting by attribute of a class:
>>> class Student:
        def __init__(self, name, grade, age):
            self.name = name
            self.grade = grade
            self.age = age
        def __repr__(self):
            return repr((self.name, self.grade, self.age))
>>> student_objects = [
Student('john', 'A', 15),
Student('jane', 'B', 12),
Student('dave', 'B', 10),
]
>>> sorted(student_objects, key=lambda student: student.age) # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
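The same key-based approach should cover the original problem of keeping [<time-stamp>, <id>] pairs sorted. A minimal sketch, assuming the time-stamps are ISO-formatted strings (so lexical order equals chronological order); the sample IDs are made up:

```python
from operator import itemgetter

# Hypothetical sample data: [time-stamp, id] pairs in arbitrary order.
entries = [
    ["2013-08-20T15:02", "id-b"],
    ["2011-10-07T15:40", "id-a"],
    ["2014-02-01T15:03", "id-c"],
]

# itemgetter(0) sorts by the time-stamp, i.e. by index 0 of each pair.
oldest_first = sorted(entries, key=itemgetter(0))
newest_first = sorted(entries, key=itemgetter(0), reverse=True)

print([entry_id for _, entry_id in oldest_first])
```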
- State “DONE” from “STARTED” [2014-02-01 Sat 15:02]
- top: public voit banner (as usual)
- main content: 7 most recent blog entries
- only up to first HR or heading
- if HR/heading is found, add “read whole article…” as link below
- side-bar
- “about” (persistent page)
- about this blog
- SW being used
- how to follow
- link to FEED
- about Karl Voit
- github
- “tags” (persistent page)
- explaining why I am using tags
- auto-tags
- overview on tags
- 2011, 2012, 2013, …: yearly overview pages
- of all years that contain blog articles
- “follow me”: get updates via FEED (persistent page)
- explaining the methods I provide
Defining the content in a special template heading:
- main page as “** Mainpage”
- side-bar as “*** Sidebar”
- list of elements
- HR as separator (as shown above)
- State “DONE” from “STARTED” [2014-02-01 Sat 15:03]
- see paper from [2014-01-31 Fri]
First list is an implicit article update list showing a changelog GH-issue
- Not scheduled, was “[2018-04-14 Sat]” on [2019-02-15 Fri 15:17]
- Rescheduled from “[2018-04-07 Sat]” on [2018-04-08 Sun 21:49]
- Rescheduled from “[2018-04-01 Sun]” on [2018-04-03 Tue 21:39]
- Rescheduled from “[2018-03-18 Sun]” on [2018-03-18 Sun 20:21]
[2018-03-28 Wed 13:01]
- https://github.com/tjko/jpegoptim
- https://www.garron.me/en/bits/optimize-compress-jpg-png.html
- How JPEG and PNG could be optimized
[2018-09-25 Tue 20:16]
- awesome talk https://www.youtube.com/watch?v=7kVeCqQCxlk&feature=youtu.be
- simple example: https://codepen.io/mor10/pen/QvmLpd
- https://gridbyexample.com/
- Not scheduled, was “[2018-05-21 Mon]” on [2018-11-12 Mon 14:26]
- Rescheduled from “[2018-05-20 Sun]” on [2018-05-20 Sun 23:33]
- Rescheduled from “[2018-05-13 Sun]” on [2018-05-13 Sun 17:53]
- Rescheduled from “[2018-04-14 Sat]” on [2018-05-10 Thu 12:28]
- Rescheduled from “[2018-04-07 Sat]” on [2018-04-08 Sun 21:50]
- Rescheduled from “[2018-04-01 Sun]” on [2018-04-03 Tue 21:39]
- Rescheduled from “[2018-03-18 Sun]” on [2018-03-18 Sun 20:21]
- Rescheduled from “[2018-03-14 Wed]” on [2018-03-13 Tue 23:21]
- Rescheduled from “[2018-03-03 Sat]” on [2018-03-03 Sat 20:41]
- Rescheduled from “[2018-02-25 Sun]” on [2018-02-25 Sun 21:31]
- Rescheduled from “[2018-02-11 Sun]” on [2018-02-11 Sun 21:06]
- Rescheduled from “[2018-02-04 Sun]” on [2018-02-04 Sun 10:09]
- Rescheduled from “[2018-01-20 Sat]” on [2018-01-21 Sun 11:47]
- Rescheduled from “[2018-01-14 Sun]” on [2018-01-14 Sun 19:22]
- Rescheduled from “[2018-01-07 Sun]” on [2018-01-07 Sun 18:55]
- Rescheduled from “[2018-01-01 Mon]” on [2018-01-03 Wed 17:16]
- Rescheduled from “[2017-12-30 Sat]” on [2017-12-30 Sat 22:50]
- search in a loop, starting from the front, until
- exactly one unique hit is found or
- no hit is found any more or
- the end of the string is reached
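One possible reading of this loop, sketched under the assumption that a growing prefix of a query string is matched against a list of candidate strings. The function name and the idea of prefix matching are my assumptions, not something these notes spell out:

```python
def find_unique_by_prefix(query, candidates):
    """Grow the prefix of `query` one character at a time.

    Returns the single candidate that starts with the prefix as soon as
    exactly one is left. Returns None if no candidate matches any more or
    if the query is used up while several candidates still match.
    """
    for length in range(1, len(query) + 1):
        prefix = query[:length]
        hits = [c for c in candidates if c.startswith(prefix)]
        if len(hits) == 1:
            return hits[0]      # unique hit found
        if not hits:
            return None         # nothing matches any more
    return None                 # end of string reached, still ambiguous
```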
- issue: when an article is tagged twice with “foo”, it appears twice in the tag page and so forth
file:~/src/lazyblorg/testdata/end_to_end_test/result/2017/09/30/link-test/index.html
- link-only feeds
- Not scheduled, was “[2017-08-04 Fri]” on [2017-08-04 Fri 11:13]
- Rescheduled from “[2017-08-01 Tue]” on [2017-08-02 Wed 09:08]
- Rescheduled from “[2017-07-26 Wed]” on [2017-07-28 Fri 08:50]
- Rescheduled from “[2017-07-21 Fri]” on [2017-07-22 Sat 08:28]
- Rescheduled from “[2017-07-20 Thu]” on [2017-07-21 Fri 08:02]
- https://www.baty.net/index.xml
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Jack Baty's Blog</title>
<link>https://www.baty.net/</link>
<description>Recent content on Jack Baty's Blog</description>
<generator>Hugo -- gohugo.io</generator>
<language>en</language>
<lastBuildDate>Mon, 22 Oct 2018 07:34:52 -0400</lastBuildDate>
<atom:link href="https://www.baty.net/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Selecting a content directory with Easy-Hugo</title>
<link>https://www.baty.net/2018/selecting-a-content-directory-with-easy-hugo/</link>
<pubDate>Mon, 22 Oct 2018 07:34:52 -0400</pubDate>
<guid>https://www.baty.net/2018/selecting-a-content-directory-with-easy-hugo/</guid>
<description><p><a href="https://github.com/masasam/emacs-easy-hugo">Easy Hugo</a> is a handy Emacs mode for posting to Hugo-based blogs. One difficulty I had was that I have many content directories, and easy hugo only included methods for moving through directories one at a time.</p>
<p>In a related bug report, I suggested that being able to quickly select the content directory would be useful, and the author, <a href="https://github.com/masasam">masasam</a>, just <a href="https://github.com/masasam/emacs-easy-hugo/issues/42#issuecomment-431795287">added the feature</a>. Works great.</p>
<p><img src="https://www.baty.net/img/2018/2018-10-22_easy-hugo-select-postdir.png" alt="screenshot" /></p>
</description>
</item>
</channel>
</rss>
- https://github.com/gohugoio/hugo/issues/3473
- it’s better to drop RSS support and use Atom → Arguments listed!
- https://github.com/lingxz/er
- a Hugo theme implementing Atom feeds
- https://github.com/novoid/lazyblorg/issues/10
- current situation
- DOMAIN/tags/tag1
- DOMAIN/tags/tag2
- DOMAIN/tags/tag3
- this feature provides:
- DOMAIN/tags/tag1/tag2
- DOMAIN/tags/tag1/tag3
- DOMAIN/tags/tag2/tag1
- DOMAIN/tags/tag3/tag1
- … when there are:
- articles tagged with tag1, tag2, and tag3
- no articles tagged with tag2 and tag3 combined
- tag cloud with links whose names are: “tag1+tag2” “tag3+tag2” “tag4+tag2” … on page of tag2 (and so on)
- list of all articles that have (at least) the matching tags
- [ ] set level of depth in config.py; default = 1
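The set of pages listed above could be enumerated roughly like this. It is only a sketch: the `articles` mapping, the function name and the interpretation of "depth" (number of additional tags below the first one) are assumptions for illustration.

```python
from itertools import permutations

def tag_combination_paths(articles, depth=1):
    """Return sorted 'tags/a/b/...' paths for tag combinations in use.

    `articles` maps an article ID to its set of tags. Only combinations
    that occur together on at least one article produce a path.
    """
    paths = set()
    for tags in articles.values():
        for length in range(1, depth + 2):
            for combo in permutations(sorted(tags), length):
                paths.add("tags/" + "/".join(combo))
    return sorted(paths)

# Hypothetical articles: tag2 and tag3 never occur together.
articles = {
    "article-a": {"tag1", "tag2"},
    "article-b": {"tag1", "tag3"},
}
for path in tag_combination_paths(articles, depth=1):
    print(path)
```

Because the combinations are derived per article, no page is generated for tag2/tag3.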
tagstore proof of concept: tagstore_add_entry.py
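## generate all permutations
def generate_permutation(s):
    # a string of length 0 or 1 has exactly one permutation: itself
    if len(s) <= 1:
        yield s
    else:
        # insert the first character at every position of each
        # permutation of the remaining characters
        for perm in generate_permutation(s[1:]):
            for i in range(len(perm) + 1):
                yield perm[:i] + s[0:1] + perm[i:]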
tagstore > store.py
def __build_store_navigation(self, link_name, tag_list, current_path):
    """
    builds the whole directory and link-structure (describing & categorising nav path) inside a stores filesystem
    """
    link_source = self.__watcher_path + "/" + link_name
    for tag in tag_list:
        self.__file_system.create_dir(current_path + "/" + tag)
        self.__file_system.create_link(link_source, current_path + "/" + tag + "/" + link_name)
        recursive_list = [] + tag_list
        recursive_list.remove(tag)
        self.__build_store_navigation(link_name, recursive_list, current_path + "/" + tag)
It holds a random ID. No need for the orgmode-id on the entry page.
“language:english” should point to tag page of “language” (and not “language:english”)
- Not scheduled, was “[2016-08-06 Sat]” on [2016-11-18 Fri 20:22]
- Rescheduled from “[2016-02-08 Mon]” on [2016-07-23 Sat 12:20]
- id:implemented-org-elements
- why
- list of data structures is getting longer and longer
- adding a new data structure is getting tedious: unit tests, …
- simplification?
- data structures to add to single data structure-dict:
- blog_data
- metadata
- FIXXME: more?
analyze and steal CSS for columns from URL
- State “DONE” from “NEXT” [2015-06-27 Sat 19:06]
- http://endlessparentheses.com/endless.css
- interesting part is marked with
/* Responsiveness */
- responsive design: different settings with different height/width:
/* Responsiveness */
@media (max-height: 34rem) {
    .left-sidebar-ad {display:none;}
    .se-flair {display:none;}
}
@media (min-height: 34.1rem) {
    .left-sidebar-ad {display:initial;}
    .se-flair {display:initial;}
}
@media (max-width: 62.2rem) {
    .post-ad-mobile {
        display: initial;
        width: 320px;
        height: 100px;
        margin-left: auto;
        margin-right: auto;
    }
    .post-ad {display: none;}
    .left-sidebar-ad {display:none;}
    .masthead-links {
        margin-top: .3rem;
        margin-left: 0;
        margin-right: 0;
    }
    .masthead-links li {
        margin:0;
        margin-bottom:.3rem;
        text-align: center;
        font-size:100%;
        width:32%;
        display:none;
    }
    /* Only 3 links fit in mobile. */
    .masthead-links li:nth-child(1),
    .masthead-links li:nth-child(2),
    .masthead-links li:nth-child(3) {
        display:inline-block;
    }
    .pagination {
        width: 99%;
    }
}
@media (min-width: 62.3rem) {
    .container {
        /* overflow:hidden; */
        max-width: 52rem;
        /* width:34rem; */
        margin-left: auto;
        margin-right: auto;
        /* background-color:red; */
    }
    .masthead {
        margin-top:2rem;
        margin-right:11rem;
        margin-bottom: 0;
        /* text-align:right; */
        position:fixed;
        top:0;
        right:50%;
        display:block;
        width:13rem;
        height:100%;
    }
    .post-with-comments {
        /* float:left; */
        margin-left:20rem;
        /* width:34rem; */
        display:inline-block;
        /* overflow:scroll; */
    }
    .post-ad-mobile {display: none;}
}
@media (min-width: 76.3rem) {
    .container {
        /* overflow:hidden; */
        max-width: 31rem;
        /* width:34rem; */
        margin-left: auto;
        margin-right: auto;
        /* background-color:red; */
    }
    .post-with-comments {
        margin-left: 0;
        margin-right: 0;
    }
    .right-sidebar {
        font-family: "Droid Serif", serif;
        padding-bottom: 0;
        margin-right: 0rem;
        margin-top:1.3rem;
        margin-left:20rem;
        margin-bottom: 0;
        /* text-align:right; */
        position:fixed;
        top:0;
        left:50%;
        display:inline-block;
        width:15rem;
        height:100%;
        /* background-color:blue; */
    }
    .masthead {
        /* background-color:blue; */
        margin-right:20rem;
        /* text-align:right; */
    }
}
- only if the font is not loaded from a third party server (privacy!)
- [X] find out how to set this font
- https://kungsgeten.github.io/static_about.html
This blog is created with Emacs using the amazing org-mode. Most of the text is set in Alegreya — except for code snippets and other monospace text, which is set in Cousine. Much of the typography — the use of sidenotes and sections separated by whitespace, starting the sentence with small caps — is inspired by Edward Tufte’s work.
- https://www.google.com/fonts/specimen/Alegreya
- [ ] Is it possible to host this font on my own server?
- https://www.google.com/fonts#UsePlace:use/Collection:Alegreya
- this uses Google servers
- [ ] https://developers.google.com/fonts/
- [ ] http://michaelboeke.com/blog/2013/09/10/Self-hosting-Google-web-fonts/
- [ ] https://github.com/majodev/google-webfonts-helper
- https://www.phase2technology.com/blog/implementing-an-estimated-read-time-on-articles/
- http://cs.stackexchange.com/questions/57285/how-to-calculate-an-accurate-estimated-reading-time-of-text
- algorithm
- http://niram.org/read/
- 200 words/min
- http://marketingland.com/estimated-reading-times-increase-engagement-79830
- «Research varies, but generally, the average adult reads 200-250 words in one minute.»
- arguments against using this feature (makes people angry)
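A minimal sketch of the estimate, assuming plain whitespace-separated word counting and the 200 words/min figure quoted above. The function name and the rounding (up, at least one minute) are my choices:

```python
import math

WORDS_PER_MINUTE = 200  # lower bound of the 200-250 range quoted above

def estimated_reading_minutes(text, words_per_minute=WORDS_PER_MINUTE):
    """Return the estimated reading time in whole minutes (at least 1)."""
    word_count = len(text.split())
    return max(1, math.ceil(word_count / words_per_minute))
```

Markup, code blocks and tables are counted like normal words here, which is probably too naive for articles with long listings.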
- it was replaced by pypandoc
- Not scheduled, was “[2016-02-26 Fri]” on [2016-02-26 Fri 18:49]
- Rescheduled from “[2015-10-24 Sat]” on [2016-02-14 Sun 08:49]
- Rescheduled from “[2015-09-25 Fri]” on [2015-09-24 Thu 19:58]
- Rescheduled from “[2015-07-25 Sat]” on [2015-07-25 Sat 18:37]
- Rescheduled from “<2015-07-19 Sun>” on [2015-07-20 Mon 19:19]
- Rescheduled from “<2015-07-17 Fri>” on [2015-07-17 Fri 18:57]
- Rescheduled from “<2015-06-28 Sun>” on [2015-07-11 Sat 11:51]
- why?
- Org-syntax elements like lists or tables are hard to parse and htmlize correctly
- third party Org-mode parser and htmlizer would be great
- need to check third party library using unit tests!
- this attempt:
- keep control of the basic parsing/htmlizing process
- convert blocks using pypandoc library
- reduce complexity of current parser/htmlizer
- process
- [X] get a rough overview what needs to be changed
- [X] write unit-test for basic test of pypandoc!
- [X] document pypandoc requirement and its test
- [X] include test_pypandoc.py in testall
- [X] thinking of: keeping parser/htmlizer and using pypandoc only
for hard to parse/htmlize blocks?
- [X] test naïve pypandoc lazyblorg for public-voit before
- decision: I stick to my own parser so far and might convert some Org-mode syntax elements to pypandoc later on
- [ ] data model (file_blog_data)
- simplified: instead of blocks and so on, there will only be meta-data from the headings and blobs of Org-mode blocks
- decision: not feasible because I need too much insider-information on the blocks in order to do all the magic
- [X] implement exception handling when pypandoc is not found/installed
- [X] implement pypandoc_test.py testcase with all used Org-mode syntax elements
- [-] implement tables using pypandoc
- in order to get experience of the possibilities
- don’t forget sanitizing
- [X] write parser
- [X] write parser tests
- [X] write htmlizer
- [X] write htmlizer tests
- [ ] add CSS for tables
- [-] implement lists using pypandoc
- don’t forget sanitizing
- [X] write parser
- [ ] write parser tests
- [X] write htmlizer
- [X] write htmlizer tests with complicated lists
- [ ] add CSS for lists
- [X] pypandoc as fall-back for any content which has no special treatment
- lazyblorg.py
- [ ] generate_output()
- using a different output module!
- pandocizer.py (or similar)
- [ ] keep old orgparser for templates?
- htmlizer.py -copy-> pandocizer.py (or similar)
- [ ] feed generator definitions
- feedentry += '\n'.join(blog_data_entry['content'])
- feedentry += '\n'.join(blog_data_entry['htmlteaser'])
- [ ] generate_entry_page()
- [ ] sanitize_and_htmlize_blog_content() -> completely obsolete?
- [ ] htmlize_simple_text_formatting() -> completely obsolete?
- [ ] sanitize_html_characters() -> completely obsolete?
- [ ] _generate_temporal_article()
- [ ] _generate_persistent_article()
- https://github.com/novoid/lazyblorg/issues/5
- see id:2014-05-09-managing-digital-photographs
All portrait photographs are rotated using [[http://www.sentex.net/~mwandel/jhead/][jhead]]. Also with jhead, I generate file-name time-stamps from the Exif header time-stamps. Using [[https://github.com/novoid/date2name][date2name]] I add time-stamps also to the movie files. After processing all those files, they get moved to the destination folder for new digicam files: ~$HOME/tmp/digicam/tmp/~.
… will be transformed into:
… which is wrong
- [X] time-ordered by last modification (for FEED and main page)
- Newest entry of entry[‘finished-timestamp-history’] is the time-stamp of the last update
- for each entry in entries
- get newest entry of entry[‘finished-timestamp-history’]
- store to a sorted list (newest first or last)
- [ ] time-ordered by issue day (for overview pages)
- Oldest entry of entry[‘finished-timestamp-history’] is the publication time-stamp!
- for each entry in entries
- get oldest entry of entry[‘finished-timestamp-history’]
- store to a sorted list (newest first or last)
- [ ] define, what “archived entries” is
- tag :ARCHIVE:
- file.org_archive
- FIXXME
- [ ] check if archived tag gets removed
- [ ] add command line parameter for adding archived entries
- by default, archived entries do not get added to the blog
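The two time orderings described above can be sketched as follows. Only the key name 'finished-timestamp-history' is taken from these notes; the sample entries, IDs and helper names are made up, and the history is assumed to be a list of datetime objects:

```python
from datetime import datetime

# Hypothetical entries with made-up IDs and dates.
entries = [
    {"id": "a", "finished-timestamp-history": [datetime(2014, 1, 17), datetime(2014, 3, 1)]},
    {"id": "b", "finished-timestamp-history": [datetime(2014, 2, 1)]},
]

def last_update(entry):
    # newest time-stamp = time of the last update
    return max(entry["finished-timestamp-history"])

def publication(entry):
    # oldest time-stamp = time of the first publication
    return min(entry["finished-timestamp-history"])

# Newest first in both cases; drop reverse=True for oldest first.
by_last_update = sorted(entries, key=last_update, reverse=True)  # feed, main page
by_publication = sorted(entries, key=publication, reverse=True)  # overview pages
```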
- https://github.com/novoid/lazyblorg/issues/5
http://sd.wareonearth.com/~phil/xdu/examp1.gif
… gets messed up to:
http://sd.wareonearth.com/</code>phil/xdu/examp1.gif
on https://karl-voit.at/2014/03/25/xdu
- [ ] add unit test to htmlizer
- [ ] fix bug
- [ ] test
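A possible direction for the fix, as a sketch only: treat "~" as a code delimiter only when the opening tilde follows whitespace (or an opening bracket) and the closing tilde is followed by whitespace, punctuation or the end of the text. This is not the current htmlizer code; the regular expression and the function name are assumptions.

```python
import re

# Opening "~": at start of text or after whitespace / an opening bracket.
# Closing "~": at end of text or before whitespace / punctuation.
CODE_REGEX = re.compile(
    r"(?<![^\s(\[{])~([^~\s](?:[^~]*[^~\s])?)~(?![^\s.,;:!?)\]}])"
)

def htmlize_code(text):
    """Replace ~code~ by <code>code</code>, leaving tildes in URLs alone."""
    return CODE_REGEX.sub(r"<code>\1</code>", text)

print(htmlize_code("see http://sd.wareonearth.com/~phil/xdu/examp1.gif and ~du -s~"))
```

The same strings could serve as the unit test for the htmlizer: a URL containing a tilde followed by a real ~code~ snippet on the same line.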
- https://github.com/novoid/lazyblorg/issues/6
- 10 most frequently used tags (with occurrence)
- 10 least frequently used tags (with occurrence; if occurrence == 1 → show link to article)
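Both lists could be derived with collections.Counter. A sketch, assuming `articles` maps an article ID to its list of tags (that structure and the function name are hypothetical):

```python
from collections import Counter

def tag_statistics(articles, count=10):
    """Return (most_used, least_used) lists of (tag, occurrences) pairs."""
    # set(tags) counts a tag once per article, even if it was set twice
    counter = Counter(tag for tags in articles.values() for tag in set(tags))
    ranked = counter.most_common()  # highest occurrence first
    return ranked[:count], ranked[::-1][:count]
```

Tags in the second list with an occurrence of 1 are the ones that would get a direct link to their only article.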
- Rescheduled from “[2016-05-22 Sun]” on [2016-05-22 Sun 18:58]
- Rescheduled from “[2016-05-15 Sun]” on [2016-05-19 Thu 13:42]
- Rescheduled from “[2016-05-13 Fri]” on [2016-05-14 Sat 10:34]
[[Public Voit]] > Archive
[from year of oldest entry to year of newest entry]
|     | 2009  | 2010 | 2011 | 2012 | 2013 | 2014 |
| Jan |       |      |      |      |      |      |
| Feb |       |      |      |      |      |      |
| Mar | [[1]] |      |      |      |      |      |
| Apr |       |      |      |      |      |      |
| May | [[5]] |      |      |      |      |      |
| Jun |       |      |      |      |      |      |
| Jul |       |      |      |      |      |      |
| Aug | [[2]] |      |      |      |      |      |
| Sep |       |      |      |      |      |      |
| Oct |       |      |      |      |      |      |
| Nov |       |      |      |      |      |      |
| Dec |       |      |      |      |      |      |

|      | Jan | Feb | Mar   | Apr | May   | Jun | Jul | Aug   | Sep | Oct | Nov | Dec |
| 2009 |     |     | [[1]] |     | [[5]] |     |     | [[2]] |     |     |     |     |
| 2010 |     |     |       |     |       |     |     |       |     |     |     |     |
| 2011 |     |     |       |     |       |     |     |       |     |     |     |     |
| 2012 |     |     |       |     |       |     |     |       |     |     |     |     |
| 2013 |     |     |       |     |       |     |     |       |     |     |     |     |
| 2014 |     |     |       |     |       |     |     |       |     |     |     |     |
- tasks
- [ ] create blog-format.org entries with HTML source and replacement entities
- [ ] implement in Python
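For the "implement in Python" task, the numbers behind such a year/month table could be collected like this. A sketch only: it assumes a non-empty list of publication dates is available, and the function name is made up.

```python
from collections import Counter
from datetime import date

def archive_matrix(publication_dates):
    """Return {year: [count_jan, ..., count_dec]} from oldest to newest year."""
    counts = Counter((d.year, d.month) for d in publication_dates)
    first_year = min(d.year for d in publication_dates)
    last_year = max(d.year for d in publication_dates)
    return {
        year: [counts[(year, month)] for month in range(1, 13)]
        for year in range(first_year, last_year + 1)
    }
```

Years without any article still get a row of zeros, so the table has no gaps between the oldest and the newest year.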
[[Public Voit]] > [[Archive]]: 2014
January: 2 [is link to monthly overview]
February:
March: 4
...
- tasks
- [ ] create blog-format.org entries with HTML source and replacement entities
- [ ] implement in Python
- Not scheduled, was “2014-03-01 Sat” on [2014-03-01 Sat 21:01]
[[Public Voit]] > [[2014]] - 01
- 2014-01-17: Title of the blog article
- 2014-01-21: Another title
- tasks
- [X] create blog-format.org entries with HTML source and replacement entities
- [ ] implement in Python
- Not scheduled, was “2014-03-01 Sat” on [2014-03-01 Sat 21:01]
like monthly overview but only for the day
- tasks
- [ ] create blog-format.org entries with HTML source and replacement entities
- [ ] implement in Python
- https://docs.python.org/2/library/calendar.html
- why?
- to re-direct an old ID/entry when there is a new one
- to enable short jump-pages like id:aproject-about -> id:2016-10-04-foo-bar-project-about-page
- idea
- entry with title «redirect: id:foo-bar» results in a simple (minimal) redirect page
- [X] how to generate a simple redirect page?
- http://stackoverflow.com/questions/5411538/redirect-from-an-html-page
- http://karl-voit.at/test.html
- [ ] implement in lazyblorg
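A minimal redirect page could be generated as in the following sketch. It uses the meta-refresh approach (one of the options in the Stack Overflow thread above) plus a visible fallback link; the template and function name are assumptions, not existing lazyblorg code.

```python
from html import escape

REDIRECT_TEMPLATE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url={url}">
    <link rel="canonical" href="{url}">
    <title>Redirect</title>
  </head>
  <body>
    <p>This page has moved to <a href="{url}">{url}</a>.</p>
  </body>
</html>
"""

def redirect_page(target_url):
    """Return the HTML source of a page that redirects to target_url."""
    return REDIRECT_TEMPLATE.format(url=escape(target_url, quote=True))
```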
- https://github.com/novoid/lazyblorg/issues/4
- example: id:2015-05-24-browser-keywords
- filter Org-mode articles with parameter of one or more tags
- allows for generating different blogs (or sub-blogs) just with different commands
- htmlizer.py -> “## FIXXME: replace pre with suitable source code environment!”
- sub-headings within blog articles can have ID-property as well
- parser indexes those IDs
- HTML template adds anchor-ID
- sanitize internal links resolves those links as well
- I want to
- refer to any ID of any blog article heading or blog article sub-heading using the same method:
[[id:any-id][anchor text]]
- lazyblorg has to be able to derive following according to any ID:
- get the URL of a blog entry
- get the ID/HREF of any sub-heading of any blog entry
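One way to get there is a single index from any ID to its target, built after parsing. A sketch with a hypothetical article structure (the keys "id", "url" and "sub-heading-ids" are made up for illustration):

```python
def build_id_index(articles):
    """Map every heading ID to the URL (plus anchor) it can be reached at."""
    index = {}
    for article in articles:
        index[article["id"]] = article["url"]
        for sub_id in article.get("sub-heading-ids", []):
            # sub-headings live on the article page, addressed by an anchor
            index[sub_id] = article["url"] + "#" + sub_id
    return index
```

Resolving [[id:any-id][anchor text]] would then be a plain dictionary lookup, regardless of whether the ID belongs to an article or to a sub-heading.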
- heading gets a blog entry with a unique :ID:
- setting “Update 1/2/3/…” for each one of those:
:LOGBOOK:
- State "DONE" from "NEXT" [2011-10-07 Fri 15:40]
:END:
- ALTERNATIVELY: set “Update YYYY-MM-DD” for last one of those (from above)
- heading with known unique ID and no state DONE
- should stay the same until state changes back to DONE
- this requires something which remembers states
- this requires keeping old entries
- body:
- manual section:
- Updates:
- YYYY-MM-DD: short description
- YYYY-MM-DD: short description
see also id:2012-11-06-ago-generating
- e.g., publish new stuff on a “public-voit”-Twitter-account
- probably there is a cloud service that translates RSS to Twitter?
- probably more RSS-to-something-translators?
- historic context
- YYYY-MM-DD -> links to Wikipedia-entries of days
- https://en.wikipedia.org/wiki/Portal:Current_events/2010_August_26
- auto-tags are visually separated from manual tags to make it clear that they are automatically generated (and might be bogus sometime)
- [ ] add to about-page
- [ ] add to documentation (README, …)
- [ ] syntax of auto-tags
- or only highlight with different background color of “tags”?
- feeds for auto-tags
- [ ] feeds/lazyblorg-shorts.*
- [ ] feeds/lazyblorg-deutsch.*
- [ ] feeds/lazyblorg-english.*
- State “DONE” from [2016-11-16 Wed 22:08]
- lang-de, de, en, us, … ?
- language tag is automatically derived
- by guessing language based on common stopwords or external library
- State “DONE” from “” [2015-05-06 Wed 11:26]
from nltk.corpus import stopwords
cachedStopWords = stopwords.words("english")
def testFuncNew():
    text = 'hello bye the the hi'
    text = ' '.join([word for word in text.split() if word not in cachedStopWords])
if __name__ == "__main__":
    testFuncNew()
- [ ] return percentage of stopwords for list of known languages
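Returning the percentage of stopwords per language could look like this sketch. It does not need nltk: the two sets are small hand-picked samples of typical English and German stopwords, and the language names and function names are assumptions.

```python
# Small illustrative subsets; real lists would be considerably longer.
STOPWORDS = {
    "english": {"the", "and", "of", "to", "is", "with", "for", "that", "this", "not"},
    "deutsch": {"der", "die", "das", "und", "ist", "nicht", "mit", "auf", "ein", "für"},
}

def stopword_percentages(text):
    """Return {language: percentage of words that are stopwords}."""
    words = text.lower().split()
    if not words:
        return {language: 0.0 for language in STOPWORDS}
    return {
        language: 100.0 * sum(word in stopwords for word in words) / len(words)
        for language, stopwords in STOPWORDS.items()
    }

def guess_language(text):
    """Return the language with the highest stopword percentage."""
    percentages = stopword_percentages(text)
    return max(percentages, key=percentages.get)
```

Words that are stopwords in both languages should be kept out of the sets, otherwise they count for both sides.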
- State “DONE” from “NEXT” [2015-05-09 Sat 10:42]
- http://www.nltk.org/api/nltk.corpus.html
- http://www.nltk.org/install.html
- http://www.nltk.org/data.html
installing nltk:
root@gary ~ # pip install -U nltk
Downloading/unpacking nltk
  Downloading nltk-3.0.2.tar.gz (991Kb): 991Kb downloaded
  Running setup.py egg_info for package nltk
    warning: no files found matching 'Makefile' under directory '*.txt'
    warning: no previously-included files matching '*~' found anywhere in distribution
Installing collected packages: nltk
  Running setup.py install for nltk
    warning: no files found matching 'Makefile' under directory '*.txt'
    warning: no previously-included files matching '*~' found anywhere in distribution
Successfully installed nltk
Cleaning up...
root@gary ~ #
installing nltk.corpus > stopwords:
- ipython
- import nltk
- nltk.download()
- manually selecting corpus
- second or third tab > stopwords
- done
… way too complicated for other lazyblorg-users for just the stopwords!
cachedStopWords = stopwords.words("english")
len(cachedStopWords)  ## -> 127
For German, it’s 231 stopwords :-O
Note to myself: use this as argument on the broader variety of the German language compared to English :-)
Extract a sub-set of those stopwords and store it directly.
Determine words that occur as English and German stopwords:
In [16]: [x for x in cachedStopWordsde if x in cachedStopWords]
Out[16]: [u'am', u'an', u'in', u'so', u'was', u'will']
English stopwords without common German ones:
[u'I',
u'me',
u'my',
u'myself',
u'we',
u'our',
u'ours',
u'ourselves',
u'you',
u'your',
u'yours',
u'yourself',
u'yourselves',
u'he',
u'him',
u'his',
u'himself',
u'she',
u'her',
u'hers',
u'herself',
u'it',
u'its',
u'itself',
u'they',
u'them',
u'their',
u'theirs',
u'themselves',
u'what',
u'which',
u'who',
u'whom',
u'this',
u'that',
u'these',
u'those',
u'is',
u'are',
u'were',
u'be',
u'been',
u'being',
u'have',
u'has',
u'had',
u'having',
u'do',
u'does',
u'did',
u'doing',
u'a',
u'the',
u'and',
u'but',
u'if',
u'or',
u'because',
u'as',
u'until',
u'while',
u'of',
u'at',
u'by',
u'for',
u'with',
u'about',
u'against',
u'between',
u'into',
u'through',
u'during',
u'before',
u'after',
u'above',
u'below',
u'to',
u'from',
u'up',
u'down',
u'on',
u'off',
u'over',
u'under',
u'again',
u'further',
u'then',
u'once',
u'here',
u'there',
u'when',
u'where',
u'why',
u'how',
u'all',
u'any',
u'both',
u'each',
u'few',
u'more',
u'most',
u'other',
u'some',
u'such',
u'no',
u'nor',
u'not',
u'only',
u'own',
u'same',
u'than',
u'too',
u'very',
u'can',
u'just',
u'don',
u'should',
u'now']
German stopwords without common English ones:
[u'aber',
u'alle',
u'allem',
u'allen',
u'aller',
u'alles',
u'als',
u'also',
u'ander',
u'andere',
u'anderem',
u'anderen',
u'anderer',
u'anderes',
u'anderm',
u'andern',
u'anderr',
u'anders',
u'auch',
u'auf',
u'aus',
u'bei',
u'bin',
u'bis',
u'bist',
u'da',
u'damit',
u'dann',
u'der',
u'den',
u'des',
u'dem',
u'die',
u'das',
u'da\xdf',
u'derselbe',
u'derselben',
u'denselben',
u'desselben',
u'demselben',
u'dieselbe',
u'dieselben',
u'dasselbe',
u'dazu',
u'dein',
u'deine',
u'deinem',
u'deinen',
u'deiner',
u'deines',
u'denn',
u'derer',
u'dessen',
u'dich',
u'dir',
u'du',
u'dies',
u'diese',
u'diesem',
u'diesen',
u'dieser',
u'dieses',
u'doch',
u'dort',
u'durch',
u'ein',
u'eine',
u'einem',
u'einen',
u'einer',
u'eines',
u'einig',
u'einige',
u'einigem',
u'einigen',
u'einiger',
u'einiges',
u'einmal',
u'er',
u'ihn',
u'ihm',
u'es',
u'etwas',
u'euer',
u'eure',
u'eurem',
u'euren',
u'eurer',
u'eures',
u'f\xfcr',
u'gegen',
u'gewesen',
u'hab',
u'habe',
u'haben',
u'hat',
u'hatte',
u'hatten',
u'hier',
u'hin',
u'hinter',
u'ich',
u'mich',
u'mir',
u'ihr',
u'ihre',
u'ihrem',
u'ihren',
u'ihrer',
u'ihres',
u'euch',
u'im',
u'indem',
u'ins',
u'ist',
u'jede',
u'jedem',
u'jeden',
u'jeder',
u'jedes',
u'jene',
u'jenem',
u'jenen',
u'jener',
u'jenes',
u'jetzt',
u'kann',
u'kein',
u'keine',
u'keinem',
u'keinen',
u'keiner',
u'keines',
u'k\xf6nnen',
u'k\xf6nnte',
u'machen',
u'man',
u'manche',
u'manchem',
u'manchen',
u'mancher',
u'manches',
u'mein',
u'meine',
u'meinem',
u'meinen',
u'meiner',
u'meines',
u'mit',
u'muss',
u'musste',
u'nach',
u'nicht',
u'nichts',
u'noch',
u'nun',
u'nur',
u'ob',
u'oder',
u'ohne',
u'sehr',
u'sein',
u'seine',
u'seinem',
u'seinen',
u'seiner',
u'seines',
u'selbst',
u'sich',
u'sie',
u'ihnen',
u'sind',
u'solche',
u'solchem',
u'solchen',
u'solcher',
u'solches',
u'soll',
u'sollte',
u'sondern',
u'sonst',
u'\xfcber',
u'um',
u'und',
u'uns',
u'unse',
u'unsem',
u'unsen',
u'unser',
u'unses',
u'unter',
u'viel',
u'vom',
u'von',
u'vor',
u'w\xe4hrend',
u'war',
u'waren',
u'warst',
u'weg',
u'weil',
u'weiter',
u'welche',
u'welchem',
u'welchen',
u'welcher',
u'welches',
u'wenn',
u'werde',
u'werden',
u'wie',
u'wieder',
u'wir',
u'wird',
u'wirst',
u'wo',
u'wollen',
u'wollte',
u'w\xfcrde',
u'w\xfcrden',
u'zu',
u'zum',
u'zur',
u'zwar',
u'zwischen']
- State “CANCELLED” from “NEXT” [2015-05-09 Sat 10:42]
nltk install overhead too complicated just for the stopword lists
- State “CANCELLED” from “NEXT” [2015-05-09 Sat 10:43]
nltk install overhead too complicated just for the stopword lists
- State “DONE” from “NEXT” [2015-05-09 Sat 10:49]
- https://github.com/nltk/nltk/wiki/FAQ
- “The corpora are distributed under various licenses, as
documented in their respective README files.”
- locate: file:~/nltk_data/corpora/stopwords/README
- “They were obtained from: http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/”
- State “DONE” from “NEXT” [2015-05-09 Sat 12:17]
see id:2015-05-09-test-nltk for stopwords and extracting the lists
- State “DONE” from “NEXT” [2015-05-09 Sat 12:17]
- State “DONE” from “NEXT” [2015-05-09 Sat 18:47]
- State “DONE” from “NEXT” [2015-05-09 Sat 18:47]
- e.g., articles whose preview is equal to the whole article (i.e.: no sub-heading, no horizontal line) are marked with autotag-shorty (or similar)
- oneliners
- below a certain threshold
- middlesize(sic?)
- between oneliners and fullsizeentries
- fullsize(sic?)
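The three size classes could be assigned by word count, as in this sketch. Both thresholds are invented placeholders and would need tuning against real articles; the tag names follow the list above.

```python
ONELINER_MAX_WORDS = 50    # hypothetical threshold
FULLSIZE_MIN_WORDS = 500   # hypothetical threshold

def size_autotag(text):
    """Return 'oneliner', 'middlesize' or 'fullsize' based on word count."""
    words = len(text.split())
    if words <= ONELINER_MAX_WORDS:
        return "oneliner"
    if words < FULLSIZE_MIN_WORDS:
        return "middlesize"
    return "fullsize"
```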
compare: id:autotag-estimated-time-to-read
- hooks for :TAGS: (can be optional) or case-sensitive keywords in headings
- if found:
- link to a special pre-defined page
- Example: if “What The World Needs”|”WTWN:” is found, link to a page where WTWN-series is described in general.
- Not scheduled, was “<2015-05-09 Sat>” on [2015-05-09 Sat 22:49]
- [X] write class information
- [ ] different CSS format for the two classes
- probably in sidebar?
Like “[#A]”.
- possible ideas
- ignore priorities
- suppress!
- convert into given tags (“important”, …)
- css.org with Comments and css-blocks
example-CSS content
- automatically extracting CSS code from that Org-mode file
- example: http://www.tbray.org/ongoing/When/201x/2011/04/21/Reflowing
backward compatibility for old browsers:
section, article, header, footer, nav, aside, hgroup {
display: block;
}
- add JavaScript to be able to sort by column
- possible candidates for methods
- http://tablesorter.com/docs/
- HTML5 (?)
- CSS: http://www.cssjuice.com/16-sortable-table-techniques/
- for lists
- http://farhadi.ir/projects/html5sortable/
- example: http://www.tbray.org/ongoing/When/200x/2006/04/08/Picture-Frames
- caution: that’s Java
- State “SOMEDAY” from [2018-09-30 Sun 19:55]
- motivation
- use lazyblorg as a replacement for Twitter/Mastodon
- ideas:
- generate ID and title/URL from time
- e.g., 2018-09-23T15:49
- ID: 2018-09-23T15:49
- URL: http://karl-voit.at/2018/09/23/15.49/
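Deriving both values from a time-stamp is straightforward with strftime. A sketch that reproduces the example above; the function name is made up and the base URL is taken from the example:

```python
from datetime import datetime

BASE_URL = "http://karl-voit.at"

def microblog_id_and_url(moment):
    """Return (ID, URL) for a short entry created at `moment`."""
    entry_id = moment.strftime("%Y-%m-%dT%H:%M")
    url = BASE_URL + moment.strftime("/%Y/%m/%d/%H.%M/")
    return entry_id, url

print(microblog_id_and_url(datetime(2018, 9, 23, 15, 49)))
```

With minute resolution, two entries written within the same minute would collide, so some tie-breaker would be needed in that case.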
- State “SOMEDAY” from [2018-02-16 Fri 15:33]
- Source
- Using Emacs 41 Pandoc: https://pandoc.org/filters.html
- AST (Abstract Syntax Tree) of the internal format of pandoc usable for external tools like lazyblorg
- Could be done even in Python! :-)
- Advantages
- Parsing would be “outsourced” to pandoc
- Disadvantages
- Large dependency on pandoc and its internal structure and code
- New approach using filters on the already parsed AST
- Not sure if everything from the current approach could be mapped to the new one: misc “layers” of replacements are done in different stages of org-parser and htmlizer.
- lazyblorg would be an add-on to pandoc(?)
- Not sure about that though.
- State “SOMEDAY” from [2017-07-26 Wed 00:18]
- link-only feeds
- tag combinations always alphabetically ordered
- not foo_bar but bar_foo
- so that any user can guess them
- instead of
- finding the links
- generating a multifold of feeds (bar_foo and foo_bar) with same content
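The alphabetical rule boils down to sorting the tags before joining them. A sketch; the function name and the "_" separator (taken from the bar_foo example) are assumptions:

```python
def combined_feed_name(tags):
    """Return the canonical name for a tag combination, e.g. 'bar_foo'."""
    return "_".join(sorted(set(tags)))
```

Any order of the same tags yields the same name, so only one feed per combination has to be generated.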
- State “SOMEDAY” from “NEXT” [2017-07-26 Wed 13:09]
- http://www.doi.org/faq.html
- «register via a DOI Registration Agency (RA)»
- annual fees: example by crossref
- no free service
- Wordpress-plugin which provides DOIs
- State “SOMEDAY” from “NEXT” [2017-07-26 Wed 13:09]
- Explanation: http://www.heise.de/newsticker/foren/S-Re-Find-ich-schon-Re-Keine-gute-Idee/forum-288901/msg-26149283/read/
- http://www.heise.de/ct/ausgabe/2014-26-Social-Media-Buttons-datenschutzkonform-nutzen-2463330.html
- https://github.com/heiseonline/shariff
- State “SOMEDAY” from “NEXT” [2017-02-12 Sun 11:01]
- contrast of tag colour
- last mentioning of tag
- light gray: every tag that was not mentioned in the recent two years
- several steps of grayness/black from two years until now (or most recent blog article)
- [ ] need for a legend?
- [ ] implement “taglist_with_update_date(create/update)”
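The mapping from "last mentioned" to a gray step could be sketched like this. The number of steps and the 730-day limit (two years) are assumptions, as is the function name:

```python
from datetime import date

def tag_gray_level(last_mentioned, today, steps=4, max_age_days=730):
    """Return 0 (black, just used) up to `steps` (light gray, two years or older)."""
    age_days = (today - last_mentioned).days
    if age_days >= max_age_days:
        return steps
    # linear scale between "now" and the two-year limit
    return max(0, age_days) * steps // max_age_days
```

The returned level could be used as a CSS class suffix, for example one class per gray step.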
- State “SOMEDAY” from “” [2015-07-14 Tue 17:22]
- Bootstrap: outsourcing my poor CSS knowledge to experts
- https://en.wikipedia.org/wiki/Bootstrap_%28front-end_framework%29
- http://getbootstrap.com/
- http://getbootstrap.com/examples/blog/
- http://prideparrot.com/blog/archive/2014/4/blog_template_using_twitter_bootstrap3_part1
- verbose how-to from ground up!
- almost too verbose :-(
- looks great!
- http://erjjones.github.io/blog/How-I-built-my-blog-in-one-day/
- with Jekyll
downsides:
- add an external dependency
- add complexity (I may not need?)
- current HTML has to be re-designed to fit Bootstrap
- I still need to understand something in order to adapt it to my needs
- State “SOMEDAY” from “STARTED” [2017-01-06 Fri 22:59]
- State “SOMEDAY” from “NEXT” [2017-06-05 Mon 16:07]
- “lbimg:image.png”
- works in Orgmode using custom link to valid folder
- lazyblorg recognizes it and translates it to img
- show a fixed maximum width/height image
- probably with a magnifying glass and a plus symbol in its lower right corner
- show the big version when clicking on it
- see Kröner2011 p.140ff for HTML5 and figure/caption
- handle old HTML-ATTR lines and new Org-mode HTML attributes
- just link a file, do not show image
- show the linked image directly
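The translation step could look roughly like the following sketch. It assumes that the link appears in Org-mode bracket syntax (`[[lbimg:file]]` or `[[lbimg:file][description]]`) and that the thumbnail is limited via CSS; the function name `lbimg_to_html` and the 300 pixel default are hypothetical and not taken from lazyblorg:

```python
import html
import re

# Assumption: [[lbimg:file]] or [[lbimg:file][description]]
LBIMG_LINK = re.compile(r"\[\[lbimg:([^\]]+)\](?:\[([^\]]*)\])?\]")


def lbimg_to_html(text, max_width=300):
    """Replace lbimg links by a size-limited image linking to the full file."""

    def replace(match):
        filename = html.escape(match.group(1), quote=True)
        caption = html.escape(match.group(2) or "")
        figure = (
            f'<figure><a href="{filename}">'
            f'<img src="{filename}" alt="{caption}" '
            f'style="max-width:{max_width}px">'
            f"</a>"
        )
        if caption:
            # HTML5 figure/figcaption, as mentioned in the notes above
            figure += f"<figcaption>{caption}</figcaption>"
        return figure + "</figure>"

    return LBIMG_LINK.sub(replace, text)


print(lbimg_to_html("See [[lbimg:image.png][My screenshot]] for details."))
```

Clicking the small image opens the big version because the `img` element is wrapped in a link to the same file.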
- State “SOMEDAY” from [2016-11-02 Wed 16:58]
- https://twitter.com/stefan2904/status/793832217940140033
- IMPORTANT: https://google.github.io/styleguide/htmlcssguide.xml?showone=Optional_Tags#Optional_Tags
- State “SOMEDAY” from “NEXT” [2015-06-21 Sun 11:39]
- I store bookmarks according to Managing web bookmarks with Org-mode
- Idea: create short (minimal) pages per bookmark
- State “SOMEDAY” from “NEXT” [2015-06-21 Sun 11:40]
- State “SOMEDAY” from “” [2014-02-28 Fr 09:27]
- should be possible because lazyblorg stores old raw content and gets new one
- [ ] what happens in case of re-generation blog with old diffs?
- State “SOMEDAY” from “TODO” [2014-02-01 Sat 15:36]
- do a “egrep ‘^\*+ .*:blog:’ | wc -l” and compare with last number
- if changed, run lazyblorg
- if not changed, do nothing
- does not work when same number of blog articles get deleted as created in between
- probably add this to best practice or FAQs
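The same check can be sketched in Python. The regular expression mirrors the egrep pattern above; the function names and the way the last count is stored are assumptions:

```python
import re

# Same pattern as the egrep call: an Org-mode heading tagged with :blog:
BLOG_HEADING = re.compile(r"^\*+ .*:blog:")


def count_blog_headings(lines):
    """Count Org-mode heading lines that carry the blog tag."""
    return sum(1 for line in lines if BLOG_HEADING.search(line))


def needs_regeneration(lines, last_count):
    """Return True if the number of blog headings differs from the last run."""
    return count_blog_headings(lines) != last_count


org_lines = [
    "* DONE My first article   :blog:",
    "Some text that mentions :blog: but is no heading",
    "** DONE Another article   :blog:lazyblorg:",
    "* A private heading",
]
print(count_blog_headings(org_lines))  # 2
```

Like the egrep variant, this misses the case where as many blog articles were deleted as were created since the last run.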
- State “SOMEDAY” from “” [2013-08-20 Tue 10:56]
- e.g.:
- www.example.com/blog/i/aB3 ->
- www.example.com/i/aB3 ->
- generate short URL as hash from ID?
- is it possible without getting a high chance of conflicts?
- YES:
- use 4-letter-part of sha1-hash
- before storing, check on conflict with existing one
- use creation-date as first-come-first-served
- in case of conflict: add more sha1-letters to short-URL
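The conflict handling described above can be sketched as follows. It assumes that the "4-letter-part" is a prefix of the hexadecimal SHA-1 digest of the article ID and that already assigned short IDs are kept in a dictionary; `short_id` and `taken` are hypothetical names:

```python
import hashlib


def short_id(article_id, taken, min_length=4):
    """Return a short ID for article_id and record it in `taken`.

    `taken` maps already assigned short IDs to their article IDs.
    On a conflict, more characters of the SHA-1 digest are added.
    """
    digest = hashlib.sha1(article_id.encode("utf-8")).hexdigest()
    for length in range(min_length, len(digest) + 1):
        candidate = digest[:length]
        owner = taken.get(candidate)
        if owner is None or owner == article_id:
            taken[candidate] = article_id
            return candidate
    raise ValueError("no free short ID for " + article_id)


taken = {}
# Process articles sorted by creation date: first come, first served.
for article_id in ["2013-08-20-first-article", "2013-08-29-second-article"]:
    print(article_id, "->", short_id(article_id, taken))
```

Calling the function in order of creation date keeps existing short URLs stable: an older article keeps its four characters and only the newer one gets a longer ID.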
- [2013-08-29 Thu]: idea: www.example.com/s(.html)#ID
- one (long) HTML page with links to all pages
- large space between entries so that entries cannot be mixed up (no multiple entries visible at the same time)
- disadvantage: user has to click on the URL of the article
- working: /index.shtml#realcontent
- www.example.com/s.html#ID
- working: /#realcontent
- www.example.com/s/#ID
- shorter!
- State “SOMEDAY” from “” [2013-01-08 Tue 14:46]
- outside of YYYY/MM/DD-hierarchy
- e.g.
- tools I use
- books I read
- …
- State “SOMEDAY” from “” [2013-01-08 Tue 14:48]
- State “SOMEDAY” from “NEXT” [2013-01-08 Tue 14:53]
- probably steal from http://www.tbray.org/ongoing/
Tasker-script: share URL and send to my lazyblorg
- State “SOMEDAY” from “” [2013-07-20 Sat 10:58]
- open questions
- encryption
- necessary? in the end, it gets public anyway :-)
- prevent “content injection”
- PKI: signing with private GnuPG-key of phone device?
- DoS-attack still possible
- sending a lot of fake messages
- synchronous password?
- ?
- State “SOMEDAY” from “” [2013-08-22 Thu 21:19]
switch from “delete everything and re-generate everything on every run” to “delete and re-generate only necessary entries/pages”
- [ ] adapt docstring of compare_blog_metadata()
- State “SOMEDAY” from “” [2013-08-21 Wed 11:58]
For optimizing performance and RAM usage: use two parsing processes:
- find new or updated articles
- parse for used ID-links
- collect and store metadata of these (everything except content)
- print out warnings for all IDs that are broken links
- create creative 404-page for all broken links in the meantime
- parse everything again and store only new or updated article contents
- match with ID-links
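A rough sketch of the two passes, under several assumptions: each parsed article is a dictionary with the hypothetical keys `id`, `checksum`, `links`, and `content`, and changes are detected by comparing checksums with those of the previous run. None of these names are taken from lazyblorg:

```python
def first_pass(articles, previous_checksums):
    """Collect metadata (no content) and find new or updated articles."""
    metadata = {}
    changed = set()
    for article in articles:
        metadata[article["id"]] = {
            "checksum": article["checksum"],
            "links": article["links"],
        }
        if previous_checksums.get(article["id"]) != article["checksum"]:
            changed.add(article["id"])
    # ID-links of changed articles that point to no known article
    broken = sorted(
        link
        for article_id in changed
        for link in metadata[article_id]["links"]
        if link not in metadata
    )
    return metadata, changed, broken


def second_pass(articles, changed):
    """Parse again and keep only the contents of new or updated articles."""
    return {a["id"]: a["content"] for a in articles if a["id"] in changed}


articles = [
    {"id": "a", "checksum": "1", "links": ["b"], "content": "old"},
    {"id": "b", "checksum": "9", "links": ["missing"], "content": "new"},
]
metadata, changed, broken = first_pass(articles, {"a": "1", "b": "2"})
for link in broken:
    print("WARNING: broken ID-link:", link)
print(second_pass(articles, changed))
```

Only the second pass holds article contents in memory, and only for the entries that actually have to be re-generated.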
- State “SOMEDAY” from “” [2013-08-26 Mon 19:41]
- not much of a performance difference
- only a nice-to-have
- State “SOMEDAY” from “” [2014-01-20 Mon 19:33]
- [ ] handle public/private tags accordingly (or: noexport?)
- [ ] migrate delicious private field to private tag
- State “SOMEDAY” from [2016-11-06 Sun 17:44]
- [ ] test cgi.escape http://stackoverflow.com/questions/1061697/whats-the-easiest-way-to-escape-html-in-python
- escapes <, >, and &
cgi.escape is fine. It escapes:
< to &lt;, > to &gt;, & to &amp;
That is enough for all HTML.
EDIT: If you have non-ascii chars you also want to escape, for inclusion in another encoded document that uses a different encoding, like Craig says, just use:
data.encode('ascii', 'xmlcharrefreplace')
Don’t forget to decode data to unicode first, using whatever encoding it was encoded.
However in my experience that kind of encoding is useless if you just work with unicode all the time from start. Just encode at the end to the encoding specified in the document header (utf-8 for maximum compatibility).
Example:
>>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
'&lt;a&gt;b&#225;&lt;/a&gt;'
Also worth of note (thanks Greg) is the extra quote parameter cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (“) so you can use the resulting value in a XML/HTML attribute.
EDIT: Note that cgi.escape has been deprecated in Python 3.2 in favor of html.escape, which does the same except that quote defaults to True.
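The quoted answer refers to cgi.escape, which is no longer available in current Python 3 releases. A small sketch of the same test with html.escape from the standard library; the sample string is the one from the quote:

```python
import html

text = "<a>bá</a>"

# html.escape replaces <, > and &; with quote=True (the default)
# it also replaces double and single quotes.
escaped = html.escape(text)

# Optional: turn non-ASCII characters into numeric character references.
ascii_bytes = escaped.encode("ascii", "xmlcharrefreplace")

print(escaped)      # &lt;a&gt;bá&lt;/a&gt;
print(ascii_bytes)  # b'&lt;a&gt;b&#225;&lt;/a&gt;'
```

If the feed is written as UTF-8 anyway, the second step is not needed and `escaped` can be used directly.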
- Debian/Python-modules to generate feed altogether (outsourcing feed generation)
- python-feedgenerator: https://github.com/dmdm/feedgenerator-py3k
- moved to a different repository and has few contributors
- [ ] check out https://github.com/lkiesow/python-feedgen
from feedgen.feed import FeedGenerator
fg = FeedGenerator()
fg.id('http://lernfunk.de/media/654321')
fg.title('Some Testfeed')
fg.author( {'name':'John Doe','email':'john@example.de'} )
fg.link( href='http://example.com', rel='alternate' )
fg.logo('http://ex.com/logo.jpg')
fg.subtitle('This is a cool feed!')
fg.link( href='http://larskiesow.de/test.atom', rel='self' )
fg.language('en')
fe = fg.add_entry()
fe.id('http://lernfunk.de/media/654321/1')
fe.title('The First Episode')
atomfeed = fg.atom_str(pretty=True) # Get the ATOM feed as string
rssfeed = fg.rss_str(pretty=True) # Get the RSS feed as string
fg.atom_file('atom.xml') # Write the ATOM feed to a file
fg.rss_file('rss.xml') # Write the RSS feed to a file
- [ ] test http://stackoverflow.com/questions/174890/how-to-output-cdata-using-elementtree
- [ ] test http://stackoverflow.com/questions/13694143/parsing-cdata-in-xml-with-python
- Not scheduled, was “[2016-03-30 Wed]” on [2016-03-30 Wed 20:06]
- Syntax: https://duck.co/help/results/syntax
- instead of:
https://duckduckgo.com/?q=foobar+site%3Akarl-voit.at
- should be:
https://duckduckgo.com/?q=foobar+site%3Akarl-voit.at+-filetype%3Aorg.txt
- search:
&site=karl-voit.at&
- replace:
&site=karl-voit.at+-filetype=org.txt&
This does not work very well:
- HTML results that are shown without this filter get hidden with -filetype…
- there is a weird difference between -filetype:txt, -filetype:.txt, and -filetype:org.txt
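For reference, the target URL from above can be built with the Python standard library. This is only a sketch of the URL encoding; the function name `site_search_url` is hypothetical, and it does not address the filter problems noted above:

```python
from urllib.parse import urlencode


def site_search_url(terms, site="karl-voit.at", exclude_filetype=None):
    """Build a DuckDuckGo search URL restricted to one site."""
    query = f"{terms} site:{site}"
    if exclude_filetype:
        query += f" -filetype:{exclude_filetype}"
    # urlencode turns spaces into '+' and ':' into '%3A'
    return "https://duckduckgo.com/?" + urlencode({"q": query})


print(site_search_url("foobar"))
print(site_search_url("foobar", exclude_filetype="org.txt"))
```

The two calls produce the "instead of" and the "should be" URL from the list above.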