defgsus/teletext-archive


Archive of German online teletexts

Or videotext, as we used to call it.

DEPRECATED: Collecting raw HTML files every 30 minutes is just too much:

  • for GitHub: the repo size is 800 MB after only 3 weeks
  • for parsing: it takes 6 single-threaded hours to parse all files in each commit with Beautiful Soup

A slimmer version runs at teletext-archive-unicode

Everything below is historical.

------8<------8<------8<------8<------8<------

This repo exists mainly because it's possible to scrape those online teletexts with GitHub Actions. And, you know, interesting stuff might evolve from historic beholding.

The data is collected raw in docs/snapshots. Each commit adds, overwrites or removes the individual files of each teletext page.
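The add/overwrite/remove behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the scraper actually used in this repo: the function name `sync_snapshot`, the `pages` mapping and the example file names are all assumptions made up for the sketch.

```python
from pathlib import Path


def sync_snapshot(snapshot_dir: Path, pages: dict[str, bytes]) -> dict[str, list[str]]:
    """Make snapshot_dir contain exactly the files given in `pages`.

    `pages` maps a relative file name (hypothetical, e.g. "zdf/100.html")
    to the raw bytes scraped for that teletext page.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    changes = {"added": [], "overwritten": [], "removed": []}

    # Add new pages and overwrite pages whose content has changed.
    for name, content in sorted(pages.items()):
        path = snapshot_dir / name
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(content)
            changes["added"].append(name)
        elif path.read_bytes() != content:
            path.write_bytes(content)
            changes["overwritten"].append(name)

    # Remove files of pages that were not part of this scrape.
    for path in sorted(snapshot_dir.rglob("*")):
        if path.is_file():
            name = path.relative_to(snapshot_dir).as_posix()
            if name not in pages:
                path.unlink()
                changes["removed"].append(name)

    return changes
```

Committing the directory after each such sync would give one commit per scrape, with the git history recording which pages appeared, changed or vanished.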

Scraped stations:

| station | since      | type               | link                                               |
|---------|------------|--------------------|----------------------------------------------------|
| 3sat    | 2022-01-28 | html with font-map | https://blog.3sat.de/ttx/                          |
| ARD     | 2022-01-28 | html               | https://www.ard-text.de/                           |
| NDR     | 2022-01-27 | html               | https://www.ndr.de/fernsehen/videotext/index.html  |
| n-tv    | 2022-01-28 | json               | https://www.n-tv.de/mediathek/teletext/            |
| SR      | 2022-01-28 | html               | https://www.saartext.de/                           |
| WDR     | 2022-01-28 | html               | https://www1.wdr.de/wdrtext/index.html             |
| ZDF     | 2022-01-27 | html               | https://teletext.zdf.de/teletext/zdf/              |
| ZDFinfo | 2022-01-27 | html               | https://teletext.zdf.de/teletext/zdfinfo/          |
| ZDFneo  | 2022-01-27 | html               | https://teletext.zdf.de/teletext/zdfneo/           |

related stuff

Oh boy, look what else exists on the web:

TODO

beyond the borders