Releases: medialab/hyphe

Early 2024

31 Jan 09:52

ChangeLog:

  • Give access to detailed crawl logs within frontend (#452)
  • Various small UI fixes/improvements in frontend (#482, #483, #485, #486, #488, #494)
  • Complete adaptation of web archives handling to INA's (#484)

Full Changelog: v1.10.9...v1.11.0

Back-to-school papercuts

25 Aug 17:31

ChangeLog:

  • Add a button to export metadata from all pages of a webentity (#318)
  • Explicitly separate startpages warnings regarding redirected pages and faulty ones (#379)
  • Allow setting a specific User-Agent per crawl within the web interface (#461)
  • Display hints on the meaning of the different possible statuses of a crawl (#474)
  • Highlight corresponding webentities when hovering a status or a tag in the network legend (#459)
  • Switch User-Agents list used within crawls to relying on https://www.useragents.me/ (#453)
  • Various improvements (cleaner backend logs, removal of empty traphs directories (#475), updated heuristics for the webentity links calculation rhythm, visual fixes (#476, #477))

Hot Summer '23

21 Aug 10:34

ChangeLog:

  • migrated caching WELinks to (working) files instead of mongo to handle huge corpuses
  • allow to set archives pass as ENV variable for docker instances
  • display time required by links indexation on overview

Summer '23

21 Jul 17:27

ChangeLog:

  • Added handling of more webarchives as sources (Arquivo.pt + INA DLWeb) + fixed various webarchives frontend info (#469, #471,
  • Added a corpus setting "ignore internal links" to crawl but not record links within the currently crawled webentity, in order to drastically speed up indexation of entities with huge numbers of links (at a cost in functionality: the network of internal pages is then not available, and entities that are split after a crawl will need to be recrawled) (cf #371, #378, #433)
  • Better handle frontend warning on pending actions when trying to close a tab (#465, #466)
  • Minor fixes (#448, #460, #467, #468, #470, 50d97e8, 85decf2)
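
As a rough illustration of the "ignore internal links" idea above, the sketch below drops links whose source and target fall in the same webentity before they would be recorded. It is not Hyphe's actual code: `drop_internal_links` and `entity_by_host` are hypothetical names, and mapping a URL to its hostname is only a stand-in assumption for Hyphe's real webentity resolution.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlsplit

Link = Tuple[str, str]  # (source URL, target URL)


def entity_by_host(url: str) -> str:
    """Hypothetical stand-in: treat each hostname as one webentity."""
    return urlsplit(url).hostname or ""


def drop_internal_links(
    links: Iterable[Link],
    entity_of: Callable[[str], str] = entity_by_host,
) -> List[Link]:
    """Keep only links that cross from one webentity to another."""
    return [(src, dst) for src, dst in links if entity_of(src) != entity_of(dst)]


links = [
    ("http://a.org/1", "http://a.org/2"),  # internal: same entity, dropped
    ("http://a.org/1", "http://b.org/"),   # external: kept
]
external = drop_internal_links(links)
```

The trade-off mirrors the one in the changelog entry: once internal links are discarded, the internal page network cannot be rebuilt without crawling again.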

Better, faster, stronger traph, there it is!

29 Nov 18:11

ChangeLog:

  • Switched to the new, breaking version 2.1 of hyphe-traph, which should help speed up indexation on big networks but requires rebuilding corpora from scratch
  • Make iterator traph calls less frequent to leave priority to quick user actions
  • Fixed stack on calling empty callback in List Webentities
  • Upgraded urllib3 to handle SSL deprecation
  • Froze dependencies to maintain python2.7 compat

Summer '22

19 Aug 11:48

ChangeLog:

  • Upgraded User Agents list
  • Added extra default WebEntity CreationRules for Github, Instagram, TikTok, Reddit and a bunch of blog platforms
  • Added perma.cc to list of default autofollowlinks
  • Various fixes and extra features for webarchives (links to archive permalinks, etc.)
  • Minor bugfixes

Spring '22

30 Mar 15:50

ChangeLog:

  • Added a distinction between successful and errored crawled pages to identify Suspicious crawls (#425)
  • Fixed frontend compatibility within Hyphe-Browser (medialab/hyphe-browser#212)
  • Fixed WebArchives crawling interface (#431) and behavior from BNF's archives (#426)
  • Improved network page's interaction using latest sigma.js v2.2 (node highlight etc & #367)
  • Allowed frontend to automatically restart a closed corpus when reopening the frontend directly on a specific corpus link (#440)
  • Allowed checking contiguous checkboxes in frontend's lists of webentities using the shift key (#438)
  • Allowed tuning the frontend's header color from the config (#430)
  • Published Hyphe on Zenodo & Software Heritage
  • Minor fixes (#397, #388, #432, #429, #437, #343, #341, #444, #325)

Robots sensitive crawls (stabilized)

15 Nov 14:55

ChangeLog:

  • Fixed environment variable OBEY_ROBOTS for Docker instance
  • Added explanation helpers in frontend
  • Fixed undeletable corpora

Robots sensitive crawls

25 Oct 15:09

ChangeLog:

WebArchives powered crawls

23 Sep 11:00

ChangeLog:

  • Allow starting crawls on Web Archives to browse disappeared or modified webentities in the past (#372)
  • Allow setting up advanced individual crawl settings (using a specific cookie, adjusting the depth, using a web archive...)
  • Allow displaying only crawled pages in a webentity's webpages list
  • Upgraded fake user agents dependency for more recent UAs
  • Add to the API a route to collect crawled webentity's webpages content as clear text instead of zipped base64
  • Minor fixes (#397, #416, #418, 8b8f73f, 3b48755, 6aea48a, f3c1e85, e97b9d0, b05d470, 01aac8a, ...)
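
For clients still receiving page content in the older "zipped base64" form mentioned above, decoding could look roughly like the sketch below. It assumes that "zipped" means zlib-compressed bytes that are then base64-encoded, which this changelog does not confirm; `encode_body` and `decode_body` are hypothetical helper names, not part of Hyphe's API.

```python
import base64
import zlib


def encode_body(text: str) -> str:
    """Compress with zlib, then base64-encode (assumed storage format)."""
    return base64.b64encode(zlib.compress(text.encode("utf-8"))).decode("ascii")


def decode_body(payload: str) -> str:
    """Reverse of encode_body: base64-decode, then zlib-decompress."""
    return zlib.decompress(base64.b64decode(payload)).decode("utf-8")


page = "<html><body>héllo</body></html>"
roundtrip = decode_body(encode_body(page))
```

A clear-text route spares API consumers this decoding step, at the price of larger responses.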