Releases: medialab/hyphe

Sum' 21

03 Sep 12:58

Changelog:

  • Fix links from OUT entities not being accessible (#401)
  • Upgraded MongoDB version to work on more recent Debian and similar distributions (#377)
  • Fix breakage with some old Python 2 configurations
  • Upgraded traph version to fix an intermittent issue when getting paginated pages
  • Minor fixes (#263, #397, #416...)

Early 2021

13 Jan 18:14

Changelog:

  • Fix WebEntityCreationRules no longer being taken into account after a corpus reset (#392)
  • Better handle manual batch editing of startpages from the frontend (#352 #365 #336)
  • Fix use of htpasswd to lock instances in Docker builds (#390)
  • Improve & speed up Docker builds & CI autodeploys (#339 #374)
  • Updated list of automatically followed redirection domains in docker builds
  • Allow setting tags from the API when creating a WebEntity
  • Ensure settings input from the frontend is valid (#360)

Winter 2019

18 Nov 14:37

Changelog:

  • Settings can now be adjusted when creating a new corpus (#229)
  • Network pages improvements (search nodes #255, view links direction #286, colors #311)
  • Admin page improvements (filter, order, backup and reindex actions, destroy all button... #264)
  • Links from OUT and DISCOVERED entities are no longer taken into account when computing WEs' indegree (#232)
  • First version for displaying each WE's ego network (#316 #204)
  • Export buttons for crawls metadata in All crawls page (#319)
  • Fix starting crawls with very many prefixes and startpages, which was previously impossible (#353)
  • Use imported urls as startpages when importing a preexisting webentity (#365)
  • Handle case of nested WebEntities for crawls (#326)
  • Updated list of redirection domains to follow when crawling (#346)
  • Fix missing www for creationrules with a path prefix (#363)
  • Minor frontend improvements (login #239 #279, prospect #323, webentity edit #314 #304, pages network #335, startpages #324, homepage #340, crawls #297, ...)
  • Add API routes to collect crawled pages metadata & html content when the option is activated

Many thanks to @2LaMa who's behind a lot of these improvements!
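
To illustrate the indegree change listed above (#232), here is a minimal sketch of counting incoming links while ignoring sources whose status is OUT or DISCOVERED. It is not Hyphe's actual implementation: the function name, the data layout (a status dict plus a list of source/target pairs) and the choice to count each distinct source once are all assumptions made for the example.

```python
from collections import defaultdict

# Statuses whose outgoing links are ignored, as described in the changelog entry.
EXCLUDED_SOURCE_STATUSES = {"OUT", "DISCOVERED"}


def compute_indegrees(statuses, links):
    """Return {webentity: indegree}, skipping links from excluded sources.

    statuses: dict mapping a (hypothetical) webentity id to its status.
    links: iterable of (source_id, target_id) pairs.
    Assumption: each distinct source counts once per target.
    """
    sources_by_target = defaultdict(set)
    for source, target in links:
        # Ignore links whose source is an OUT or DISCOVERED entity.
        if statuses.get(source) in EXCLUDED_SOURCE_STATUSES:
            continue
        sources_by_target[target].add(source)
    return {we: len(sources_by_target.get(we, ())) for we in statuses}
```

With statuses `{"a": "IN", "b": "OUT", "c": "DISCOVERED", "d": "UNDECIDED"}` and links from `a`, `b` and `c` to `d`, only the link from `a` contributes to the indegree of `d`.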

Fall 2019 (fixes 2 breaking bugs + minor fixes)

13 Sep 13:10

Changelog:

  • Fix "homepage" mode for automatic startpages (breaking crawls from prospect on some settings)
  • Fix some breaking calls to get_tags with no namespace
  • Fix action menu in List WebEntities sometimes not being triggered
  • Better handle errors coming from empty calls (closes #337)

Summer 2019 (fix issues with big webentities)

06 Aug 10:00

Changelog:

  • Use traph 1.2.0 with paginated queries to fix issues collecting all pages and pagelinks of a single webentity at once (#293); also speed up collecting childentities and cache the number of pages per entity during network computation
  • Fix broken WebEntity pages network view
  • Add number of pages per webentity to WebEntities Lists, as well as exports and network view
  • Fix creationrules missing after resetting a corpus (#320)
  • Fix password protected access to corpora
  • Always include homepage as a startpage when crawling a DISCOVERED webentity (#322)
  • Fix various crawler errors
  • Allow editing a tag in a single API call instead of removing then adding
  • Add script to trigger backup for all existing corpora

Spring 2019 (upgraded crawler)

06 Jun 12:27

Changelog:

  • Upgraded Scrapy (0.24.6 -> 1.6) and ScrapyD (1.0.1 -> 1.2.0) versions to latest ones, fixing broken crawls on many https websites (#268 #273 #312 #270) and broken Docker installs on some Windows and Mac machines
  • Upgraded Hyphe-Traph (1.0.0 -> 1.1.0) for faster automatic identification of homepages
  • Upgraded Graphology (0.11.4 -> 0.14.1) & Sigma (2.0.0-alpha18 -> 2.0.0-alpha20) for small network fixes
  • Improved Tags Inputs in Frontend's "WebEntity edition" and "Manage Tags" pages
  • Transformed FREETAGS into actual research "Field notes" preparing HyBro's coming new direction (#296)
  • Plenty of minor backend & frontend fixes (#305 #291 #310 #302 #281 #276 #248 #244 #275 #236 #294 #290 #258 ...)

Working early 2019

18 Jan 18:04

Changelog:

  • Fix Docker issue with NFS volumes and alpine dependencies
  • Give the crawler more recent user-agents
  • Use more recent version of sigma.js in frontend's graph visualisation (#285)
  • Add sorting buttons in frontend's crawls list
  • Minor frontend fixes (#280 #277)

Early 2019

18 Jan 16:17
7c2e2e7
Pre-release

Warning: please prefer version 1.0.5

Changelog:

  • Fix Docker issue with NFS volumes
  • Give the crawler more recent user-agents
  • Use more recent version of sigma.js in frontend's graph visualisation (#285)
  • Add sorting buttons in frontend's crawls list
  • Minor frontend fixes (#280 #277)

Sum 2018

23 Aug 12:07

Changelog:

  • Small frontend bugfixes (#259, #262 + tags autocomplete/sorting issues)
  • Add option to set up cookies for some crawls via advanced use of the API
  • Prioritize indexation over webentity links calculation when the queue gets too long

Faster indexation for big corpora

26 Apr 15:43

Changelog:

  • Updated MongoDB calls and added more indexes to speed up page indexation
  • Changed default configuration from storing html contents to not storing them, to reduce disk usage
  • Small frontend bugfixes (#252, #254, #261)
  • Fixed bin/clone_corpus script