%load_ext autoreload
%autoreload 2
import surf
from knowledge_base import KnowledgeBase
kb = KnowledgeBase("../../knowledge_base/config/virtuoso_local.ini")
w = kb.get_resource_by_urn("urn:cts:greekLit:tlg0012.tlg001")
ts = w.add_text_structure("Canonical text structure of homer's iliad")

# Get a TextElement resource for book 1 of the Iliad
te = kb._session.get_resource(
    "%s/urn:cts:greekLit:tlg0012.tlg001:1" % w.subject,
    kb._session.get_class(surf.ns.HUCIT['TextElement'])
)


  • refactor sub-module names/organisation
  • fix bug in surf query translator (see below)
  • merge epibau branch
  • attach TextElements to a named graph, named after the TextStructure they live in (its URI)
  • tests for the populate module
  • in travis, run tests against the newly installed triple store (not the remote one)

bug with surf

Problem: calling surf.resource.Resource.update() resets the language of every Literal to None.

This is what the log of the underlying SPARQL queries looks like (note that the inserted labels carry no language tag):

2020-10-02 11:09:52,440 DEBUG    surf    DELETE  FROM <> {  ?s ?p ?o  } WHERE { {  {  ?s ?p ?o .  FILTER (?s = <> AND ?p = <>)  }  } UNION {  {  ?s ?p ?o .  FILTER (?s = <> AND ?p = <>)  }  } }
2020-10-02 11:09:52,477 DEBUG    surf    INSERT  INTO <> {  <> <> <> .  <> <> "Isaeus" .  <> <> "Isaios" .  <> <> "Iseo" .  <> <> "Isée"  }

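This seems related to this line of the SPARQL query translator: isinstance(name.rdfs_label.first, str) returns True, because rdflib's Literal is a subclass of str. The str branch of the if/else chain therefore matches first, and the branch that handles Literals (and their language) is never reached.

Possible solution: swap lines 96 and 103, so that the Literal check runs before the str check.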

when doing this, make sure to clone from I may also try to get in touch with him
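
To illustrate why the order of the checks matters, here is a minimal sketch. It does not reproduce SuRF's translator: `Literal` below is a stand-in for rdflib's `Literal` (which likewise subclasses `str`), the two `to_sparql_*` functions are hypothetical names, and the language tag `fr` is only illustrative.

```python
class Literal(str):
    """Stand-in for rdflib's Literal: a str subclass carrying a language tag."""

    def __new__(cls, value, lang=None):
        obj = super().__new__(cls, value)
        obj.language = lang
        return obj


def to_sparql_buggy(term):
    # The plain-str check comes first, so it also matches Literal instances
    # and the language tag is silently dropped.
    if isinstance(term, str):
        return '"%s"' % term
    if isinstance(term, Literal):  # never reached
        return '"%s"@%s' % (term, term.language)
    raise TypeError("unsupported term: %r" % (term,))


def to_sparql_fixed(term):
    # Checking the more specific type first keeps the language tag.
    if isinstance(term, Literal):
        if term.language:
            return '"%s"@%s' % (term, term.language)
        return '"%s"' % term
    if isinstance(term, str):
        return '"%s"' % term
    raise TypeError("unsupported term: %r" % (term,))


label = Literal("Isée", lang="fr")  # illustrative language tag
print(to_sparql_buggy(label))
print(to_sparql_fixed(label))
```

With the checks in the second order, a plain string still falls through to the `str` branch, so only Literals change behaviour.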

.. note:: Notes on fetching less common/stable text structures (Bekker, Stephanus).

**Problem**: the Leipzig CTS API exposes only Stephanus pages (e.g. 17)
            but not Stephanus sections (e.g. 17a). The sections are present
            in the TEI XML, though, marked up as `tei:milestone` elements.

**Solution**: fetch the first level via the API, then extract the
            second-level units directly from the TEI XML via XPath.

`.export(output=Mimetypes.PYTHON.ETREE).xpath(".//tei:milestone")` etc.
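
The second step (pulling the section labels out of the TEI XML) could look roughly like the sketch below. It uses the standard library's `xml.etree.ElementTree` rather than the etree export shown above, and the sample passage, the `unit="section"` attribute and the function name `stephanus_sections` are assumptions made up for illustration, not taken from the actual Leipzig data. Note that the `tei:` prefix only resolves if a namespace mapping is supplied.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping needed to resolve "tei:" in the path expression
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample standing in for one Stephanus page fetched via the API
SAMPLE_PASSAGE = """<div xmlns="http://www.tei-c.org/ns/1.0" type="textpart" n="17">
  <p><milestone unit="section" n="17a"/>text of the first section
     <milestone unit="section" n="17b"/>text of the second section</p>
</div>"""


def stephanus_sections(passage_xml):
    """Return the n attribute of every tei:milestone, in document order."""
    root = ET.fromstring(passage_xml)
    milestones = root.findall(".//tei:milestone", TEI_NS)
    return [m.get("n") for m in milestones]


print(stephanus_sections(SAMPLE_PASSAGE))
```

With lxml the equivalent call would be `.xpath(".//tei:milestone", namespaces=TEI_NS)`.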

Publish library to

python setup.py sdist

twine check dist/*

twine upload dist/*