gabriel-straub edited this page Sep 13, 2017 · 19 revisions

Except for uri, all of the parameters below can be specified either at the root (to query everything) or within a specific partition or collection (to restrict the scope of the query).

| Parameter | Status | Proposed by | Description |
| --- | --- | --- | --- |
| uri | Live | | Perform an identifier lookup; results in a 30x redirect to the item if found |
| q | Live | | Locate items containing the specified text |
| limit | Live | | Limit the result set to n items |
| offset | Live | | Return results starting at item #n |
| class | Live | | Restrict results to those having the specified class URI |
| media | Live | | Restrict results to those Creative Works and Concepts which have associated media of the specified class (any, collection, dataset, video, image, interactive, software, audio, text, or class URI) |
| type | Live | | Restrict results to those Creative Works and Concepts which have associated media delivered as the specified MIME type (e.g., text/html, audio/mp4) |
| for | Live | | Include media whose restricted-audience URI matches the given URI |
| score | Live | | Set the minimum prominence score that items must have to appear in results |
| mode | Live | | Set to autocomplete in order to perform stem matching |
| lang | Live | | When performing text-based queries, specify the language of the search terms (e.g., cy-GB) |
| about | Proposed | MM | Restrict results to those items which have one or more of the specified concept URIs as a topic |
| duration-min, duration-max | Dev | Covatic | Restrict results to works with media whose duration, in seconds, falls within the specified range (either bound is optional) |
| date | Proposed | | Restrict results to (a) events occurring on the specified date; and (b) works with media which has a publication/broadcast on the specified date |
| similar | Proposed | GS | Restrict results to those items within a certain (optionally specifiable) distance of the n-dimensional coordinates of the specified item(s) |
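
As a rough illustration of how the parameters above combine into a request, the following Python sketch builds a query URL for either the root or a scoped path. The base URL, the scope path, and the `build_query_url` helper are assumptions made for illustration only; this page does not give the real endpoint. The check on `uri` reflects one reading of the note that uri is the exception to scoping.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL: the real endpoint is not given on this page.
BASE_URL = "https://example.org/"


def build_query_url(params, scope="", base_url=BASE_URL):
    """Build a query URL at the root (scope="") or under a partition/collection path."""
    # One reading of the note above: uri lookups are not scoped.
    if scope and "uri" in params:
        raise ValueError("uri lookups are assumed to be made at the root only")
    # Drop parameters that were not supplied.
    query = {key: value for key, value in params.items() if value is not None}
    return urljoin(base_url, scope) + "?" + urlencode(query)


# Text search for Welsh-language terms, video media only, third page of ten results.
url = build_query_url(
    {"q": "castell", "media": "video", "lang": "cy-GB", "limit": 10, "offset": 20}
)
print(url)
```

Passing a scope such as `"some-collection/"` (a made-up path) would apply the same parameters within that collection rather than across everything.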

Future work for the Datalab graph:

Strengthen the graph (make it more usable)

  • Put it on a stable platform
  • Proper ETLs to move the data
  • Get data from the authoritative sources (rather than some of the shortcuts we have used to date)
  • Make relationships (between content) queryable
  • Make it easy to mass-extract data for analytics and machine learning (e.g. all content names and descriptions)
  • Better search against the content
  • Different access requirements for different data

Widen the graph (add more content)

  • Long-form articles (news and sports)
  • Interactive
  • Bitesize
  • Taster
  • Recipes
  • Weather

Deepen the graph (add more data for the content)

  • Channel: ought to be present in the data; indexing/query TBC
  • Screening times: present but not meaningfully indexed (broadcast events are first-order entities)
  • Existing tags (as already in the system): straightforward
  • ML-based descriptors (with confidence): named graphs with their own attributes → index confidence factors
  • Key people (director, actors, etc.): as with existing tags

List of example requests we want to be able to run against the graph:

  • Give me … pieces of content of a specific type with a specific length that cover these topics …

Relevant parameters: limit, type, duration-min and duration-max, about

  • Give me … pieces of content of a specific type with a specific length that are similar to these pieces of content…

Relevant parameters: limit, type, duration-min and duration-max, similar (see note regarding similar above)

  • Tell me how much content we have on …
  • Tell me how much content we have on … of length … that was created before …
  • Tell me all the names and descriptors of news articles that were created since …
  • Tell me the average length of content on … and how that compares based on which year it was created in
  • Tell me how many minutes of total content we have on …
  • Give me all the descriptions used for content that was created in the last … months
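
The first request in the list above ("give me … pieces of content of a specific type with a specific length that cover these topics") could be translated into a query string along the following lines. This is only a sketch: `content_request` is a hypothetical helper, the concept URI is made up, and passing one `about` entry per concept URI is an assumption, since `about` is still marked Proposed and `duration-min`/`duration-max` are in development.

```python
from urllib.parse import urlencode


def content_request(count, mime_type, min_seconds=None, max_seconds=None, topics=()):
    """Translate 'N items of a type and length about these topics' into a query string."""
    params = [("limit", count), ("type", mime_type)]
    # Either duration bound is optional, so only include the ones supplied.
    if min_seconds is not None:
        params.append(("duration-min", min_seconds))
    if max_seconds is not None:
        params.append(("duration-max", max_seconds))
    # Assumption: multiple concept URIs are sent as repeated `about` parameters.
    params.extend(("about", uri) for uri in topics)
    return urlencode(params)


# Five audio items between 5 and 15 minutes long on a (made-up) topic URI.
query = content_request(
    5,
    "audio/mp4",
    min_seconds=300,
    max_seconds=900,
    topics=["http://example.org/concepts/weather"],
)
print(query)
```

The aggregate requests in the list (counts, averages, total minutes) have no corresponding parameter in the table above, so they are not covered by this sketch.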