Adding WebSites
Noah Santacruz edited this page Jan 25, 2023
A WebSite object corresponds to one or more WebPage objects. The WebSite for a given WebPage can be found as follows:
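```python
from sefaria.model.webpage import WebPage, WebSite
# `webpage` is an existing WebPage instance
domain = WebPage.domain_for_url(webpage.url)
site_data = WebPage.site_data_for_domain(domain)
website = WebSite().load(site_data)
```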
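The lookup above comes down to extracting the host from the page URL and finding the WebSite record whose `domains` list contains it. The following is a minimal, self-contained sketch of that idea, assuming site records are plain dicts shaped like the database example on this page. `find_site_data` is a hypothetical helper, not the actual `WebPage.site_data_for_domain` implementation, and treating subdomains as matches is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse


def find_site_data(url: str, sites: list[dict]) -> Optional[dict]:
    """Hypothetical helper: return the first site record whose domains match the URL's host."""
    host = urlparse(url).netloc.lower()
    for site in sites:
        for domain in site.get("domains", []):
            # Assumption: a subdomain (e.g. www.torahinmotion.org) counts as a match.
            if host == domain or host.endswith("." + domain):
                return site
    return None


sites = [{"name": "Torah In Motion", "domains": ["torahinmotion.org", "torahinmotionorg.e.civicrm.ca"]}]
match = find_site_data("https://www.torahinmotion.org/classes", sites)
print(match["name"] if match else None)
```

Requiring a leading dot in the suffix check keeps an unrelated host such as `nottorahinmotion.org` from matching `torahinmotion.org`.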
Attribute Name | Is Required | Description |
---|---|---|
name | Yes | Displays in the WebPages sidebar. |
domains | Yes | List of all domains corresponding to the WebSite with the specified name. |
is_whitelisted | Yes | Must be set to True in order for the WebSite's WebPages to appear in the Sefaria sidebar. |
bad_urls | No | List of regular expressions matching URLs that should not be saved in our database or shown in the sidebar. |
normalization_rules | No | List of extra URL normalization rules for this WebSite (e.g. "remove www"). See normalize_url() in sefaria/model/webpage.py: it first normalizes the URL of every incoming WebPage with global rules applied to all WebPages, then applies any additional rules listed in the WebSite object's normalization_rules. |
title_branding | No | List of branding strings (e.g. the site's name as it appears in page titles) that are stripped when normalizing the title field of WebPage data received by the server. |
initial_title_branding | No | Boolean used together with title_branding when normalizing the title field of WebPage data received by the server. If true, the branding is expected at the start of the title rather than the end. |
exclude_from_tracking | No | Only relevant for linker v1 and v2. |
whitelist_selectors | No | Only relevant for linker v3. List of CSS selectors for elements that should be included in the page content searched for citations. Use this when some parts of the page are not included by default. |
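To make the `bad_urls` and `normalization_rules` fields concrete, here is a minimal sketch of how a site record could be applied to an incoming URL. It is illustrative only: the real logic lives in `normalize_url()` in `sefaria/model/webpage.py`, and the helper names `apply_normalization_rules` and `is_bad_url`, the handling of `"remove www"`, and the use of `re.search` for `bad_urls` are all assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def apply_normalization_rules(url: str, rules: list[str]) -> str:
    """Hypothetical: apply a site's normalization_rules; only "remove www" is sketched here."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if "remove www" in rules and netloc.startswith("www."):
        netloc = netloc[len("www."):]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def is_bad_url(url: str, bad_urls: list[str]) -> bool:
    """Hypothetical: True if any bad_urls regex matches anywhere in the URL."""
    return any(re.search(pattern, url) for pattern in bad_urls)


site = {
    "normalization_rules": ["remove www"],
    "bad_urls": [r"torahinmotionorg\.e\.civicrm\.ca\/store"],
}
print(apply_normalization_rules("https://www.torahinmotion.org/a?b=1", site["normalization_rules"]))
print(is_bad_url("https://torahinmotionorg.e.civicrm.ca/store/item", site["bad_urls"]))
```

Matching with `re.search` rather than `re.fullmatch` lets a pattern target a path prefix such as `/store` without spelling out the rest of the URL.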
Here is an example of a WebSite in the database:
```json
{
    "name" : "Torah In Motion",
    "domains" : [
        "torahinmotion.org",
        "torahinmotionorg.e.civicrm.ca"
    ],
    "is_whitelisted" : true,
    "bad_urls" : [
        "torahinmotionorg\\.e\\.civicrm\\.ca\\/store"
    ],
    "normalization_rules" : [
        "remove www"
    ],
    "title_branding" : [
        "TORAH IN MOTION"
    ],
    "initial_title_branding" : true
}
```
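As an illustration of how `title_branding` and `initial_title_branding` in the example above might be used, here is a minimal sketch that strips branding from a page title. It is not Sefaria's implementation: `strip_title_branding` and the set of separator characters are hypothetical, and the assumption that the site `name` is also treated as branding is labeled in the code.

```python
SEPARATOR_CHARS = " -|:\u2013\u2014"  # assumption: spaces, hyphens, pipes, colons, en/em dashes


def strip_title_branding(title: str, site: dict) -> str:
    """Hypothetical helper: remove a site's branding from the start or end of a page title."""
    # Assumption: the site name counts as branding in addition to title_branding.
    brands = [site["name"]] + site.get("title_branding", [])
    initial = site.get("initial_title_branding", False)
    for brand in brands:
        if initial and title.startswith(brand):
            rest = title[len(brand):]
            boundary = rest[:1]
        elif not initial and title.endswith(brand):
            rest = title[: -len(brand)]
            boundary = rest[-1:]
        else:
            continue
        # Only strip when the brand is set off by a separator, and never return an empty title.
        cleaned = rest.strip(SEPARATOR_CHARS)
        if boundary and boundary in SEPARATOR_CHARS and cleaned:
            return cleaned
    return title


site = {
    "name": "Torah In Motion",
    "title_branding": ["TORAH IN MOTION"],
    "initial_title_branding": True,
}
print(strip_title_branding("TORAH IN MOTION - Parsha Insights", site))  # Parsha Insights
```

The boundary check keeps the sketch from cutting into a word that merely starts with the brand text, and a title consisting only of branding is returned unchanged.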
To add a WebSite in the CLI:
```python
from sefaria.model.webpage import WebSite

w = WebSite()
w.name = "Torah In Motion"  # required attribute
w.domains = ["torahinmotion.org", "torahinmotionorg.e.civicrm.ca"]  # required attribute
w.is_whitelisted = True  # required attribute
w.save()
```
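Before calling `save()`, it can help to check that a candidate record has every required attribute from the table above. The following is a minimal standalone sketch operating on a plain dict; `missing_required_fields` is a hypothetical helper and does not reflect any validation the WebSite class itself may perform.

```python
REQUIRED_FIELDS = ("name", "domains", "is_whitelisted")  # required attributes per the table above


def missing_required_fields(record: dict) -> list[str]:
    """Hypothetical pre-save check: list required WebSite fields that are absent or empty."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # is_whitelisted may legitimately be False, so only a missing value counts for it.
        if value is None or (field != "is_whitelisted" and not value):
            missing.append(field)
    return missing


print(missing_required_fields({"name": "Torah In Motion"}))  # ['domains', 'is_whitelisted']
```

Note that a record with `is_whitelisted` set to False passes this check but, per the table, its WebPages will not appear in the Sefaria sidebar.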