The Past and Present of Knowledge Graphs: A Hodgepodge of Excerpts #5

lidingpku opened this issue Feb 28, 2017 · 1 comment

lidingpku commented Feb 28, 2017

This is a digest that sorts out some concepts related to knowledge graphs: semantic network, semantic web, linked data, and knowledge graph.

1. Semantic Network

    semantic network = node + relation

Semantic networks are nothing new; in essence, they describe the relationships between things through a symbol system. Depending on which aspect of a relationship one focuses on, they can be subdivided into several types, covering, for example, definitional, descriptive, and causal relations. The semantics of a relation are usually descriptive, expressed either as natural-language text or as logical expressions. When relations are further quantified as probabilities, a semantic network may evolve into a probabilistic graphical model.
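As a minimal sketch of the "node + relation" view (the concepts and relations below are invented for illustration, not taken from any particular system):

```python
# A toy semantic network: nodes connected by named relations.
from collections import defaultdict

class SemanticNetwork:
    def __init__(self):
        # relation name -> set of (source, target) node pairs
        self.edges = defaultdict(set)

    def add(self, source, relation, target):
        self.edges[relation].add((source, target))

    def targets(self, source, relation):
        """All nodes reachable from `source` via `relation`."""
        return {t for s, t in self.edges[relation] if s == source}

net = SemanticNetwork()
net.add("cat", "is-a", "mammal")        # definitional relation
net.add("smoking", "causes", "cancer")  # causal relation
print(net.targets("cat", "is-a"))       # {'mammal'}
```

Attaching a probability to each edge, instead of a bare relation name, is exactly the step that turns such a network into a probabilistic graphical model.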

A semantic network or net is a graph structure for representing knowledge in patterns of interconnected nodes and arcs. Computer implementations of semantic networks were first developed for artificial intelligence and machine translation, but earlier versions have long been used in philosophy, psychology, and linguistics. The Giant Global Graph of the Semantic Web is a large semantic network (Berners-Lee et al. 2001; Hendler & van Harmelen 2008).

What is common to all semantic networks is a declarative graphic representation that can be used to represent knowledge and support automated systems for reasoning about the knowledge. Some versions are highly informal, but others are formally defined systems of logic. Following are six of the most common kinds of semantic networks:

  1. Definitional networks emphasize the subtype or is-a relation between a concept type and a newly defined subtype. The resulting network, also called a generalization or subsumption hierarchy, supports the rule of inheritance for copying properties defined for a supertype to all of its subtypes. Since definitions are true by definition, the information in these networks is often assumed to be necessarily true. (A toy inheritance sketch follows the source link below.)
  2. Assertional networks are designed to assert propositions. Unlike definitional networks, the information in an assertional network is assumed to be contingently true, unless it is explicitly marked with a modal operator. Some assertional networks have been proposed as models of the conceptual structures underlying natural language semantics.
  3. Implicational networks use implication as the primary relation for connecting nodes. They may be used to represent patterns of beliefs, causality, or inferences.
  4. Executable networks include some mechanism, such as marker passing or attached procedures, which can perform inferences, pass messages, or search for patterns and associations.
  5. Learning networks build or extend their representations by acquiring knowledge from examples. The new knowledge may change the old network by adding and deleting nodes and arcs or by modifying numerical values, called weights, associated with the nodes and arcs.
  6. Hybrid networks combine two or more of the previous techniques, either in a single network or in separate, but closely interacting networks.

source: semantic networks http://www.jfsowa.com/pubs/semnet.htm
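To make the inheritance rule of definitional networks (kind 1 above) concrete, here is a toy sketch; the hierarchy and properties are invented:

```python
# Toy definitional network: an is-a hierarchy with property inheritance.
is_a = {"cat": "mammal", "mammal": "animal"}
properties = {
    "animal": {"alive": True},
    "mammal": {"has_fur": True},
    "cat": {"says": "meow"},
}

def inherited_properties(concept):
    """Walk up the is-a chain, copying supertype properties to subtypes."""
    merged = {}
    while concept is not None:
        merged = {**properties.get(concept, {}), **merged}  # subtype wins
        concept = is_a.get(concept)
    return merged

print(inherited_properties("cat"))
# -> {'alive': True, 'has_fur': True, 'says': 'meow'}
```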

2. Semantic Web (1998)

Semantic Web = web of data = distributed data + ontology ~= URI + RDF + OWL

The Semantic Web is a web of data, in some ways like a global database.
source: https://www.w3.org/DesignIssues/Semantic.html

What is the Semantic Web?

  • The Semantic Web is a web of data.
  • The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.
  • The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.

source: https://www.w3.org/2001/sw/

Original “The Semantic Web” Vision (Scientific American, 2001)

  • Expressing Meaning: The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.
  • Knowledge Representation: For the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Two important technologies for developing the Semantic Web are already in place: eXtensible Markup Language (XML) and the Resource Description Framework (RDF). Meaning is expressed by RDF, which encodes it in sets of triples, each triple being rather like the subject, verb and object of an elementary sentence (see the sketch after this list).
  • Ontologies: The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.
  • Agents: The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.
  • Evolution of Knowledge: The semantic web is not "merely" the tool for conducting individual tasks that we have discussed so far. In addition, if properly designed, the Semantic Web can assist the evolution of human knowledge as a whole.
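A minimal sketch of the triple model described under "Knowledge Representation", using the rdflib Python library (the example resources are invented):

```python
# Each RDF triple reads like the subject, verb, object of a sentence:
# "tim is-a Person", "tim knows mary".  (pip install rdflib)
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.tim, RDF.type, FOAF.Person))
g.add((EX.tim, FOAF.name, Literal("Tim")))
g.add((EX.tim, FOAF.knows, EX.mary))

print(g.serialize(format="turtle"))
```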

3. Linked Data (2006)

Linked Data = Semantic Web - OWL Ontology + Link Resolution ~= RDF + SPARQL

With linked data, when you have some of it, you can find other, related, data. Like the web of hypertext, the web of data is constructed with documents on the web. However, unlike the web of hypertext, where links are relationship anchors in hypertext documents written in HTML, links between data connect arbitrary things described by RDF. The URIs identify any kind of object or concept. But whether for HTML or RDF, the same expectations apply to make the web grow:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
  4. Include links to other URIs, so that they can discover more things. (A sketch of rules 2-4 follows the source link below.)

source: https://www.w3.org/DesignIssues/LinkedData.html
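A hedged sketch of rules 2-4 in practice, again with rdflib (this assumes network access; DBpedia is used only as one well-known linked-data publisher):

```python
# Dereference an HTTP URI and get useful RDF back (rules 2-3), then
# inspect the links it contains (rule 4). Requires network access.
from rdflib import Graph

g = Graph()
# Content negotiation on this URI returns RDF describing the resource.
g.parse("http://dbpedia.org/resource/Tim_Berners-Lee")

# Every object URI below is itself a link we could dereference next.
for row in g.query("""
        SELECT ?p ?o WHERE {
            <http://dbpedia.org/resource/Tim_Berners-Lee> ?p ?o .
        } LIMIT 5"""):
    print(row.p, row.o)
```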


4. Knowledge Graph (2012)

knowledge graph = linked data + NLP ~= entity linking + graph query


Knowledge Graph (R) @ Google

  • The Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources. It was announced on May 16, 2012, and Knowledge Graph display was added to Google's search engine that year, starting in the United States.
  • The goal is that users would be able to use this information to resolve their query without having to navigate to other sites and assemble the information themselves. The short summary provided in the knowledge graph is often used as a spoken answer in Google Now searches.
  • The Knowledge Graph enhances Google Search in three main ways to start:
    • Find the right thing
    • Get the best summary
    • Go deeper and broader
  • In May 2016, The Washington Post reported that "knowledge panels and other sorts of 'rich answers' have mushroomed across Google, appearing atop the results on roughly one-third of its 100 billion monthly searches".
source: https://en.wikipedia.org/wiki/Knowledge_Graph
source: https://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html (the official announcement blog post)


Graph Search (R) @ Facebook

  • Facebook Graph Search was a semantic search engine that was introduced by Facebook in March 2013. It was designed to give answers to user natural language queries rather than a list of links.
  • It was announced that the Graph Search algorithm finds information from within a user's network of friends. Additional results were provided by Microsoft's Bing search engine.
  • When Facebook first launched, the main way most people used the site was to browse around, learn about people and make new connections. Graph Search takes us back to our roots and allows people to use the graph to make new connections.
  • With Graph Search, people can search the social graph by looking for things like “sushi restaurants that my friends have been to in Los Angeles,” “hotels near the Eiffel Tower,” or “TV shows my friends like” (a toy query sketch follows the sources below).
  • The first version of Graph Search focuses on four main areas — people, photos, places, and interests.
    • People: “friends who live in my city,” “people from my hometown who like hiking,” “friends of friends who have been to Yosemite National Park,” “software engineers who live in San Francisco and like skiing,” “people who like things I like,” “people who like tennis and live nearby”
    • Photos: “photos I like,” “photos of my family,” “photos of my friends before 1999,” “photos of my friends taken in New York,” “photos of the Eiffel Tower”
    • Places: “restaurants in San Francisco,” “cities visited by my family,” “Indian restaurants liked by my friends from India,” “tourist attractions in Italy visited by my friends,” “restaurants in New York liked by chefs,” “countries my friends have visited”
    • Interests: “music my friends like,” “movies liked by people who like movies I like,” “languages my friends speak,” “strategy games played by friends of my friends,” “movies liked by people who are film directors,” “books read by CEOs”
  • In December 2014, Facebook changed its search features, dropping its partnership with Bing and eliminating most of the search patterns.
  • The feature was developed under former Google employees Lars Rasmussen and Tom Stocky.

source: https://en.wikipedia.org/wiki/Facebook_Graph_Search

source: http://newsroom.fb.com/news/2013/01/introducing-graph-search-beta/

source: https://www.facebook.com/graphsearcher
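A toy sketch of what a query like “restaurants that my friends have been to in Los Angeles” could look like as a plain graph traversal (the schema and data are invented; this is not Facebook's implementation):

```python
# Toy social-graph traversal in the spirit of a Graph Search query.
friends = {"me": {"alice", "bob"}}
visited = {"alice": {"Sushi Gen", "Louvre"}, "bob": {"Sushi Gen", "Taco Spot"}}
places = {
    "Sushi Gen": {"type": "restaurant", "city": "Los Angeles"},
    "Taco Spot": {"type": "restaurant", "city": "Austin"},
    "Louvre":    {"type": "museum",     "city": "Paris"},
}

def restaurants_visited_by_friends(user, city):
    return {place
            for friend in friends.get(user, ())      # hop 1: my friends
            for place in visited.get(friend, ())     # hop 2: their check-ins
            if places[place]["type"] == "restaurant"
            and places[place]["city"] == city}

print(restaurants_visited_by_friends("me", "Los Angeles"))  # {'Sushi Gen'}
```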

  • Under the Hood: The Entities Graph @facebook
    • One of the early choices we made was selecting Wikipedia as the canonical set of things we would match against.
    • Next came the task of tokenizing, canonicalizing, and matching the free text on 400-500M user profiles against Wikipedia pages. We built the matching pipeline on Hive[2] using Python to help with the text-processing bits, and drove it using a custom dependency-tracking build system. We used Cython to implement some fun text-matching algorithms along the way, including BK-trees and a somewhat modified Smith-Waterman algorithm (a toy matching sketch follows the source link below).
    • Other interesting challenges included handling remakes (http://en.wikipedia.org/wiki/3:10_to_Yuma), series (“Twilight Series”) and sequels (“Predator 2”) of movies, shortened forms of book names (“Goblet of Fire”), false positive matches for music albums (“Thriller”), non-English coverage, ASCII art, and obscenities.
    • First, we created millions of “fallback” Pages to represent the strings that we were unable to match. These Pages contained no data other than a name, so people could choose to connect with them after the migration and give us information about the true nature of these strings.
    • Second, we exposed recommended matches in an interface that allowed people to add missing connections and remove incorrect ones before migrating to the new data model.
    • Thus, our general approach to quality is:
      • Start with a vetted source of data
      • Fix the head via human labeling and specialized crowdsourcing
      • Fix the long tail via machine learning and crowdsourcing
      • Get user feedback to find false positives
      • Use human labeling to assess quality
    • Category prediction: We categorize Wikipedia pages into Facebook’s ontology of Page categories.  Wikipedia category tags and infoboxes (the box on the right side of many Wikipedia pages) provide useful signal for our machine learning.
    • Text extraction: We parse the wiki markup into a short snippet that we show on the top of the Facebook Page.
    • Alternate forms expansion: we use the WordNet database to capture and store alternate forms of thousands of entities on the site
    • Loose entity expansion
    • Veracity scores 
    • Entity Types
  • The Graph API is named after the idea of a 'social graph' - a representation of the information on Facebook (a request sketch follows the source links below) composed of:
    • nodes - basically "things" such as a User, a Photo, a Page, a Comment
    • edges - the connections between those "things", such as a Page's Photos, or a Photo's Comments
    • fields - info about those "things", such as a person's birthday, or the name of a Page

source: https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-entities-graph/10151490531588920/
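As a stand-in for the kind of fuzzy text matching the note describes (the real pipeline used BK-trees and a modified Smith-Waterman algorithm in Cython; this stdlib sketch only illustrates the idea, with invented data):

```python
# Toy fuzzy matching of free-text profile strings against canonical titles.
import difflib

canonical_titles = ["3:10 to Yuma", "Predator 2", "Thriller",
                    "Harry Potter and the Goblet of Fire"]

def match(free_text, cutoff=0.5):
    """Return the closest canonical title, or None for a 'fallback' Page."""
    hits = difflib.get_close_matches(free_text, canonical_titles,
                                     n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(match("Goblet of Fire"))  # shortened book name still matches
print(match("asdfjkl"))         # no match -> would become a fallback Page
```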

source:  https://developers.facebook.com/docs/graph-api/overview
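A minimal sketch of reading a node and an edge over HTTP (the access token is a placeholder and the API version is an assumption; only the documented /me and /me/photos endpoints are used):

```python
# Minimal Graph API read: fields of a node, then one of its edges.
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder, not a real token
BASE = "https://graph.facebook.com/v2.8"  # version current as of early 2017

# Node: the authenticated user, with selected fields.
me = requests.get(f"{BASE}/me",
                  params={"fields": "id,name,birthday",
                          "access_token": ACCESS_TOKEN}).json()
print(me)

# Edge: photos connected to that user node.
photos = requests.get(f"{BASE}/me/photos",
                      params={"access_token": ACCESS_TOKEN}).json()
print(photos)
```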


LinkedIn Knowledge Graph

  • LinkedIn’s knowledge graph is a large knowledge base built upon “entities” on LinkedIn, such as members, jobs, titles, skills, companies, geographical locations, schools, etc. These entities and the relationships among them form the ontology of the professional world and are used by LinkedIn to enhance its recommender systems, search, monetization and consumer products, and business and consumer analytics.
  • Different from these efforts, we derive LinkedIn’s knowledge graph primarily from a large amount of user-generated content from members, recruiters, advertisers, and company administrators, and supplement it with data extracted from the internet, which is noisy and can have duplicates.
  • To date, there are 450M members, 190M historical job listings, 9M companies, 200+ countries (where 60+ have granular geolocational data), 35K skills in 19 languages, 28K schools, 1.5K fields of study, 600+ degrees, 24K titles in 19 languages, and 500+ certificates, among other entities.

source: https://engineering.linkedin.com/blog/2016/10/building-the-linkedin-knowledge-graph


Satori @ Bing

  • Today, Bing has over a billion entities (people, places, and things) and the number is growing every day. For those entities, we have over 21 billion associated facts, 18 billion links to key actions and over 5 billion relationships between entities.

source: https://blogs.bing.com/search/2013/03/21/understand-your-world-with-bing/
source: https://blogs.bing.com/search/2015/08/20/bing-announces-availability-of-the-knowledge-and-action-graph-api-for-developers/

lidingpku (Owner, Author) commented:

A VLDB 2014 tutorial: Knowledge Bases in the Age of Big Data Analytics http://www.vldb.org/pvldb/vol7/p1713-suchanek.pdf

Full text: http://resources.mpi-inf.mpg.de/yago-naga/vldb2014-tutorial/
