Description
Many (but not all) pages have a fairly clear primary topic: some entity or thing that the page describes. For example, a restaurant's home page might be primarily about that Restaurant, or an event listing page might represent a single event. Metadata (including inbound links) sometimes exploits the obvious association between a page and the thing it describes, even blurring the distinction between the two.
The proposal here is to add a new property, 'mainEntity', which relates a document to the main thing that it describes.
Update (Feb 5th): based on the discussion below and other feedback, I am also proposing mainEntityOfPage, declared as the inverse (inverseOf) of mainEntity.
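Because mainEntityOfPage is meant to be the inverse of mainEntity, a consumer can fold both spellings into a single relationship. A minimal Python sketch, assuming the markup has already been extracted into (subject, predicate, object) tuples; the `schema:`-prefixed strings, the example.com URLs and the `normalize` helper are illustrative only, not part of any library:

```python
# Illustrative triple format: (subject, predicate, object) tuples of strings.
MAIN_ENTITY = "schema:mainEntity"
MAIN_ENTITY_OF_PAGE = "schema:mainEntityOfPage"


def normalize(triples):
    """Rewrite every mainEntityOfPage triple as the equivalent mainEntity triple.

    With mainEntityOfPage declared as the inverse of mainEntity,
    (thing, mainEntityOfPage, page) says the same as (page, mainEntity, thing).
    """
    out = set()
    for s, p, o in triples:
        if p == MAIN_ENTITY_OF_PAGE:
            out.add((o, MAIN_ENTITY, s))  # flip subject and object
        else:
            out.add((s, p, o))
    return out


# Hypothetical identifiers for a page and the restaurant it describes.
page = "http://example.com/"
restaurant = "http://example.com/#restaurant"

forward = {(page, MAIN_ENTITY, restaurant)}
backward = {(restaurant, MAIN_ENTITY_OF_PAGE, page)}

# Both ways of writing it normalize to the same statement.
assert normalize(forward) == normalize(backward)
```

Normalizing to one direction keeps downstream lookups simple: only mainEntity triples ever need to be queried.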
Goals
- Given a page, potentially describing several different things using schema.org, it should be very easy to determine which thing/entity is the 'main' one that it describes.
Non-Goals
- Resolving https://en.wikipedia.org/wiki/HTTPRange-14
- Specifying exactly when it is ok to use 'url' of a page when really we're talking about the underlying real world entity, e.g. using http://www.imdb.com/name/nm0000136/ to stand for Johnny Depp.
- Addressing vaguer notions of aboutness such as topics ("social policy in Manchester after 1965"), or situations such as list/category pages where the document is about a collection of things (although ItemList may be useful here for some scenarios).
- Requiring every page to say what it is about. While it is useful to be able to mention a mainEntity (or at least to have terminology defined for doing so), it is unlikely to be required every time schema.org is used to describe something.
Examples
A Restaurant homepage (in Microdata)
Taking an existing example from http://schema.org/Restaurant and expanding it:
```html
<div itemid="" itemscope itemtype="http://schema.org/WebPage">
  <div itemprop="mainEntity" itemscope itemtype="http://schema.org/Restaurant">
    <h1 itemprop="name">Fondue for Fun and Fantasy</h1>
    <p itemprop="description">Fantastic and fun for all your cheesy occasions.</p>
    <p>Open: <time itemprop="openingHours" datetime="Mo,Tu,We,Th,Fr,Sa,Su 11:30-23:00">Daily from 11:30am till 11pm</time></p>
    <!-- Microdata reads a machine value from <meta content>, not from a content attribute on <span> -->
    <p>Phone: <meta itemprop="telephone" content="+155501003333">555-0100-3333</p>
    <p>View <a itemprop="menu" href="http://example.com/menu">our menu</a>.</p>
  </div>
</div>
```
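As a rough check that this markup is enough to find the main entity, here is a deliberately small Python sketch built on the standard library's `html.parser`. It is not a conforming Microdata parser (no itemref, no URL resolution, one name per itemprop), and the `MicrodataSketch` class and `main_entity` helper are hypothetical. It reads a copy of the Restaurant example, with the telephone number carried on a `meta` element, and follows `mainEntity` from the WebPage item instead of relying on position in the page:

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are not pushed on the stack.
VOID = {"area", "br", "hr", "img", "input", "link", "meta", "source"}
# Attribute holding the property value for some elements (otherwise: text content).
VALUE_ATTR = {"meta": "content", "a": "href", "link": "href", "time": "datetime", "img": "src"}


class MicrodataSketch(HTMLParser):
    """Tiny Microdata reader handling nested itemscope/itemprop only."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items
        self.scopes = []  # currently open items, innermost last
        self.open = []    # one frame per open non-void element

    def _add(self, prop, value):
        self.scopes[-1]["props"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        frame = {"item": False, "prop": None, "text": None}
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "id": a.get("itemid"), "props": {}}
            if prop and self.scopes:
                self._add(prop, item)  # nested item, e.g. the mainEntity
            else:
                self.items.append(item)
            self.scopes.append(item)
            frame["item"] = True
        elif prop and self.scopes:
            attr = VALUE_ATTR.get(tag)
            value = a.get(attr) if attr else None
            if value is not None:
                self._add(prop, value)
            elif tag in VOID:
                self._add(prop, "")
            else:
                frame.update(prop=prop, text=[])  # value comes from text content
        if tag in VOID:
            if frame["item"]:
                self.scopes.pop()  # a void element cannot contain properties
            return
        self.open.append(frame)

    def handle_data(self, data):
        for frame in self.open:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        frame = self.open.pop()
        if frame["item"]:
            self.scopes.pop()
        if frame["prop"]:
            self._add(frame["prop"], "".join(frame["text"]).strip())


def main_entity(items):
    """Follow the mainEntity property from any top-level item."""
    for item in items:
        for value in item["props"].get("mainEntity", []):
            if isinstance(value, dict):
                return value
    return None


HTML = """
<div itemid="" itemscope itemtype="http://schema.org/WebPage">
  <div itemprop="mainEntity" itemscope itemtype="http://schema.org/Restaurant">
    <h1 itemprop="name">Fondue for Fun and Fantasy</h1>
    <p itemprop="description">Fantastic and fun for all your cheesy occasions.</p>
    <p>Open: <time itemprop="openingHours" datetime="Mo,Tu,We,Th,Fr,Sa,Su 11:30-23:00">Daily from 11:30am till 11pm</time></p>
    <p>Phone: <meta itemprop="telephone" content="+155501003333">555-0100-3333</p>
    <p>View <a itemprop="menu" href="http://example.com/menu">our menu</a>.</p>
  </div>
</div>
"""

parser = MicrodataSketch()
parser.feed(HTML)
parser.close()
restaurant = main_entity(parser.items)
print(restaurant["type"], restaurant["props"]["name"])
```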
A MusicGroup described in a page alongside related entities (e.g. in JSON-LD)
- A page such as https://musicbrainz.org/artist/650e7db6-b795-4eb5-a702-5ea2fc46c848 is primarily about a MusicGroup whose name is "Lady Gaga" (see embedded JSON-LD)
- It records many other facts about her, using several different schema.org types.
- Alongside the main entity, the MusicGroup "Lady Gaga", several subsidiary entities are mentioned, e.g. music-related entities such as a MusicAlbum or MusicRecording, places such as an AdministrativeArea, Country or City, and perhaps Events too.
- How can we tell which entity is the main focus of the page? We look for a mainEntity relationship pointing from the page to the entity. Other approaches that might seem reasonable, such as taking the first, top-most or outermost entity, are fragile: they do not work well with extraction tools (e.g. any23) or with RDF tooling in general, which abstracts away from the concrete syntax.
- In JSON-LD this could be written either with the @reverse keyword, nesting a link to the page URL inside the description of the MusicGroup, or with a description of the page wrapped around the MusicGroup. Different variations will make sense on different sites, depending on how their markup and data are structured.
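The lookup described above can be sketched in Python over a small JSON-LD document that uses `@reverse` inside the MusicGroup. This assumes compact, schema.org-only keys and skips real context processing (a full JSON-LD processor would expand every term to an IRI first); the `#group` identifier, "Example City" and "Example Album" are placeholders, and the `to_triples` flattener handles only the few keywords used here:

```python
import itertools

# The page URL is the MusicBrainz example above; the "#group" fragment for the
# MusicGroup itself is a hypothetical identifier made up for this sketch.
PAGE = "https://musicbrainz.org/artist/650e7db6-b795-4eb5-a702-5ea2fc46c848"
GROUP = PAGE + "#group"

doc = {
    "@context": "http://schema.org",
    "@graph": [
        # A subsidiary entity that happens to come first in the markup.
        {"@id": "_:city", "@type": "City", "name": "Example City"},
        {
            "@id": GROUP,
            "@type": "MusicGroup",
            "name": "Lady Gaga",
            "foundingLocation": {"@id": "_:city"},
            "album": {"@type": "MusicAlbum", "name": "Example Album"},
            # @reverse: the page points at this node via mainEntity.
            "@reverse": {"mainEntity": {"@id": PAGE, "@type": "WebPage"}},
        },
    ],
}

_blank_ids = itertools.count()


def as_list(value):
    return value if isinstance(value, list) else [value]


def term(value, triples):
    """Node objects become their (possibly blank) id; anything else is a literal."""
    return flatten(value, triples) if isinstance(value, dict) else value


def flatten(node, triples):
    """Append the triples for `node` (and nested nodes) and return its id."""
    subject = node.get("@id") or f"_:b{next(_blank_ids)}"
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            for t in as_list(value):
                triples.append((subject, "rdf:type", t))
        elif key == "@reverse":
            for prop, objs in value.items():
                for obj in as_list(objs):
                    triples.append((term(obj, triples), prop, subject))
        else:
            for obj in as_list(value):
                triples.append((subject, key, term(obj, triples)))
    return subject


def to_triples(document):
    triples = []
    for node in as_list(document.get("@graph", document)):
        flatten(node, triples)
    return triples


def main_entities(triples, page):
    """The robust rule: follow mainEntity links that start at the page."""
    return [o for s, p, o in triples if s == page and p == "mainEntity"]


triples = to_triples(doc)
print(main_entities(triples, PAGE))  # the MusicGroup's id
print(triples[0][0])                 # a "first entity" heuristic would pick _:city
```

Once the data is a list (really, a set) of triples, the order of nodes in the source carries no meaning, which is why the sketch follows the explicit mainEntity link rather than taking the first subject.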
Questions
- Is mainEntity inverseOf the url property? Maybe...
- Is this the same as foaf:primaryTopic? Yes.
- Would supplying a 'url' property indicating the current page be an adequate alternate expression of 'mainEntity'? Not sure, let's discuss. The use of 'url' varies quite a lot in practice...
- Will this help close the gap between schema.org and Open Graph Protocol (OGP)? Potentially, yes.
Alternatives that do not work (for the purposes above)
- The http://schema.org/mainContentOfPage property, which relates a WebPage to a WebPageElement, is superficially similar. However, its values are parts of the page rather than the real-world things described by that markup. It also has a long, unwieldy name. See also "Expand the range of 'mainContentOfPage' to Thing" #215.
- Re-using the 'about' property. Since 2011 this property has allowed multiple values, whereas for mainEntity it is important that at most one value is allowed (i.e. that it is a "functional property"); see the retracted proposal from @danbri.
- Having mainEntity be a boolean property. Although it might be simpler to write mainEntity='true' or , the information we are recording is intrinsically a relationship with the page carrying it, which a boolean flag on the entity cannot express.
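The "at most one value" requirement that separates mainEntity from 'about' is easy to check mechanically. A minimal sketch over (subject, predicate, object) tuples; the `functional_violations` helper and the short property names are hypothetical, and a real validator would work with full schema.org IRIs:

```python
from collections import defaultdict

# Properties treated as functional here: at most one value per subject.
# 'about' is left out on purpose, since it may legitimately have several values.
FUNCTIONAL = {"mainEntity"}


def functional_violations(triples, functional=FUNCTIONAL):
    """Return {(subject, property): values} wherever a functional property has more than one value."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            values[(s, p)].add(o)
    return {key: objs for key, objs in values.items() if len(objs) > 1}


ok = [
    ("page", "about", "topicA"),
    ("page", "about", "topicB"),  # fine: 'about' is multi-valued
    ("page", "mainEntity", "thingA"),
]
bad = ok + [("page", "mainEntity", "thingB")]

print(functional_violations(ok))   # {}
print(functional_violations(bad))  # flags ("page", "mainEntity")
```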