Rationalisation of dump files required #2304

Closed
RichardWallis opened this issue Jul 11, 2019 · 9 comments

Comments

@RichardWallis
Contributor

As identified in issues #2301 and #2302, the definition download files do not fully segregate terms defined in the core from terms defined in sections (extensions).

For example, the pending-defined CssSelectorType is the range of the cssSelector property, which is defined in the core vocabulary. As the schema dump file contains only terms defined in the core, the CssSelectorType definition does not appear in that file.

Anecdotally, the most downloaded dump files appear to be 'schema' (core only), the default offered on the downloads page, and 'all-layers', which contains all definitions (including those in the attic section).

I recommend that we rationalise the types of dump files offered (currently 8) down to 2 with the following contents:

  • schema - containing all sections except attic.
  • all-layers - containing all sections including attic.
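A toy sketch of the proposal, assuming each term can be tagged with the section it was defined in. Apart from cssSelector and CssSelectorType, the term names are hypothetical, and `build_dump` is an illustrative helper, not part of the schema.org build. It shows why a core-only filter drops CssSelectorType while the proposed 'schema' dump keeps it.

```python
# Hypothetical toy model: each term is tagged with the section it was defined in.
# Section names follow this thread ("core", "pending", "attic").
TERMS = {
    "cssSelector": "core",          # described in this issue as a core property
    "CssSelectorType": "pending",   # its range, defined in pending
    "Thing": "core",
    "OldTerm": "attic",             # hypothetical retired term
}

def build_dump(terms, include):
    """Return the sorted term names whose section satisfies the include predicate."""
    return sorted(name for name, section in terms.items() if include(section))

# Current behaviour described above: the 'schema' dump is core only.
core_only = build_dump(TERMS, lambda s: s == "core")

# Proposed rationalisation: two dumps.
schema = build_dump(TERMS, lambda s: s != "attic")  # all sections except attic
all_layers = build_dump(TERMS, lambda s: True)      # all sections including attic

print(core_only)   # cssSelector is present, but its range CssSelectorType is not
print(schema)
print(all_layers)
```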
@VladimirAlexiev

#2537 shows that the current reduced schema lacks closure (some classes are referenced but not defined)
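This lack of closure can be checked mechanically. The following is a minimal Python sketch that assumes N-Triples input in which the relevant triples have IRI objects. `undefined_classes` and the sample lines are hypothetical, and the regex is not a full N-Triples parser.

```python
import re

# Matches only lines of the form <s> <p> <o> . (IRI objects); literal-valued
# triples are skipped. A sketch, not a complete N-Triples parser.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
SCHEMA = "http://schema.org/"
REFERENCING = {SCHEMA + "domainIncludes", SCHEMA + "rangeIncludes"}

def undefined_classes(lines):
    """Return classes used via domainIncludes/rangeIncludes but never typed rdfs:Class."""
    referenced, defined = set(), set()
    for line in lines:
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # blank lines, comments, literal-valued triples
        s, p, o = m.groups()
        if p in REFERENCING:
            referenced.add(o)
        elif p == RDF_TYPE and o == RDFS_CLASS:
            defined.add(s)
    return sorted(referenced - defined)

# Toy core-only dump: cssSelector points at a class the file never defines.
core_dump = [
    "<http://schema.org/cssSelector> <http://schema.org/rangeIncludes> <http://schema.org/CssSelectorType> .",
    "<http://schema.org/cssSelector> <http://schema.org/domainIncludes> <http://schema.org/WebPageElement> .",
    "<http://schema.org/WebPageElement> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .",
]
print(undefined_classes(core_dump))
```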

@danbri
Contributor

danbri commented Jul 15, 2020

I agree. A few years ago we introduced a fair amount of engineering to support layered extension. In practice it turned out too difficult to disentangle various overlapping areas. Should we have an "extension" for Cultural Heritage? for Galleries / Archives / Libraries / Museums (GLAM)? Or different named efforts for bibliography and archival vocabulary? What about tourism, travel, real estate, and e-commerce? Where should ebooks be described? etc etc.

In practice we have gradually migrated back towards a simple structure.

  • Schema.org is Schema.org
    • It has an area called "attic" for things we did that we can't undo, but don't want to highlight
    • It has an area called "pending" for things we are doing, where we want to highlight a heightened potential for ongoing changes and the need for feedback

We have already downplayed the use of named subdomain extensions ("pending", "bib", etc.) in the site navigation. We have found little use for the defensive layering structure which attributed each triple in the schema definitions to one of those sections. In practice, "all layers" is the only sensible subset to use. There should be a simple triple representation, a consolidated Turtle representation (which looks as similar as possible to the MCF-flavoured Turtle source files), and I guess there is a desire for JSON-LD. Do we believe the CSV files have proved useful?
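The following small sketch uses only the Python standard library to illustrate the difference between a "simple triple representation" and a flat CSV. The triples are illustrative. The CSV columns (subject, predicate, object) are an assumption and do not reflect the layout of the schema.org CSV downloads.

```python
import csv
import io

SCHEMA = "http://schema.org/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Illustrative (subject, predicate, object) triples, all IRIs.
TRIPLES = [
    (SCHEMA + "CssSelectorType", RDFS + "subClassOf", SCHEMA + "Text"),
    (SCHEMA + "cssSelector", SCHEMA + "rangeIncludes", SCHEMA + "CssSelectorType"),
]

def to_ntriples(triples):
    """One '<s> <p> <o> .' line per triple: a plain N-Triples serialisation."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)

def to_csv(triples):
    """Flat CSV, one row per triple; the header names are an assumption."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    writer.writerows(triples)
    return buf.getvalue()

print(to_ntriples(TRIPLES), end="")
print(to_csv(TRIPLES), end="")
```

The triple form is trivially machine-mergeable. A CSV is easier to open in a spreadsheet, which may explain why people read it directly.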

For naming I'd suggest something like

  • "schemaorg-current" (everything minus attic)
  • "schemaorg-current-plus-removed" (everything including attic)

@RichardWallis
Contributor Author

Reviewing the files we currently produce, I see we do not reference attic, either individually or in other dump files.

As both schema.* and all-layers.* are often referenced in various issues and other comms, we ideally should not change those names without good reason.

To that end I suggest we retain the schema.* name, but update the contents to include all terms. As all-layers.* is often used/recommended it should be retained for now, even though the contents will be identical to the updated schema.* files.

The CSV versions have proved useful. Many of the issues I have been involved with, including confusion about file contents, stemmed from someone reading a CSV version.

@danbri
Contributor

danbri commented Jul 16, 2020

> As both schema.* and all-layers.* are often referenced in various issues and other comms, we ideally should not change those names without good reason.

The good reason is that we no longer have a layered architecture for the sections of Schema.org.

Please use "schemaorg-current" and "schemaorg-all". This will make it easier to avoid confusing schema.ttl with (data/)schema.ttl, amongst other things. It is also developer-friendly, in that it is a stronger reminder that this data comes from schema(.)org.

@ktk

ktk commented Jul 16, 2020

Will that affect the download locations of URLs like https://schema.org/version/latest/schema.nt, which is referenced at https://schema.org/docs/developers.html#defs?

@danbri
Copy link
Contributor

danbri commented Jul 16, 2020

@ktk we'll have to update the developer documents, yes. /cc @RichardWallis
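If the file names change, the download URLs change with them. The sketch below assumes the pattern `schemaorg-<subset>-<scheme>.<ext>` under https://schema.org/version/latest/. That pattern is inferred from the schemaorg-current-http.ttl URL checked later in this thread. `dump_url` is hypothetical, and combinations other than that one are unverified.

```python
BASE = "https://schema.org/version/latest/"  # directory used in this thread's URLs

# Subset names agreed above; the descriptions paraphrase the thread.
SUBSETS = {"current": "everything except attic", "all": "everything including attic"}
SCHEMES = ("http", "https")  # namespace variants, per the commit below

def dump_url(subset, scheme, ext):
    """Build a dump URL, assuming the (inferred) schemaorg-<subset>-<scheme>.<ext> pattern."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset}")
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    return f"{BASE}schemaorg-{subset}-{scheme}.{ext}"

print(dump_url("current", "http", "ttl"))
```

Rejecting unknown subsets means that old names such as "all-layers" fail loudly instead of silently producing a URL that may not exist.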

@danbri
Copy link
Contributor

danbri commented Jul 16, 2020

see https://twitter.com/danbri/status/1283767294267731971 for announcement of this proposed change

danbri pushed a commit that referenced this issue Jul 20, 2020
* Simplified and expanded dump files and updated associated documentation.
Re: issue (#2304)

* Adjust all-layers copies to account for limited file builds (as in travis)

* Updated dump file names to schemaorg-current & schemaorg-all

* Updated to create http & https versions of dumpfiles.

* Modified name of dumpfile used to reflect changes

Co-authored-by: Dataliberate <rjw@dataliberate.com>
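The commit creates both http and https versions of the dump files. One way such a copy could be derived, assuming the https file is the source, is to rewrite schema.org IRIs in N-Triples from the https namespace to the http namespace. This is a hypothetical sketch, not the project's build code.

```python
import re

# Matches the opening of an angle-bracketed IRI in the https schema.org namespace.
HTTPS_NS = re.compile(r"<https://schema\.org/")

def to_http_variant(ntriples_text):
    """Derive an http-namespace copy of an https N-Triples dump (sketch).

    Only text of the form <https://schema.org/... is rewritten, so a literal
    such as "see https://schema.org/Thing" (no angle bracket) is left alone.
    Other namespaces, e.g. rdfs, are untouched."""
    return HTTPS_NS.sub("<http://schema.org/", ntriples_text)

line = '<https://schema.org/Thing> <http://www.w3.org/2000/01/rdf-schema#comment> "see https://schema.org/Thing" .\n'
print(to_http_variant(line), end="")
```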
@RichardWallis
Contributor Author

Implemented in PR #2654

@VladimirAlexiev

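Checked https://schema.org/version/latest/schemaorg-current-http.ttl: all used classes are defined. Query used (it lists classes referenced via domainIncludes or rangeIncludes that are not declared as rdfs:Class):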

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT DISTINCT ?class {
    ?p schema:domainIncludes|schema:rangeIncludes ?class
    FILTER NOT EXISTS { ?class a rdfs:Class }
} ORDER BY ?class
```


4 participants