add section about implementing extensions #31

pwinckles · 2020-10-15T16:36:21Z

This is the second breakout from PR #29. It covers how extensions should be implemented.

It includes additional, and perhaps controversial, changes from the same section in PR #29. Specifically, it introduces a new extension manifest file, extensions.json that is located in root extension directories. Per the OCFL spec, it is technically not permissible to have this file in the root extension directory. I think this is a mistake, but a workaround could be to stick it in a subdirectory instead.

Why is the manifest needed? Having the manifest explicitly declare what extensions are in use enables extension chaining extensions and gracefully handles obsoleted extensions or extensions that are no longer used.

In the chaining case, suppose that you have an extension that chains together 3 other extensions. The chaining extension and each of the chained extensions each have their own configuration. So, you have a root extension directory that looks something like this:

extensions/
  - 0001-extension-a /
  - 0002-extension-b/
  - 0003-extension-c/
  - 0004-extension-d/

Without the manifest, a client has to inspect the extensions directory and load whatever it finds. The end result in this case is that 4 unique extensions are loaded.

If it had a manifest that looked like:

{
  "extensions": [
    "0004-extension-d"
  ]
}

Then, it would instead only load a single extension, the chaining extension, which would in turn load its dependent extensions. It is true, this same behavior could be achieved without the manifest but it is more convoluted and requires a second pass to "link" the dependent extensions after the initial load.

In the obsoleted/deprecated case, suppose that you're using an extension and decide to replace it with another or stop using it entirely. In this case, you do not necessarily want to get rid of the extension's configuration. For example, let's say that it's an extension that describes a log format that you're no longer using, but you still have historic files in the log directory that use the format. You probably want to keep a record of the extension that created those files and its configuration, and you can do so if you can have extension configuration within the extensions directory that is not actively used because it is not declared as being in use.

Finally, there is an outstanding problem that this PR still does not address, which is how can you used the same extension configured in multiple different ways on the same repository or object? For example, let's say that there's an extension that's used for specifying id formats, and you have a repository that contains two distinct types of ids. It is currently impossible to use the same extension to define both of the formats, assuming that the extension itself does not internally support this, because the extension can only be configured once.

README.md

rosy1280 · 2020-10-20T13:37:13Z

Editors met, and we really like this idea but want to deep dive and explore in more detail, only so we can be sure that we get it right and aren't continuing to go back and forth. An issue has been created #33 to help us explore this topic further.

In the mean time we'll leave this PR open and label it as a draft so we can use some (or all) of the language later.

pwinckles · 2020-10-21T12:06:45Z

An alternative extensions.json format would be if it contained the extension configuration itself. For example:

{
  "enabled": {
    "0000-example-extension": {
      "extensionName": "0000-example-extension",
      "firstExampleParameter": 12, 
      "secondExampleParameter": "Hello", 
      "thirdExampleParameter": "Green"
    },
    "0002-flat-direct-storage-layout": {
      "extensionName": "0002-flat-direct-storage-layout"
    }
  }
}

Using this scheme, there would not be individual configuration files for each extension. There would only be a single configuration file in the storage root and one per object that uses extensions. The only reason that extension specific directories under extensions would exist is if the extension needs a place to write files, eg the mutable HEAD extension.

I did not suggest this earlier because I was trying to keep the structure as close to what we've discussed previously as possible. As always, there are advantages and disadvantages to both approaches. I think there is one key advantage to having all of the configuration in a single file, and that is that it reduces the number of files that need to be read when accessing the object. At this point, I think it's unlikely that anyone would ever have more than a couple extensions on their objects, when the configuration is separated into different files there is a minimum of one plus the number of extensions used files to read when accessing an object, the cost of which varies depending on your storage (S3...).

Additional notes about this format:

The extension configuration objects must all have a required extensionName key that's value is the extension's registered name. This allows JSON libraries to polymorphically deserialize the objects.
In the example, the extension map is under the enabled key. This was in case we wanted to preserve the ability to maintain extension configuration that is no longer in use. For example, you could put inactive configuration under disabled.

README.md

awoods · 2020-11-04T15:41:28Z

@pwinckles : it looks like this PR needs a rebase.

…cessed

pwinckles · 2020-11-04T17:26:45Z

@pwinckles : it looks like this PR needs a rebase.

Fixed

pwinckles mentioned this pull request Oct 15, 2020

rework extensions overview #29

Closed

pwinckles added the in-review label Oct 15, 2020

awoods reviewed Oct 20, 2020

View reviewed changes