Index


REGISTRATION INFORMATION

   Tag             28 (shareable)
   Data Item       multiple
   Semantics       mark value as (potentially) shared
   Reference       http://cbor.schmorp.de/value-sharing
   Contact         Marc A. Lehmann <cbor@schmorp.de>

   Tag             29 (sharedref)
   Data Item       unsigned integer
   Semantics       reference nth marked value
   Reference       http://cbor.schmorp.de/value-sharing
   Contact         Marc A. Lehmann <cbor@schmorp.de>

SUMMARY

These two tags can be used to implement shared value support in CBOR.

RATIONALE

Many serialisable data structures can contain values that are shared. For example, in Perl, you could have an array with two hash references pointing to the same object.

When serialising these data structures to CBOR, these values either become unshared (duplicated), or, when the structure contains cycles, they are not serialisable into CBOR at all.

This extension implements explicit shared value support - encoders need to explicitly mark values as potentially shared and can later refer to them.

This extension is not meant to save space in the CBOR representation by encoding duplicated values only once - the shared values are supposed to refer to the same value after decoding (e.g. when implemented as a pointer, all references to the value should point to the same memory object).

Space saving is a side effect of encoding data structures with large shared substructures using this extension - the CBOR representation will then have similar space requirements as the original data structure.

DESCRIPTION

To share values, the first occurrence of the value must be explicitly tagged with the shareable tag (value 28).

Subsequent occurrences can then be encoded by encoding the index of a previously marked value tagged with the sharedref tag (value 29). That is, index 0 refers to the first value marked as shareable in the CBOR stream, index 1 to the second and so on.

Any taggable value can be marked for sharing, and there is no requirement to actually refer to a value marked as shareable - encoders can mark any value they want without ever referring to them.

Implementors are advised that, to be able to encode cyclic structures, it must be possible to refer to a value before it is completely decoded. For example, during decoding of a map, some entries can refer to the map being decoded. Thus an implementation cannot decode a shareable value and then record it for later references - it has to record the reference before decoding the value.

This can be handled in a variety of ways. For example, in Perl, values not explicitly referenced (hash, array, scalar ref) can not (normally) be shared, so it only has to handle explicit references. Other shared values will usually become unshared, for efficiency reasons.

Implementations that do not support sharing can duplicate the values after decoding, or they can use fix-up lists to fix shared references after decoding.

IMPLEMENTATION NOTE

The semantics of shareable/sharedref tags require the decoder to be aware and the encoder to be under control of the sequence in which data items are encoded into the CBOR stream. This means these tags cannot be implemented on top of every generic CBOR encoder/decoder (which might reorder entries in a map); they typically need to be integrated into their works.

EXAMPLES

A simple shared array

The following Perl fragment creates an array reference with three entries, all of which array references themselves. All of them contain the same data, but the first two actually reference the same (shared) data structure:

   $data = [ ([]) x 2, [] ];

This is another way to create it:

   my $shared = [];
   $data = [ $shared, $shared, [] ];

The shared aspect means that setting an element of the first nested arrayref also makes it visible inside the second nested arrayref, as it is the same array:

   $data->[0][0] = "test";
   # results in: [["test"],["test"],[]]

A standard CBOR en-/decoder will encode three separate arrays, which will decode into three separate arrays again. So when the original data structure is en- and then decoded, the arrays will be unshared:

   $unshared = decode_cbor encode_cbor [ ([]) x 2, [] ];
   $unshared->[0][0] = "test";
   # results in: [["test"],[],[]]

The CBOR encoding might be:

   83    # array(3)
      80 # array(0)
      80 # array(0)
      80 # array(0)

The value-sharing extension allows an encoder to flag the two arrays and keep the shared arrays actually shared. For example, the CBOR::XS encoder, when configured to use the value sharing extension, will emit this CBOR value:

   [28([]), 29(0), []]

   83       # array(3)
      d8 1c # tag(28)
         80 # array(0)
      d8 1d # tag(29)
         00 # unsigned(0)
      80    # array(0)

When decoding it, the first two array references will point to the same array.

A cyclic data structure

The following cyclic Perl data structure references itself from within itself. Here a decoder will see a reference to the shared value before it has completely decoded the shared value:

   $data = [];
   $data->[0] = $data; # make the first array element refer to the array

This data structure is not representable in standard CBOR. Using the value sharing extension, it can be encoded as follows:

   28([29(0)])

   d8 1c       # tag(28)
      81       # array(1)
         d8 1d # tag(29)
            00 # unsigned(0)

SECURITY CONSIDERATIONS

Implementing this extension can open up a decoder for additional resource exhaustion attacks.

The possibility to create cyclic data structures can create problems for implementtaions that rely on garbage collection to clean up data structures, and it could potentially lead to problems with algorithms that can't cope with this, leading to infinite loops or similar problems. One way to counter that is to not enable decoding of shared values by default, so code that wants it can opt-in. Another way would be to disallow decoding of cyclic data structures while still allowing other forms of shared values (for example, by tracking whether a referenced value has been fully decoded already and fail if not).

Implementations that decode shared values by duplicating them could also suffer from excessive expansion (e.g. having a string referenced multiple times in an array, then having this array referenced multiple times in another array and so on). This is not usually a problem as most implementations already treat arrays and objects as references, and can therefore avoid duplication.

IMPLEMENTATIONS

This section lists known implementations of this extension (drop me a mail if you want to be listed here).

* [Perl] CBOR::XS (reference implementation)
* [JavaScript] borc-refs
* [Perl] CBOR::Free
* [Nodejs] node-cbor