Commit 77d8e5e7 authored by EXT Arnaud Clère

Updated doc

parent 8b87fc84
# Design
## The key idea
QBind is more general than (de)serialization and should be understood as a generic way to traverse[^1] a C++ dataset and
another generic dataset, binding the related parts together. In effect:
* the traversal may be partial, leaving out unrelated dataset parts (satisfying R2)
* the same traversal may be used to:
- read/write (resp. deserialize/serialize) files or buffers
- visit/build pointer-based data structures
- compute statistics on C++ data
- bind ordinary C++ data to succinct data structures (like [SDSL](https://github.com/simongog/sdsl-lite))
Hence, from now on, we will use the term *bind* instead of the more restricted *(de)serialization* term.
This traversal is driven by QBind<T> methods which may use a BindMode (Read,Write,...) to determine whether to read the generic dataset or write it according to the C++ one.
[^1]: *traverse* meaning to go through without returning back
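
As a rough illustration of one traversal serving both directions, here is a minimal sketch in standard C++. The names `Mode`, `TextCursor` and `bindPerson` are hypothetical and are not part of the QBind API; the "generic dataset" is assumed to be a plain whitespace-separated text stream, and there is no error reporting.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical names for illustration only; this is not the QBind API.
enum class Mode { Read, Write };

struct Person { std::string firstName; int age = 0; };

// A trivial "generic dataset": whitespace-separated tokens in a string stream.
class TextCursor {
public:
    explicit TextCursor(Mode m, const std::string& input = {}) : mode_(m), stream_(input) {}

    // Binds one item: writes it in Write mode, reads into it in Read mode.
    template <class T>
    TextCursor& bind(T& value) {
        if (mode_ == Mode::Write) {
            stream_ << value << ' ';
        } else {
            stream_ >> value;
            assert(!stream_.fail()); // the sketch has no error reporting
        }
        return *this;
    }

    std::string text() const { return stream_.str(); }

private:
    Mode mode_;
    std::stringstream stream_;
};

// The single traversal shared by serialization and deserialization.
// Person is taken by non-const reference because Read mode assigns to it.
inline void bindPerson(TextCursor& c, Person& p) {
    c.bind(p.firstName).bind(p.age);
}
```

Calling `bindPerson` with a `Mode::Write` cursor produces the text, and calling it again with a `Mode::Read` cursor built from that text fills a `Person` back in, so the read and write code cannot drift apart.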
QDebug and QDataStream translate all data to a "flat" sequence of characters/bytes whose structure becomes implicit and can only
be determined for sure by reading the code that produced it. But R1, R2, R3 require that we describe our data in a little bit more
detail. Moreover, W1 and RW2 require that we choose a careful compromise between data format features.
QBind allows binding C++ `data` to a choice of:
* `sequence` of adjacent `data` `item`s
* `record` of named `data` `item`s
* `null` value (meaning no information available on `data`)
* Atomic values with a textual representation and optional binary ones:
- text (utf16/utf8 encodings)
- boolean (true/false)
- numbers (integral/floating-point, unsigned/signed, 8/16/32/64 bits)
- *date/time (TBD)*
- *uuid (TBD)*
* Generically supported T values for which a specialized QBind<T>::bind() is defined
We argue that QBind allows lossless conversions between all supported formats, and that the addition of optional metadata (RW3)
can address most peculiarities of the supported formats. However, it may not always conveniently address formats that do not have standard
ways of representing data structures such as XML (e.g. binding the Person type with enough metadata to conform the result to
[xCard schema](https://tools.ietf.org/html/rfc6351) would be cumbersome).
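
To make the data model above more concrete, the following sketch writes the same traversal either to an explicit, JSON-like format or to a "flat" implicit one, with the format chosen at runtime. It is only an assumption of how such an interface could look: `IWriterSketch`, `JsonSketch`, `FlatSketch`, `bindPersons` and `writePersons` are hypothetical names and not the actual IBind/IWriter API, and only the sequence, record, null, text and number cases are covered.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface covering part of the data model; NOT the actual QBind API.
struct IWriterSketch {
    virtual ~IWriterSketch() = default;
    virtual void beginSequence() = 0;
    virtual void endSequence() = 0;
    virtual void beginRecord() = 0;
    virtual void endRecord() = 0;
    virtual void item() = 0;                        // next item of a sequence
    virtual void item(const std::string& name) = 0; // named item of a record
    virtual void null() = 0;
    virtual void text(const std::string& s) = 0;
    virtual void number(long long n) = 0;
    virtual const std::string& result() const = 0;
};

// Explicit format: structure and names are kept (JSON-like, no string escaping).
class JsonSketch : public IWriterSketch {
public:
    void beginSequence() override { out_ += '['; first_.push_back(1); }
    void endSequence() override { out_ += ']'; first_.pop_back(); }
    void beginRecord() override { out_ += '{'; first_.push_back(1); }
    void endRecord() override { out_ += '}'; first_.pop_back(); }
    void item() override { comma(); }
    void item(const std::string& name) override { comma(); out_ += '"' + name + "\":"; }
    void null() override { out_ += "null"; }
    void text(const std::string& s) override { out_ += '"' + s + '"'; }
    void number(long long n) override { out_ += std::to_string(n); }
    const std::string& result() const override { return out_; }
private:
    void comma() {
        assert(!first_.empty()); // items only exist inside a sequence or record
        if (!first_.back()) out_ += ',';
        first_.back() = 0;
    }
    std::string out_;
    std::vector<char> first_; // one "no item written yet" flag per open structure
};

// Implicit format: structure and names are dropped, only atomic values remain.
class FlatSketch : public IWriterSketch {
public:
    void beginSequence() override {}
    void endSequence() override {}
    void beginRecord() override {}
    void endRecord() override {}
    void item() override {}
    void item(const std::string&) override {}
    void null() override { out_ += "~ "; }
    void text(const std::string& s) override { out_ += s + ' '; }
    void number(long long n) override { out_ += std::to_string(n) + ' '; }
    const std::string& result() const override { return out_; }
private:
    std::string out_;
};

struct Person { std::string firstName; int age; };

// One traversal, independent of the output format.
void bindPersons(IWriterSketch& w, const std::vector<Person>& persons) {
    w.beginSequence();
    for (const auto& p : persons) {
        w.item();
        w.beginRecord();
        w.item("firstName"); w.text(p.firstName);
        w.item("age");       w.number(p.age);
        w.endRecord();
    }
    w.endSequence();
}

// The format is a runtime choice: the traversal above is compiled only once.
std::string writePersons(const std::string& format, const std::vector<Person>& persons) {
    std::unique_ptr<IWriterSketch> w;
    if (format == "json") w = std::make_unique<JsonSketch>();
    else                  w = std::make_unique<FlatSketch>();
    bindPersons(*w, persons);
    return w->result();
}
```

With the flat writer, a reader can only recover the persons by knowing the traversal that produced the text, which is the implicit-format situation described for QDebug and QDataStream.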
## QBind grammar
The QBind traversal is formally described by the following recursive automaton:
```mermaid
graph LR
......@@ -118,5 +158,5 @@ IWriter and IReader provide partial IBind implementations simplifying the work o
*NB:* BindNative types could be extended by specifying BindSupport<TImpl,T> trait but then, (de)serialization code must be specialized
for each TImpl. For instance, a QBind<QColor,QDataStream> may be implemented differently from QBind<QColor,IBind> but QBind<QColor>
example shows that meta() can also be used to avoid specialized serialization code that breaks W5 requirement. If meta() is deemed
example shows that meta() can also be used to avoid specialized serialization code that breaks RW2 requirement. If meta() is deemed
enough, the BindSupport trait and TImpl template parameters can be removed.
......@@ -63,8 +63,8 @@ HEADERS += \
data.h
DISTFILES += \
DESIGN.md \
README.md \
design.md \
persons.proto \
sample.ini \
samples.txt
......@@ -20,7 +20,6 @@ or [data model](https://doc.qt.io/qt-5/model-view-programming.html) while provid
See below:
- [The requirements (read/write)](#the-requirements)
- [The key idea](#the-key-idea)
- [Some examples](#examples)
- [The results](#results)
- [Our conclusion](#conclusion)
......@@ -32,24 +31,27 @@ See below:
## The requirements
* **RW1. Easily customizable** for user-defined and third-party types
1. do not require writing template specializations for user-defined types
2. use type system and code completion to guide the user for simple binds (see also W1)
3. avoid most boiler-plate code, including redundant read/write code (as required with QDataStream << and >>)
4. override existing bind with custom view types or lambda
* **RW2. Good support of Qt data**:
1. almost all features of simple data (QJson..., QDataStream, QSettings)
2. most features of complex data (QCbor..., QXml..., QMetaObject, QModel...)
* **RW3. Allow optional metadata for complex formats** (CBOR tags, XML tags and attributes, QModel* columnNames, etc.)
* **RW4. No restriction on data size**
(some restrictions may apply with specific implementations that may, e.g., store context for each data structure level or cache out-of-order data)
### Write (serialization)
* **W1. Very fast**
* **W1. Format can be changed at runtime**
(a compiled tracepoint in a library must be able to generate Json or Cbor as desired by library user)
* **W2. Very fast**
1. similar to QDataStream
2. same order of magnitude as protobuf/lttng/etl
3. potentially without dynamic memory allocation or thread locking
* **W2. Easily extensible** to user-defined and third-party types
1. do not require writing template specializations for user-defined types
2. use type system and code completion to guide the user for simple binds (see also W4)
3. avoid most boiler-plate code, including redundant read/write code (as required with QDataStream << and >>)
* **W3. Well-formed** data ensured (almost) with low impact on performance
* **W4. Format can be changed at runtime**
(a compiled tracepoint in a library must be able to generate Json or Cbor as desired by library user)
* **W5. Good support of Qt data**:
1. almost all features of simple data (QJson..., QDataStream, QSettings)
2. most features of complex data (QCbor..., QXml..., QMetaObject, QModel...)
* **W6. No restriction on output data size**
(some restrictions may apply with specific implementations that may, e.g., store context for each data structure level)
### Read (deserialization)
......@@ -62,9 +64,6 @@ See below:
3. *changing from optional or required to repeated an item (provided it is not itself a sequence) (TBD)*
* **R3. Allow reporting all errors** and mismatches between what was expected and what is read (unless the data format was implicit as below)
* **R4. Allow implicit formats like QDataStream** when the reader knows exactly what to read
* **R5. Allow optional metadata for complex formats** (CBOR tags, XML tags and attributes, QModel* columnNames, etc.)
* **R6. No restriction on input data size or shape**
(some restrictions may apply with specific implementations that may, e.g. cache out-of-order data)
### A notable non-requirement
......@@ -74,44 +73,6 @@ graph data using some kind of "reference" values. We argue this makes the model
it does not mandate native support for references. Moreover, QBind supports metadata as an optional way to encode such special values
for data formats supporting it like [CBOR value sharing tags](http://cbor.schmorp.de/value-sharing) and XML.
## The key idea
QBind is more general than (de)serialization and should be understood as a generic way to traverse[^1] a C++ dataset and
another generic dataset, binding the related parts together. In effect:
* the traversal may be partial, leaving out unrelated dataset parts (satisfying R2)
* the same traversal may be used to:
- read/write (resp. deserialize/serialize) files or buffers
- visit/build pointer-based data structures
- compute statistics on C++ data
- bind ordinary C++ data to succinct data structures (like [SDSL](https://github.com/simongog/sdsl-lite))
Hence, from now on, we will use the term *bind* instead of the more restricted *(de)serialization* term.
This traversal is driven by QBind<T> methods which may use a BindMode (Read,Write,...) to determine whether to read the generic dataset or write it according to the C++ one.
[^1]: *traverse* meaning to go through without returning back
QDebug and QDataStream translate all data to a "flat" sequence of characters/bytes whose structure becomes implicit and can only
be determined for sure by reading the code that produced it. But R1, R2, R3 require that we describe our data in a little bit more
detail. Moreover, W4 and W5 require that we choose a careful compromise between data format features.
QBind allows binding C++ `data` to a choice of:
* `sequence` of adjacent `data` `item`s
* `record` of named `data` `item`s
* `null` value (meaning no information available on `data`)
* Atomic values with a textual representation and optional binary ones:
- text (utf16/utf8 encodings)
- boolean (true/false)
- numbers (integral/floating-point, unsigned/signed, 8/16/32/64 bits)
- *date/time (TBD)*
- *uuid (TBD)*
* Generically supported T values for which a specialized QBind<T>::bind() is defined
We argue that QBind allows lossless conversions between all supported formats, and that the addition of optional metadata (R5)
can address most peculiarities of the supported formats. However, it may not always conveniently address formats that do not have standard
ways of representing data structures such as XML (e.g. binding the Person type with enough metadata to conform the result to
[xCard schema](https://tools.ietf.org/html/rfc6351) would be cumbersome).
## Examples
One can rely on Qt reflection to bind the stored properties of a Q_OBJECT or Q_GADGET using the `QBIND_GADGET_WITH_METAOBJECT` macro
......@@ -313,8 +274,10 @@ Last but not least, providing in advance some `meta` data allows binding deep C+
```cpp
QStandardItemModel tree, table, matrix;
QModelWriter<>(&matrix).meta(qmSizes ,"4,3" ).bind(transform);
QModelWriter<>(& flat).sequence() .with(persons, flatten); // recursive bind function
QModelWriter<>(& tree).meta(qmChildren,"children" ).bind(persons);
QModelWriter<>(& table).meta(qmColumns ,"names,age").bind(persons);
//...
```
![Q...View](qstandardmodel.PNG)
......@@ -324,16 +287,14 @@ By convention:
- `qmColumns` defines the ordered set of named items that should be bound to their respective [columns](https://doc.qt.io/Qt-5/qmodelindex.html#column)
- `qmSizes` defines the bounds of successive dimensions of a N-dimensional row- or column-wise array (where N=2 to account for Q...View limitations)
- `qmName` allows naming data items for, e.g. XML root element and sequence items
- `qmType` provides type names for, e.g. XML attributes, CBOR tags
- `qmDataStreamVersion` allows binding specifically for QDataStream compatibility
- ...
The latest customized binds require using ad-hoc std::function (mimicking Python list comprehensions)
The latest customized binds require using ad-hoc std::function like `flatten` or lambda below (mimicking Python list comprehensions):
```cpp
QStandardItemModel custom;
QModelWriter<>(&custom).sequence().with([&](Seq<Cursor>&& s) {
for (auto&& person : persons) { // Read would require looping while !s.item()
s = s // To keep working with the active Cursor
.record()
for (auto&& person : persons) {
s = s.record()
.item("first name")
.meta(qmColor, person.age >= 42 ? "green" : "blue")
.bind(person.firstName)
......@@ -348,7 +309,7 @@ QModelWriter<>(&custom).sequence().with([&](Seq<Cursor>&& s) {
})
.out();
}
return std::move(s); // So caller stops calling IBind if user function was unable to keep track of the active Cursor
return std::move(s);
});
```
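
The well-formedness guarantee of such a fluent interface can be approximated in standard C++ by letting each state type expose only the calls that are legal in that state. The sketch below is a simplified assumption of the technique, not the QBind implementation: `Val`, `Rec`, `Seq`, `Done` and `writePerson` are hypothetical names, the output is a JSON-like string without escaping, and nothing prevents reusing a stale state object (which a real implementation would have to address).

```cpp
#include <cassert>
#include <string>

// Hypothetical type-state sketch; not the actual QBind classes.
struct Done { std::string* buf; }; // terminal state: nothing more can be written

template <class P> struct Rec;
template <class P> struct Seq;

// A value to be bound; P is the state to return to once the value is complete.
template <class P> struct Val {
    std::string* buf;
    P bind(long long n) { *buf += std::to_string(n); return P{buf}; }
    P bind(const std::string& s) { *buf += '"' + s + '"'; return P{buf}; }
    Rec<P> record() { *buf += '{'; return Rec<P>{buf}; }
    Seq<P> sequence() { *buf += '['; return Seq<P>{buf}; }
};

// An open record: only item(name) and out() are available.
template <class P> struct Rec {
    std::string* buf;
    Val<Rec<P>> item(const std::string& name) {
        assert(!buf->empty());              // record() wrote '{' first
        if (buf->back() != '{') *buf += ','; // not the first item
        *buf += '"' + name + "\":";
        return Val<Rec<P>>{buf};
    }
    P out() { *buf += '}'; return P{buf}; }
};

// An open sequence: only item() and out() are available.
template <class P> struct Seq {
    std::string* buf;
    Val<Seq<P>> item() {
        assert(!buf->empty());              // sequence() wrote '[' first
        if (buf->back() != '[') *buf += ',';
        return Val<Seq<P>>{buf};
    }
    P out() { *buf += ']'; return P{buf}; }
};

// Usage: the chain below only compiles because every call is legal in its state.
std::string writePerson(const std::string& firstName, int age) {
    std::string buf;
    Val<Done>{&buf}
        .record()
            .item("first name").bind(firstName)
            .item("age").bind(age)
        .out();
    return buf;
}
```

In this sketch, writing `.record().bind(firstName)` or calling `.out()` twice on the top-level record is rejected by the compiler, because `Rec` has no `bind` and `Done` has no `out`; this is one way well-formedness can be checked at compile time with little runtime cost.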
......@@ -422,22 +383,23 @@ bf647479706502666e756d6265726b2b34342031323334353637ff
### Write performance
Overall, QBind demonstrates write performance superior to existing Qt classes except QDataStream (which does not meet our W4
Overall, QBind demonstrates write performance superior to existing Qt classes except QDataStream (which does not meet our W1
and read requirements). Not surprisingly, the performance depends on the kind of C++ data and data format used. Here are some
explanations about the best results:
- QByteArray obviously fails almost all our requirements since it does not even have a global version number as QDataStream does
but it represents the minimum cost of serializing each dataset as it is equivalent to a few memcpy into a reserved buffer
(QByteArray results are similar to protobuf serialization part without dataset construction)
- QDataStream can offer data schema evolution using a global data schema version although readers may not be able to detect errors
which is the main reason we are advocating more explicit data formats
- QDataStream performs very well and can offer data schema evolution using a global data schema version regularly updated for Qt internal
needs but users do not control it and will not be able to detect errors on read, so they should refuse to read new schema versions
(this is the main reason why we are advocating more explicit data formats)
- The fluent interface (which brings convenience and well-formedness guarantees) costs around 20% for non trivial datasets as can
be seen between Data and QDataStream but this cost is usually compensated by other factors like below
be seen between Data and QDataStream but this cost is usually compensated by other factors like below
- Cbor obtains better results than QCborWriter because the fluent interface cost is more than compensated by working
directly on a QByteArray instead of a QIODevice (around 50% slower)
- Cbor can be up to 10x faster than QDebug for the "builtin" dataset and even 20x faster for the "doubles" dataset (which is not a
surprise), but it suffers from the absence of utf16 encoding when more QString are added like in the "Person" dataset (this may be
solved by defining a utf16 Cbor tag)
- The cost of `Writable` which allows compiling a tracepoint and choosing the trace format at runtime is hardly measurable.
- The cost of `Bindable` which allows compiling a tracepoint and choosing the trace format at runtime is hardly measurable.
- Regarding text formats suitable for display on a console providing more metadata (such as the name of classes and enum values), our
toy example "TextWriter" obtains better performance than QDebug essentially because:
1. it works with all source code literals in utf8 instead of QString utf16
......@@ -449,7 +411,7 @@ explanations about the best results:
Regarding the worst performances:
- Variant which translates C++ types to an in-memory data structure of QVariantList and QVariantMap cannot perform very well
(without even storing metadata at all) because of the numerous small allocations
(without even storing metadata at all) because of the numerous small allocations that may cost much from time to time
- QJsonValue, QCborValue bad performance is explained the same way as Variant above
- More generally, it seems clear that pointer-based generic data structures cannot be as efficient as QBind to offer runtime
choice of final data format
......