# Design

The core QTransmogrifier implementation (excluding QAbstractValue implementations) is a few hundred lines of C++11 using templates defined
in the headers, an abstract QAbstractValue class, and QAbstractValueWriter/QAbstractValueReader base classes.

## The key idea

QTransmogrifier is more general than (de)serialization and should be understood as a generic way to traverse[^1] a C++ dataset and
another generic dataset, binding the related parts together. In effect:
* the traversal may be partial, leaving out unrelated dataset parts (satisfying R2)
* the same traversal may be used to:
  - read/write (resp. deserialize/serialize) files or buffers
  - visit/build pointer-based data structures
  - compute statistics on C++ data
  - bind ordinary C++ data to succinct data structures (like [SDSL](https://github.com/simongog/sdsl-lite))

Hence, from now on, we will use the term *bind* instead of the more restricted *(de)serialization* term.

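As a rough illustration of the idea that one traversal can serve several purposes, the sketch below drives two unrelated back ends with the same function. It does not use the QTransmogrifier API: `Sink`, `TextSink`, `StatsSink` and `traverse` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the "generic dataset" side of the traversal.
struct Sink {
    virtual ~Sink() = default;
    virtual void beginSequence() = 0;
    virtual void item(long long n) = 0;
    virtual void endSequence() = 0;
};

// Serializes the traversal to a JSON-like text buffer (flat sequences only).
struct TextSink : Sink {
    std::string out;
    bool first = true;
    void beginSequence() override { out += '['; first = true; }
    void item(long long n) override {
        if (!first) out += ',';
        first = false;
        out += std::to_string(n);
    }
    void endSequence() override { out += ']'; }
};

// Computes statistics instead of producing any output.
struct StatsSink : Sink {
    std::size_t count = 0;
    long long sum = 0;
    void beginSequence() override {}
    void item(long long n) override { ++count; sum += n; }
    void endSequence() override {}
};

// The single traversal: it does not know which kind of sink it drives.
inline void traverse(const std::vector<int>& data, Sink& sink) {
    sink.beginSequence();
    for (int x : data) sink.item(x);
    sink.endSequence();
}
```
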
This traversal is driven by `QTransmogrifier<T>` methods which may use a `QValueMode` (Read, Write, ...) to determine whether to read the generic dataset or write it according to the C++ one.

[^1]: *traverse* means to go through without going back

QDebug and QDataStream translate all data to a "flat" sequence of characters/bytes whose structure becomes implicit and can only
be determined for sure by reading the code that produced it. But R1, R2, R3 require that we describe our data in a little bit more
detail. Moreover, W1 and RW2 require that we choose a careful compromise between data format features.

QTransmogrifier allows binding C++ `data` to a choice of:
* `sequence` of adjacent `data` `item`s
* `record` of named `data` `item`s
* `null` value (meaning the information is irrelevant for `data`)
* Atomic values with a textual representation and optional binary ones:
  - text (utf16/utf8 encodings)
  - boolean (true/false)
  - numbers (integral/floating-point, unsigned/signed, 8/16/32/64 bits)
  - *date/time (TBD)*
  - *uuid (TBD)*
* Generically supported T values for which a specialized `QTransmogrifier<T>::bind()` is defined

We argue that QTransmogrifier allows lossless conversions between all supported formats, and that the addition of optional metadata (RW3)
can address most peculiarities of the supported formats. However, it may not always conveniently address formats, such as XML, that do not
have standard ways of representing data structures (e.g. binding the Person type with enough metadata for the result to conform to the
[xCard schema](https://tools.ietf.org/html/rfc6351) would be cumbersome).

## QTransmogrifier grammar

The QTransmogrifier traversal is formally described by the following recursive automaton:
```mermaid
graph LR
subgraph QVal
i((start))--"any()"          --> x((end))
i((start))--"null()"         --> x((end))
i((start))--"bind#lt;T#gt;()"--> x((end))
i((start))--"sequence()"     --> QSeq
i((start))--"record()"       --> QRec
QSeq       --"out()"         --> x((end))
QRec       --"out()"         --> x((end))
QSeq       --"item()"        --> vs["QVal#lt;QSeq#gt;"]
QRec       --"item(name)"    --> vr["QVal#lt;QRec#gt;"]
end
```
- Boxes (nested) represent possible states when traversing the data; the automaton is always in exactly one valid state
- Edges represent possible state transitions that translate to specific data format read/write actions

The automaton is implemented as follows:
- `QVal<_>`, `QRec<_>` and `QSeq<_>` types implement the possible states, where `_` denotes the type of the outer state
- State types only expose public methods corresponding to possible transitions, each returning the destination state type
- The initial state type is `QValue` and the final state type is `QCur` (for instance, `QValue::null()` returns a `QCur`)
- A returned `QRec<_>` or `QSeq<_>` automatically converts to the final `QCur` type, invoking as many `out()` transitions as required
- `QCur` is a non-owning pointer to a `QAbstractValue` interface whose implementations translate data traversal into specific
  data format read/write actions
- The `QCur` instance is moved from the start state type to the end state type only for successful transitions, which allows testing
  alternatives before proceeding with the traversal
- Transitions may fail for various reasons specific to `QAbstractValue` implementations:
  - builders may not be able to allocate new items
  - readers may read data not matching the expected transition
  - ...
- In case of an unsuccessful transition, the returned state type receives a null `QCur` that transparently bypasses calls to `QAbstractValue`
- `bind<T>()` calls are forwarded to the actual `QAbstractValue` or to the generic `QTransmogrifier` depending on `BindSupport<T>`:
  - BindNative  : **QAbstractValue** interface method
  - BindGeneric : **QTransmogrifier** template specialization for T
- Every `bind<T>()` starts from a `QValue`, which is an *unsafe* `QCur` copy with respect to well-formedness (these `unsafeItem()` copies are protected from incorrect use)

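To make the state-type encoding more concrete, here is a heavily simplified sketch of the same technique: each state is a class template parameterized on its outer state, and each transition returns the next state type. It only models `null()`, `sequence()`, `item()` and `out()`. The names `Backend`, `Cur`, `Val`, `Seq` and `demo` are hypothetical and this is not the library's actual code.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical back end; each method reports whether the transition succeeded.
struct Backend {
    std::string log;
    bool sequence() { log += "[ ";    return true; }
    bool item()     { log += "item "; return true; }
    bool null()     { log += "null "; return true; }
    bool out()      { log += "] ";    return true; }
};

// Final state: a non-owning pointer that becomes null once a transition fails.
class Cur {
    Backend* b_;
public:
    explicit Cur(Backend* b) : b_(b) {}
    Backend* backend() const { return b_; }
    void fail() { b_ = nullptr; }
    explicit operator bool() const { return b_ != nullptr; }
};

template<class Out> class Val;

// Sequence state nested in the outer state Out.
template<class Out>
class Seq {
    Out outer_;
public:
    explicit Seq(Out outer) : outer_(std::move(outer)) {}
    Backend* backend() const { return outer_.backend(); }
    void fail() { outer_.fail(); }
    // item() enters a value state whose outer state is this sequence.
    Val<Seq<Out>> item() {
        Backend* b = backend();
        if (b && !b->item()) fail();
        return Val<Seq<Out>>(std::move(*this));
    }
    // out() closes the sequence and returns to the outer state.
    Out out() {
        Backend* b = backend();
        if (b && !b->out()) fail();
        return std::move(outer_);
    }
};

// Value state: the only ways out are null() and sequence().
template<class Out>
class Val {
    Out outer_;
public:
    explicit Val(Out outer) : outer_(std::move(outer)) {}
    Out null() {
        Backend* b = outer_.backend();
        if (b && !b->null()) outer_.fail();
        return std::move(outer_);
    }
    Seq<Out> sequence() {
        Backend* b = outer_.backend();
        if (b && !b->sequence()) outer_.fail();
        return Seq<Out>(std::move(outer_));
    }
};

// The type system tracks nesting: closing the sequence yields the final state.
static_assert(std::is_same<decltype(Val<Cur>(Cur(nullptr)).sequence().out()), Cur>::value,
              "sequence().out() must return to the outer Cur");

inline std::string demo() {
    Backend b;
    Cur end = Val<Cur>(Cur(&b)).sequence().item().null().item().null().out();
    return end ? b.log : std::string("failed");
}
```

In this sketch, an expression such as `Val<Cur>(Cur(&b)).sequence().null()` does not compile because `Seq` exposes no `null()` method, and a null `Cur` silently skips every back end call.
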
## C++ types extensibility

QTransmogrifier is a functor templated on a type T that receives a Value and a T reference (either an lvalue or an rvalue reference) and returns the QCur.
Template specializations can be defined for any T and optionally refined for specific `Cur<TImpl>` with different sets of BindNative types.

A default QTransmogrifier specialization attempts to call `T::bind(...)` to conveniently bind `T* this` without having to understand template syntax,
rvalue or forwarding references. Such custom implementations are facilitated by the fluent interface below.

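The pattern of a default template that forwards to a member `bind`, with specializations for types that cannot be modified, can be sketched as follows. This is an illustration under simplifying assumptions (write-only, no state types): `Writer`, `Transmogrifier`, `bindValue` and `Point` are hypothetical names, not the library's.

```cpp
#include <string>

// Hypothetical write-only target, standing in for the real value/cursor types.
struct Writer { std::string out; };

// Primary template: by default, forward to a member function T::bind(Writer&).
template<class T>
struct Transmogrifier {
    static void bind(Writer& w, const T& t) { t.bind(w); }
};

// Specialization for a type that cannot have a member bind().
template<>
struct Transmogrifier<int> {
    static void bind(Writer& w, const int& n) { w.out += std::to_string(n); }
};

// Entry point that selects the right specialization from the argument type.
template<class T>
void bindValue(Writer& w, const T& t) { Transmogrifier<T>::bind(w, t); }

// A user type only needs an ordinary member function, no template syntax.
struct Point {
    int x;
    int y;
    void bind(Writer& w) const {
        w.out += '(';
        bindValue(w, x);
        w.out += ',';
        bindValue(w, y);
        w.out += ')';
    }
};
```

With these definitions, `bindValue(w, Point{3, 4})` appends `(3,4)` to `w.out`.
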
## Convenient fluent interface

A convenient side-effect of encoding the QTransmogrifier traversal in the type system is that smart C++ editors offer QTransmogrifier-aware code completion
for the fluent interface, making it similar to using an advanced XML editor for typing XML tags. For example, after typing `Value(myImpl).` the
editor will propose either to `bind(myData.item)` or to construct a `sequence()`, `record()` or `null()` value.

## Well-formedness guarantees

Thanks to this design, the compiler will make sure that the only way to return a QCur from a `QValue` is to traverse
the data without backtracking, calling only and all necessary QAbstractValue virtual methods.

The addition of default and optional values takes into account most data schema evolutions in a purely declarative fluent interface without
having to test schema versions and the like. The benefit is that it is not possible to produce ill-formed data using just the fluent interface.

The downside is that writing loops with the fluent interface is unnatural as one must never forget to follow the valid QCur.
For instance:
```cpp
auto seq(v.sequence());
for (auto&& t : ts) {
    // Do not forget to reassign seq: otherwise subsequent items are bound
    // to the moved-from QCur and automatically ignored.
    seq = seq.bind(t);
}
```

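The effect of forgetting the reassignment can be reproduced with a small stand-alone model of a cursor that is moved into the state returned by each transition. This is only a sketch of the mechanism described above: `Log`, `SeqCursor`, `withReassign` and `withoutReassign` are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical target that records every bound item.
struct Log { std::vector<int> items; };

// Move-only cursor: moving it leaves the source null, like a moved-from QCur.
class SeqCursor {
    Log* log_;
public:
    explicit SeqCursor(Log* l) : log_(l) {}
    SeqCursor(SeqCursor&& o) noexcept : log_(o.log_) { o.log_ = nullptr; }
    SeqCursor& operator=(SeqCursor&& o) noexcept {
        log_ = o.log_;
        o.log_ = nullptr;
        return *this;
    }
    // Binds one item (if still valid) and moves the cursor into the result.
    SeqCursor bind(int v) {
        if (log_) log_->items.push_back(v);
        return std::move(*this);
    }
};

// Correct loop: the returned cursor is followed.
inline std::size_t withReassign(const std::vector<int>& ts) {
    Log log;
    SeqCursor seq(&log);
    for (int t : ts) seq = seq.bind(t);
    return log.items.size();
}

// Incorrect loop: the returned cursor is dropped, so seq is null after the first item.
inline std::size_t withoutReassign(const std::vector<int>& ts) {
    Log log;
    SeqCursor seq(&log);
    for (int t : ts) seq.bind(t);
    return log.items.size();
}
```

In this model, three input items produce three bound items with reassignment but only one without it.
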
## Write performance

Since `QCur`, `QVal`, `QRec` and `QSeq` have no data member other than outer types and `QAbstractValue*`, calling their methods can be
optimized and replaced with just the following operations:
1. test the QAbstractValue pointer validity [^2]
2. call the QAbstractValue virtual method corresponding to the requested transition
3. return the resulting QCur, QVal, QRec or QSeq with a valid or invalid QCur depending on QAbstractValue method success or failure

[^2]: Experiments using constexpr to bypass this step for writers that always return true did not seem to improve performance.

`QTransmogrifier<T>` can define up to 3 bind() overloads to efficiently and conveniently handle lvalue references, const lvalue references, and
rvalue references depending on T characteristics (whether copying or moving is (1) possible and (2) efficient).

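A minimal sketch of such an overload set is shown below for `std::string`, using a hypothetical `Store` builder and `StringBinder` type (neither is part of the library). The returned `Used` value only exists to show which overload the compiler selected.

```cpp
#include <string>
#include <utility>
#include <vector>

// Reports which overload was selected (for illustration only).
enum class Used { LvalueRef, ConstLvalueRef, RvalueRef };

// Hypothetical builder that stores bound strings.
struct Store { std::vector<std::string> items; };

struct StringBinder {
    // Mutable lvalue: the only overload that could also read into t.
    static Used bind(Store& s, std::string& t) {
        s.items.push_back(t);
        return Used::LvalueRef;
    }
    // Const lvalue: write-only, the value must be copied.
    static Used bind(Store& s, const std::string& t) {
        s.items.push_back(t);
        return Used::ConstLvalueRef;
    }
    // Rvalue: the value can be moved instead of copied.
    static Used bind(Store& s, std::string&& t) {
        s.items.push_back(std::move(t));
        return Used::RvalueRef;
    }
};
```
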
Compared to manually calling non-virtual, format-specific implementations, the overhead of always testing the validity of QAbstractValue*
and calling virtual methods is around 20% in our benchmark, with a maximum of 100% for trivial format-specific implementations
like copying a single char to a pre-allocated buffer.

This performance cost is usually dwarfed by unoptimized code and the ability to select a data format that performs well for the data
at hand. For instance, choosing Cbor with a mixture of boolean, character literals and numbers makes this overhead negligible.

Other than that, write performance depends on several factors:
- An explicit QUtf8Data type allows handling item names much more efficiently than using utf16 QString, while still being
  distinguishable from QByteArray binary data
- Using `QData<TContent>` classes to tag string encodings made it possible to pinpoint unnecessary encoding conversions, notably in QVariant handling
- In the end, directly using QByteArray buffers instead of using QIODevice can amount to ~ 2x better write performance
- QAbstractValue implementations need to use efficient data structures for storing state. For instance, the optimized `std::vector<bool>`
  used to memorize opened JSON objects/arrays can usually be stored in a single byte and avoid memory allocations, resulting in ~ 10x better
  write performance compared to `QVector<bool>`

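The idea of keeping nesting state in a few bits without allocating can be sketched with a fixed-size bit stack. Note that this swaps in a different technique from the one mentioned above: it uses a single 64-bit word rather than `std::vector<bool>`, and `NestingBits` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical allocation-free stack of flags: one bit per open JSON container
// (1 = object, 0 = array), limited to 64 nesting levels.
class NestingBits {
    std::uint64_t bits_ = 0;
    unsigned depth_ = 0;
public:
    // Opens a container; fails when the fixed capacity is exhausted.
    bool push(bool isObject) {
        if (depth_ >= 64) return false;
        bits_ = (bits_ << 1) | (isObject ? 1u : 0u);
        ++depth_;
        return true;
    }
    // Closes the innermost container; fails when nothing is open.
    bool pop() {
        if (depth_ == 0) return false;
        bits_ >>= 1;
        --depth_;
        return true;
    }
    bool topIsObject() const { return depth_ > 0 && (bits_ & 1u) != 0; }
    unsigned depth() const { return depth_; }
};
```

A writer built on such a structure would have to report an error, or fall back to a growable container, for documents nested deeper than the fixed limit.
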
## Read robustness

Performance matters less for Read. But compared to manually calling non-virtual, format-specific implementations, QTransmogrifier
enforces the well-formedness checks necessary to reliably read data coming from unknown sources (QAbstractValue implementations being responsible
for low-level checks).

All errors are reported as `QIdentifierLiteral` to the `QAbstractValue` implementations, which decide what to do with them:
- ignoring the details and setting a global error status enumeration is usually appropriate for QAbstractValueWriter implementations
- storing all read mismatches is usually more appropriate for world-facing QAbstractValueReader implementations

Standardizing error literals allows efficient reporting and analysis while ensuring that various libraries can define new ones independently.

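The two error-handling policies can be sketched as two implementations of one small interface. This is not the library's API: `ErrorLiteral` is a plain string literal standing in for `QIdentifierLiteral`, and `ErrorHandler`, `WriterErrors`, `ReaderErrors`, `reportTwo` and the error names are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an error literal.
using ErrorLiteral = const char*;

struct ErrorHandler {
    virtual ~ErrorHandler() = default;
    virtual void handleError(ErrorLiteral e) = 0;
};

// Writer-style policy: ignore the details, keep only a global status.
struct WriterErrors : ErrorHandler {
    bool failed = false;
    void handleError(ErrorLiteral) override { failed = true; }
};

// Reader-style policy: keep every mismatch for later analysis.
struct ReaderErrors : ErrorHandler {
    std::vector<std::string> mismatches;
    void handleError(ErrorLiteral e) override { mismatches.emplace_back(e); }
};

// Simulates a traversal that reports two errors with made-up literals.
inline void reportTwo(ErrorHandler& h) {
    h.handleError("ExpectedInteger");
    h.handleError("ExpectedRecord");
}
```
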
## Data types extensibility

QAbstractValue is an abstract base class for translating fluent interface calls to format-specific read or write actions.
The fluent interface guarantees that QAbstractValue virtual methods are always called at appropriate times, so QAbstractValue implementations do not
have to check well-formedness. It defines a set of BindNative types as well as default textual representations and encodings for non-textual
native types, which further simplifies implementations (see TextWriter example).

QAbstractValueWriter and QAbstractValueReader provide partial QAbstractValue implementations simplifying the work of implementors and offering default textual representations.

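The notion of default textual representations can be sketched as a base class whose non-textual methods fall back to the textual one, so that a minimal implementation overrides a single method while a richer one overrides more. The names `AbstractValue`, `PlainTextWriter` and `TaggedIntWriter`, and the `#` prefix, are hypothetical and chosen only for this example.

```cpp
#include <string>

// Hypothetical interface: only text is mandatory, other native types
// default to a textual representation.
struct AbstractValue {
    virtual ~AbstractValue() = default;
    virtual bool bindText(const std::string& text) = 0;
    virtual bool bindBool(bool b) { return bindText(b ? "true" : "false"); }
    virtual bool bindInt(long long n) { return bindText(std::to_string(n)); }
};

// Minimal implementation: relies on the defaults for bool and int.
struct PlainTextWriter : AbstractValue {
    std::string out;
    bool bindText(const std::string& text) override {
        out += text;
        out += ' ';
        return true;
    }
};

// Richer implementation: gives integers their own (made-up) representation.
struct TaggedIntWriter : PlainTextWriter {
    bool bindInt(long long n) override {
        out += '#';
        out += std::to_string(n);
        out += ' ';
        return true;
    }
};
```
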
*NB:* BindNative types could be extended by specifying the `BindSupport<T>` trait, but then (de)serialization code must be specialized
for each TImpl. For instance, a `QTransmogrifier<QColor,QDataStream>` may be implemented differently from `QTransmogrifier<QColor,QAbstractValue>`. However, the `QTransmogrifier<QColor>`
example shows that meta() can also be used to avoid specialized serialization code that breaks the RW2 requirement. If meta() is deemed
enough, the BindSupport trait and TImpl template parameters can be removed.