Remove UTF-8 encoding requirement
Motivation
Lustre doesn't specify an encoding, yet we currently use `String`s and `&str`s everywhere. As a result, many tests and examples from the official repository fail to compile without prior conversion, because they are not encoded in UTF-8 but in a legacy French ASCII variant.
Solution
We should eventually migrate our uses of `&str` to `&[u8]`, as `logos` v0.13.0 now supports byte slices (which wasn't the case when we started the project). However, we're still blocked by `rowan`, whose internal representation relies on string slices.
Implementation
This shouldn't be too hard to implement. The only places where we may actually care about the textual representation of strings are diagnostics that print identifiers and, maybe later, docstrings.
- Docstrings are not a concern: we'll be the ones specifying them, so we can choose to require UTF-8, at least inside the comment.
- Printing diagnostics isn't an issue either, because identifiers follow the regex `/[a-zA-Z_][a-zA-Z0-9_]*/`, which only matches ASCII and is therefore trivially safe to treat as a string. To make our compiler as lax as possible, it may not be absurd to relax this regex to wider ranges of Unicode and check validity much later, but again, we're making the rules, so we can choose to only allow UTF-8 anyway.
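To illustrate why identifiers are safe to print, here is a minimal sketch using only the standard library. The function names `is_ascii_identifier` and `identifier_text` are hypothetical and not part of the current code base; the check is a hand-written equivalent of the identifier regex above.

```rust
/// Returns true if `bytes` matches `[a-zA-Z_][a-zA-Z0-9_]*`.
/// (Hypothetical helper, hand-written equivalent of the identifier regex.)
fn is_ascii_identifier(bytes: &[u8]) -> bool {
    match bytes.split_first() {
        Some((first, rest)) => {
            (first.is_ascii_alphabetic() || *first == b'_')
                && rest.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'_')
        }
        // The regex requires at least one character.
        None => false,
    }
}

/// Borrows an identifier token as `&str` for use in a diagnostic,
/// or returns `None` if the bytes are not a valid identifier.
fn identifier_text(bytes: &[u8]) -> Option<&str> {
    if is_ascii_identifier(bytes) {
        // ASCII is a subset of UTF-8, so this checked conversion
        // cannot fail for bytes that passed the check above.
        std::str::from_utf8(bytes).ok()
    } else {
        None
    }
}
```

With this sketch, `identifier_text(b"node_1")` yields `Some("node_1")`, while a token containing a non-ASCII byte such as `b"caf\xE9"` yields `None` and never reaches the diagnostic printer as a string.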
If we trust `logos`' regex engine with our lives, we can probably do an unsafe conversion of bytes to `String` (behind a default-enabled feature flag, with a safe baseline implementation otherwise, just in case).
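The feature-gated conversion could look roughly like the sketch below. The feature name `unchecked-utf8` and the function `token_text` are assumptions made for illustration; the sketch also assumes the lexer only hands over bytes it matched against an ASCII-only pattern.

```rust
/// Fast path: skips UTF-8 validation entirely.
/// Compiled only when the (hypothetical) `unchecked-utf8` feature is enabled.
#[cfg(feature = "unchecked-utf8")]
fn token_text(bytes: &[u8]) -> &str {
    // SAFETY: assumes the lexer only produced these bytes from an
    // ASCII-only pattern, and ASCII is always valid UTF-8.
    unsafe { std::str::from_utf8_unchecked(bytes) }
}

/// Safe baseline: validates the bytes and panics on a lexer bug
/// instead of risking undefined behaviour.
#[cfg(not(feature = "unchecked-utf8"))]
fn token_text(bytes: &[u8]) -> &str {
    std::str::from_utf8(bytes).expect("lexer produced a non-UTF-8 identifier token")
}
```

Both variants borrow rather than allocate; callers that need an owned `String` can call `token_text(bytes).to_owned()`. Making the feature default-enabled would be a matter of listing it under `default` in the `[features]` table of `Cargo.toml`.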