cjdb / lingua
A Rust compiler implemented using modern C++.
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
String literals are delimited by ", and string literals containing newline characters that are not prefixed by \ are ill-formed.
Describe the solution you'd like
A diagnostic that informs the user their string literal is not terminated.
Example
print!("hello
);
Diagnostic
- program ill-formed from 1:8 to 1:13:
unterminated string literal `"hello`
Example
let x = "abc\
def\
ghi";
let y = "jkl
mno";
Diagnostic
- program ill-formed from 4:9 to 5:5:
unterminated string literal `"jkl`
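The rule above can be sketched as a small scanning function. This is only an illustration, not the project's lexer: the name find_string_end and its signature are hypothetical, and it assumes the caller has already located the opening quote.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of lingua): `source[open]` is assumed to be the opening '"'.
// Returns the index one past the closing quote, or std::nullopt if the literal is unterminated.
std::optional<std::size_t> find_string_end(std::string_view source, std::size_t open)
{
    for (std::size_t i = open + 1; i < source.size(); ++i) {
        if (source[i] == '\\') {
            ++i; // skip the escaped character, including an escaped newline
        }
        else if (source[i] == '"') {
            return i + 1; // properly terminated
        }
        else if (source[i] == '\n') {
            return std::nullopt; // bare newline inside the literal
        }
    }
    return std::nullopt; // ran out of input before a closing quote
}
```

A caller would turn a std::nullopt result into the unterminated string literal diagnostic, using the opening quote's position as the start of the reported range.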
Is your feature request related to a problem? Please describe.
Floating-point literals may not have more than one radix point in Rust.
Describe the solution you'd like
A diagnostic to notify the user when floating-point literals have multiple radix points.
Example
1.2.3
Diagnostic
- program ill-formed from 1:1 to 1:6:
floating-point literal `1.2.3` has 2 radix points: it must have at most one.
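A minimal sketch of the check, assuming the lexer has already isolated the literal's spelling. The function names are hypothetical and are not taken from the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts the radix points in a literal's spelling.
std::ptrdiff_t count_radix_points(std::string_view literal)
{
    return std::count(literal.begin(), literal.end(), '.');
}

// A floating-point literal must have at most one radix point.
bool has_multiple_radix_points(std::string_view literal)
{
    return count_radix_points(literal) > 1;
}
```

The count is kept separate from the predicate so that the diagnostic message can report how many radix points were found.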
Is your feature request related to a problem? Please describe.
Rust is a language that supports UTF-8, which is what char8_t is designed to accommodate. There's now compiler support for char8_t and support in {fmt}.
Describe the solution you'd like
Change all occurrences of char to char8_t, string to u8string, and string_view to u8string_view.
Is your feature request related to a problem? Please describe.
Rust supports ASCII escapes, byte escapes, Unicode escapes, and quote escapes. As these escapes are well-defined, there needs to be a diagnostic for escapes that are not recognised by the compiler.
Describe the solution you'd like
There should be three ill-formed diagnostics:
* An ill-formed ASCII escape is an ill-formed byte escape that also includes character codes beyond 0x7F (e.g. b"\xFF" is a byte escape, but not an ASCII escape).
Each diagnostic should identify exactly one unknown escape, even if there are multiple unknown escapes in the same string. This will ultimately require the compiler to report multiple diagnostics if a string has multiple unknown escapes.
Example
String:
"hello, \world!"
Diagnostic:
- program ill-formed from 1:1 to 1:17:
unrecognised ASCII escape '\w' in string literal "hello, \world!"
^~
Example
String:
b"hell\o, \world!"
Diagnostic:
- program ill-formed from 1:1 to 1:18:
unrecognised byte escape '\o' in string literal "hell\o, \world!"
^~
- program ill-formed from 1:1 to 1:18:
unrecognised byte escape '\w' in string literal "hell\o, \world!"
^~
Additional context
Quote escapes don't have any unknown sequences.
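The "one diagnostic per unknown escape" requirement could be approached by first collecting the position of every unknown escape. The sketch below is an assumption-laden illustration: find_unknown_escapes is a hypothetical name, it only inspects the character immediately after the backslash, and it does not validate the digits of \x escapes or the braces of \u escapes.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns the index of the backslash of every escape whose
// introducing character is not recognised. A backslash followed by a newline is
// treated as a line continuation rather than an unknown escape.
std::vector<std::size_t> find_unknown_escapes(std::string_view literal)
{
    constexpr std::string_view known = "nrt\\0x'\"u\n";
    auto result = std::vector<std::size_t>{};
    for (std::size_t i = 0; i + 1 < literal.size(); ++i) {
        if (literal[i] != '\\') {
            continue;
        }
        if (known.find(literal[i + 1]) == std::string_view::npos) {
            result.push_back(i);
        }
        ++i; // skip the escaped character so that `\\w` is not misread as `\w`
    }
    return result;
}
```

Each returned index would then be reported as its own diagnostic, which matches the two-diagnostic output shown for b"hell\o, \world!".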
Is your feature request related to a problem? Please describe.
Rust tokens are what comprise the Rust lexicon. Anything that is not recognised as such should be diagnosed as an unknown token.
Describe the solution you'd like
A type that diagnoses unknown tokens. If the token has a non-ASCII character in its spelling, this should be added to the diagnosis. Note that since ` is an unrecognised token, this diagnostic should not quote unrecognised tokens using this character.
Example
123 `plus` 456;
Diagnostic
- program ill-formed from 1:5 to 1:6:
unrecognised token "`"
^
Describe the bug
The exhaustive Unicode escape test doesn't actually test anything beyond two characters.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Have all checks enabled.
Additional context
TODO
The compiler front-end will need a way to indicate where source mappings are located in a source file. This can be achieved using coordinates to specific points in a source file, and coordinate ranges.
Regular (see source_coordinate() and operator==)
StrictTotallyOrdered (see operator<)
fmt::format("{}", cursor) returns "line:column", where cursor is a source coordinate object, line is an integral value referring to a particular line, and column is an integral value referring to a particular column.
(see operator+)
Although the line and column types should be integral, they should be strongly typed to avoid programmer error.
class [[nodiscard]] source_coordinate {
public:
    /// \brief A strong type alias for the column type
    /// \note explicitly constructible from std::intmax_t
    /// \note explicitly convertible to std::intmax_t
    ///
    using column_type = unspecified ;

    /// \brief A strong type alias for the line type
    /// \note explicitly constructible from std::intmax_t
    /// \note explicitly convertible to std::intmax_t
    ///
    using line_type = unspecified ;

    /// \brief Initialises the object so that it is equivalent to
    ///        source_coordinate{line_type{1}, column_type{1}}.
    ///
    constexpr source_coordinate() = default;

    /// \brief Initialises the column value with the column parameter and initialises the line
    ///        value with the line parameter.
    ///
    constexpr explicit source_coordinate(line_type line, column_type column) noexcept;

    /// \brief Returns the column value.
    ///
    constexpr column_type column() const noexcept;

    /// \brief Returns the line value.
    ///
    constexpr line_type line() const noexcept;

    /// \brief Checks that the column and line values of x are the same as the column and line
    ///        values of y.
    /// \returns x.line() == y.line() and x.column() == y.column()
    ///
    constexpr friend bool operator==(source_coordinate x, source_coordinate y) noexcept;

    /// \brief Checks that the column and line values of x are not the same as the column and line
    ///        values of y.
    /// \returns not (x == y)
    ///
    constexpr friend bool operator!=(source_coordinate x, source_coordinate y) noexcept;

    /// \brief Checks that a source_coordinate is strictly less than another source_coordinate.
    /// \returns true if:
    ///   * `x.line()` is less than `y.line()`
    ///   * `x.line() == y.line()` and `x.column()` is less than `y.column()`
    ///   false otherwise
    ///
    constexpr friend bool operator<(source_coordinate x, source_coordinate y) noexcept;

    /// \brief Checks that a source_coordinate is strictly greater than another source_coordinate.
    /// \returns `y < x`
    ///
    constexpr friend bool operator>(source_coordinate x, source_coordinate y) noexcept;

    /// \brief Checks that a source_coordinate is partially less than another source_coordinate.
    /// \returns `not (y < x)`
    ///
    constexpr friend bool operator<=(source_coordinate x, source_coordinate y) noexcept;

    /// \brief Checks that a source_coordinate is partially greater than another source_coordinate.
    /// \returns `not (x < y)`
    ///
    constexpr friend bool operator>=(source_coordinate x, source_coordinate y) noexcept;

    /// \brief Moves x by:
    ///   1. adding y.line() to x.line(), and
    ///   2. (a) adding y.column() to x.column() if y.line() == 0, or
    ///      (b) assigning y.column() to x.column() otherwise
    /// \returns x
    ///
    constexpr friend source_coordinate
    operator+(source_coordinate x, source_coordinate y) noexcept;
};
Regular
Range (that is: map some of its interface, but not its semantics)
class [[nodiscard]] source_coordinate_range {
public:
    /// \brief Constructs a source_coordinate_range.
    /// \param begin The beginning of the source_coordinate range.
    /// \param end The end of the source_coordinate range.
    /// \note begin and end form the half-open interval [begin, end).
    ///
    explicit constexpr source_coordinate_range(source_coordinate const begin,
                                               source_coordinate const end) noexcept
        [[expects: begin <= end]];

    /// \brief Returns the beginning of the source_coordinate range.
    /// \returns the beginning of the source_coordinate range.
    ///
    constexpr source_coordinate begin() const noexcept;

    /// \brief Returns the end of the source_coordinate range.
    /// \returns the end of the source_coordinate range.
    ///
    constexpr source_coordinate end() const noexcept;

    /// \brief Checks if the range is empty.
    /// \returns true if `begin() == end()`; false otherwise
    ///
    constexpr bool empty() const noexcept;

    /// \brief Checks that two source_coordinate_ranges are equivalent.
    /// \returns true if `x.begin() == y.begin()` and `x.end() == y.end()`; false otherwise.
    ///
    friend constexpr bool
    operator==(source_coordinate_range const& x,
               source_coordinate_range const& y) noexcept;

    /// \brief Checks that two source_coordinate_ranges are not equivalent.
    /// \returns `not (x == y)`
    ///
    friend constexpr bool
    operator!=(source_coordinate_range const& x,
               source_coordinate_range const& y) noexcept;
};
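To make the documented semantics of operator< and operator+ concrete, here is a reduced sketch. It is not the proposed class: coordinate_sketch and raw are hypothetical names, and scoped enumerations stand in for the unspecified strong types, since they are explicitly constructible from and explicitly convertible to std::intmax_t.

```cpp
#include <cstdint>

// Assumed stand-ins for the unspecified strong types.
enum class line_type : std::intmax_t {};
enum class column_type : std::intmax_t {};

// Hypothetical, reduced model of source_coordinate; defaults to 1:1 as documented.
struct coordinate_sketch {
    line_type line{1};
    column_type column{1};
};

constexpr std::intmax_t raw(line_type x) noexcept { return static_cast<std::intmax_t>(x); }
constexpr std::intmax_t raw(column_type x) noexcept { return static_cast<std::intmax_t>(x); }

constexpr bool operator==(coordinate_sketch x, coordinate_sketch y) noexcept
{
    return x.line == y.line && x.column == y.column;
}

// Lexicographical ordering: compare lines first, then columns.
constexpr bool operator<(coordinate_sketch x, coordinate_sketch y) noexcept
{
    return raw(x.line) < raw(y.line)
        || (x.line == y.line && raw(x.column) < raw(y.column));
}

// Lines are always added; columns are added only when y does not move to another line,
// otherwise y's column replaces x's column.
constexpr coordinate_sketch operator+(coordinate_sketch x, coordinate_sketch y) noexcept
{
    auto const line = line_type{raw(x.line) + raw(y.line)};
    auto const column = raw(y.line) == 0 ? column_type{raw(x.column) + raw(y.column)}
                                         : y.column;
    return coordinate_sketch{line, column};
}
```

Under this reading, the right-hand operand of operator+ acts as a displacement: moving 2:5 by zero lines and three columns gives 2:8, while moving it by one line and column 4 gives 3:4.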
fmt: https://github.com/fmtlib/fmt
magma: https://en.wikipedia.org/wiki/Magma_(algebra)
Is your feature request related to a problem? Please describe.
From the design specification:
All diagnostics should store a range of coordinates to source tokens, and a help message. There will be three diagnostic types: ill-formed, warning, and remark.
Describe the solution you'd like
Since the underlying implementation of one diagnostic is the same as another, it makes sense to have a common base for every diagnostic, to:
The base class will rely on a diagnostic level type, so it makes sense to couple it in #3.
Diagnostics are intended to be privately derived from the diagnostic base, although the implementation has no way to make this guarantee.
Is your feature request related to a problem? Please describe.
Rust integer and floating-point literals may have a suffix (e.g. 0u8, 42i32 or 1f32). The full list of suffixes includes:
u8, u16, u32, u64, u128, usize
i8, i16, i32, i64, i128, isize
f32, f64
Everything outside of this list of suffixes is an unrecognised suffix.
Describe the solution you'd like
A diagnostic that identifies unrecognised suffixes.
Example
1u8 + 7u7
Diagnostic
- program ill-formed from 1:7 to 1:10:
unrecognised integer suffix `u7` in integer literal `7u7`
^~
Example
14hello_world
Diagnostic
- program ill-formed from 1:3 to 1:14:
unrecognised integer suffix `hello_world` in integer literal `14hello_world`
^~~~~~~~~~~
Example
42abac
Diagnostic
- program ill-formed from 1:3 to 1:7:
unrecognised integer suffix `abac` in *decimal* integer literal `42abac`
^~~~
Did you mean the *hexadecimal* integer literal `0x42abac`?
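A rough sketch of how a suffix might be split off and checked against the list above. The names recognised_suffixes, is_recognised_suffix, and suffix_of are hypothetical, and suffix_of only handles decimal integer literals (it treats digits and underscores as the numeric part); floating-point and prefixed literals would need more care.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// The suffixes listed in the issue above.
constexpr std::array<std::string_view, 14> recognised_suffixes{
    "u8", "u16", "u32", "u64", "u128", "usize",
    "i8", "i16", "i32", "i64", "i128", "isize",
    "f32", "f64",
};

// Hypothetical helper: checks a suffix against the list.
bool is_recognised_suffix(std::string_view suffix)
{
    return std::find(recognised_suffixes.begin(), recognised_suffixes.end(), suffix)
        != recognised_suffixes.end();
}

// Hypothetical helper: returns everything after the leading decimal digits
// (and underscores) of a literal; the result is empty if there is no suffix.
std::string_view suffix_of(std::string_view literal)
{
    auto const first = literal.find_first_not_of("0123456789_");
    return first == std::string_view::npos ? std::string_view{} : literal.substr(first);
}
```

A non-empty result from suffix_of that fails is_recognised_suffix would trigger the diagnostic; the "did you mean" hint for 42abac would be a further check that the suffix consists only of hexadecimal digits.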
Is your feature request related to a problem? Please describe.
Multi-line comments have a start (/*) and an end (*/), and unlike C++, can nest. That means that /*/**/ is an unterminated Rust comment.
Describe the solution you'd like
A diagnostic that identifies unterminated comments.
Example
/* hello
glorious
world
Diagnostic
- program ill-formed from 1:1 to 3:8:
unterminated multi-line comment starting with:
/* hello
Example
/*/**/
Diagnostic
- program ill-formed from 1:1 to 1:7:
unterminated multi-line comment starting with:
/*/**/
[note: Rust comments nest]
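Because comments nest, detecting termination amounts to tracking a depth counter. The following is a minimal sketch under the assumption that the input begins at the opening /* of the comment; unterminated_comment_depth is a hypothetical name.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: `source` is assumed to begin with "/*".
// Returns how many comment levels are still open at the end of the input;
// 0 means the outermost comment was terminated.
std::size_t unterminated_comment_depth(std::string_view source)
{
    std::size_t depth = 0;
    for (std::size_t i = 0; i + 1 < source.size(); ++i) {
        if (source[i] == '/' && source[i + 1] == '*') {
            ++depth;
            ++i; // consume both characters so that "/*/" does not also close
        }
        else if (depth > 0 && source[i] == '*' && source[i + 1] == '/') {
            --depth;
            ++i;
            if (depth == 0) {
                return 0; // the outermost comment has been closed
            }
        }
    }
    return depth;
}
```

A result greater than one could be used to decide when to attach the "Rust comments nest" note.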
Is your feature request related to a problem? Please describe.
The diagnostics are all independent types, but the lexer will need to return them via a common type.
Describe the solution you'd like
There should be a variant that has exactly the following alternative types:
float_exponent_missing_digits
float_multiple_radix_points
invalid_identifier
unknown_digit_binary
unknown_digit_octal
unknown_escape_ascii
unknown_escape_byte
unknown_escape_unicode
unknown_token
unrecognised_literal_suffix
unterminated_comment
unterminated_string_literal
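One way to express this, assuming std::variant is the intended vocabulary type, is shown below. The empty structs are placeholders for the real diagnostic types, and lexer_diagnostic and is_unterminated are hypothetical names used only for illustration.

```cpp
#include <variant>

// Placeholder definitions; the real diagnostics would carry a coordinate range and a message.
struct float_exponent_missing_digits {};
struct float_multiple_radix_points {};
struct invalid_identifier {};
struct unknown_digit_binary {};
struct unknown_digit_octal {};
struct unknown_escape_ascii {};
struct unknown_escape_byte {};
struct unknown_escape_unicode {};
struct unknown_token {};
struct unrecognised_literal_suffix {};
struct unterminated_comment {};
struct unterminated_string_literal {};

// Hypothetical alias for the common type returned by the lexer.
using lexer_diagnostic = std::variant<
    float_exponent_missing_digits, float_multiple_radix_points, invalid_identifier,
    unknown_digit_binary, unknown_digit_octal, unknown_escape_ascii, unknown_escape_byte,
    unknown_escape_unicode, unknown_token, unrecognised_literal_suffix,
    unterminated_comment, unterminated_string_literal>;

static_assert(std::variant_size_v<lexer_diagnostic> == 12, "exactly the listed alternatives");

// Example query: does the diagnostic report something left unterminated?
bool is_unterminated(lexer_diagnostic const& diagnostic)
{
    return std::holds_alternative<unterminated_comment>(diagnostic)
        || std::holds_alternative<unterminated_string_literal>(diagnostic);
}
```

The static_assert guards the "exactly" part of the requirement: adding or removing an alternative without updating the count fails to compile.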
Is your feature request related to a problem? Please describe.
The exponent of a floating-point literal is required to be followed by digits. Any floating-point literal that ends with e or E is ill-formed.
Describe the solution you'd like
A diagnostic that identifies floating-point literals ending with e or E.
Example
1.23e + 1E
Diagnostic
- program ill-formed from 1:1 to 1:6:
floating-point literal's exponent does not have digits in `1.23e`
~~~~^
- program ill-formed from 1:8 to 1:10:
floating-point literal's exponent does not have digits in `1E`
~^
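The check itself is small. The sketch below assumes the caller already knows the spelling belongs to a decimal numeric literal (a hexadecimal literal such as 0x1E legitimately ends with E); exponent_missing_digits is a hypothetical name.

```cpp
#include <string_view>

// Hypothetical helper: true if a decimal numeric literal ends with a bare exponent marker.
bool exponent_missing_digits(std::string_view literal)
{
    return !literal.empty() && (literal.back() == 'e' || literal.back() == 'E');
}
```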
Is your feature request related to a problem? Please describe.
A Rust raw identifier must not be any of crate, extern, self, super, or Self.
Describe the solution you'd like
Implement a diagnostic that identifies invalid identifiers.
Example
let r#crate = r#extern;
Diagnostic
- program ill-formed from 1:5 to 1:12:
`crate` is not allowed as a raw identifier.
- program ill-formed from 1:15 to 1:26:
`extern` is not allowed as a raw identifier.
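A sketch of the check, using the list of names given in this issue. is_invalid_raw_identifier is a hypothetical name, and the function assumes it is handed a single token's spelling.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical helper: true if `token` is a raw identifier (r#name) whose name
// is one of the keywords this issue lists as not allowed.
bool is_invalid_raw_identifier(std::string_view token)
{
    constexpr std::string_view prefix = "r#";
    if (token.substr(0, prefix.size()) != prefix) {
        return false; // not a raw identifier at all
    }

    constexpr std::array<std::string_view, 5> forbidden{
        "crate", "extern", "self", "super", "Self",
    };
    auto const name = token.substr(prefix.size());
    return std::find(forbidden.begin(), forbidden.end(), name) != forbidden.end();
}
```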
Is your feature request related to a problem? Please describe.
A diagnostic is a form of communication between the compiler and a programmer. Helpful diagnostics can make suggestions, and linking to the documentation for specific issues would reduce the time spent hacking, so that the problem can be solved faster.
Describe the solution you'd like
Diagnostics should output links to documentation whenever possible.
This will remain an active issue until all diagnostics are considered 'smart diagnostics'.
Is your feature request related to a problem? Please describe.
Rust supports binary, octal, decimal, and hexadecimal based integers. Integers represented using binary should only permit 0s and 1s and octal integers should only permit digits in the range [0, 7].
It is necessary for a Rust lexer to diagnose any digits that aren't a part of the number system being used.
Describe the solution you'd like
There should be two diagnostics:
Each diagnostic should identify exactly one invalid digit, even if there are multiple invalid digits in the string. Only decimal digits outside the respective ranges should be diagnosable.
Example
String:
0b12
Diagnostic:
- program ill-formed from 1:1 to 1:5:
unrecognised digit `2` in binary integer literal `0b12`
^
Example
String:
0o189
Diagnostic:
- program ill-formed from 1:1 to 1:6:
unrecognised digit `8` in octal integer literal `0o189`
^
- program ill-formed from 1:1 to 1:6:
unrecognised digit `9` in octal integer literal `0o189`
^
Additional context
As non-decimal integers are prefixed with 0[box], decimal integers do not have 'unknown digits'. An integer such as 123f00d will be diagnosed as an invalid suffix instead.
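The "one diagnostic per invalid digit" behaviour could start from a helper that lists every offending position. This is a sketch only: find_invalid_digits is a hypothetical name, and it assumes the literal carries a two-character prefix such as 0b or 0o and that the caller supplies the matching base.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns the index of every decimal digit in `literal`
// that is not valid in `base`. The first two characters (e.g. "0b" or "0o")
// are skipped, and non-digit characters are ignored.
std::vector<std::size_t> find_invalid_digits(std::string_view literal, int base)
{
    auto result = std::vector<std::size_t>{};
    for (std::size_t i = 2; i < literal.size(); ++i) {
        char const c = literal[i];
        if (c >= '0' && c <= '9' && (c - '0') >= base) {
            result.push_back(i);
        }
    }
    return result;
}
```

Only decimal digits are considered, in line with the requirement that letters fall through to the invalid suffix diagnostic instead.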