
lingua's People

Contributors

cjdb

lingua's Issues

Add unterminated string literal diagnostic

Is your feature request related to a problem? Please describe.
String literals are delimited by ", and strings containing newline characters that are not prefixed by \ are ill-formed.

Describe the solution you'd like
A diagnostic that informs the user their string literal is not terminated.

Example

print!("hello
);

Diagnostic

- program ill-formed from 1:8 to 1:13:
      unterminated string literal `"hello`

Example

let x = "abc\
def\
ghi";
let y = "jkl
mno";

Diagnostic

- program ill-formed from 4:9 to 5:5:
      unterminated string literal `"jkl`
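A minimal sketch of the rule above. It assumes the scanner is handed the source text starting at the opening quote, and that a backslash escapes whatever character follows it, so a backslash before a newline continues the literal (as in the `let x` example). The function name is hypothetical and not part of lingua.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: given source text that begins with the opening '"' of a
// string literal, returns true if a closing '"' appears before an unescaped
// newline or the end of input.
constexpr bool is_terminated_string_literal(std::string_view const source) noexcept
{
   for (std::size_t i = 1; i < source.size(); ++i) { // i = 1 skips the opening '"'
      switch (source[i]) {
      case '\\':
         ++i; // skip the escaped character, which may be a newline
         break;
      case '\n':
         return false; // a raw newline: the literal is unterminated
      case '"':
         return true; // closing quote found
      default:
         break;
      }
   }
   return false; // end of input reached without a closing quote
}

// The literals from the examples above.
static_assert(is_terminated_string_literal("\"abc\\\ndef\\\nghi\";"));
static_assert(!is_terminated_string_literal("\"jkl\nmno\";"));
static_assert(!is_terminated_string_literal("\"hello\n);"));
```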

Add floating-point literal has multiple radix points diagnostic

Is your feature request related to a problem? Please describe.
Floating-point literals may not have more than one radix point in Rust.

Describe the solution you'd like
A diagnostic to notify the user when floating-point literals have multiple radix points.

Example

1.2.3

Diagnostic

- program ill-formed from 1:1 to 1:6:
      floating-point literal `1.2.3` has 2 radix points: it must have at most one.
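The check itself amounts to counting '.' characters. The sketch below assumes the lexer has already isolated the literal's spelling; both function names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts the radix points ('.') in a literal's spelling.
constexpr std::size_t count_radix_points(std::string_view const spelling) noexcept
{
   std::size_t count = 0;
   for (char const c : spelling) {
      if (c == '.') {
         ++count;
      }
   }
   return count;
}

// A floating-point literal is ill-formed when it has more than one radix point.
constexpr bool has_multiple_radix_points(std::string_view const spelling) noexcept
{
   return count_radix_points(spelling) > 1;
}
```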

Move from char to char8_t

Is your feature request related to a problem? Please describe.
Rust source code is UTF-8, which is what char8_t is designed to accommodate. char8_t is now supported by compilers and by {fmt}.

Describe the solution you'd like
Change all occurrences of char to char8_t, string to u8string, and string_view to u8string_view.

Create an unrecognised escape sequence diagnostic

Is your feature request related to a problem? Please describe.
Rust supports ASCII escapes, byte escapes, Unicode escapes, and quote escapes. As these escapes are well-defined, there needs to be a diagnostic for escapes that are not recognised by the compiler.

Describe the solution you'd like

There should be three ill-formed diagnostics:

  1. unknown ASCII escape*
  2. unknown byte escape*
  3. unknown Unicode escape

*The ill-formed ASCII escapes are the ill-formed byte escapes plus any escape with a character code beyond 0x7F. (E.g. \xFF is a valid byte escape in b"\xFF", but not a valid ASCII escape.)

Each diagnostic should identify exactly one unknown escape, even if there are multiple unknown escapes in the same string. This will ultimately require the compiler to report multiple diagnostics if a string has multiple unknown escapes.

Example

String:

"hello, \world!"

Diagnostic:

- program ill-formed from 1:1 to 1:17:
      unrecognised ASCII escape '\w' in string literal "hello, \world!"
                                                               ^~

Example

String:

b"hell\o, \world!"

Diagnostic:

- program ill-formed from 1:1 to 1:18:
      unrecognised byte escape '\o' in string literal "hell\o, \world!"
                                                           ^~
- program ill-formed from 1:1 to 1:18:
      unrecognised byte escape '\w' in string literal "hell\o, \world!"
                                                               ^~

Additional context
Quote escapes don't have any unknown sequences.
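A sketch of the three checks, assuming each escape's spelling (from the backslash onwards) has already been isolated. The function names are hypothetical. Quote escapes (\' and \") are left out, since they have no unknown sequences, and the underscores Rust permits inside \u{...} are not handled.

```cpp
#include <string_view>

constexpr bool is_hex_digit(char const c) noexcept
{
   return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Value of a hex digit; (c | 0x20) lower-cases 'A'-'F'.
constexpr int hex_value(char const c) noexcept
{
   return c <= '9' ? c - '0' : (c | 0x20) - 'a' + 10;
}

// \n, \r, \t, \\ and \0 are valid in both string and byte string literals.
constexpr bool is_simple_escape(std::string_view const e) noexcept
{
   return e.size() == 2 && e[0] == '\\'
       && (e[1] == 'n' || e[1] == 'r' || e[1] == 't' || e[1] == '\\' || e[1] == '0');
}

// Byte escape: a simple escape, or \xHH with any two hex digits (0x00-0xFF).
constexpr bool is_known_byte_escape(std::string_view const e) noexcept
{
   return is_simple_escape(e)
       || (e.size() == 4 && e[0] == '\\' && e[1] == 'x' && is_hex_digit(e[2]) && is_hex_digit(e[3]));
}

// ASCII escape: like a byte escape, but \xHH is limited to 0x00-0x7F.
constexpr bool is_known_ascii_escape(std::string_view const e) noexcept
{
   return is_known_byte_escape(e) && !(e.size() == 4 && hex_value(e[2]) > 7);
}

// Unicode escape: \u{...} with 1 to 6 hex digits naming a Unicode scalar value
// (at most 0x10FFFF and not a surrogate).
constexpr bool is_known_unicode_escape(std::string_view const e) noexcept
{
   if (e.size() < 5 || e.size() > 10 || e.substr(0, 3) != std::string_view{"\\u{"} || e.back() != '}') {
      return false;
   }
   long value = 0;
   for (char const c : e.substr(3, e.size() - 4)) {
      if (!is_hex_digit(c)) {
         return false;
      }
      value = value * 16 + hex_value(c);
   }
   return value <= 0x10FFFF && !(value >= 0xD800 && value <= 0xDFFF);
}
```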

Create unknown token diagnostic

Is your feature request related to a problem? Please describe.
Rust tokens are what comprise the Rust lexicon. Anything that is not recognised as such should be diagnosed as an unknown token.

Describe the solution you'd like
A type that diagnoses unknown tokens. If the token's spelling contains a non-ASCII character, the diagnostic should say so. Since the backtick (`) is itself an unrecognised token, the diagnostic should not use it to quote unrecognised tokens.

Example

123 `plus` 456;

Diagnostic

- program ill-formed from 1:5 to 1:6:
      unrecognised token "`"
                          ^
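A minimal sketch of such a type, assuming it stores only the token's spelling; the class name and the wording of the non-ASCII note are hypothetical.

```cpp
#include <string>
#include <string_view>

// Hypothetical unknown-token diagnostic. The message quotes the spelling
// with '"' rather than '`', because '`' is itself an unknown token.
class unknown_token {
public:
   constexpr explicit unknown_token(std::string_view const spelling) noexcept
      : spelling_(spelling)
   {}

   constexpr std::string_view spelling() const noexcept { return spelling_; }

   // True if any byte of the spelling lies outside 7-bit ASCII.
   constexpr bool has_non_ascii() const noexcept
   {
      for (char const c : spelling_) {
         if (static_cast<unsigned char>(c) > 0x7F) {
            return true;
         }
      }
      return false;
   }

   std::string message() const
   {
      std::string result = "unrecognised token \"";
      result.append(spelling_.data(), spelling_.size());
      result += '"';
      if (has_non_ascii()) {
         result += " (contains a non-ASCII character)";
      }
      return result;
   }

private:
   std::string_view spelling_;
};
```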

[BUG] is_escape.cpp doesn't test Unicode escapes

Describe the bug
The exhaustive Unicode escape test doesn't actually test anything beyond two characters.

To Reproduce
Steps to reproduce the behavior:

  1. Run the test. Notice that the run-time is much shorter than expected.
  2. Comment out all but the last exhaustive test; notice the run-time is much longer.

Expected behavior
All checks should be enabled and run.

Additional context
TODO

Add source coordinate types

Is your feature request related to a problem? Please describe.

The compiler front-end will need a way to indicate where source mappings are located in a source file. This can be achieved using coordinates to specific points in a source file, and coordinate ranges.

Describe the solution you'd like

A source coordinate should

  • store a single source line and column
  • model Regular (see source_coordinate() and operator==)
  • model StrictTotallyOrdered (see operator<)
  • interface with {fmt} so that fmt::format("{}", cursor) returns "line:column", where cursor is a source coordinate object, line is an integral value referring to a particular line, and column is an integral value referring to a particular column.
  • model a magma (see operator+)

Although the line and column types should be integral, they should be strongly typed to avoid programmer error.

class [[nodiscard]] source_coordinate {
public:
   /// \brief A strong type alias for the column type
   /// \note explicitly constructible from std::intmax_t
   /// \note explicitly convertible to std::intmax_t
   ///
   using column_type = unspecified ;

   /// \brief A strong type alias for the line type
   /// \note explicitly constructible from std::intmax_t
   /// \note explicitly convertible to std::intmax_t
   ///
   using line_type = unspecified ;

   /// \brief Initialises the object so that it is equivalent to
   ///        source_coordinate{line_type{1}, column_type{1}}.
   ///
   constexpr source_coordinate() = default;

   /// \brief Initialises the column value with the column parameter and initialises the line
   ///        value with the line parameter.
   ///
   constexpr explicit source_coordinate(line_type line, column_type column) noexcept;

   /// \brief Returns the column value.
   ///
   constexpr column_type column() const noexcept;

   /// \brief Returns the line value.
   ///
   constexpr line_type line() const noexcept;

   /// \brief Checks that the column and line values of x are the same as the column and line
   ///        values of y.
   /// \returns x.line() == y.line() and x.column() == y.column()
   ///
   constexpr friend bool operator==(source_coordinate x, source_coordinate y) noexcept;

   /// \brief Checks that the column and line values of x are not the same as the column and line
   ///        values of y.
   /// \returns not (x == y)
   ///
   constexpr friend bool operator!=(source_coordinate x, source_coordinate y) noexcept;

   /// \brief Checks that a source_coordinate is strictly less than another source_coordinate.
   /// \returns true if:
   ///     * `x.line()` is less than `y.line()`
   ///     * `x.line() == y.line()` and `x.column()` is less than `y.column()`
   ///  false otherwise
   ///
   constexpr friend bool operator<(source_coordinate x, source_coordinate y) noexcept;

   /// \brief Checks that a source_coordinate is strictly greater than another source_coordinate.
   /// \returns `y < x`
   ///
   constexpr friend bool operator>(source_coordinate x, source_coordinate y) noexcept;

   /// \brief Checks that a source_coordinate is less than or equal to another source_coordinate.
   /// \returns `not (y < x)`
   ///
   constexpr friend bool operator<=(source_coordinate x, source_coordinate y) noexcept;

   /// \brief Checks that a source_coordinate is greater than or equal to another source_coordinate.
   /// \returns `not (x < y)`
   ///
   constexpr friend bool operator>=(source_coordinate x, source_coordinate y) noexcept;

   /// \brief Moves x by:
   ///       1. adding y.line() to x.line(), and
   ///       2. (a) adding y.column() to x.column() if y.line() == 0, or
   ///          (b) assigning y.column() to x.column() otherwise
   /// \returns x
   ///
   constexpr friend source_coordinate
   operator+(source_coordinate x, source_coordinate y) noexcept;
};
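One way the interface above might be implemented, as a sketch rather than lingua's actual code. Scoped enumerations with std::intmax_t as the underlying type stand in for the unspecified strong types: since C++17 they can be created with line_type{1} and converted back with static_cast, and a line cannot be passed where a column is expected. The {fmt} formatter is omitted to keep the block self-contained.

```cpp
#include <cstdint>

class source_coordinate {
public:
   enum class column_type : std::intmax_t {};
   enum class line_type : std::intmax_t {};

   constexpr source_coordinate() = default; // 1:1, via the member initialisers

   constexpr explicit source_coordinate(line_type const line, column_type const column) noexcept
      : line_{line}
      , column_{column}
   {}

   constexpr column_type column() const noexcept { return column_; }
   constexpr line_type line() const noexcept { return line_; }

   constexpr friend bool operator==(source_coordinate const x, source_coordinate const y) noexcept
   {
      return x.line_ == y.line_ && x.column_ == y.column_;
   }

   constexpr friend bool operator!=(source_coordinate const x, source_coordinate const y) noexcept
   {
      return !(x == y);
   }

   // Lexicographic order: line first, then column.
   constexpr friend bool operator<(source_coordinate const x, source_coordinate const y) noexcept
   {
      return x.line_ < y.line_ || (x.line_ == y.line_ && x.column_ < y.column_);
   }

   constexpr friend bool operator>(source_coordinate const x, source_coordinate const y) noexcept
   {
      return y < x;
   }

   constexpr friend bool operator<=(source_coordinate const x, source_coordinate const y) noexcept
   {
      return !(y < x);
   }

   constexpr friend bool operator>=(source_coordinate const x, source_coordinate const y) noexcept
   {
      return !(x < y);
   }

   // Lines are added; columns are added only if y stays on the same line
   // (y.line() == 0), otherwise y's column replaces x's.
   constexpr friend source_coordinate
   operator+(source_coordinate const x, source_coordinate const y) noexcept
   {
      std::intmax_t const new_line = as_int(x.line_) + as_int(y.line_);
      std::intmax_t const new_column =
         as_int(y.line_) == 0 ? as_int(x.column_) + as_int(y.column_) : as_int(y.column_);
      return source_coordinate{line_type{new_line}, column_type{new_column}};
   }

private:
   line_type line_{1};
   column_type column_{1};

   template<typename Enum>
   static constexpr std::intmax_t as_int(Enum const value) noexcept
   {
      return static_cast<std::intmax_t>(value);
   }
};
```

For example, (1:5) + (0:3) stays on line 1 and advances the column, giving (1:8), whereas (1:5) + (2:4) moves down two lines and resets the column, giving (3:4).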

A source coordinate range should

  • store two source coordinates
    1. a coordinate to the first mapped character
    2. a coordinate past-the-end of the last mapped character
  • model Regular
  • weakly model Range (that is: mirror some of its interface, but not its semantics)

class [[nodiscard]] source_coordinate_range {
   public:
      /// \brief Constructs a source_coordinate_range.
      /// \param begin The beginning of the source_coordinate range.
      /// \param end The end of the source_coordinate range.
      /// \note begin and end form the half-open interval [begin, end).
      ///
      explicit constexpr source_coordinate_range(source_coordinate const begin,
         source_coordinate const end) noexcept
      [[expects: begin <= end]];

      /// \brief Returns the beginning of the source_coordinate range.
      /// \returns the beginning of the source_coordinate range.
      ///
      constexpr source_coordinate begin() const noexcept;

      /// \brief Returns the end of the source_coordinate range.
      /// \returns the end of the source_coordinate range.
      ///
      constexpr source_coordinate end() const noexcept;

      /// \brief Checks if the range is empty.
      /// \returns true if `begin() == end()`; false otherwise
      ///
      constexpr bool empty() const noexcept;

      /// \brief Checks that two source_coordinate_ranges are equivalent.
      /// \returns true if `x.begin() == y.begin()` and `x.end() == y.end()`; false otherwise.
      ///
      friend constexpr bool
      operator==(source_coordinate_range const& x,
         source_coordinate_range const& y) noexcept;

      /// \brief Checks that two source_coordinate_ranges are not equivalent.
      /// \returns `not (x == y)`
      ///
      friend constexpr bool
      operator!=(source_coordinate_range const& x,
         source_coordinate_range const& y) noexcept;
};
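A matching sketch for the range type. To keep it self-contained, it uses a simplified stand-in for source_coordinate, and it checks the precondition with assert in place of the [[expects: begin <= end]] contract syntax, which did not make it into C++20.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for source_coordinate, so this sketch compiles on its own.
struct source_coordinate {
   std::intmax_t line = 1;
   std::intmax_t column = 1;

   constexpr friend bool operator==(source_coordinate const x, source_coordinate const y) noexcept
   {
      return x.line == y.line && x.column == y.column;
   }

   constexpr friend bool operator<(source_coordinate const x, source_coordinate const y) noexcept
   {
      return x.line < y.line || (x.line == y.line && x.column < y.column);
   }
};

// The half-open interval [begin, end).
class source_coordinate_range {
public:
   constexpr explicit source_coordinate_range(source_coordinate const begin,
      source_coordinate const end) noexcept
      : begin_{begin}
      , end_{end}
   {
      assert(!(end < begin)); // precondition: begin <= end
   }

   constexpr source_coordinate begin() const noexcept { return begin_; }
   constexpr source_coordinate end() const noexcept { return end_; }
   constexpr bool empty() const noexcept { return begin_ == end_; }

   constexpr friend bool
   operator==(source_coordinate_range const& x, source_coordinate_range const& y) noexcept
   {
      return x.begin_ == y.begin_ && x.end_ == y.end_;
   }

   constexpr friend bool
   operator!=(source_coordinate_range const& x, source_coordinate_range const& y) noexcept
   {
      return !(x == y);
   }

private:
   source_coordinate begin_;
   source_coordinate end_;
};
```

Because begin() and end() return coordinates rather than iterators, a range-based for loop over this type will not compile, which is consistent with it only weakly modelling Range.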

fmt: https://github.com/fmtlib/fmt
magma: https://en.wikipedia.org/wiki/Magma_(algebra)

Create a diagnostic level and diagnostic base

Is your feature request related to a problem? Please describe.
From the design specification:

All diagnostics should store a range of coordinates to source tokens, and a help message. There will be three diagnostic types: ill-formed, warning, and remark.

Describe the solution you'd like

Since the underlying implementation of one diagnostic is the same as another, it makes sense to have a common base for every diagnostic, to:

  1. follow DRY (don't repeat yourself)
  2. take advantage of EBO (the empty base optimisation)

The base class depends on a diagnostic level type, so it makes sense to introduce both together in #3.

Diagnostics are intended to derive privately from the diagnostic base, although the implementation cannot enforce this.
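A sketch of how the level type and the base might fit together. All names here (diagnostic_level, basic_diagnostic, source_span, and the example unterminated_comment) are assumptions, and source_span is a stand-in for source_coordinate_range. It illustrates the DRY and private-derivation points only; the EBO aspect is not shown. Private derivation can still be checked for an individual type, as the final static_assert does.

```cpp
#include <string_view>
#include <type_traits>

// The three diagnostic types from the design specification.
enum class diagnostic_level { ill_formed, warning, remark };

// Hypothetical stand-in for source_coordinate_range.
struct source_span {
   int begin_line, begin_column, end_line, end_column;
};

// Common base: the level, the source range, and a help message.
class basic_diagnostic {
public:
   constexpr basic_diagnostic(diagnostic_level const level, source_span const where,
      std::string_view const help) noexcept
      : level_{level}, where_{where}, help_{help}
   {}

   constexpr diagnostic_level level() const noexcept { return level_; }
   constexpr source_span where() const noexcept { return where_; }
   constexpr std::string_view help() const noexcept { return help_; }

private:
   diagnostic_level level_;
   source_span where_;
   std::string_view help_;
};

// A concrete diagnostic derives privately and re-exports the accessors.
class unterminated_comment : private basic_diagnostic {
public:
   constexpr explicit unterminated_comment(source_span const where) noexcept
      : basic_diagnostic{diagnostic_level::ill_formed, where, "unterminated multi-line comment"}
   {}

   using basic_diagnostic::level;
   using basic_diagnostic::where;
   using basic_diagnostic::help;
};

// Private derivation: outside code cannot treat the diagnostic as its base.
static_assert(!std::is_convertible_v<unterminated_comment*, basic_diagnostic*>);
```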

Create unrecognised literal suffix diagnostic

Is your feature request related to a problem? Please describe.
Rust integer and floating-point literals may have a suffix (e.g. 0u8, 42i32 or 1f32). The full list of suffixes is:

  • u8, u16, u32, u64, u128, usize
  • i8, i16, i32, i64, i128, isize
  • f32, f64

Everything outside of this list of suffixes is an unrecognised suffix.

Describe the solution you'd like
A diagnostic that identifies unrecognised suffixes.

Example

1u8 + 7u7

Diagnostic

- program ill-formed from 1:7 to 1:10:
      unrecognised integer suffix `u7` in integer literal `7u7`
                                                            ^~

Example

14hello_world

Diagnostic

- program ill-formed from 1:3 to 1:14:
      unrecognised integer suffix `hello_world` in integer literal `14hello_world`
                                                                      ^~~~~~~~~~~

Example

42abac

Diagnostic

- program ill-formed from 1:3 to 1:7:
      unrecognised integer suffix `abac` in *decimal* integer literal `42abac`
                                                                         ^~~~
      Did you mean the *hexadecimal* integer literal `0x42abac`?
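A sketch of the suffix check, assuming the literal's digits can be separated from its suffix. The helper for decimal integer literals simply treats everything after the leading digits and underscores as the suffix; floating-point literals and other bases are out of scope. All names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// The suffixes listed above; anything else is unrecognised.
constexpr std::array<std::string_view, 14> known_suffixes = {
   "u8", "u16", "u32", "u64", "u128", "usize",
   "i8", "i16", "i32", "i64", "i128", "isize",
   "f32", "f64",
};

// An empty suffix means the literal has none, which is always allowed.
constexpr bool is_recognised_suffix(std::string_view const suffix) noexcept
{
   if (suffix.empty()) {
      return true;
   }
   for (std::string_view const known : known_suffixes) {
      if (suffix == known) {
         return true;
      }
   }
   return false;
}

// Decimal integer literals only: the suffix starts at the first character
// that is neither a decimal digit nor '_'.
constexpr std::string_view decimal_suffix(std::string_view const literal) noexcept
{
   std::size_t i = 0;
   while (i < literal.size() && ((literal[i] >= '0' && literal[i] <= '9') || literal[i] == '_')) {
      ++i;
   }
   return literal.substr(i);
}
```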

Add unterminated comment diagnostic

Is your feature request related to a problem? Please describe.
Multi-line comments have a start (/*) and an end (*/), and unlike C++, can nest. That means that /*/**/ is an unterminated Rust comment.

Describe the solution you'd like
A diagnostic that identifies unterminated comments.

Example

/* hello
   glorious
   world

Diagnostic

- program ill-formed from 1:1 to 3:8:
      unterminated multi-line comment starting with:
            /* hello

Example

/*/**/

Diagnostic

- program ill-formed from 1:1 to 1:7:
      unterminated multi-line comment starting with:
            /*/**/
      [note: Rust comments nest]
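Because comments nest, the lexer has to track a nesting depth rather than stop at the first */. A minimal sketch, assuming it is given text that starts with /*; the function name is hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if the comment starting at text[0] ("/*")
// is closed. Each "/*" increases the depth and each "*/" decreases it; the
// comment ends only when the depth returns to zero.
constexpr bool is_terminated_comment(std::string_view const text) noexcept
{
   std::size_t depth = 0;
   std::size_t i = 0;
   while (i + 1 < text.size()) {
      if (text[i] == '/' && text[i + 1] == '*') {
         ++depth;
         i += 2; // consume both characters so "/*/" is not also read as "*/"
      }
      else if (text[i] == '*' && text[i + 1] == '/') {
         --depth;
         i += 2;
         if (depth == 0) {
            return true;
         }
      }
      else {
         ++i;
      }
   }
   return false;
}
```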

Implement lexical analysis diagnostic type

Is your feature request related to a problem? Please describe.
The diagnostics are all independent types, but the lexer will need to return them via a common type.

Describe the solution you'd like
There should be a variant that has exactly the following alternative types:

  • float_exponent_missing_digits
  • float_multiple_radix_points
  • invalid_identifier
  • unknown_digit_binary
  • unknown_digit_octal
  • unknown_escape_ascii
  • unknown_escape_byte
  • unknown_escape_unicode
  • unknown_token
  • unrecognised_literal_suffix
  • unterminated_comment
  • unterminated_string_literal
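A sketch of that common type using std::variant. The alternatives are empty placeholder structs here; the real diagnostic types would carry source ranges and messages. The alias name lex_diagnostic and the helper are hypothetical.

```cpp
#include <variant>

// Placeholder alternatives, one per diagnostic listed above.
struct float_exponent_missing_digits {};
struct float_multiple_radix_points {};
struct invalid_identifier {};
struct unknown_digit_binary {};
struct unknown_digit_octal {};
struct unknown_escape_ascii {};
struct unknown_escape_byte {};
struct unknown_escape_unicode {};
struct unknown_token {};
struct unrecognised_literal_suffix {};
struct unterminated_comment {};
struct unterminated_string_literal {};

// Exactly the twelve alternatives listed above.
using lex_diagnostic = std::variant<
   float_exponent_missing_digits,
   float_multiple_radix_points,
   invalid_identifier,
   unknown_digit_binary,
   unknown_digit_octal,
   unknown_escape_ascii,
   unknown_escape_byte,
   unknown_escape_unicode,
   unknown_token,
   unrecognised_literal_suffix,
   unterminated_comment,
   unterminated_string_literal>;

// Example query over the variant.
constexpr bool is_escape_diagnostic(lex_diagnostic const& d) noexcept
{
   return std::holds_alternative<unknown_escape_ascii>(d)
       || std::holds_alternative<unknown_escape_byte>(d)
       || std::holds_alternative<unknown_escape_unicode>(d);
}
```

Callers can then dispatch on the alternative with std::visit or std::holds_alternative.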

Create floating-point number missing exponent diagnostic

Is your feature request related to a problem? Please describe.
The exponent marker (e or E) in a floating-point literal must be followed by digits. Any floating-point literal that ends with e or E is therefore ill-formed.

Describe the solution you'd like
A diagnostic that identifies floating-point literals ending with e or E.

Example

1.23e + 1E

Diagnostic

- program ill-formed from 1:1 to 1:6:
      floating-point literal's exponent does not have digits in `1.23e`
                                                                 ~~~~^
- program ill-formed from 1:8 to 1:10:
      floating-point literal's exponent does not have digits in `1E`
                                                                 ~^
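A sketch of the check, assuming the literal's suffix (if any) has already been removed. It also accepts an optional + or - after the marker, as Rust's exponent syntax allows, so 1e+ is diagnosed too. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if the literal has an exponent marker (e or E)
// that is not followed by at least one decimal digit.
constexpr bool exponent_missing_digits(std::string_view const literal) noexcept
{
   std::size_t const e = literal.find_first_of("eE");
   if (e == std::string_view::npos) {
      return false; // no exponent, nothing to check
   }
   std::size_t i = e + 1;
   if (i < literal.size() && (literal[i] == '+' || literal[i] == '-')) {
      ++i; // optional sign
   }
   for (; i < literal.size(); ++i) {
      if (literal[i] >= '0' && literal[i] <= '9') {
         return false; // found a digit
      }
   }
   return true;
}
```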

Create invalid identifier diagnostic

Is your feature request related to a problem? Please describe.
A Rust raw identifier must not be any of crate, extern, self, super, or Self.

Describe the solution you'd like
Implement a diagnostic that identifies invalid identifiers.

Example

let r#crate = r#extern;

Diagnostic

- program ill-formed from 1:5 to 1:12:
      `crate` is not allowed as a raw identifier.
- program ill-formed from 1:15 to 1:23:
      `extern` is not allowed as a raw identifier.
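A sketch of the check, with the forbidden names taken from the list above; the function name is hypothetical.

```cpp
#include <array>
#include <string_view>

// Names that may not follow the r# prefix, per the issue above.
constexpr std::array<std::string_view, 5> forbidden_raw_identifiers = {
   "crate", "extern", "self", "super", "Self",
};

// Hypothetical helper: true if `token` is spelled r#name and name is forbidden.
// Tokens without the r# prefix are not raw identifiers and are never rejected.
constexpr bool is_invalid_raw_identifier(std::string_view const token) noexcept
{
   constexpr std::string_view prefix = "r#";
   if (token.substr(0, prefix.size()) != prefix) {
      return false;
   }
   std::string_view const name = token.substr(prefix.size());
   for (std::string_view const forbidden : forbidden_raw_identifiers) {
      if (name == forbidden) {
         return true;
      }
   }
   return false;
}
```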

Smart diagnostics

Is your feature request related to a problem? Please describe.
A diagnostic is a form of communication between the compiler and a programmer. Helpful diagnostics can make suggestions, and it would also be useful if they linked to the relevant documentation for specific issues, so that less time is spent hacking and the problem can be solved faster.

Describe the solution you'd like
Diagnostics should output links to documentation whenever possible.

This will remain an active issue until all diagnostics are considered 'smart diagnostics'.

Create unknown integer digit diagnostic

Is your feature request related to a problem? Please describe.
Rust supports binary, octal, decimal, and hexadecimal integer literals. Binary integer literals may only contain the digits 0 and 1, and octal integer literals may only contain digits in the range [0, 7].

It is necessary for a Rust lexer to diagnose any digits that aren't a part of the number system being used.

Describe the solution you'd like
There should be two diagnostics:

  1. Unknown digit in binary integer literal
  2. Unknown digit in octal integer literal

Each diagnostic should identify exactly one invalid digit, even if there are multiple invalid digits in the string. Only decimal digits outside the respective ranges should be diagnosable.

Example
String:

0b12

Diagnostic:

- program ill-formed from 1:1 to 1:5:
      unrecognised digit `2` in binary integer literal `0b12`
                                                           ^

Example
String:

0o189

Diagnostic:

- program ill-formed from 1:1 to 1:6:
      unrecognised digit `8` in octal integer literal `0o189`
                                                          ^
- program ill-formed from 1:1 to 1:6:
      unrecognised digit `9` in octal integer literal `0o189`
                                                           ^

Additional context
As non-decimal integers are prefixed with 0b, 0o, or 0x, decimal integers do not have 'unknown digits'. An integer such as 123f00d will be diagnosed as an unrecognised suffix instead.
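A sketch that finds one invalid digit at a time, so a caller can emit one diagnostic per digit by resuming the search just past each hit. It assumes '_' may separate digits and that the first other non-digit character starts a suffix. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the zero-based offset of the first decimal
// digit at or after `from` that is invalid for a binary (0b) or octal (0o)
// literal, or npos if there is none. Other literals are never diagnosed.
constexpr std::size_t find_unknown_digit(std::string_view const literal,
   std::size_t const from = 2) noexcept
{
   std::string_view const prefix = literal.substr(0, 2);
   char const max_digit = prefix == "0b" ? '1' : prefix == "0o" ? '7' : '\0';
   if (max_digit == '\0') {
      return std::string_view::npos; // not binary or octal
   }
   for (std::size_t i = from; i < literal.size(); ++i) {
      char const c = literal[i];
      if (c == '_') {
         continue; // digit separator
      }
      if (c < '0' || c > '9') {
         break; // the digits have ended; anything left is a suffix
      }
      if (c > max_digit) {
         return i;
      }
   }
   return std::string_view::npos;
}
```

Offsets are zero-based, so for 0o189 the search yields offset 3 (the `8`) and, resuming from offset 4, offset 4 (the `9`), matching the two diagnostics above.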
