redprl / asai

🩺 A library for compiler diagnostics

Home Page: https://ocaml.org/p/asai

License: Apache License 2.0

OCaml 99.32% Standard ML 0.68%

algebraic-effects diagnostics error-handling error-reporting ocaml

asai's Issues

:bulb: Intermediate representation

open Bwd

(** Styles *)
type style = [`Context | `Highlight | `Mark]

(** A segment is a styled string without control characters. *)
type marked_string = style * string

(** A line is a list of segments. *)
type marked_line = marked_string list

(** A block is a collection of consecutive lines. *)
type marked_block = { start_line_num : int; text : marked_line list }

(** A file consists of multiple blocks. *)
type marked_file = { file_path : string; blocks : marked_block list }

(** A multi-span consists of all formatted spans across multiple files. *)
type marked_message = marked_file list * Asai.Diagnostic.message

(** a message *)
type t =
  { code : string
  ; severity : Asai.Severity.t
  ; message : marked_message
  ; traces : marked_file bwd
  }

Current LSP backend is not compositional

Autocompletion, semantic tokens, document synchronization, go to definition/declaration, find references, hover, etc. are all out of the scope of asai. The current LSP backend assumes it is the only component of the server, which makes it difficult to build an LSP server that combines error reporting with other features. This has to be changed.

⌨️ No way to generate spans for input strings at an interactive prompt

Yet another great question/suggestion from @mikeshulman. The library is expecting a file_path in a position, but there's no "file" at interactive prompts. While the internal code is highly modularized and file I/O is isolated, the current external API bundles the file I/O. Here are some concrete fixes I can think of:

Rename Position.file_path to Position.source and make its type polymorphic (so that we will have 'source Span.t). Downsides: too many type variables can reduce usability.

Still rename Position.file_path to Position.source but fix its type to the following:

type source = [`File of string | `String of string | `Uri of string ]

Either way, a more generic backend that can handle strings from interactive prompts should be made available. This can be as simple as taking yet another optional argument to handle URIs (if we want to allow them).
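A minimal sketch of the second proposal, to make the trade-off concrete. Everything here (`position`, `title`, `content`, the `<interactive input>` caption, the `read_file` parameter) is hypothetical and not part of asai's actual API:

```ocaml
(* Hypothetical sketch; none of these names are asai's real API. *)
type source = [ `File of string | `String of string | `Uri of string ]

(* A position whose origin is a [source] rather than a bare file path. *)
type position = {
  source : source;
  offset : int;
  start_of_line : int;
  line_num : int;
}

(* [title src] is a caption a backend could print above the excerpt. *)
let title : source -> string = function
  | `File path -> path
  | `Uri uri -> uri
  | `String _ -> "<interactive input>"

(* [content ~read_file src] returns the text to slice. Only [`File] needs
   file I/O; [`Uri] is left unresolved in this sketch. *)
let content ~(read_file : string -> string) : source -> string option = function
  | `File path -> Some (read_file path)
  | `String s -> Some s
  | `Uri _ -> None
```

With a closed variant like this, a backend can pattern-match on the source without extra type parameters, at the cost of not being extensible by clients.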

↩️ Displaying newlines and EOFs?

Inspired by @mikeshulman's suggestion, perhaps it's useful to display newline characters and/or the end-of-file. I can think of a few choices:

Newline: ␤ ⏎ ↲ ↵
End of file: ␄ ⌁ ¶ ⏹
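To compare the candidates visually, here is a small sketch that inserts a marker before each line feed and appends an end-of-file marker. The function name and its optional arguments are made up for illustration, and only LF is handled:

```ocaml
(* Hypothetical helper: make line feeds and the end of input visible.
   The markers are plain UTF-8 strings, so any candidate can be passed in. *)
let show_invisibles ?(newline = "␤") ?(eof = "␄") (s : string) : string =
  let buf = Buffer.create (String.length s + 16) in
  String.iter
    (fun c ->
      (* Put the marker right before the actual line feed. *)
      if c = '\n' then Buffer.add_string buf newline;
      Buffer.add_char buf c)
    s;
  Buffer.add_string buf eof;
  Buffer.contents buf
```

For example, `show_invisibles ~newline:"⏎" ~eof:"¶" text` tries a different pair of symbols.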

🚨 TTY backend: write things to stderr or make it configurable

Register printers for all unhandled exceptions and effects

Navigate additional messages in the TTY terminal app

🆕 The string-based API (not the TTY backend) should convert all Unicode newline sequences to `pp_force_newline`

:bulb: Format.dprintf-less interface

Using Format.kdprintf, it should be possible to provide an alternative interface that takes format strings directly. For example:

E.fatal "number %i is too cool" 42
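A rough sketch of how such a function could be built on `Format.kdprintf` (available since OCaml 4.08). The `Fatal` exception and the name `fatalf` are placeholders for whatever asai would actually do with the message:

```ocaml
(* Placeholder for the library's real way of aborting with a diagnostic. *)
exception Fatal of string

(* [fatalf fmt arg1 ... argN] consumes the format arguments first; only then
   does the continuation run, rendering the delayed printer and raising. *)
let fatalf fmt =
  Format.kdprintf
    (fun pp -> raise (Fatal (Format.asprintf "%t" pp)))
    fmt
```

Because the continuation never returns, the result type stays polymorphic, so `fatalf "number %i is too cool" 42` can be used in any context.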

Update LSP code

🔖 Collect and list related works

Error Handling

Visualizer

Rust
Clang compiler
OCaml packages
- OCaml compiler's built-in error message displayer
- https://github.com/johnyob/grace

:bulb: Dream API for error reporting (2022/7/28 draft)

Changes:

merge info and warning into print

Based on the Discord discussion:

E.tracef ?loc "when@ loading@ module@ %s" name @@ fun () -> ...
;;
E.printf ~code:`GalaxyNumber ?loc "number@ %i@ is@ too@ large" very_small_number
;;
E.fatalf ~code:`TypeError ?loc "%a@ does@ not@ have@ type@ %a" pp_tm tm pp_ty ty
;;
E.printf ~code:`EmojiError ?loc ~marks:all_occurrences
  "emoji@ %a@ is@ used@ more@ than@ %i@ times."
  pp_emoji emoji threshold
;;
E.messagef ~code:`TypeError ?loc
  "%s@ has@ type@ %a,@ but@ we@ expected@ it@ to@ have@ type@ %a."
  var pp_tp actual pp_tp expected
|> E.mark [binding_loc1; binding_loc2]
|> E.fatal
;;
E.messagef ?loc ~code:`ChiError "variable@ name@ %s@ does@ not@ have@ any@ emojis." var
|> E.fatal ~marks:[]
;;
E.printf ?loc ~code:`ChiInfo "raise@ %s@ here." "CCHM"

In sum, we should have these functions

(repeated) tracef to construct a backtrace
messagef to construct a message
mark to add, well, marks
fatal(f) and print(f) to log something, and all four variants can take ?marks
fatal(f) intends to end the program after printing out the message.
The -f functions always take ?loc.
No significance of the ordering of marks (via ?marks and/or calls of E.mark).

💅 Implement a handler based on grace

@johnyob has implemented the Rust-style diagnostic rendering at ~~https://gitlab.com/alistair.obrien/grace~~ https://github.com/johnyob/grace.

On the surface, it looks straightforward to adapt it into an asai diagnostic handler. Of course, the renderer will not satisfy our Level-2 display stability requirements, but I think we can implement a renderer with lower display stability requirements, which has the benefits of not relying solely on colors to highlight spans.

:bulb: Clarify or redesign `causes`

In the LSP protocol, "related info" is closer to multi-spans, while "causes" are currently used as the backtrace in the unix backend. These two seem to be semantically distinct, and ideally we should capture semantic differences in the Ultimate API.

:notebook_with_decorative_cover: Dream API for (single) spans and locations

This is a redesign of src/core/Loc.mli. Changes:

Consolidate everything into a single module (which will be renamed to Asai.Span).
Expose the position type, and rename filename to file_path.
Remove many helper functions that are unused, too back-end specific, or should probably be avoided. (height does not make sense in the presence of wrapping, and line_numbers will be useless once wrapping is supported in the terminal backend.)
Remove the prefix utf8_ because we will not support other encodings anyway.
Remove the slicing functions that need to access file content.

(** {1 Types} *)

(** The type of positions *)
type position = {
  file_path : string;
  (** The absolute file path of the file that contains the position. *)

  offset : int;
  (** The byte offset of the position relative to the beginning of the file. *)

  start_of_line : int;
  (** The byte offset pointing to the beginning of the line that contains the position. *)

  line_num : int;
  (** The 1-indexed line number of the line that contains the position. *)
}

(** The abstract type of spans. *)
type t

(** {1 Builders} *)

(** [make start end_] builds the span [[start, end_)] from a pair of positions [start] and [end_].

    @raise Invalid_argument if the positions do not share the same file path or if [end_] comes before [start]. The comparison of file paths is done by [String.equal] without any path normalization.
*)
val make : position -> position -> t

(** [of_lex_pos pos] converts [pos] of type {!type:Lexing.position} to a {!type:position}. The input [pos] must be byte-indexed. (Therefore, [ocamllex] is compatible, but [sedlex] is not because it uses code points.) *)
val of_lex_pos : Lexing.position -> position

(** [to_positions span] returns the pair of the start and end positions of [span]. *)
val to_positions : t -> position * position

(** {1 Accessors} *)

(** [file_path span] returns the file path associated with [span]. *)
val file_path : t -> string

(** [start_line_num span] returns the 1-indexed line number of the start position. *)
val start_line_num : t -> int

(** [end_line_num span] returns the 1-indexed line number of the end position. *)
val end_line_num : t -> int

(** {1 Auxiliary types} *)

type 'a located = { span : t option; value : 'a }
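For discussion, one possible implementation of the interface above. This is only a sketch under the assumption that a span is stored as its two endpoints; the real representation may differ:

```ocaml
type position = {
  file_path : string;
  offset : int;
  start_of_line : int;
  line_num : int;
}

(* Abstract in the interface; a pair of endpoints is assumed here. *)
type t = { start : position; end_ : position }

let make (start : position) (end_ : position) : t =
  (* File paths are compared verbatim, without normalization. *)
  if not (String.equal start.file_path end_.file_path) then
    invalid_arg "make: positions belong to different files"
  else if end_.offset < start.offset then
    invalid_arg "make: end position comes before start position"
  else { start; end_ }

(* [Lexing.position] is byte-indexed when produced by ocamllex. *)
let of_lex_pos (pos : Lexing.position) : position =
  { file_path = pos.Lexing.pos_fname;
    offset = pos.Lexing.pos_cnum;
    start_of_line = pos.Lexing.pos_bol;
    line_num = pos.Lexing.pos_lnum }

let to_positions (span : t) : position * position = (span.start, span.end_)
let file_path (span : t) : string = span.start.file_path
let start_line_num (span : t) : int = span.start.line_num
let end_line_num (span : t) : int = span.end_.line_num
```

Note that `make` accepts `start = end_`, which yields a zero-width span.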

🖥️ Explicitly display the severity instead of using only colors

The Tty backend should explicitly print out the severity (e.g., "ERROR") instead of relying only on highlighting colors. This was suggested by @mikeshulman and I strongly agree! I guess the remaining question is to find a good place in the Unicode art to add it:

Current output with location information:

    ╒══ examples/stlc/example.lambda
    │
  1 │ line1
  2 │ line2
    ┊
 20 │ line20
 21 │ line21
    ┷
 [E002] Message line 1
        Message line 2

Current output without location information:

 [E002] Message line 1
        Message line 2

Debugging support

Based on some private conversation, I believe what we want is some support for developers to debug. @mmcqd

The current design is to make users happy, but we may also want to support spewing out more debugging information.

Better printing of multi-line messages

Currently, the registered printer and Asai_tty do not handle multi-line messages well/correctly.

↔️ Rename `Span` to `Range`?

It seems LSP and grace are using Range instead of Span.
"Byte ranges" appear to be much more popular than "byte spans".
No one in the development team objects... yet.

For 0.2.0, Span will only be deprecated, before its removal in later versions.

Publish to OPAM

Documentation
- More example code
- #43
README
#57

We should probably also publish algaeff 1.1 which comes with public register_printer...?

Document complexity of flattening

➡️ The string-based API and the TTY backend should expand horizontal tabs `U+0009` to spaces

We would like to print Python and Go code!
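A minimal sketch of tab expansion to the next tab stop. The name `expand_tabs` and the default tab size of 8 are assumptions, and columns are counted in bytes, so this is only accurate for ASCII content:

```ocaml
(* Hypothetical helper: expand each U+0009 to spaces up to the next tab stop.
   Assumes one byte per column, which is wrong for non-ASCII text. *)
let expand_tabs ?(tab_size = 8) (line : string) : string =
  let buf = Buffer.create (String.length line) in
  let col = ref 0 in
  String.iter
    (fun c ->
      if c = '\t' then begin
        (* Number of spaces needed to reach the next multiple of [tab_size]. *)
        let n = tab_size - (!col mod tab_size) in
        Buffer.add_string buf (String.make n ' ');
        col := !col + n
      end else begin
        Buffer.add_char buf c;
        col := !col + 1
      end)
    line;
  Buffer.contents buf
```

A real implementation would need to measure display width (grapheme clusters, wide characters) instead of bytes.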

📛 Allow string source to have a title

This should solve some rendering issue

📚 Document "Info" and "Hint" and maybe other severity levels

This is another good question/suggestion by @mikeshulman. These two I believe were copied from the language server protocol (LSP) @TOTBWF. Is there a source of these terms? The official LSP specification does not seem to explain them?!

Let me just write down what I thought they are:

Bug: something went terribly wrong, and it's a bug; a kind end user may consider notifying the implementer
Error: something went terribly wrong, but it's not a bug. The error was caused by external factors (e.g., no internet access) or users (e.g., syntax errors).
Warning: something might have gone wrong, but it's okay
Hint: some extra information directly related to the code, such as type information and refactor suggestion.
Info: other extra information (e.g., the type checking was done!)
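To make the list above easier to discuss, here it is written down as a type. This is a sketch of my reading, not the documented definition; in particular the ordering (least to most severe) is an assumption:

```ocaml
(* Hypothetical sketch. Constructors are listed from least to most severe,
   so the polymorphic [compare] orders them by severity. *)
type severity = Hint | Info | Warning | Error | Bug

(* Short labels that a backend could print instead of relying on colors. *)
let label : severity -> string = function
  | Hint -> "hint"
  | Info -> "info"
  | Warning -> "warning"
  | Error -> "error"
  | Bug -> "bug"

(* One-line summaries of the intended meaning, taken from the list above. *)
let describe : severity -> string = function
  | Hint -> "extra information directly related to the code"
  | Info -> "other extra information"
  | Warning -> "something might have gone wrong, but it's okay"
  | Error -> "something went wrong due to external factors or the user"
  | Bug -> "something went wrong inside the implementation"
```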

Review UTS 55 Section 4: Source Code Display

UTS 55 Section 4: Source Code Display

The Tty render sometimes fails with `Fatal error: exception Invalid_argument("Bytes.create")`

I have not created a minimal reproduction of this yet.

✂️ Separate LSP handler into a standalone package

Reasons:

This handler is currently not complete: UTF-16 encoding issues, etc.
This handler needs significantly more documentation: #1
This handler is currently not compositional: #42
This handler depends on two heavy packages eio and lsp, preventing the "core" asai from being integrated into lightweight applications.

💬 Alternative API for (fully) structured messages?

Yet³ another question/suggestion from @mikeshulman. The current asai library wants the implementer to directly write out the message when sending it. The other design is used in cooltt, where we send "structured" errors of this type:

type t =
  | MalformedCase
  | InvalidTypeExpression of CS.con
  | ExpectedSynthesizableTerm of CS.con_
  | CannotEliminate of Pp.env * S.tp
  | ExpectedSimpleInductive of Pp.env * S.tp
  | InvalidModifier of CS.con
  | ExpectedFailure of CS.decl

And then there's a pretty printer to turn these structured errors into elaborated messages. I am not sure how to have an API that works (nicely) for both, but at the very least the design decision to "write the (unstructured) message when sending it" should be documented. This reminded me of the old debate between catgets and gettext, except that we are using high-level, rich constructors from OCaml types instead of integers in the case of catgets.
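To illustrate the structured style in isolation, here is a tiny self-contained sketch. The constructors, codes, and wording are invented for the example and only loosely modelled on the cooltt type above:

```ocaml
(* Hypothetical structured errors; not the cooltt or asai definitions. *)
type error =
  | Malformed_case
  | Unbound_variable of string
  | Type_mismatch of { expected : string; actual : string }

(* A short code per constructor, usable for documentation lookups. *)
let code : error -> string = function
  | Malformed_case -> "E001"
  | Unbound_variable _ -> "E002"
  | Type_mismatch _ -> "E003"

(* The pretty printer that elaborates a structured error into a message. *)
let pp_error (fmt : Format.formatter) : error -> unit = function
  | Malformed_case -> Format.fprintf fmt "malformed case expression"
  | Unbound_variable x -> Format.fprintf fmt "variable %s is not in scope" x
  | Type_mismatch { expected; actual } ->
    Format.fprintf fmt "expected type %s but got %s" expected actual

let to_message (e : error) : string =
  Format.asprintf "[%s] %a" (code e) pp_error e
```

In this style the message text lives in one place (`pp_error`), while the "write the message when sending it" style spreads it across call sites.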

Any suggestions for redesigning the API?

🔤 Terminal captions for the TTY backend

:bulb: Use mmap (`Unix.map_file`)

I consider it a memory leak when the whole content of closed files still occupies memory. I wonder if we could switch to a Unix.lseek-based implementation?

🆕 The TTY backend should provide different modes for handling newlines

The Unicode-compliant mode: recognize all newline sequences:

The traditional mode that is compatible with the LSP: only CR, LF, and CRLF are recognized.

In either mode, report any inconsistency.
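A sketch of what the traditional mode could look like: split on CR, LF, and CRLF, remember which ending terminated each line, and check for consistency. All names are hypothetical, and the Unicode-compliant mode is not covered:

```ocaml
type eol = CR | LF | CRLF

(* [split_lines s] returns each line with the ending that terminated it;
   the final line carries [None] because it is ended by the end of input. *)
let split_lines (s : string) : (string * eol option) list =
  let n = String.length s in
  let rec go start i acc =
    if i >= n then List.rev ((String.sub s start (n - start), None) :: acc)
    else
      let line () = String.sub s start (i - start) in
      match s.[i] with
      | '\n' -> go (i + 1) (i + 1) ((line (), Some LF) :: acc)
      | '\r' when i + 1 < n && s.[i + 1] = '\n' ->
        go (i + 2) (i + 2) ((line (), Some CRLF) :: acc)
      | '\r' -> go (i + 1) (i + 1) ((line (), Some CR) :: acc)
      | _ -> go start (i + 1) acc
  in
  go 0 0 []

(* [consistent lines] holds when every line ending in the input is the same. *)
let consistent (lines : (string * eol option) list) : bool =
  match List.filter_map snd lines with
  | [] -> true
  | e :: rest -> List.for_all (fun e' -> e' = e) rest
```

When `consistent` is false, a backend could report the inconsistency.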

Apply the severity/code/message Thought

The following design, upon approval, should be carefully documented and applied everywhere:

emit/fatal: whether we should continue (emit) or interrupt (fatal) the code in the server after the message
severity: how the message should be displayed or decorated in the client
code: a short identifier that is easy for the end user to look up in the documentation; it is not for humans to read, but for a search engine to locate relevant documents quickly
message: a human-readable, detailed explanation

Currently, the STLC example is violating the design, and documentation is lacking.

🎨 Redesign TTY output

The current handler is not ideal:

It doesn't show the extra_remarks.
It probably occupies too much space, especially the mode that shows everything.
It does not support customization of characters and colors.
Maybe (?) it should display zero-width spans anyway?

Concrete suggestions: remove "empty lines" between blocks; instead, mix remarks/messages with content from end users (while maintaining Level-2 display stability).

Support Progress

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#progress

:bulb: Single-message design

Currently, the unix backend prints out something like this:

Error [E123]: This is the high-level description of the error
that is hopefully more understandable.

[cool Unicode art]
    **dubious code**
      ^^^^^^^^^^^^
      This line is wrong!
[cool Unicode art]

I think we can do the following instead for diagnostics with ranges:

[cool Unicode art]
    **dubious code**
      ^^^^^^^^^^^^
      [E123] This is the high-level description of
      the error that is hopefully more understandable.
[cool Unicode art]

Benefits:

simplified API (especially for the dprintf-less interface) and
focused UI with minimum eye movement.

Bonus: compatibility with LSP.

LSP backend: support the `source` property

:bulb: Core should be conceptually LSP-independent

As a general principle I think we should design our interface to match what we think is correct, not what the current LSP protocol wants us to do. For example, an LSP diagnostic always requires the following data:

A file URI.
A range within that file.

For 1, if no file URIs make sense, the current workaround is to use the project file or a folder. For 2, if no ranges make sense, the current workaround is to use the fake span [0,0) (the very beginning of the file). All of these hacks are LSP-specific and I propose burying them in Asai_lsp not in Asai. This applies to the entire library---I believe it is better to question the LSP protocol (or any existing protocol) more.

Actionable item: move all LSP-related hacks to Asai_lsp, or at least hide them from the public interface.

Report multiple unrelated errors at once?

Accept newlines `\n` in the terminal backend

Currently, the terminal backend is not happy with control characters (for good reasons), but maybe it is more user-friendly to turn \n into pp_force_newline especially for beginners.

@mmcqd Maybe we can have a raw option to turn off the conversion?
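A minimal sketch of the conversion, including the `raw` switch. The name `pp_text` and the `?raw` argument are hypothetical; only `Format.pp_force_newline` and `Format.pp_print_string` are real:

```ocaml
(* Hypothetical printer: turn each '\n' into a forced newline so that Format
   knows about the line break, unless [raw] is set. *)
let pp_text ?(raw = false) (fmt : Format.formatter) (s : string) : unit =
  if raw then Format.pp_print_string fmt s
  else
    List.iteri
      (fun i line ->
        (* A forced newline separates consecutive lines. *)
        if i > 0 then Format.pp_force_newline fmt ();
        Format.pp_print_string fmt line)
      (String.split_on_char '\n' s)
```

This only handles LF; CR and other control characters would still need separate treatment.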

An example showing how error messages from a library can be embedded

It's one of the design goals to trivialize the embedding of error messages from a library into the main application. The end users should not be able to (easily) tell whether it's some library or the main program generating the error message. The current API already provides the mechanism to do so, but adding an example should help us examine the current design.

Support GitHub workflow commands

https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-a-notice-message

🧮 Helper functions to recover byte offsets?

I recently learned that the position type from Fmlib_parse does not contain byte offsets. While I think it is more efficient if the parser library can provide byte offsets directly, we can also provide an (inefficient) helper function that recovers the information by scanning the content from the start and counting the newline sequences.
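A rough sketch of such a helper. It assumes a 1-indexed line number and a 0-indexed column measured in bytes, and it counts only LF as a newline; the name `byte_offset` and these conventions are assumptions, not what Fmlib_parse actually provides:

```ocaml
(* Hypothetical helper: recover a byte offset from a line/column pair by
   scanning [content] from the start. Assumes [line] is 1-indexed, [column]
   is a 0-indexed byte column, and lines end with LF only. *)
let byte_offset ~(line : int) ~(column : int) (content : string) : int option =
  let n = String.length content in
  (* [find_line_start i l]: byte [i] is the first byte of line [l]. *)
  let rec find_line_start i l =
    if l = line then Some i
    else
      match String.index_from_opt content i '\n' with
      | Some j -> find_line_start (j + 1) (l + 1)
      | None -> None
  in
  if line < 1 || column < 0 then None
  else
    match find_line_start 0 1 with
    | Some start when start + column <= n -> Some (start + column)
    | _ -> None
```

The scan is linear in the size of the content for every lookup, and it does not check that the column stays within the line.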

This function should be added along with, or after the resolution of issue #87.

Yet² another good suggestion/question from @mikeshulman.

🖥️ Add example configurations for LSP servers

It would be a good idea to add some example configurations for various editors, so we could just copy+paste when using asai.lsp in other projects.

Unfortunately, the situation with lsp-mode is pretty grim; see emacs-lsp/lsp-mode#3625

📚 Specify odoc >= 2.0

We now use @include, an odoc extension, in multiple places.

Recheck "grapheme clusters" and "code points"

🔍 Test Explicator

🔰 Is the quickstart tutorial good enough?

Currently, there's a quickstart tutorial. Is it good enough? Only a real experiment can tell. @jonsterling I think it's time to try out this library!

🦀 Review compatibility with Rust

This library is heavily influenced by LSP and is not 100% compatible with the Rust design. Perhaps it's good to review the Rust design document and note the differences:

https://rustc-dev-guide.rust-lang.org/diagnostics.html

🗞️ A different name for `Logger`?

I'm a bit concerned that someone will try to use this library together with another library for logging (e.g., syslog), rendering our use of the name Logger at odds. @TOTBWF suggested using the name Reporter instead, which I think is very reasonable. I wonder if there's any other suggestion? The resolution of this issue shall be implemented in 0.2.0.

🔣 The TTY backend should eat up or replace all control characters and LS `U+2028` and PS `U+2029`.

I have two proposals for the replacement:

□
�

One exception is CR. To handle CRLF smoothly, CR should probably just be removed. Another is (horizontal) tabs, which is discussed in #66.
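A sketch of the replacement for a single line of text, using the second proposal (U+FFFD) as the replacement character. It needs OCaml 4.14 or later for `String.get_utf_8_uchar`; the name `sanitize_line` and the exact set of replaced code points (C0 controls, DEL, C1 controls, LS, PS) are my assumptions:

```ocaml
(* Hypothetical helper for one line of UTF-8 text: drop CR, keep tabs for a
   later expansion pass, and replace other control characters, LS (U+2028),
   and PS (U+2029) with U+FFFD. Note that LF is replaced as well, so the
   input should already be split into lines. *)
let sanitize_line (s : string) : string =
  let buf = Buffer.create (String.length s) in
  let rec go i =
    if i < String.length s then begin
      let d = String.get_utf_8_uchar s i in
      let u = Uchar.utf_decode_uchar d in
      let c = Uchar.to_int u in
      if c = 0x0D then () (* CR is removed *)
      else if c = 0x09 then Buffer.add_utf_8_uchar buf u (* tab is kept *)
      else if c < 0x20 || (c >= 0x7F && c <= 0x9F) || c = 0x2028 || c = 0x2029
      then Buffer.add_utf_8_uchar buf Uchar.rep
      else Buffer.add_utf_8_uchar buf u;
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0;
  Buffer.contents buf
```

Invalid UTF-8 bytes also come out as U+FFFD, because the decoder reports them as the replacement character.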