jamesohortle / unicodehover

Hover over a Unicode escape in VS Code to see the glyph of the character, its description and a link to its webpage!

License: MIT License

TypeScript 53.47% JavaScript 11.29% Python 16.87% TeX 3.43% Java 3.75% CSS 0.70% HTML 3.48% Haskell 7.01%
vscode-extension unicode unicode-data tool visual-aid vscode unicode-escape python javascript typescript

unicodehover's Introduction

UnicodeHover

UnicodeHover lets you see a glyph of the character represented by a Unicode escape. Let's say you have a regex to remove all non-printable control characters, such as ESC (U+001B).

unprintables = re.compile(r"[\u0000-\u001f]")

Or perhaps your coworkers don't have the necessary fonts to display glyphs and have used escapes so that their editors don't show them mojibake or � (U+FFFD).

Maybe your favourite language doesn't even support Unicode literals in source files after a certain version (looking at you, Haskell) so you have to represent them with an escape.

In any case, it would be handy to immediately have information on the characters being processed instead of, e.g., going to an external website and searching for the codepoint.

Usage/Features

Simply place your cursor over the escape sequence and a panel will hover over it, showing you the glyph in question.

  • Recognizes the code points as used by the Unicode Consortium (U+ followed by 4 to 6 hexadecimal digits) in any file.
  • Recognizes Unicode escape sequences in Python, JavaScript (TypeScript), TeX (LaTeX), Java, HTML, CSS and Haskell files.
  • Renders a glyph of the character using a system font (see Requirements).
  • Provides a one-line description of the character in English.
  • Provides a link to the Unicode Table page (no affiliation) for the character for further information.
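
As an illustration of the first feature, here is a minimal sketch of how code points written in the Unicode Consortium's notation could be located in a line of text. It is not taken from the extension's source: findCodePoints and CodePointMatch are hypothetical names, and rejecting values above U+10FFFF is an assumption about the desired behaviour.

```typescript
// Hypothetical helper for illustration; not the extension's actual code.
const MAX_CODE_POINT = 0x10ffff; // highest code point defined by Unicode

interface CodePointMatch {
  text: string; // the matched text, e.g. "U+1234"
  index: number; // offset of the match within the line
  codePoint: number; // numeric value of the code point
}

function findCodePoints(line: string): CodePointMatch[] {
  // "U+" followed by 4 to 6 hex digits that are not followed by another hex digit.
  const pattern = /U\+([0-9A-Fa-f]{4,6})(?![0-9A-Fa-f])/g;
  const matches: CodePointMatch[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(line)) !== null) {
    const codePoint = parseInt(m[1], 16);
    // Six hex digits can exceed the Unicode range, so filter those out.
    if (codePoint <= MAX_CODE_POINT) {
      matches.push({ text: m[0], index: m.index, codePoint: codePoint });
    }
  }
  return matches;
}
```

The negative lookahead keeps the pattern from matching the first six digits of a longer hex run, which would otherwise report a code point the author never wrote.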

Requirements

  • Works on Python, JavaScript, TypeScript, TeX, LaTeX, Java, HTML, CSS and Haskell files.
  • Needs a system font that defines a glyph for the character to be displayed.

Known Issues

Pull requests (in particular for new languages) are welcome.

  • Non-printable characters, by definition, do not normally have glyphs associated with them and so usually no glyphs will be displayed. However, the description and link will still be shown. If a font somewhere on your system defines a glyph for a non-printable character, it will be displayed.
  • The hover for Haskell currently does not work well; see #12. Any help on this is gladly welcomed!

Release Notes

See the changelog.

Data sources

The data for this project were taken from the Unicode Consortium's Unicode Data collection and follow its licensing (cf. terms of use).

The pronunciations for Tangut are from Tangut database v4.0 and are the work of Marc Miyake, used here with his permission.

License

This extension is intended to be used by any- and everyone. It uses the MIT License.

About the icon

The character in the icon is U+1234 ETHIOPIC SYLLABLE SEE, which is part of the Geʽez script used for several Ethiopic languages, in particular Amharic. Although SEE is most likely pronounced seː, it represents the idea that you can "see" glyphs as easily as 1, 2, 3(, 4). It also just looks pretty!

Thanks to Misato Inoue for design help with the icon!

unicodehover's People

Contributors

dependabot[bot], jamesohortle, somarlyonks

unicodehover's Issues

TexHover will not work without LaTeX-Workshop

Despite

"activationEvents": [
	"onLanguage:python",
	"onLanguage:javascript",
	"onLanguage:latex"
],

in package.json, the extension fails to detect \char, \Uchar and ^^... escapes in .tex files.

Adding

"languages": [{
        "id": "latex",
        "extensions": [ ".tex" ],
        "aliases": [ "latex", "LaTeX", "tex", "TeX" ]
}]

did not help.

Provide brief description of character (useful if character is non-printable)

Next to the glyph, we should show the short English name given to the code point.

This is useful for non-printable characters, control characters and characters for which no font is available with the required glyphs.

Example format:

Null # Non-printable so need description.
  Space # Whitespace, so need description to distinguish between, say, space and thin non-break space
ሴ Ethiopic Syllable See # Description is nice
இ Tamil Letter I # Description is nice
࿚ Tibetan Mark Trailing Mchan Rtags # No font available for rare character
傷 Ideograph wound, injury; fall ill from CJK # Description is nice
࿚ Cat Face with Tears of Joy # No font found

`U+0023 NUMBER SIGN` (#) etc. affects the markdown string

Hovering over

"\u0023"

leads to

Number Sign (UnicodeHover)

being rendered. I.e., we make the markdown string via

new vscode.MarkdownString("# Number Sign (UnicodeHover)");

which leads to a title being rendered.

All symbols that control how markdown is displayed need to be escaped.
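
One possible way to do this is sketched below. The helper escapeMarkdown is hypothetical and the list of characters is an assumption (it covers the common Markdown control characters, not necessarily everything VS Code's renderer reacts to); it is not the fix that was actually applied in the extension.

```typescript
// Hypothetical helper for illustration; the character list is an assumption.
function escapeMarkdown(text: string): string {
  // Put a backslash in front of each character that Markdown may treat as syntax.
  // In the replacement string, "$&" stands for the matched character itself.
  return text.replace(/[\\`*_{}\[\]()#+\-.!<>|~]/g, "\\$&");
}

// The description from the example above, made safe to pass to a Markdown renderer.
const safeTitle = escapeMarkdown("# Number Sign (UnicodeHover)");
```

With this, the string handed to new vscode.MarkdownString(...) would start with an escaped number sign instead of a heading marker. VS Code's MarkdownString also offers an appendText method that escapes its argument, which may be a simpler option if the glyph and description are appended as plain text.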

Improve Unihan (in particular Vietnamese)

There is a paucity of data for Unihan (Unified CJK Ideographs) and some things should be improved:

  1. The name Unified CJK does not include V for Vietnamese, despite roughly 9,000 ideographs being used in Chữ Nôm (and half of those used only in Chữ Nôm).
  2. I personally find it frustrating when (No description) is shown for CJKV that has readings/meanings. Perhaps we can provide an option to show readings (enable per language?) instead of showing nothing at all or in addition to any description.

Add support for NerdFont glyphs

As a user of NerdFont glyphs and unicode escape sequences,
I want to see the glyphs for NerdFont codepoints in the hover tooltip
So that I can more easily read Oh-My-Posh templates (for example, see Agnoster in the oh-my-posh repo)

Haskell support is thwarted by vscode-haskell

Haskell support is being implemented in branch haskell.

As with other languages, a language support extension that interacts with a language server is required for parsing of source files. In this case, vscode-haskell and haskell-language-server are needed.

Unfortunately, the language extension's hovers appear to be blocking ours.

Consider the following test file (src/test/test.hs):

module Main where

import Lib ()

main :: IO ()
{-
 - U+0000 -> Null.
 - U+0020 ->  Space.
 - U+1234 -> ሴ Ethiopic Syllable See.
 - U+0B87 -> இ Tamil Letter I.
 - U+0FDA -> ࿚ Tibetan Mark Trailing Mchan Rtags.
 - U+50B7 -> 傷 Ideograph wound, injury; fall ill from CJK.
 - U+1F639 -> 😹 Cat Face with Tears of Joy.
 - \0000
-}

main =
  putStrLn "\0000"

In Output, under Haskell (UnicodeHover) we can see the below when we hover over putStrLn's argument.

2020-09-09 22:13:09.250435 [ThreadId 207] - Hover request at position 18:15 in file: /Users/jim/UnicodeHover/src/test/test.hs
2020-09-09 22:13:12.031032 [ThreadId 195] - finish: CodeAction (took 0.00s)
2020-09-09 22:13:12.03127 [ThreadId 195] - finish: CodeAction:PackageExports (took 0.00s)
2020-09-09 22:13:17.739479 [ThreadId 213] - DocumentHighlight request at position 14:9 in file: /Users/jim/UnicodeHover/src/test/test.hs
2020-09-09 22:13:17.977093 [ThreadId 195] - finish: CodeAction (took 0.00s)
2020-09-09 22:13:17.97733 [ThreadId 195] - finish: CodeAction:PackageExports (took 0.00s)
2020-09-09 22:18:04.452604 [ThreadId 219] - GhcIde.hover entered (ideLogger)
2020-09-09 22:18:04.452924 [ThreadId 219] - Hover request at position 18:14 in file: /Users/jim/UnicodeHover/src/test/test.hs

The first and last lines of the log above indicate that vscode-haskell is indeed parsing the files correctly, but that it appears to be blocking UnicodeHover from adding a hover panel.

The output looks like this:
[Screenshot, 2020-09-09 10:24:13 pm]

If we hover over the comment, the UnicodeHover panels do appear, but they are displaced all the way above the start of the comment.
[Screenshot, 2020-09-09 10:22:07 pm]

We should get in touch with the vscode-haskell team to figure out how to fix this.

Page without advertising, Unicode URL

Hello,

It would be better if the URL for the Unicode page in unicodeHoverUtils.js were changed slightly.
The current site symbl.cc or unicode-table.com has a lot of advertising.

I changed it like this and it works very well:

- const externalLink = "https://unicode-table.com/en/" + (codePoint <= MAX_CODE_POINT ? codePoint.toString(16).toUpperCase() : "");
+ const externalLink = "https://compart.com/unicode/u+" + (codePoint <= MAX_CODE_POINT ? codePoint.toString(16).toUpperCase() : "");

Many greetings
Detlef Paschke

Support Java

Super easy to do, apparently. Match using regex below.

/\\u+[a-fA-F\d]{4}/
  • Java doesn't support Unicode escapes above the 16 bit range (i.e., escapes that require 5 or more hex digits), and uses surrogate pairs instead. Hence, we need only match exactly 4 hex digits.
  • Multiple u characters are supported (e.g., \uu0041). This is for code that has been ASCII-fied: escape sequences that were already present in the original UTF-X encoded files gain an extra u (they start with \uu), whereas newly escaped characters use a single \u in the ASCII-converted source.
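
A minimal sketch of how the two points above might fit together: it uses the regex from this issue (with a capture group added) and merges an adjacent high/low surrogate pair into a single code point. The names findJavaEscapes and JavaEscape are hypothetical, and the sketch ignores Java's rule that an escaped backslash in front of the u disables the escape.

```typescript
// Hypothetical sketch for illustration; not the extension's implementation.
interface JavaEscape {
  index: number; // offset where the escape (or surrogate pair) starts
  length: number; // number of source characters covered
  codePoint: number; // decoded code point
}

function findJavaEscapes(source: string): JavaEscape[] {
  // Backslash, one or more "u", then exactly four hex digits.
  const pattern = /\\u+([a-fA-F\d]{4})/g;
  const result: JavaEscape[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    const unit = parseInt(m[1], 16);
    const prev = result.length > 0 ? result[result.length - 1] : undefined;
    const isLowSurrogate = unit >= 0xdc00 && unit <= 0xdfff;
    if (
      isLowSurrogate &&
      prev !== undefined &&
      prev.codePoint >= 0xd800 &&
      prev.codePoint <= 0xdbff &&
      prev.index + prev.length === m.index
    ) {
      // A high surrogate immediately followed by a low one encodes one code point.
      prev.codePoint = 0x10000 + ((prev.codePoint - 0xd800) << 10) + (unit - 0xdc00);
      prev.length += m[0].length;
    } else {
      result.push({ index: m.index, length: m[0].length, codePoint: unit });
    }
  }
  return result;
}
```

Merging the pair means a hover over either half of \uD83D\uDE39 could describe U+1F639 rather than two unpaired surrogates.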

Add option to display the whole string

If the option is enabled, then given

const str = "\u0061\u0062\u0063\u0064\u0065"

hovering over \u0061 (or any other escape in str) should show the whole decoded string:

"abcde"
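
The requested behaviour could be sketched roughly as follows. decodeEscapedString is a hypothetical helper that only handles four-digit \uXXXX escapes (the JavaScript form used in the example); how the extension would find the surrounding string literal is left open.

```typescript
// Hypothetical helper for illustration; handles only \uXXXX escapes.
function decodeEscapedString(literal: string): string {
  // Replace each escape with the UTF-16 code unit it denotes and leave all
  // other text untouched. Adjacent surrogate escapes recombine naturally
  // because JavaScript strings are sequences of UTF-16 code units.
  return literal.replace(/\\u([0-9a-fA-F]{4})/g, (_match: string, hex: string) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// The source text of the literal from the example, quotes included.
const decoded = decodeEscapedString('"\\u0061\\u0062\\u0063\\u0064\\u0065"');
```

Here decoded holds the text "abcde" with its quotes, which is what the hover would display.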

Provide better info for Tangut

Although Tangut is extinct and only reconstructed phonologies exist, it would be nice to find a reputable source for phonologies and include them. Currently Tangut characters have unhelpful descriptions:

𗀀 Tangut ideograph L2008-0008 (UnicodeHover)

Desired:

𗀀 Tangut ideograph dow1 (L2008-0008) (UnicodeHover)

Furthermore, we should probably figure out how to better transmit the reference/source for the characters, defined here.

The website Tangut.info provides phonetics for characters from Lǐ Fànwén's books 夏漢字典 (1997 & 2008 eds).

We may be able to request use of the data.
