
go-runewidth's Introduction

go-runewidth


Provides functions to get the fixed width of a character or string.

Usage

runewidth.StringWidth("つのだ☆HIRO") == 12

Author

Yasuhiro Matsumoto

License

under the MIT License: http://mattn.mit-license.org/2013

go-runewidth's People

Contributors

dmitshur, hymkor, itchyny, johejo, klauspost, kurehajime, markus-oberhumer, mattn, samwhited, schachmat, shogo82148, tklauser, tty2, wedaly


go-runewidth's Issues

Conflict with rivo/uniseg

I am trying to install Go Fiber v2.40.0 using Go 1.15.
I encountered an error saying something like:

github.com/mattn/go-runewidth@…/runewidth.go:7:2: found packages uniseg (doc.go) and main (gen_breaktest.go) in ...

Has anyone else ever encountered this before?

Width of Box Drawing characters and LANG=zh_CN.UTF-8

Hello!

I maintain a Go library for drawing ASCII tables at https://github.com/jedib0t/go-pretty, and go-runewidth is one of its few dependencies: I use it to calculate rune widths when drawing the tables. Sample: https://go.dev/play/p/I6uxssyXxhN?v=goprev

A couple of users reported alignment issues, and after some investigation I found that the widths returned for Box Drawing characters were not the expected values when LANG=zh_CN.UTF-8 or when EastAsianWidth=true is set in go-runewidth.

To replicate the bug, I created this program, say foo.go:

package main

import (
	"fmt"
	"strings"

	"github.com/mattn/go-runewidth"
)

func main() {
	boxDrawingChars := []string{
		"+", "-", "=",
		"┏", "┳", "┓",
		"┣", "╋", "┫",
		"┗", "┻", "┛",
		"━", "┃",
	}

	cellWidth := 8
	for _, boxDrawingChar := range boxDrawingChars {
		padding := strings.Repeat(" ", cellWidth-runewidth.StringWidth(boxDrawingChar))
		fmt.Printf("| %s%s |\n", boxDrawingChar, padding)
	}
}

Output:

$ LANG=en_US.UTF-8 go run foo.go
| +        |
| -        |
| =        |
| ┏        |
| ┳        |
| ┓        |
| ┣        |
| ╋        |
| ┫        |
| ┗        |
| ┻        |
| ┛        |
| ━        |
| ┃        |

$ LANG=zh_CN.UTF-8 go run foo.go 
| +        |
| -        |
| =        |
| ┏       |
| ┳       |
| ┓       |
| ┣       |
| ╋       |
| ┫       |
| ┗       |
| ┻       |
| ┛       |
| ━       |
| ┃       |

Is this behavior right, or am I using runewidth.RuneWidth/StringWidth incorrectly?

Should a rune like tab `\t` have a width?

Currently, on my Linux machine its width is 0, while in the terminal it is 8, and in most IDEs it is customizable.
I don't know whether there are other characters like this. Should I just define their width myself?

Feature request: Add support for zero-width-joiners

It would be great if you could add support for zero-width joiners (ZWJ). I have the following code example which doesn't work as expected:

package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	e := "👨‍👨‍👧"
	r := []rune(e)
	var widths []int
	for _, c := range r {
		widths = append(widths, runewidth.RuneWidth(c))
	}
	fmt.Printf("%s : len=%d numrunes=%d width=%d widths=%v runes=%X\n", e, len(e), len(r), runewidth.StringWidth(e), widths, r)
}

The output is:

👨‍👨‍👧 : len=18 numrunes=5 width=6 widths=[2 0 2 0 2] runes=[1F468 200D 1F468 200D 1F467]

Specifically, width should be 2 instead of 6. I found this article which explains how they work. It does not only affect emojis but also characters in some languages.

This came up in rivo/tview#161. It would be great if support for ZWJ could be added so I can implement support for these Unicode characters in tview. I understand that not all kinds of combinations are supported and it's probably difficult to figure out which ones are. But assuming these characters are supported will help a lot. I don't expect users to try to print ZWJ combinations which are not supported anyway.

Thanks!
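One simple strategy, sketched here under heavy assumptions, is to treat every rune that follows a ZWJ as part of the preceding glyph and count only the width of the first component. `runeWidth` below is a toy stand-in (ZWJ and U+FE00 to U+FE0F are 0 columns, U+1F300 to U+1FAFF is 2, everything else 1), not go-runewidth's table, and `zwjWidth` is a hypothetical name. As the issue notes, the result only matches the screen when the terminal and font actually render the sequence as a single glyph.

```go
package main

import "fmt"

const zwj = '\u200D' // ZERO WIDTH JOINER

// runeWidth is a toy width function used only for this sketch.
func runeWidth(r rune) int {
	switch {
	case r == zwj, r >= 0xFE00 && r <= 0xFE0F: // ZWJ, variation selectors
		return 0
	case r >= 0x1F300 && r <= 0x1FAFF: // rough emoji block
		return 2
	default:
		return 1
	}
}

// zwjWidth measures s, folding each rune that follows a ZWJ into the
// glyph before it, so a ZWJ sequence costs the width of its first rune.
func zwjWidth(s string) int {
	w := 0
	joined := false // true if the previous rune was a ZWJ
	for _, r := range s {
		if r == zwj {
			joined = true
			continue
		}
		if !joined {
			w += runeWidth(r)
		}
		joined = false
	}
	return w
}

func main() {
	family := "\U0001F468\u200D\U0001F468\u200D\U0001F467" // 👨‍👨‍👧
	fmt.Println(zwjWidth(family)) // 2
}
```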

Erroneous interpretation of Na leads to width-zero mathematical symbols

According to Unicode® Standard Annex #11, Na stands for Narrow:

ED5. East Asian Narrow (Na): All other characters that are always narrow and have explicit fullwidth or wide counterparts. These characters are implicitly narrow in East Asian typography and legacy character sets because they have explicit fullwidth or wide counterparts. All of ASCII is an example of East Asian Narrow characters.

Therefore, the characters that are currently considered to belong to the nonassigned table should have width 1, not width 0.

Two of these characters are commonly used in quantum mechanics: |α⟩⟨α|

EDIT: This issue is fixed by #44. Please merge that PR.
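The reporter's point can be written as a mapping from the UAX #11 East_Asian_Width value to a column count, in which no value maps to zero; in a sketch like this, zero width would have to come from other properties such as combining marks. The `EAW` type and `columns` function are hypothetical names for illustration, and resolving Ambiguous by an "East Asian context" flag is an assumption.

```go
package main

import "fmt"

// EAW is a UAX #11 East_Asian_Width property value.
type EAW string

const (
	Fullwidth EAW = "F"
	Wide      EAW = "W"
	Halfwidth EAW = "H"
	Narrow    EAW = "Na"
	Ambiguous EAW = "A"
	Neutral   EAW = "N"
)

// columns maps an East_Asian_Width value to terminal columns. Ambiguous
// characters are resolved by context; Na, like ASCII, is always 1.
func columns(p EAW, eastAsianContext bool) int {
	switch p {
	case Fullwidth, Wide:
		return 2
	case Ambiguous:
		if eastAsianContext {
			return 2
		}
		return 1
	default: // Halfwidth, Narrow, Neutral
		return 1
	}
}

func main() {
	// U+27E8 and U+27E9 (the ⟨ ⟩ in |α⟩⟨α|) are classified Na.
	fmt.Println(columns(Narrow, false), columns(Narrow, true)) // 1 1
}
```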

Is width for EN DASH intended to be 2 instead of 1?

Hi,

Consider the following three similar unicode characters:

'-' - Unicode Character 'HYPHEN-MINUS' (U+002D)
'–' - Unicode Character 'EN DASH' (U+2013)
'—' - Unicode Character 'EM DASH' (U+2014)

From shurcooL/markdownfmt#7 (comment), I've learned that go-runewidth considers the width of the first character to be 1, and the width of the second and third characters to be 2.

Is that intended?

I'm not sure how to test this reliably, but in most environments it seems that EN DASH has width that's closer to 1 than 2.

Any thoughts on this?

Hello, I have a problem.

bash-3.2$ go get -u -d github.com/coreos/etcd/...
# cd .; git clone https://github.com/mattn/go-runewidth /Users/admin/go/src/github.com/mattn/go-runewidth
fatal: could not create work tree dir '/Users/admin/go/src/github.com/mattn/go-runewidth': Permission denied
package github.com/mattn/go-runewidth: exit status 128

Semantic Versioning: `ZeroWidthJoiner` Removal

ZeroWidthJoiner was removed after v0.0.9: https://github.com/mattn/go-runewidth/blob/v0.0.9/runewidth.go#L14

The next version was v0.0.10, but this introduced a breaking API change.

While being v0 means you can introduce breaking API changes, would it be possible to get a v1 release that can ensure API stability?

It's fine to keep cutting new versions when API changes happen, but right now this makes managing Go Module dependencies rather painful, since Go modules assume that patch versions don't introduce breaking changes.

Define width?

This is a question about how you define "width". I'm mostly looking for a solution that gives me the character width in monospaced fonts. For example, in #39 and #36, the "width" of a flag would still be 2: although modern renderers treat it as one character, it still takes up the space of two normal characters.

Regional Indicators (Flags) and Grapheme Clusters

Here's a short example that illustrates an issue with flags (or "regional indicators"):

fmt.Println(runewidth.StringWidth("🇩🇪")) // Should be "2", outputs "4".

The flag consists of two code points which are processed separately by runewidth. But most modern systems will combine them into one flag emoji.

This is part of a larger topic which I describe in more detail here: gdamore/tcell#264. It doesn't just affect flags but also characters in e.g. Arabic and Korean where there are more sophisticated rules than "combining characters" and zero-width joiners (which you added with #20).

I don't know exactly how you calculate the widths of characters. I'm also not sure how you would solve flags as well as some of the other rules described in the Unicode specification but it would sure be nice as printing these flags currently gives me trouble in tview. There have been multiple issues asking for better support for different languages and emojis so it seems that there are quite a few people who use the terminal with these characters.

(Maybe my new package uniseg can help you here.)
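The pairing rule for flags is simple enough to sketch without a full grapheme-cluster implementation: consecutive regional indicators combine two at a time. The function below is a hypothetical illustration, not go-runewidth's or uniseg's API; it assumes a flag renders two columns wide and every other rune one column. A complete solution would follow the grapheme-cluster rules of UAX #29, which is what uniseg implements.

```go
package main

import "fmt"

// isRegionalIndicator reports whether r is one of the 26 regional
// indicator symbols U+1F1E6..U+1F1FF.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// flagWidth pairs consecutive regional indicators into a single flag two
// columns wide. All other runes are assumed to be one column.
func flagWidth(s string) int {
	w := 0
	open := false // an unpaired regional indicator was just seen
	for _, r := range s {
		switch {
		case isRegionalIndicator(r) && open:
			open = false // second half of a flag: already counted
		case isRegionalIndicator(r):
			open = true
			w += 2
		default:
			open = false
			w++
		}
	}
	return w
}

func main() {
	fmt.Println(flagWidth("🇩🇪")) // 2
}
```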

make ngrok release-server error

src/github.com/mattn/go-runewidth/runewidth.go:7:2: found packages uniseg (doc.go) and main (gen_breaktest.go) in /root/ngrok/src/github.com/rivo/uniseg
make: *** [Makefile:8: deps] Error 1

License

Which license did you adopt for this product? Thanks.

Wrong width for flag symbols

runewidth.StringWidth("🇩🇰") returns 2.

I haven't looked into this at all, and I have no idea what I should expect, but a width of 1 seems reasonable.

incorrect rune width for box drawing characters in east asian encoding

When using an east asian encoding, the following runes are given a width of 2 but they should be 1: ─┌└┐┘│.

To reproduce:

export LC_CTYPE="ja_JP.UTF-8"
(in go program)
runewidth.RuneWidth('─') // returns 2

Looking at the runewidth_table.go file, the culprit is {0x24EB, 0x254B} in the ambiguous table. I'm not sure how to update this, since the file is auto-generated.
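The quoted entry suggests the generated tables are lists of inclusive code-point intervals, presumably searched per rune. The sketch below shows that kind of lookup with a binary search over a hand-written excerpt; the `interval` type, the `contains` method and the excerpt are illustrative assumptions, not copied from runewidth_table.go. It makes the report concrete: U+2500 (─) falls inside the {0x24EB, 0x254B} entry.

```go
package main

import (
	"fmt"
	"sort"
)

// interval is an inclusive range of code points.
type interval struct{ first, last rune }

// table is a sorted list of non-overlapping intervals.
type table []interval

// contains binary-searches for the first interval whose end is >= r,
// then checks that r is not before its start.
func (t table) contains(r rune) bool {
	i := sort.Search(len(t), func(i int) bool { return t[i].last >= r })
	return i < len(t) && t[i].first <= r
}

// ambiguousExcerpt is a hand-picked excerpt of East Asian Ambiguous ranges.
var ambiguousExcerpt = table{
	{0x00A1, 0x00A1}, // ¡
	{0x2013, 0x2016}, // en dash .. double vertical line
	{0x24EB, 0x254B}, // negative circled numbers .. box drawing
}

func main() {
	fmt.Println(ambiguousExcerpt.contains('─')) // true: U+2500
	fmt.Println(ambiguousExcerpt.contains('a')) // false
}
```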

In terminal apps that render box characters, this can lead to broken rendering (screenshot not preserved).

Let me know if there's anything else I can add. Thanks :)
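Whether a program sees an East Asian environment typically depends on locale environment variables, as in the `export LC_CTYPE="ja_JP.UTF-8"` reproduction. A minimal sketch of such a check, assuming POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and treating the ja, ko and zh language codes as East Asian; `eastAsianLocale` and `effectiveLocale` are hypothetical names, and go-runewidth's actual detection may differ in detail and by platform.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// eastAsianLocale reports whether a POSIX locale string such as
// "ja_JP.UTF-8" names a Chinese, Japanese or Korean locale.
func eastAsianLocale(locale string) bool {
	lang := strings.ToLower(locale)
	// Drop ".charset" and "@modifier" suffixes.
	if i := strings.IndexAny(lang, ".@"); i >= 0 {
		lang = lang[:i]
	}
	// Keep only the language part before "_TERRITORY".
	lang, _, _ = strings.Cut(lang, "_")
	switch lang {
	case "ja", "ko", "zh":
		return true
	}
	return false
}

// effectiveLocale returns the first non-empty of LC_ALL, LC_CTYPE, LANG.
func effectiveLocale() string {
	for _, k := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		if v := os.Getenv(k); v != "" {
			return v
		}
	}
	return ""
}

func main() {
	loc := effectiveLocale()
	fmt.Printf("locale=%q eastAsian=%v\n", loc, eastAsianLocale(loc))
}
```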

Rune width of certain CP437 chars like ♦ is 2 instead of 1

I'm trying to port an old DOS program using tcell (which uses RuneWidth). My program has a table mapping CP437 char code to rune, and then I print that rune to the screen. I'm in the terminal with fixed width fonts, so I expect all chars to be the same width.

The issue is RuneWidth('\u2666') and some other characters is returning width 2 instead of 1, which makes tcell allocate 2 chars for it and causes "gaps" in the rendering. Here's playground code showing which chars do this: https://play.golang.org/p/Hjq3GOC0Pcd -- output is:

RuneWidth('☺') = 2
RuneWidth('☻') = 2
RuneWidth('♥') = 2
RuneWidth('♦') = 2
RuneWidth('♣') = 2
RuneWidth('♠') = 2
RuneWidth('♂') = 2
RuneWidth('♀') = 2
RuneWidth('♪') = 2
RuneWidth('♫') = 2
RuneWidth('☼') = 2
RuneWidth('↕') = 2
RuneWidth('‼') = 2
RuneWidth('↔') = 2

I believe it's happening because these are treated as Emoji characters. Is this behavior expected? If so, how do I work around this in tcell?
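Whatever the library reports, a caller that knows its output device (here, a CP437 port in a terminal with a fixed-width font) can layer its own overrides on top. A sketch, assuming a caller-side wrapper; `widthWithOverrides`, `cp437Symbols` and the toy `base` function are hypothetical, and how the result would be wired into tcell is not shown.

```go
package main

import "fmt"

// widthWithOverrides consults overrides first and falls back to base.
func widthWithOverrides(r rune, overrides map[rune]int, base func(rune) int) int {
	if w, ok := overrides[r]; ok {
		return w
	}
	return base(r)
}

// cp437Symbols forces single-column width for the CP437 glyphs listed in
// the issue, which this program's terminal font draws one column wide.
var cp437Symbols = map[rune]int{
	'☺': 1, '☻': 1, '♥': 1, '♦': 1, '♣': 1, '♠': 1, '♂': 1,
	'♀': 1, '♪': 1, '♫': 1, '☼': 1, '↕': 1, '‼': 1, '↔': 1,
}

func main() {
	// Toy stand-in for a library width function that reports 2 here.
	base := func(r rune) int {
		if r >= 0x2190 && r <= 0x26FF {
			return 2
		}
		return 1
	}
	fmt.Println(widthWithOverrides('♦', cp437Symbols, base)) // 1
	fmt.Println(widthWithOverrides('♦', nil, base))          // 2
}
```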

Variation Selectors 1 - 256 Report Width = 1

Variation Selectors 1-256 (Unicode ranges 0xFE00-0xFE0F and 0xE0100-0xE01EF) report as width = 1. These are nonprintable characters and should report width 0. I think it would make sense to add them to the nonprint table. I can submit a PR if that sounds good.
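The proposed change amounts to a range check placed before the normal width lookup. A sketch, assuming the two ranges quoted in the issue and a toy base width of 1 for every other rune; `isVariationSelector` and `vsAwareWidth` are hypothetical names.

```go
package main

import "fmt"

// isVariationSelector reports whether r is one of VS1..VS16
// (U+FE00..U+FE0F) or VS17..VS256 (U+E0100..U+E01EF).
func isVariationSelector(r rune) bool {
	return (r >= 0xFE00 && r <= 0xFE0F) || (r >= 0xE0100 && r <= 0xE01EF)
}

// vsAwareWidth counts one column per rune but none for variation selectors.
func vsAwareWidth(s string) int {
	w := 0
	for _, r := range s {
		if isVariationSelector(r) {
			continue // zero width
		}
		w++
	}
	return w
}

func main() {
	fmt.Println(vsAwareWidth("x\uFE0E")) // 1
}
```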

The width of Box-drawing characters

Check this for the definition of box-drawing (BD below) characters.

I found that these characters are defined as ambiguous width, so passing them to RuneWidth returns 2 in my environment. This is somewhat inconvenient since, AFAIK, terminal fonts tend to render BD characters at half width.

Is it possible to remove these characters from the ambiguous table? I can make the PR if you think this sounds sane.

Thanks.

`─` width not as expected

func main() {
	b := `─` // unicode 0x2500
	fmt.Println(runewidth.StringWidth(b))
}

On Windows/macOS this prints 2.
On Linux it prints 1.

The go1 tag is out of date; can you update or remove it?

Hi,

Currently, your go1 tag points to commit ce86f93. So when someone does go get -u github.com/mattn/go-runewidth, it will check out that revision.

However, you have newer commits that add Truncate and fix bugs on master, that are not available:

go1...master

Can you either update go1 tag to point to latest stable version (I'm guessing 39104c7), or simpler yet, remove it and let master be the latest go gettable version. You can use feature branches for development and merge them into master when they're ready.

I'm guessing this was an unintended situation, but please let me know if that's not the case. Thanks.

Wrong width reported for some characters

It appears that StringWidth reports the width of certain strings incorrectly. The problem seems to center on languages used primarily in India (Tamil, Telugu, and Hindi are examples).

Sample program that shows the problem:

package main

import (
	"fmt"
	"github.com/mattn/go-runewidth"
	"strings"
)

func main() {
	words := []string{
		"English",
		"हिन्द",
		"தமிழ்",
		"ไทย",
		"עברית",
	}

	for _, w := range words {
		max := 12 - runewidth.StringWidth(w)
		fmt.Printf("|%s%s|\n", w, strings.Repeat(" ", max))
	}
}

The output shows the misalignment in the 2nd and 3rd rows (sorry, but pasting it here won't work, since GitHub seems to force "Liberation Mono" as the monospace font and this font appears to have its own issues). I've tried this in terminals, browsers, etc., always with similar results.
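Part of the misalignment in scripts such as Devanagari and Tamil comes from combining marks, which attach to the preceding letter instead of taking a column of their own. Go's standard `unicode` package exposes the relevant general categories, so a rough sketch is to give nonspacing (Mn) and enclosing (Me) marks zero width and every other rune one column. This is a simplification that may still differ from what a given terminal draws, and `markWidth` is a hypothetical helper, not go-runewidth's algorithm.

```go
package main

import (
	"fmt"
	"unicode"
)

// markWidth gives zero columns to nonspacing (Mn) and enclosing (Me)
// marks and one column to everything else.
func markWidth(s string) int {
	w := 0
	for _, r := range s {
		if unicode.In(r, unicode.Mn, unicode.Me) {
			continue
		}
		w++
	}
	return w
}

func main() {
	for _, s := range []string{"English", "e\u0301", "हिन्द", "தமிழ்"} {
		fmt.Printf("%q: runes=%d markWidth=%d\n", s, len([]rune(s)), markWidth(s))
	}
}
```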

Width is 1 when it should be 2

I stumbled over a character that, when output to the console directly, takes up two characters. But StringWidth() gives me 1. This is because the first rune of this character has a width of 1 and that's what's being used, see here. I know I wrote this code and I'm sure that you cannot simply add up the widths of individual runes ("🏳️‍🌈" would then have a width of 4 which is obviously wrong) and using the first rune's width worked fine so far. But it turns out that it fails in some cases.

I'm not familiar with Indian characters but it seems to me that the second rune is a modifier that turns the character from a width of 1 into a width of 2. Are you aware of any logic that we could add to go-runewidth that makes this right?

Here's example code that illustrates the issue:

package main

import (
	"fmt"

	runewidth "github.com/mattn/go-runewidth"
)

func main() {
	s := "खा"
	fmt.Println("0123456789")
	fmt.Println(s + "<")
	fmt.Printf("String width: %d\n", runewidth.StringWidth(s))
	var i int
	for _, r := range s {
		fmt.Printf("Rune %s  (%d) width: %d\n", string(r), i, runewidth.RuneWidth(r))
		i++
	}
}

Output (on macOS with iTerm2): screenshot not preserved.
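One possible logic, sketched here purely as an assumption, is to group each base rune with the marks that follow it and let spacing combining marks (general category Mc, such as U+093E DEVANAGARI VOWEL SIGN AA in खा) add a column, while nonspacing and enclosing marks (Mn, Me) add none. Unlike the first-rune strategy described in the issue, this reports 2 for खा. `clusters` and `clusterWidth` are hypothetical names; real grapheme segmentation (UAX #29) has many more rules.

```go
package main

import (
	"fmt"
	"unicode"
)

// clusters splits s into a base rune followed by any combining marks
// (categories Mn, Mc, Me): a rough approximation of grapheme clusters.
func clusters(s string) [][]rune {
	var out [][]rune
	for _, r := range s {
		if unicode.Is(unicode.M, r) && len(out) > 0 {
			last := len(out) - 1
			out[last] = append(out[last], r)
			continue
		}
		out = append(out, []rune{r})
	}
	return out
}

// clusterWidth counts one column for the base and one more for each
// spacing mark (Mc); nonspacing and enclosing marks add nothing.
func clusterWidth(c []rune) int {
	w := 1
	for _, r := range c[1:] {
		if unicode.Is(unicode.Mc, r) {
			w++
		}
	}
	return w
}

func main() {
	for _, c := range clusters("खा") {
		fmt.Printf("%q width=%d\n", string(c), clusterWidth(c))
	}
}
```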

Broken benchmark tests

c9bd7d1 and 43a826d broke benchmark tests

$ go test -bench . -benchmem
--- FAIL: BenchmarkRuneWidthAll
    benchmark_test.go:27: got 1293942, want 1293932
goos: linux
goarch: amd64
pkg: github.com/mattn/go-runewidth
cpu: 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz
BenchmarkRuneWidth768-4                   650364              1877 ns/op               0 B/op          0 allocs/op
--- FAIL: BenchmarkRuneWidthAllEastAsian
    benchmark_test.go:27: got 1432568, want 1432558
BenchmarkRuneWidth768EastAsian-4           85194             14217 ns/op               0 B/op          0 allocs/op
--- FAIL: BenchmarkString1WidthAll
    benchmark_test.go:62: got 1295990, want 1295980
BenchmarkString1Width768-4                  9513            125876 ns/op           86016 B/op       3072 allocs/op
--- FAIL: BenchmarkString1WidthAllEastAsian
    benchmark_test.go:62: got 1436664, want 1436654
BenchmarkString1Width768EastAsian-4         8168            142574 ns/op           86016 B/op       3072 allocs/op
BenchmarkTablePrivate-4                      656           1798150 ns/op               0 B/op          0 allocs/op
BenchmarkTableNonprint-4                     402           2982255 ns/op               0 B/op          0 allocs/op
BenchmarkTableCombining-4                    264           4511447 ns/op               0 B/op          0 allocs/op
BenchmarkTableDoublewidth-4                  222           5379437 ns/op               0 B/op          0 allocs/op
BenchmarkTableAmbiguous-4                    183           6475643 ns/op               0 B/op          0 allocs/op
BenchmarkTableEmoji-4                        222           5272836 ns/op               0 B/op          0 allocs/op
BenchmarkTableNarrow-4                       522           2255628 ns/op               0 B/op          0 allocs/op
BenchmarkTableNeutral-4                      144           8281886 ns/op               0 B/op          0 allocs/op
FAIL
exit status 1
FAIL    github.com/mattn/go-runewidth   19.880s

possible regression ?

Hi,

Updating go-runewidth from v0.0.4 to v0.0.5 broke my tests in https://github.com/MichaelMure/go-term-text. go-term-text is a package that formats text for the terminal, relying on go-runewidth to get the character width.

Here is an example of the output before and after the update (screenshots not preserved).

Notice that after switching to v0.0.5, the text goes further than it should. As the algorithm remains unchanged, I suspect go-runewidth returns a different width. Is that possible? If so, why?

go-app-builder fails on windows because of "syscall" in runewidth_windows.go

I'm using the package github.com/jhillyerd/go.enmime from an AppEngine classic project where the syscall package is not available.

go.enmime in turn imports github.com/mattn/go-runewidth.

Unfortunately running the project from dev_appserver.py on windows results in:

go-app-builder: Failed parsing input: parser: bad import "syscall" in github.com\mattn\go-runewidth\runewidth.go from GOPATH

I had to change the file runewidth_windows.go to the following to make the project build:

package runewidth

import (
	//"syscall"
)

var (
	//kernel32               = syscall.NewLazyDLL("kernel32")
	//procGetConsoleOutputCP = kernel32.NewProc("GetConsoleOutputCP")
)

// IsEastAsian return true if the current locale is CJK
func IsEastAsian() bool {
	return false
}
