mahonia's People

Contributors

axgle, haoqis

mahonia's Issues

Converting UTF-8 to GBK and then back again does not restore the original string

package main

import (
	"fmt"

	"github.com/axgle/mahonia"
)

func main() {
	str := "你好"
	fmt.Println("UTF-8 to GBK: ", ConvertToString(str, "utf8", "gbk"))

	// data := ConvertToString(str, "utf8", "gbk")
	fmt.Println("GBK to UTF-8: ", ConvertToString(ConvertToString(str, "utf8", "gbk"), "gbk", "utf8"))
}

func ConvertToString(src string, srcCode string, tagCode string) string {
	srcCoder := mahonia.NewDecoder(srcCode)
	srcResult := srcCoder.ConvertString(src)
	tagCoder := mahonia.NewDecoder(tagCode)
	_, cdata, _ := tagCoder.Translate([]byte(srcResult), true)
	result := string(cdata)
	return result
}

decoder bug

The shiftjis decoder incorrectly converts '' to '¥'.

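Conversion fails when the string to convert contains English punctuation

```
e.DOM.Find("p").Each(func(i int, s *goquery.Selection) {
	text := s.Text()
	result := mahonia.NewDecoder("gbk").ConvertString(text)
	fmt.Println(result)
})
```
This is a snippet from a scraper; text holds a GBK-encoded string.
I found that whenever text contains English double quotes (“”), the content inside the quotes is not converted.
The output looks something like this:
```
我是正常的中文鈥満焐氖谴竺ā⒙躺氖切
```
The garbled part at the end is the text that was inside the English double quotes.
However, if I print the whole HTML page, including the div, li, and other tags, the conversion works correctly.
The code looks something like this:
```
c.OnHTML("#ArtContent", func(e *colly.HTMLElement) {
	result := mahonia.NewDecoder("gbk").ConvertString(string(e.Response.Body))
	fmt.Println(result)
})
```
Here result is fully converted to Chinese, with no garbled text.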

Import of code.google.com/p/mahonia

The file "mahoniconv/mahoniconv.go" imports "code.google.com/p/mahonia". As code.google.com is shutting down, this should be changed to github.com/axgle/mahonia.

A problem when converting jis-string to utf-8

In this call
mahonia.NewDecoder("shift-jis").ConvertString(string(s))

If string(s) contains the character "", it is converted to "¥" (full-width), which may lead to an error when creating a file.

The following code avoids this problem without changing the original package:
strings.Replace(mahonia.NewDecoder("shift-jis").ConvertString(string(s)), "¥", "", -1)
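
The same workaround can be wrapped in a small post-processing helper that runs on the decoder's output. This is only a sketch: replaceFullwidthYen is a made-up name, and the replacement text is left as a parameter because the character the report intends is not visible in the line above.

```go
package main

import (
	"fmt"
	"strings"
)

// fullwidthYen is U+FFE5, the full-width yen sign that the report
// says appears in the decoded output.
const fullwidthYen = "\uFFE5"

// replaceFullwidthYen is a hypothetical helper: it swaps every
// full-width yen sign in an already decoded string for replacement.
func replaceFullwidthYen(decoded, replacement string) string {
	return strings.ReplaceAll(decoded, fullwidthYen, replacement)
}

func main() {
	// Stand-in for the result of
	// mahonia.NewDecoder("shift-jis").ConvertString(string(s)).
	decoded := "dir" + fullwidthYen + "file.txt"

	// "_" is an arbitrary replacement chosen for this example.
	fmt.Println(replaceFullwidthYen(decoded, "_"))
}
```

Note that this also rewrites any full-width yen sign that was genuinely present in the source text, so it only suits inputs where that character is not expected.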
