roytam1 / rtoss
Automatically exported from code.google.com/p/rtoss
It will still need a 64-bit build to open files larger than 4 GB, because it uses memory mapping to open files.
For now such a file opens in the 64-bit build, but only (filesize % 4GB) bytes are displayed; as a result, many places need to be changed for 64-bit.
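The (filesize % 4GB) symptom is what one would expect if a 64-bit file size passes through a 32-bit unsigned variable somewhere. The sketch below only illustrates that arithmetic; the helper name `truncated_size` is hypothetical, and the actual cause in GreenPad is an assumption, not something confirmed here.

```python
FOUR_GIB = 1 << 32  # 4 GiB, the range of a 32-bit unsigned integer


def truncated_size(file_size: int) -> int:
    """Size that survives when a 64-bit size is stored in 32 unsigned bits."""
    # Keeping only the low 32 bits is the same as file_size % 4 GiB.
    return file_size & 0xFFFFFFFF


size = 5 * 1024**3 + 123          # a file of 5 GiB plus 123 bytes
visible = truncated_size(size)    # what a 32-bit length field would hold
print(visible == size % FOUR_GIB)
print(visible)
```

Under that assumption, every length, offset, and cursor position derived from the file size would need a 64-bit type, which matches the remark that many places need to change.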
What steps will reproduce the problem?
1. Open any text file encoded in GB18030 with 4-byte char sequences.
2. Auto detect file encoding.
What is the expected output? What do you see instead?
It is expected to decode the file as GB18030 or GBK.
GreenPad decodes the file as Shift-JIS or Turkish.
What version of the product are you using? On what operating system?
SVN revision 166. XP SP3.
Please provide any additional information below.
I have made local modifications to use CP54936 instead of CP936 for Chinese
text, and implemented CharNext and detection for GB18030, as the M$ Windoze API
sucks. Here is the patch attached.
It seems to work in my test cases, but I hope someone could conduct more
thorough tests to make it robust enough for merging into mainstream.
However, the patch has one known problem: the 1-byte Euro sign (0x80) in CP936
no longer works in CP54936. Maybe it would be a better solution to separate the GBK
and GB18030 handling routines.
Original issue reported on code.google.com by [email protected]
on 1 Jan 2011 at 6:37
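The attached patch is not reproduced here. As a rough illustration of what a CharNext-style step for GB18030 has to do, the sketch below decides whether the character at a given offset is 1, 2, or 4 bytes long, using the byte ranges of the GB18030 encoding. The names `gb18030_char_len` and `split_chars` are hypothetical, and stepping over 0x80 as a single byte is an assumption made here so that a CP936-style Euro sign does not swallow the next byte.

```python
def gb18030_char_len(data: bytes, i: int) -> int:
    """Length in bytes of the GB18030 character starting at data[i]."""
    lead = data[i]
    # 0x00-0x7F is ASCII; 0x80 and 0xFF are not lead bytes, so step one byte.
    if lead < 0x81 or lead > 0xFE:
        return 1
    if i + 1 >= len(data):
        return 1  # truncated input: step one byte
    second = data[i + 1]
    # Four-byte form: lead, digit 0x30-0x39, 0x81-0xFE, digit 0x30-0x39.
    if 0x30 <= second <= 0x39:
        if (i + 3 < len(data)
                and 0x81 <= data[i + 2] <= 0xFE
                and 0x30 <= data[i + 3] <= 0x39):
            return 4
        return 1  # malformed four-byte sequence: step one byte
    # Two-byte form: trail byte 0x40-0x7E or 0x80-0xFE.
    if 0x40 <= second <= 0xFE and second != 0x7F:
        return 2
    return 1


def split_chars(data: bytes) -> list[bytes]:
    """Split a byte string into per-character slices."""
    out = []
    i = 0
    while i < len(data):
        n = gb18030_char_len(data, i)
        out.append(data[i:i + n])
        i += n
    return out


text = "a中\U0001F600"  # ASCII, a 2-byte character, a 4-byte character
pieces = split_chars(text.encode("gb18030"))
print([len(p) for p in pieces])  # [1, 2, 4]
```

The second byte is what separates the two multi-byte forms: a digit (0x30-0x39) can only occur there in a four-byte sequence, which is also a usable signal for telling GB18030 text apart from plain GBK during detection.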
Try with this random gibberish I typed in Windows-1252
It is now auto-detected as UTF-8 by the latest chardet.dll (when I build it myself or I take it from your latest GreenPad-1.08.3 build).
However it contains invalid sequences:
Actually, this sequence alone is detected as UTF-8:
é"èr
In cp1252 that is: E9 22 E8 72
which is an invalid UTF-8 sequence. E9
starts a 3-byte sequence, so the following byte should be of the form 10xxxxxx, and 22 is not.
E8 cannot appear there either.
Maybe some invalid sequences could be tolerated but they should be a minority.
If I use an older chardet.dll version I do not have the problem.
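To make the argument about E9 22 E8 72 concrete, here is a minimal structural check of UTF-8 lead and continuation bytes. It is a sketch, not what chardet.dll does internally: the function names are hypothetical, and it deliberately skips finer rules such as overlong 3-byte forms and surrogates.

```python
def utf8_sequence_length(lead: int) -> int:
    """Sequence length announced by a lead byte, or 0 if it cannot lead."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (C0 and C1 would be overlong)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, limited to U+10FFFF
    return 0      # continuation byte or out-of-range lead


def first_invalid_offset(data: bytes) -> int:
    """Offset of the first structurally invalid sequence, or -1 if none."""
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if n == 0:
            return i
        # Every byte after the lead must look like 10xxxxxx.
        for k in range(1, n):
            if i + k >= len(data) or (data[i + k] & 0xC0) != 0x80:
                return i
        i += n
    return -1


sample = bytes([0xE9, 0x22, 0xE8, 0x72])   # é"èr as typed in cp1252
print(first_invalid_offset(sample))         # 0: E9 is not followed by 10xxxxxx
print(first_invalid_offset('é"èr'.encode("utf-8")))  # -1: real UTF-8 passes
```

A detector that wanted to tolerate a few bad sequences could count failures from a check like this and reject UTF-8 once they stop being a small minority of the multi-byte sequences seen.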