
ude's People

Contributors

errepi


ude's Issues

pureascii detection issue

What steps will reproduce the problem?
1. Create a text file containing only the character "3".
2. Save it and run detection.
3. Notice that it reports "detection failed".

What is the expected output? What do you see instead?
I expected it to report the file as ASCII. (This happens with any file that contains the digit 3.)

What version of the product are you using? On what operating system?
Latest version, on Windows XP.

Please provide any additional information below.

I noticed that the code checks for EscAscii (escape) characters by comparing against 0x33 instead of 0x1B. 0x33 is the digit "3", not the escape character; ESC is 0x1B. I'm not sure whether the same mistake occurs anywhere else in the code.
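For illustration, a minimal C# sketch of the intended check. EscapeScan and HasEscapeByte are hypothetical names, not Ude's actual code. It also shows one plausible origin of the wrong constant, offered only as a guess: if the port was translated from C source that writes ESC as the octal escape '\033', reading those digits as hexadecimal gives 0x33, while octal 33 is really 0x1B (decimal 27).

```csharp
using System;

// Hypothetical helper, not Ude's actual code: decides whether a buffer
// contains the escape byte that should move a detector out of the
// pure-ASCII state.
public static class EscapeScan
{
    // ESC is 0x1B (decimal 27, octal 033). 0x33 is the ASCII digit '3'.
    public const byte Esc = 0x1B;

    public static bool HasEscapeByte(byte[] buffer)
    {
        foreach (byte b in buffer)
        {
            // Comparing against 0x33 here would flag every '3' in the input.
            if (b == Esc)
                return true;
        }
        return false;
    }
}
```

With this check, a file containing only "3" (the single byte 0x33) stays pure ASCII, while a buffer starting with 0x1B 0x24 0x42 (ESC $ B, the ISO-2022-JP switch to JIS X 0208) is flagged.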

Original issue reported on code.google.com by rbhatt%[email protected] on 2 Dec 2009 at 5:29

Bug in the SBCSGroupProber class, in the Reset function

http://ude.googlecode.com/svn/trunk/src/Library/Ude.Core/SBCSGroupProber.cs

Existing code:

public override void Reset ()
{
    int activeNum = 0;
...

It should be:

public override void Reset ()
{
    activeNum = 0;
...

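In many cases this bug makes detection of the right charset fail. Because Reset declares a local variable, the class member activeNum is never initialized and is always 0 when probing starts. In the following piece of code, the first prober that returns NotMe therefore decrements activeNum to -1, and the whole group is immediately marked NotMe: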
} else if (st == ProbingState.NotMe) {
   isActive[i] = false;
   activeNum--;
   if (activeNum <= 0) {
      state = ProbingState.NotMe;
      break;
   }
}
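The effect can be reproduced with a simplified C# model of the group prober's bookkeeping. GroupProberModel, ResetBuggy, ResetFixed, and OnProberNotMe are hypothetical names; the real SBCSGroupProber also holds the probers themselves and more state. OnProberNotMe follows the NotMe branch quoted above.

```csharp
using System;

// Simplified, hypothetical model of a group prober's active-prober count.
public class GroupProberModel
{
    private readonly bool[] isActive;
    private int activeNum;   // class member read by OnProberNotMe
    public bool GroupIsNotMe { get; private set; }

    public GroupProberModel(int proberCount) => isActive = new bool[proberCount];

    // Mirrors the reported bug: the local declaration shadows the member,
    // so the member keeps its default value of 0.
    public void ResetBuggy()
    {
        int activeNum = 0;
        GroupIsNotMe = false;
        for (int i = 0; i < isActive.Length; i++) { isActive[i] = true; activeNum++; }
    }

    // The proposed fix: assign the member instead of declaring a local.
    public void ResetFixed()
    {
        activeNum = 0;
        GroupIsNotMe = false;
        for (int i = 0; i < isActive.Length; i++) { isActive[i] = true; activeNum++; }
    }

    // Same logic as the NotMe branch: deactivate the prober and give up
    // on the whole group once no active probers remain.
    public void OnProberNotMe(int i)
    {
        if (!isActive[i]) return;
        isActive[i] = false;
        activeNum--;
        if (activeNum <= 0) GroupIsNotMe = true;
    }
}
```

With three probers, the buggy model declares the whole group NotMe after the first prober rejects the input, while the fixed model only does so after all three have rejected it.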

I fixed it locally, but I am reporting it so that other developers don't spend as much time debugging the same issue.

The attached file reproduces the bug (its charset is KOI8-R).

Original issue reported on code.google.com by [email protected] on 25 May 2012 at 6:40

Attachments:

Detection fails on a particular, simple ANSI file

What steps will reproduce the problem?
1. Save an ANSI file containing the text "CONFIG: main 30000000"
2. Run the library and/or exe on it

What is the expected output? What do you see instead?

I expect ANSI to be detected.

What version of the product are you using? On what operating system?

The library returns null for the charset, and the exe reports "detection failed".

Please provide any additional information below.

I don't know whether this is how the library is intended to work, but I think it would be more useful to report ANSI when all the characters fit into ANSI, or at least to support this behavior as an option.
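A sketch of the optional behavior requested, assuming it is done as a post-processing step outside Ude: if the detector returns null and every byte is below 0x80, the data is valid 7-bit ASCII, and therefore also valid windows-1252 ("ANSI"), so either label is safe. DetectionFallback, IsSevenBit, and Resolve are hypothetical names; detectedCharset stands for whatever charset value the library reported.

```csharp
using System;

// Hypothetical post-processing step, not part of Ude.
public static class DetectionFallback
{
    // True when every byte is in the 7-bit ASCII range.
    public static bool IsSevenBit(byte[] data)
    {
        foreach (byte b in data)
            if (b >= 0x80) return false;
        return true;
    }

    // Keeps the detector's answer when there is one; otherwise falls back
    // to "ASCII" for pure 7-bit data, and to null (still unknown) for anything else.
    public static string Resolve(string detectedCharset, byte[] data)
    {
        if (detectedCharset != null) return detectedCharset;
        return IsSevenBit(data) ? "ASCII" : null;
    }
}
```

For the text "CONFIG: main 30000000" every byte is below 0x80, so Resolve(null, bytes) returns "ASCII". Keeping the fallback outside the detector leaves its own confidence logic untouched, which matches the request to make this behavior optional.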

Original issue reported on code.google.com by [email protected] on 14 Sep 2014 at 4:59

UTF-16 without BOM not detected correctly

What steps will reproduce the problem?
1. Create a text file encoded as UTF-16 little endian.
2. Remove the BOM from the file in a hex editor. Yes, this deliberately modifies the file to cause the problem, but I have encountered many UTF-16 files without a BOM produced by other applications, and a missing BOM does not make a file invalid.
3. Run Ude.Example, passing the path to this BOM-less UTF-16LE file.
4. When UniversalDetector is called, its first check is for a BOM.
5. Because there is no BOM, evaluation passes to the deeper analysis, which returns encoding = ANSI 1252. That result is wrong.
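Recognizing BOM-less UTF-16 needs a heuristic beyond the BOM check in step 4. One common approach, sketched below in C# as an assumption rather than Ude's method, counts zero bytes at even and odd offsets: for characters from U+0000 to U+00FF, UTF-16LE stores the zero high byte second and UTF-16BE stores it first. Utf16Sniffer and Guess are hypothetical names, and the 0.4 threshold is an arbitrary choice. Text that is mostly outside that range (Cyrillic, CJK) produces few zero bytes, so the heuristic mainly helps with Latin-script content.

```csharp
using System;

// Hypothetical heuristic, not part of Ude: guesses the byte order of UTF-16
// input that may lack a BOM by counting zero bytes at even and odd offsets.
public static class Utf16Sniffer
{
    // Returns "UTF-16LE", "UTF-16BE", or null when the data does not look like UTF-16.
    public static string Guess(byte[] data, double threshold = 0.4)
    {
        // A BOM, when present, decides the question. (FF FE is also the start of
        // the UTF-32LE BOM FF FE 00 00; this sketch ignores UTF-32.)
        if (data.Length >= 2)
        {
            if (data[0] == 0xFF && data[1] == 0xFE) return "UTF-16LE";
            if (data[0] == 0xFE && data[1] == 0xFF) return "UTF-16BE";
        }

        int pairs = data.Length / 2;
        if (pairs == 0) return null;

        int evenZeros = 0, oddZeros = 0;
        for (int i = 0; i + 1 < data.Length; i += 2)
        {
            if (data[i] == 0) evenZeros++;
            if (data[i + 1] == 0) oddZeros++;
        }

        double oddRatio = (double)oddZeros / pairs;
        double evenRatio = (double)evenZeros / pairs;
        if (oddRatio >= threshold && oddRatio > evenRatio) return "UTF-16LE";
        if (evenRatio >= threshold && evenRatio > oddRatio) return "UTF-16BE";
        return null;
    }
}
```

Plain ASCII text contains no zero bytes, so Guess returns null for it, and the caller can continue with the regular single-byte and multi-byte analysis.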

What is the expected output? 

Expected output is encoding = "UTF-16"

What do you see instead?

"Charset: ASCII, confidence: 1"


What version of the product are you using? On what operating system?

Ude C# port with all current code changes applied
Windows 7 Ultimate SP1 64-bit

Please provide any additional information below.

Larger files (1000 KB+) without a BOM tend to produce "Charset: windows-1252, confidence: 0.5".

Original issue reported on code.google.com by [email protected] on 17 Sep 2012 at 10:52

EUCTW: System.IndexOutOfRangeException

The problem is in the CharDistributionAnalyser.HandleOneChar call used for EUCTW detection.

The charToFreqOrder array has 5376 entries, but tableSize is defined as 8102, so this check is wrong:
if (order < tableSize) <--
 { // order is valid
   if (512 > charToFreqOrder[order])
     freqChars++;
 }

I took a look at the Java code, where this part has been changed to:

if (order < charToFreqOrder.Length)
{ // order is valid
  if (512 > charToFreqOrder[order])
    freqChars++;
}

With this change tableSize is no longer needed, and this exception can no longer occur here.
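A simplified C# sketch of the corrected counting step, bounding the lookup by the array itself as the Java version does. FreqCounter is a hypothetical stand-in for CharDistributionAnalyser; the real class also computes order from the input bytes and keeps further state. As in the snippet, an entry below 512 marks a frequent character.

```csharp
using System;

// Simplified, hypothetical model of the frequency-counting step.
public class FreqCounter
{
    private readonly int[] charToFreqOrder;
    public int TotalChars { get; private set; }
    public int FreqChars { get; private set; }

    public FreqCounter(int[] charToFreqOrder) => this.charToFreqOrder = charToFreqOrder;

    public void HandleOneChar(int order)
    {
        if (order < 0) return;       // not a character this analyser tracks
        TotalChars++;
        // Bound the lookup by the array's real length, not by a separate
        // tableSize constant that can disagree with it.
        if (order < charToFreqOrder.Length && charToFreqOrder[order] < 512)
            FreqChars++;
    }
}
```

With a 5376-entry table, an order of 6000 passes the old order < tableSize check (8102) and indexes past the end of the array; in this version it is still counted in TotalChars but skipped for FreqChars.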


Original issue reported on code.google.com by [email protected] on 16 Nov 2009 at 3:12

Returns UTF-8 for Cyrillic text

What steps will reproduce the problem?
1. Define the Cyrillic text "Это пример кириллического текста" ("This is an example of Cyrillic text").
2. Feed a stream containing this text to CharsetDetector.
3. Result charset is "UTF-8" with Confidence 1.0

What is the expected output? 
The charset should be koi-8.

What do you see instead?
UTF-8

What version of the product are you using? 
Ude, C# port 

On what operating system?
Windows 7/8, x64
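CharsetDetector works on bytes, not characters, so the result depends on how the string was turned into a stream, which the report does not say. Assuming it was written with a default StreamWriter (whose default encoding is UTF-8 without a BOM), the stream holds UTF-8: each Cyrillic letter in this sentence becomes two bytes starting with 0xD0 or 0xD1, and "UTF-8" with confidence 1.0 would then be the correct answer. In KOI8-R each Cyrillic letter is a single byte, so a KOI8 result can only be expected when the stream actually contains KOI8-R bytes. CyrillicBytes and ViaStreamWriter are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

// Shows which bytes a detector would receive if a .NET string is written to
// a stream with a default StreamWriter. Assumption: the reporter built the
// stream this way; the issue does not say.
public static class CyrillicBytes
{
    public static byte[] ViaStreamWriter(string text)
    {
        using var ms = new MemoryStream();
        using (var writer = new StreamWriter(ms))   // default encoding: UTF-8, no BOM
        {
            writer.Write(text);
        }                                           // disposing flushes the writer
        return ms.ToArray();                        // ToArray also works on a closed MemoryStream
    }
}
```

For example, the first word "Это" comes out as the six bytes D0 AD D1 82 D0 BE, which decode back to the same text as UTF-8.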

Original issue reported on code.google.com by [email protected] on 8 Dec 2012 at 8:19

Cannot find .sln for Windows usage

What steps will reproduce the problem?
1. Download the tarball
2. Extract
3. Look for .sln

What is the expected output? What do you see instead?
It should be there somewhere, but it's not.

What version of the product are you using? On what operating system?
0.1, Windows XP

Please provide any additional information below.
Is there a workaround? Should I just build my own solution from the source?

Original issue reported on code.google.com by [email protected] on 13 Jul 2009 at 11:53
