
Comments (5)

brilee commented on September 3, 2024

Can you give me an example of the sgf file that it's running into issues on?

I suspect it's an sgf file that violates the standards, so having the file itself would be useful to be able to reproduce and verify the fix.

from mugo.

greatken999 commented on September 3, 2024

Most SGF files in China use the GB18030 encoding, so I changed line 48 of load_data_sets.py (sample files attached: wqy00.zip) from

    with open(file) as f:

to

    with open(file, 'rt', encoding='gb18030', errors='ignore') as f:

to fix this bug 👍:
366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
  File "main.py", line 94, in <module>
    argh.dispatch(parser)
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "main.py", line 49, in preprocess
    test_chunk, training_chunks = parse_data_sets(*data_sets)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 140, in parse_data_sets
    test_chunk, training_chunks = split_test_training(positions_w_context, est_num_positions)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 60, in split_test_training
    positions_w_context = list(positions_w_context)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 52, in get_positions_from_sgf
    for position_w_context in replay_sgf(f.read()):
  File "/usr/lib64/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 5: invalid continuation byte
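The same error can be reproduced without MuGo. This is a minimal sketch that assumes the file contains the character 吴 ("Wu", as in the player names these old records use) encoded in GB18030; that encoding starts with the byte 0xCE, which UTF-8 reads as the lead byte of a two-byte sequence and then rejects the next byte as an invalid continuation. The sample text is hypothetical, not taken from the attached files.

```python
# Hypothetical sample text: a GB18030-encoded Chinese name as it might appear in an SGF.
raw = "吴".encode("gb18030")

try:
    raw.decode("utf-8")  # what open(file) does by default on most Linux systems
    utf8_failed = False
except UnicodeDecodeError as err:
    utf8_failed = True
    print("utf-8:", err)

# Decoding with the codec the file was actually written in succeeds.
print("gb18030:", raw.decode("gb18030"))
```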


brilee commented on September 3, 2024

Oh.. ugh, this makes me sad.
So, the SGF file should declare that its encoding is GB18030; I can't just assume it. Most western-generated SGFs assume UTF-8, so putting in this new assumption would just break the other half of SGFs.
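In FF[4] SGF the declaration brilee means is the root-node CA (charset) property, whose default per the SGF spec is ISO-8859-1. Below is a minimal sketch of reading it from the raw bytes before decoding. `declared_charset` is a hypothetical helper, not part of MuGo, and the regex is a simplification: it will also match a literal "CA[...]" written inside comment text.

```python
import codecs
import re

# SGF property identifiers are uppercase ASCII, so CA[...] can be located in the
# raw bytes before any decoding. The lookbehind avoids matching the tail of a
# longer identifier; text inside C[...] comments is NOT excluded (sketch only).
_CA_PATTERN = re.compile(rb"(?<![A-Z])CA\[([^\]]*)\]")


def declared_charset(raw):
    """Return the normalised codec name from the CA property, or None."""
    match = _CA_PATTERN.search(raw)
    if match is None:
        return None
    name = match.group(1).strip().decode("ascii", errors="ignore")
    if not name:
        return None
    try:
        # Normalise the name, e.g. "GB18030" -> "gb18030", "UTF-8" -> "utf-8".
        return codecs.lookup(name).name
    except LookupError:
        return None  # charset name Python does not recognise
```

A file with no CA property, or one naming an unknown charset, returns None, leaving the caller to fall back to its own guess.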

The other issue is that the HA property should be a number (http://www.red-bean.com/sgf/go.html#types), not "Wu played first", even though that was the convention back then. I can't really ask you to go fix whatever SGF editor created these files, though, so I think the best I can do is add a try-except that tries different encodings.
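The try-except over encodings could look like this minimal sketch. `decode_sgf` and the candidate list are hypothetical, not MuGo code. UTF-8 goes first because it is strict, so GB18030 bytes almost always fail it, whereas a GB18030 decoder may accept UTF-8 bytes and silently produce mojibake.

```python
# Hypothetical helper, not part of MuGo: try candidate encodings in order.
DEFAULT_ENCODINGS = ("utf-8", "gb18030")


def decode_sgf(raw, encodings=DEFAULT_ENCODINGS):
    """Decode raw SGF bytes with the first encoding that succeeds.

    Returns (text, encoding_used).
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not this encoding; try the next candidate
    # Last resort: decode with the first candidate, replacing bad bytes with U+FFFD,
    # so one broken file does not abort preprocessing of the whole data set.
    return raw.decode(encodings[0], errors="replace"), encodings[0]
```

If a file does declare its charset, that name could simply be placed at the front of `encodings`.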


greatken999 commented on September 3, 2024

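Yes, I fixed the HA bug by changing sgf_wrapper.py to 👍:

    import traceback

    try:
        metadata = GameMetadata(
            result=sgf_prop(props.get('RE')),
            handicap=int(sgf_prop(props.get('HA', [0]))),
            board_size=19)
    except (ValueError, TypeError):
        # HA is not a number (e.g. "Wu played first"): assume no handicap
        metadata = GameMetadata(
            result=sgf_prop(props.get('RE')),
            handicap=0,
            board_size=19)
        # append the traceback to error.txt so bad files can be inspected later
        with open('./error.txt', 'a') as f:
            traceback.print_exc(file=f)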


greatken999 commented on September 3, 2024

Hi brilee:

Encoding bug fixed; tested OK with both UTF-8 and GB18030 SGF files.
You need to run "pip3 install cchardet" to install the cchardet module first.

Change load_data_sets.py line 48 to:

    import cchardet as chardet

    def get_positions_from_sgf(file):
        # detect the encoding from the raw bytes; fall back to UTF-8 if detection fails
        with open(file, 'rb') as f:
            result = chardet.detect(f.read())['encoding'] or 'utf-8'
        with open(file, 'rt', encoding=result, errors='ignore') as f:
            # rest of the function unchanged
            for position_w_context in replay_sgf(f.read()):
