crimekgassitant's People

Contributors

liuhuanyong

crimekgassitant's Issues

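Word vectors

Running the question classification code fails with an error saying the word vectors are missing. Could you provide the pretrained word vectors the code expects?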

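Error when running the legal consultation question answering script

I ran the legal consultation QA script with python crime_qa.py. After I entered a question as usual, the program crashed. The full error output is as follows: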
loaded 300785 word embedding, finished
question:我要离婚
GET http://127.0.0.1:9200/crime_data/crime/_search?size=20 [status:N/A request:0.000s]
Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 172, in perform_request
response = self.pool.urlopen(method, url, body, retries=Retry(False), headers=request_headers, **kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 343, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f967b5e55c0>: Failed to establish a new connection: [Errno 111] Connection refused
GET http://127.0.0.1:9200/crime_data/crime/_search?size=20 [status:N/A request:0.000s]
Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 172, in perform_request
response = self.pool.urlopen(method, url, body, retries=Retry(False), headers=request_headers, **kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 343, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f967b5e55f8>: Failed to establish a new connection: [Errno 111] Connection refused
GET http://127.0.0.1:9200/crime_data/crime/_search?size=20 [status:N/A request:0.000s]
Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 172, in perform_request
response = self.pool.urlopen(method, url, body, retries=Retry(False), headers=request_headers, **kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 343, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f967b5e5780>: Failed to establish a new connection: [Errno 111] Connection refused
GET http://127.0.0.1:9200/crime_data/crime/_search?size=20 [status:N/A request:0.000s]
Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 172, in perform_request
response = self.pool.urlopen(method, url, body, retries=Retry(False), headers=request_headers, **kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 343, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f967b5ad400>: Failed to establish a new connection: [Errno 111] Connection refused
Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 172, in perform_request
response = self.pool.urlopen(method, url, body, retries=Retry(False), headers=request_headers, **kw)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 343, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/home/cc/anaconda3/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/home/cc/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f967b5ad400>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "crime_qa.py", line 137, in <module>
final_answer = handler.search_main(question)
File "crime_qa.py", line 105, in search_main
candi_answers = self.search_es(question)
File "crime_qa.py", line 42, in search_es
res = self.search_specific(question)
File "crime_qa.py", line 35, in search_specific
searched = self.es.search(index=self._index, doc_type=self.doc_type, body=query_body, size=20)
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 660, in search
doc_type, '_search'), params=params, body=body)
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
File "/home/cc/anaconda3/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 181, in perform_request
raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f967b5ad400>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f967b5ad400>: Failed to establish a new connection: [Errno 111] Connection refused)
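
The traceback ends in `[Errno 111] Connection refused` for 127.0.0.1:9200, which usually means nothing is listening on that port, for example because Elasticsearch has not been started. A minimal sketch for checking this before running crime_qa.py, using only the standard library; the helper name `es_port_open` is hypothetical and not part of the project:

```python
import socket


def es_port_open(host="127.0.0.1", port=9200, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if es_port_open():
        print("Port 9200 is accepting connections.")
    else:
        print("Nothing is listening on 127.0.0.1:9200; start Elasticsearch first.")
```

This only confirms that the port accepts connections. It does not verify that the crime_data index exists or has been populated.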

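A small problem when inserting data with build_qa_database.py

When the number of records is smaller than BULK_COUNT, the insert is never executed. I changed the code as follows: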

import json
import time


def init_ES():
    pie = ProcessIntoES()  # class defined in build_qa_database.py
    # Create the ES index
    pie.create_mapping()
    start_time = time.time()
    count = 0
    action_list = []
    BULK_COUNT = 1000  # insert every BULK_COUNT sentences into ES in one bulk request

    with open(pie.music_file, 'r', encoding='utf8') as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            item = json.loads(line)
            action = {
                "_index": pie._index,
                "_type": pie.doc_type,
                "_source": {
                    "question": item['question'],
                    "answers": '\n'.join(item['answers']),
                }
            }
            action_list.append(action)
            if len(action_list) >= BULK_COUNT:
                pie.insert_data_bulk(action_list=action_list)
                count += 1
                print(count)
                action_list = []

    # Flush the remaining records (fewer than BULK_COUNT), if any
    if action_list:
        pie.insert_data_bulk(action_list=action_list)

    end_time = time.time()
    print("Time Cost:{0}".format(end_time - start_time))
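
The pattern behind this fix, collecting items into fixed-size batches and flushing the final partial batch after the loop, can be isolated from Elasticsearch entirely. A minimal sketch, assuming a hypothetical generator `iter_batches` that is not part of the project:

```python
def iter_batches(items, batch_size):
    """Yield lists of at most batch_size items, including the final partial batch."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the remainder that never reached batch_size
        yield batch


if __name__ == "__main__":
    # Five items in batches of two: the last batch holds the single leftover item.
    for batch in iter_batches(range(5), 2):
        print(batch)
```

Checking `if batch:` rather than comparing a counter against the batch size also avoids sending an empty bulk request when the input length is an exact multiple of the batch size.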

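Questions about training word vectors

Background:
I am new to this field and would like to ask a few questions.

Process:

I extracted the fact field from the 2 GB+ train.json, segmented it into words, and trained word vectors with word2vec. Result: 1280257 words, 4.66 GB.

Your trained vectors are only a little over 1 GB. I suspected the difference came from not removing stop words, or from not deduplicating after segmentation. After removing stop words I got 1440045 words and 5.24 GB. The count went up instead of down, and I cannot work out why.

Questions:

1) In general, should stop words be removed after segmenting the corpus? If they are removed, will meaning be lost when documents are represented with word vectors? Examples are words such as 导致 (to cause), 由于 (because of), and 传说 (legend). Should numbers also be removed? Dates, phone numbers, and similar values are common in some domains and carry meaning.

2) During segmentation I read each line of the corpus, segment it, and write the words to a file. This inevitably produces many repeated words. Should they be deduplicated at this stage? I would like to know how you handled it.

Thank you.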
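
On question 2, one relevant fact: word2vec builds its vocabulary from the unique tokens that occur at least `min_count` times, and it learns from the token sequences themselves, so repeated words in the segmented corpus do not enlarge the vocabulary. A minimal sketch of how vocabulary size responds to `min_count` and to a stop-word list; the function `vocab_size` and the toy sentences are illustrative assumptions, not the project's code:

```python
from collections import Counter


def vocab_size(sentences, min_count=1, stop_words=frozenset()):
    """Count unique tokens that occur at least min_count times, ignoring stop words."""
    counts = Counter(
        tok for sent in sentences for tok in sent if tok not in stop_words
    )
    return sum(1 for c in counts.values() if c >= min_count)


if __name__ == "__main__":
    # Toy segmented corpus: each inner list is one tokenized sentence.
    sentences = [
        ["defendant", "stole", "phone"],
        ["defendant", "was", "arrested", "because", "defendant", "stole"],
    ]
    print(vocab_size(sentences))                                  # all unique tokens
    print(vocab_size(sentences, min_count=2))                     # frequent tokens only
    print(vocab_size(sentences, stop_words={"was", "because"}))   # stop words removed
```

Running a count like this on the segmented corpus before and after stop-word removal is one way to check whether the preprocessing step itself changed the tokenization.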
