Error during training #5

Open
yahuuu opened this issue Jan 6, 2020 · 2 comments

Comments

@yahuuu

yahuuu commented Jan 6, 2020

When I run the training script (token_and_save_to_file.py, per the traceback below), I get this error:
Traceback (most recent call last):
  File "C:/work/py/78stars_SpamMessage-master/token_and_save_to_file.py", line 38, in
    data = Pool().map(jieba.lcut, data)
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\pool.py", line 431, in _handle_tasks
    put(task)
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\yah\Anaconda3\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
I think the problem is this line: data = Pool().map(jieba.lcut, data)

To work around it, I replaced that line with:
data = [d for d in map(jieba.cut, data)]
but then running test.py fails with ValueError: dimension mismatch. What is wrong with the multiprocessing code, and how would I write it as a single process instead?

@zhangmin4215

Depending on what you need, uncomment the relevant statements in the main function and then run it.

@hsipeng

hsipeng commented Jan 15, 2021

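Since Python 3.6, the function you hand to the pool has to be a standalone, module-level function; you can't pass jieba.lcut directly, because it lives inside a class. The code needs a small change. Alternatively, you can just run it in a single process.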

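import jieba
from multiprocessing import Pool

# Module-level wrapper, so the pool can pickle it by name.
def cut_words(data):
    return jieba.lcut(data)

if __name__ == '__main__':
    # data, target and save_tokenlization_result come from the rest of the script.
    # Multiprocessing
    pool = Pool(processes=6)
    data = pool.map(cut_words, data)
    save_tokenlization_result(data, target)
    # Single process
    # data2words = []
    # for words in data:
    #     temp = jieba.lcut(words)  # lcut returns a list; cut returns a generator
    #     data2words.append(temp)
    # save_tokenlization_result(data2words, target)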
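
For anyone wondering why the wrapper helps, here is a minimal sketch that reproduces the same kind of failure without jieba. It assumes (this is not confirmed in the thread) that jieba.lcut is a bound method of a module-level tokenizer object that holds a threading.RLock, so pickling the method drags the lock along. The Tokenizer class, dt, lcut and can_pickle below are hypothetical stand-ins written for illustration, not jieba's actual code.

```python
import pickle
import threading


class Tokenizer:
    """Hypothetical stand-in for a tokenizer object that holds a lock."""

    def __init__(self):
        self.lock = threading.RLock()

    def lcut(self, sentence):
        # Whitespace split stands in for real word segmentation.
        with self.lock:
            return sentence.split()


dt = Tokenizer()   # module-level instance (assumed layout)
lcut = dt.lcut     # bound method exposed under a module-level name


def cut_words(sentence):
    """Module-level wrapper: pickle stores only its name, never the lock."""
    return lcut(sentence)


def can_pickle(obj):
    """Return True if obj can be serialized the way Pool.map needs."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    # The bound method carries its instance (and the RLock) into the pickle.
    print("bound method picklable:", can_pickle(lcut))
    # The plain function is pickled by reference to its module and name.
    print("wrapper picklable:", can_pickle(cut_words))
```

Run as a script, the first check should report that the bound method cannot be pickled, while the wrapper can. The `if __name__ == '__main__':` guard in the code above matters for a similar reason on Windows: worker processes re-import the main module, so pool creation must not run at import time.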
