[Data] Add a dict to dataset with `add_column` and update it with `map`, but get wrong result. #42190
Comments
This is because in this line:
the same dict object is used for each element in the resulting list. After updating this to:
I get the following expected result:
Please feel free to re-open the issue if I missed anything.
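To illustrate the aliasing described above, here is a minimal plain-Python sketch that does not use Ray. The exact line from the issue is not reproduced here, so the list-multiplication form below is only an assumed stand-in for "the same dict object used for each element"; the names `shared` and `independent` and the `"lang"` key are hypothetical.

```python
# Hypothetical stand-in for the column-building step; the issue's own code is not shown.
n = 3

# Aliased: list multiplication repeats a reference to ONE dict object,
# so a write through any element is visible through all of them.
shared = [{}] * n
shared[0]["lang"] = "en"

# Independent: the comprehension evaluates {} once per element,
# so each row gets its own dict.
independent = [{} for _ in range(n)]
independent[0]["lang"] = "en"

print(shared)       # [{'lang': 'en'}, {'lang': 'en'}, {'lang': 'en'}]
print(independent)  # [{'lang': 'en'}, {}, {}]
```

If per-row dicts are built the second way, an update to one row's `meta` should not leak into the others.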
@scottjlee Thank you so much. It works for me.
What happened + What you expected to happen
I use `add_column` to add an empty dict `meta` to a dataset and then use `map` to update the `meta` dict, but I get a wrong dataset.
But I get:
RAY_DEDUP_LOGS=0 python test.py
Versions / Dependencies
Package Version Editable project location
absl-py 1.3.0
accelerate 0.20.3
addict 2.4.0
aie-ipyleaflet 0.15.1
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
alabaster 0.7.12
albumentations 1.3.0
alembic 1.11.1
aliyun-python-sdk-core 2.13.36
aliyun-python-sdk-kms 2.16.0
altair 4.2.2
anaconda-client 1.11.0
anaconda-navigator 2.3.1
anaconda-project 0.11.1
anyio 3.6.2
appdirs 1.4.4
applaunchservices 0.3.0
appnope 0.1.2
appscript 1.1.2
APScheduler 3.10.1
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
array-record 0.4.0
arrow 1.2.2
astor 0.8.1
astroid 2.11.7
astropy 5.1
asttokens 2.4.1
astunparse 1.6.3
async-timeout 4.0.2
atomicwrites 1.4.0
attrs 23.1.0
audioread 3.0.0
Automat 20.2.0
autopage 0.5.1
autopep8 1.6.0
av 10.0.0
Babel 2.9.1
backcall 0.2.0
backports.functools-lru-cache 1.6.4
backports.tempfile 1.0
backports.weakref 1.0.post1
base58 2.1.1
bcrypt 3.2.0
beautifulsoup4 4.11.1
binaryornot 0.4.4
bitarray 2.5.1
bitsandbytes 0.38.0
bkcharts 0.2
black 22.3.0
bleach 4.1.0
blinker 1.6.2
blis 0.7.9
bokeh 2.4.3
boltons 23.0.0
boto3 1.16.49
botocore 1.19.63
Bottleneck 1.3.5
Brotli 1.0.9
brotlipy 0.7.0
cachetools 5.2.0
catalogue 2.0.8
certifi 2022.12.7
cffi 1.15.1
cfgv 3.3.1
chardet 4.0.0
charset-normalizer 3.1.0
chex 0.1.7
click 8.1.3
cliff 4.3.0
cloudpickle 2.0.0
clu 0.0.9
clyent 1.2.2
cmaes 0.9.1
cmd2 2.4.3
codecarbon 2.2.3
colorama 0.4.5
colorcet 3.0.0
coloredlogs 15.0.1
colorlog 6.7.0
commonmark 0.9.1
conda 23.3.1
conda-build 3.22.0
conda-content-trust 0.1.3
conda-pack 0.6.0
conda-package-handling 1.9.0
conda-repo-cli 1.0.20
conda-token 0.4.0
conda-verify 3.4.2
confection 0.0.3
constantly 15.1.0
contextlib2 21.6.0
contourpy 1.0.7
cookiecutter 1.7.3
courlan 0.9.3
crcmod 1.7
cryptography 38.0.1
cssselect 1.1.0
cycler 0.11.0
cymem 2.0.7
Cython 0.29.32
cytoolz 0.11.0
daal4py 2021.6.0
dask 2022.7.0
data-juicer 0.1.0
dataclasses 0.6
datasets 2.11.0
datashader 0.14.1
datashape 0.5.4
datasketch 1.5.9
dateparser 1.1.8
debugpy 1.5.1
decorator 4.4.2
defusedxml 0.7.1
descartes 1.1.0
diff-match-patch 20200713
diffusers 0.16.1
dill 0.3.4
distlib 0.3.6
distributed 2022.7.0
dlib 19.24.2
dm-tree 0.1.8
docopt 0.6.2
docstring-parser 0.15
docutils 0.18.1
easydict 1.10
editdistance 0.6.2
einops 0.6.1
embeddings 0.0.8
emoji 2.2.0
en-core-web-md 3.5.0
entrypoints 0.4
et-xmlfile 1.1.0
etils 1.3.0
evaluate 0.3.0
exceptiongroup 1.1.2
executing 2.0.1
fairscale 0.4.12
Faker 18.9.0
fastapi 0.95.1
fastcore 1.5.27
fastdownload 0.0.7
fastjsonschema 2.16.2
fastprogress 1.0.3
fasttext 0.9.2
ffmpeg 1.4
ffmpeg-python 0.2.0
ffmpy 0.3.0
filelock 3.11.0
fire 0.4.0
flake8 4.0.1
Flask 1.1.2
flatbuffers 2.0.7
flax 0.6.11
fonttools 4.39.3
frozendict 2.3.8
frozenlist 1.3.3
fsspec 2023.3.0
ftfy 6.1.1
future 0.18.2
fuzzywuzzy 0.18.0
gast 0.4.0
gdown 4.7.1
gensim 4.1.2
gin-config 0.5.0
gitdb 4.0.10
GitPython 3.1.31
glob2 0.7
gmpy2 2.1.2
google-auth 2.21.0
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
googleapis-common-protos 1.59.1
gradio 3.35.2
gradio_client 0.2.7
graphviz 0.20.1
greenlet 1.1.1
grpcio 1.50.0
h11 0.14.0
h5py 3.7.0
harvesttext 0.8.1.8
HeapDict 1.0.1
hjson 3.1.0
holoviews 1.15.0
htmldate 1.4.3
httpcore 0.17.0
httpx 0.24.0
huggingface-hub 0.15.1
humanfriendly 10.0
hvplot 0.8.0
hyperlink 21.0.0
hypothesis 6.80.0
identify 2.5.5
idna 3.4
imagecodecs 2021.8.26
imagededup 0.3.2
imageio 2.9.0
imageio-ffmpeg 0.4.7
imagesize 1.4.1
imgaug 0.4.0
immutabledict 2.2.4
importlib 1.0.4
importlib-metadata 4.11.3
importlib-resources 5.12.0
incremental 21.3.0
inflate64 0.3.1
inflection 0.5.1
iniconfig 1.1.1
intake 0.6.5
internetarchive 3.5.0
intervaltree 3.1.0
ipadic 1.0.0
ipykernel 6.15.2
ipython 8.18.1
ipython-genutils 0.2.0
ipywidgets 7.6.5
isodate 0.6.1
isort 4.3.21
itemadapter 0.3.0
itemloaders 1.0.4
itsdangerous 2.0.1
jax 0.3.25
jaxlib 0.3.25
jdcal 1.4.1
jedi 0.18.1
jellyfish 0.9.0
jieba 0.42.1
Jinja2 3.1.2
jinja2-time 0.2.0
jiwer 2.2.0
jmespath 0.10.0
joblib 1.2.0
json-tricks 3.16.1
json5 0.9.6
jsonargparse 4.21.1
jsonlines 3.1.0
jsonpatch 1.32
jsonplus 0.8.0
jsonpointer 2.1
jsonschema 4.17.3
jupyter 1.0.0
jupyter_client 7.3.4
jupyter-console 6.4.3
jupyter_core 4.11.1
jupyter-server 1.18.1
jupyterlab 3.4.4
jupyterlab-pygments 0.1.2
jupyterlab-server 2.10.3
jupyterlab-widgets 1.0.0
just-testsimhash-pybind 0.0.1
jusText 3.0.0
kaleido 0.2.1
kenlm 0.0.0
keras 2.12.0
keyring 23.4.0
kiwisolver 1.4.4
kornia 0.6.8
langcodes 3.3.0
langid 1.1.6
lazy-object-proxy 1.6.0
Levenshtein 0.21.1
libarchive-c 2.9
libclang 16.0.0
librosa 0.8.0
linkify-it-py 2.0.0
livereload 2.6.3
llvmlite 0.39.1
lmdb 1.3.0
locket 1.0.0
loguru 0.5.3
lpips 0.1.4
ltp 4.2.13
ltp-core 0.1.4
ltp-extension 0.1.10
lxml 4.9.2
lz4 3.1.3
Mako 1.2.4
Markdown 3.3.4
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mccabe 0.6.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
megatron-util 1.3.2
mesh-tensorflow 0.1.21
mistune 0.8.4
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
ml-collections 0.1.1
ml-datasets 0.2.0
ml-dtypes 0.2.0
mmcls 0.24.1
mmdet 2.25.3
mock 2.0.0
modelscope 1.9.5
moviepy 1.0.3
mpmath 1.2.1
msgpack 1.0.3
multidict 6.0.4
multipledispatch 0.6.0
multiprocess 0.70.12
multivolumefile 0.2.3
munkres 1.1.4
murmurhash 1.0.9
mypy 1.0.1
mypy-extensions 0.4.3
navigator-updater 0.3.0
nbclassic 0.3.5
nbclient 0.5.13
nbconvert 6.4.4
nbformat 5.5.0
nest-asyncio 1.5.5
networkx 2.8.4
nh3 0.2.15
ninja 1.11.1
nlpaug 1.1.11
nltk 3.5
nodeenv 1.7.0
nose 1.3.7
notebook 6.4.12
numba 0.56.4
numexpr 2.8.3
numpy 1.23.5
numpydoc 1.4.0
nuscenes-devkit 1.1.9
oauthlib 3.2.2
olefile 0.46
onnxruntime 1.13.1
OpenCC 1.1.6
opencc-python-reimplemented 0.1.7
opencv-python 4.6.0.66
opencv-python-headless 4.6.0.66
openpyxl 3.0.10
opt-einsum 3.3.0
optax 0.1.5
optuna 2.10.0
orjson 3.8.10
oss2 2.16.0
packaging 23.2
pai-easycv 0.7.0
pandas 2.0.0
pandocfilters 1.5.0
panel 0.13.1
param 1.12.0
parsel 1.6.0
parso 0.8.3
partd 1.2.0
pathlib 1.0.1
pathspec 0.9.0
pathy 0.10.2
patsy 0.5.2
pbr 5.11.1
pdfminer 20191125
pdfminer.six 20221105
pdfplumber 0.9.0
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.1.2
pkginfo 1.8.2
platformdirs 2.5.2
plotly 5.14.1
pluggy 1.0.0
ply 3.11
pooch 1.7.0
portalocker 2.7.0
poyo 0.5.0
pre-commit 3.2.1
preshed 3.0.8
prettytable 3.5.0
proglog 0.1.10
prometheus-client 0.14.1
promise 2.3
prompt-toolkit 3.0.41
Protego 0.1.16
protobuf 3.20.3
psutil 5.9.0
psycopg2 2.8.6
ptyprocess 0.7.0
pure-eval 0.2.2
py 1.11.0
py-cpuinfo 9.0.0
py-data-juicer 0.1.2 /Users/mazhijian/Documents/Project_2023/P01_LLM/C02_Solutions/data-juicer
py4j 0.10.9.7
py7zr 0.20.5
pyarrow 12.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pybcj 1.0.1
pybind11 2.10.4
pyclipper 1.3.0.post4
pycocotools 2.0.6
pycodestyle 2.8.0
pycosat 0.6.3
pycparser 2.21
pycryptodome 3.15.0
pycryptodomex 3.18.0
pyct 0.4.8
pycurl 7.45.1
pydantic 1.7.4
pydeck 0.8.1b0
PyDispatcher 2.0.5
pydocstyle 6.1.1
pydub 0.25.1
pyerfa 2.0.0
pyflakes 2.4.0
pyglove 0.3.0
Pygments 2.15.1
PyHamcrest 2.0.2
PyJWT 2.6.0
pylint 2.14.5
pyls-spyder 0.4.0
pyltp 0.4.0
Pympler 1.0.1
pynvml 11.5.0
pyobjc-core 8.5
pyobjc-framework-Cocoa 8.5
pyobjc-framework-CoreServices 8.5
pyobjc-framework-FSEvents 8.5
pyodbc 4.0.34
pyOpenSSL 22.0.0
pyparsing 3.0.9
pyperclip 1.8.2
pypinyin 0.49.0
pyplumber 0.1.9
pyppmd 1.0.0
PyQt5-sip 12.11.0
pyquaternion 0.9.9
pyrsistent 0.19.3
PySocks 1.7.1
pyspark 3.4.0
pytest 7.1.2
pytest-timeout 1.4.2
pythainlp 4.0.2
python-crfsuite 0.9.9
python-dateutil 2.8.2
python-docx 0.8.11
python-Levenshtein 0.21.1
python-louvain 0.16
python-lsp-black 1.2.1
python-lsp-jsonrpc 1.0.0
python-lsp-server 1.5.0
python-multipart 0.0.6
python-pptx 0.6.21
python-slugify 8.0.1
python-snappy 0.6.0
pytorch-metric-learning 1.6.3
pytz 2023.3
pytz-deprecation-shim 0.1.0.post0
pyvi 0.1.1
pyviz-comms 2.0.2
PyWavelets 1.3.0
PyYAML 5.4.1
pyzmq 23.2.0
pyzstd 0.15.9
QDarkStyle 3.0.2
qstylizer 0.1.10
QtAwesome 1.0.3
qtconsole 5.3.2
QtPy 2.2.0
qudida 0.0.4
queuelib 1.5.0
rapidfuzz 2.13.2
ray 2.7.1
rdflib 6.3.2
readme-renderer 42.0
recommonmark 0.7.1
redis 4.5.5
regex 2022.7.9
requests 2.28.2
requests-file 1.5.1
requests-oauthlib 1.3.1
requests-toolbelt 1.0.0
resampy 0.4.2
responses 0.18.0
rfc3986 2.0.0
rich 13.3.5
rope 0.22.0
rouge 1.0.1
rouge-score 0.1.2
rsa 4.9
Rtree 0.9.7
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.6
ruamel-yaml-conda 0.15.100
s3transfer 0.3.7
sacrebleu 2.0.0
sacremoses 0.0.53
safetensors 0.4.0
schema 0.7.5
scikit-image 0.19.3
scikit-learn 1.2.2
scikit-learn-intelex 2021.20221004.121333
scipy 1.11.3
Scrapy 2.6.2
seaborn 0.11.2
selectolax 0.3.13
semantic-version 2.10.0
Send2Trash 1.8.0
sentencepiece 0.1.95
seqeval 1.2.2
seqio 0.0.16
seqio-nightly 0.0.15.dev20230702
service-identity 18.1.0
setuptools 68.0.0
Shapely 1.8.5.post1
shotdetect-scenedetect-lgss 0.0.3
simhash-py 0.4.2
simhash-pybind 0.0.2
simplejson 3.18.0
sip 6.6.2
six 1.16.0
sklearn 0.0.post1
sklearn-crfsuite 0.3.6
smart-open 5.2.1
smmap 5.0.0
sniffio 1.3.0
snowballstemmer 2.2.0
sortedcollections 2.1.0
sortedcontainers 2.4.0
soundfile 0.12.1
soupsieve 2.3.1
spacy 3.5.0
spacy-legacy 3.0.12
spacy-loggers 1.0.4
spacy-pkuseg 0.0.32
Sphinx 5.0.2
sphinx-autobuild 2021.3.14
sphinx-rtd-theme 1.2.2
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.0
sphinxcontrib-jquery 4.1
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
spyder 5.3.3
spyder-kernels 2.3.3
SQLAlchemy 1.4.39
srsly 2.4.5
stack-data 0.6.3
stanza 1.7.0
starlette 0.26.1
statsmodels 0.13.2
stevedore 5.1.0
streamlit 1.25.0
subword-nmt 0.3.8
sympy 1.10.1
t5 0.9.4
tables 3.6.1
tabulate 0.8.10
TBB 0.2
tblib 1.7.0
tenacity 8.2.2
tensorboard 2.12.3
tensorboard-data-server 0.7.1
tensorboard-plugin-wit 1.8.1
tensorflow-datasets 4.9.2
tensorflow-estimator 2.12.0
tensorflow-hub 0.13.0
tensorflow-io-gcs-filesystem 0.32.0
tensorflow-metadata 1.13.1
tensorflow-text 2.12.1
tensorstore 0.1.40
termcolor 2.1.0
terminado 0.13.1
terminaltables 3.1.10
testpath 0.6.0
text-unidecode 1.3
textdistance 4.2.1
texttable 1.6.7
tf-slim 1.1.0
tfds-nightly 4.9.2.dev202307030045
thinc 8.1.10
thinc-apple-ops 0.1.3
thop 0.1.1.post2209072238
threadpoolctl 2.2.0
three-merge 0.1.1
tifffile 2021.7.2
timm 0.6.11
tinycss 0.4
tld 0.13
tldextract 3.2.0
tokenizers 0.13.3
toml 0.10.2
tomli 1.2.3
tomlkit 0.11.1
toolz 0.12.0
torch 2.1.1
torch-struct 0.5
torchmetrics 0.10.3
torchvision 0.16.1
tornado 6.1
tqdm 4.66.1
trafilatura 1.6.0
traitlets 5.1.1
traittypes 0.2.1
trankit 1.1.1
transformers 4.31.0
twine 4.0.2
Twisted 22.2.0
typer 0.7.0
types-mock 5.0.0.7
types-requests 2.31.0.1
types-setuptools 68.0.0.0
types-urllib3 1.26.25.13
typeshed-client 2.3.0
typing 3.7.4.3
typing_extensions 4.5.0
tzdata 2023.3
tzlocal 4.3
uc-micro-py 1.0.1
ujson 5.4.0
ukkonen 1.0.1
Unidecode 1.2.0
urllib3 1.26.15
uvicorn 0.21.1
validators 0.20.0
virtualenv 20.17.1
w3lib 1.21.0
Wand 0.6.11
wasabi 0.10.1
watchdog 2.1.6
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.58.0
websockets 11.0.1
Werkzeug 2.0.3
wget 3.2
whatthepatch 1.0.2
wheel 0.40.0
widgetsnbextension 3.5.2
wrapt 1.14.1
wurlitzer 3.0.2
xarray 0.20.1
xgboost 1.5.2
xlrd 2.0.1
XlsxWriter 3.0.3
xlwings 0.27.15
xtcocotools 1.12
xxhash 3.1.0
xyzservices 2022.9.0
yacs 0.1.8
yapf 0.31.0
yarl 1.8.2
zh-core-web-md 3.5.0
zhconv 1.4.3
zhon 1.1.5
zict 2.1.0
zipp 3.8.0
zope.interface 5.4.0
zstandard 0.21.0
Reproduction script
Source code in test.py
Issue Severity
Medium: It is a significant difficulty but I can work around it.