332 executable smartnoise #341

ghost · 2024-03-05T08:19:45Z

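Originally, to bypass smartnoise's data preprocessing, I used the IdentityTransformer it provides, but found that this caused other problems. I have since accepted smartnoise's own preprocessing: for categorical data it only applies a LabelEncoder conversion, which does not consume any epsilon. To fix this bug, I made the following changes:

Changed the outputs of both encoder and discretizing to pd.Categorical. 5f93456 987447e

Accepted smartnoise's preprocessing. 8994fc6

Added and adjusted the demo files. 1801d9b af94b5e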

ghost · 2024-03-05T08:26:28Z

Updated the corresponding README and removed the mwem method. 2be8c03

ghost · 2024-03-07T03:04:03Z

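smartnoise GAN-family synthesizers are now usable (dpctgan, pategan)

Integrated the GAN methods from smartnoise into the package cb0ce89

Adjusted the output types to match the integration above 279524d

Added an internal example document for using smartnoise within the package b426ed3

Updated the README accordingly b79a33a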

matheme-justyn · 2024-03-07T09:08:57Z

Following the changes in #350, renamed the manual file to 2024-01-11-Synthesizer.md - 08eb8da

matheme-justyn

On branch 332-executable-smartnoise, I used executable_smartnoise.ipynb to run issue332() on adult-income [1.] and got almost the same results as #332:

smartnoise-aim: ValueError: Synthesizer aim not found
smartnoise-mwem: MemoryError: Unable to allocate 1.87 TiB for an array with shape (256842399744,) and data type int64
smartnoise-mst: ModuleNotFoundError: No module named 'disjoint_set'
smartnoise-pacsynth: ValueError: Input contains NaN.
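
The size in the mwem error follows directly from the array shape, since an int64 element takes 8 bytes. A quick arithmetic check, where only the shape and dtype come from the traceback above:

```python
# Shape and dtype are taken from the MemoryError message above.
n_elements = 256_842_399_744
bytes_per_int64 = 8

total_bytes = n_elements * bytes_per_int64
total_tib = total_bytes / 1024**4  # 1 TiB = 2**40 bytes

print(f"{total_bytes:,} bytes = {total_tib:.2f} TiB")
```

An element count this large is most likely the product of the per-column category counts, which would explain why mwem cannot handle this table as is. That reading is an assumption and is not confirmed anywhere in this thread.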

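I cannot reproduce your fix. I suggest:

First, in the environment where it runs successfully, list the versions of the relevant packages so we can check whether they match requirement.txt
If they do not match, for example 'disjoint_set', can the package be added to pyproject.toml?
If there are dependency conflicts, try manually forcing the required upgrades in pyproject.toml and exporting requirement.txt, then install from requirement.txt in a brand-new environment and rerun these smartnoise checks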
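
The first step, comparing the working environment against the pinned file, could be scripted roughly as below. This is a minimal sketch using only the standard library; `parse_pins` and `find_mismatches` are hypothetical helper names, and only plain `name==version` lines are handled.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {name: version} for simple 'name==version' lines; skip the rest."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line or line.startswith("-"):
            continue  # blank lines, options such as '-e .', VCS URLs
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.split(";", 1)[0].strip()
    return pins


def _installed_version(name):
    """Version installed in the current environment, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, installed_version=None):
    """Return {name: (pinned, installed_or_None)} for every pin that differs."""
    lookup = installed_version or _installed_version
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


# Sample lines for illustration; in practice read them from the requirements file.
sample = ["Faker==15.3.4", "torch==2.2.1  # upgraded by hand", "-e ."]
for name, (pinned, found) in find_mismatches(parse_pins(sample)).items():
    print(f"{name}: pinned {pinned}, installed {found}")
```

The comparison is an exact string match, so a local build tag such as a CUDA suffix on torch would be reported as a mismatch even when the release number agrees.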

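I also see that you have issue332_gan(). My understanding is that setting scaler_inhibit = True should solve the problem of the Scaler being applied when it should not be, but I get the same errors whether it is on or off: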

smartnoise-dpctgan: RuntimeError: all elements of input should be between 0 and 1
smartnoise-patectgan: RuntimeError: all elements of input should be between 0 and 1

If you pull dev into this branch, you need to switch to benchmark://adult-income. The file is then named adult-income.csv, but it is the same adult dataset we used before

docs/_posts/2024-01-11-Synthesizer.md

matheme-justyn · 2024-03-08T00:46:08Z

I first ran poetry lock to try bumping versions. These are the packages that were upgraded:

boto3: 1.34.42 -> 1.34.58
botocore: 1.34.42 -> 1.34.58
importlib-metadata: 7.0.1 -> 7.0.2
importlib-resources: 6.1.1 -> 6.1.3
nvidia-nvjitlink-cu12: 12.3.101 -> 12.4.99
pyparsing: 3.1.1 -> 3.1.2
pytest: 8.0.0 -> 8.0.2
sqlalchemy: 1.4.51 -> 1.4.52

None of these looks directly related to this issue, but let's work through it step by step. I committed poetry.lock in 2e8ddd8

ghost · 2024-03-08T01:33:20Z

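1 and 3: I had updated packages and installed the extra packages that the smartnoise docs say are required
2: this is not a problem on our side and we cannot fix it, but the method has already been removed from the docs
4: I have not run into this before; a version update may resolve it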

ghost · 2024-03-08T01:51:36Z

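I just rechecked item 4. According to the official smartnoise documentation: "... To achieve this dimensional fidelity, the pac-synth synthesizer will sometimes generate rows with missing values." The synthetic data itself therefore contains NA values, which causes inverse_transform to fail. This behavior of pacsynth does not appear to be controllable, so I suggest either checking whether I can make some adjustments on the Processor side, or simply not using this method.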
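
One possible adjustment is to drop incomplete rows before calling inverse_transform. The sketch below only illustrates the idea on plain dictionaries; it is not the Processor's actual code, and `is_missing` and `drop_incomplete_rows` are hypothetical names.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete_rows(rows):
    """Keep only the rows in which every value is present."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


# Illustrative synthetic rows, two of which contain a missing value.
synthetic = [
    {"age": 1, "job": 0},
    {"age": float("nan"), "job": 2},
    {"age": 3, "job": None},
]
print(drop_incomplete_rows(synthetic))
```

On a pandas DataFrame the same filter is `DataFrame.dropna()`. Dropping rows reduces the number of synthetic records, so the output may need to be oversampled to reach the requested size.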

matheme-justyn · 2024-03-08T03:01:39Z

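Updated the instructions in pyproject.toml for setting up the environment with Poetry - 6817af1

Because Poetry's dependency resolution is overly strict (see poetry/issues/697),
we will lean toward recommending a pip install based deployment as the common denominator.
Poetry or conda installation may later be moved elsewhere for reference (e.g. the About section of the manual)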

ghost · 2024-03-08T03:19:40Z

conda create -n re python=3.10
conda activate re
pip install poetry
poetry install
pip install ipykernel
pip install pyyaml
pip install boto3
pip install sdv
pip install smartnoise-synth # Error can be ignored
pip install anonymeter
pip install git+https://github.com/ryan112358/private-pgm.git
pip install --upgrade torch # Error can be ignored

matheme-justyn · 2024-03-08T03:43:41Z

conda create -n re python=3.10
conda activate re
pip install poetry
poetry install
pip install ipykernel
pip install pyyaml
pip install boto3
pip install sdv
pip install smartnoise-synth # Error can be ignored
pip install anonymeter
pip install git+https://github.com/ryan112358/private-pgm.git
pip install --upgrade torch # Error can be ignored

This also needs to be added:
pip install requests

Notes on the errors you will run into:

> pip install smartnoise-synth # Error can be ignored
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
rdt 1.9.2 requires Faker<20,>=17, but you have faker 15.3.4 which is incompatible.

> pip install --upgrade torch # Error can be ignored
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
smartnoise-synth 1.0.3 requires torch<2.0.0, but you have torch 2.2.1 which is incompatible.

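Both of these issues are already known: smartnoise holds Faker at a version that conflicts with rdt, and its torch requirement is too old. For now we force-install smartnoise-synth 1.0.3 with pip. I will run tests next, regenerate pyproject.toml and requirement.txt once testing is done, and then see how it works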
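
To make the two conflicts concrete, here is a small standard-library check of a version against a specifier. It is a simplified sketch that only understands plain numeric versions and the operators `<`, `<=`, `>`, `>=`, `==`; the `satisfies` helper is hypothetical.

```python
import operator

# Two-character operators come first so "<=" is not read as "<".
_OPS = [
    ("<=", operator.le), (">=", operator.ge), ("==", operator.eq),
    ("<", operator.lt), (">", operator.gt),
]


def _release(version):
    """'15.3.4' -> (15, 3, 4). Only plain numeric versions are handled."""
    return tuple(int(part) for part in version.strip().split("."))


def _pad(a, b):
    """Pad with zeros so that (17,) and (17, 0, 0) compare as equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '<20,>=17'."""
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                left, right = _pad(_release(version), _release(clause[len(symbol):]))
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# The two conflicts reported by pip above.
print(satisfies("15.3.4", "<20,>=17"))  # Faker as required by rdt 1.9.2
print(satisfies("2.2.1", "<2.0.0"))     # torch as required by smartnoise-synth 1.0.3
```

Both checks print False, matching pip's warnings. For real use, `packaging.specifiers.SpecifierSet` handles pre-releases and the other cases this sketch ignores.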

CC @mileschangmoda


matheme-justyn · 2024-03-08T06:10:18Z

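I finished testing locally and made the following changes - b536c87

Tests
- smartnoise-aim: passed
- smartnoise-mst: passed
- smartnoise-mwem: MemoryError, known issue, will not be used going forward
- smartnoise-pacsynth: NaN, known issue, will not be used going forward
- smartnoise-dpctgan: passed
- smartnoise-patectgan: passed
Changes
- requirement.txt: exported with pip freeze
- requirement-dev.txt: deleted
- poetry.lock: deleted
- pyproject.toml: removed the Poetry-related sections

From now on requirement.txt must be the source of truth; for the time being, versions are not managed through pyproject.toml
Next, I will test installing requirement.txt from scratch on SageMaker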

matheme-justyn · 2024-03-08T06:22:48Z

SageMaker reports an error - a6d4b2a

> !pip install -r ../../requirements.txt
ERROR: Ignored the following versions that require a different python version: 0.1.3 Requires-Python >=3.6,<3.9; 0.1.3.dev0 Requires-Python >=3.6,<3.9; 0.1.3.dev1 Requires-Python >=3.6,<3.9; 0.1.4 Requires-Python >=3.6,<3.9; 0.1.4.dev0 Requires-Python >=3.6,<3.9; 0.2.0 Requires-Python >=3.6,<3.9; 0.2.0.dev0 Requires-Python >=3.6,<3.9; 0.2.1 Requires-Python >=3.6,<3.9; 0.2.1.dev0 Requires-Python >=3.6,<3.9; 0.2.2 Requires-Python >=3.6,<3.9; 0.2.2.dev0 Requires-Python >=3.6,<3.9; 0.2.2.dev1 Requires-Python >=3.5,<3.9; 0.2.2.dev2 Requires-Python >=3.6,<3.9; 0.2.2.dev3 Requires-Python >=3.6,<3.9; 0.3.0 Requires-Python >=3.6,<3.10; 0.3.0 Requires-Python >=3.6,<3.9; 0.3.0.dev0 Requires-Python >=3.5,<3.9; 0.3.0.dev0 Requires-Python >=3.6,<3.10; 0.3.0.dev1 Requires-Python >=3.6,<3.9; 0.3.0.post1 Requires-Python >=3.6,<3.10; 0.3.1 Requires-Python >=3.5,<3.8; 0.3.1 Requires-Python >=3.6,<3.9; 0.3.1.dev0 Requires-Python >=3.5,<3.8; 0.3.1.dev0 Requires-Python >=3.6,<3.9; 0.3.1.dev1 Requires-Python >=3.6,<3.9; 0.3.1.dev2 Requires-Python >=3.6,<3.9; 0.3.2 Requires-Python >=3.5,<3.9; 0.3.2.dev0 Requires-Python >=3.5,<3.8; 0.3.2.dev0 Requires-Python >=3.6,<3.9; 0.3.2.dev1 Requires-Python >=3.5,<3.9; 0.3.3 Requires-Python >=3.5,<3.9; 0.3.3.dev0 Requires-Python >=3.5,<3.9; 0.4.0 Requires-Python >=3.5,<3.9; 0.4.0 Requires-Python >=3.6,<3.9; 0.4.0.dev0 Requires-Python >=3.5,<3.9; 0.4.0.dev0 Requires-Python >=3.6,<3.9; 0.4.0.dev1 Requires-Python >=3.6,<3.9; 0.4.1 Requires-Python >=3.6,<3.9; 0.4.1.dev0 Requires-Python >=3.6,<3.9; 0.4.1.dev1 Requires-Python >=3.6,<3.9; 0.4.2 Requires-Python >=3.6,<3.9; 0.4.2.dev0 Requires-Python >=3.6,<3.9; 0.4.3 Requires-Python >=3.6,<3.9; 0.4.3.dev0 Requires-Python >=3.6,<3.9; 0.4.3.dev1 Requires-Python >=3.6,<3.9; 0.4.4.dev0 Requires-Python >=3.6,<3.9; 0.5.0 Requires-Python >=3.6,<3.10; 0.5.0 Requires-Python >=3.6,<3.9; 0.5.0.dev0 Requires-Python >=3.6,<3.9; 0.5.0.dev1 Requires-Python >=3.6,<3.10; 0.5.0.dev1 Requires-Python >=3.6,<3.9; 0.5.1 
Requires-Python >=3.6,<3.10; 0.5.1 Requires-Python >=3.6,<3.9; 0.5.1.dev0 Requires-Python >=3.6,<3.10; 0.5.1.dev0 Requires-Python >=3.6,<3.9; 0.5.1.dev1 Requires-Python >=3.6,<3.10; 0.5.1.dev1 Requires-Python >=3.6,<3.9; 0.5.1.dev2 Requires-Python >=3.6,<3.10; 0.5.1.dev3 Requires-Python >=3.6,<3.10; 0.5.2 Requires-Python >=3.6,<3.10; 0.5.2.dev0 Requires-Python >=3.6,<3.10; 0.5.2.dev0 Requires-Python >=3.6,<3.9; 0.5.2.dev1 Requires-Python >=3.6,<3.10; 0.5.3.dev0 Requires-Python >=3.6,<3.10; 0.6.0 Requires-Python >=3.6,<3.10; 0.6.0.dev0 Requires-Python >=3.6,<3.10; 0.6.1 Requires-Python >=3.6,<3.10; 0.6.1.dev0 Requires-Python >=3.6,<3.10; 0.7.0 Requires-Python >=3.6,<3.10; 0.7.0.dev0 Requires-Python >=3.6,<3.10
ERROR: Could not find a version that satisfies the requirement pywin32==306 (from versions: none)
ERROR: No matching distribution found for pywin32==306

After actually running it, SDV was not installed
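
The pywin32==306 failure is typical of a pip freeze taken on Windows and then installed on Linux. One option, sketched here on the assumption that the pin should be kept rather than deleted, is to tag Windows-only pins with a PEP 508 environment marker so that pip skips them on other platforms. `WINDOWS_ONLY` is an illustrative list and `add_platform_markers` a hypothetical helper; neither comes from this repository.

```python
# Packages assumed to install only on Windows; extend as needed.
WINDOWS_ONLY = {"pywin32", "pywinpty"}


def add_platform_markers(lines, windows_only=WINDOWS_ONLY):
    """Append a sys_platform marker to Windows-only 'name==version' pins."""
    result = []
    for line in lines:
        stripped = line.strip()
        name = stripped.split("==", 1)[0].strip().lower()
        already_marked = ";" in stripped
        if "==" in stripped and not already_marked and name in windows_only:
            stripped = f'{stripped}; sys_platform == "win32"'
        result.append(stripped)
    return result


# Sample input; the second pin is only there to show an untouched line.
for line in add_platform_markers(["pywin32==306", "requests==2.31.0"]):
    print(line)
```

With the marker in place, `pip install -r` on Linux skips the line instead of failing to find a matching distribution.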

matheme-justyn · 2024-03-08T06:35:13Z

Update: removed the pywin32 requirement and tried to continue installing the rest on SageMaker, but it still fails - 364e860

> !pip install -r ../../requirements.txt
Requirement already satisfied: pexpect>4.3 in /home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages (from ipython==8.22.2->-r ../../requirements.txt (line 29)) (4.9.0)
INFO: pip is looking at multiple versions of rdt to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r ../../requirements.txt (line 66) and Faker==15.3.4 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested Faker==15.3.4
    rdt 1.9.2 depends on Faker<20 and >=17

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

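Since requirements.txt cannot fully install the dependencies needed for the current functionality, I am concerned that we should not be telling users "just pip install in this order". I am asking @mileschangmoda for advice. Until the concerns about requirement.txt are resolved, I am inclined not to approve this PR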

ghost · 2024-03-08T06:46:56Z

If conda is used to create an environment from requirements.txt, the following error appears:

LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides requested //github.com/ryan112358/private-pgm.git@5b9126295c110b741e5426ddbff419ea1e60e788
  - nothing provides requested anonymeter 1.0.0
  - nothing provides requested graphviz 0.17
  - nothing provides requested opacus 0.14.0
  - nothing provides requested opendp 0.8.0
  - nothing provides requested pac-synth 0.0.8
  - nothing provides requested prompt-toolkit 3.0.43
  - nothing provides requested pure-eval 0.2.2
  - nothing provides requested python-dateutil 2.9.0.post0
  - nothing provides requested smartnoise-sql 1.0.3
  - nothing provides requested smartnoise-synth 1.0.3
  - nothing provides requested stack-data 0.6.3
  - nothing provides requested torch 2.2.1
  - nothing provides requested tzdata 2024.1
  - package rdt-1.9.2-pyhd8ed1ab_0 requires faker >=17,<20, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ //github.com/ryan112358/private-pgm.git@5b9126295c110b741e5426ddbff419ea1e60e788 does not exist (perhaps a typo or a missing channel);
├─ anonymeter 1.0.0  does not exist (perhaps a typo or a missing channel);
├─ faker 15.3.4  is requested and can be installed;
├─ graphviz 0.17  does not exist (perhaps a typo or a missing channel);
├─ opacus 0.14.0  does not exist (perhaps a typo or a missing channel);
├─ opendp 0.8.0  does not exist (perhaps a typo or a missing channel);
├─ pac-synth 0.0.8  does not exist (perhaps a typo or a missing channel);
├─ prompt-toolkit 3.0.43  does not exist (perhaps a typo or a missing channel);
├─ pure-eval 0.2.2  does not exist (perhaps a typo or a missing channel);
├─ python-dateutil 2.9.0.post0  does not exist (perhaps a typo or a missing channel);
├─ rdt 1.9.2  is not installable because it requires
│  └─ faker >=17,<20 , which conflicts with any installable versions previously reported;
├─ smartnoise-sql 1.0.3  does not exist (perhaps a typo or a missing channel);
├─ smartnoise-synth 1.0.3  does not exist (perhaps a typo or a missing channel);
├─ stack-data 0.6.3  does not exist (perhaps a typo or a missing channel);
├─ torch 2.2.1  does not exist (perhaps a typo or a missing channel);
└─ tzdata 2024.1  does not exist (perhaps a typo or a missing channel).

mileschangmoda · 2024-03-08T06:57:32Z

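"Just pip install in this order"

This is workable, but we must make sure the result of each pip install stays the same over time,
for example by switching to pip install <package_name>== and so on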
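
One way to produce such pinned commands is to read the versions from an environment that is known to work. A minimal sketch, assuming the install order from the steps earlier in this thread; `pinned_commands` is a hypothetical helper:

```python
from importlib import metadata

# Order taken from the manual install steps earlier in this thread.
INSTALL_ORDER = ["sdv", "smartnoise-synth", "anonymeter", "torch"]


def _installed_version(name):
    """Version installed in the current environment, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def pinned_commands(names, lookup=None):
    """Return one 'pip install name==version' line per package, in order."""
    lookup = lookup or _installed_version
    commands = []
    for name in names:
        version = lookup(name)
        if version is None:
            commands.append(f"# {name}: not installed, pin manually")
        else:
            commands.append(f"pip install {name}=={version}")
    return commands


print("\n".join(pinned_commands(INSTALL_ORDER)))
```

Run inside the working conda environment, this prints the exact versions to record in the README. Pinning only the top-level packages still leaves transitive dependencies free to move, so the commands reduce drift but do not eliminate it.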

matheme-justyn · 2024-03-11T02:16:27Z

The version conflicts have been resolved in #355 (CI check) and will be addressed in #349 (README.md), so after discussion, I believe we can merge this branch.

matheme-justyn

LGTM

Alex Chen added 5 commits March 5, 2024 10:34

Explore the bug

1801d9b

change the output of transform to categorical

5f93456

edit TableTransformer in smartnoise

8994fc6

Edit inverse_transform dtype

987447e

demo file

af94b5e

ghost added the bug Something isn't working label Mar 5, 2024

ghost added this to the 20240314, User Story beta testing milestone Mar 5, 2024

ghost requested a review from matheme-justyn March 5, 2024 08:19

ghost self-assigned this Mar 5, 2024

ghost linked an issue Mar 5, 2024 that may be closed by this pull request

Executable Smartnoise on adult #332

Closed

remove mwem part

2be8c03

Alex Chen added 4 commits March 7, 2024 10:42

add GAN method

cb0ce89

revert transform output type to numerical rather than categorical

279524d

demo update

b426ed3

update doc for the new methods

b79a33a

ghost added the enhancement New feature or request label Mar 7, 2024

20240307, 2024-01-11-Synthesizer.md by #350

08eb8da

matheme-justyn requested changes Mar 7, 2024


docs/_posts/2024-01-11-Synthesizer.md

20240308, poetry lock

2e8ddd8

20240308, update poetry require poetry install

6817af1

Merge branch 'dev' into 332-executable-smartnoise

e3b2dda

matheme-justyn added 3 commits March 8, 2024 13:54

20240308, reset requirements.txt and related work

b536c87

20240308, reset requirements.txt and related work

8b79339

Merge branch '332-executable-smartnoise' of https://github.com/nics-t…

357aebf

…w/PETsARD into 332-executable-smartnoise

20240308, failed test on SageMaker

a6d4b2a

20240308, remove pywin32

364e860

matheme-justyn approved these changes Mar 11, 2024


matheme-justyn merged commit 3069eb2 into dev Mar 11, 2024

matheme-justyn deleted the 332-executable-smartnoise branch March 11, 2024 02:16

matheme-justyn mentioned this pull request Mar 11, 2024

Executable Smartnoise on adult #332

Closed
