hotfix to ensure sample-names are always strings #49

charles-cowart · 2023-04-12T20:26:59Z

No description provided.

charles-cowart · 2023-04-12T21:16:48Z

Confirmed that the Qiita dependency is failing to install because its codecov dependency has been removed from PyPI/pip:
home-assistant/core#91283. Looking into a solution.

charles-cowart · 2023-04-18T05:18:00Z

Codecov updated to use their new native binary. The stdout from the codecov step seems to imply that codecov isn't fully set up. Looking at the earlier version using the v1 shell script, it appears this was also the case so this PR won't be a step backwards. I will make it a low-priority issue in GitHub to get codecov running for this project.

charles-cowart · 2023-04-18T05:45:39Z

Hi @wasade! If you don't mind reviewing this since Antonio's still out, I'd greatly appreciate it, thanks! No rush.

qp_klp/process_amplicon_job.py

wasade · 2023-04-18T15:28:47Z

qp_klp/process_amplicon_job.py

                                       delimiter='\t',
-                                       index_col='sample_name').to_dict(
-                    'index')
+                                       index_col='sample_name')


index_col does not interact as expected with dtype. This will still incur a type conversion. Converting it to str after the fact is lossy. Please remove it, and set an explicit index with set_index after parsing.

>>> df = pd.read_csv('clinical_metadata_formatted.tsv', sep='\t', dtype=str, index_col='sample-id')
>>> df.index[:5]
Float64Index([10317.000106498, 10317.000107092, 10317.000106499, 10317.000107093, 10317.000106684], dtype='float64', name='sample-id')

@wasade I can see it. It didn't happen in the case that spawned this hotfix, but in general, if the index column is interpreted as numeric directly, rather than being cast to string along with the other columns before it becomes the index, you could lose data.

I think we're in agreement. Please do not allow the index column to be cast to numeric.
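A minimal sketch of the approach requested above: parse every column as str without index_col, then call set_index afterwards. The helper name `load_sample_metadata` and the inline TSV are hypothetical and only for illustration; the actual change in qp_klp may differ.

```python
import io

import pandas as pd


def load_sample_metadata(source, index_name="sample_name"):
    """Hypothetical helper: read a TSV with all columns as str, then index it."""
    # No index_col here, so the future index column is parsed as str
    # together with every other column.
    df = pd.read_csv(source, sep="\t", dtype=str)
    # Promote the (already string-typed) column to the index after parsing.
    return df.set_index(index_name)


# Illustrative input with numeric-like sample names.
tsv = "sample_name\tvalue\n123.000\ta\n1e-3\tb\n"
df = load_sample_metadata(io.StringIO(tsv))

# Same shape of result the original code built with to_dict('index').
by_sample = df.to_dict("index")
```

Because the index is set only after parsing, the sample names never pass through a numeric dtype, so there is nothing to convert back to str.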

qp_klp/process_metagenomics_job.py

wasade

Please add a regression test that asserts that data from numeric-like index values is not lost. Specifically, construct a file to parse, which at a minimum contains "123.000" and "1e-3" as sample IDs, and assert any relevant parsing logic does not alter these IDs
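One possible shape for such a regression test, as a sketch only. `parse_sample_ids` is a hypothetical stand-in for the project's parsing logic, and the column names and file name are assumptions; the test actually added in this PR may look different.

```python
import os
import tempfile
import unittest

import pandas as pd


def parse_sample_ids(path):
    """Hypothetical stand-in for the parsing logic under test."""
    df = pd.read_csv(path, sep="\t", dtype=str)
    return list(df.set_index("sample_name").index)


class NumericLikeSampleIdTests(unittest.TestCase):
    def test_numeric_like_ids_are_not_altered(self):
        # IDs that a numeric parse would rewrite (e.g. to 123.0 and 0.001).
        expected = ["123.000", "1e-3"]

        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "metadata.tsv")
            with open(path, "w") as f:
                f.write("sample_name\tdescription\n")
                for sample_id in expected:
                    f.write(f"{sample_id}\tplaceholder\n")

            # The IDs must come back exactly as written to the file.
            self.assertEqual(parse_sample_ids(path), expected)
```

The test writes a real file rather than using an in-memory buffer so that it exercises the same path-based entry point a job would use.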

charles-cowart · 2023-04-19T00:52:41Z

@wasade all tests, including the new one, are passing. Ready for review!

wasade

I think this is okay, thanks!

hotfix to ensure sample-names are always strings

56b13c7

charles-cowart added 5 commits April 12, 2023 16:41

Testing updates to code-coverage

29781c4

Fix test_klp test

486832d

test

b7123b0

test

9f21702

remove assertion

44bef64

charles-cowart requested review from antgonza and wasade April 18, 2023 05:18

Remove CI debugging stmt. Issue was w/Qiita.

c85dc7f

wasade requested changes Apr 18, 2023

regression-test added.

3625127

wasade approved these changes Apr 19, 2023

charles-cowart merged commit 6a2cbcb into qiita-spots:main Apr 19, 2023

charles-cowart deleted the sample_names_are_strings_hotfix branch April 19, 2023 16:01
