[Feature] Add OSSBackend for reading data from aliyun oss #834

Open
wants to merge 52 commits into
base: main

@liuxianyi liuxianyi commented Dec 16, 2022

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

Currently, mmengine does not support Aliyun OSS, so this pull request adds that functionality.

Modification

  1. Modify several files under tests/test_fileio and in the package itself. Specifically, mmengine/fileio/backends/OSS_backend.py implements the read and write functions, and mmengine/fileio/file_client.py registers the backend in '_prefix_to_backends'.
  2. Add a unit test in tests/test_fileio/test_backends/test_OSS_backend.py to ensure the new functionality works correctly.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@CLAassistant

CLAassistant commented Dec 16, 2022

CLA assistant check
All committers have signed the CLA.

@zhouzaida zhouzaida changed the title liuxianyi/oss backend [CodeCamp #1490] Add OSSBackend for reading data from aliyun oss Dec 18, 2022
@zhouzaida
Collaborator

zhouzaida commented Dec 18, 2022

Hi @liuxianyi, thanks for your contribution. The __init__.py files added under tests are not required and can be removed.

@liuxianyi
Author

OK, but I don't know how to run those unit tests without the __init__.py files.
I always encounter the following error:

ImportError: Failed to import test module: test_fileio
Traceback (most recent call last):
  File "/home/goog/miniconda3/envs/torch/lib/python3.7/unittest/loader.py", line 154, in loadTestsFromName
    module = __import__(module_name)
ModuleNotFoundError: No module named 'tests.test_fileio'

So, can you suggest a solution?

@zhouzaida
Collaborator

What command did you use to run the unit tests?

@liuxianyi
Author

This command causes the error:

python -m unittest tests/test_fileio/test_backends/test_OSS_backend.py

@zhouzaida
Collaborator

You can directly run pytest tests/test_fileio/test_backends/test_OSS_backend.py -s -v.

@codecov

codecov bot commented Dec 19, 2022

Codecov Report

Base: 78.66% // Head: 78.88% // Increases project coverage by +0.21% 🎉

Coverage data is based on head (1bb877f) compared to base (fe26c65).
Patch coverage: 68.26% of modified lines in pull request are covered.

❗ Current head 1bb877f differs from pull request most recent head 2b8526d. Consider uploading reports for the commit 2b8526d to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #834      +/-   ##
==========================================
+ Coverage   78.66%   78.88%   +0.21%     
==========================================
  Files         128      129       +1     
  Lines        9348     9485     +137     
  Branches     1848     1877      +29     
==========================================
+ Hits         7354     7482     +128     
- Misses       1679     1684       +5     
- Partials      315      319       +4     
Flag       Coverage Δ
unittests  78.88% <68.26%> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                 Coverage Δ
mmengine/fileio/file_client.py                 92.37% <ø> (ø)
mmengine/registry/build_functions.py           80.20% <ø> (ø)
mmengine/registry/default_scope.py             100.00% <ø> (ø)
mmengine/visualization/vis_backend.py          84.49% <ø> (ø)
mmengine/visualization/visualizer.py           93.19% <ø> (ø)
mmengine/runner/checkpoint.py                  52.07% <11.11%> (+8.11%) ⬆️
mmengine/model/weight_init.py                  35.63% <33.33%> (+0.12%) ⬆️
mmengine/optim/optimizer/optimizer_wrapper.py  62.06% <71.42%> (-0.44%) ⬇️
mmengine/fileio/backends/oss_backend.py        77.87% <77.87%> (ø)
mmengine/fileio/backends/__init__.py           100.00% <100.00%> (ø)
... and 4 more

@liuxianyi
Author

I still get an import error:

OpenMMLab/mmengine$ pytest tests/test_fileio/test_backends/test_oss_backend.py -s -v
============================================================== test session starts ==============================================================
platform linux -- Python 3.7.11, pytest-7.2.0, pluggy-1.0.0 -- /home/goog/miniconda3/envs/torch/bin/python
cachedir: .pytest_cache
rootdir: /media/Harddisk/goog/OpenMMLab/mmengine, configfile: pytest.ini
collected 0 items / 1 error                                                                                                                     

==================================================================== ERRORS =====================================================================
_____________________________________ ERROR collecting tests/test_fileio/test_backends/test_oss_backend.py ______________________________________
ImportError while importing test module '/media/Harddisk/goog/OpenMMLab/mmengine/tests/test_fileio/test_backends/test_oss_backend.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/goog/miniconda3/envs/torch/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_fileio/test_backends/test_oss_backend.py:8: in <module>
    from mmengine.fileio.backends import OSSBackend
E   ModuleNotFoundError: No module named 'mmengine'
============================================================ short test summary info ============================================================
ERROR tests/test_fileio/test_backends/test_oss_backend.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================== 1 error in 0.05s ================================================================

@zhouzaida
Collaborator

Have you installed mmengine in your local environment?

@liuxianyi
Author

I understand now, thank you very much.
In other words, I need to install the package I modified and then run the unit tests.

@zhouzaida
Collaborator

Yes, you need to install mmengine with pip install -e . -v.
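
As a quick sanity check before running the tests, the following sketch reports whether a package can be found in the current environment. The helper name `is_importable` is hypothetical and not part of mmengine; it only wraps the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == '__main__':
    # Expected to report False when mmengine has not been installed and is
    # not otherwise on sys.path, which is the situation behind the
    # "No module named 'mmengine'" error above.
    print('mmengine importable:', is_importable('mmengine'))
```

If this reports False, the ModuleNotFoundError during test collection most likely comes from the environment rather than from the test file itself.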

@liuxianyi
Author

ok, thank you

@liuxianyi
Author

I get the following error, which is not caused by my changes:

AttributeError: module 'numpy' has no attribute 'bool'

Screenshot 2022-12-20 13 25 39

@HAOCHENYE
Collaborator

You can update the branch with the latest main.

@HAOCHENYE HAOCHENYE added this to the 0.5.0 milestone Jan 12, 2023
@zhouzaida zhouzaida modified the milestones: 0.5.0, 0.7.1 Mar 13, 2023
Comment on lines +39 to +41
access_key_id: Optional[str] = None,
access_key_secret: Optional[str] = None,
path_mapping: Optional[dict] = None):
Collaborator

@zhouzaida zhouzaida Mar 13, 2023

It would be convenient if the endpoint could be passed as an initialization parameter. Users could then call the interface with paths that omit the endpoint.

from mmengine.fileio import OSSBackend

backend = OSSBackend(
    access_key_id='xxx',
    access_key_secret='xxx',
    endpoint='oss-cn-hangzhou.aliyuncs.com')
# this covers most use cases
backend.put_text('hello world!', 'oss://mmengine-test/text1.txt')
# URLs that contain the endpoint are also supported,
# in case users want to use a different endpoint
backend.put_text('hello world!', 'oss://oss-cn-hangzhou.aliyuncs.com/mmengine-test/text1.txt')

Author

Yes, but the endpoint may conflict with folder names, causing the regular-expression match to fail.

Collaborator

@zhouzaida zhouzaida Mar 15, 2023

Can an access_key_id correspond to multiple endpoints? If not, the file path does not need to contain an endpoint.

Author

An access_key_id may be valid for multiple endpoints, so I think the endpoint needs to be considered.
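
One way to reconcile the two positions in this thread is to look for an endpoint only in the first path segment and fall back to a default otherwise. The sketch below is illustrative, not the PR's implementation: `parse_oss_path`, `OSSPath`, and the endpoint pattern are hypothetical, and the pattern assumes endpoints of the form `oss-<region>.aliyuncs.com`. A bucket whose name happened to match that pattern would still be misread, so the conflict raised above is reduced rather than eliminated.

```python
import re
from typing import NamedTuple, Optional

# Assumption: endpoints look like "oss-<region>.aliyuncs.com".
_ENDPOINT_RE = re.compile(r'^oss-[a-z0-9-]+\.aliyuncs\.com$')


class OSSPath(NamedTuple):
    endpoint: str
    bucket: str
    key: str


def parse_oss_path(path: str,
                   default_endpoint: Optional[str] = None) -> OSSPath:
    """Split an oss:// path into endpoint, bucket and object key."""
    prefix = 'oss://'
    if not path.startswith(prefix):
        raise ValueError(f'expected a path starting with "oss://": {path!r}')
    head, _, tail = path[len(prefix):].partition('/')
    if _ENDPOINT_RE.match(head):
        # Form: oss://<endpoint>/<bucket>/<key>
        endpoint = head
        bucket, _, key = tail.partition('/')
    else:
        # Form: oss://<bucket>/<key>, using the endpoint given at init time
        if default_endpoint is None:
            raise ValueError('path has no endpoint and no default was given')
        endpoint = default_endpoint
        bucket, key = head, tail
    if not bucket:
        raise ValueError(f'no bucket name found in {path!r}')
    return OSSPath(endpoint, bucket, key)
```

Because only the first segment is tested, a folder deeper in the key that resembles an endpoint is left alone, and a directory-style path such as `oss://<endpoint>/<bucket>/` simply yields an empty key.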

Comment on lines 536 to 538
root_path = obj_name
if root_path and not root_path.endswith('/'):
    raise TypeError('`dir_path` must endswith "/" ')
Collaborator

Once the is_dir method is implemented, there is no need to require that root_path end with /.
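
To illustrate why a trailing slash becomes unnecessary, here is a minimal sketch of prefix-based directory detection. It assumes, as is common for object stores, that a "directory" is just a shared key prefix. The function below is hypothetical and works on an in-memory list of keys; a real backend would obtain the keys by listing objects under the prefix, which is not shown here.

```python
from typing import Iterable


def is_dir(keys: Iterable[str], path: str) -> bool:
    """Treat `path` as a directory if any object key lies under `path/`."""
    if path in ('', '/'):
        # The bucket root is always treated as a directory.
        return True
    # Normalize so that 'data' and 'data/' behave the same way.
    prefix = path.rstrip('/') + '/'
    return any(key.startswith(prefix) for key in keys)


if __name__ == '__main__':
    keys = ['data/a.txt', 'data/sub/b.txt', 'readme.md']
    print(is_dir(keys, 'data'), is_dir(keys, 'readme.md'))
```

Since the function normalizes the prefix itself, callers can pass either form and the TypeError check on the trailing slash can be dropped.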

@zhouzaida zhouzaida changed the title [CodeCamp #1490] Add OSSBackend for reading data from aliyun oss [Feature] Add OSSBackend for reading data from aliyun oss Mar 19, 2023
f'does not equals src file size {total_size}, please recopy.'
return str(dst)

def list_dir_or_file(self,
Collaborator

Can this method also support oss://endpoint/bucket/?

Comment on lines +396 to +402
cls.oss_dir = 'oss://oss-cn-hangzhou.aliyuncs.com/mmengine/'
local_oss_ak_path = cls.test_data_dir / 'AccessKey.csv'
access_key = pd.read_csv(local_oss_ak_path, sep=',')
cls.access_key_id = access_key.loc[0, 'AccessKey ID']
cls.access_key_secret = access_key.loc[0, 'AccessKey Secret']
print(f'access_key_id: {cls.access_key_id},\
access_key_secret: {cls.access_key_secret}')
Collaborator

It would be more convenient to read them from environment variables so that other developers can also run this unit test.

cls.oss_dir = os.getenv('MMENGINE_OSS_TEST_DIR')
cls.access_key_id = os.getenv('MMENGINE_OSS_ACCESS_KEY_ID')
cls.access_key_secret = os.getenv('MMENGINE_OSS_ACCESS_KEY_SECRET')

@OpenMMLab-Assistant001

Hi @liuxianyi! We are grateful for your efforts in helping improve this open-source project during your personal time.

You are welcome to join the OpenMMLab Special Interest Group (SIG) private channel on Discord, where you can share your experiences and ideas and build connections with like-minded peers. To join the SIG channel, simply message the moderator, OpenMMLab, on Discord, or briefly share your open-source contributions in the #introductions channel and we will assist you. We look forward to seeing you there! Join us: https://discord.gg/UjgXkPWNqA
If you have a WeChat account, you are welcome to join our community on WeChat. You can add our assistant: openmmlabwx. Please add "mmsig + Github ID" as a remark when adding friends :)

Thank you again for your contribution❤
