The native HDFS client performs much better than WebHDFS clients. However, invoking the native client for each Hadoop operation incurs the additional overhead of starting a JVM. pyhdfs-client delivers native HDFS client performance without the overhead of starting a JVM on every command execution.
- HDFS client for python
- Easy to integrate with python applications
- Better performance than WebHDFS clients
- Provides native Hadoop client performance without the JVM startup overhead on every command
- Supports both UNIX and Windows
- Supports multiple HDFS client instances
- [fix] Temporary folder deletion
- [fix] Java process shutdown issues on UNIX
pip install pyhdfs-client
Requirements: Hadoop binaries and py4j must be installed.
>>> from pyhdfs_client.pyhdfs_client import HDFSClient
>>> hdfs_client = HDFSClient()
>>> ret, out, err = hdfs_client.run(['-ls', '/'])
>>> print(out)
Found 1 items
drwxr-xr-x - gp supergroup 0 2021-03-21 01:10 /f1
>>> hdfs_client.stop() # to terminate hdfs client
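Because the client keeps a background process alive until `stop()` is called, it can help to wrap it so that shutdown always happens, even when a command fails. The following is a minimal sketch, not part of the package: `hdfs_session`, `run_checked`, and `demo` are hypothetical helper names, and treating a non-zero `ret` as failure is an assumption based on the usual shell convention rather than on anything this README states. Only `HDFSClient()`, `run()`, and `stop()` come from the session above.

```python
from contextlib import contextmanager


@contextmanager
def hdfs_session(client_factory):
    """Yield a client built by client_factory and always call stop() on exit."""
    client = client_factory()
    try:
        yield client
    finally:
        # Runs on normal exit and on exceptions, so the client is always stopped.
        client.stop()


def run_checked(client, args):
    """Run one command and return its output.

    Assumption: a non-zero return code signals failure.
    """
    ret, out, err = client.run(args)
    if ret != 0:
        raise RuntimeError(f"hdfs {' '.join(args)} failed ({ret}): {err}")
    return out


def demo():
    """List the HDFS root. Needs pyhdfs-client and Hadoop binaries installed."""
    # Imported lazily so the helpers above load without the package.
    from pyhdfs_client.pyhdfs_client import HDFSClient

    with hdfs_session(HDFSClient) as client:
        print(run_checked(client, ["-ls", "/"]))
```

Calling `demo()` on a machine with Hadoop available should print the same listing as the interactive session, and the client is stopped whether or not the command succeeds.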
- Any contribution for enhancements and bug fixes is welcome.
- This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.