Releases: vepadulano/PyRDF
PyRDF 0.2.1
Executive Release Notes
Minor release to adapt to latest changes in PyROOT.
Improvements
- Updated CI environments to use ROOT 6.20.
- Make sure all the arguments to the RDataFrame constructor are correctly passed to the workers in a Spark environment. Thanks to @watson-ij.
Bugfixes
- Force the use of Python strings as arguments to PyROOT functions to avoid an impossible conversion from std::string to const char* on the C++ side.
PyRDF 0.2.0
Executive Release Notes
New
- Added support for friend trees in a distributed environment. The information about the friend tree's name and filename(s) is retrieved during the Map phase and a TChain is built with it. This TChain is then added to the TChain of the main tree before initializing the RDataFrame.
- Logging capabilities have been added to PyRDF. Call the function PyRDF.create_logger in a Python script to start seeing log outputs!
- PyRDF docs are now hosted on Read the Docs! Go check them out at https://pyrdf.readthedocs.io/en/latest/
- Initial support for different RDataFrame operations in a distributed environment:
  - Count
  - Sum
  - Snapshot
- Initial support for the AsNumpy pythonization of RDataFrame in a distributed environment.
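To illustrate why operations such as Count and Sum lend themselves to distributed execution, here is a minimal pure-Python sketch of a map-reduce over entry ranges. It is not PyRDF's implementation: `make_ranges`, `mapper`, and `reducer` are hypothetical names, and a plain list stands in for a tree column.

```python
from functools import reduce


def make_ranges(nentries, npartitions):
    """Split [0, nentries) into contiguous (start, end) ranges (hypothetical helper)."""
    # Never create more partitions than entries, and always at least one.
    npartitions = max(1, min(npartitions, nentries))
    base, extra = divmod(nentries, npartitions)
    ranges, start = [], 0
    for i in range(npartitions):
        # The first `extra` ranges take one additional entry each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def mapper(values, entry_range):
    """Compute partial Count and Sum results on one range of entries."""
    start, end = entry_range
    chunk = values[start:end]
    return {"count": len(chunk), "sum": sum(chunk)}


def reducer(left, right):
    """Merge two partial results; both operations merge by addition."""
    return {
        "count": left["count"] + right["count"],
        "sum": left["sum"] + right["sum"],
    }


values = list(range(10))  # stand-in for a column of a tree
partials = [mapper(values, r) for r in make_ranges(len(values), 3)]
merged = reduce(reducer, partials)
```

Both results are associative under addition, so the partial values can be merged in any order the backend happens to deliver them.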
Improvements
- The distribute_files function, which defines how a backend should send files (headers, libraries, etc.) to the workers, is now an abstract method of the Dist class.
- Paths to all the files needed for the analysis are now stored in a set instead of a list. The functions that deal with retrieving those paths have been changed accordingly to use the correct methods.
- PyRDF is now completely free of dot notation for import statements.
- Attribute errors on RDataFrame or Operation instances now show the correct class name instead of the PyRDF specific implementation (e.g. HeadNode for RDataFrame), see #62 for examples.
- Switching to a Local backend after a first trigger of the computational graph in a Spark backend now also stops the SparkContext.
- The ROOT.gDirectory value now doesn't get changed by the execution of the computational graph. This solves an issue with writing histograms coming from a distributed computation to a ROOT.TFile opened before triggering the graph.
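The first two improvements can be pictured with a short Python 3 sketch. It assumes nothing about PyRDF's real class layout: `DistSketch`, `RecordingBackend`, and `add_files` are hypothetical names used only to show an abstract `distribute_files` method together with set-based path storage.

```python
from abc import ABC, abstractmethod


class DistSketch(ABC):
    """Hypothetical stand-in for a distributed backend base class."""

    def __init__(self):
        # A set drops duplicate paths automatically.
        self.files = set()

    def add_files(self, paths):
        """Register one path or an iterable of paths (hypothetical helper)."""
        if isinstance(paths, str):
            paths = [paths]
        # set.update replaces the list.extend call a list would need.
        self.files.update(paths)

    @abstractmethod
    def distribute_files(self, paths):
        """Each concrete backend decides how files reach the workers."""


class RecordingBackend(DistSketch):
    """Toy backend that only records what it would send."""

    def __init__(self):
        super().__init__()
        self.sent = []

    def distribute_files(self, paths):
        self.sent.extend(sorted(paths))


backend = RecordingBackend()
backend.add_files(["a.h", "lib.so"])
backend.add_files("a.h")  # duplicate, ignored by the set
backend.distribute_files(backend.files)
```

Declaring the method abstract means a backend that forgets to implement it fails at instantiation time rather than in the middle of a distributed run.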
Bugfixes
- When a dataframe with zero entries is being processed in a distributed environment, the execution falls back to the local environment instead.
- When a header is sent to a Spark worker its directory is correctly added to the list of paths in which ROOT looks for headers (ROOT.IncludePath).
- Improvements in docstrings and PEP8 compliance.
- The npartitions attribute of the Spark class now is synchronized with the changes that may happen during a distributed execution if the number of clusters in the ROOT file is less than the number of partitions issued by the user.
- Fix an issue with pickle protocol 2 when pickling an instance of ROOT.ndarray.
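The zero-entries fallback and the npartitions synchronization described above amount to a small decision rule. The sketch below is an assumption about that rule, not code taken from PyRDF; `plan_execution` and its parameters are hypothetical names.

```python
def plan_execution(nentries, requested_partitions, nclusters):
    """Return (backend_name, npartitions) for a dataset (hypothetical helper).

    nentries: number of entries in the dataframe
    requested_partitions: partitions asked for by the user
    nclusters: number of clusters found in the ROOT file
    """
    if requested_partitions < 1:
        raise ValueError("requested_partitions must be at least 1")
    if nentries == 0:
        # Nothing to split: run locally instead of distributing.
        return ("local", None)
    # A partition cannot be smaller than one cluster, so cap the request.
    return ("distributed", min(requested_partitions, nclusters))


# The value a user would read back after execution is the capped one.
backend_name, npartitions = plan_execution(
    nentries=100, requested_partitions=8, nclusters=3
)
```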
PyRDF 0.1.0
Executive release notes
Initial functional release of a Python wrapper
around ROOT's RDataFrame with support for the
Spark distributed backend.
New
- Fully compatible with Python 2.7 and 3.7
- Run in a python notebook on SWAN connected to a Spark cluster
- New tutorials, synchronized with ROOT RDataFrame tutorials
- New documentation to show the usage of PyRDF on SWAN
- Users can send C++ headers and shared libraries needed for their
analysis to the Spark executors and use them during distributed
execution
- Documentation available on GitHub Pages
Improvements
- Improve logic for the management of the computational graph. Now
it is Python version independent and sends to the distributed
workers only the minimal information required for the execution
of the operations on the RDataFrame
Bugfixes
- Python 3 bugs:
  - Import statements now use paths relative to the main folder of the project
  - Integers previously declared as long now are only integers
  - Division of integers is now correctly declared as floor() division
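For readers unfamiliar with the Python 2 to 3 differences behind the last two fixes, the following snippet shows the standard Python 3 behaviour. The variable names are illustrative and not taken from PyRDF.

```python
entries, partitions = 10, 3

# In Python 3 the / operator always performs true division and returns a float.
true_quotient = entries / partitions

# The // operator performs floor division and keeps integer operands as int,
# which is what code computing entries per partition needs.
floor_quotient = entries // partitions

# Python 3 has a single arbitrary-precision int type; there is no separate long.
big = 2 ** 64
```

Code that relied on `/` truncating integers under Python 2 silently produces floats under Python 3, which is why spelling out `//` makes it version independent.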