07 Apr 12:39

vepadulano

PyRDF 0.2.1

Executive Release Notes

Minor release to adapt to latest changes in PyROOT.

Improvements

Updated CI environments to use ROOT 6.20.
All arguments to the RDataFrame constructor are now correctly passed to the workers in a Spark environment. Thanks to @watson-ij.

Bugfixes

Force the use of Python strings as arguments to PyROOT functions, avoiding an impossible conversion from std::string to const char* on the C++ side.
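
One way to picture this fix is as a coercion step that turns every string-like argument into a built-in Python str before it reaches PyROOT. The following is a minimal sketch under that assumption: to_python_strings is a hypothetical helper, not a PyRDF function, and FakeStdString only stands in for a std::string proxy object so that the example runs without ROOT.

```python
class FakeStdString(object):
    """Stand-in for a std::string proxy (assumption, for illustration only)."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value


def to_python_strings(args):
    """Return a list in which every argument is converted to a built-in str."""
    return [str(arg) for arg in args]


# Mixed input: one proxy-like object and one ordinary Python string.
args = to_python_strings([FakeStdString("myTree"), "data.root"])
print(args)  # prints ['myTree', 'data.root']
```

After the conversion, every element is a plain str, which is what a function expecting const char* can accept.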

10 Sep 09:12

JavierCVilla

PyRDF 0.2.0

Executive Release Notes

New

Added support for friend trees in a distributed environment. The information about the friend tree's name and filename(s) is retrieved during the Map phase and a TChain is built with it. This TChain is then added to the TChain of the main tree before initializing the RDataFrame.
Logging capabilities have been added to PyRDF. Call the function PyRDF.create_logger in a Python script to start seeing log output.
PyRDF docs are now hosted on Read the Docs! Go check them out at https://pyrdf.readthedocs.io/en/latest/
Initial support for different RDataFrame operations in a distributed environment:
- Count
- Sum
- Snapshot
Initial support for the AsNumpy pythonization of RDataFrame in a distributed environment.

Improvements

The distribute_files function, which defines how a backend sends files (headers, libraries, etc.) to the workers, is now an abstract method of the Dist class.
Paths to all the files needed for the analysis are now stored in a set instead of a list. The functions that retrieve those paths have been changed accordingly to use the corresponding set methods.
PyRDF is now completely free of dot notation for import statements.
Attribute errors on RDataFrame or Operation instances now show the correct class name instead of the PyRDF specific implementation (e.g. HeadNode for RDataFrame), see #62 for examples.
Switching to a Local backend after a first trigger of the computational graph in a Spark backend now also stops the SparkContext.
The value of ROOT.gDirectory is no longer changed by the execution of the computational graph. This solves an issue with writing histograms coming from a distributed computation to a ROOT.TFile opened before triggering the graph.
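
The first two improvements in this list (an abstract distribute_files method and file paths kept in a set) can be sketched together. This is an illustration written for Python 3, not the actual PyRDF code: the files attribute, the add_files method and the RecordingBackend class are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod


class Dist(ABC):
    """Sketch of a distributed backend base class (not the actual PyRDF code)."""

    def __init__(self):
        # A set ignores duplicates, so each path is stored only once.
        self.files = set()

    def add_files(self, paths):
        """Register one path (str) or several paths (any iterable of str)."""
        if isinstance(paths, str):
            self.files.add(paths)      # set.add takes the place of list.append
        else:
            self.files.update(paths)   # set.update takes the place of list.extend

    @abstractmethod
    def distribute_files(self):
        """Each concrete backend decides how files reach its workers."""


class RecordingBackend(Dist):
    """Hypothetical backend that only reports what it would send."""

    def distribute_files(self):
        return sorted(self.files)


backend = RecordingBackend()
backend.add_files("myheader.h")
backend.add_files(["myheader.h", "libfoo.so"])  # the duplicate header is ignored
sent = backend.distribute_files()
print(sent)  # prints ['libfoo.so', 'myheader.h']
```

Because distribute_files is abstract, instantiating Dist directly raises a TypeError, so a backend that forgets to implement it fails early rather than at distribution time.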

Bugfixes

When a dataframe with zero entries is processed in a distributed environment, the execution falls back to the local environment instead.
When a header is sent to a Spark worker, its directory is correctly added to the list of paths in which ROOT looks for headers (ROOT.IncludePath).
Improvements in docstrings and PEP8 compliance.
The npartitions attribute of the Spark class is now kept in sync with changes made during a distributed execution, which happen when the ROOT file contains fewer clusters than the number of partitions requested by the user.
Fix an issue with pickle protocol 2 when pickling an instance of ROOT.ndarray.
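
The npartitions fix in this list can be illustrated with a small sketch. It assumes that a partition cannot be smaller than one cluster, so the cluster count caps the number of partitions; effective_npartitions, SparkSketch and build_ranges are hypothetical names, not part of PyRDF.

```python
def effective_npartitions(requested, nclusters):
    """Return the number of partitions that can actually be created."""
    # Assumption: each partition needs at least one cluster,
    # so the cluster count is an upper bound.
    return min(requested, nclusters)


class SparkSketch(object):
    """Hypothetical stand-in for the Spark backend class."""

    def __init__(self, npartitions):
        self.npartitions = npartitions

    def build_ranges(self, nclusters):
        # Keep the attribute in sync with the value that is really used.
        self.npartitions = effective_npartitions(self.npartitions, nclusters)
        return self.npartitions


backend = SparkSketch(npartitions=8)
backend.build_ranges(nclusters=3)
print(backend.npartitions)  # prints 3
```

Reading the attribute after the execution then reports the partition count that was used, not the one originally requested.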

09 May 09:21

dpiparo

PyRDF 0.1.0

Executive Release Notes

Initial functional release of a Python wrapper
around ROOT's RDataFrame with support for the
Spark distributed backend.

New

Fully compatible with Python 2.7 and 3.7
Runs in a Python notebook on SWAN connected to a Spark cluster
New tutorials, synchronized with ROOT RDataFrame tutorials
New documentation to show the usage of PyRDF on SWAN
Users can send C++ headers and shared libraries needed for their
analysis to the Spark executors and use them during distributed
execution
Documentation available on GitHub Pages

Improvements

Improve logic for the management of the computational graph. Now
it is Python version independent and sends to the distributed
workers only the minimal information required for the execution
of the operations on the RDataFrame

Bugfixes

Python 3 bugs:
- Import statements now use paths relative to the main folder of
  the project
- Integers previously declared as long are now plain integers
- Integer division is now explicitly written as floor division
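
The last two items reflect general Python 3 behaviour, shown here with plain Python and no PyRDF code. The variable names are illustrative only.

```python
entries = 7
npartitions = 2

# "/" is true division in Python 3 and returns a float.
true_div = entries / npartitions

# "//" is floor division and returns an int for int operands
# in both Python 2 and Python 3.
floor_div = entries // npartitions

# Python 3 has a single arbitrary-precision int type; there is no separate long.
big = 2 ** 64

print(true_div, floor_div, type(big).__name__)  # prints 3.5 3 int
```

Code that computes indices or range boundaries needs the integer result, which is why the division has to be written as floor division to behave the same under both Python versions.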
