
Releases: vepadulano/PyRDF

PyRDF 0.2.1

07 Apr 12:39
70e6f4a

Executive Release Notes

Minor release adapting to the latest changes in PyROOT.

Improvements

  • Updated CI environments to use ROOT 6.20.
  • Ensured that all arguments to the RDataFrame constructor are correctly passed to the workers in a Spark environment (see the sketch below). Thanks to @watson-ij.
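A minimal sketch of the kind of call this fix covers, assuming PyRDF.RDataFrame mirrors the ROOT RDataFrame constructor signatures; the tree name, file list and default column list are placeholders.

    import PyRDF

    # Select the Spark backend.
    PyRDF.use("spark")

    # Every constructor argument (tree name, list of files, default columns)
    # is forwarded to the Spark workers.
    df = PyRDF.RDataFrame("myTree", ["file1.root", "file2.root"], ["x", "y"])
    print(df.Count().GetValue())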

Bugfixes

  • Forced the use of Python strings as arguments to PyROOT functions to avoid an impossible conversion from std::string to const char* on the C++ side.

PyRDF 0.2.0

10 Sep 09:12
5b6d188

Executive Release Notes

New

  • Added support for friend trees in a distributed environment. The information about the friend tree's name and filename(s) is retrieved during the Map phase and a TChain is built with it. This TChain is then added to the TChain of the main tree before initializing the RDataFrame.
  • Logging capabilities have been added to PyRDF. Call the function PyRDF.create_logger in a Python script to start seeing log output (see the sketch after this list).
  • PyRDF docs are now hosted on Read the Docs! Go check them out at https://pyrdf.readthedocs.io/en/latest/
  • Initial support for different RDataFrame operations in a distributed environment:
    • Count
    • Sum
    • Snapshot
  • Initial support for the AsNumpy pythonization of RDataFrame in a distributed environment.
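A minimal sketch combining the features above on the Spark backend. PyRDF.use and PyRDF.RDataFrame follow the project's documented entry points; the log level argument, the dataset, the partition count and the column names are placeholders or assumptions.

    import PyRDF

    # Enable PyRDF log output (the level argument is an assumption;
    # check the documentation for the exact signature).
    PyRDF.create_logger("INFO")

    # Select the Spark backend with a placeholder partition count.
    PyRDF.use("spark", {"npartitions": 4})

    df = PyRDF.RDataFrame("myTree", "mydata.root")
    df2 = df.Define("x2", "x*x")

    # Operations supported in a distributed environment in this release.
    n_events = df.Count().GetValue()
    total = df2.Sum("x2").GetValue()
    df2.Snapshot("myTree", "skimmed.root", ["x", "x2"])

    # AsNumpy pythonization: returns a dictionary of numpy arrays.
    arrays = df.AsNumpy(columns=["x"])
    print(n_events, total, arrays["x"][:5])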

Improvements

  • The distribute_files function, which defines how a backend should send files (headers, libraries, etc.) to the workers, is now an abstract method of the Dist class.
  • Paths to all the files needed for the analysis are now stored in a set instead of a list. The functions that retrieve those paths have been updated accordingly to use the corresponding set methods.
  • PyRDF is now completely free of dot notation for import statements.
  • Attribute errors on RDataFrame or Operation instances now show the correct class name instead of the PyRDF-specific implementation (e.g. HeadNode for RDataFrame); see #62 for examples.
  • Switching to a Local backend after a first trigger of the computational graph in a Spark backend now also stops the SparkContext.
  • The ROOT.gDirectory value is no longer changed by the execution of the computational graph. This solves an issue with writing histograms coming from a distributed computation to a ROOT.TFile opened before triggering the graph (see the sketch below).
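A short sketch of the workflow this improvement enables; the tree, file and column names are placeholders. Because ROOT.gDirectory is preserved, a TFile opened before the graph is triggered is still the current directory afterwards, so the resulting histogram can be written to it directly.

    import ROOT
    import PyRDF

    PyRDF.use("spark")
    df = PyRDF.RDataFrame("myTree", "mydata.root")

    out = ROOT.TFile("histos.root", "RECREATE")  # opened before triggering the graph
    h = df.Histo1D("x")                          # lazy: nothing runs yet

    hist = h.GetValue()  # triggers the distributed computation
    hist.Write()         # works because gDirectory still points to "out"
    out.Close()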

Bugfixes

  • When a dataframe with zero entries is processed in a distributed environment, the execution now falls back to the local environment instead.
  • When a header is sent to a Spark worker, its directory is correctly added to the list of paths in which ROOT looks for headers (ROOT.IncludePath).
  • Improvements in docstrings and PEP8 compliance.
  • The npartitions attribute of the Spark class is now synchronized with the changes that may happen during a distributed execution if the number of clusters in the ROOT file is less than the number of partitions requested by the user.
  • Fixed an issue with pickle protocol 2 when pickling an instance of ROOT.ndarray.

PyRDF 0.1.0

09 May 09:21

Executive Release Notes

Initial functional release of a Python wrapper
around ROOT's RDataFrame with support for the
Spark distributed backend.

New

  • Fully compatible with Python 2.7 and 3.7
  • Runs in a Python notebook on SWAN connected to a Spark cluster
  • New tutorials, synchronized with ROOT RDataFrame tutorials
  • New documentation to show the usage of PyRDF on SWAN
  • Users can send C++ headers and shared libraries needed for their
    analysis to the Spark executors and use them during distributed
    execution (see the sketch after this list)
  • Documentation available on GitHub Pages
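A sketch of how this could look from the user side. The helper names include_headers and include_shared_libraries, as well as the my_cut function, are assumptions for illustration and may not match the exact API of this release; the dataset is a placeholder.

    import PyRDF

    PyRDF.use("spark")

    # Hypothetical helper names: ship a C++ header and a shared library
    # to the Spark executors before running the analysis.
    PyRDF.include_headers("my_analysis.h")               # name is an assumption
    PyRDF.include_shared_libraries("libMyAnalysis.so")   # name is an assumption

    df = PyRDF.RDataFrame("myTree", "mydata.root")

    # "my_cut" is assumed to be declared in my_analysis.h / libMyAnalysis.so.
    n_selected = df.Filter("my_cut(x)").Count().GetValue()
    print(n_selected)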

Improvements

  • Improved the logic for managing the computational graph. It is now
    Python-version independent and sends to the distributed workers only
    the minimal information required to execute the operations on the
    RDataFrame

Bugfixes

  • Python 3 bugs:
    • Import statements now use paths relative to the main folder of
      the project
    • Integers previously declared as long are now plain integers
    • Integer division is now correctly declared as floor division (//)