v2.7.0-rc1
Pre-release
Release Notes
✨ Highlights
🚅 Rework `Pipeline.run()` logic to better handle cycles
`Pipeline.run()` internal logic has been heavily reworked to be more robust and reliable than before. This new implementation makes it easier to run `Pipeline`s that have cycles in their graph. It also fixes some corner cases in `Pipeline`s that don't have any cycle.
📝 Introduce LoggingTracer
With the new `LoggingTracer`, users can inspect in the logs everything that is happening in their Pipelines in real time. This feature aims to improve the user experience during experimentation and prototyping.
⬆️ Upgrade Notes
- Removed the `Pipeline` init argument `debug_path`. We do not support this anymore.
- Removed the `Pipeline` init argument `max_loops_allowed`. Use `max_runs_per_component` instead.
- Removed the `PipelineMaxLoops` exception. Use `PipelineMaxComponentRuns` instead.
- The deprecated default converter class `haystack.components.converters.pypdf.DefaultConverter` used by `PyPDFToDocument` has been removed.

  Pipeline YAMLs from `haystack<2.7.0` that use the default converter must be updated in the following manner:

  ```yaml
  # Old
  components:
    Comp1:
      init_parameters:
        converter:
          type: haystack.components.converters.pypdf.DefaultConverter
      type: haystack.components.converters.pypdf.PyPDFToDocument

  # New
  components:
    Comp1:
      init_parameters:
        converter: null
      type: haystack.components.converters.pypdf.PyPDFToDocument
  ```

  Pipeline YAMLs from `haystack<2.7.0` that use custom converter classes can be upgraded by simply loading them with `haystack==2.6.x` and saving them to YAML again.
- `Pipeline.connect()` will now raise a `PipelineConnectError` if `sender` and `receiver` are the same Component. We do not support this use case anymore.
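For the `PyPDFToDocument` YAML change described above, the edit can also be scripted. The following is a minimal sketch, not part of Haystack: it assumes the pipeline YAML has already been loaded into a plain Python dictionary, and the helper name `drop_default_converter` is hypothetical.

```python
import copy

DEFAULT_CONVERTER = "haystack.components.converters.pypdf.DefaultConverter"


def drop_default_converter(pipeline_dict):
    """Return a copy in which serialized DefaultConverter entries are set to None.

    Hypothetical helper for illustration; it is not a Haystack API.
    """
    updated = copy.deepcopy(pipeline_dict)
    for component in updated.get("components", {}).values():
        params = component.get("init_parameters", {})
        converter = params.get("converter")
        if isinstance(converter, dict) and converter.get("type") == DEFAULT_CONVERTER:
            # None is written out as `converter: null` when dumped to YAML.
            params["converter"] = None
    return updated


# Dictionary form of the "Old" YAML shown in the upgrade note.
old = {
    "components": {
        "Comp1": {
            "init_parameters": {"converter": {"type": DEFAULT_CONVERTER}},
            "type": "haystack.components.converters.pypdf.PyPDFToDocument",
        }
    }
}
new = drop_default_converter(old)
```

Custom converter entries are left untouched by this sketch, since the note above recommends a load-and-save round trip with `haystack==2.6.x` for those.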
🚀 New Features
- Added component `StringJoiner` to join strings from different components into a list of strings.
- Improved serialization/deserialization errors to provide extra context about the offending components when possible.
- Enhanced DOCX converter to support table extraction in addition to paragraph content. The converter supports both CSV and Markdown table formats, providing flexible options for representing tabular data extracted from DOCX documents.
- Added a new parameter `additional_mimetypes` to the `FileTypeRouter` component. This allows users to specify additional MIME type mappings, ensuring correct file classification across different runtime environments and Python versions.
- Introduced a `LoggingTracer` that sends all traces to the logs.

  It can be enabled as follows:

  ```python
  import logging
  from haystack import tracing
  from haystack.tracing.logging_tracer import LoggingTracer

  logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING)
  logging.getLogger("haystack").setLevel(logging.DEBUG)

  tracing.tracer.is_content_tracing_enabled = True  # to enable tracing/logging content (inputs/outputs)
  tracing.enable_tracing(LoggingTracer())
  ```
- Fundamentally reworked the internal logic of `Pipeline.run()`. The rework makes it more reliable and covers more use cases. We fixed some issues that made `Pipeline`s with cycles unpredictable and left the execution order of their Components unclear.
- Each tracing span of a component run is now attached to the pipeline run span object. This allows users to trace the execution of multiple pipeline runs concurrently.
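To illustrate why extra MIME type mappings such as `additional_mimetypes` help, here is a small standard-library sketch. It does not use Haystack and is not how `FileTypeRouter` is implemented. The helper `guess_mime` and its `extra_types` argument are hypothetical; it only shows that Python's built-in MIME table can lack an entry for `.docx` on some runtimes, and that registering one makes detection deterministic.

```python
import mimetypes

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def guess_mime(filename, extra_types=None):
    """Guess a MIME type, registering caller-supplied mappings first.

    extra_types maps a MIME type to a file extension (an assumed shape,
    chosen for this illustration only).
    """
    # A private registry built from Python's built-in defaults, so the
    # process-wide mimetypes tables are left unchanged.
    registry = mimetypes.MimeTypes()
    for mime_type, extension in (extra_types or {}).items():
        registry.add_type(mime_type, extension)
    guessed, _encoding = registry.guess_type(filename)
    return guessed


# The first result depends on the Python version and may be None;
# the second always resolves to the registered type.
print(guess_mime("report.docx"))
print(guess_mime("report.docx", {DOCX_MIME: ".docx"}))
```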
⚡️ Enhancement Notes
- Added the `streaming_callback` run parameter to `HuggingFaceAPIGenerator` and `HuggingFaceLocalGenerator` to allow users to pass a callback function that will be called after each chunk of the response is generated.
- The `SentenceWindowRetriever` now supports the `window_size` parameter at run time, overriding the value set in the constructor.
- Added output type validation in `ConditionalRouter`. Setting `validate_output_type` to `True` enables a check that the actual output of a route matches the declared type. If it doesn't match, a `ValueError` is raised.
- Reduced `numpy` usage to speed up imports.
- Improved file type detection in `FileTypeRouter`, particularly for Microsoft Office file formats like .docx and .pptx. This enhancement ensures more consistent behavior across different environments, including AWS Lambda functions and systems without pre-installed office suites.
- The `FileTypeRouter` now supports passing metadata (`meta`) in the `run` method. When metadata is provided, the sources are internally converted to `ByteStream` objects and the metadata is added. This new parameter simplifies working with preprocessing/indexing pipelines.
- `SentenceTransformersDocumentEmbedder` now supports `config_kwargs` for additional parameters when loading the model configuration.
- `SentenceTransformersTextEmbedder` now supports `config_kwargs` for additional parameters when loading the model configuration.
- Previously, `numpy` was pinned to `<2.0` to avoid compatibility issues in several core integrations. This pin has been removed, and haystack can work with both `numpy` `1.x` and `2.x`. If necessary, we will pin the `numpy` version in specific core integrations that require it.
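The output type validation described for `ConditionalRouter` can be pictured with the following plain-Python sketch. It is an illustration only, not the router's code: the function `check_output_type` is hypothetical, and it handles plain classes such as `int` or `str`, not generic type hints.

```python
from typing import Any


def check_output_type(value: Any, declared_type: type, validate_output_type: bool = False) -> Any:
    """Return value unchanged, optionally verifying it against declared_type.

    Hypothetical stand-in that mirrors the behaviour described in the note:
    validation is opt-in, and a mismatch raises ValueError.
    """
    if validate_output_type and not isinstance(value, declared_type):
        raise ValueError(
            f"Route output {value!r} is not of the declared type {declared_type.__name__}"
        )
    return value


# Without validation, a mismatched value passes through silently.
unchecked = check_output_type("42", int)

# With validation enabled, the same mismatch is reported.
try:
    check_output_type("42", int, validate_output_type=True)
except ValueError as error:
    print(error)
```

Keeping the check opt-in, as the note describes, means existing pipelines keep their current behaviour unless the flag is set.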
⚠️ Deprecation Notes
- The `DefaultConverter` class used by the `PyPDFToDocument` component has been deprecated. Its functionality will be merged into the component in 2.7.0.
🐛 Bug Fixes
- Serialized data of components is now explicitly enforced to be one of the following basic Python datatypes: `str`, `int`, `float`, `bool`, `list`, `dict`, `set`, `tuple` or `None`.
- Addressed an issue where certain file types (e.g., .docx, .pptx) were incorrectly classified as 'unclassified' in environments with limited MIME type definitions, such as AWS Lambda functions.
- Fixed logs containing JSON data getting lost due to string interpolation.
- Use forward references for Hugging Face Hub types in the `HuggingFaceAPIGenerator` component to prevent import errors.
- Fixed the serialization of the `PyPDFToDocument` component to prevent the default converter from being serialized unnecessarily.
- Reverted a change to `PyPDFConverter` that broke the deserialization of pre-`2.6.0` YAMLs.
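Authors of custom components may want to check their own serialized data against the basic datatypes listed in the first bug fix above. The sketch below is one possible way to do that in plain Python. The helper `find_non_basic` is hypothetical and is not Haystack's check; whether Haystack also inspects nested values is not stated in these notes, so the recursion here is an assumption.

```python
BASIC_TYPES = (str, int, float, bool, list, dict, set, tuple, type(None))


def find_non_basic(value, path="data"):
    """Return the paths of all values that are not basic Python datatypes."""
    if not isinstance(value, BASIC_TYPES):
        return [path]
    problems = []
    if isinstance(value, dict):
        for key, item in value.items():
            problems += find_non_basic(item, f"{path}[{key!r}]")
    elif isinstance(value, (list, tuple, set)):
        for index, item in enumerate(value):
            problems += find_non_basic(item, f"{path}[{index}]")
    return problems


class CustomObject:
    """Stand-in for something that should not appear in serialized data."""


valid = {"type": "my.Component", "init_parameters": {"top_k": 3, "names": ["a", "b"]}}
invalid = {"init_parameters": {"helper": CustomObject()}}

print(find_non_basic(valid))
print(find_non_basic(invalid))
```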