First Release of SEAL (2023-05-16) #187
alexander-schranz
announced in
Announcements
Schranz Search - First Release of SEAL
Monorepository for SEAL, a Search Engine Abstraction Layer with support for different search engines
Documentation | Packages
Elasticsearch | Opensearch | Meilisearch | Algolia | Solr | Redisearch | Typesense
PHP | Symfony | Laravel | Spiral | Mezzio | Yii
Hello and welcome 👋,
About six months ago, at the beginning of December 2022, I started the "Schranz-Search" project, out of which SEAL was later born. At first the project started more as research into the different search engines that are around. At that time, with very limited knowledge about alternatives to Elasticsearch, I was very curious about what exists "beyond the tellerrand" (beyond the edge of my own plate). With the support of different communities around Twitter, Reddit, Meetups, ... I could create a list of different search engines, and the list was bigger than expected and still grows.
My personal experience as a core developer at Sulu CMS, a Symfony-based CMS, was limited to Elasticsearch. After having a look at the different search engines that exist, I had to sort out which ones make sense to add to such an abstraction and are most used by the PHP community. Besides Opensearch, which as a fork of Elasticsearch should be easy to support, I had a look at Algolia and Meilisearch, and so had the first bunch of search engines together that I wanted to support. And so the start was made for SEAL, the Search Engine Abstraction Layer.
Avoiding bringing complexity and search jargon to the end user
Search engines can be complex, and they all have their own terms for different things. The target of the project was to hide the complexity of the different search engines behind an easily understandable interface and so be very beginner friendly. The important part here was how the definition of the data that is added to the search engine needs to be structured. Different search engines have different terms to define their mappings, fields, options, ... In the search engine abstraction layer I wanted to avoid terms like `doc_values: true`, `index: true`, `TAGS`, `keyword` or other special terms of the different search engines. During the research I stumbled over the Meilisearch definitions and really liked how they tackle this issue. Instead of using special search jargon, in Meilisearch you just tell what you want to do with the data fields you are indexing / saving. So a simple configuration inspired by Meilisearch was shipped with SEAL, using simple, understandable words like `searchable`, `filterable` and `sortable`. That is how the `Schema` definitions were born.
To be as near as possible to PHP with the definitions, the following types were supported: `Text`, `Integer`, `Float`, `Boolean` and `DateTime`. This way all kinds of different PHP types are represented; with the `multiple` flag every type can also be an array of data. And with a special type called `Object` even associative arrays can be added.
Strict vs. Dynamic Schema
There was nearly no discussion for me about going with a dynamic schema; I always wanted to go with a strict schema like the ones defined for databases. The first reason was that not all search engines support dynamic schemas. If you are new to search engines: a dynamic schema means that you can push any data to the engine and, by some kind of magic, the search engine puts each field into a specific type and configuration. For example, a string will become a text type in Elasticsearch, but if the first string indexed looks like a date the field becomes a date field, and later plain text will fail. My experience with this kind of mechanism was really bad, and I only recommend it for quick prototyping. By going with a fixed and strict schema I wanted to prevent unwanted magic and add support for a wider range of search engines that do not support that kind of magic.
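To make the idea of a strict schema more concrete, here is a minimal sketch. It is not SEAL's actual API: `FieldSketch` and `validateDocument` are hypothetical names invented for this illustration, and the type names are simplified to plain strings. It only shows the principle that fields declare what they are used for, and that unknown fields or wrong types are rejected instead of being guessed.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a schema field, not SEAL's actual class.
final class FieldSketch
{
    public function __construct(
        public readonly string $type, // 'text', 'integer', 'float', 'boolean' or 'datetime'
        public readonly bool $multiple = false,
        public readonly bool $searchable = false,
        public readonly bool $filterable = false,
        public readonly bool $sortable = false,
    ) {
    }
}

/**
 * Rejects documents that do not match the strict schema.
 *
 * @param array<string, FieldSketch> $schema
 * @param array<string, mixed>       $document
 */
function validateDocument(array $schema, array $document): void
{
    foreach ($document as $name => $value) {
        // Unknown fields are an error; no type is guessed from the data.
        $field = $schema[$name]
            ?? throw new InvalidArgumentException(sprintf('Unknown field "%s".', $name));

        if ($field->multiple && !is_array($value)) {
            throw new InvalidArgumentException(sprintf('Field "%s" expects an array.', $name));
        }

        foreach ($field->multiple ? $value : [$value] as $item) {
            $valid = match ($field->type) {
                'text' => is_string($item),
                'integer' => is_int($item),
                'float' => is_float($item) || is_int($item),
                'boolean' => is_bool($item),
                'datetime' => $item instanceof DateTimeInterface,
                default => false,
            };

            if (!$valid) {
                throw new InvalidArgumentException(sprintf('Field "%s" expects type "%s".', $name, $field->type));
            }
        }
    }
}

$blogSchema = [
    'title' => new FieldSketch('text', searchable: true),
    'tags' => new FieldSketch('text', multiple: true, searchable: true, filterable: true),
    'published' => new FieldSketch('datetime', filterable: true, sortable: true),
];

// Passes: every field is known and has the declared type.
validateDocument($blogSchema, [
    'title' => 'First Release of SEAL',
    'tags' => ['php', 'search'],
    'published' => new DateTimeImmutable('2023-05-16'),
]);
```

With a schema like this, a date-looking string pushed into the `published` field fails right away with a clear error, instead of silently deciding the field type for all later documents.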
Creating a single interface to communicate with the search engine
After defining the definitions of the fields, the next and most important part was how to create the interface for the user of the library to communicate with the search engines. I'm really a big fan of @frankdejonge's work on Flysystem, an abstraction for local and remote filesystems. It uses a single class and the Adapter design pattern to communicate with the different systems. That was a pattern we could definitely reuse for our abstraction. Another library that also had an impact on the architecture is @doctrine: in the first implementation of SEAL I went with a `SchemaManager` and a `Connection`, which is very similar to how Doctrine DBAL works. After implementing some of the different adapters I decided to split the `Connection` class into two separate classes, the `Indexer` and the `Searcher`. Thanks here to @wachterjohannes and @Toflar, who helped me find a good way to split reading and writing and so make things like a `ReadWriteAdapter` a lot easier.
But back to the more important class, the `Engine`, which is responsible for providing a single interface for the end user of the library to communicate with the different search engines. Its methods take a `string` representation of the `index`, which makes things easier for the end user: without any imports or loading, an instance of the `Engine` is enough to add, delete, search and manage the indexes of their search engine. Internally the `Engine` forwards the `Index` instance, and so the configured fields, to the `Adapter` so that the adapter can work with it.
Fighting the search engines
The main difficulty was to fight the mappings, schemas and field definitions of the different search engines so that they match the defined field options `searchable`, `filterable` and `sortable`. For example, to make a field only `filterable` and not `searchable`, I first thought it would be enough to index it in Elasticsearch as a keyword. But if you searched for the whole word, the document still showed up in the result. After some deep diving into Elasticsearch and Lucene I found out that I could achieve it by configuring the field with `index: false` but `doc_values: true`. This was the only solution I found to make Elasticsearch behave the expected way for this combination of options. The easiest to support was Meilisearch, as our own mapping was implemented the same way and it uses nearly the same type of configuration. For Algolia I first thought it would be the same, but sorting in Algolia requires additional replica indexes. This is also why a strict schema is required for the search engine abstraction: at creation time we need to know which indexes have to be created. So when creating the indexes for Algolia, the AlgoliaSchemaManager creates additional replicas that have the specific sorting defined. At search time we use that replica, and it returns the result in the expected order.
Besides Elasticsearch, Opensearch, Algolia and Meilisearch I later also added support for Solr (because it is widely used in the @typo3 community), RediSearch (I am personally a big fan of @redis) and Typesense (which came up sometimes in my research on Reddit). With some help from the communities of the different search engines I could implement Solr via its Cloud mode and Typesense via some changes in the core mapping. The thing I could not solve for a long time was the support for RediSearch. The problem there was that different DIALECTs exist, and using the latest DIALECT I was not able to make a field `searchable` and `filterable` at the same time. Another big issue was that filters did not work on fields containing `-`, which for example every `uuid` contains. As there were two issues open in the RediSearch repository for 2-3 years, I thought I would probably need to cancel the support for RediSearch. After a Twitter conversation with Redis CEO Rowan Trollope about the developer experience of search with Redis, some nice people from the RediSearch team pushed me in the right direction on how we could still achieve the things needed at the current state. Instead of using the same field for `searchable` and `filterable`, we duplicate the field, so we have for example a searchable `category` field and a filterable `category__raw` field. This is similar to how Elasticsearch handles the Text and Keyword combination of a field. This was great and we could close the RediSearch issue.
Being dependency transparent
SEAL is split into different packages, and the SEAL core at its current state only depends on PHP 8.1 or greater. The different adapter packages define their own supported dependencies. This way the abstraction does not hide any dependency of a specific adapter: if you want to use a specific adapter like Elasticsearch, the Elasticsearch adapter package already installs all of its required dependencies, and they are not hidden behind optional dependencies. Besides that, it was very important for me to get things started quickly, so the Getting Started documentation also shows how you can quickly get your favorite search engine software running with docker compose.
Implementing Filters
Today's search, in my opinion, cannot exist without support for some kind of filters. While you may just want to filter blogs by categories, a search today can get very complex, especially in e-commerce systems. So it was a minimum requirement for the search engines supported by the abstraction that they also support at least some basic filters. This makes it possible to create not only a nice search on a website but also great overview pages with nice filters. So different `Conditions` were added to make filtering possible. These filters can then be used via the `SearchBuilder`, which can easily be created via the `Engine` instance. This way we have everything together to support and communicate with the different search engines for all kinds of pages with searches.
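How the pieces described so far fit together can be sketched roughly like this. Everything in the block is a hypothetical stand-in written for this illustration (all classes carry a `Sketch` suffix, and the method names are assumptions), so it should not be read as SEAL's real interface. It only shows the shape of the design: one engine class as entry point, an adapter behind it, and a search builder that collects filter conditions.

```php
<?php

declare(strict_types=1);

// All names are hypothetical stand-ins for this sketch, not SEAL's actual API.
interface ConditionSketch
{
    public function matches(array $document): bool;
}

final class EqualConditionSketch implements ConditionSketch
{
    public function __construct(private readonly string $field, private readonly mixed $value) {}

    public function matches(array $document): bool
    {
        return ($document[$this->field] ?? null) === $this->value;
    }
}

interface AdapterSketch
{
    public function save(string $index, array $document): void;

    /** @param list<ConditionSketch> $filters */
    public function search(string $index, array $filters): array;
}

// In-memory adapter; a real adapter would translate the conditions into the
// query language of its search engine instead of evaluating them in PHP.
final class MemoryAdapterSketch implements AdapterSketch
{
    private array $documents = [];

    public function save(string $index, array $document): void
    {
        $this->documents[$index][(string) $document['id']] = $document;
    }

    public function search(string $index, array $filters): array
    {
        $hits = [];
        foreach ($this->documents[$index] ?? [] as $document) {
            foreach ($filters as $filter) {
                if (!$filter->matches($document)) {
                    continue 2; // One failed condition excludes the document.
                }
            }
            $hits[] = $document;
        }
        return $hits;
    }
}

final class SearchBuilderSketch
{
    private array $filters = [];

    public function __construct(private readonly AdapterSketch $adapter, private readonly string $index) {}

    public function addFilter(ConditionSketch $condition): self
    {
        $this->filters[] = $condition;
        return $this;
    }

    public function getResult(): array
    {
        return $this->adapter->search($this->index, $this->filters);
    }
}

// Single entry point: indexes are addressed by their string name.
final class EngineSketch
{
    public function __construct(private readonly AdapterSketch $adapter) {}

    public function saveDocument(string $index, array $document): void
    {
        $this->adapter->save($index, $document);
    }

    public function createSearchBuilder(string $index): SearchBuilderSketch
    {
        return new SearchBuilderSketch($this->adapter, $index);
    }
}

$engine = new EngineSketch(new MemoryAdapterSketch());
$engine->saveDocument('blog', ['id' => '1', 'title' => 'First release', 'category' => 'news']);
$engine->saveDocument('blog', ['id' => '2', 'title' => 'Old post', 'category' => 'archive']);

$hits = $engine->createSearchBuilder('blog')
    ->addFilter(new EqualConditionSketch('category', 'news'))
    ->getResult();
```

Swapping the search engine then only means passing a different adapter to the engine; the calling code that saves documents and builds filtered searches stays the same.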
Frameworks support
A standalone library is nice, but today I think it is very important that we also provide an easy way to use such a library inside different frameworks. As a core developer at Sulu CMS, my first choice of framework integration was providing a `Bundle` for the @symfony ecosystem. This was also one of the easiest for me to implement, even though I think Symfony bundles are the most complex compared to other frameworks. But the new AbstractBundle class introduced in Symfony 6.1 made it a lot easier, as a bundle is no longer split into 3 different classes (Bundle, Extension, Configuration). The next framework of choice to provide an integration for was the @laravel ecosystem. As I already had experience with this kind of implementation via my Schranz Templating, I was able to provide the specific services for Laravel via its own `ServiceProvider`. My next framework of choice was @spiral, maybe not that widely used, but a very modern framework that for me is a nice mix of Symfony and Laravel, with a very helpful small community. Via a Spiral Bootloader we were able to provide the configuration and services of our library to the Spiral ecosystem.
For an easy configuration I went for a DSN-like configuration of the adapters, which for example in Symfony is already used for Doctrine databases or Symfony Messenger transports. So by changing a single environment variable another adapter can be used.
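As a rough illustration of the DSN idea, the sketch below reads a DSN from an environment variable and splits it into the parts an adapter factory would need. The helper `parseAdapterDsn`, the variable name `SEARCH_SKETCH_DSN` and the example DSN formats are assumptions made for this sketch; the real DSN formats of each adapter are described in the project documentation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: splits an adapter DSN into the parts a factory needs.
 *
 * @return array{adapter: string, host: string, port: int|null, user: string|null}
 */
function parseAdapterDsn(string $dsn): array
{
    $parts = parse_url($dsn);
    if (!is_array($parts) || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException(sprintf('Invalid adapter DSN "%s".', $dsn));
    }

    return [
        'adapter' => $parts['scheme'],   // e.g. "elasticsearch" or "meilisearch"
        'host' => $parts['host'],
        'port' => $parts['port'] ?? null,
        'user' => $parts['user'] ?? null, // could carry an API key, depending on the engine
    ];
}

// The variable name and the DSN formats are assumptions for this sketch.
$dsn = getenv('SEARCH_SKETCH_DSN') ?: 'elasticsearch://127.0.0.1:9200';
$config = parseAdapterDsn($dsn);

printf("Adapter: %s, host: %s, port: %s\n", $config['adapter'], $config['host'], $config['port'] ?? 'default');
```

The scheme of the DSN decides which adapter gets created, so switching from one engine to another is a matter of changing that one value.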
After writing the first parts of the documentation I was looking for people to test it out. One of the first to give a lot of feedback was @froschdesign, who comes from @laminas and the Mezzio framework. At that point I did not yet have an integration for the Laminas Mezzio framework, but with the help of @froschdesign I was able to provide one. I am personally a fan of the very simple and understandable architecture of the Mezzio framework. But the integration was a little more difficult than I expected, as there was no easy way to provide services based on configuration. With the help of @froschdesign we found a solution that goes through a custom PSR-11 container, which solved all our problems, and so we could provide the integration. With better knowledge about this kind of integration and some help from @samdark I was also able to provide another integration, into the @yiisoft framework ecosystem.
This way the library can now be used easily in different kinds of frameworks. For all frameworks CLI/console commands were also created, so that indexes can be managed via the framework's CLI tools. For the Laravel integration even Facades were created, so the services provided by the integration layer are also available via these Facades.
Writing documentation
One of the most important parts of every published library, I think, is writing the documentation. As I had some special use cases for the documentation, I stuck with rst and the Python `sphinx` tool. The most important part of the documentation is the Getting Started documentation; the target while writing it was that everybody should be able to get into the library as fast as possible. That documentation should therefore already cover the different framework integrations. Via the special `sphinx-tabs` extension I think I was able to provide a good user experience for the [Getting Started documentation](https://schranz-search.github.io/schranz-search/getting-started/index.html). A developer should be able to get all the other functionality together even without reading the rest of the documentation.
Still, an Introduction documentation was added to show the basic structure of the project and explain some basic terms used inside the documentation. These parts were already rewritten to avoid confusion; still, feedback about the documentation is especially welcome.
The last part of writing the documentation was then to document all the features the core library provides. This covers the Schema definitions, indexing operations and the different kinds of search and filter conditions. The links resulting from the whole research were also added to their own Research documentation, which should be extended in the future with any kind of interesting links about search; even UI & UX should be listed there.
Target of the project?
The target of the project is that this library should be the way to go for implementing any searchable content in PHP. As a core developer of @sulu CMS, I'm looking forward to making this library the way to integrate search not only in Sulu but also in other content management systems. As we already have a close connection to some people at @contao and @typo3, I'm looking forward to cooperating with them to make this library an abstraction that every kind of CMS, system and PHP application can easily use. This means that even more integration layers should exist in the future; if you have knowledge about a system or framework you would like to have an integration for, let us know via a GitHub issue. The same applies if you have knowledge about a not yet supported search engine you would like to see supported by the abstraction. But the library is not the only target of the project: the Research documentation should also be a source for everybody working with search engines and a living collection of search engines and interesting links around them. If you have anything to share about search engines, I'm very happy to add the link to this documentation.
ODM and Datamapper Support?
One of the most frequently asked questions is whether there will be support for directly storing and reading objects built on your own classes. SEAL itself is designed to be the lower-level library working with array data, like a SQL insert. The ODM or datamapper package will be separate and built on top of the SEAL package. Building an ODM based on SEAL is currently too early, and at the current state we recommend instead using a serializer or normalizer like `Symfony Serializer`, which makes normalization and denormalization from array to object and back easy. Still, everyone who wants to join the discussion about the ODM/datamapper package should have a look at this issue.
What is coming next?
The current next task is more about testing and getting feedback. With this first tagged version I want to get more people to test out the library and so get more feedback about the current implementation: find out if there are still things that are confusing, and check where things can be improved and what kinds of errors can be avoided with clear documentation and naming. So the focus here is to provide the best possible developer experience we can and make this library beginner friendly, even for very inexperienced developers who have not yet been in contact with any kind of search engine. If anything in the documentation or library confuses you, please feel free to create a GitHub Issue or GitHub Discussion for it.
I'm really looking forward to all kinds of feedback. At this stage I really want to thank everyone who took the time to test the as yet unreleased version.
I already want to thank @froschdesign, @butschster, @samdark, @vjik, @Toflar, @wachterjohannes, the Reddit PHP community, my Twitter followers and all others who already gave feedback and/or provided help with the different framework integrations.
Sincerely looking forward to your feedback,
Alex
Full list of released packages provided by the Schranz Search project:
This discussion was created from the release First Release of SEAL (2023-05-16).