Official helm chart released by the Airflow Community #211
Comments
Hi @potiuk For me the "official" helm chart lacks several features, which makes the transition not feasible for me. I really like this chart because it offers a turnkey solution and does not require additional configuration steps like the official one does. Currently I miss the following features from the official chart:
Adding
Thanks @andormarkus for your comment. We definitely won't support that -- I understand why you need it, but this is not the right approach; changing resources like that is not correct in my opinion. For installing pip packages: you should do it when building the Docker image itself -- not on the fly when installing via the chart. Once built, the Docker image should be immutable. This is very important for production systems running critical workloads. For Pools (and even Connections & Vars) you should use the API: https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/post_pool And lastly, for user management, I would generally recommend using LDAP, OAuth, or other auth backends.
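As a rough illustration of the API route kaxil suggests, here is a minimal sketch of creating a pool against the stable REST API linked above. The endpoint path (`POST /api/v1/pools`) comes from that reference; the base URL, pool name, slot count, and auth header are placeholder assumptions for the example.

```python
import json
import urllib.request


def build_pool_request(base_url: str, name: str, slots: int):
    """Build the URL and JSON payload for POST /api/v1/pools."""
    url = f"{base_url.rstrip('/')}/api/v1/pools"
    payload = {"name": name, "slots": slots}
    return url, payload


def create_pool(base_url: str, name: str, slots: int, auth_header: str):
    """Send the request; auth_header is e.g. a Basic auth value (assumed setup)."""
    url, payload = build_pool_request(base_url, name, slots)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Dry run: show what would be sent to a local Airflow 2.x webserver.
    url, payload = build_pool_request("http://localhost:8080", "etl_pool", 16)
    print(url)      # http://localhost:8080/api/v1/pools
    print(payload)  # {'name': 'etl_pool', 'slots': 16}
```

The same pattern applies to Connections and Variables via their respective endpoints, which is what makes the API approach a substitute for post-install hooks.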
Yeah. My opinion is very close to @kaxil's - the "extraPipPackages" and post-install hooks do not really follow the way a productionized installation should be done. In my opinion (and that's an opinion only) they introduce more problems than they solve, because you add unnecessary variability and delays where the same can easily be achieved by just building a custom image, which is easy, straightforward, and recommended. The official Airflow image is really a "reference image", and there are several ways to have your own custom image - either by customizing the build process, or by extending the image the usual way all images are extended (with the FROM command). As @kaxil mentioned - for Airflow 2.0+ we now have APIs and even an official Python client that can be used more easily (and if any more features are needed, we are happy to discuss adding them).

Again - this is my opinion, not a source of truth. I think it's up to @thesuperzapper and the other maintainers to decide what to do - if they want to continue maintaining the helm chart and support all the potential issues or incompatibilities coming from future releases of Airflow, this is fine. The Apache 2.0 license perfectly allows that, as long as it is clear that it is not a community-managed project (in the past people were confused about whether the chart is a community-managed project or not). And if not, then we could help with providing transition paths. Happy to do so.

Just as a comment - one thing we might have to take a look at is the use of the Airflow logo and the Apache name in the project (this is a PMC responsibility to take care of), so that it is not easily confused with an Apache-managed project (https://www.apache.org/foundation/marks/#guidelines). I do not know whether it is a problem for the ASF to have similarly named charts with the Airflow logo and Apache name in both, in a way that they can easily be confused (I made the mistake myself, actually).
I think we might ask the ASF for legal advice on whether this is any problem for the foundation at all (I believe the ASF is very cautious about its brand, and it's important that the brand is used properly) - but before we go there and ask, I'd love to hear from the maintainers what the long-term plan for the helm chart is - so that we know whether there is anything to ask about.
We are deploying everything with Terraform (AWS, Kubernetes, Helm); every helm chart is deployed with the API: Pools: Connections: Users: Variables:
@andormarkus Aah.. yes, you can override the Default Pool slots via an Environment Variable; documentation is missing on that page -- I will add that right now. Use Or update configs via Good point regarding "REST API and Terraform Resources" -- from a quick search it looks like the REST API Terraform provider from Mastercard is relatively popular. Example usage: https://github.com/Mastercard/terraform-provider-restapi/blob/master/examples/dummy_users_with_fakeserver.tf Or this one: https://registry.terraform.io/providers/fmontezuma/restapi/latest/docs Will that work for your use case?
Actually, there is a Terraform provider that one of the PMC members of Airflow wrote, using the new REST API: https://github.com/houqp/terraform-provider-airflow We might consider making it part of the "community managed" resources too. Maybe you can check it out?
Hey @thesuperzapper - WDYT? Do you plan to encourage some kind of transition (and can we help), or do you want to support your helm chart in the future? We are preparing a talk for the Airflow Summit about the "state of Airflow with K8S", so I think it would be great to know what we can tell Airflow users. For me both scenarios are OK - having a community-managed helm chart and this one, as well as helping people with the transition to the community one - but I think it would be great to agree on the direction.
I am happy to maintain this chart for as long as it is still used widely, which will probably be for a while yet as:
Personally, I would love there to be only one chart in the future (if this is possible given the different opinions), but we are probably a wee way from that yet (see above comments). However, once the charts become equal (in features and usability), I will happily start pushing users across, and I expect that would be a natural process anyway. Note, I'm open to contributing to the "official" chart to help move this process along; I'm just not sure how best to start.
Perfect. That's cool. I think we can close the ticket then and move to 'collaboration' mode. We are more than happy to welcome PRs to the chart. If those are small fixes - it's just a PR to open; if it is something substantial - write a message to the devlist, open a discussion (if it is something that might require brainstorming), or create an issue in the Airflow repository. This is the best way to start. From a quick look at the differences you pointed out:
All of this has to go through the usual way we do it in the Airflow community - propose, discuss, get to consensus; if consensus is not reachable and the change is significant - voting. The absolutely normal process for an Apache project. And it would be great if you actively participate in it - knowing that we, as a community, might reach different conclusions and directions after deliberate discussion. I will also reach out to the ASF trademarks team to check if there are any issues with the naming of the charts - I do not think so, but I want to double-check whether there is any guidance. I will post the JIRA ticket about it here when I open it.
Thanks for keeping me in the loop. I pretty much agree with everything that has been said (I don't know if I still have a say, as, factually, I no longer maintain the project). The end goal is to transition everybody to the official chart once it has all the features required. Sure, it is not optimal to maintain two charts; ideally, the official chart would get all the missing features from this one and the choice would be natural for everybody. The differences between the two of them might be quite tricky to document; maybe we can start a new markdown document (MIGRATION.md?) in here that will evolve over time with the status of this chart and the status of the official chart. It would:
I also think we can put a little emphasis in our README to document a bit more clearly the position of this project:
But we may need help from the Airflow community for this documentation effort.
Happy to help with that.
Hey @thesuperzapper. I looked at the description of the ASF policies and I think (without reaching out to the trademarks team) that you should change the logo/description to include "Powered By Apache Airflow". It is nicely described here: https://www.apache.org/foundation/marks/contact#products
And https://www.apache.org/foundation/press/kit/ contains information and tools on how to create a "Powered By" logo using the original project's logo. This is to avoid confusion over which project is the "Apache" one and which one is just "Powered by Apache". Are you OK with changing this accordingly? Or should we reach out to trademarks@a.o to clarify? In the latter case I will need your e-mail (you can contact me via my email on my GitHub page in that case).
@potiuk I have replaced the Airflow logo with the "powered by airflow logo", and changed the description to the following (which should remove any possibility for confusion):
Thanks a lot @thesuperzapper ! Would it also be possible to change the logo here: https://artifacthub.io/packages/helm/airflow-helm/airflow |
Thanks, @thesuperzapper and @gsemet, for your comments. Firstly, I appreciate all the effort in maintaining this chart for several years and helping the community -- so I just wanted to say thanks on behalf of the community and the Airflow PMC members. Secondly, I just want to add some notes to the comments regarding feature parity:
The docs for the stable version of the official Helm Chart are at: https://airflow.apache.org/docs/helm-chart/ It will keep improving, and we would love any and all feedback on what we can include over there.
ACK, any help there would be 100% appreciated. You and team have done a great job at that.
The official Helm Chart does support old versions of 1.10.x too -- please do let me know if you find any issues while running any 1.10.x version
Pools -- maybe; needs discussion. But for all the others we don't need post-install jobs, since they can be done via a Secrets Backend, including Environment Variables.
This is definitely a NO-go for the official chart. All dependency installation (system + Python) should be done in the Docker image itself, to make an immutable image for the Helm Chart to consume. It does not make sense to install system dependencies in Docker and install Python packages via the Helm Chart. For migration, users can just install those deps in their Dockerfile, so we have a clear migration path. Maybe we can add docs for it.
We do support that; in fact, we allow users to use any git-sync parameters via Environment Variables. Please correct me if I am missing something.
Would love to have a Migration guide in both places -- https://airflow.apache.org/docs/helm-chart/stable/ as well as this repo itself. PRs are very welcome |
@kaxil Does Airflow support updating the Secrets Backend from a DAG? Based on my limited experience, I cannot update the value like I can with a "local" variable.
Yes, if using the "built-in" backends like the Metadata DB. No, if using an external Secrets Backend like HashiCorp Vault. We added the concept of a "Secrets Backend" in Airflow to allow users to manage "Secrets" in an external system dedicated to just secrets, like Vault, which allows rotation of secrets and a read-only mode so sys-admins can control modification of it. If you just want a "variable" to modify in the DAG, why are you setting it as a Variable? You can just use an Environment Variable inside your DAG to override the value. BTW, while the secrets are rightfully read-only -- if you have permissions to edit those external systems -- you could use the VaultHook (same as other Hooks) to modify the secrets: https://airflow.apache.org/docs/apache-airflow-providers-hashicorp/stable/_api/airflow/providers/hashicorp/hooks/vault/index.html#airflow.providers.hashicorp.hooks.vault.VaultHook.create_or_update_secret
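To make the environment-variable fallback kaxil mentions concrete: Airflow resolves `Variable.get("my_key")` from an `AIRFLOW_VAR_MY_KEY` environment variable before consulting the metadata DB or a secrets backend. The sketch below mimics that lookup with plain `os.environ` so it runs without Airflow installed; the variable name and value are made-up examples.

```python
import os


def get_variable(key: str, default=None):
    """Mimic Airflow's env-var resolution: AIRFLOW_VAR_<KEY> (upper-cased)."""
    return os.environ.get(f"AIRFLOW_VAR_{key.upper()}", default)


# Setting the env var (e.g. in the pod spec or helm values) overrides the value.
os.environ["AIRFLOW_VAR_MY_KEY"] = "overridden"
print(get_variable("my_key"))            # overridden
print(get_variable("absent", "fallback"))  # fallback
```

In a real deployment you would set `AIRFLOW_VAR_MY_KEY` in the container environment rather than in code, which keeps the value out of the metadata DB entirely.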
That's not encouraging to hear, since it means that the official chart is not useful for me at all. I do NOT want to maintain build infrastructure + a Docker registry just to do a pip install. Maybe you misunderstood how extra package installation works in this chart here? There's really not much difference between git-sync and pip installation, for example.
I think this is indeed up for discussion - how to answer some of the needs. I don't think the way in this helm chart is good. It has its own challenges, but maybe there are other ways similar behavior can be achieved. I agree with @kaxil that the image should be immutable, and this is specifically why I have designed the official Docker image to be both extendable and customizable (and I plan to add one or two more features, for example automated installation from requirements.txt if they are placed in However, I see that for hobbyist usage, or when you just want to try things out and test the waters, an easier way of installing the packages can be achieved. And you do not even need helm-chart-specific support for that. I see that more as answering similar needs as the Quick-Start of Airflow does: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html. We can easily build in a custom init script that actually performs the installation of As of recently we have a way to add custom init containers in the helm chart, and we could also easily add such an init script to the image and describe how to enable it for your helm chart if you want to do it. We could even have it as an option (clearly marked as a development/test one) in the chart itself, to use such init containers, with a clear path on how to migrate to a "serious" solution (i.e. how to easily build your own image using the very same @MarkusTeufelberger -> would that work for you? And @kaxil - WDYT?
@MarkusTeufelberger The official chart allows adding So you can run your What we don't want to do is recommend doing that for production usage. I think the official chart is now highly extensible, so if you know what you are doing, by all means you can add many more things in those Hopefully that will take care of your case. We can even add docs around that for "hobbyists" and make it clear not to use it in production -- that should hopefully take care of the case @potiuk mentioned.
To ensure full reproducibility of the DAG, the good option would be to use a lock file (using pip-tools, for instance). Because I agree: burning the DAG image is costly for users, and relying on a pip install during the initialization of the container does the job if all dependencies are locked one way or another.
That's exactly what Example requirements.txt:
You can even use all kinds of ~= or > if you want. With 'pip freeze
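For illustration only, a fully pinned requirements.txt (the kind produced by pip freeze) might look like the following; the package names and versions are made-up examples, not from the thread:

```
apache-airflow-providers-amazon==1.4.0
mysqlclient==2.0.3
requests==2.25.1
```

Every transitive dependency would appear pinned the same way, which is exactly what makes the resulting install reproducible.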
Usually you put loose dependencies in requirements.txt because updating it is quite boring. pip-tools does the pip freeze for you. The best would be to use poetry/PEP 517. I do not use pip directly; even with the latest one, I don't see how this would change anything if you do not have another "lock" file. I do not speak about things I don't know; I just advise about things I do every day on dev environments that have heavy reproducibility requirements.
Yes, it can. We use it all the time in Airflow: Simply run Let me explain where I am coming from - so that you can understand the context better - I am not just dismissing it. Poetry (and pip-tools) are - unfortunately - very opinionated in the way they work, and they do not handle well the way Airflow works - because Airflow is both a library and an application. Both poetry and pip-tools made a conscious decision to handle those two cases separately - either "open" dependencies for a library or "fixed" dependencies for an application. I used to believe this is correct (and it is, in most cases) until I entered the Airflow world ~3 years ago. Back then I also wanted to use poetry, but it turned out it was not possible, and it still isn't today. Airflow (as an application) has to have a "golden" set of constraints (that allows it to be installed even if new dependency versions are released), but it should also (as a library) allow upgrading those dependencies when new versions are released, without upgrading Airflow itself (so that people can write DAGs using newer versions of the dependencies). Airflow, when installed with all providers, has ~500 dependencies (including transitive ones), which makes it one of the most complex projects on PyPI when it comes to dependencies. That's why we generate and use constraint files to install Airflow - and of the three tools (pip / pip-tools / poetry), only pip can handle constraint files. Another project using constraints in a similar way to ours is OpenStack: https://github.com/openstack/requirements/blob/d0b389d840c75216ab2cc10df849cb98990b1503/tox.ini#L10 There are two open issues currently in Poetry (partially inspired by the way Airflow and OpenStack need the constraints):
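To make the constraint mechanism concrete, here is a small sketch of the install pattern the Airflow docs describe: pip (unlike poetry or pip-tools) accepts a `--constraint` file, and Airflow publishes one per release and Python version. The URL pattern below follows the installation docs; the version numbers are just examples.

```python
def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the URL of Airflow's published 'golden' constraint file."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_command(airflow_version: str, python_version: str) -> str:
    """Compose the pip command that installs Airflow against the constraints."""
    return (
        f'pip install "apache-airflow=={airflow_version}" '
        f"--constraint {constraints_url(airflow_version, python_version)}"
    )


print(pip_install_command("2.1.0", "3.8"))
```

The key property is that `--constraint` only caps versions during resolution; it does not force extra packages to be installed, which is why the same user can later upgrade individual dependencies without upgrading Airflow itself.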
The original request mentioning Airflow (closed a long time before I commented on it) is here: python-poetry/poetry#3225 - I explained there why constraints are needed. Until constraints are supported by other tools, I can only recommend pip, unfortunately.
BTW. We have a work in progress to move to PEP517 apache/airflow#16144 (while keeping PIP/constraints). |
OK, if your answer is pip freeze > requirements.txt, you don't know what poetry and pipenv do, and more generally what lockfiles are for. They exist precisely because pip freeze is not enough. There is nothing special about Airflow's dependencies: basically every serious Python application needs to freeze its requirements somehow. I manage a dozen apps at work with real reproducibility requirements, and even PEP 517 is not enough actually. pip freeze is just the first level of doing it, and I can bet you (your Airflow user) will do it wrong at some point. pip-tools does it slightly better from the user-workflow perspective (i.e. requirements.txt.in with loose requirements and requirements.txt with the result of pip freeze); then you can use a PEP 517 process like Airflow does. I got kicked out of my proposal to integrate a lockfile mechanism in Airflow back when PEP 517 wasn't released yet, and I am kind of sad to see this is still a thing in 2021, when we have official and efficient ways of defining, updating, and freezing dependencies like poetry and, more generally, PEP 517. In short, if the Python dependencies can be managed by a lock file in the DAG (frozen requirements.txt, Pipfile.lock, or poetry.lock), this would be equivalent to burning the deps into the Docker image, from the reproducibility point of view. So yes, installing dependencies from a requirements.txt is a good thing to have, but if you want your users not to make big mistakes, use pip-tools or poetry.
Can you please explain how you would want the user to install airflow from PIP with the fixed set of requirements using poetry and allowing the same user to upgrade the dependencies later? I would really love to see that. Please. |
Yeah, that sounds good (but it will also install packages every time the pod is run). @MarkusTeufelberger, just to reiterate: installing pip packages via an initContainer is a BAD solution; it means your packages are needlessly installed every time the pod is restarted, and that is BAD. Convenience does not mean a GOOD solution. Example issue: #104 This is why installing them once in your Dockerfile is a good solution. Anyone who has debugged tons of production issues will tell you that. Please trust us on this if you can. What I proposed in the last comment is because you know what you are doing and want to continue doing it. We don't want users to do that unless they know.
Yeah. Fully agree, @kaxil. And just to reiterate, @MarkusTeufelberger - I cannot imagine this being used in production. It has a ton of problems when running at scale - the big drawback is that as it grows, it is much slower to bootstrap. One thing you might face soon, for example, is an automated, uncontrollable upgrade of Airflow that you might not notice. New providers (which will likely be released this week) will be 2.1+ only - which means that if you install a new provider this way, it will ... upgrade Airflow together with it if you happen to use Airflow 2.0. Those are the kinds of problems that you DEFINITELY do not want to discover when your container has already started and is operating on a live database. That's why building your own image is a much better solution. This change is similar to what was used in the helm chart of @gsemet and @thesuperzapper, and it is only useful for quick testing (in the doc update in the PR I made it very clear that the production-ready solution is to build your own image). But it is also a bit "better" than the helm solution:
To be honest, all I need is a mysql client library and a helper API for a REST-API of a proprietary product. I'll keep your warnings in mind if I actually need to install anything with airflow in its dependency graph.
Unfortunately, Dockerfiles are usually not deterministically built, and unlike docker-compose, Kubernetes does not have a native path from Dockerfile --> Docker image --> running pod, so it is up to every user to come up with their own build process for this scenario. Forcing (or even recommending) every user to continuously maintain a fork of an official image is a BAD solution in my books. #104 can be solved by installing dependencies into a volume and attaching that to pods instead of using init containers, but that's a solution already suggested there. What I want to have is an unmodified upstream container that I do not have to modify on each release with the additional packages my plugins need installed and available. For the record, installing dependencies at pod startup time is also far from what I would see as an ideal solution to this problem ("I need libraries
Just FYI. No fork is ever needed. The official Dockerfile is super versatile precisely for the reason that you should not have to fork it. It's specifically designed to handle two scenarios though:
See https://airflow.apache.org/docs/docker-stack/build.html The problem with Airflow is that you cannot have a single "binary" upstream image to handle all cases - because an Airflow deployment has so many variants (more than 60 providers you can choose from) + many people have their own custom requirements. Case 1) is super simple: you just have your own simple Dockerfile with The basic premise of a K8S deployment is to use an internal registry and images. You won't escape it no matter how hard you try. Each K8S installation already has an internal image registry; it's part of the K8S architecture. And if you have a locally built image, you do NOT need an additional registry. You simply build the image locally, push it to the registry of K8S, and that's it. Same as installing Helm chart updates - it is just another command to run. You Maybe that's where your "non-trivial" understanding comes from - you think you need something else. No, you don't. You already have everything you need the moment you have K8S installed. For example, when you use 'kind' as your K8S cluster, those are literally two commands:
That's it. You are done. Every K8S cluster deployment has similar capabilities. I am tempted to actually call it "trivial". Case 2) is more complex but also more powerful. The official image with "customization" is WAY more versatile, and the whole premise is that you can build your very custom image without forking it at all. It supports out-of-the-box:
All of this is available in the image without needing to keep a fork. You just have to place your own custom files in "docker-context-files" and pass the right build arguments. Do you think you can do better than that? I seriously doubt it. If you want to use K8S, I think you should simply use it as intended. Dynamically updating the image while it is running is simply an abomination - something unnecessary and harmful.
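As a concrete illustration of the simpler "case 1" (extending the reference image with FROM) discussed above, a minimal custom image might look like the sketch below. The base tag, OS package, and Python pin are placeholder examples, not values from the thread:

```dockerfile
# Hypothetical extension of the official image: dependencies are pinned
# once at build time instead of being installed at every pod start.
FROM apache/airflow:2.1.0

# OS-level packages need root; switch back to the airflow user afterwards.
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends default-libmysqlclient-dev \
    && rm -rf /var/lib/apt/lists/*
USER airflow

# Python dependencies are installed as the airflow user.
RUN pip install --no-cache-dir mysqlclient==2.0.3
```

Rebuilding this image against each new upstream tag and pushing it to your registry replaces any in-chart pip install step, keeping the running image immutable.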
It seems you're mixing up the local image cache of some CRI implementations with a container registry? Otherwise, please point to the documentation of the internal registry component on https://kubernetes.io/docs/concepts/overview/components/
Mine don't even use
By "forking" I meant "taking on the responsibility of testing changes and deviating from upstream builds + release artifacts". "Custom image" kind of implies that. Option 1 just takes your version as the base; Option 2 means I need a completely custom image build process that might or might not work at all, with no way of getting any support from upstream. It means that I either add a layer on top of the official image with my libraries, or I completely rewrite the way the image is built, using the way the official one is built as a guideline. Both will result in an image that's 100% unique to my process and that will need a rebuild whenever upstream releases either a new image or software version. There's no documentation on how to do that in place (because it concerns something outside of both https://github.com/apache/airflow/blob/f47e10c3885a028e7c45c10c317a7dbbff9e3ab9/Dockerfile#L326 alone already makes your released image no longer easily reproducible, and if someone goes for your "case 2" approach, they can run into real trouble.
I really wish they would - it would make a lot of things easier. Now try your example with a more production-like cluster deployment such as https://www.talos-systems.com/platform/talos-os-for-kubernetes/ - good luck.
Passive(?)-aggressiveness aside, another alternative would be to vendor libraries with custom plugins (check whole libraries, along with the plugins that use them, into git in a state where they can be imported from that folder), so they are "installed" using the git-sync mechanism rather than upon startup of the pod. It is not something I am too fond of as a solution, but it would also mean that there's a versioned and stable instance of a dependency available that can be managed outside Airflow itself. I'm just not sure if that's a better (or intended) way to add dependencies while sticking with unmodified upstream images.
Of course. If you watch my "Production Docker Image" talk from last year's summit - https://www.youtube.com/watch?v=wDr3Y7q2XoI - that is where I specifically talked about "containers" vs. "docker" and tried to pass the message along. But I failed and went back to using I have already stopped correcting people who use But yeah, you can use kaniko, podman, or others to build the image. That was just an example - no more, no less.
Correct - thanks for pointing this out. This is the internal cache I referred to; indeed, it's not a fully-fledged registry. Which does not change the fact that you do not need an external registry and you can simply load the image - especially when you have K8S installed for "test" purposes. (Again - I cannot imagine anyone trying to dynamically update their image in production in a "fully-fledged" cluster.) You can contest that and dislike it, but this is simply how it works. If you chose K8S for deployment, I think you should live with the consequences of that choice (or maybe try to change it by submitting proposals to K8S, or by changing K8S to something else). Both kind and minikube support the "load" command to make your image available to the cluster (in the cache, not the registry, as you rightfully corrected me).
Here is how you do it in Talos: https://www.talos.dev/docs/v0.7/guides/configuring-pull-through-cache/#using-caching-registries-with-docker-local-cluster - it is actually even easier: rather than loading the images into the cluster, you configure Talos to load them from the local Docker cache. So all you need to do is build the image. It can be used on air-gapped systems and will never reach out to any registry. Very handy, actually. No registry needed - which was my point, actually.
Nope. I am just being straightforward. K8S and Docker were never meant to add extra "software" layers after images are built. Trying to do that "live" is an abomination, and I am happy to repeat that. Syncing/installing packages using git-sync is a terrible idea as well. Container "layers" are the way to add new software on top of existing layers. This is how containers work; this is what they were created for. Installing software inside Kubernetes, rather than adding it outside (by building containers), is a terrible idea as well. Using git to synchronize packages is a really bad idea. If you have DAGs (which are source code), yeah - using git-sync is a great idea. But using it to synchronize binary packages to install is - again - an abomination. And I am not afraid of that word.
Documentation for the process is here: https://airflow.apache.org/docs/docker-stack/build.html#customizing-the-image It's rather comprehensive but imperfect (and we are improving it continuously, making the process simpler and more stable). The two steps left:
And I am going to add full backwards compatibility checks when this happens. |
And a comment here:
Yes. Full reproducibility is a non-goal for me (and for the image). The much more important feature for me is that whenever we build the image from scratch, we get the latest security fixes with it - not a "reproducible binary copy" of what was there before. Plus, our whole build process has automated tests for the built images - we run a sequence of automated tests in CI with every single pull request whenever we build the image, checking a number of things. And yes, this is a deliberate decision.
Just a bit curious: since I see that the official helm chart is going full Astronomer, and we ran a POC a few months ago, I would like to check whether one really problematic thing in the deployment process has been fixed since then. Since you build an immutable image for all the components at each change, whenever we made any change - be it to the scheduler, webserver, or worker - all the components were redeployed/restarted. Losing the UI was not that much of an issue, just a bad user experience in my opinion (at least the rocket looked cool), but having the scheduler restart and stop scheduling tasks each time you make a change that is not even scheduler-related (just to deploy a new DAG!!) was a real aberration for me, and I don't understand how it makes sense in a production environment with time-critical workflows. Is this still the behavior of the official helm chart today, or has a different strategy been put in place, @potiuk @kaxil?
re: Official Helm Chart @talnicolas The Webserver does not restart for the official Helm Chart; we have a You can also use git-sync or a persistent volume for DAG deployment -- so DAG deployment is separate. re: Astronomer Platform & restarts The Webserver does not restart over there anymore either, for Airflow >= 2.0.0 :) It also uses a
Also, to add to what @kaxil wrote and relate to your comment: deploying DAGs by running the "helm upgrade" command in production sounds super weird, to be honest. A Helm chart should be used to deploy the "application", not to deploy another version of the DAGs; DAGs are not "part" of the Airflow application. There are other methods - specifically the git-sync that @kaxil mentioned, which is also part of the official helm chart - where deploying new DAGs happens in the background without restarting pods.
All I can say is, "Ugh". After all this time an 'official chart' is created. Those of us who wanted to deploy to k8s sooner rather than later found this chart and have been using it in production for quite some time. I agree with many of the arguments provided. I currently use the pip install of extra packages for different providers. It works for me. It does take a while for the pods to start - maybe 23 seconds - but after that, it's running. The argument that an image is immutable is not a strong one, IMO. Images are immutable, but a running pod/container is not. Whether it's baked into the image or not is debatable, and I think both sides can provide some really good arguments. Personally, I think users should have some flexibility, given that we are discussing OSS. The fact is, @thesuperzapper provided a solution that was pretty much abandoned, and now there is a new release. I feel it would have been better to approach @thesuperzapper many moons ago to see if he wanted to collaborate on an official release. Now there is a schism, and it will likely be difficult for some of us to migrate to the official release. Color me super excited about trying to figure out how to migrate if we choose to.
apache/airflow#16170 is the PR that adds the pip-install-at-pod-start feature to the "official" chart, so that should already be fixed in the most recent release or an upcoming one (a bit hard to say, as https://github.com/apache/airflow/releases is a mess and it is hard to find anything there).
Diversity of solutions is a value on its own. There is NOTHING wrong with having two parallel solutions. We discussed it many moons ago @cocampbe, you can search the Airflow dev list, and we decided to go with the Astronomer-originated solution officially supported by the Airflow community (please search the archives). And it will continue to be so, including 3 people from Astronomer working hard over the last few months to make it super robust, handle many cases, and most of all be fully end-to-end tested (including full integration and unit test support, which is a HARD requirement for everything we do as the Airflow Community). Just go and see how "robust" and "future-proof" it is.

And we are not at "odds" with @thesuperzapper. We actually collaborate to some extent, including @thesuperzapper hosting the Melbourne chapter of the Airflow Summit. Heck, if you read the thread from the very beginning, we even offered to help people figure out ways to migrate to the community chart, and to see whether we can find some "other" ways of handling their needs when we don't agree with the way it's done in this chart. The PR #16170 that @MarkusTeufelberger mentioned is proof of that. While we think using "pip install" at pod start for a production-grade image is plainly wrong, we recognise the "development" and "test" value of it, so we added it as a development-targeted option, with a clear warning printed to the users that it should not be used in production (which you can of course ignore, but then you will not get community support for any issues that result from ignoring it). It's even more than that, in fact: we not only enabled this development mode for Helm users, but also for Docker Compose users (because we implemented it in the official Airflow image, not in Helm nor in Docker Compose).

And BTW, it is totally fine if you pick @thesuperzapper's Helm chart. Feel free. No one is forcing you to use the official Apache Airflow community-managed one.
You just have to be prepared that the Apache-Way-driven Airflow community will not support your problems if you have them; the community built by @thesuperzapper around the other chart will. It also means that we will make different choices and decisions (this is one of the reasons we decided to go with our own approach: we did not agree with some design choices in @thesuperzapper's helm chart). And there is nothing wrong with that. It's OK to have different opinions and make different decisions, as long as you discuss them and weigh the other options in.

Just let me repeat it again: we are not Helm-centric, nor even K8S-centric. We are Airflow-centric. Helm, for us, is only one of the ways Airflow is deployed, and it would be a very narrow view to limit the choices we make to only that part of our users. That is a big part of our "product" thinking: our approach has to be much broader than choosing one deployment option over another.

If you wish to learn more about why we are doing it, how we took into account the results of the Airflow Survey that we run every year, and the philosophy behind it, I heartily invite everyone watching this discussion to watch our Airflow Loves Kubernetes talk with @kaxil at the Airflow Summit in two weeks. Some decisions we've made will be easier to understand then.
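The "bake it into the image" alternative to pip-installing at pod start amounts to a couple of lines. A sketch, assuming the official reference image as the base (the tag and the provider package below are illustrative placeholders, not a recommendation):

```dockerfile
# Hypothetical Dockerfile extending the official reference image, rather
# than pip-installing packages at pod start. The tag and package name
# are placeholders for illustration.
FROM apache/airflow:2.1.0
RUN pip install --no-cache-dir "apache-airflow-providers-amazon"
```

Built once (for example `docker build -t my-registry/airflow:2.1.0-custom .`) and pushed to a registry, the resulting image stays immutable, and pods skip the install delay at startup.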
@MarkusTeufelberger - I think you have that impression because you are looking in the wrong place. "Releases" is simply a snapshot of whatever we already released (including providers). That page is automatically prepared by GitHub, and with the number of releases we have (including providers) it does look overwhelming. That's why in the official Airflow documentation we publish the changelog, which is much cleaner: https://airflow.apache.org/docs/apache-airflow/stable/changelog.html , and each provider has its own separate changelog (for example, Amazon's is here: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/index.html#changelog ). However, if you want to see what's coming, you have to look at the milestones:
@potiuk I have apparently poured salt on a wound and I apologize for that. It's just frustrating. |
No salt :). No wound :). Just wanted to provide the full context. It's just my way of over-communicating when I see that some assumptions miss some bigger context. Really, I am not frustrated or angry, if you read it that way. I think if you visit the dev list and issues more often you'll see I am just trying to be helpful (and apologies if it looks a bit 'patronising'; I know it might certainly look like that, but it's really not intended to be).
For those following along at home, I intend to maintain (and even add new features to) this chart for as long as it's used. A little friendly competition often breeds innovation. I have updated the README with a section highlighting that there are two charts, and outlining the approach this "community" chart is taking. However, let's make sure we refer to the charts by consistent names:
To be honest, I think "community" is quite a misleading name and it introduces a lot of confusion. Everything Apache Airflow releases is a product of the "Apache Airflow Community". And in the Summit talk Airflow Loves Kubernetes, @kaxil and I are going to say that the chart is supported by the Apache Airflow Community. So I think we are going to refer to it as the "Official Airflow Community Chart" from now on, to be very precise that it is not the product of some mysterious group, but of the very same community that produces Apache Airflow.
Hey @thesuperzapper - maybe we can propose another name for your chart that we can use. I really find it confusing: several times I saw a question about the "community chart" in Slack, and after asking for details ("Which chart?") I was pointed to https://airflow.apache.org/docs/helm-chart/stable/index.html . I really think people find it confusing because "Apache Airflow Community" is heavily (and rightfully) used to refer to the community built around Apache Airflow, which is run according to the Apache Way (the Apache Software Foundation is the owner of Apache Airflow). And since "Community over code" is the ASF motto, "community" as a distinctive name is a very bad choice. We could use a "non-official" name, but that is obviously negative (and there are other non-official charts, like the Bitnami one, that are also often used). So maybe some other name that we can jointly refer to would be better? Let me think about some proposals.
Just to explain: I also seek the right name here because we want to talk about it in the talk as an alternative, and since the talk is going to be recorded and will likely be popular (my Airflow Prod Docker Image talk from last year got a whopping 7K views), I think we need to come up with consistent naming that will be used by everyone. I am thinking about some "positive" and "distinctive" names. I really do not want to leave people with the impression that there is something wrong with the "other" chart.
I like the last one most, because it's just a small modification of the current naming used by @thesuperzapper, but one that brings clarity to the distinction between the two, keeps the "community" part (which I think is important for Matthew), and is pretty informative. WDYT?
My humble opinion, as the original author of this chart: « user-community chart » is fine and not so far from what we have now.
I agree that "user-community chart" works. When introducing it, it's probably important to highlight where the chart came from.
Cool, let's stick with that terminology then :). Surely, when we talk about the chart we want to briefly cover its history, and we want to make sure the credit is there for all the years when there was no Apache-Airflow-Community-supported Helm chart. I think many users might choose to stay with it, and that's great if we can refer to the history and explain the context, as it might sometimes (as in the discussion above) not be clear where the distinction originates from.
Hello @thesuperzapper
The Airflow community has finally released the "official" helm chart. Documentation here: http://airflow.apache.org/docs/helm-chart/stable/index.html
Release message/thread here: https://lists.apache.org/thread.html/r131d839158b8a7a92a7813183cae30d248be9e330ea2faaf9e654970%40%3Cdev.airflow.apache.org%3E
We've discussed before whether the community would provide support for the helm chart, and the community decided to continue with the helm chart donated by Astronomer. This is now fully available and released, and it will be maintained and supported by the community.
I wonder if you would like to continue having a separate chart and maintain it, or whether you would choose the path of deprecating it and helping users transition to the "community managed" official chart. Having two charts in the
I think we are open to helping with the transition if you choose that route, and possibly even to helping develop transition documentation (how to switch, etc.).
Let us know what you think.