[AIRFLOW-1433] Implement FAB-backed webserver (WIP, Illustration-Only) #2536
+1 :D
Kalpesh?
Would prefer cleaning up URIs rather than keeping everything under admin. |
Also, looks like tests are failing with:
|
Overall, LGTM. Following the approach we discussed taking, so no real surprises, which is nice. |
There was some discussion before about moving the webserver out into its own Python package (that talks to the Airflow core over the API). Unless we can somehow make www2 work cleanly by default like www does now, I don't feel super great about having to maintain both www and www2, so maybe this is the right time to flesh out the API and create this separate package. For what it's worth, I definitely see Airbnb (and other companies) switching to this (or something similar) in the future; in that case we (Airbnb) could help support only the www2 model, and others who still wanted to use the old www model could support that one. |
@aoen Thanks for the feedback! (was trying to cc you on this PR but wasn't sure of your GH handle) I am all for more separations of concerns between airflow core and the webserver, and ultimately for fleshing out the API to serve both CLI and UI. However, given the current state of Airflow API, we still have a long way to go. For example, we probably want some discussions around:
On top of the discussions, implementing them will still definitely take some time. Putting RBAC on hold until the API is done implies a major delay in introducing this feature. That's why I propose that we migrate to FAB first, implement RBAC, and while this is going on, we can work on fleshing out the API. FAB exposes a list of JSON REST APIs on the models, so that could help with implementing APIs as well if we choose to leverage it. I understand that one of the major concerns is maintaining two sets of wwws for a set period of time. I agree it’s not ideal, and I’m not a fan either, but assuming the community is okay with eventually deprecating the old www (or at least putting development effort on it to a halt), then we can aim to make this dual-www development process fairly minimal by merging it when it’s closer to being ready (not to mention there is still development needed on FAB to reach feature parity, so it’ll likely take some time before it’s actually merge-ready). |
+1 Given that we're pretty close with the FAB migration, I would prefer to get www2 going, and start having people use that. The next step can be REST API work, but I don't want this project to explode. If you recall, all of this began with RBAC security, then we tied the FAB migration to it, now we're discussing REST API. It's too much to do at once. Let's get FAB done, and focus on the REST API subsequently. |
@jgao54 No React.js. It is deemed incompatible with ASF projects due to the patent clause in its license. IMHO www2 should be a separate package. We could just depend on it or say it is the recommended UI and then switch to it? |
Separate repository? Or same repository, separate package? If separate repo, within Apache, or outside? |
maybe for now a separate repo so we can iterate a bit faster? (No jiras etc?) |
I'm trying to tease out why folks are so keen on a separate repo. Things I can see are:
I don't see (1) as that compelling, actually. Ultimately, I don't see anyone else contributing until this stuff lands in master and people start using it anyway. Even if they did contribute, the overhead is going to be PR discussion (just like this) anyway. JIRAs are cheap. What I'm trying to avoid is the demand that we do one big bang commit. I'd rather do things iteratively. If that means we go a bit slower, I'm willing to make that tradeoff, especially if it means we won't end up on some forked version floating out on another repo. The concerns that I have with a separate repo are:
Ultimately, my biggest fear is (1), though. I don't want to spend a bunch of time on this and have it floating out in a non-master, non-released repository. I have seen this happen with other large efforts (e.g. Kafka transaction support). Unless you do things incrementally, and in master, you run a risk of ending up with abandonware. |
@bolkedebruin @aoen Any thoughts on the above? |
@criccomini My main concern was 2). I think some of the RBAC logic could still live in Core potentially; my main concern was forking the webserver code. @jgao54 If there is urgency in having the RBAC internally, you could fork it internally and then work on the API-zation of Airflow afterwards before merging. I think the API-zation itself is quite a big task unfortunately, so I'm not sure I'm comfortable taking it on as tech debt. Curious what @bolkedebruin thinks. |
@aoen At WePay we've been trying to avoid forking Airflow as much as we can because of the additional maintenance that is introduced with a fork. It sounds like the biggest concern for folks is maintaining www2. If that's the only concern, we could potentially make all the changes in place whenever the framework diverges. It's slower and takes longer to rip out the old version later, but will avoid separate packages. However, if the goal is separating the view out of Core forever, then perhaps FAB is not the most ideal choice. |
@aoen It sounds like you're saying it's a hard requirement that the REST API must be resolved first. If we proceed with the REST API route, I don't think that FAB makes sense to use as a front-end framework, as the largest benefit that it provides is the model view stuff and integration with SQLAlchemy. If we move away from that to a REST API, building the UI client-side via a JavaScript frontend seems preferable to me. May I suggest that you and @mistercrunch have an in-person discussion to get on the same page with your preferences. We met with Max a week or two ago, and agreed on the above approach. We then published the expected outcome on the mailing list, and implemented it accordingly. Raising your concern this late in the process is causing unnecessary churn. As Joy said, we're not excited about forking Airflow as it's unhealthy for both us and the community. If you'd like to talk in person, Joy and I could do a video chat, as well. Same goes for you, @bolkedebruin. |
I would like to have a pragmatic approach here. I think the FAB approach solves a lot of issues and improves maintainability. To me the REST API is mainly for integration purposes with other services and for security purposes (e.g. the link between tasks and scheduler/db). So I think they can live side by side, at least for some time. We just should try to keep logic duplication to a minimum. Thus, in cases where the REST API makes sense, the UI should use it. Does this make sense? |
If there is a plan to work on this, I would be happy to volunteer in some of the development/testing/etc. I do think this is an important change to Airflow that would allow for much more adoption and maintainability. I know within Pandora, there is a huge need for this, since as a publicly traded company we have regulations that certain datasets scheduled by Airflow cannot be run/touched by anyone besides a subset of people, but others depend on these tables/can read from them, and they'd like to be able to check on the job and more through the UI. Like you all stated, the granularity is not there. |
@Acehaidrey Thank you for your support! I've been working on this feature off a separate repo, and plan to create an updated PR in Q1 so it can be merged back into the main repo, and then release it as an alpha version for Airflow 1.10. I am currently working on the PR (adding CLI support and integrating the configurations into airflow.cfg), and should be able to have it out before the end of the week. It's also dependent on the latest FAB release, which will include this commit in order to support timezones. Any testing/issues identified/contributions would be welcome! |
Awesome @jgao54 thanks for taking on the big load :) I'm looking forward to trying it out when it's ready. I will definitely test and look for any issues etc. Let me know if I can help in any way |
I've created #3015. Looking forward to your feedback. |
Sorry for late response, been trying to get on this! We're going to look into this this coming week :) |
Coming late into the conversation as Lyft is interested in participating in this effort. @aoen thanks for raising the REST API, it certainly needs to get sorted out. Note that FAB is shipping much of the CRUD REST API out of the box and will apply the same RBAC as the UI. When we add per-DAG access to the CRUD, it will apply to the REST API the same way. Getting this CRUD UI is neat. Now CRUD is only a small portion of the needed REST API; other portions include the CLI-supporting endpoints and cross-communication endpoints (heartbeats and such). There's the question of whether to bring the private REST API into or under the realm of FAB, or to bring some of FAB's access controls into the private REST API. I could be convinced either way. I'd probably lean towards consolidating for the sake of having fewer moving parts, though the auth-related requirements are different for users and machines most of the time, and we'd have to make FAB support multi-auth in some cases. For Superset at Lyft, for instance, where we have such requirements (auth for users is different than auth for machines), we've hacked in support for both header-driven auth for machines along with OAuth for users (in FAB). |
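For illustration only, the split between machine auth and user auth described above could be sketched roughly as follows. This is plain Python, not FAB's or Superset's actual API: the header name X-Machine-Token, the session key oauth_user, and the authenticate helper are all hypothetical names chosen for this sketch.

```python
# Hypothetical sketch: none of these names come from FAB, Superset, or Airflow.
MACHINE_HEADER = "X-Machine-Token"  # assumed header carrying a machine credential
USER_SESSION_KEY = "oauth_user"     # assumed session key set after an OAuth login


def authenticate(headers, session, machine_tokens):
    """Return (principal, kind) for a request, or None if it is unauthenticated.

    headers        -- dict of request headers
    session        -- dict holding the user's session data
    machine_tokens -- dict mapping known machine tokens to machine names
    """
    token = headers.get(MACHINE_HEADER)
    if token is not None:
        # A request that presents a machine header is judged on that header
        # alone; an invalid token does not fall back to the user session.
        name = machine_tokens.get(token)
        return (name, "machine") if name is not None else None

    user = session.get(USER_SESSION_KEY)
    if user:
        return (user, "user")
    return None
```

Both kinds of principal could then be passed to the same permission check, which is one way to read the "consolidating" option: two ways to establish identity, one place where access is decided.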
@jgao54 We can close this PR now can't we? |
Dear Airflow maintainers,
JIRA
The full discussion of introducing RBAC to Airflow can be found in this mailing list thread.
Description
This PR is not the complete diff; it contains only the relevant files to illustrate how View-Level Access Control (VLAC) could be implemented with Flask-AppBuilder.
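As a rough illustration of the idea behind view-level access control (this is not the code in this PR and not Flask-AppBuilder's API), a role can be modelled as a set of (permission, view) pairs that is checked before a view function runs. The role table, the requires decorator, and the view names below are all hypothetical.

```python
from functools import wraps

# Hypothetical role table: each role maps to the (permission, view) pairs it holds.
ROLE_PERMISSIONS = {
    "Admin": {("can_read", "DagView"), ("can_edit", "DagView")},
    "Read-Only": {("can_read", "DagView")},
}


class AccessDenied(Exception):
    """Raised when a role lacks the permission a view requires."""


def requires(permission, view):
    """Guard a view function so it only runs for roles holding (permission, view)."""
    def decorator(func):
        @wraps(func)
        def wrapper(user_role, *args, **kwargs):
            allowed = ROLE_PERMISSIONS.get(user_role, set())
            if (permission, view) not in allowed:
                raise AccessDenied(f"{user_role} lacks {permission} on {view}")
            return func(user_role, *args, **kwargs)
        return wrapper
    return decorator


@requires("can_read", "DagView")
def list_dags(user_role):
    # Placeholder data standing in for a real DAG listing.
    return ["example_dag"]


@requires("can_edit", "DagView")
def pause_dag(user_role, dag_id):
    return f"{dag_id} paused"
```

With this table, a Read-Only user can call list_dags but gets AccessDenied from pause_dag, which mirrors the Read-Only role used in the test steps below.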
I have a functional work-in-progress branch that contains the complete diff here on this fork. Note the entire FAB-based webserver app resides in airflow-resource/airflow/www2. The reason I didn't include it here is that most of the diffs are a copy of the files in www/, with modifications on HTML templates to make the webserver compatible with FAB.
You could test out the FAB-based webserver with the following steps:
Check out the fab branch
$ pip install flask-appbuilder
$ fabmanager create-admin
$ cd incubator-airflow/airflow/www2
$ fabmanager run
Go to the Security tab in the UI, create a new user with a Read-Only role, then log in with that user's credentials.
Since it's still a work in progress, there are still some bugs in the UI. However, I'd love to get some feedback from the community around the overall approach, and toss some questions/thoughts below:
cc: @mistercrunch @bolkedebruin @criccomini