Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-1433] Implement FAB-backed webserver (WIP, Illustration-Only) #2536

Closed
wants to merge 1 commit into from

Conversation

jgao54
Copy link

@jgao54 jgao54 commented Aug 19, 2017

Dear Airflow maintainers,

JIRA

Description

This PR is not the complete diff, it contains the relevant files to illustrate how View-Level Access Control (VLAC) could be implemented with Flask-AppBuilder.

I have a functional work-in-progress branch that contains the complete diff here on this fork. Note the entire FAB-based webserver app resides in airflow-resource/airflow/www2. The reason I didn't include it here is that most of the diffs are a copy of the files in www/, with modifications on HTML templates to make webserver compatible with FAB.

You could test out the FAB-based webserver with the following steps:

  1. clone the repo and check out the fab branch
  2. install Flask-AppBuilder in your virtual env
    $ pip install flask-appbuilder
  3. create an admin account
    $ fabmanager create-admin
  4. go to the www2/ directory
    $ cd incubator-airflow/airflow/www2
  5. start the webserver
    $ fabmanager run
  6. log in with the admin account you created
  7. to check out the read-only UI, you can go into the Security tab in the UI, create a new user with a Read-Only role, then log in with that user's credentials.

Since it's still a work-in-progress, there are still some bugs in the UI. However, I'd love to get some feedback from the community around the overall approach + toss some questions/thoughts below:

  • FAB’s security model has the concept of users, roles, permissions and views. There’s a table mapping permission to view, a table mapping permission_view to role, and another table mapping role to user. The current DLAC design proposes specifying user:perm mapping in the DAG file, which would implies dynamically inserting roles, and that's a bit complicated to implement. After talking with @criccomini, he suggested it would simplify this process if we use role:perm mapping instead.
  • Security.py is currently just a demo, it introduces a Read Only role that prevents all write/execute access. The RBAC proposal has 5 default roles, but it would be useful to clarify what each role's default permissions should be (i.e. what's the difference between user and read-only roles)
  • I originally intended to share the static/ and template/ directories from www/, and just add conditionals whenever the frameworks diverge. However, given how much modifications are actually required, it was cleaner to replicate the whole thing and make change directly on the copy. This means until we rip out www/, we'll need to maintain both www/ and www2/ when making UI changes.
  • I tried to preserve the endpoints whenever possible (i.e. keeping the /admin route base) given there are many routes hardcoded into the html templates. However, if we no longer want to use /admin, we can change it.
  • The FAB-based webserver is currently pretty decoupled from the rest of airflow codebase, not to mention it has its own config file, eventually it needs to be integrated into airflow.cfg and the cli command.
  • There are still some modification required on the FAB Framework to make all the models/views work in Airflow (i.e. compound pks, formatted columns, etc.)

cc: @mistercrunch @bolkedebruin @criccomini

@criccomini
Copy link
Contributor

After talking with @criccomini, he suggested it would simplify this process if we use role:perm mapping instead.

+1 :D

The RBAC proposal has 5 default roles, but it would be useful to clarify what each role's default permissions should be (i.e. what's the difference between user and read-only roles)

Kalpesh?

I tried to preserve the endpoints whenever possible (i.e. keeping the /admin route base) given there are many routes hardcoded into the html templates. However, if we no longer want to use /admin, we can change it.

Would prefer cleaning up URIs rather than keeping everything under admin.

@criccomini
Copy link
Contributor

Also, looks like tests are failing with:

ImportError: No module named flask_appbuilder

@criccomini
Copy link
Contributor

Overall, LGTM. Following the approach we discussed taking, so no real surprises, which is nice.

@aoen
Copy link
Contributor

aoen commented Aug 22, 2017

There was some discussion before about moving the webserver out into it's own python package (that talks to the airflow core over the API). Unless we can somehow make www2 work like www does now cleanly by default then I don't feel super great about having to maintain both www and www2 so maybe this is the right time to flesh out the API and create this separate package. For what it's worth I definitely see Airbnb (and other companies) switching to this (or something similar) in the future, in this case we(AirBnB) could only help support the www2 model and others who still wanted to use the old www model could support that one.

@jgao54
Copy link
Author

jgao54 commented Aug 23, 2017

@aoen Thanks for the feedback! (was trying to cc you on this PR but wasn't sure of your GH handle)

I am all for more separations of concerns between airflow core and the webserver, and ultimately for fleshing out the API to serve both CLI and UI. However, given the current state of Airflow API, we still have a long way to go. For example, we probably want some discussions around:

  • Do we want to leverage the existing Flask framework or migrate to React.js or equivalent?
  • How is auth going to look like in the API world?
  • How do we maintain backward compatibility and versioning going forward?

On top of the discussions, implementing them will still definitely take some time. Putting RBAC on hold until API is done implies a major delay in introducing this feature.

That's why I propose that we migrate to FAB first, implement RBAC, and while this going on, we can work on freshing out the API. FAB exposes a list of JSON REST APIs on the models, so that could help with implementing APIs as well if choose to leverage it.

I understand that one of the major concern is maintaining two sets of wwws for a set period of time. I agree it’s not ideal, and I’m not a fan either, but assuming the community is okay with eventually deprecate the old www (or at least putting development effort on it to a halt), then we can aim to make this dual-www development process fairly minimal by merging it when it’s closer to being ready (not to mention there are still development needed on FAB to reach feature parity, it’ll likely take some time before it’s actually merge-ready).

@criccomini
Copy link
Contributor

That's why I propose that we migrate to FAB first, implement RBAC, and while this going on, we can work on fleshing out the API.

+1 Given that we're pretty close with the FAB migration, I would prefer to get www2 going, and start having people use that. The next step can be REST API work, but I don't want this project to explode. If you recall, all of this began with RBAC security, then we tied the FAB migration to it, now we're discussing REST API. It's too much to do at once. Let's get FAB done, and focus on the REST API subsequently.

@bolkedebruin
Copy link
Contributor

@jgao54 No React.js. It is deemed incompatible with ASF Projects due its patent clause in its license.

imho www2 should be a separate package. We could just depend on it or say it is the recommended UI and then switch to it?

@criccomini
Copy link
Contributor

criccomini commented Aug 23, 2017

@bolkedebruin

imho www2 should be a separate package

Separate repository? Or same repository, separate package?

If separate repo, within Apache, or outside?

@bolkedebruin
Copy link
Contributor

maybe for now a separate repo so we can iterate a bit faster? (No jiras etc?)

@criccomini
Copy link
Contributor

criccomini commented Aug 23, 2017

I'm trying to tease out why folks are so keen on a separate repo. Things I can see are:

  1. Faster iteration
  2. Actually wanting the frontend to be a separate package forever
  3. Licensing issues (e.g. if we want to use react)

I don't see (1) as that compelling, actually. Ultimately, I don't see anyone else contributing until this stuff lands in master and people start using it anyway. Even if they did contribute, the overhead is going to be PR discussion (just like this) anyway. JIRAs are cheap.

What I'm trying to avoid is the demand that we do one big bang commit. I'd rather do things iteratively. If that means we go a bit slower, I'm willing to make that tradeoff, especially if it means we won't end up on some forked version floating out on another repo.

The concerns that I have with a separate repo are:

  1. Risk of ending up with some fork that gets left behind as master evolves so quickly
  2. If we decide we want to use FAB's data model stuff for some of the REST API, having it in a separate repo is a non-starter if core wants to call it
  3. It could be a weird user experience, as it will make it much harder to have a single CLI (airflow). We'd have to have some airflow-webserver command or something.

Ultimately, my biggest fear is (1), though. I don't want to spend a bunch of time on this and have it floating out in a non-master, non-released repository. I have seen this happen with other large efforts (e.g. Kafka transaction support). Unless you do things incrementally, and in master, you run a risk of ending up with abandonware.

@criccomini
Copy link
Contributor

@bolkedebruin @aoen Any thoughts on the above?

@aoen
Copy link
Contributor

aoen commented Aug 25, 2017

@criccomini My main concern was 2). I think some of the RBAC logic could still live in Core potentially, my main concern was forking the webserver code.

@jgao54 If there is urgency in having the RBAC internally, you could fork it internally and then work on the API-zation of Airflow afterwards before merging. I think the API-zation itself is quite a big task unfortunately, so I'm not sure comfortable taking it on as tech debt. Curious what @bolkedebruin thinks.

@jgao54
Copy link
Author

jgao54 commented Aug 25, 2017

@aoen at WePay we've been trying to avoid forking Airflow as much as we can because of the additional maintenance that are introduced with a fork.

It sounds like the biggest concern for folks is maintaining www2. If that's the only concern, we could potentially make all the changes in-place whenever the framework diverges. It's slower, takes longer to rip out the old version later, but will avoid separate packages.

However, if the goal is separating view out of Core forever, then perhaps FAB is not the most ideal choice.

@criccomini
Copy link
Contributor

@aoen it sounds like you're saying it's a hard requirement that the REST API must be resolved first.

If we proceed with the REST API route, I don't think that FAB makes sense to use as a front end framework, as the largest benefit that it provides is the model view stuff and integration with SQL alchemy. If we move away from that to a REST API, building the UI client-side via a Javascript frontend seems preferable to me.

May I suggest that you and @mistercrunch have an in-person discussion to get on the same page with your preferences. We met with Max a week or two ago, and agreed on the above approach. We then published the expected outcome on the mailing list, and implemented it accordingly. Raising your concern this late in the process is causing unnecessary churn. As Joy said, we're not excited about forking Airflow as it's unhealthy for both us and the community.

If you'd like to talk in person, Joy and I could do a video chat, as well. Same goes for you, @bolkedebruin.

@bolkedebruin
Copy link
Contributor

bolkedebruin commented Oct 22, 2017

I woud like to have a pragmatic approach here. I think the FAB approach solves a lot of issues and maintainability. To me the Rest API is mainly for integration purposes with other services and for security purposes (e.g. the link between tasks and scheduler/db).

So I think they can live side by side, at least for some time. We just should try to keep logic duplication to a minimum. Thus in case where the Rest API makes sense, the UI should use this.

Does this make sense?

@criccomini @aoen @jgao54

@Acehaidrey
Copy link
Contributor

If there is a plan to work on this, I would be happy to volunteer in some of the development/testing/etc. I do think this is an important change to airflow that would allow for much more adoption and maintainability. I know within Pandora, there is a huge need for this, since we have regulations for being a publicly traded company that certain datasets scheduled by airflow cannot be run/touched by any besides a subset of people, but others depend on these tables/can read from them, and they'd like to be able to check on the job and more through the UI. Like you all stated, the granularity is not there.
Let me know if there is a plan outlined anywhere to make this possible :)

@jgao54
Copy link
Author

jgao54 commented Feb 5, 2018

@Acehaidrey Thank you for your support! I've been working on this feature off a separate repo, and plan to create an updated PR in Q1 so it can be merged back into the main repo, and then release it as an alpha version for Airflow 1.10.

I am currently working on the PR (adding cli support and integrating the configurations to airflow.cfg), should be able to have it out before end of the week.

It's also dependent on the latest FAB release, which will include this commit in order to supports timezone.

Any testing/issue identified/contributions would be welcome!

@Acehaidrey
Copy link
Contributor

Awesome @jgao54 thanks for taking on the big load :) I'm looking forward to trying it out when it's ready. I will definitely test and look for any issues etc. Let me know if I can help in any way

@jgao54
Copy link
Author

jgao54 commented Feb 8, 2018

I've created #3015. Looking forward your feedbacks.
cc @Acehaidrey

@Acehaidrey
Copy link
Contributor

Sorry for late response, been trying to get on this! We're going to look into this this coming week :)

@mistercrunch
Copy link
Member

Coming late into the conversation as Lyft is interested to participate in this effort.

@aoen thanks for raising the REST API, certainly needs to get sorted out. Note that FAB is shipping much of the CRUD REST API out of the box and will apply the same RBAC as the UI. When we add per-DAG access to the CRUD, it will apply to the REST API the same way. Getting this CRUD UI is neat.

Now CRUD is only a small portion of the needed REST API, other portion include the CLI-supporting endpoints, and cross communication endpoints (heartbeats and such).

There's the question of whether to bring in the private REST API into or under the realm of FAB, or to bring in some of FAB's access controls into the private REST API. I could be convinced either way, I'd probably vouch towards consolidating for the sake having less moving parts, though the auth-related reqs are different for users and machines most of the time, and we'd have to make FAB support multi-auths in some cases. For Superset at Lyft for instance where we have such requirements (auth for user is different than auth for machines), we've hacked in support for both header-driver auth for machines along with oauth for users (in FAB).

@ashb
Copy link
Member

ashb commented May 24, 2018

@jgao54 We can close this PR now can't we?

@jgao54 jgao54 closed this May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants