Doesn't work on authenticated Datasette instances #13

Closed
simonw opened this issue Nov 16, 2023 · 4 comments

simonw commented Nov 16, 2023

Tried this plugin on Datasette Cloud and got this error:

[Screenshot of the error, not reproduced here]

The problem is here:

stuff = (
    await datasette.client.get(
        datasette.urls.table(database, table, "json") + "?" + query_string
    )
).json()

https://github.com/datasette/datasette-enrichments/blob/96cbf510115b3ab738360f88a6e7e8070575a171/datasette_enrichments/__init__.py#L79C1-L79C1

Those calls to datasette.client.get() don't attempt to pass through authentication information, so a Datasette instance with extra authentication plugins installed refuses them.

simonw added the bug label Nov 16, 2023

simonw commented Nov 16, 2023

Options:

  1. Pass through cookies and potentially other authentication headers too. I'll try this first.
  2. Modify Datasette itself to make it easier to call datasette.client.get() etc while passing through authentication credentials.

simonw commented Nov 16, 2023

I wrote this function:

async def get_with_auth(datasette, request, *args, **kwargs):
    # Default to None so a missing keyword argument doesn't raise KeyError
    cookies = kwargs.pop("cookies", None) or {}
    headers = kwargs.pop("headers", None) or {}
    # Copy across cookies from request
    for key, value in request.cookies.items():
        if key not in cookies:
            cookies[key] = value
    # Also the authorization header, if set
    if "authorization" in request.headers:
        headers["authorization"] = request.headers["authorization"]
    kwargs["cookies"] = cookies
    kwargs["headers"] = headers
    return await datasette.client.get(*args, **kwargs)

But it's not going to work! Because of this code:

    async def enqueue(
        self, datasette, db, table, filter_querystring, config, actor_id=None
    ):
        # Enqueue a job
        qs = filter_querystring
        if qs:
            qs += "&"
        qs += "_size=0&_extra=count"
        table_path = datasette.urls.table(db.name, table)

        response = await datasette.client.get(table_path + ".json" + "?" + qs)
        row_count = response.json()["count"]
        await db.execute_write(CREATE_JOB_TABLE_SQL)

        def _insert(conn):
            with conn:
                cursor = conn.execute(
                    """
                    insert into _enrichment_jobs (
                        enrichment, status, database_name, table_name, filter_querystring,
                        config, started_at, row_count, error_count, done_count, cost_100ths_cent, actor_id
                    ) values (
                        :enrichment, 'p', :database_name, :table_name, :filter_querystring, :config,
                        datetime('now'), :row_count, 0, 0, 0{}
                    )
                    """.format(
                        ", :actor_id" if actor_id else ", null"
                    ),
                    {
                        "enrichment": self.slug,
                        "database_name": db.name,
                        "table_name": table,
                        "filter_querystring": filter_querystring,
                        "config": json.dumps(config or {}),
                        "row_count": row_count,
                        "actor_id": actor_id,
                    },
                )
            return cursor.lastrowid

        job_id = await db.execute_write_fn(_insert)
        if self.runs_in_process:
            await self.start_enrichment_in_process(datasette, db, job_id)

    async def start_enrichment_in_process(self, datasette, db, job_id):
        loop = asyncio.get_event_loop()
        job_row = (
            await db.execute("select * from _enrichment_jobs where id = ?", (job_id,))
        ).first()
        if not job_row:
            return
        job = dict(job_row)

        async def run_enrichment():
            next_cursor = job["next_cursor"]
            while True:
                # Get next batch
                table_path = datasette.urls.table(
                    job["database_name"], job["table_name"], format="json"
                )
                qs = job["filter_querystring"]
                if next_cursor:
                    qs += "&_next={}".format(next_cursor)
                qs += "&_size={}".format(self.batch_size)
                response = await datasette.client.get(table_path + "?" + qs)

Neither of those methods has access to the request object - because enrichments are designed to run in-process but outside of the user's request.

Authentication is considered when a user first queues up an enrichment, but after that point the code runs independently and needs to be able to fetch new rows without adding extra authentication headers.

Options for solving this:

  • Stop using datasette.client.get("/db/table.json?...") to fetch new rows - go straight to the database instead. This is frustrating because I'll have to re-implement pagination etc.
  • Modify Datasette core to give datasette.client a way to bypass authentication. This is harder.
  • Maybe figure out a way to call the existing Datasette TableView mechanism in a way that bypasses authentication.
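
To give a sense of what the first option would involve, here is a minimal sketch of keyset pagination done directly against SQLite. It is an illustration only, not code from this plugin: the `fetch_batches` helper and the `items` table are hypothetical, it assumes an ordinary rowid table, and it ignores the filter querystring, sorting and compound primary keys that the table JSON endpoint already handles.

```python
import sqlite3


def fetch_batches(conn, table, batch_size):
    """Yield batches of rows, paginating on rowid (hypothetical helper).

    Assumes an ordinary rowid table; WITHOUT ROWID tables are not handled.
    """
    # Quote the table name so unusual names can't break the SQL
    quoted = '"' + table.replace('"', '""') + '"'
    last_rowid = None
    while True:
        if last_rowid is None:
            sql = f"select rowid, * from {quoted} order by rowid limit ?"
            params = (batch_size,)
        else:
            sql = (
                f"select rowid, * from {quoted} "
                "where rowid > ? order by rowid limit ?"
            )
            params = (last_rowid, batch_size)
        rows = conn.execute(sql, params).fetchall()
        if not rows:
            return
        yield rows
        # Remember where this batch ended so the next one starts after it
        last_rowid = rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("create table items (name text)")
conn.executemany(
    "insert into items (name) values (?)", [(f"n{i}",) for i in range(5)]
)
batches = list(fetch_batches(conn, "items", 2))
```

Even this small version has to make its own decisions about ordering and the resume cursor, which is the duplication that option would bring.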

Or... there may be one more option. I could define my own permission_allowed() hook in this plugin which uses some kind of internal criteria unique to this plugin, such that calls to datasette.client.get() can complete here without having to interfere with other aspects of Datasette's existing auth system.

I'm going to experiment with that path first.
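
One way such a hook could be shaped is sketched below. This is a guess at the general idea, not the code that was committed: the header name, the actor key, the `_enrichments_token` attribute and the `make_internal_token` helper are all hypothetical. The two functions mirror the signatures of Datasette's `actor_from_request` and `permission_allowed` plugin hooks, but the `@hookimpl` decorator is left off so the sketch runs without Datasette installed.

```python
import secrets
from types import SimpleNamespace

# Hypothetical names, chosen for this sketch only
INTERNAL_HEADER = "x-enrichments-internal"
INTERNAL_ACTOR_KEY = "_enrichments_internal"


def make_internal_token(datasette):
    """Create a random per-process token once and keep it on the instance."""
    token = getattr(datasette, "_enrichments_token", None)
    if token is None:
        token = secrets.token_hex(16)
        datasette._enrichments_token = token
    return token


def actor_from_request(datasette, request):
    """Return a special actor when the request carries the internal token."""
    supplied = request.headers.get(INTERNAL_HEADER) or ""
    expected = getattr(datasette, "_enrichments_token", None)
    # Compare as bytes in constant time; never match if no token was created
    if expected and secrets.compare_digest(
        supplied.encode("utf-8"), expected.encode("utf-8")
    ):
        return {INTERNAL_ACTOR_KEY: True}
    return None


def permission_allowed(datasette, actor, action, resource):
    """Allow the special internal actor; express no opinion otherwise."""
    if actor and actor.get(INTERNAL_ACTOR_KEY) is True:
        return True
    return None


# Stand-ins for the Datasette instance and an incoming request
ds = SimpleNamespace()
token = make_internal_token(ds)
internal_request = SimpleNamespace(headers={INTERNAL_HEADER: token})
outside_request = SimpleNamespace(headers={INTERNAL_HEADER: "wrong"})
```

Under these assumptions the background job would add the internal header to its own datasette.client.get() calls, so ordinary requests and other authentication plugins are left untouched.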

simonw commented Nov 16, 2023

That seems to work!

simonw added a commit that referenced this issue Nov 16, 2023

simonw commented Nov 17, 2023

Got this working on Datasette Cloud now.

simonw closed this as completed Nov 17, 2023
simonw added this to the First alpha milestone Nov 28, 2023