Access Issue #504

Closed
devilaadi opened this issue Nov 26, 2021 · 3 comments

Comments


devilaadi commented Nov 26, 2021

Hi,

I am facing an authorization issue while loading a table.

self._table = RawDeltaTable(table_uri, version=version)
deltalake.PyDeltaTableError: Failed to load checkpoint: Failed to read checkpoint content: Generic error: HTTP error status (status: 403, body: "\u{feff}AuthorizationFailure: This request is not authorized to perform this operation.
RequestId:173822eb-301e-009e-4c9a-e2f970000000
Time:2021-11-26T07:49:44.2187383Z")
127.0.0.1 - - [26/Nov/2021 08:49:43] "GET /hello HTTP/1.1" 500 -
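
For reference, a minimal sketch that reproduces the same checkpoint load outside Flask (the account, container, and table names below are placeholders mirroring the masked values in this report):

import os
from deltalake import DeltaTable

# Placeholder credentials; the real values are masked with * in this issue.
os.environ["AZURE_STORAGE_ACCOUNT"] = "straccount"
os.environ["AZURE_STORAGE_KEY"] = "***"

# The 403 above is raised from this constructor, while the Rust core reads
# the _delta_log checkpoint files.
dt = DeltaTable("abfss://containername@straccount.dfs.core.windows.net/deltatablefolder/")
print(dt.files())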

Code

# Importing the flask module is mandatory:
# an object of the Flask class is our WSGI application.
from flask import Flask
from deltalake import DeltaTable
import os
from typing import Optional, List, Tuple, Any
import adlfs
from urllib.parse import urlparse
import pyarrow
from pyarrow.dataset import dataset, partitioning
from flask import jsonify


os.environ["AZURE_STORAGE_ACCOUNT"] = "*"
os.environ["AZURE_STORAGE_KEY"]='*'

def to_pyarrow_dataset2(
    dt: DeltaTable, fs, container_name, partitions: Optional[List[Tuple[str, str, Any]]] = None
) -> pyarrow.dataset.Dataset:
    """
    Build a PyArrow Dataset using data from the DeltaTable.

    :param dt: the DeltaTable to read from
    :param fs: an fsspec-compatible filesystem (here an adlfs.AzureBlobFileSystem)
    :param container_name: storage container name, prefixed to each file path
    :param partitions: A list of partition filters, see help(DeltaTable.files_by_partitions) for filter syntax
    :return: the PyArrow dataset
    """
    if partitions is None:
        file_paths = dt.file_uris()
    else:
        file_paths = dt.files_by_partitions(partitions)
    paths = [urlparse(curr_file) for curr_file in file_paths]

    empty_delta_table = len(paths) == 0
    if empty_delta_table:
        return dataset(
            [],
            schema=dt.pyarrow_schema(),
            partitioning=partitioning(flavor="hive"),
        )

    # Decide, based on the first file, whether the data is on cloud storage or local
    if paths[0].netloc:
        query_str = ""
        # pyarrow doesn't properly support the AWS_ENDPOINT_URL environment variable
        # for non-AWS S3-like resources. This is a slight hack until such a
        # point when pyarrow learns about AWS_ENDPOINT_URL.
        endpoint_url = os.environ.get("AWS_ENDPOINT_URL")
        if endpoint_url is not None:
            endpoint = urlparse(endpoint_url)
            # This format is specific to the URL scheme inference done inside
            # of pyarrow; consult their tests/dataset.py for examples.
            query_str += (
                f"?scheme={endpoint.scheme}&endpoint_override={endpoint.netloc}"
            )
        # Note: query_str is not used below, since an explicit filesystem is passed in.

        keys = [container_name + curr_file.path for curr_file in paths]
        return dataset(
            keys,
            schema=dt.pyarrow_schema(),
            filesystem=fs,
            partitioning=partitioning(flavor="hive"),
        )
    else:
        return dataset(
            file_paths,
            schema=dt.pyarrow_schema(),
            format="parquet",
            partitioning=partitioning(flavor="hive"),
        )

storage_options = {
    "account_name": "*",
    "account_key": "*",
}

fs = adlfs.AzureBlobFileSystem(**storage_options)

# Flask constructor takes the name of
# current module (__name__) as argument.
app = Flask(__name__)
 
# The route() function of the Flask class is a decorator
# that tells the application which URL should trigger
# the associated function.

@app.route("/")
def hello():
    return "ok"

@app.route('/hello')
def hello_name():
    dt = DeltaTable("abfss://containername@straccount.dfs.core.windows.net/deltatablefolder/")
    df = to_pyarrow_dataset2(dt, fs, 'shared').to_table().to_pandas()
    
    return jsonify(df.to_dict(orient='records'))
 
# main driver function
if __name__ == '__main__':
 
    # run() method of Flask class runs the application
    # on the local development server.
    app.run()

The storage account name, account key, and table name have been hidden with *.
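
Running the app with python app.py and requesting the /hello route reproduces the 500 response from the traceback above; a minimal sketch of that check (assuming the requests package and Flask's default port 5000):

import requests

# The route that loads the DeltaTable returns HTTP 500 while the 403 persists.
print(requests.get("http://127.0.0.1:5000/hello").status_code)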


roeap (Collaborator) commented Nov 29, 2021

Hi @devilaadi, as per our discussion on Slack, can this issue be closed?


roeap (Collaborator) commented Apr 22, 2022

@devilaadi - is this still relevant, or can we close this issue?


roeap (Collaborator) commented May 9, 2022

Closing this issue, since there has been no further feedback and we have validated that the Azure integration does indeed work.
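
For anyone hitting a similar 403, a quick sanity check that is independent of deltalake is to list the table's _delta_log directory with adlfs directly (a sketch with placeholder names):

import adlfs

# Placeholder credentials, mirroring the snippet in the report above.
fs = adlfs.AzureBlobFileSystem(account_name="straccount", account_key="***")

# If this listing also fails with a 403, the problem lies with the storage
# credentials or network rules rather than with the deltalake Azure integration.
print(fs.ls("containername/deltatablefolder/_delta_log"))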
