
pulp_file fails to sync file repos literally containing URL encoding strings like %3a in a file name #5686

Closed
quba42 opened this issue Aug 8, 2024 · 4 comments · Fixed by #5809 or #5835
Labels
file Issues related to pulp_file Issue

Comments

@quba42
Contributor

quba42 commented Aug 8, 2024

Version

Tested using python3.11-pulp-file-1.15.1 on a Katello instance

Describe the bug

If a file name in a file repo literally contains a percent-encoded sequence such as %3a, the repo cannot be synced with pulp_file. The download URL decodes %3a to :, and the request fails with a 404, e.g.:

404, message='Not Found', url=URL('http://my-file-server/pub/file_repo/stupid-file-name-:-blub')
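The decoding step can be reproduced with Python's standard library alone. This is an illustration of percent-decoding in general, using `urllib.parse.unquote`. It assumes that the HTTP client stack used by pulpcore (aiohttp/yarl) normalizes the path in an equivalent way, which the URL in the error message above suggests.

```python
from urllib.parse import unquote

# The name exactly as it appears on disk and in PULP_MANIFEST
literal_name = "stupid-file-name-%3a-blub"

# Percent-decoding treats "%3a" as an escape for ":", so the name changes.
# Whichever side decodes it (client library or web server), the lookup
# then targets a file that does not exist.
decoded = unquote(literal_name)
print(decoded)  # stupid-file-name-:-blub
```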

To Reproduce

I manually create a local file repo as follows:

mkdir file_repo
touch file_repo/stupid-file-name-%3a-blub
echo "blub" > file_repo/stupid-file-name-%3a-blub
bash make_pulp_manifest.sh

Here make_pulp_manifest.sh is:

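TARGET_PATH=file_repo
MANIFEST="${TARGET_PATH}/PULP_MANIFEST"

# Start from an empty manifest so that reruns do not append duplicate lines
: > "${MANIFEST}"

for TARGET_FILE in "${TARGET_PATH}/"*
do
  FILE="$(basename "${TARGET_FILE}")"
  # Do not list the manifest itself
  [ "${FILE}" = "PULP_MANIFEST" ] && continue
  SHA256="$(sha256sum "${TARGET_FILE}" | cut -d ' ' -f 1)"
  SIZE="$(stat --printf="%s" "${TARGET_FILE}")"
  echo "${FILE},${SHA256},${SIZE}" >> "${MANIFEST}"
done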

Feel free to create the PULP_MANIFEST file any other way!
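As one alternative, here is a minimal Python sketch that writes the same `name,sha256,size` lines as the shell script. It makes two assumptions: the directory is flat (like the shell loop, it does not recurse), and no file name contains a comma. `build_manifest` and `write_manifest` are hypothetical helper names. Names are written literally, so `%3a` stays `%3a` in the manifest.

```python
import hashlib
import os


def build_manifest(target_path: str) -> list[str]:
    """Return PULP_MANIFEST lines ("name,sha256,size") for each regular file."""
    lines = []
    for name in sorted(os.listdir(target_path)):
        full_path = os.path.join(target_path, name)
        # Skip the manifest itself and anything that is not a regular file
        if name == "PULP_MANIFEST" or not os.path.isfile(full_path):
            continue
        sha256 = hashlib.sha256()
        with open(full_path, "rb") as f:
            # Hash in 1 MiB chunks so large files are not read into memory at once
            for chunk in iter(lambda: f.read(1 << 20), b""):
                sha256.update(chunk)
        lines.append(f"{name},{sha256.hexdigest()},{os.path.getsize(full_path)}")
    return lines


def write_manifest(target_path: str) -> None:
    """Overwrite PULP_MANIFEST in target_path with freshly computed lines."""
    lines = build_manifest(target_path)
    with open(os.path.join(target_path, "PULP_MANIFEST"), "w") as f:
        f.write("".join(line + "\n" for line in lines))
```

Usage: `write_manifest("file_repo")`. Because the manifest is rewritten rather than appended to, running it twice produces the same file.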

Now serve this file repo over HTTP, sync it with pulp_file, and observe the sync error:

404, message='Not Found', url=URL('http://my-file-server/pub/file_repo/stupid-file-name-:-blub')
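To see where the 404 comes from, the following sketch (standard library only) serves a throwaway copy of the repo with Python's `http.server` and requests the file twice. It assumes that the web server percent-decodes the request path before looking up the file. `SimpleHTTPRequestHandler` does this, and web servers generally behave the same way. `QuietHandler`, `fetch_status` and `demo` are hypothetical helper names.

```python
import functools
import os
import tempfile
import threading
import urllib.error
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

FILE_NAME = "stupid-file-name-%3a-blub"


class QuietHandler(SimpleHTTPRequestHandler):
    """Static file handler that does not log every request to stderr."""

    def log_message(self, format, *args):
        pass


def fetch_status(url: str) -> int:
    # Bypass any proxy settings from the environment; the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        err.close()
        return err.code


def demo() -> tuple[int, int]:
    with tempfile.TemporaryDirectory() as repo:
        with open(os.path.join(repo, FILE_NAME), "w") as f:
            f.write("blub\n")
        # SimpleHTTPRequestHandler percent-decodes the request path
        # before mapping it to a file under `repo`.
        handler = functools.partial(QuietHandler, directory=repo)
        with ThreadingHTTPServer(("127.0.0.1", 0), handler) as server:
            threading.Thread(target=server.serve_forever, daemon=True).start()
            base = f"http://127.0.0.1:{server.server_address[1]}/"
            try:
                # Sent as-is: the server decodes %3a to ":" and finds no such file
                naive = fetch_status(base + FILE_NAME)
                # "%" escaped as "%25": decodes back to the literal on-disk name
                escaped = fetch_status(base + FILE_NAME.replace("%", "%25"))
            finally:
                server.shutdown()
    return naive, escaped


print(demo())  # (404, 200)
```

Only when the literal `%` is itself encoded as `%25` does the decoded path match the file on disk. This holds even if the client sends `%3a` unchanged.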

Expected behavior

The sync succeeds. I am not sure how difficult it is to distinguish a URL containing percent-encoded sequences that should be decoded from one like this, where the percent-encoded sequence is literally part of the file name or path.
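One possible way to make that distinction is to treat every PULP_MANIFEST entry as a plain path rather than as part of a URL. The entry is then percent-encoded when it is joined to the remote's base URL, so a literal `%` becomes `%25`. The sketch below uses `urllib.parse.quote` and `urljoin` from the standard library. `download_url` is a hypothetical helper, and this is not necessarily how pulpcore or pulp_file implemented the fix. It also assumes that manifest entries are relative paths that are never already percent-encoded. That assumption is exactly the kind of case where this approach may not always be appropriate.

```python
from urllib.parse import quote, unquote, urljoin


def download_url(manifest_url: str, relative_path: str) -> str:
    """Build the download URL for a manifest entry, treating it as a literal path.

    quote() keeps "/" as a path separator but escapes "%" (and ":"), so the
    server decodes the request back to the exact on-disk name.
    """
    return urljoin(manifest_url, quote(relative_path))


url = download_url(
    "http://my-file-server/pub/file_repo/PULP_MANIFEST",
    "stupid-file-name-%3a-blub",
)
print(url)  # http://my-file-server/pub/file_repo/stupid-file-name-%253a-blub
# What the server sees after decoding the last path segment:
print(unquote(url.rsplit("/", 1)[1]))  # stupid-file-name-%3a-blub
```

The same rule covers the nested Ubuntu pool paths, because `/` is left unescaped by default.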

Additional context

The issue was originally discovered via the real world use case of trying to synchronize a mounted Ubuntu 22.04 iso into pulp_file. Example "problem files" from the mounted Ubuntu iso:

# ls -l /var/www/html/pub/ubuntu22/22.04-x86_64/pool/main/o/openssh/
total 468
-r--r--r--. 1 root root 435336 Jan  3  2024 openssh-server_1%3a8.9p1-3ubuntu0.6_amd64.deb
-r--r--r--. 1 root root  38716 Jan  3  2024 openssh-sftp-server_1%3a8.9p1-3ubuntu0.6_amd6
@quba42
Contributor Author

quba42 commented Aug 8, 2024

In case you want the traceback:

pulp_tasks:
- pulp_href: "/pulp/api/v3/tasks/01913275-8c8d-7744-8f6f-b6939c5fb41a/"
  pulp_created: '2024-08-08T14:47:30.190+00:00'
  pulp_last_updated: '2024-08-08T14:47:30.190+00:00'
  state: failed
  name: pulp_file.app.tasks.synchronizing.synchronize
  logging_cid: d00b2e92-0cf1-4bbd-99b8-97a07ff858f5
  created_by: "/pulp/api/v3/users/1/"
  unblocked_at: '2024-08-08T14:47:30.201+00:00'
  started_at: '2024-08-08T14:47:30.260+00:00'
  finished_at: '2024-08-08T14:47:30.520+00:00'
  error:
    traceback: |2
        File "/usr/lib/python3.11/site-packages/pulpcore/tasking/tasks.py", line 66, in _execute_task
          result = func(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/pulp_file/app/tasks/synchronizing.py", line 51, in synchronize
          rv = dv.create()
               ^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/pulpcore/plugin/stages/declarative_version.py", line 161, in create
          loop.run_until_complete(pipeline)
        File "/usr/lib64/python3.11/asyncio/base_events.py", line 654, in run_until_complete
          return future.result()
                 ^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/pulpcore/plugin/stages/api.py", line 220, in create_pipeline
          await asyncio.gather(*futures)
        File "/usr/lib/python3.11/site-packages/pulpcore/plugin/stages/api.py", line 41, in __call__
          await self.run()
        File "/usr/lib/python3.11/site-packages/asgiref/sync.py", line 486, in thread_handler
          raise exc_info[1]
        File "/usr/lib/python3.11/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 186, in run
          pb.done += task.result()  # download_count
                     ^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 241, in _handle_content_unit
          await asyncio.gather(*downloaders_for_content)
        File "/usr/lib/python3.11/site-packages/pulpcore/plugin/stages/models.py", line 119, in download
          raise e
        File "/usr/lib/python3.11/site-packages/pulpcore/plugin/stages/models.py", line 111, in download
          download_result = await downloader.run(extra_data=self.extra_data)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/pulpcore/download/http.py", line 269, in run
          return await download_wrapper()
                 ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/backoff/_async.py", line 151, in retry
          ret = await target(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/pulpcore/download/http.py", line 254, in download_wrapper
          return await self._run(extra_data=extra_data)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/site-packages/pulpcore/download/http.py", line 290, in _run
          self.raise_for_status(response)
        File "/usr/lib/python3.11/site-packages/pulpcore/download/http.py", line 187, in raise_for_status
          response.raise_for_status()
        File "/usr/lib64/python3.11/site-packages/aiohttp/client_reqrep.py", line 1070, in raise_for_status
          raise ClientResponseError(
    description: 404, message='Not Found', url=URL('http://test-deploy-master.infra.dev.atix/pub/file_repo/stupid-file-name-:-blub')
  worker: "/pulp/api/v3/workers/01913227-33ef-7e4f-8183-13fda6a560b2/"
  child_tasks: []
  progress_reports:
  - message: Downloading Metadata
    code: sync.downloading.metadata
    state: completed
    done: 1
  - message: Downloading Artifacts
    code: sync.downloading.artifacts
    state: failed
    done: 0
  - message: Associating Content
    code: associating.content
    state: canceled
    done: 0
  - message: Parsing Metadata Lines
    code: sync.parsing.metadata
    state: completed
    total: 1
    done: 1
  created_resources: []
  reserved_resources_record:
  - "/pulp/api/v3/repositories/file/file/01913269-3eae-7f1a-83b6-ddaff5868d60/"
  - shared:/pulp/api/v3/remotes/file/file/01913269-3c65-715a-b428-0db1b25e2954/
  - shared:/pulp/api/v3/domains/01913223-895c-7679-bd67-b99dd1efdaf9/
create_version: true
task_groups: []
poll_attempts:
  total: 1
  failed: 1

@dkliban
Member

dkliban commented Aug 13, 2024

Looks like we need to do something like aio-libs/yarl#1077.

However, I am not sure if it's always appropriate.

@dkliban dkliban added the file Issues related to pulp_file label Aug 13, 2024
@sbernhard
Contributor

> However, I am not sure if it's always appropriate.

I agree that it's pretty "strange" for files to be named like this, with "%3a" in them. However, such names are valid under Linux file naming rules, so pulp_file should handle them.

@quba42
Contributor Author

quba42 commented Aug 19, 2024

Note: Somebody pointed me to the following lines as a hint: https://github.com/pulp/pulpcore/blob/main/pulpcore/download/http.py#L287-L289

We suspect that this is where pulpcore needs changing.

hstct added a commit to ATIX-AG/pulpcore that referenced this issue Sep 17, 2024
hstct added a commit to ATIX-AG/pulpcore that referenced this issue Sep 18, 2024
hstct added a commit to ATIX-AG/pulpcore that referenced this issue Sep 19, 2024
hstct added a commit to ATIX-AG/pulpcore that referenced this issue Sep 19, 2024
hstct added a commit to ATIX-AG/pulpcore that referenced this issue Sep 19, 2024
hstct added a commit to ATIX-AG/pulpcore that referenced this issue Sep 20, 2024
hstct added a commit to ATIX-AG/pulpcore that referenced this issue Sep 20, 2024
sbernhard added a commit to ATIX-AG/pulpcore that referenced this issue Sep 24, 2024
sbernhard added a commit to ATIX-AG/pulpcore that referenced this issue Sep 24, 2024
sbernhard added a commit to ATIX-AG/pulpcore that referenced this issue Sep 24, 2024
quba42 pushed a commit to ATIX-AG/pulpcore that referenced this issue Sep 26, 2024
m-bucher pushed a commit to ATIX-AG/pulpcore that referenced this issue Sep 26, 2024
@lubosmj lubosmj reopened this Oct 1, 2024
hstct pushed a commit to ATIX-AG/pulpcore that referenced this issue Oct 1, 2024
hstct pushed a commit to ATIX-AG/pulpcore that referenced this issue Oct 1, 2024
@ggainey ggainey closed this as completed in 0d3b421 Oct 8, 2024
patchback bot pushed a commit that referenced this issue Nov 10, 2024
closes #5686

(cherry picked from commit 0d3b421)
mdellweg pushed a commit that referenced this issue Nov 11, 2024
closes #5686

(cherry picked from commit 0d3b421)
ggainey pushed a commit that referenced this issue Nov 11, 2024
closes #5686

(cherry picked from commit 0d3b421)
4 participants