Check if attachment is actually(!) referred to #9585

Draft
wants to merge 4 commits into
base: master

Member

@pabzm pabzm commented Aug 15, 2024

This finishes what #9472 intended to do – but didn't actually do, as I found out.

For each non-text MIME part in a multipart part, the code now checks whether its Content-ID or Content-Location is (probably) used in a sibling HTML part. Only if such a reference is found is the MIME part considered an "inline" attachment (one that won't show up as downloadable or below the message content).
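A minimal sketch of what such a "(probably) used" check could look like, assuming a plain substring search over the concatenated HTML. The function name `is_referenced_in_html()` and its parameters are hypothetical and not taken from the PR; the sketch also ignores percent-encoded `cid:` URLs and relative Content-Location resolution.

```php
<?php
// Hypothetical helper for illustration only, not the code from this PR.
function is_referenced_in_html(string $html, ?string $content_id, ?string $content_location): bool
{
    $candidates = [];

    // Content-ID header values are wrapped in angle brackets,
    // while HTML refers to them as "cid:<id without brackets>".
    $id = trim((string) $content_id, "<> \t");
    if ($id !== '') {
        $candidates[] = 'cid:' . $id;
    }

    // Content-Location values are expected to appear verbatim in the HTML.
    $location = trim((string) $content_location);
    if ($location !== '') {
        $candidates[] = $location;
    }

    foreach ($candidates as $needle) {
        // A case-insensitive substring match is cheap, but it only tells us
        // the part is "probably" referenced (it may match unrelated text).
        if (stripos($html, $needle) !== false) {
            return true;
        }
    }

    return false;
}

$html = '<p>Hello</p><img src="cid:logo@example.org">';
var_dump(is_referenced_in_html($html, '<logo@example.org>', null));  // bool(true)
var_dump(is_referenced_in_html($html, '<other@example.org>', null)); // bool(false)
```

A part for which this returns `false` would then be treated as a classic, downloadable attachment.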

The second commit in this PR makes sure that all MIME part headers are loaded from the server for all image parts, in order to actually get hold of the Content-Location header (which isn't always fetched in the first place). It is limited to image parts because those are the parts most likely to have a Content-Location header, and I assumed that we shouldn't load the headers for every MIME part, so this seems like a workable real-world distinction to me.

One could probably change how the BODYSTRUCTURE response is fetched and parsed to ensure a Content-Location-header is always fetched in the first place, but I didn't dare to touch that code.

This fix closes #9565

@pabzm pabzm force-pushed the check-if-attachment-is-actually-referred-to branch from d9fe09b to 73cbc95 Compare August 15, 2024 13:41
@pabzm pabzm changed the title Check if attachment is actually referred to Check if attachment is actually(!) referred to Aug 15, 2024
@pabzm pabzm self-assigned this Aug 15, 2024
@pabzm pabzm force-pushed the check-if-attachment-is-actually-referred-to branch from 73cbc95 to 636dcdf Compare August 15, 2024 13:45
@pabzm pabzm requested a review from alecpl August 15, 2024 13:52
}
// Note: There might be more than one HTML part, thus
// we use a callback and concatenate the results.
$html_content = implode('', array_map(function ($part) { return $this->get_part_body($part->mime_id); }, $html_parts));
Member

There's indeed my @todo comment 70 lines below about getting the HTML bodies and checking for references. However, this has a performance impact that might be noticeable for a message with many images and large HTML content. So far I have considered fetching part headers acceptable, but fetching part bodies is another story.

We need to consider this some more. For example, when loading an image attachment (rcmail_attachment_handler), maybe we could fetch the image without parsing the message structure and loading the HTML parts again just to get the attachment part data.

Maybe checking for references should be done outside of rcube_message, so that rcube_message stays lightweight and doesn't slow down all the cases where we deal with the message but don't really need the full structure information, e.g. when viewing the source, downloading the message, or dealing with a single attachment.

Or maybe we need to use a cache. That might not help much with parallel requests (loading image attachments), though, and caches are usually optional.

Member Author

I see your point and am working on it.

Member Author

I changed the approach so that the reference checking is only done on demand. I'd like to test this, but I'm having a hard time figuring out how, since the entire class essentially depends on the response to an IMAP BODYSTRUCTURE command, which I'd like to avoid mocking.
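One way to picture "on demand" is a small object that fetches the HTML bodies only when the first reference check is requested, and then reuses them. The sketch below assumes PHP 7.4 or later; the class `LazyReferenceChecker`, its `fetchCount` counter, and the injected callable are all hypothetical and not part of this PR. Injecting the fetcher as a callable is shown here because it lets the check be exercised without an IMAP response.

```php
<?php
// Hypothetical class for illustration only, not the code from this PR.
final class LazyReferenceChecker
{
    /** @var callable(): string Returns the concatenated HTML part bodies. */
    private $fetchHtml;

    private ?string $html = null;

    /** Number of times the HTML bodies were actually fetched. */
    public int $fetchCount = 0;

    public function __construct(callable $fetchHtml)
    {
        $this->fetchHtml = $fetchHtml;
    }

    public function isReferenced(string $contentId): bool
    {
        $id = trim($contentId, "<> \t");
        if ($id === '') {
            return false;
        }

        if ($this->html === null) {
            // Pay for fetching the HTML bodies only now, and only once.
            $this->html = ($this->fetchHtml)();
            $this->fetchCount++;
        }

        return stripos($this->html, 'cid:' . $id) !== false;
    }
}

// In a test, the callable can return a fixed string instead of talking to IMAP.
$checker = new LazyReferenceChecker(function (): string {
    return '<img src="cid:pic1@example.org">';
});

var_dump($checker->isReferenced('<pic1@example.org>')); // bool(true)
var_dump($checker->isReferenced('<pic2@example.org>')); // bool(false)
var_dump($checker->fetchCount);                         // int(1)
```

Code paths that never ask about references (viewing the source, downloading the message) would never trigger the body fetch in this arrangement.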

Member Author

I made some progress and am using the approach in #9460 for testing, but this needs more tests to be ready. I'm mostly afk until the beginning of January, though.

If there's no reference to it in a sibling HTML part then we handle it
as a classic attachment (which is shown as downloadable).
Previously, all headers were only fetched for message/rfc822 parts, or
if the Content-Type's "name" parameter was set, or if a Content-ID was
set.
The RFC requires neither the "name" parameter nor a Content-ID
for using Content-Location, though, so we shouldn't depend on those.

Now all headers are also fetched if the main type of the
Content-Type is "image", to catch more cases.
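
The conditions described in this commit message can be sketched as a single predicate. The function name `needs_all_headers()` and the array keys used below are assumptions made for this example; the actual code may represent a part differently.

```php
<?php
// Hypothetical predicate for illustration only; the array keys are assumed.
function needs_all_headers(array $part): bool
{
    $type = strtolower($part['ctype_primary'] ?? '');
    $subtype = strtolower($part['ctype_secondary'] ?? '');

    return ($type === 'message' && $subtype === 'rfc822') // embedded message
        || !empty($part['name'])                          // Content-Type "name" parameter set
        || !empty($part['content_id'])                    // Content-ID set
        || $type === 'image';                             // new: any image part
}

var_dump(needs_all_headers(['ctype_primary' => 'image', 'ctype_secondary' => 'png'])); // bool(true)
var_dump(needs_all_headers(['ctype_primary' => 'text', 'ctype_secondary' => 'plain'])); // bool(false)
```

Only the last condition is new; the first three correspond to the previous behaviour described above.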
@pabzm pabzm force-pushed the check-if-attachment-is-actually-referred-to branch from 636dcdf to 1373f57 Compare November 11, 2024 11:23
@pabzm pabzm force-pushed the check-if-attachment-is-actually-referred-to branch from 1373f57 to 8f12091 Compare November 12, 2024 10:08
@pabzm, @alecpl
🛎️ This PR has had no activity in two weeks.

@pabzm pabzm mentioned this pull request Dec 19, 2024
@pabzm pabzm marked this pull request as draft December 20, 2024 17:04
Development

Successfully merging this pull request may close these issues.

Attached picture is not shown if text-part is present in multipart/mixed and image-part as Content-ID
3 participants