Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uploaded images become inaccessible after the domain was changed. #861

Open
Octobug opened this issue Mar 19, 2024 · 24 comments
Open

Uploaded images become inaccessible after the domain was changed. #861

Octobug opened this issue Mar 19, 2024 · 24 comments
Assignees
Labels
bug Something isn't working

Comments

@Octobug
Copy link
Contributor

Octobug commented Mar 19, 2024

Describe the bug

Uploaded images become inaccessible after the domain was changed. I only tested it in development env.

To Reproduce

Steps to reproduce the behavior:

  1. Under domain.old, you uploaded your custom avatar.
  2. You updated domain.old to domain.new, and the uploaded avatar became inaccessible.
    • Domain changes are very common.

Expected behavior

Uploaded files should serve well no matter what the domain is.

Screenshots

image image

Platform

  • Device: Desktop
  • OS: macOS Ventura 13.6.1
  • Browser and version: Chrome 122.0.6261.129 (Official Build) (x86_64)
  • Version: v1.2.5
@Octobug Octobug added the bug Something isn't working label Mar 19, 2024
@LinkinStars
Copy link
Member

Thanks for the feedback. It seems that the storing avatars relies on absolute paths. Modifying it to a new domain could potentially introduce new issues. Saving them as relative paths might be a better option. 🤔

@Octobug
Copy link
Contributor Author

Octobug commented Mar 21, 2024

Thanks for the feedback. It seems that the storing avatars relies on absolute paths. Modifying it to a new domain could potentially introduce new issues. Saving them as relative paths might be a better option. 🤔

@LinkinStars
Yeah, uploaded files should be stored as relative paths. And, for those already deployed in production, the fixing should implement a mechanism to automatically update existing data.

  • This could be done by an independent command line tool. or
  • During the installation. or
  • In the backend code, auto rewrite those stored paths when fetching them.

@Octobug Octobug changed the title Avatar became inaccessible after the domain was changed. Uploaded images become inaccessible after the domain was changed. Mar 21, 2024
@Octobug
Copy link
Contributor Author

Octobug commented Mar 21, 2024

@LinkinStars I found that all images in a question, answer or avatar have this issue. I've changed the issue title.

@Octobug
Copy link
Contributor Author

Octobug commented Mar 21, 2024

I would like to solve this issue once the solution is settled. 👀

@calmdev
Copy link

calmdev commented Mar 21, 2024

Fwiw, I ran into this issue with logo and favicon as well under the branding settings when originally testing how to go about setting up answer.

@LinkinStars
Copy link
Member

Thanks for the feedback. It seems that the storing avatars relies on absolute paths. Modifying it to a new domain could potentially introduce new issues. Saving them as relative paths might be a better option. 🤔

@LinkinStars Yeah, uploaded files should be stored as relative paths. And, for those already deployed in production, the fixing should implement a mechanism to automatically update existing data.

  • This could be done by an independent command line tool. or
  • During the installation. or
  • In the backend code, auto rewrite those stored paths when fetching them.

@Octobug Your suggestion is great. We have carefully discussed this issue for some time and believe that providing a command-line solution would be better.

For example, we can offer a functionality like this:

./answer fix image from A to B

This command can modify the domain name in existing data.

The reasons for this are as follows:

  1. Existing users may not prefer using relative paths to access images.
  2. Answer supports storage plugins, allowing users to upload images to other locations, such like S3 or OSS. In such cases, the saved image may have a different absolute domain path, such as https://s3.east-1.xxx.com/. Of course, this is just a possibility. By providing this command-line tool, we can assist users in making the switch when they want to change their storage and update this part of the address.

Additionally, there is another requirement. We believe it would be helpful to provide an environment variable, ABSOLUTE_IMAGE_PATH, which users can set to true. In this way, Answer will default to using absolute paths for storing images. This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used.

Once again, thank you very much for suggesting this feature because we believe it could pose a little big challenge. What do you think? If there are any unclear points, we can continue the discussion.

@Octobug
Copy link
Contributor Author

Octobug commented Mar 22, 2024

@LinkinStars Thank you for your detailed response!

⚠️ I haven't read much of the relevant code yet so I might miss some background knowledge of Answer. I'll make a summary to see if I have understood correctly.

I am also in favor of building this functionality as a ./answer command option.

And let's split a URL into 2 major parts for the convenience in describing:

  • PROTOCOL://HOSTNAME:PORT
  • PATH
    • BASE_PATH
    • RELATIVE_PATH

I think the ABSOLUTE_IMAGE_PATH could be renamed to IMAGE_PATH_PREFIX (if there will be other types of files in the future, maybe it could be UPLOADS_PATH_PREFIX).

There are 2 major types of uploaded files:

  1. For any external storage provided by 3rd-party services (S3, OSS, ...), files are stored as PROTOCOL://HOSTNAME:PORT + PATH in DB with no doubt.
  2. For files stored locally, application codes can directly use IMAGE_PATH_PREFIX + RELATIVE_PATH, only RELATIVE_PATH is stored in DB. This usage of IMAGE_PATH_PREFIX can be delivered by documentation:
    • 2.1. By default, the value of IMAGE_PATH_PREFIX is /
    • 2.2. For files need to be moved to a new directory, the value of IMAGE_PATH_PREFIX is BASE_PATH (e.g. new/dirname/)
    • 2.3. For files need to be moved to another self-hosted place different from the Answer instance, the value of IMAGE_PATH_PREFIX is PROTOCOL://HOSTNAME:PORT + BASE_PATH (e.g. https://statics.self-hosted.com/uploads/). Update: Uploading would be a problem. Better not to support it.

Pseudocode:

// When fetching existing data to rewrite:
getImageURL(image) {
    if (isLocalFile(image)) { // an image has a property to be distinguished. e.g. !startsWith('http')
        return IMAGE_PATH_PREFIX + image.path;
    } else {
        return image.path;
    }
}

As for the ./answer fix image from a to b command, I think the from a and to b args are not needed. Because what are stored in DB are always RELATIVE_PATHs.

Update: for files stored in external places, these args are needed.

- Or, are the two args used for auto-migrating uploaded files (mv SRC DST)?
- The description above assumes that an admin move uploaded files manually.

Update:

  • IMAGE_PATH_PREFIX is used to fix the existing problem: After this problem is fixed, domain changes don't need any extra operations.
  • ./answer migrate image from a to b is used to provide a migration feature: When you change external storage type, migration is inevitable.

@Octobug
Copy link
Contributor Author

Octobug commented Mar 22, 2024

@LinkinStars

This task seems heavy for a Go & Answer beginner (exactly me 🤣). If the coding job is real complicated under your estimation, I think it's better be done by your team.

@LinkinStars
Copy link
Member

@Octobug

I think the ABSOLUTE_IMAGE_PATH could be renamed to IMAGE_PATH_PREFIX (if there will be other types of files in the future, maybe it could be UPLOADS_PATH_PREFIX).

I agree. IMAGE_PATH_PREFIX would be better.

Update:

IMAGE_PATH_PREFIX is used to fix the existing problem: After this problem is fixed, domain changes don't need any extra operations.
./answer migrate image from a to b is used to provide a migration feature: When you change external storage type, migration is inevitable.

That's a good summary. You got it.

This task seems heavy for a Go & Answer beginner (exactly me 🤣). If the coding job is real complicated under your estimation, I think it's better be done by your team.

Take it easy. You can try your best, and we will help you solve any problems together. 💪

@Octobug
Copy link
Contributor Author

Octobug commented Mar 22, 2024

Take it easy. You can try your best, and we will help you solve any problems together. 💪

@LinkinStars Okay then, I'll give it another try.

@Octobug
Copy link
Contributor Author

Octobug commented Mar 24, 2024

@LinkinStars

I have another thought on external storage type...

If the uploads dataset is huge, the migration could be very time-consuming because it needs to query & filter out all urls of avatars and urls in texts of questions & answers, and then write them back to DB.

What if Answer saves all types of uploads with relative paths with a prefix marker? Like: :NUM:/uploads/post/57pxxvykQFj.jpeg

:NUM: is a storage url prefix version number. And the version number maps to a URL prefix string, which is used to replace the :NUM: marker in a relative path stored in DB.

The default value of NUM is 0 mapping to "/", it deals with local uploads type. When an Answer admin decides to move the uploads directory to another path, the admin just needs to change 0 to map to "/new/path" after files are settled down.

If the admin adds an external storage, since there is a visit_url_prefix config field for every storage plugin, Answer knows if it is a new prefix. If it is, it increases NUM, and set it as active so that any subsequent uploads use the new active NUM.

Let's say there is a new type of site_info called uploads, it looks like:

{
    "url_prefix_versions": {
        "0": {
            // ":0:/uploads/post/57pxxvykQFj.jpeg"
            "active": true,
            "value": "/"
        },
        "1": {
            // ":1:/uploads/post/57pxxvykQFj.jpeg"
            "active": false,
            "value": "https://s3.east-1.xxx.com/"
        },
        "2": {
            // ":2:/uploads/post/57pxxvykQFj.jpeg"
            "active": true,
            "value": "https://abc.oss-cn-hangzhou.aliyuncs.com/"
        },
        ...
    },

    // other properties...
}

With this solution, those relative paths hardly need to be changed.

The basic migrate operation process is:

  1. The admin copies their uploads to another place.
  2. Then the admin updates the corresponding NUM's mapping.
  3. The admin now can delete uploads in the old place.

And still, ./answer migrate uploads from_a to_b could be treated as the last resort to rewrite paths in DB. And if the version number solution is adopted, this migrate functionality is better to support both command line and the admin UI. It can provide migration status like progress, affected number of urls, etc.

However, migration of large dataset may not be common. The consideration here is whether it is worth implementing such a feature to address a rare scenario.

If this idea is rejected, I will try to implement the previously discussed solution.

@LinkinStars
Copy link
Member

@Octobug First of all, thank you very much for presenting the new proposal. We have had a detailed discussion regarding your proposal. It is certain that using external storage for migration would make it somewhat easier.

However, we still have some considerations:

  1. For regular users, this additional configuration would require extra understanding, thus increasing the learning curve for them.
  2. There would be an additional number in the path, which might appear slightly odd to users.
  3. As you also mentioned, this migration is a very infrequent operation and might only be performed once. This point convinced us.

Therefore, based on the reasons stated above, we would prefer to keep the process simple, and thus we suggest sticking with the previous solution would be better.

@Octobug
Copy link
Contributor Author

Octobug commented Mar 25, 2024

@LinkinStars Okay, it was just a thought. 👀

I would like to confirm some details:

The env var

Does this var need to be supported both in Config File and Environment Variable?

  • In config file: service_config.upload_url_prefix
    • Used when site is running.
  • Environment Variable: UPLOAD_URL_PREFIX
    • Used for installation
  • Default value: "/"

It actually affects all kinds of uploads, so I think upload is more proper. And it may include PROTOCOL://HOSTNAME:PORT, so UPLOAD_URL_PREFIX is better than UPLOAD_PATH_PREFIX.

The command

./answer fix upload PREFIX_SRC PREFIX_DST

@LinkinStars
Copy link
Member

@LinkinStars Okay, it was just a thought. 👀

I would like to confirm some details:

The env var

Does this var need to be supported both in Config File and Environment Variable?

  • In config file: service_config.upload_url_prefix

    • Used when site is running.
  • Environment Variable: UPLOAD_URL_PREFIX

    • Used for installation
  • Default value: "/"

It actually affects all kinds of uploads, so I think upload is more proper. And it may include PROTOCOL://HOSTNAME:PORT, so UPLOAD_URL_PREFIX is better than UPLOAD_PATH_PREFIX.

The command

./answer fix upload PREFIX_SRC PREFIX_DST

I can't agree more. 👍 Yes, supported both in config file and env would be better. So UPLOAD_URL_PREFIX is better than UPLOAD_PATH_PREFIX.

@Octobug
Copy link
Contributor Author

Octobug commented Mar 31, 2024

Additionally, there is another requirement. We believe it would be helpful to provide an environment variable, ABSOLUTE_IMAGE_PATH, which users can set to true. In this way, Answer will default to using absolute paths for storing images. This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used.

@LinkinStars I made some attempts on the var part. The consideration you mentioned before (I made it bold & italic) seemd hard to achieve: The var (UPLOAD_URL_PREFIX) is closely associated with service_config.upload_path.

  • If an admin moves the uploaded files locally, they need to change service_config.upload_path at the same time:
  • If an admin moves the uploaded files to other host that needs to be accessed via an absolute URL, UPLOAD_URL_PREFIX doesn't help at all. Because service_config.upload_path cannot reach to other host.
    • This var seems to be unnecessary for absolute URLs. Uploading files via plugins also doesn't need it.

If using a symlink pointing to the locally moved uploaded files is acceptable (tell admins to do so in the documentation), the var seems to be unnecessary at all. Answer is able to distinguish these two types of uploads without an env var:

  • Absolute URL for uploading via plugins.
  • Relative URL for uploading locally.

@LinkinStars
Copy link
Member

@Octobug 🤔 I got it. So, for now, we can consider only three scenarios:

  • Using relative paths: UPLOAD_URL_PREFIX defaults to "/"
  • Using absolute paths with the website domain: UPLOAD_URL_PREFIX is "http://answer.cn:8080/"
  • Using absolute paths with a third-party service: UPLOAD_URL_PREFIX is "https://s3.east-1.xxx.com/" (Of course, this scene is already handled by the plugin. Just an example.)

This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used.
That would simplify the problem. What do you think?

@Octobug
Copy link
Contributor Author

Octobug commented Apr 1, 2024

@Octobug 🤔 I got it. So, for now, we can consider only three scenarios:

  1. Using relative paths: UPLOAD_URL_PREFIX defaults to "/"
  2. Using absolute paths with the website domain: UPLOAD_URL_PREFIX is "http://answer.cn:8080/"
  3. Using absolute paths with a third-party service: UPLOAD_URL_PREFIX is "https://s3.east-1.xxx.com/" (Of course, this scene is already handled by the plugin. Just an example.)

This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used. That would simplify the problem. What do you think?

@LinkinStars

Right, there are three scenarios for now. But scenario 2. seems weird 🤔️. What benefits does storing absolute URLs bring (we don't have disagreement on 3.)? The only benefit I can think of is that admins can use a reverse proxy server (like Nginx) to route the requests of these absolute URLs to where those uploaded files are (maybe another host different from Answer's host).

@LinkinStars
Copy link
Member

@Octobug 🤔 I got it. So, for now, we can consider only three scenarios:

  1. Using relative paths: UPLOAD_URL_PREFIX defaults to "/"
  2. Using absolute paths with the website domain: UPLOAD_URL_PREFIX is "http://answer.cn:8080/"
  3. Using absolute paths with a third-party service: UPLOAD_URL_PREFIX is "https://s3.east-1.xxx.com/" (Of course, this scene is already handled by the plugin. Just an example.)

This consideration is made for existing users who may copy the content to other places where the images won't display if relative paths are used. That would simplify the problem. What do you think?

@LinkinStars

Right, there are three scenarios for now. But scenario 2. seems weird 🤔️. What benefits does storing absolute URLs bring (we don't have disagreement on 3.)? The only benefit I can think of is that admins can use a reverse proxy server (like Nginx) to route the requests of these absolute URLs to where those uploaded files are (maybe another host different from Answer's host).

@Octobug 👍 What you said is absolutely correct. We are indeed concerned about the existence of such users. As far as I know, there are users who deploy their frontend and backend separately, and as you mentioned, images are accessed through nginx proxy. Of course, this is only do for compatibility reasons. This is also why it is treated as an environment variable rather than a configuration item placed in the admin page.

@Octobug
Copy link
Contributor Author

Octobug commented Apr 2, 2024

@LinkinStars Okay, thank you for your patience! In this case, it seems that your first solution is better. :D That is, make the var a boolean. Although the name may need to be changed from ABSOLUTE_IMAGE_PATH to:

  • ABSOLUTE_UPLOAD_URL
  • UPLOAD_ABSOLUTE_URL
  • UPLOAD_USE_ABSOLUTE (I prefer this one)

@LinkinStars
Copy link
Member

  • UPLOAD_USE_ABSOLUTE (I prefer this one)

@Octobug 😄 Maybe we can use UPLOAD_USE_ABSOLUTE_URL. Thanks for your patience as well. This whole discussion has made the whole issue more clear.

@Octobug
Copy link
Contributor Author

Octobug commented Apr 2, 2024

  • UPLOAD_USE_ABSOLUTE (I prefer this one)

@Octobug 😄 Maybe we can use UPLOAD_USE_ABSOLUTE_URL. Thanks for your patience as well. This whole discussion has made the whole issue more clear.

@LinkinStars Aha, okay. I'll submit a PR a few days later together with the fix command.

@Octobug
Copy link
Contributor Author

Octobug commented May 26, 2024

@LinkinStars Here's my version of the fix command, I would like to know your opinion. 👀

answer fix upload SRC_PREFIX DST_PREFIX [--continue] [--offline|online] [--dry-run] [--limit NUM]

When the fix command starts, it checks if there is a tmp file: .fix_upload_interrupted.

  1. If there isn't, it generates one.
  2. If there is, it tells the user that the previous fix was interrupted, and the user needs to remove the file if they are to start a fresh fix, or uses the --continue switch to continue.
  3. When there isn't any data left that matches the SRC_PREFIX, it removes .fix_upload_interrupted then exit.

Switches:

  • --continue: the execution might be interrutped, some of the data is rewritten and the rest of them isn't. It works with the .fix_upload_interrupted.
    • All data that matches the DST_PREFIX will be ignored.
    • This switch is for a corner case, which is not idempotent:
      • fix upload /uploads/ /uploads/a/, assume it is interrupted during its running
      • run it again, resulting /uploads/a/a/
  • --offline: it requires the admin to shut down the Answer site, and it works in batch mode.
  • --online: it allows the fix to happen when a Answer site is up. It fixes a row of data one at a time in a loop, locks the data row preventing it from being loaded into the web editor, so that the user won't submit the wrong version of links after the fix is done.
    • But this kind of "read lock" seems impossible. I haven't done any research on it. This switch might not be feasible.
  • If it is feasible in go command, I would like to implement smaller granularity fix: fix upload [branding|avatar|question]
  • --dry-run: only do the test. It shows what data will be affected, the "before" and "after", with a limit (e.g. first 10 entries).
  • --limit: fix NUM rows then exit.

Updated at 2024-07-17 22:40

It makes little sense to lock a data row no matter it's an online or offline fix. Because you cannot make sure that there aren't any web UI editors opened with old data loaded in it. Even after an offline fix is finished, there might still be a user going to submit the old version data.

So, neither way is perfect. This corner case can only be addressed by the documentation work I guess.

@LinkinStars
Copy link
Member

@Octobug Very thoughtful consideration. I have only one suggestion. I recommend that this command be run offline, but it's not mandatory. It would be advisable to include instructions in the documentation, suggesting users to run it offline. Otherwise, it would require handling various unexpected concurrency situations, such as table locking, which would make the implementation more complex. 🤔️

@Octobug
Copy link
Contributor Author

Octobug commented May 27, 2024

@Octobug Very thoughtful consideration. I have only one suggestion. I recommend that this command be run offline, but it's not mandatory. It would be advisable to include instructions in the documentation, suggesting users to run it offline. Otherwise, it would require handling various unexpected concurrency situations, such as table locking, which would make the implementation more complex. 🤔️

Okay. I have the feeling that I will give up the --online switch. :D

@fenbox fenbox removed this from Answer Roadmap Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants