
Replay port specified is not used in Proxy mode #760

Closed
machawk1 opened this issue Apr 4, 2022 · 13 comments · Fixed by #761

@machawk1
Member

machawk1 commented Apr 4, 2022

```console
% ipwb index samples/warcs/5mementos.warc | ipwb replay -P localhost:2005
Processing WARC records in 5mementos.warc complete
Proxying to localhost:2005
IPWB replay started on http://localhost:2016
 * Serving Flask app 'ipwb.replay' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
```

The service is only accessible on port 2016, not on the specified port 2005. Is this expected behavior, or is it in fact a bug, @ibnesayeed?

@machawk1
Member Author

machawk1 commented Apr 4, 2022

From ipwb/ipwb/replay.py, lines 1049 to 1066 in 6ef47bc:

```python
def start(cdxj_file_path, proxy=None):
    host_port = ipwb_utils.get_ipwb_replay_config()
    app.proxy = proxy

    if not host_port:
        ipwb_utils.set_ipwb_replay_config(IPWBREPLAY_HOST, IPWBREPLAY_PORT)

    # This will throw an exception if daemon is not available.
    ipwb_utils.check_daemon_is_alive()

    ipwb_utils.set_ipwb_replay_index_path(cdxj_file_path)
    app.cdxj_file_path = cdxj_file_path

    try:
        print((f'IPWB replay started on '
               f'http://{IPWBREPLAY_HOST}:{IPWBREPLAY_PORT}'))
        # `proxy` only sets app.proxy; Flask always binds to the
        # module-level default IPWBREPLAY_PORT, not the proxy's port.
        app.run(host='0.0.0.0', port=IPWBREPLAY_PORT)
    # (excerpt ends at line 1066; the rest of the try statement is not shown)
```

This sets the proxy variable but still starts ipwb on the predefined port (IPWBREPLAY_PORT).

@ibnesayeed
Member

This is the intended behavior, as long as the Link headers of mementos and TimeMaps contain the proxy value. It is the job of a reverse proxy to make the service, running on a different port, accessible at the proxy URI.
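A minimal sketch of what "the proxy value in the Link header" means in practice, assuming hypothetical helpers `replay_base_uri` and `memento_link_header` (not ipwb's actual functions) and an illustrative TimeMap path: when a proxy is set, the Link header is built from the proxy address rather than from the local host and port that Flask binds to. The `original` and `timemap` relation types come from the Memento protocol (RFC 7089).

```python
from typing import Optional


def replay_base_uri(host: str, port: int, proxy: Optional[str] = None) -> str:
    """Return the base URI that clients should see in Link headers.

    Hypothetical helper: a configured proxy (the public address) wins;
    otherwise fall back to the local host and port the app binds to.
    """
    if proxy:
        base = proxy if '://' in proxy else f'http://{proxy}'
    else:
        base = f'http://{host}:{port}'
    return base.rstrip('/')


def memento_link_header(urir: str, host: str, port: int,
                        proxy: Optional[str] = None) -> str:
    """Build a Memento-style Link header pointing at the public address."""
    base = replay_base_uri(host, port, proxy)
    # Illustrative URI-T layout; ipwb's real TimeMap route may differ.
    timemap = f'{base}/timemap/link/{urir}'
    return (f'<{urir}>; rel="original", '
            f'<{timemap}>; rel="timemap"; type="application/link-format"')


# Flask listens on 2016, but clients are pointed at the proxy on 2005.
print(memento_link_header('http://example.com/', 'localhost', 2016,
                          proxy='localhost:2005'))
```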

@machawk1
Copy link
Member Author

machawk1 commented Apr 4, 2022

@ibnesayeed I figured as much. Do we want to provide a way for the user running the replay service to specify a port, beyond the ability to proxy?

@ibnesayeed
Member

The proxy flag is there to allow running the service behind a reverse proxy or in a private network, since app servers are not necessarily security hardened. It is better to terminate TLS on a front server, handle any necessary load balancing, apply firewalls, and only then forward the traffic to the app servers.

Honestly, with many popular reverse proxies we might not even need to know the front URI explicitly; instead, it can be identified implicitly from certain headers (provided the reverse proxy sets those headers, which many do).
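The header-based approach could look roughly like the sketch below. The function name `front_base_uri` is hypothetical; it reads the widely used (but non-standard) X-Forwarded-Proto and X-Forwarded-Host headers and falls back to the local address. For a Flask app, Werkzeug's `ProxyFix` middleware already does this kind of rewriting, and RFC 7239 defines a standardized `Forwarded` header for the same purpose.

```python
from typing import Mapping


def front_base_uri(headers: Mapping[str, str], fallback: str) -> str:
    """Infer the public base URI from reverse-proxy headers.

    Only trust these headers when the app is reachable solely through
    the proxy; otherwise clients could forge them.
    """
    # HTTP header names are case-insensitive, so normalize the keys.
    lowered = {name.lower(): value for name, value in headers.items()}
    host = lowered.get('x-forwarded-host')
    if not host:
        return fallback
    # Chained proxies append comma-separated values; the leftmost one is
    # the host the client originally requested.
    host = host.split(',')[0].strip()
    proto = lowered.get('x-forwarded-proto', 'http').split(',')[0].strip()
    return f'{proto}://{host}'


print(front_base_uri({'X-Forwarded-Proto': 'https',
                      'X-Forwarded-Host': 'archive.example.org'},
                     fallback='http://localhost:2016'))
```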

@machawk1
Copy link
Member Author

machawk1 commented Apr 5, 2022

@ibnesayeed Do we want to account for the scenario where one simply wants to run the localhost service on a port other than the one we dictate (now 2016, previously 5000)?

@ibnesayeed
Member

I thought we already allowed customizing the port number. If we do not, then we should make it an option.

@machawk1
Copy link
Member Author

machawk1 commented Apr 5, 2022

@ibnesayeed Perhaps we do, but there is no indication of it in the ipwb replay help:
[Screenshot of the ipwb replay help output]

@ibnesayeed
Member

Then we should add a -p/--port option.
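A sketch of the proposed option using `argparse`, assuming a `replay` subcommand; the parser layout, the long name `--proxy`, and the constant name are assumptions rather than ipwb's actual CLI code. Lowercase `-p` avoids clashing with the existing `-P` proxy flag, and omitting it falls back to the default port.

```python
import argparse
from typing import List, Optional

# Default replay port mentioned in this thread (2016, previously 5000).
DEFAULT_REPLAY_PORT = 2016


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='ipwb')
    subparsers = parser.add_subparsers(dest='command')
    replay = subparsers.add_parser('replay', help='Start the replay service')
    # -P mirrors the proxy flag used in the report above; the long
    # name --proxy is an assumption for this sketch.
    replay.add_argument('-P', '--proxy', default=None,
                        help='Public proxy host:port used in Link headers')
    # Proposed option: lowercase -p, so it does not clash with -P.
    replay.add_argument('-p', '--port', type=int, default=None,
                        help=f'Port to listen on (default: {DEFAULT_REPLAY_PORT})')
    return parser


def listen_port(argv: Optional[List[str]] = None) -> int:
    """Return the port the replay service should bind to."""
    args = build_parser().parse_args(argv)
    return args.port if args.port is not None else DEFAULT_REPLAY_PORT


print(listen_port(['replay', '-P', 'localhost:2005', '-p', '2005']))
print(listen_port(['replay']))
```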

@machawk1
Member Author

machawk1 commented Apr 5, 2022

@ibnesayeed Do you think that the custom port should be written to the ipwb config JSON?

  • If yes, then subsequent runs of ipwb replay without a port specified will use the last custom port (instead of our default).
  • If no, then the Link header for mementos will still pull from the config file and thus will not reflect the custom port.

Which approach would you rather take?
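The two options can be modeled as below, treating the config as a plain dict with a hypothetical `port` key (not ipwb's real JSON schema). The function returns the port the service listens on, the port the Link header would advertise, and the resulting config; under the "no" option the two ports diverge whenever a custom port is given.

```python
from typing import Optional, Tuple

DEFAULT_PORT = 2016


def resolve_ports(cli_port: Optional[int], config: dict,
                  persist: bool) -> Tuple[int, int, dict]:
    """Return (listen_port, link_header_port, updated_config)."""
    new_config = dict(config)  # never mutate the caller's dict
    if persist:
        # "Yes": write the custom port through, so later runs without a
        # port reuse the last custom port instead of the default.
        if cli_port is not None:
            new_config['port'] = cli_port
        listen = new_config.get('port', DEFAULT_PORT)
    else:
        # "No": the custom port is a one-off; the config is left untouched.
        listen = cli_port if cli_port is not None else DEFAULT_PORT
    # Either way, the Link header port is read from the config.
    link_port = new_config.get('port', DEFAULT_PORT)
    return listen, link_port, new_config


print(resolve_ports(2005, {'port': 2016}, persist=True))   # ports agree
print(resolve_ports(2005, {'port': 2016}, persist=False))  # ports diverge
```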

machawk1 added a commit that referenced this issue Apr 5, 2022
This approach allows the replay to run on the specified port but
the Link header for mementos does not use this value. It instead
uses the value in the ipwb config. However, we might not want to
overwrite the port in the config for this one-off specification.
Let's see which way @ibnesayeed prefers in #760 before we move
forward in remedying the issue.
@ibnesayeed
Member

I have my reservations about having a config file the way it is right now. That being said, as long as we have it, we would of course want to reflect changes in it.

@machawk1
Member Author

machawk1 commented Apr 5, 2022

@ibnesayeed
OK, if we are going to write the custom port to the config file, should the default be written to the config file on subsequent startups when no custom port is specified?

@machawk1
Member Author

machawk1 commented Apr 5, 2022

Also, @ibnesayeed, if you have any suggestions on how a persistent configuration can be retained between runs, I am open to hearing them.

@ibnesayeed
Member

My biggest concern with our config file approach is that it is not really a config file but a state caching file. A config file would be something that is manually created or updated to avoid supplying configuration parameters on the CLI, and it should not be mutable by the system (essentially, it should be accessed in read-only mode).

That being said, since we are storing and overwriting session information in our current approach, we should continue that practice with ports too for now to be consistent.
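To make the config-versus-state distinction concrete, here is a minimal sketch with hypothetical file names and functions: a user-maintained config that the program only reads, and a separate state cache that it overwrites on each run. Precedence is CLI flag, then config, then the built-in default. This is one possible layout, not what ipwb currently does.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

DEFAULT_PORT = 2016


def load_config(path: Path) -> dict:
    """Read the user-maintained config; the program never writes this file."""
    if not path.exists():
        return {}
    with path.open() as f:
        return json.load(f)


def save_state(path: Path, host: str, port: int) -> None:
    """Overwrite the session state cache with the values actually in use."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + '.tmp')
    tmp.write_text(json.dumps({'host': host, 'port': port}))
    # Rename over the old file (os.replace semantics) to avoid partial writes.
    tmp.replace(path)


def effective_port(cli_port: Optional[int], config: dict) -> int:
    """Precedence: CLI flag, then the read-only config, then the default."""
    if cli_port is not None:
        return cli_port
    return int(config.get('port', DEFAULT_PORT))


state_dir = Path(tempfile.mkdtemp())
config = load_config(state_dir / 'config.json')  # absent, so {}
port = effective_port(2005, config)
save_state(state_dir / 'state.json', 'localhost', port)
print((state_dir / 'state.json').read_text())
```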


2 participants