API - GA does not pick up the location of the user #350

Closed
AjitPS opened this issue Apr 8, 2019 · 25 comments
Labels
enhancement Use this for a change about existing functionality, use 'new feature' for new functionality. project:web service Knetminer web service (ws), including search, JSON out, data export, config reading.

Comments

@AjitPS
Collaborator

AjitPS commented Apr 8, 2019

Google Analytics is only showing a limited number of (UK) invocations for API calls, especially for the genepage JSP.

  • Verify the ga_id for the UI and API in KnetMiner-dataSource-provider
  • Is a third ga_id needed for JSP/HTML endpoints (e.g., genepage)?
@AjitPS AjitPS added the bug label Apr 8, 2019
@AjitPS AjitPS changed the title google analytics for API - genepage bug google analytics for API/genepage - bug Apr 8, 2019
@AjitPS AjitPS changed the title google analytics for API/genepage - bug google analytics for API/genepage Apr 8, 2019
@AjitPS AjitPS changed the title google analytics for API/genepage user analytics for API/genepage Apr 8, 2019
@dicknetherlands
Contributor

The GA call is made every time any endpoint within the server (datasource) is called. It is controlled by a local settings file. If the logged counts are lower than expected, first make sure that the GA setting is actually turned on, and that the GA ID provided is valid and matches the GA account in which you're viewing the results. Bear in mind that third-party installations may choose to turn this off or use their own GA ID.
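To illustrate the two checks suggested above (setting switched on, ID well-formed), here is a minimal Java sketch. The property names `googleAnalytics.enabled` and `googleAnalytics.id` are hypothetical, not KnetMiner's actual settings keys, and the regex only checks the shape of a Universal Analytics `UA-<account>-<property>` ID; it cannot tell whether the ID belongs to the account you are looking at.

```java
import java.util.Properties;
import java.util.regex.Pattern;

// Sketch only: the property keys below are hypothetical, not KnetMiner's real settings names.
final class GaSettingsCheck {
    // Universal Analytics IDs have the shape UA-<account>-<property>, e.g. UA-12345678-1.
    private static final Pattern UA_ID = Pattern.compile("UA-\\d+-\\d+");

    /** True only if GA reporting is switched on and the configured ID is well-formed. */
    static boolean isGaReportingEnabled(Properties settings) {
        boolean enabled = Boolean.parseBoolean(
                settings.getProperty("googleAnalytics.enabled", "false")); // hypothetical key
        String id = settings.getProperty("googleAnalytics.id", "").trim(); // hypothetical key
        return enabled && UA_ID.matcher(id).matches();
    }
}
```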

I'll ask Madhu to check the code anyway but I'm 99% sure that all calls are being logged. (Perhaps the logging code is failing to register some of them if the Google API returns an error, which is a possibility.)

@AjitPS
Collaborator Author

AjitPS commented Jul 10, 2019

Thanks @dicknetherlands. In the past we observed more calls from the website/UI than from the genepage API call, for example, even when we were querying genepage while checking Google Analytics. Also, a few other websites now use our genepage mode, but I don't remember seeing much of it in the analytics stats.

It would be good to check that the GA ID is set correctly in the master branch, and whether there is more specific analytics data we can get back using it. @KeywanHP can say more about what he usually monitors in Google Analytics.

@KeywanHP
Member

The main issue here is that the geolocation of API users is not recorded properly. We know that people from the US and France are using genepage, but the location page only shows:

[image: screenshot of the GA location page]

@dicknetherlands
Contributor

When using GA on a webpage, GA picks up the IP address of the computer the webpage is being viewed on, because that is where the GA code runs. When using it in an API, the GA code is server-side and reports against the IP address delivered to it by the incoming request. Depending on your networking setup, this may well be the IP address of your gateway, or possibly even of the Docker container manager. Hence you are seeing all hits reported against Harpenden, which is where your servers are.
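A minimal Java sketch of how a server-side hit can carry the end user's IP, assuming the Universal Analytics Measurement Protocol (v1) that `UA-` IDs used at the time: its `uip` (IP override) parameter tells GA which address to geolocate; without it, the hit is located by the IP of whatever sent it. The class and method names are hypothetical, and this is not necessarily how KnetminerServer builds its hits.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: build a Universal Analytics Measurement Protocol (v1) pageview payload on the server.
final class GaHitPayload {
    static String buildPageviewPayload(String trackingId, String clientId, String path, String clientIp) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("v", "1");          // protocol version
        params.put("tid", trackingId); // UA-XXXXXXXX-Y property ID
        params.put("cid", clientId);   // anonymous client ID
        params.put("t", "pageview");   // hit type
        params.put("dp", path);        // document path, e.g. /genepage
        if (clientIp != null && !clientIp.isEmpty()) {
            // IP override: geolocate against the end user, not the server sending the hit
            params.put("uip", clientIp);
        }
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

The resulting string would be POSTed as the request body to `https://www.google-analytics.com/collect`. Universal Analytics stopped processing data in July 2023, and GA4's Measurement Protocol uses a different format.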

@mdonepudi the solution is the top answer on this stackoverflow page (note the comment below it about multiple addresses): https://stackoverflow.com/questions/29910074/how-to-get-client-ip-address-in-java-httpservletrequest

It needs to be applied here:

@dicknetherlands
Contributor

PS. I'm not sure if it is used at all in the UI, I only remember implementing it in the API backend. @mdonepudi that would also be worth checking to see if the GA JS is in the UI template and picks up the correct GA ID from the backend settings.

@KeywanHP
Member

KeywanHP commented Jul 11, 2019

Yes we have two GA codes - one for UI and one for API - the UI-GA picks up user locations.

@KeywanHP KeywanHP changed the title user analytics for API/genepage API - GA does not pick up the location of the user Jul 11, 2019
@KeywanHP KeywanHP added project:web service Knetminer web service (ws), including search, JSON out, data export, config reading. enhancement Use this for a change about existing functionality, use 'new feature' for new functionality. and removed bug labels Jul 11, 2019
@mdonepudi
Contributor

I have deployed aratiny on an AWS instance (London region) and accessed it via JMeter from my local machine; KnetminerServer was sending the correct IP to Google Analytics. A screenshot of the Google Analytics page is attached. I also printed DEBUG logs (from KnetminerServer) and can confirm it is indeed my IP.

[Screenshot 2019-08-02 at 6:40:54 AM]

@AjitPS and @KeywanHP am I looking at the wrong place?

@KeywanHP
Member

KeywanHP commented Aug 2, 2019

I can confirm that I now see India in our API analytics. Is this working now because you changed the code, or because you deployed it on AWS? @mdonepudi

@mdonepudi
Contributor

mdonepudi commented Aug 2, 2019

@KeywanHP I haven't changed any code. I just deployed on a remote (AWS) machine and accessed it from my computer to reproduce the issue.

P.S.: The analytics will still be reported as (not set) when accessing a local deployment via localhost/127.0.0.1 or the equivalent IPv6 loopback address.
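As a rough illustration of why such hits end up as (not set): loopback and private-range addresses carry no public location. Below is a minimal Java sketch of a hypothetical pre-check using only `java.net.InetAddress`. It assumes the input is an IP literal; for a hostname, `getByName` would perform a DNS lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: flags client IPs that a geolocation service cannot place.
final class GeoCheck {
    static boolean isGeolocatable(String ipLiteral) {
        try {
            // For an IP literal, getByName only parses the string; no DNS lookup is made.
            InetAddress addr = InetAddress.getByName(ipLiteral);
            return !(addr.isLoopbackAddress()      // 127.0.0.0/8, ::1
                    || addr.isSiteLocalAddress()   // IPv4 10/8, 172.16/12, 192.168/16
                    || addr.isLinkLocalAddress()   // 169.254/16, fe80::/10
                    || addr.isAnyLocalAddress());  // 0.0.0.0, ::
        } catch (UnknownHostException e) {
            return false; // not a parseable address
        }
    }
}
```

A server could use a check like this to log a warning when the address it is about to report is local, which points to a proxy or gateway hiding the real client.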

@AjitPS
Collaborator Author

AjitPS commented Aug 3, 2019

@mdonepudi So it seems to work on AWS, but will still not show the location if we deploy the server and client .war files here on a VM at Rothamsted?

@mdonepudi
Contributor

@AjitPS Could it be due to the VM network configuration? Where are you accessing the VM from? If you are accessing it from the same machine, the request could be resolving to localhost, which cannot report a location. Try accessing it from a different machine than the one it was deployed to.

@AjitPS
Collaborator Author

AjitPS commented Aug 4, 2019

@mdonepudi we have deployed server (ws.war) and client.war in a docker container on the same VM (like we'd also do on AWS) at Rothamsted.

Accessing the UI (client.war) from elsewhere shows locations, but none of the API modes (e.g., araknet/genome, genepage/, etc.) show a specific location.

We have seen this in our workshops in Cambridge and I have tried it when accessing our deployed public KnetMiners from UK, Germany and India.

You can reproduce it by trying a public instance UI, e.g., https://knetminer.rothamsted.ac.uk/Arabidopsis_thaliana/ and similar API calls at https://knetminer.rothamsted.ac.uk/araknet/genepage?keyword=dormancy&list=MFT or similar; and check google analytics locations...

@AjitPS
Collaborator Author

AjitPS commented Aug 5, 2019

FYI, as suggested by @dicknetherlands, I am currently in touch with RRes IT to verify whether their proxy/gateway is causing the incorrect location information for the API calls.

@AjitPS
Collaborator Author

AjitPS commented Aug 5, 2019

@mdonepudi , RRes IT responded with the following:

As far as I can see the X-Forwarded-For header should be present and correct. On the proxy setup for the araknet/ example (arabidopsis ws/ API) you give, we have the following:
ProxyPreserveHost on
ProxyRequests Off
AllowEncodedSlashes NoDecode
RequestHeader set X-Forwarded-Proto "https"
<Location /araknet>
ProxyPass http://babvs**.rothamsted.ac.uk:****/ws/araknet
ProxyPassReverse http://babvs**.rothamsted.ac.uk:****/ws/araknet
</Location>

I would suggest, in the first instance, tweaking the Apache logging on the VM (or perhaps inside Docker) to log the X-Forwarded-For header for every request, to confirm that it actually includes the IP address of the original client. If not, let me know and I'll see whether anything on the proxy is not writing it correctly. For some information on this, have a look at: http://www.loadbalancer.org/blog/apache-and-x-forwarded-for-headers/

If this header is present and correct, then you can either use it directly in your analysis of the requests, or you might want to look at the Apache module mod_rpaf, which has to be used on the backend server (i.e., not the proxy server). More information at: https://stackoverflow.com/questions/760283/apache-proxypass-how-to-preserve-original-ip-address

So, we may need to change the Tomcat configuration within our Docker image to set this. Thoughts, @dicknetherlands?

Given that it worked for our Docker images on AWS on Friday, I am not sure whether it's the Tomcat within our Docker image that needs this set, or whether it is an issue with the CentOS VMs we have at RRes.

@dicknetherlands
Contributor

dicknetherlands commented Aug 5, 2019 via email

@mdonepudi
Contributor

@dicknetherlands On AWS, I tried both the latest master branch and a version that also logs the X-Forwarded-For headers (for debugging), which I haven't committed/pushed to GitHub. If @AjitPS wants to deploy a test instance inside the Rothamsted infrastructure, I can push the changes.

@dicknetherlands
Contributor

dicknetherlands commented Aug 5, 2019 via email

@AjitPS
Collaborator Author

AjitPS commented Aug 5, 2019

Yes @mdonepudi, please send me a PR with the latest code and I can deploy and test it here today or tomorrow to find out for sure what the issue is on our VMs.

mdonepudi pushed a commit that referenced this issue Aug 5, 2019
@AjitPS
Collaborator Author

AjitPS commented Aug 5, 2019

Thanks @mdonepudi. Do you also want to pull the latest master commits into your branch and send a PR to the master branch with that plus your IP/GA debugging code, so that I can test the IP address debugging?

AjitPS added a commit that referenced this issue Aug 5, 2019
#350 debug messages added for test server
@AjitPS
Collaborator Author

AjitPS commented Aug 5, 2019

Thanks, I have merged it into master now. We will test locally and get back to you with the location, etc. that it shows.

@mdonepudi
Contributor

As you can see, the code prints various headers in addition to X-Forwarded-For for debugging purposes. Collecting the logs would help.

@AjitPS
Collaborator Author

AjitPS commented Aug 6, 2019

Thanks @mdonepudi , we have deployed it on a VM here with a human_test.oxl but:

  • catalina.log does not show your new debug-level messages for checking the Google Analytics IP addresses. See: human_catalina.log

@AjitPS
Collaborator Author

AjitPS commented Aug 28, 2019

Let's compare the IP logs after #367 is also tested. The aratiny logs may differ because aratiny runs on a local PC (Jetty), while the deployed VMs use Tomcat (with varying Tomcat log levels).

mdonepudi pushed a commit that referenced this issue Aug 30, 2019
AjitPS referenced this issue in AjitPS/KnetMiner Aug 30, 2019
changed log level to DEBUG to bebug #350
@AjitPS
Collaborator Author

AjitPS commented Aug 30, 2019

FYI, we now have aratiny deployed publicly for testing locations outside the RRes firewall, at https://knetminer.rothamsted.ac.uk/aratiny/ (UI), with an example API call at https://knetminer.rothamsted.ac.uk/ws/aratiny/genepage?keyword=dormancy&list=ABI3. It now appears to be working when both X-Forwarded-For (occasionally null for API calls) and getRemoteAddr() are checked to determine the incoming IP address.

@mdonepudi
Contributor

The X-Forwarded-For request header is checked first; if it has no value, the code falls back to the getRemoteAddr() method of HttpServletRequest.
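The lookup order described above can be sketched in Java as a plain-string helper. The class and method names are hypothetical, chosen so the logic can be tested without a servlet container. Following the Stack Overflow note about multiple addresses, it takes the left-most entry of a comma-separated X-Forwarded-For chain.

```java
// Hypothetical helper: resolve the client IP from X-Forwarded-For, falling back to the socket address.
final class ClientIpResolver {
    static String resolve(String xForwardedFor, String remoteAddr) {
        if (xForwardedFor != null) {
            // The header may hold a chain "client, proxy1, proxy2"; the original client is first.
            // The -1 limit keeps empty entries, so inputs like "," still yield an element.
            String first = xForwardedFor.split(",", -1)[0].trim();
            if (!first.isEmpty() && !"unknown".equalsIgnoreCase(first)) {
                return first;
            }
        }
        // No usable header value: use the address of whatever opened the connection.
        return remoteAddr;
    }
}
```

In a servlet this would be called as `ClientIpResolver.resolve(request.getHeader("X-Forwarded-For"), request.getRemoteAddr())`. Apache's mod_proxy appends to any X-Forwarded-For value the client already sent, so the left-most entry can be spoofed; that is usually acceptable for analytics, but not for access control.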

AjitPS referenced this issue in AjitPS/KnetMiner Aug 30, 2019
rolling back log level to INFO. #350 is fixed
AjitPS added a commit that referenced this issue Oct 16, 2019

4 participants