Skip to content

Commit

Permalink
[Continued] Google Analytics import (#1753)
Browse files Browse the repository at this point in the history
* Add has_imported_stats boolean to Site

* Add Google Analytics import panel to general settings

* Get GA profiles to display in import settings panel

* Add import_from_google method as entrypoint to import data

* Add imported_visitors table

* Remove conflicting code from migration

* Import visitors data into clickhouse database

* Pass another dataset to main graph for rendering in red

This adds another entry to the JSON data returned via the main graph API
called `imported_plot`, which is similar to `plot` in form but will be
completed with previously imported data.  Currently it simply returns
the values from `plot` / 2. The data is rendered in the main graph in
red without fill, and without an indicator for the present. Rationale:
imported data will not continue to grow so there is no projection
forward, only backwards.

* Hook imported GA data to dashboard timeseries plot

* Add settings option to forget imported data

* Import sources from google analytics

* Merge imported sources when queried

* Merge imported source data native data when querying sources

* Start converting metrics to atoms so they can be subqueried

This changes "visitors" and in some places "sources" to atoms. This does
not change the behaviour of the functions - the tests all pass unchanged
following this commit. This is necessary as joining subqueries requires
that the keys in `select` statements be atoms and not strings.

* Convery GA (direct) source to empty string

* Import utm campaign and utm medium from GA

* format

* Import all data types from GA into new tables

* Handle large amounts of more data more safely

* Fix some mistakes in tables

* Make GA requests in chunks of 5 queries

* Only display imported timeseries when there is no filter

* Correctly show last 30 minutes timeseries when 'realtime'

* Add with_imported key to Query struct

* Account for injected :is_not filter on sources from dashboard

* Also add tentative imported_utm_sources table

This needs a bit more work on the google import side, as GA do not
report sources and utm sources as distinct things.

* Return imported data to dashboard for rest of Sources panel

This extends the merge_imported function definition for sources to
utm_sources, utm_mediums and utm_campaigns too. This appears to be
working on the DB side but something is incomplete on the client side.

* Clear imported stats from all tables when requested

* Merge entry pages and exit pages from imported data into unfiltered dashboard view

This requires converting the `"visits"` and `"visit_duration"` metrics
to atoms so that they can be used in ecto subqueries.

* Display imported devices, browsers and OSs on dashboard

* Display imported country data on dashboard

* Add more metrics to entries/exits for modals

* make sure data is returned via API with correct keys

* Import regions and cities from GA

* Capitalize device upon import to match native data

* Leave query limits/offsets until after possibly joining with imported data

* Also import timeOnPage and pageviews for pages from GA

* imported_countries -> imported_locations

* Get timeOnPage and pageviews for pages from GA

These are needed for the pages modal, and for calculating exit rates for
exit pages.

* Add indicator to dashboard when imported data is being used

* Don't show imported data as separately line on main graph

* "bounce_rate" -> :bounce_rate, so it works in subqueries

* Drop imported browser and OS versions

These are not needed.

* Toggle displaying imported data by clicking indicator

* Parse referrers with RefInspector

- Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual
  referrer host + path, whereas 'ga:source' includes utm_mediums and
  other values when relevant.
- 'ga:fullReferror' does however include search engine names directly,
  so they are manually checked for as RefInspector won't pick up on
  these.

* Keep imported data indicator on dashboard and strikethrough when hidden

* Add unlink google button to import panel

* Rename some GA browsers and OSes to plausible versions

* Get main top pages and exit pages panels working correctly with imported data

* mix format

* Fetch time_on_pages for imported data when needed

* entry pages need to fetch bounces from GA

* "sample_percent" -> :sample_percent as only atoms can be used in subqueries

* Calculate bounce_rate for joined native and imported data for top pages modal

* Flip some query bindings around to be less misleading

* Fixup entry page modal visit durations

* mix format

* Fetch bounces and visit_duration for sources from GA

* add more source metrics used for data in modals

* Make sources modals display correct values

* imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration

* Merge imported data into aggregate stats

* Reformat top graph side icons

* Ensure sample_percent is yielded from aggregate data

* filter event_props should be strings

* Hide imported data from frontend when using filter

* Fix existing tests

* fix tests

* Fix imported indicator appearing when filtering

* comma needed, lost when rebasing

* Import utm_terms and utm_content from GA

* Merge imported utm_term and utm_content

* Rename imported Countries data as Locations

* Set imported city schema field to int

* Remove utm_terms and utm_content when clearing imported

* Clean locations import from Google Analytics

- Country and region should be set to "" when GA provides "(not set)"
- City should be set to 0 for "unknown", as we cannot reliably import
  city data from GA.

* Display imported region and city in dashboard

* os -> operating_system in some parts of code

The inconsistency of using os in some places and operating_system in
others causes trouble with subqueries and joins for the native and
imported data, which would require additional logic to account for. The
simplest solution is the just use a consistent word for all uses. This
doesn't make any user-facing or database changes.

* to_atom -> to_existing_atom

* format

* "events" metric -> :events

* ignore imported data when "events" in metrics

* update "bounce_rate"

* atomise some more metrics from new city and region api

* atomise some more metrics for email handlers

* "conversion_rate" -> :conversion_rate during csv export

* Move imported data stats code to own module

* Move imported timeseries function to Stats.Imported

* Use Timex.parse to import dates from GA

* has_imported_stats -> imported_source

* "time_on_page" -> :time_on_page

* Convert imported GA data to UTC

* Clean up GA request code a bit

There was some weird logic here with two separate lists that really
ought to be together, so this merges those.

* Fail sooner if GA timezone can't be identified

* Link imported tables to site by id

* imported_utm_content -> imported_utm_contents

* Imported GA from all of time

* Reorganise GA data fetch logic

- Fetch data from the start of time (2005)
- Check whether no data was fetched, and if so, inform user and don't
  consider data to be imported.

* Clarify removal of "visits" data when it isn't in metrics

* Apply location filters from API

This makes it consistent with the sources etc which filter out 'Direct /
None' on the API side. These filters are used by both the native and
imported data handling code, which would otherwise both duplicate the
filters in their `where` clauses.

* Do not use changeset for setting site.imported_source

* Add all metrics to all dimensions

* Run GA import in the background

* Send email when GA import completes

* Add handler to insert imported data into tests and imported_browsers_factory

* Add remaining import data test factories

* Add imported location data to test

* Test main graph with imported data

* Add imported data to operating systems tests

* Add imported data to pages tests

* Add imported data to entry pages tests

* Add imported data to exit pages tests

* Add imported data to devices tests

* Add imported data to sources tests

* Add imported data to UTM tests

* Add new test module for the data import step

* Test import of sources GA data

* Test import of utm_mediums GA data

* Test import of utm_campaigns GA data

* Add tests for UTM terms

* Add tests for UTM contents

* Add test for importing pages and entry pages data from GA

* Add test for importing exit page data

* Fix module file name typo

* Add test for importing location data from GA

* Add test for importing devices data from GA

* Add test for importing browsers data from GA

* Add test for importing OS data from GA

* Paginate GA requests to download all data

* Bump clickhouse_ecto version

* Move RefInspector wrapper function into module

* Drop timezone transform on import

* Order imported by side_id then date

* More strings -> atoms

Also changes a conditional to be a bit nicer

* Remove parallelisation of data import

* Split sources and UTM sources from fetched GA data

GA has only a "source" dimension and no "UTM source" dimension. Instead
it returns these combined. The logic herein to tease these apart is:

1. "(direct)" -> it's a direct source
2. if the source is a domain -> it's a source
3. "google" -> it's from adwords; let's make this a UTM source "adwords"
4. else -> just a UTM source

* Keep prop names in queries as strings

* fix typo

* Fix import

* Insert data to clickhouse in batches

* Fix link when removing imported data

* Merge source tables

* Import hostname as well as pathname

* Record start and end time of imported data

* Track import progress

* Fix month interval with imported data

* Do not JOIN when imported date range has no overlap

* Fix time on page using exits

Co-authored-by: mcol <mcol@posteo.net>
  • Loading branch information
ukutaht and m-col committed Mar 10, 2022
1 parent b4992ce commit e27734e
Show file tree
Hide file tree
Showing 51 changed files with 3,648 additions and 475 deletions.
1 change: 1 addition & 0 deletions assets/js/dashboard/api.js
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ export function serializeQuery(query, extraQuery=[]) {
if (query.from) { queryObj.from = formatISO(query.from) }
if (query.to) { queryObj.to = formatISO(query.to) }
if (query.filters) { queryObj.filters = serializeFilters(query.filters) }
if (query.with_imported) { queryObj.with_imported = query.with_imported }
if (SHARED_LINK_AUTH) { queryObj.auth = SHARED_LINK_AUTH }
Object.assign(queryObj, ...extraQuery)

Expand Down
1 change: 1 addition & 0 deletions assets/js/dashboard/query.js
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ export function parseQuery(querystring, site) {
date: q.get('date') ? parseUTCDate(q.get('date')) : nowForSite(site),
from: q.get('from') ? parseUTCDate(q.get('from')) : undefined,
to: q.get('to') ? parseUTCDate(q.get('to')) : undefined,
with_imported: q.get('with_imported') ? q.get('with_imported') === 'true' : true,
filters: {
'goal': q.get('goal'),
'props': JSON.parse(q.get('props')),
Expand Down
49 changes: 38 additions & 11 deletions assets/js/dashboard/stats/visitor-graph.js
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
import React from 'react';
import { withRouter } from 'react-router-dom'
import { withRouter, Link } from 'react-router-dom'
import Chart from 'chart.js/auto';
import { navigateToQuery } from '../query'
import numberFormatter, {durationFormatter} from '../util/number-formatter'
import * as api from '../api'
import LazyLoader from '../components/lazy-loader'
import * as url from '../util/url'

function buildDataSet(plot, present_index, ctx, label) {
var gradient = ctx.createLinearGradient(0, 0, 0, 300);
Expand Down Expand Up @@ -316,17 +317,19 @@ class LineGraph extends React.Component {

if (this.state.exported) {
return (
<svg className="animate-spin h-4 w-4 text-indigo-500 absolute -top-8 right-8" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4"></circle>
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
<div className="flex-auto w-4 h-4">
<svg className="animate-spin h-4 w-4 text-indigo-500" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4"></circle>
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
</div>
)
} else {
const endpoint = `/${encodeURIComponent(this.props.site.domain)}/export${api.serializeQuery(this.props.query)}`

return (
<a href={endpoint} download onClick={this.downloadSpinner.bind(this)}>
<svg className="absolute w-4 h-5 text-gray-700 feather dark:text-gray-300 -top-8 right-8" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round"><path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"></path><polyline points="7 10 12 15 17 10"></polyline><line x1="12" y1="15" x2="12" y2="3"></line></svg>
<a className="flex-auto w-4 h-4" href={endpoint} download onClick={this.downloadSpinner.bind(this)}>
<svg className="absolute text-gray-700 feather dark:text-gray-300" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round"><path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"></path><polyline points="7 10 12 15 17 10"></polyline><line x1="12" y1="15" x2="12" y2="3"></line></svg>
</a>
)
}
Expand All @@ -338,15 +341,36 @@ class LineGraph extends React.Component {

if (samplePercent < 100) {
return (
<div tooltip={`Stats based on a ${samplePercent}% sample of all visitors`} className="absolute cursor-pointer -top-8 right-14 lg:-top-20 lg:right-8">
<svg className="w-4 h-4 text-gray-300 text-gray-700" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke="currentColor">
<div tooltip={`Stats based on a ${samplePercent}% sample of all visitors`} className="cursor-pointer flex-auto w-4 h-4">
<svg className="absolute w-4 h-4 text-gray-300 text-gray-700" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke="currentColor">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth="2" d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
</div>
)
}
}

importedNotice() {
const source = this.props.graphData.imported_source;

if (source) {
const withImported = this.props.graphData.with_imported;
const strike = withImported ? "" : " line-through"
const target = url.setQuery('with_imported', !withImported)
const tip = withImported ? "" : "do not ";

return (
<Link to={target} className="w-4 h-4">
<div tooltip={`Stats ${tip}include data imported from ${source}.`} className="cursor-pointer flex-auto w-4 h-4">
<svg className="absolute text-gray-300 text-gray-700" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke="currentColor">
<text x="4" y="18" fontSize="24" fill="currentColor" className={"text-gray-700 dark:text-gray-300" + strike}>{ source[0].toUpperCase() }</text>
</svg>
</div>
</Link>
)
}
}

render() {
const extraClass = this.props.graphData.interval === 'hour' ? '' : 'cursor-pointer'

Expand All @@ -356,8 +380,11 @@ class LineGraph extends React.Component {
{ this.renderTopStats() }
</div>
<div className="relative px-2">
{ this.downloadLink() }
{ this.samplingNotice() }
<div className="absolute right-2 -top-5 lg:-top-20 lg:w-6 w-16 lg:h-20 flex lg:flex-col flex-row">
{ this.downloadLink() }
{ this.samplingNotice() }
{ this.importedNotice() }
</div>
<canvas id="main-graph-canvas" className={'mt-4 ' + extraClass} width="1054" height="342"></canvas>
</div>
</div>
Expand Down
3 changes: 3 additions & 0 deletions config/.env.dev
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,7 @@ SHOW_CITIES=true
PADDLE_VENDOR_AUTH_CODE=895e20d4efaec0575bb857f44b183217b332d9592e76e69b8a
PADDLE_VENDOR_ID=3942

GOOGLE_CLIENT_ID=875387135161-l8tp53dpt7fdhdg9m1pc3vl42si95rh0.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=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

IP_GEOLOCATION_DB=/home/ukutaht/plausible/analytics/city_database.mmdb
5 changes: 3 additions & 2 deletions config/runtime.exs
Original file line number Diff line number Diff line change
Expand Up @@ -320,7 +320,8 @@ if config_env() == :prod && !disable_cron do
check_stats_emails: 1,
site_setup_emails: 1,
clean_email_verification_codes: 1,
clean_invitations: 1
clean_invitations: 1,
google_analytics_imports: 1
]

extra_queues = [
Expand All @@ -340,7 +341,7 @@ if config_env() == :prod && !disable_cron do
else
config :plausible, Oban,
repo: Plausible.Repo,
queues: false,
queues: [google_analytics_imports: 1],
plugins: false
end

Expand Down
18 changes: 18 additions & 0 deletions lib/plausible/clickhouse_repo.ex
Original file line number Diff line number Diff line change
Expand Up @@ -17,4 +17,22 @@ defmodule Plausible.ClickhouseRepo do
Ecto.Adapters.SQL.query!(__MODULE__, events_sql, [domain])
Ecto.Adapters.SQL.query!(__MODULE__, sessions_sql, [domain])
end

def clear_imported_stats_for(site_id) do
[
"imported_visitors",
"imported_sources",
"imported_pages",
"imported_entry_pages",
"imported_exit_pages",
"imported_locations",
"imported_devices",
"imported_browsers",
"imported_operating_systems"
]
|> Enum.map(fn table ->
sql = "ALTER TABLE #{table} DELETE WHERE site_id = ?"
Ecto.Adapters.SQL.query!(__MODULE__, sql, [site_id])
end)
end
end
Loading

0 comments on commit e27734e

Please sign in to comment.