Create auto updating sample_data queries #150
For …
CREATE OR REPLACE VIEW `httparchive.sample_data.pages_1k` AS (
  SELECT
    *
  FROM
    `httparchive.all.pages`
  WHERE
    date IS NOT NULL AND
    -- Latest non-NULL partition of all.pages, with its YYYYMMDD partition_id converted to a DATE
    date = (
      SELECT
        CAST(REGEXP_REPLACE(MAX(partition_id), r'(\d{4})(\d{2})(\d{2})', '\\1-\\2-\\3') AS DATE) AS date
      FROM
        `httparchive.all.INFORMATION_SCHEMA.PARTITIONS`
      WHERE
        table_name = 'pages' AND
        partition_id != '__NULL__') AND
    rank = 1000
);
For @rviscomi what do you think about adding …
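The subquery in the first pages_1k view picks the newest daily partition by taking MAX(partition_id), skipping the special '__NULL__' partition, and rewriting the YYYYMMDD ID as YYYY-MM-DD before casting it to DATE. The Python sketch below mirrors that string handling outside BigQuery so it is easy to check by hand. It is only an illustration: latest_partition_date is a hypothetical helper and the partition IDs are made up.

```python
import re
from datetime import date


def latest_partition_date(partition_ids):
    """Hypothetical helper mirroring the view's subquery:
    MAX(partition_id) over non-NULL partitions, rewritten as YYYY-MM-DD."""
    # Drop the special partition BigQuery uses for rows whose date is NULL.
    ids = [p for p in partition_ids if p != "__NULL__"]
    # Fixed-width YYYYMMDD strings sort chronologically, so the string max is the latest day.
    latest = max(ids)
    # Same pattern and replacement as the REGEXP_REPLACE call in the view.
    iso = re.sub(r"(\d{4})(\d{2})(\d{2})", r"\1-\2-\3", latest)
    return date.fromisoformat(iso)


if __name__ == "__main__":
    # Made-up partition IDs, for illustration only.
    sample_ids = ["20230601", "__NULL__", "20230701", "20230501"]
    print(latest_partition_date(sample_ids))  # 2023-07-01
```

Taking a string maximum works only because every real date partition ID has the same fixed width. If the table ever had other special partitions (BigQuery also reports `__UNPARTITIONED__` for rows still in the streaming buffer), they would sort above the date IDs and would need to be filtered out as well.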
Could this view query the corresponding …
Is it possible to set clustering on views? Or are you asking about clustering in the …
Yup! Done:
CREATE OR REPLACE VIEW `httparchive.sample_data.pages_1k` AS (
  SELECT
    *
  FROM
    `httparchive.latest.pages`
  WHERE
    rank = 1000
);
Seems to work.
You don't set clustering on views; you set it on the underlying table. Think of a view as a shorthand that gets swapped in just before the query runs.
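To illustrate the "shorthand that gets swapped in" point, the sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery. That substitution is an assumption for illustration only: SQLite has no clustering or partitioning, and the table and column names are simplified placeholders rather than the real HTTP Archive schema. Because a view stores only its query, rows loaded into the base table after the view is created are visible the next time the view is read, which is what lets a view like pages_1k stay up to date on its own.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified stand-in for a pages table; not the real HTTP Archive schema.
cur.execute("CREATE TABLE pages (date TEXT, client TEXT, page TEXT, rank INTEGER)")
cur.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        ("2023-06-01", "mobile", "https://a.example/", 1000),
        ("2023-06-01", "mobile", "https://b.example/", 10000),
    ],
)

# The view stores only its SELECT statement; no rows are copied.
cur.execute("CREATE VIEW pages_1k AS SELECT * FROM pages WHERE rank = 1000")


def count_view_rows():
    """Count rows visible through the view right now."""
    return cur.execute("SELECT COUNT(*) FROM pages_1k").fetchone()[0]


before = count_view_rows()

# Simulate a later crawl being loaded into the base table.
cur.execute(
    "INSERT INTO pages VALUES ('2023-07-01', 'mobile', 'https://c.example/', 1000)"
)

# The new row shows up without recreating the view, because the view's
# query is substituted in and re-run each time the view is read.
after = count_view_rows()

print(before, after)  # 1 2
```

Since the view holds no data of its own, any storage-level optimization such as clustering has to come from the table it reads.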
I'm asking about adding
CREATE OR REPLACE VIEW `httparchive.sample_data.requests_1k` AS (
  SELECT
    r.*
  FROM
    `httparchive.latest.requests` r
  -- Keep only requests belonging to a rank-1000 page from the same date and client
  JOIN
    `httparchive.latest.pages`
  USING (date, client, page)
  WHERE
    rank = 1000
);
It still costs 1 TB to query this sample table, even for a simple … If there were a rank on the …
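One way to think about the 1 TB cost: BigQuery can generally skip clustered storage blocks when a query filters directly on the clustering column, but in requests_1k the rank = 1000 filter sits on the pages side, so the requests side is read before the join discards rows. The Python sketch below is a deliberately simplified toy model, not BigQuery's storage engine. It assumes, hypothetically, that requests carried a rank column and were sorted (clustered) by it into blocks with min/max rank metadata, then counts how many blocks each kind of filter has to open. All data is invented.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Toy storage block: its rows plus min/max metadata for the rank column."""
    rows: list  # each row is a (page, rank) tuple
    min_rank: int
    max_rank: int


def make_blocks(rows, block_size):
    """Sort rows by rank (a toy version of clustering) and cut them into blocks."""
    rows = sorted(rows, key=lambda row: row[1])
    blocks = []
    for i in range(0, len(rows), block_size):
        chunk = rows[i:i + block_size]
        # After sorting, the first and last rows hold the block's min and max rank.
        blocks.append(Block(chunk, chunk[0][1], chunk[-1][1]))
    return blocks


def scan_with_rank_filter(blocks, rank):
    """Filter directly on the clustered column: skip blocks whose range excludes it."""
    read = matched = 0
    for block in blocks:
        if not block.min_rank <= rank <= block.max_rank:
            continue  # pruned from metadata alone; never opened
        read += 1
        matched += sum(1 for _, r in block.rows if r == rank)
    return read, matched


def scan_with_join_filter(blocks, wanted_pages):
    """Filter that only arrives as a set of pages from the other side of a join:
    the rank metadata can't rule any block out, so every block is opened."""
    read = matched = 0
    for block in blocks:
        read += 1
        matched += sum(1 for page, _ in block.rows if page in wanted_pages)
    return read, matched


# Invented data: 30 pages in three rank buckets, 5 requests per page.
ranks = [1000] * 2 + [10000] * 8 + [100000] * 20
pages = {f"page{i}": rank for i, rank in enumerate(ranks)}
requests = [(page, rank) for page, rank in pages.items() for _ in range(5)]
blocks = make_blocks(requests, block_size=10)

# The pages side of the join: pages in the rank = 1000 bucket.
wanted = {page for page, rank in pages.items() if rank == 1000}

print(scan_with_rank_filter(blocks, 1000))    # (1, 10)
print(scan_with_join_filter(blocks, wanted))  # (15, 10)
```

Both scans return the same 10 rows; only the number of blocks opened differs. In this analogy, blocks opened stand in for bytes scanned, which is why a filterable rank column on the requests table itself would matter more than the join.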
Oh, sorry, I misread and thought you were asking about … Adding …
Ah, interesting. Then I don’t think we can have …
These could be views on the latest `all` tables, similar to the `latest` tables in #141 but with a reduced dataset. The `rank=1000` websites might be a good fit here. Less random, but maybe that's a good thing?