Create auto updating sample_data queries #150
Comments
For
CREATE OR REPLACE VIEW `httparchive.sample_data.pages_1k` AS (
  SELECT
    *
  FROM
    `httparchive.all.pages`
  WHERE
    date IS NOT NULL AND
    date = (
      SELECT
        CAST(REGEXP_REPLACE(MAX(partition_id), r'(\d{4})(\d{2})(\d{2})', '\\1-\\2-\\3') AS DATE) AS date
      FROM
        `httparchive.all.INFORMATION_SCHEMA.PARTITIONS`
      WHERE
        table_name = 'pages' AND
        partition_id != '__NULL__') AND
    rank = 1000
);
For @rviscomi what do you think about adding |
Could this view query the corresponding
Is it possible to set clustering on views? Or are you asking about clustering in the |
Yup! Done:
CREATE OR REPLACE VIEW `httparchive.sample_data.pages_1k` AS (
  SELECT
    *
  FROM
    `httparchive.latest.pages`
  WHERE
    rank = 1000
);
Seems to work.
You don't set clustering on views - you set it on the underlying table. Think of a view as shorthand that gets swapped in just before the query runs.
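To illustrate the shorthand point, here is a rough sketch of how a query against the view relates to a query against what sits underneath it. It assumes the `pages_1k` definition above, and the `client = 'mobile'` filter is only an illustrative value; the exact plan BigQuery produces may differ.

-- Query written against the view:
SELECT
  page
FROM
  `httparchive.sample_data.pages_1k`
WHERE
  client = 'mobile';  -- illustrative filter value (assumption)

-- Roughly what gets run once the view definition is swapped in:
SELECT
  page
FROM (
  SELECT
    *
  FROM
    `httparchive.latest.pages`
  WHERE
    rank = 1000
)
WHERE
  client = 'mobile';

Because the view stores no data of its own, any pruning on the second query can only come from how the underlying tables are partitioned and clustered.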
I'm asking about adding
CREATE OR REPLACE VIEW `httparchive.sample_data.requests_1k` AS (
  SELECT
    r.*
  FROM
    `httparchive.latest.requests` r
  JOIN
    `httparchive.latest.pages`
  USING (date, client, page)
  WHERE
    rank = 1000
);
It still costs 1 TB to query this sample table - even for a simple If there was a rank on the |
Oh sorry I misread and thought you were asking about Adding |
Ah interesting. Then I don’t think we can have |
These could be views on the latest all tables - similar to latest tables in #141 but with a reduced dataset.
The rank=1000 websites might be a good fit here. Less random, but maybe that's a good thing?