BigQuery table performance loss with TABLE_QUERY
I'm seeing a surprising performance hit when querying multiple tables vs. one big one. The scenario:
We have a simple web analytics tool based on BigQuery. We track basic events for individual sites. For a month, we pumped all the data into one big table. Now we are breaking the data into partitions by site and month.
So the big table is [events.all]
Now we have, say, [events.events_2014_06_siteid]
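To make the naming scheme concrete, here is a minimal sketch of how a per-site, per-month table reference could be built on the client side. The helper name `partition_table_name` and its parameters are hypothetical, used only for illustration; the only thing taken from our setup is the `events_YYYY_MM_siteid` pattern.

```python
def partition_table_name(year: int, month: int, site_id: str,
                         dataset: str = "events") -> str:
    """Build a legacy-SQL table reference following events_YYYY_MM_siteid.

    Hypothetical helper: the name and signature are assumptions,
    only the naming pattern comes from the post.
    """
    return f"[{dataset}.events_{year:04d}_{month:02d}_{site_id}]"


# Prints: [events.events_2014_06_siteid]
print(partition_table_name(2014, 6, "siteid"))
```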
Querying individual tables in this group is faster and, of course, processes less data. However, querying our entire dataset is much, much slower on simple queries. And our new dataset is only 1 day old, whereas the big table is 30 days old, so it's slower despite querying far less data.
For example:
SELECT COUNT(et) FROM [events.all] WHERE et='re'
--> completed in 3.2s, processing 79 MB of data. This table has 21,048,979 rows.
SELECT COUNT(et) FROM (TABLE_QUERY(events, 'table_id CONTAINS "events_2014_"')) WHERE et='re'
--> completed in 44.2s, processing 1.8 MB of data. Put together, these tables have 492,264 rows.
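To put the gap in numbers, the quick calculation below uses only the timings and row counts from the two runs above. The variable names are arbitrary; nothing here touches BigQuery itself.

```python
# Figures copied from the two query runs quoted above.
big_table = {"seconds": 3.2, "rows": 21_048_979}     # [events.all]
split_tables = {"seconds": 44.2, "rows": 492_264}    # TABLE_QUERY over events_2014_*

# How many times longer the TABLE_QUERY run took.
slowdown = split_tables["seconds"] / big_table["seconds"]

# How many times more rows the single big table holds.
row_ratio = big_table["rows"] / split_tables["rows"]

print(f"TABLE_QUERY run took {slowdown:.1f}x as long")
print(f"big table holds about {row_ratio:.0f}x as many rows")
```

So the multi-table query took roughly 14 times as long while covering only a small fraction of the rows.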
Why does this occur, and is there a way to resolve this big disparity?