Hector Cirbián

Hector Cirbián

SW Architect
Related topics: BigQuery Google Sheets

Connect Google Sheets with BigQuery for Fun and Profit

Thursday August 19th, 2021
2 minutos

(or: Saving $$$ doing Invoice Calculation and Generation using Google BigQuery and Google Sheets).

Since VisiCalc and Lotus 1-2-3 popularized spreadsheet programs, they have become an ubiquitous tool. Spreadsheet’s tabular format and dynamic calculations make them a perfect fit for accounting and data analysis. But like multifunctional “swiss army knives”, spreadsheets have some rough edges. 

Many of these rough edges, such as the lack of concurrent editing, no access authorization, or data protection, are solved with SaaS spreadsheets, such as Google Sheets. However these cloud-based spreadsheets also have their shortcomings when handling huge amounts of data and, even when the analyzed data fits in their limits, the fact that they are not in sync with the data source leads to the problem of “multiple versions of the truth”.

The solution would be being able to connect cloud spreadsheets with Big Data sources. A ‘holy grail’ that would allow overcoming these shortcomings and provide a secure and reliable way of analyzing data, and even generating customer quality level documents. And that is exactly what Google Connected Sheets does. This feature allows connecting Google Sheets with BigQuery datasets.

 

Google Connected Sheets Tips & Tricks

What is needed to use this feature? Access to Google Cloud Platform and Enterprise level accounts. Users also need access permissions to BigQuery datasets and tables to be able to fully use this feature. 

Without access permissions to BigQuery, it’s not possible to connect new spreadsheets to them, and if already connected spreadsheets are shared with users without these access permissions to BigQuery sources, they will be able to use Sheets features, but won’t be able to refresh BigQuery data in the sheet or schedule these refreshes, as explained by this Google help article.

There is no point in describing in this post how this connection is done, as Google documentation does that already. But there are several tips worth sharing here.

First of all, is the use of parameterized queries. The most straightforward way of using Connected Sheets is creating a direct connection to a single Big Query table, and so, to all its rows. 

But, perhaps we need some level of aggregation for the data we fetch? Or perhaps the data must come from several different tables? Or, even if all aggregation and analysis calculations are going to be done in the spreadsheet, maybe we don’t need all rows, just a specific range? Or perhaps we need all of these? 

In all of the cases aforementioned, the Write Custom Query option is what needs to be used. The Write Custom Query option allows you to specify which data must be fetched from BigQuery using an SQL query. 

As well as the Write Custom Query option, for further flexibility, it’s possible to parametrize your query with values from selected cells in the spreadsheet. 

In the image below we can see an example of a parameter configuration:

You can pick a cell in any of the existing sheets to create a named parameter. The named parameter can then be referenced in the query using the “@PARAMETER_NAME” notation. In the example we are showing above, we have named the parameters @START_DATE and @END_DATE and they have been related to cells C2 and D2 respectively.

(Note: when creating the query parameter, the selected cell must already contain a value). Once the value of any of these parameters is changed, the cells that depend on the affected connection query can be updated by just clicking their popups refresh text. Or go to the Data -> Data connectors -> Refresh options to update a specific or just all connections in the spreadsheet.

A very important factor related to query parameters is to know how tables are partitioned and clustered in BigQuery. To make a query more efficient, it is important to use, if possible, a where clause, using columns that the searched table uses for partitioning or clustering. This allows the BigQuery query engine to restrict the data it needs to scan in order to process the query and thus improving performance. 

The use of partitioning or clustering not only improves BigQuery usage performance but IMPORTANTLY reduces costs; as Google charges for BigQuery usage based on the amount of scanned data, a wise use of partitions and clusters can reduce costs, a lot. 

 

The lesson here is that when linking Sheets with BigQuery, try to use partitioning and clustering as much as possible. If the data tables are given and it’s not possible to change its schema, try to use their partition and cluster columns as parameters for the spreadsheet connecting queries. If you can design the tables schemas, set their partitioning and clustering based on the parameters the spreadsheet requires. That way, BigQuery’s performance, and cost will be optimal.

Another interesting option related to Connected Sheets is Scheduled refresh. It allows you to set the refresh timing and frequency of connected sheets and related objects. If we add to the mix the Publish to the web option, then Connected Sheets become a way to create quite capable web applications.

 

Invoicing; an Interesting Use Case

Google documentation lists Business planning, Customer service, Sales reports as possible use cases for Connected Sheets. We found another possible use: Invoicing

This is especially salient when the source for such invoices is millions of data rows, for example, for large accounts in the telecom sector. These kinds of customers usually have invoicing requirements that are very different from normal subscribers and require a high grade of flexibility. 

Manually invoicing shouldn’t be an option, as it is prone to errors and mistakes. Connected Sheets are a very handy tool to use in this case. They are highly configurable and flexible, which allows them to conform to the customer billing requirements and once the spreadsheet is set up, invoice generation becomes an automatic or semi-automated process where human errors are taken out of the scene.

Also, the fact that the invoicing data comes from BigQuery and is available to other tools, like Data Studio, makes it possible to give the customer additional information related to their invoices, such as reports and dashboards. All this with the confidence that, as all of them use the same Single Source of Truth, they are totally consistent with each other.