All customers striving to be a data-driven business must utilize the most effective and efficient means to compute their business metrics and KPIs. The first challenge they will have to solve is to select the technology and techniques of how to compute those metrics and KPIs.
At Making Science it is the data engineers who propose the best approach for building the data pipeline and where, when, and how to calculate the metrics and KPIs. The data engineers take into consideration the stage of maturity regarding data and metrics stability and understanding to select the best approach to compute the metrics and KPIs. There are tradeoffs between implementation approaches for calculating the metrics and KPIs.
In businesses in their “early stages” of data analytics and data source understanding, many times the data analyst doesn’t have a clear definition of the metrics or KPIs that need to be computed. This means that the metric computations will experience numerous changes/revisions or even a complete re-definition. Businesses that are more mature in their businesses are not typically faced with these definition challenges and are usually more focused on metric/KPI latency challenges or metric/KPI transparency challenges as they look to expand new services and expand their businesses into new regions, etc.
The computation of metrics can be performed using programming languages embedded in the ETL data pipelines. The advantages of this approach include; is its efficiency in the use of computing resources and the speed of metrics compute execution. The disadvantages of the approach are; more effort to make a change to a metric computation, no transparency to offer data analysts visibility regarding a metric calculation.
An alternative approach for metrics computation is to use SQL and more specifically SQL Views. A View is just a predefined query to compute our metric and the query is given a name that can be treated in an “identical” manner as referencing a SQL table. The newly defined table (SQL View) may be used as any other table. The view will automatically execute the moment the view is queried. So what do SQL Views have to Offer?
Reduced time for metrics software development
Views are a pipeline mechanism that can be more easily built than writing Java, Python, or Scala. All that is needed is to define the SQL query on the data and name it. There is no need to deal with compiling, integrating, testing, uploading tasks for a pipeline, as it would be if it used a different programming language to build the pipeline.
Having the query that computes the metrics (count of users with repetitions), defining a View is quite simple and quick. Utilizing Google’s BigQuery even reduces the development time further by using the BigQuery Console to interactively test the SQL sentences for your SQL view. Once the query is executed and checks out ok save it as a View.
Reduced time to update a SQL View to compute metrics
Again with Google’s BigQuery console, a Data Analyst has a simple means to retrieve the Saved SQL View queries to copy and edit them as they desire. Making a copy 1st before making any changes is a good practice as the additional step offers a means to check your revised query results against your baseline results to ensure you have achieved the desired impact on your metric calculation.
Complex Analytical Queries
In BigQuery a SQL View supports complex analytical queries to compute metrics, such as scans on denormalized data models that contain Arrays and Maps. A SQL View in BigQuery can even include the application of machine learning models to enrich the data results in an SQL View.
Another very interesting functionality available is linking SQL Views together, building a data pipeline to generate complex metrics or KPIs that depend on intermediate metrics.
Metric consistency and transparency via a centralized metric computation logic
With the use of Views, the metric computation logic into one View can be centralized, meaning that it will serve that metric to many downstream consumers, such as dashboards, alert systems, operational analytics systems, and external data processing pipelines. This approach is a piece in solving Data Governance. If the metric logic needs to be changed, it is located in one place, it is not needed to share the logic with all consumer clients that require the metric.
The guaranteed freshness of metrics
Because a SQL View represents a query, it is computed every time the View is queried; this ensures metric computations are always up to date with the most recent underlying data included. This is extremely important as many data pipelines can experience delays in data collection processes for any number of reasons.
There is a significant disadvantage to using SQL Views for Metrics
Because SQL Views are evaluated and computations performed every time they are queried, all of the underlying data source details are read each time a metric is consulted. While this is good for the freshness of the metric data), it is not an efficient use of computing resources for metrics that will not change once their time has passed. Special care needs to be taken when defining the SQL Views and the underlying source tables to minimize the amount of data read each time the SQL View is executed. The BigQuery cache system will not always eliminate this performance and efficiency disadvantage in particular when data in an SQL View has changed.
Google compute platform offers some services and capabilities to address the downsides of using SQL Views in BigQuery for metrics calculations by eliminating the unnecessary scanning of data in BigQuery.
As explained, SQL Views can be more easily created and updated. The SQL Views allow computing simple and complex metrics and are great building blocks for data pipelines. Consequently, they are a guaranteed means to ensure transparency, consistency, and freshness of metrics.
At Making Science we have years of experience applying the architectural approach in many End to End Analytics projects helping companies move to a much more agile approach in managing data-driven business metrics. In our next blog we will review some of these advanced techniques for optimizing your SQL Views Centric Metrics Computation architecture; “Faster, Better and Cheaper Metrics using SQL Views and Advanced data warehousing services in BigQuery”.