Google Looker Studio: How to extract data from a JSON string with REGEXP_EXTRACT

In some cases it happens that you need to collect several data points into one single field. In this article I’ll share with you how to extract such data from a JSON string with Google Looker Studio. Key benefit is that you have maximum flexibility to access all data points contained in such a string. To achieve this, In the following how to, I’ll show how to extract data with the Google Looker Studio regular expression function called „REGEXP_EXTRACT“.

1. Problem & challenge

For one of my clients, we are collecting a huge amount of data within a single event in Google Analytics. It’s a results data set of a configurator tool. For several reasons, we decided to send a complete JSON string with all configurator results into one event data point.

Example of a JSON string from which single values will be extracted for use in Google Data Studio

Example of a JSON string from which single values will be extracted for use in Google Looker Studio

As a next step, we had the challenge to unravel that data from that JSON string again. From this, we came up with the solution introduced below.

Here’s how we did it before developing this new approach. We worked with the function „CONTAINS_TEXT“ and the following looker studio resource custom field construct:

Google Looker Studio - Data source field with CONTAINS_TEXT function

Google Looker Studio – Data source field with CONTAINS_TEXT function

But there are some constraints coming with this solution:

  • Only a few parameter values can be added the manual way. In case of above shown example of cities‘ names, one can imagine there are way more cities that users can come from.
  • Use the data as dimension, not as metric. Thus, this is not so flexible.
  • Couldn’t extract values with many different values. (Show example of 0.12345 value)

2. Solution & outcome

A great way to accomplish this is Google Looker Studios Regular Expressions function: REGEXP_EXTRACT.

You can find Google’s documentation of the function here.

The following outcomes come with that approach:

  • More flexibility
  • collect *all* values contained in the JSON string
  • Do calculations with numbers coming from a JSON string

Two more annotations before we start:

    • This also works with any other format in which data is packed into a single field. For example comma separated values (CSV).
    • This article is not about how to connect a JSON data source to Google Looker Studio. To this topic others have already delivered solutions like Garrett Vorce: „Google Looker Studio Tutorial with JSON“ on Medium.

So, let’s now have a look into the details, how to achieve this:

3. How to extract data from a JSON string with the Regular Expression function „REGEXP_EXTRACT“

3.1 Pre-Requesites: Setup a Google Looker Studio dashboard and connect your data source

You can read at google.com how to accomplish that: Click here.

3.2 Add a new data source field

  1. In Google Looker Studio, click in the menu on „Resource“
  2. Then click on „Manage added data sources“
  3. Find your data source
  4. Click on „Edit“
  5. Click on „(+) Add a field“

3.3 Use regexp_extract to extract the data out of the JSON string

3.3.1 Extract a text string from JSON – with a Regular Expression

    1. Now, in the „Add a field“ dialogue, first give your new field a name.
      In our example, we want to extract the information „city“ from the JSON string. So we can call the field i.e. „City (JSON)“.
    2. Then it’s time to enter the formula: REGEXP_EXTRACT(Ereignisaktion,'"city":"((\"|[^"!,])*)"'

      Google Looker Studio - Data source field with REGEXP_EXTRACT function

      Google Looker Studio – Data source field with REGEXP_EXTRACT fun

  • So now, what does this formula do in detail? We’ll have a look:
    1. With  REGEXP_EXTRACT the function gets called
    2. Ereignisaktion is the (german) name of the Analytics event tracking data field, in which the JSON string is stored. (In english it would be Event action.)
    3. The key part now is the regular expression (also called „regex“ or „regexp“): '"city":"((\"|[^"!,])*)"'. It does the following:
      • The two '  signs mark beginning and end of the „regex“.
      • "city":" are the signs that are directly before the searched string (here the cities‘ names) that we want to extract.
      • With the part ((\"|[^"!,])*)" the searched string (the city name) gets collected into our newly added field.

There are different kinds of data that could be extracted:

3.3.2 Extract numbers from JSON

You want to extract numbers from JSON? This is also possible.

  1. You can do this with following regular expression (regex) formula. Example here is a Longitude definition named lon:
    REGEXP_EXTRACT(Ereignisaktion,'"lon":(([^"!,])*)')
    With this regex formula we do not search for a string in brackets, but for a string without brackets. This is like numbers are defined in JSON.
  2. To be able to calculate with this new metric also, you will need to change the data fields type to „number“:

    Change data source field to type

    Change data source field to type „number“

3.4 Add a new chart to your Looker Studio Dashboard with the new data source field

Now you can add the new data to your Google Looker Studio dashboard:

Data visualization with JSON extracted data via REGEXP_EXTRACT function

Data visualization with JSON extracted data via REGEXP_EXTRACT function

Subscribe to my news & blog updates

Your data will not be disclosed to third parties. More about this in the privacy policy. Of course, you can unsubscribe at any time. 

4. Some final words

In case you have any questions or annotations, please leave a comment.

About me: My name is Thomas Gerstmann. I support organizations to understand their users & data, so they can reach their goals online better. My focus is more on data-driven strategy and analytics & research as well as Experimentation. But I also work often – like documented here in this article – on technical questions. My goal is to help reaching #ZeroHunger.

Recent posts in “Datadriven Marketing”