While Fivetran’s Salesforce connector offers a simple data ingestion solution, formula fields can be tricky. Here are three ways to handle them.
Salesforce is widely used in customer relationship management for marketing automation, analytics, and sales. This cloud-based software has a variety of built-in entities, but a large part of Salesforce’s popularity is due to its flexibility. For example, Salesforce allows the definition of custom fields, or fields where some logic is executed to calculate the value. These latter fields are usually referred to as formula fields or calculated fields. But, despite being a powerful tool, formula fields can cause quite a lot of trouble during data extraction.
There are multiple integration services available to ingest Salesforce data into a data warehouse. We chose Fivetran as our ELT platform because of its connector’s reliability and simplicity. By default, however, the Fivetran Salesforce connector does not handle formula fields.
To understand the problem here, first we need to take a look at how the Fivetran Salesforce connector works under the hood. Then we will check out the recommended solution and finally see what we actually implemented.
How does the Fivetran Salesforce connector work?
During the connector’s setup, Fivetran needs the credentials of an active Salesforce account to make API calls. The platform tries to minimize the number of calls: it switches between the Bulk and REST APIs as it sees fit, and it also handles API quotas.
When Fivetran queries for updates, it finds changes based on the Salesforce object’s SystemModstamp field. But the value of a formula field is not stored on the record; it is calculated from a statement expressed in Salesforce’s own query language, SOQL. It can even refer to another object’s fields, similar to lookup fields. When the formula itself or a referenced object changes, the record’s SystemModstamp is not updated, so Fivetran never learns that anything happened. Even though the formula field’s value is different, it will not get imported.
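To make the blind spot concrete, here is a minimal Python sketch of cursor-based change detection. It is an illustration under assumptions, not Fivetran’s actual implementation: the helper names build_incremental_soql and changed_since, the Opportunity fields, and the sample record are all hypothetical.

```python
from datetime import datetime, timezone


def build_incremental_soql(sobject, fields, last_sync):
    """Build a SOQL query selecting rows modified after the last sync.

    SOQL datetime literals are written unquoted, in UTC.
    """
    cursor = last_sync.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"SELECT {', '.join(fields)} FROM {sobject} "
        f"WHERE SystemModstamp > {cursor} ORDER BY SystemModstamp"
    )


def changed_since(records, last_sync):
    """In-memory equivalent of the WHERE clause above."""
    return [r for r in records if r["SystemModstamp"] > last_sync]


last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)
query = build_incremental_soql(
    "Opportunity", ["Id", "Amount", "SystemModstamp"], last_sync
)

# Hypothetical record: its formula value changed after the last sync because
# a related record was edited, but its own SystemModstamp was not touched.
opportunity = {
    "Id": "006A",
    "SystemModstamp": datetime(2023, 12, 15, tzinfo=timezone.utc),
    "Account_Region__c": "EMEA",  # formula value, now different
}
missed = changed_since([opportunity], last_sync)  # empty: the change is invisible
```

Because the filter only looks at the record’s own timestamp, the changed formula value never enters the result set.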
To avoid data integrity issues, Fivetran disables formula field syncing by default, so they do not even appear on its user interface (UI). But this does not mean there is no way to load them into the destination.
1. Enable Salesforce formula field syncing in Fivetran
A very simple solution is to reach out to Fivetran support and ask them to enable formula field syncing. They usually handle this within a couple of hours, after which the formula fields will be available on the Fivetran UI; they’ll load when the next job runs. But later, if the referenced data or the formulas change, the imported values become inaccurate.
This still can be a valid option in a few cases, where data freshness is not an important factor or when the data changes very slowly. The data integrity problem can be fixed by a full Fivetran resync, which will load the current value of the formula fields. But this is certainly not the recommended or the most elegant way to handle Salesforce formula fields in Fivetran.
2. Fivetran’s solution for Salesforce formula fields
While it runs, the Fivetran connector creates a few metadata (system) tables in the destination database. One of them is fivetran_formula, which contains information about the Salesforce formula fields: the object and field name, the SOQL statement itself, and its translation into the SQL dialect of the destination database, in this case BigQuery Standard SQL. To detect formula changes, Fivetran also creates a sibling table, fivetran_formula_history.
The translated formulas can be used with the Fivetran Salesforce Formula Utils dbt Core package to create views in the destination database. This is the solution recommended by Fivetran, but there are a few drawbacks.
Using the data build tool (dbt) is far more complex than setting up the connector on the Fivetran UI. The macros in the package require some configuration to accurately map formula fields to destination tables.
Another problem is that Fivetran’s SQL translation capabilities are limited. For example, SOQL has a few functions that it cannot translate. It’s also not possible to include global variables in views. At the time of writing, there is quite a lot that Fivetran cannot handle, and since our business case relies heavily on formulas that fall into these unsupported categories, we had to supplement Fivetran’s connector with a custom solution.
3. Handle Salesforce formula fields without Fivetran
Before we started any kind of implementation, we considered two very different options. One was to translate the SOQL queries themselves manually to Standard SQL, and the other was to create a data pipeline that imports the changes. Both alternatives have their pros and cons.
Most of our formulas were simple, so translating them would have been quick and easy, but this approach still would not have handled changing Salesforce global variables. On top of that, whenever the Salesforce administrators changed a formula, we would have had to mirror that change by manually updating our translation.
We decided to implement an Airflow pipeline to import the Salesforce formula field values every day, since this solution is more robust and requires less maintenance. We already had Airflow instances and 24 hours of data freshness was acceptable. At first we ran into timeouts from the Salesforce REST API, but the Bulk API was able to handle this volume of data without a hiccup.
The pipeline first queried the Salesforce REST API for every entity available to the Fivetran user. Then it selected all the formula fields and used the Bulk API to check whether their values had changed in the last 24 hours. The intermediate results landed in Google Cloud Storage in newline-delimited JSON format. In the next step they were imported into our BigQuery salesforce_formula table.
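The shaping step of such a pipeline can be sketched in plain Python. This is a simplified illustration, not our production code: the function names, the output keys (entity_name, entity_id, field_name, extracted_at, value), and the choice to store every value as a string are assumptions. The only Salesforce detail relied on is that a describe result lists each field with a boolean calculated flag, which is true for formula fields.

```python
import json
from datetime import datetime, timezone


def formula_field_names(describe_result):
    """Return the names of formula fields from an sObject describe payload."""
    return [f["name"] for f in describe_result["fields"] if f.get("calculated")]


def to_ndjson_rows(entity, records, formula_fields, extracted_at):
    """Flatten records into one newline-delimited JSON row per formula value.

    Values are cast to strings (an assumption) so that a single column can
    hold formula results of any type.
    """
    lines = []
    for record in records:
        for field in formula_fields:
            raw = record.get(field)
            row = {
                "entity_name": entity,
                "entity_id": record["Id"],
                "field_name": field,
                "extracted_at": extracted_at.isoformat(),
                "value": None if raw is None else str(raw),
            }
            lines.append(json.dumps(row))
    return "\n".join(lines)
```

Each output line is a standalone JSON object, which is the newline-delimited layout that BigQuery load jobs accept from Cloud Storage.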
The final result contained the Salesforce entity name, the entity id, the formula field name, a timestamp, and the most recent value. We used this table to generate views, where we joined the tables imported by Fivetran to the salesforce_formula on the entity id and we selected all the fields from Fivetran and the most recent values from the formula table.
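A view of this kind could be generated with a small helper like the one below. It is a sketch under assumptions rather than our exact implementation: the column names of salesforce_formula (entity_name, entity_id, field_name, extracted_at, value), the id column on the Fivetran table, the _with_formulas view suffix, and the function name build_view_sql are all hypothetical. It also assumes the formula fields are not already columns of the Fivetran table, otherwise the view would contain duplicate column names.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_view_sql(dataset, entity, source_table, formula_fields):
    """Build BigQuery SQL joining a Fivetran table to its latest formula values."""
    if not formula_fields:
        raise ValueError("at least one formula field is required")
    # Names are interpolated into SQL, so accept plain identifiers only.
    for name in [dataset, entity, source_table, *formula_fields]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # One output column per formula field (long-to-wide pivot).
    pivots = ",\n".join(
        f"    MAX(IF(field_name = '{f}', value, NULL)) AS {f.lower()}"
        for f in formula_fields
    )
    return f"""CREATE OR REPLACE VIEW `{dataset}.{source_table}_with_formulas` AS
WITH latest AS (
  -- keep only the most recent value per record and formula field
  SELECT entity_id, field_name, value
  FROM `{dataset}.salesforce_formula`
  WHERE entity_name = '{entity}'
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY entity_id, field_name ORDER BY extracted_at DESC
  ) = 1
),
pivoted AS (
  SELECT
    entity_id,
{pivots}
  FROM latest
  GROUP BY entity_id
)
SELECT t.*, p.* EXCEPT (entity_id)
FROM `{dataset}.{source_table}` AS t
LEFT JOIN pivoted AS p ON t.id = p.entity_id"""


sql = build_view_sql("analytics", "Account", "account", ["Score__c", "Region__c"])
```

The LEFT JOIN keeps every row imported by Fivetran even when no formula value has been loaded for it yet.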
Final thoughts on Salesforce formula fields imports
We explored three possible ways to extract and load Salesforce formula fields.
The first option was to contact Fivetran support and enable formula field syncing, then reimport all the data whenever the data integrity issues became unbearable. This solution is trivial to implement, but if the data must be fresh and the data volume is large, there is a good chance that the Fivetran jobs will time out, use up quotas, or not finish on time.
Instead, Fivetran recommends using its SOQL translations in the fivetran_formula system table to generate views in your target database with the Salesforce Formula Utils dbt Core package. This solution is maintainable and efficient, but Fivetran is sadly not able to translate every formula.
If you absolutely have to import formulas that Fivetran cannot translate, then it is also possible to support Fivetran’s solution with one of your own, where you create an Airflow pipeline that extracts the changed formula values from Salesforce and loads them into your choice of sink.