Version: v2

Google Cloud Storage

Sync Expectations

The following table describes the sync behavior of Fullstory events to Google Cloud Storage in parquet files.

| Sync Interval | Relevant Timestamps | Setup Guide |
| --- | --- | --- |
| Approximately processed_time rounded to the next hour + 1 hour | event_time, ingested_time, processed_time, updated_time | Help Doc |
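As a rough illustration of the interval above, a file packaged at processed_time becomes available at approximately that time rounded up to the next full hour, plus one hour. A minimal Python sketch (the helper name is our own, and the rounding of an exact-hour timestamp is an assumption):

```python
from datetime import datetime, timedelta, timezone

def expected_sync_time(processed_time: datetime) -> datetime:
    """Approximate availability: processed_time rounded up to the
    next full hour, plus one hour (per the sync interval above)."""
    rounded = processed_time.replace(minute=0, second=0, microsecond=0)
    if rounded < processed_time:
        rounded += timedelta(hours=1)
    return rounded + timedelta(hours=1)

t = datetime(2024, 1, 1, 10, 20, tzinfo=timezone.utc)
print(expected_sync_time(t))  # 2024-01-01 12:00:00+00:00
```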

Events Parquet File Schema

The following table describes the schema for the parquet files containing Fullstory events.

| Field | Type | Nullable | Description |
| --- | --- | --- | --- |
| event_id | string | N | The unique identifier for the event. |
| event_time | string | Y | The time in UTC that the event occurred, formatted as yyyy-MM-ddTHH:mm:ss.SSSSSSZ. |
| processed_time | string | N | The time in UTC that the event was packaged into the parquet file, formatted as yyyy-MM-ddTHH:mm:ss.SSSSSSZ. |
| updated_time | long | Y | If set, the time the parquet file was recreated, as milliseconds since the Unix epoch. |
| device_id | long | N | The device ID as defined in the base event model. |
| session_id | long | Y | The session ID as defined in the base event model. |
| view_id | long | Y | The view ID as defined in the base event model. |
| event_type | string | N | The type of this event. |
| event_properties | string | N | A JSON string containing the associated event properties. |
| source_type | string | N | The source type of the event. |
| source_properties | string | N | A JSON string containing the associated source properties. |
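Because event_properties and source_properties are stored as JSON strings, they must be decoded after reading a file. A minimal sketch using pandas and Python's json module (the sample row and property values are invented for illustration):

```python
import json
import pandas as pd

# A stand-in for a row read from an events parquet file.
df = pd.DataFrame({
    "event_id": ["evt_1"],
    "event_type": ["click"],
    "event_properties": ['{"fs_rage_count": 2}'],
    "source_properties": ['{"url": {"path": "/checkout"}}'],
})

# Decode the JSON string columns into Python dicts.
df["event_properties"] = df["event_properties"].map(json.loads)
df["source_properties"] = df["source_properties"].map(json.loads)

# Filter to rage clicks, mirroring the SQL example later in this doc.
rage = df[df["event_properties"].map(lambda p: p.get("fs_rage_count", 0) > 0)]
print(rage["source_properties"].iloc[0]["url"]["path"])  # /checkout
```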

Events Parquet File Path

The path follows the Hive partitioning layout (key=value, supported by BigQuery and similar tools) and contains an ingested_time column: fullstory_<org_id>/events/ingested_time=yyyy-MM-dd HH:mm:ss/<file_name>.parquet.

The ingested_time column is the time in UTC that the events were ingested by Fullstory's servers, truncated to seconds. Note that the ingested time does not bound the range of event times in a parquet file; use a wider ingested-time range in a query to make sure the target event-time range is fully covered.
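For illustration, the partitioned object path for a given ingest time can be constructed as follows (the org ID and file name are placeholders, and the helper name is our own):

```python
from datetime import datetime, timezone

def events_object_path(org_id: str, ingested: datetime, file_name: str) -> str:
    """Build the Hive-partitioned path
    fullstory_<org_id>/events/ingested_time=yyyy-MM-dd HH:mm:ss/<file_name>.parquet
    with the partition value in UTC, truncated to seconds."""
    partition = ingested.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"fullstory_{org_id}/events/ingested_time={partition}/{file_name}.parquet"

path = events_object_path("O-123", datetime(2024, 1, 1, 8, 30, 15, tzinfo=timezone.utc), "part-0")
print(path)  # fullstory_O-123/events/ingested_time=2024-01-01 08:30:15/part-0.parquet
```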

Querying as an External Table

The parquet files containing events can be queried as an external table. For example, a BigQuery external table can be created by setting the file path pattern to fullstory_<org_id>/events/*. The ingested_time column should be auto-detected; use it in queries to limit the number of parquet files scanned.

When querying the JSON fields in BigQuery, favor parsing once with PARSE_JSON in a subquery rather than calling JSON_VALUE repeatedly on the raw strings, since each such call re-parses the JSON before accessing the path.

To demonstrate querying the events as an external table, the following example counts rage clicks broken down by browser and URL path for a single day.

SELECT
  JSON_VALUE(source_properties, '$.user_agent.browser') AS browser,
  JSON_VALUE(source_properties, '$.url.path') AS path,
  COUNT(1) AS rage_clicks
FROM (
  SELECT
    event_id,
    event_time,
    ingested_time,
    processed_time,
    updated_time,
    device_id,
    session_id,
    view_id,
    event_type,
    PARSE_JSON(event_properties, wide_number_mode=>'round') AS event_properties,
    source_type,
    PARSE_JSON(source_properties, wide_number_mode=>'round') AS source_properties
  FROM
    `project.dataset.external_table` ) parsed
WHERE
  ingested_time BETWEEN '2024-01-01 00:00:00' AND '2024-01-03 00:00:00'
  AND event_time >= '2024-01-01T00:00:00.000000Z'
  AND event_time <= '2024-01-01T23:59:59.999999Z'
  AND event_type = 'click'
  AND CAST(JSON_VALUE(event_properties, '$.fs_rage_count') AS INTEGER) > 0
GROUP BY
  browser,
  path
ORDER BY
  rage_clicks DESC;