The number of sources that produce semi-structured data has increased exponentially in recent years. The arrival of Snowflake Data Cloud has made it effortless to process complex datasets. Snowflake supports storing and processing semi-structured data. It supports semi-structured formats enlisted below:
This article is about the two features in Snowflake that come in handy for Data Engineers while handling semi-structured data.
A variant is a datatype which can hold semi-structured data in a single field. Snowflake stores semi-structured data in the column format when semi-structured data is loaded into a VARIANT column. A single row can contain other underlying data. Let’s demonstrate with the help of examples:
The stored data can be easily retrieved and structured from the above table by a simple query as below:
select
v:time::timestamp as observation_time,
v:city.id::int as city_id,
v:city.name::string as city_name,
v:city.country::string as country,
v:city.coord.lat::float as city_lat,
v:city.coord.lon::float as city_lon,
v:clouds.all::int as clouds,
(v:main.temp::float)-273.15 as temp_avg,
(v:main.temp_min::float)-273.15 as temp_min,
(v:main.temp_max::float)-273.15 as temp_max,
v:weather[0].main::string as weather,
v:weather[0].description::string as weather_desc,
v:weather[0].icon::string as weather_icon,
v:wind.deg::float as wind_dir,
v:wind.speed::float as wind_speed
from json_weather_data
Note: ‘v’ is the Field (Column) name, and the JSON tags are arranged alongside to retrieve datasets.
Compound values are flattened into multiple rows with the use of the Snowflake Flatten Command. A tool called Snowflake FLATTEN is used to transform semi-structured data into a relational structure. Relatively complex JSON structures such as nested JSON structures can be structured with FLATTEN function in Snowflake. Here is an example of the same:
The above JSON file stores a single set of conversations between two texters in a single row. If the requirement is to get every individual message as a separate row a FLATTEN function can be used to produce the following result:
(SELECT
b.value:message_date::TIMESTAMP AS Time_Of_Message,
b.value:conversation_number::STRING AS Conversation_num,
b.value:message:message_text:msg_txt::STRING AS Message,
ROW_NUMBER() OVER ( PARTITION BY b.value:dialogId::STRING ORDER BY
b.value:message_date::TIMESTAMP ASC) as Message_order
FROM customer_messages t,LATERAL FLATTEN(input => t.v) b);
The ability of Snowflake’s assistance is undeniable. It aids you to make better decisions and offers a better overall experience when attempting to get the most out of your data, through features like Flatten and Variant. If you want to be a part of Snowflake’s incredible platform, let us help you. Beinex’s partnership with Snowflake enables us to offer you advanced features like automated tuning and elastic computing, along with analytics modernization services, to help your organisation realise exponential Return on Investment.