Snowflake: Handling semi-structured data with Variant and Flatten

Snowflake: Handling Semi-structured Data with Variant and Flatten

There are instances in which you really get overwhelmed by tons of data that you may spend hours exploring, finally leaving you with heaps of unresolved data sets. Companies have been handling giant, complex data ecosystems and pipelines to transform complex data sets in semi-structured formats like JSON, XML, etc. All these intricate data sets can be converted into manageable and insightful data formats by adopting cloud-based technology. Snowflake’s out-of-the-box features to handle complicated semi-structured data formats with variant and Flatten help to handle the demanding needs of enterprises.
23 Feb 2023
admin

    Interested in the article or the service offering? Get in touch with us:


    Semi-structured Formats with Snowflake Support

    The number of sources that produce semi-structured data has increased exponentially in recent years. The arrival of Snowflake Data Cloud has made it effortless to process complex datasets. Snowflake supports storing and processing semi-structured data. It supports semi-structured formats enlisted below:

    • • JSON
    • • Avro
    • • ORC
    • • Parquet
    • • XML

    This article is about the two features in Snowflake that come in handy for Data Engineers while handling semi-structured data.

    • • Variant
    • • Flatten

    Variant

    A variant is a datatype which can hold semi-structured data in a single field. Snowflake stores semi-structured data in the column format when semi-structured data is loaded into a VARIANT column. A single row can contain other underlying data. Let’s demonstrate with the help of examples:

    The stored data can be easily retrieved and structured from the above table by a simple query as below:
    select
    v:time::timestamp as observation_time,
    v:city.id::int as city_id,
    v:city.name::string as city_name,
    v:city.country::string as country,
    v:city.coord.lat::float as city_lat,
    v:city.coord.lon::float as city_lon,
    v:clouds.all::int as clouds,
    (v:main.temp::float)-273.15 as temp_avg,
    (v:main.temp_min::float)-273.15 as temp_min,
    (v:main.temp_max::float)-273.15 as temp_max,
    v:weather[0].main::string as weather,
    v:weather[0].description::string as weather_desc,
    v:weather[0].icon::string as weather_icon,
    v:wind.deg::float as wind_dir,
    v:wind.speed::float as wind_speed
    from json_weather_data
    Note: ‘v’ is the Field (Column) name, and the JSON tags are arranged alongside to retrieve datasets.

    Flatten

    Compound values are flattened into multiple rows with the use of the Snowflake Flatten Command. A tool called Snowflake FLATTEN is used to transform semi-structured data into a relational structure. Relatively complex JSON structures such as nested JSON structures can be structured with FLATTEN function in Snowflake. Here is an example of the same:

    The above JSON file stores a single set of conversations between two texters in a single row. If the requirement is to get every individual message as a separate row a FLATTEN function can be used to produce the following result:

    Here is the query to generate the above table from semi-structured data:

    (SELECT
    b.value:message_date::TIMESTAMP AS Time_Of_Message,
    b.value:conversation_number::STRING AS Conversation_num,
    b.value:message:message_text:msg_txt::STRING AS Message,
    ROW_NUMBER() OVER ( PARTITION BY b.value:dialogId::STRING ORDER BY
    b.value:message_date::TIMESTAMP ASC) as Message_order
    FROM customer_messages t,LATERAL FLATTEN(input => t.v) b);

    Summing Up

    The ability of Snowflake’s assistance is undeniable. It aids you to make better decisions and offers a better overall experience when attempting to get the most out of your data, through features like Flatten and Variant. If you want to be a part of Snowflake’s incredible platform, let us help you. Beinex’s partnership with Snowflake enables us to offer you advanced features like automated tuning and elastic computing, along with analytics modernization services, to help your organisation realise exponential Return on Investment.

    Tags: