A workflow is a series of steps combined to achieve a certain task. In Analytics, workflows support processes such as Data Engineering, ETL, data manipulation, and the building of sophisticated analytical models, including Machine Learning models, on the data.
Smart Workflow: A smart workflow is one which, once created, is fully automatable, scalable, and caters to a wide variety of tasks and audiences. For example, let us assume that each month a business user gets data from 10 data sources, combines them, and performs some data wrangling on the combined data. It looks like the diagram given below:
If this entire process is fully automated and runs each month with minimal or zero user intervention, it would qualify as a smart workflow.
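As a rough illustration, the sketch below shows how such a monthly workflow might be scripted in Python with pandas; the directory layout, file names, and wrangling steps are hypothetical, not taken from the example above.

```python
import glob
import pandas as pd

def run_monthly_workflow(input_dir: str, output_path: str) -> pd.DataFrame:
    """Combine the monthly extracts from all sources and apply basic wrangling."""
    # Hypothetical layout: one CSV per source, e.g. source_01.csv ... source_10.csv
    frames = [pd.read_csv(path) for path in sorted(glob.glob(f"{input_dir}/source_*.csv"))]
    combined = pd.concat(frames, ignore_index=True)

    # Example wrangling steps: standardise column names and drop duplicate rows
    combined.columns = [c.strip().lower().replace(" ", "_") for c in combined.columns]
    combined = combined.drop_duplicates()

    combined.to_csv(output_path, index=False)
    return combined

run_monthly_workflow("data/2024-05", "output/combined_2024-05.csv")
```

Scheduled with a cron job or an orchestrator, a script like this runs each month with the minimal or zero user intervention described above.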
Smart workflows are an interesting concept and can be considered the future of Analytics, since they serve many stakeholders, including IT team members and business users. For business users (the actual data owners), these workflows allow full control of their data. For IT teams, they are a boon, since they encourage automation and reduce the dependency on IT for day-to-day business tasks.
Creating smart workflows is easy now, with many self-serve analytics tools on the market such as Alteryx, Tableau Prep, KNIME, and Talend.
I would say this has come mainly from enterprises realising the value of data and what it brings to the table in terms of effective decision making. Take the example of a sales forecast: a few years ago it used to be entirely intuitive and based on business sense, and these forecasts often ended up quite far from actuals; a forecast that reached even 50% of actuals was considered good. But with data used in conjunction with sophisticated statistical methodologies, today's forecasts are quite accurate, with accuracy approaching 100%. Organisations have realised this, and most enterprises today use Advanced Analytics for forecasting. Similarly, across the entire decision-making cycle, organisations are seeing in real terms the value of what data has to offer.
The major challenges I have seen, and how I overcame them, would be:
Convincing stakeholders of the value data can bring to the ecosystem: We divided the customer journey into sprints and initially aimed for use cases with a quick turnaround time and clear added value. Once clients see what is achievable with their own data, they are convinced, and this helps us carry on smoothly with the rest of the journey.
Making decision makers trust the data at hand: This challenge is largely handled by taking all concerned stakeholders on board and having a robust Data Quality framework in place. We present not only the data insights at an overall level but also the data checks done at different stages to arrive at those insights. This increases the confidence of the stakeholders concerned, and they start to believe in the data at hand.
For organisations in the initial stages of their data journey, realising the value of the data proposition can be a time-consuming affair, and convincing stakeholders of that value has been a major challenge quite a few times: This can be handled in a similar fashion to the first challenge. Target use cases that are easy to achieve and show the client value in those. Analytics professionals often try to showcase something fancy like Artificial Intelligence in the initial stages of an organisation's data journey; this must be avoided, since Analytics adoption is a gradual process.
At times, technology adoption by clients has been a major bottleneck due to cost and the unavailability of skilled resources on those technologies: This is closely related to showing clients the value of what they are investing in. Once clients see the value, they are happy to bear the cost of the technology at hand.
There can be no single answer to this question; it depends entirely on the Analytics use case at hand. But to give a little context, enhanced hardware configurations, increases in the speed of data flow, and software that allows data manipulation with minimal effort can be called the drivers of quick data processing and delivery. One example is stock market data, where near-real-time information gathering and processing happens; this can be attributed to a technique called 'Web Scraping'. The technology enablers for this are many, but we can consider scripting languages like R and Python. Another example along similar lines is the data generated by IoT devices and the information derived from it; the technology enablers in this case would be IoT devices, smart devices, and high-speed internet availability.
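As a minimal sketch of web scraping in Python, the snippet below fetches a page and pulls out a quoted price using the requests and BeautifulSoup libraries; the URL and CSS selector are placeholders rather than a real exchange feed.

```python
import requests
from bs4 import BeautifulSoup

def fetch_quote(url: str, selector: str) -> str:
    """Download a page and return the text of the price element."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    element = soup.select_one(selector)  # CSS selector for the price element
    if element is None:
        raise ValueError("Price element not found on the page")
    return element.get_text(strip=True)

# Placeholder URL and selector; a real scraper would target a specific quote page
print(fetch_quote("https://example.com/quote/ACME", "span.price"))
```

Run on a schedule, a script like this gives near-real-time snapshots of the quoted price.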
Data comes in different shapes and forms. It can be classified into major groups such as:
Structured data: This is the most common type of data in the Analytics world today, accounting for nearly 80% of the data at hand. It takes the form of tables stored in a database with a certain hierarchy and has a strict, predefined format for storage.
Unstructured data: This data type has gained traction in recent times due to the vast amount of information it contains. Data coming from comments, user reviews, product reviews, and so on is one form of unstructured data; data from audio and video files, images, text files, word documents, etc. is another. Unstructured data is a conglomeration of many varied types of data stored in their native formats.
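The difference between the two is easiest to see in code. In the hypothetical sketch below, the structured file loads straight into a table with a known schema, while the unstructured text has to be processed before any structure emerges; the file and column names are illustrative only.

```python
import pandas as pd

# Structured: rows and columns with a predefined schema, queryable immediately.
orders = pd.read_csv("orders.csv")                      # hypothetical table extract
revenue_by_month = orders.groupby("month")["amount"].sum()

# Unstructured: free text with no fixed schema; any structure (counts, sentiment,
# entities) has to be derived after loading the raw content.
with open("product_reviews.txt", encoding="utf-8") as f:
    reviews = f.read()
mentions_of_delivery = reviews.lower().count("delivery")
```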
Flexible data stores are those which allow any type of data to be stored, whether structured or unstructured. They find applications in many areas; a short sketch of such a store follows the examples below.
Financial institutions use them to combine data from multiple sources and create a unified view of their customers.
Retail giants mine the information in reviews and comments left by users to improve their products and services.
Call centres use data from audio files to rate employees.
Retailers use video analytics to find out-of-stock items in their inventory.
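As one hypothetical sketch of a flexible data store, the snippet below uses MongoDB (via pymongo) as an example document store and assumes an instance running locally; a structured account record and a free-text review for the same customer sit in the same collection.

```python
from pymongo import MongoClient

# Assumes a MongoDB instance on localhost; MongoDB is one example of a
# flexible (document) store that accepts records with differing shapes.
client = MongoClient("mongodb://localhost:27017")
collection = client["demo"]["customer_360"]

# A structured account record and a free-text review stored side by side.
collection.insert_one({
    "customer_id": 101,
    "accounts": [{"type": "savings", "balance": 2500.0}],
})
collection.insert_one({
    "customer_id": 101,
    "review_text": "The mobile app is easy to use but statements load slowly.",
})

# A unified view of the customer is simply everything stored against their id.
unified_view = list(collection.find({"customer_id": 101}))
```

This is the kind of unified customer view the financial-services example above relies on.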
A data operating model is the way in which data flows within the organisation. Take the example of an organisation that collects data using surveys and performs analysis on it: the entire data flow, from sending out surveys and capturing responses, through data wrangling, to performing analysis on the data, can be referred to as the data operating model.
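A minimal sketch of that survey example, assuming a CSV export from the survey tool and hypothetical column names, could look like this:

```python
import pandas as pd

def capture_responses(path: str) -> pd.DataFrame:
    """Load raw survey responses as exported from the survey tool."""
    return pd.read_csv(path)

def wrangle(responses: pd.DataFrame) -> pd.DataFrame:
    """Basic cleansing: drop incomplete responses and normalise ratings."""
    cleaned = responses.dropna(subset=["respondent_id", "rating"])
    cleaned["rating"] = cleaned["rating"].clip(lower=1, upper=5)
    return cleaned

def analyse(responses: pd.DataFrame) -> pd.Series:
    """Simple analysis step: average rating per survey question."""
    return responses.groupby("question")["rating"].mean()

raw = capture_responses("survey_responses.csv")  # hypothetical export file
print(analyse(wrangle(raw)))
```

Each function corresponds to one stage of the flow: capturing responses, wrangling, and analysis.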
The data operating model is important because it helps break through the organisational and technical silos within a business. It builds upon the business model and addresses how data is treated across organisational processes, all the way from data collection, cleansing, and enrichment to the sharing and use of data. Having the right data operating model in place, one that breaks information silos and reduces dependency on certain individuals or teams, is a critical element in the making of a data-driven organisation.