Data/Performance Analytics

Azure, Data/Performance Analytics, Decision Making, Expertise Data Warehousing, SQL

Analyze at the speed of thought with Azure Synapse

Azure Synapse Analytics is a limitless analytics service that unites data integration, enterprise data warehousing, and big data analytics. Using either serverless or dedicated resources at scale, you have the freedom to query data your way. Here we will compare performance at different scales.

Azure Synapse Performance Comparison

To compare performance across tables of different data volumes at different scales, we have the below set of tables/views:

[dbo].[Sales_Order_2Years]
[dbo].[Sales_Order_4Years]
[dbo].[Sales_Order_6Years]
[dbo].[Sales_Order_8Years]
[dbo].[Sales_Order_10Years]

Process of Scaling the Azure Dedicated SQL Pool

By default, an Azure dedicated SQL pool is set to the DW100c performance level. To scale to a higher level, click the Scale button shown in the screenshot below of the dedicated SQL pool window in the Azure portal.

In the screenshot above, we changed the scale to DW200c; the highlighted section shows all the pricing details. Once changed, click the Save button on the same window to set the performance level to DW200c.

SQL used in the process of performance analysis

1. Select count(*) from Sales_Order_2Years;
2. Select DISTINCT Part_Code from Sales_Order_2Years;
3. Select sum(cast(Month_Primary_Amount as float)) from Sales_Order_2Years;
4. Select sum(cast(Month_Primary_Amount as float)), Part_Code from Sales_Order_2Years group by Part_Code;
5. Select sum(cast(Month_Primary_Amount as float)), Part_Code, Organization_Name from Sales_Order_2Years group by Part_Code, Organization_Name;
6. Select sum(cast(Month_Primary_Amount as float)), Part_Code from Sales_Order_2Years group by Part_Code order by Part_Code DESC;
7. Select sum(cast(Month_Primary_Amount as float)), sum(cast(Primary_Quantity as float)), Part_Code, Organization_Name from Sales_Order_2Years group by Part_Code, Organization_Name order by Part_Code desc;

Similar queries were executed against the other tables with different volumes.

Performance Comparison on Different Dedicated Pool Specifications

Following is the comparison among the different dedicated pool performance levels with data sets of different volumes (in millions).

Conclusion: The comparison above clearly shows that when there is a large volume of data (in millions), a higher performance level on the dedicated Azure SQL pool improves performance. However, for low data volumes, performance is roughly the same whether the performance level is higher or lower.

Scale Up Dedicated SQL Pool using Pipeline

Note: To allow the Synapse workspace to call the REST API, we need to give the Synapse workspace access to the SQL pool. In Access control (IAM) of the SQL pool, assign the Contributor role to your Synapse workspace. Now, to scale up the SQL pool using a pipeline, we need to create a pipeline and set the parameters as below.

The generic parameters used within the pipeline are:

SubscriptionId: Subscription ID of the Synapse workspace
ResourceGroupName: Name of the resource group of the Synapse workspace
Region: Location of the Synapse workspace
ServerName: Synapse workspace name
DatabaseName: Name of the dedicated SQL pool
Command: Resume
PerformanceLevel: The level to which we need to scale up

Then in the pipeline, we need a Web activity that gets the status of the SQL pool (Paused or Resumed), an If Condition that checks whether the status is Resumed, and another Web activity that then scales up the SQL pool.
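Outside of Synapse pipelines, the same check-status / resume / scale flow can be sketched with plain REST calls against the Azure management endpoint. This is a minimal illustration under stated assumptions, not the pipeline itself: the subscription, resource group, workspace, and pool names are placeholders, and `AZURE_TOKEN` is assumed to be a valid Azure AD bearer token for https://management.azure.com/.

```python
# Sketch of the check-status -> resume -> scale flow using the Azure
# management REST API (same api-version as the pipeline's Web activities).
# All resource names are placeholders; a valid bearer token is assumed.
import json
import urllib.request

API_VERSION = "2019-06-01-preview"


def pool_url(subscription_id, resource_group, workspace, pool):
    """Build the management URL for a dedicated SQL pool (mirrors @concat)."""
    return (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Synapse/workspaces/{workspace}"
        f"/sqlPools/{pool}?api-version={API_VERSION}"
    )


def scale_body(performance_level):
    """PATCH body that changes the pool's performance level (sku name)."""
    return json.dumps({"sku": {"name": performance_level}})


def _call(url, method, token, body=None):
    """Issue one authenticated management-API request and parse the reply."""
    req = urllib.request.Request(
        url, method=method, data=body.encode() if body else None
    )
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or b"{}")


def ensure_scaled(subscription_id, resource_group, workspace, pool,
                  performance_level, token):
    url = pool_url(subscription_id, resource_group, workspace, pool)
    # 1. Check the current status of the pool (GET, like the first Web activity).
    status = _call(url, "GET", token)["properties"]["status"]
    # 2. If paused, resume it first (POST .../resume, like the If Condition branch).
    if status == "Paused":
        resume_url = url.replace(f"?api-version={API_VERSION}",
                                 f"/resume?api-version={API_VERSION}")
        _call(resume_url, "POST", token, body="{}")
    # 3. Scale to the requested performance level (PATCH with the sku body).
    _call(url, "PATCH", token, body=scale_body(performance_level))
```

For example, `ensure_scaled(sub_id, "my-rg", "my-workspace", "my-pool", "DW200c", AZURE_TOKEN)` would perform the same three steps the pipeline activities below carry out.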
Here are the settings of the check-status Web activity.

URL value:
@concat('https://management.azure.com/subscriptions/',pipeline().parameters.SubscriptionId,'/resourceGroups/',pipeline().parameters.ResourceGroupName,'/providers/Microsoft.Synapse/workspaces/',pipeline().parameters.ServerName,'/sqlPools/',pipeline().parameters.DatabaseName,'?api-version=2019-06-01-preview')

Method: GET
Resource: https://management.azure.com/

Now in the If Condition, we check whether the status is Resumed; if it is not (i.e., the condition is true), then in its activity we resume the pool with a Web activity.

The expression in the If Condition is:
@and(equals(activity('Check Azure Synapse Analytics Status').output.properties.status,'Paused'), equals(pipeline().parameters.Command,'Resume'))

To add the Web activity, click the pencil icon beside True and drag a Web activity to that window from the left pane.

In Settings, provide the below information:

URL value:
@concat('https://management.azure.com/subscriptions/',pipeline().parameters.SubscriptionId,'/resourceGroups/',pipeline().parameters.ResourceGroupName,'/providers/Microsoft.Synapse/workspaces/',pipeline().parameters.ServerName,'/sqlPools/',pipeline().parameters.DatabaseName,'/',pipeline().parameters.Command,'?api-version=2019-06-01-preview')

Method: POST
Body: {}
Resource: https://management.azure.com/

Once the status is set to Resumed (or it is confirmed that the pool is already resumed), the last Web activity scales up the SQL pool as per the given parameter.
URL value:
@concat('https://management.azure.com/subscriptions/',pipeline().parameters.SubscriptionId,'/resourceGroups/',pipeline().parameters.ResourceGroupName,'/providers/Microsoft.Synapse/workspaces/',pipeline().parameters.ServerName,'/sqlPools/',pipeline().parameters.DatabaseName,'?api-version=2019-06-01-preview')

Method: PATCH
Header: Content-Type = application/json
Body: {"sku": {"name": "@{pipeline().parameters.PerformanceLevel}"}}
Resource: https://management.azure.com/

You can schedule this pipeline before running your ETLs to increase load performance by scaling up the SQL pool, and in the same way create a pipeline to scale it back down, scheduled after the load is done.

Azure Synapse offers an enterprise-grade analytical engine for datasets of any size, with a flexible pricing model that enables businesses to start small and scale up based on their needs. If you would like to explore Azure Synapse for your business, Accelerant offers a free demo environment customized to your data needs. Reach us at info@ymz.633.myftpupload.com for more information.

Author: Ritu Negi, Technical Lead, Accelerant

AI/ML, Azure, Data Infrastructure, Data Management, Data Science, Data/Performance Analytics, Decision Making, Microsoft Power BI

Data Democratization: Stage 6: Lock them up – Secure your data

Data needs to be protected just as networks are protected from malicious actors (internal and external). Data security should not be an afterthought but a design decision from the start. Enterprises have spent considerable resources securing their data centers and monitoring access to sensitive data, and yet there are significant data breaches every year. It has been reported that between January and September 2019, over 7.9 billion data records were exposed, a 33% increase from the same period in 2018! The year 2020 turned out to be worse than 2019, and 2021 is on pace to be the worst year so far.

Modern data architectures in the cloud make it easy for enterprises of any size to adopt strong security standards, but they need to be augmented with resources from the internal IT team. In addition, modern cloud infrastructures are monitored by deploying the best processes, top-notch technology, and expert security analysts. Hence, everyone reaps the benefit of best practices deployed universally in the cloud and continued fixes for any security flaws in the technology stack.

The innovation of modern cloud computing, such as Azure, has lowered the barrier to setting up world-class, secure decision support systems.

AI/ML, Data Infrastructure, Data Management, Data Science, Data/Performance Analytics, Decision Making, Oracle

Data Democratization: Stage 5, Executive Actions

It is said that there are no million-dollar ideas, only million-dollar actions! Hence, all thoughts, plans, and decisions need follow-up actions, and for these actions to be most effective, organizations need them to be driven by data.

In successful enterprises, executive actions are set by the executives and undertaken every day by all employees. Enterprise change management is a complex process, and it requires a change of culture. Data needs to tell a story that resonates with employees at every level, motivating them to continue taking incremental steps (actions) that move the ball towards the goal post.

While such actions are set in motion at every department and team level, there must be continual monitoring of whether efforts achieve the desired results. As mentioned in earlier sections, change is constant. Defining measurable key performance indicators and continuously tracking teams provides a path for mid-flight adjustments. Such feedback loops improve the chances of success while mitigating the risk posed by changing conditions, both internal and external.

Accelerant has created data-centric playbooks for various departments such as Finance, Supply Chain, Operations, Sales, and Marketing. Our solutions are hosted on Microsoft Azure and reap the benefits of Microsoft's scalable cloud. We can assist you in defining measurable KPIs and putting together a framework for closed-loop monitoring and continuous improvement.

AI/ML, Azure, Data Infrastructure, Data Management, Data Science, Data/Performance Analytics, Decision Making

Data Democratization: Stage 4, Best is yet to come (Artificial Intelligence)

Top-performing organizations extract incremental value from various business processes. A combination of small incremental gains results in a significant change that benefits multiple departments across the organization. For example, incremental improvements in the supply chain lower costs while also improving customer satisfaction through faster delivery. This brings more customers back for repeat business, thus delivering better sales. However, it must be emphasized that data initiatives need continuous effort from the various stakeholders responsible for business performance. It is a company culture where every employee gets access to the data relevant to his/her job, and every employee understands that decisions and opinions need to be backed by data rather than just "gut feel."

Once data is harmonized and analyzed descriptively, organizations should start investing in the ultimate prize: achieving higher levels of automation via AI and machine learning. AI and machine learning open doors to opportunities that offer unprecedented efficiency gains, e.g., autonomous driving. Organizations must do a self-assessment of data maturity, without which such initiatives are bound to fail. AI/machine learning techniques provide exceptional value provided the training data sets are pristine, and such models need periodic adjustments to account for data drift.

Data is the rocket fuel that leapfrogs businesses ahead of the competition. Investments in data infrastructure, analysis capabilities, and a data-driven decision-making culture are vital to keeping a competitive edge.

AI/ML, Azure, Data Management, Data Science, Data/Performance Analytics

Data Democratization: Stage 3, The known unknowns and unknown unknowns

Data is a competitive edge only if we can harness knowledge from it. Successful businesses drive decision-making based on the trends, anomalies, and variances they see in the data. Unfortunately, many companies may not even know the underlying currents and hidden trends in their data, which may be critical to their businesses. This results in missed opportunities or expensive mistakes.

According to the Harvard Business Review, in 1958, corporations listed in the S&P 500 had an average stay of 61 years. By 1980, the number had declined sharply to 25 years. In 2011, the average tenure dropped to 18 years. Failure to understand and act on hidden signals regarding the business can be catastrophic. Data is at the heart of digital disruption, and it is accelerating at an unprecedented pace.

Businesses of all sizes need to measure the "as is" (the current state of affairs) before making future changes. There needs to be an agile approach to data harmonization and analysis at speed to understand data signals. Existing cloud data infrastructure makes it easy to start experimenting with data at a low start-up cost. Agile data operations, integrated into a continuous integration and continuous delivery (CI/CD) DevOps methodology, help businesses adapt to changing external factors in real time without expensive rework.

Thus, investment in cloud data infrastructure such as Microsoft Azure is key to uncovering the "known unknowns" and discovering the "unknown unknowns" in the data. Net-net, investing in cloud data infrastructure increases the chances of success in the highly competitive and dynamic business climate of the new digital world.

AI/ML, Azure, Data Management, Data Science, Data/Performance Analytics

Data Democratization: Stage 2, Tear down this wall

On average, a typical mid-size enterprise has hundreds of applications, databases, and unstructured file sources. A consumer-facing application may also have thousands to millions of end devices (IoT). The data residing in such applications creates ubiquitous data islands that do not capture a complete business view. To get a 360-degree view of the business, enterprises need to combine data from various sources into a harmonized structure. Companies often create data warehouses, data marts, etc., to integrate these applications/databases and get this complete picture. While these traditional approaches lead to incremental value for the business, they are too slow and lose relevance over time. The need to harmonize data still exists, but traditional ETL techniques are no longer adequate. Businesses need to look into modern methods of ingesting data in raw form and defining analysis techniques at run time. In other words, traditional ETL techniques put too much burden on moving extensive data through the network while performing computations, thus slowing analysis down significantly. In the new world, data is ingested as is (ELT), and compute is performed closer to the data, resulting in faster analysis at scale.

Enterprises can break down these silos of data and applications by adopting new data integration techniques deployed on on-demand, unlimited compute on scalable cloud platforms.

AI/ML, Azure, Data Management, Data Science, Data/Performance Analytics

Data Democratization: Stage 1, Drain the (data) swamp

The proliferation of applications (legacy, cloud), IoT devices, databases, and unstructured files (images, videos, etc.) has created an enormous gap between data creation and the ability to derive value from that data. Hence, enterprises started storing data on low-cost data lake storage. Without a proper way to catalog, clean, and prep such data, however, it is impossible to get value out of such storage. Some organizations embark on hiring big teams to curate these vast piles of data using various data governance tools, but those efforts continue to fall short. Thus the swamp of structured and unstructured data continues to get bigger and bigger, resulting in a huge execution gap between data creation and data analysis.

There need to be automated and intelligent ways to label, clean, and curate data, and hence drain the data swamp. Data cleansing and governance tools should employ rule-based and artificial intelligence techniques for data cleansing and curation. The good news is that with the advent of cloud technologies, access to sophisticated machine learning algorithms is getting easier, with very little upfront investment. An enterprise data governance team can experiment with various machine learning techniques (supervised or unsupervised learning) to bridge the execution gap and drain the data swamp.

AI/ML, Azure, Data/Performance Analytics

Data democratization: An approach to building a better enterprise

It is often stated that a better democracy is built on broader civic participation. We can apply the same concept to building a better enterprise. The biggest asset in any organization is its PEOPLE. Once the leadership defines a company's vision statement, every employee needs to make smaller decisions that contribute to the enterprise's collective success. Hence, by democratizing data, we facilitate better data-driven decisions at every level. For some legacy organizations, it may be necessary just for survival in the 21st century.

Becoming a data-centric organization doesn't happen by buying the fanciest tool in the market. It is an alignment of executive leadership, cross-functional teams, agile technologies, and persistence. This is not a project with an end date; it is a journey with essential milestones, and each milestone provides exponential benefits to the competitive edge.

In this series, we will break down the various stages of the intelligent enterprise journey while poking fun at slogans from the recent political discourse. Stay tuned for more in our upcoming blog next week!
