
Azure, Data Infrastructure

Azure SQL VM Best Practices: VM Sizing

Overview

When we run SQL Server on Azure Virtual Machines, we can use the same database performance tuning options that apply to SQL Server in an on-premises environment. For an Azure SQL Server VM, best practice starts with collecting the baseline requirements for compute and storage; for this, we have to know the IOPS, throughput, and latency of the source system. Gather CPU and memory data during peak hours as well, covering both workloads during your business hours and other high-load processes such as end-of-day processing or weekend ETL jobs. Then use this performance analysis to select a VM size that can scale to your workload's performance requirements. Keep in mind that the performance of a relational (SQL) database in the cloud depends on many factors: the size of the virtual machine, the configuration of the data disks, SQL Server features, Azure features, high availability and disaster recovery (HADR) features, and security optimization.

Consider scaling up your resources for predictably heavy workloads, such as end-of-quarter processing, and then scaling down once the workload completes. There is typically a trade-off between reducing costs and optimizing for performance. This performance best practices series focuses on getting the best performance for SQL Server on Azure Virtual Machines; it is about setting up, or opting for, the best infrastructure for SQL Server in the Azure cloud. If your peak workload is less demanding, you might not require every recommended optimization. Consider your performance needs, workload patterns, and, most importantly, the costs involved as you evaluate these recommendations.
The following is a checklist for optimal performance of SQL Server on Azure Virtual Machines:
– VM Size
– Storage and Scalability
– Baseline Requirements
– Security Considerations
– SQL Server Features

Since this article is about Azure SQL VMs, and no one knows Azure and SQL better than Microsoft, we will link to the relevant Microsoft documentation wherever needed for better understanding and transparency.

VM Size

Here is a quick checklist of VM size best practices for running SQL Server on an Azure VM:
· Use VM sizes with four or more vCPUs.
· Use memory-optimized virtual machine sizes for the best performance of SQL Server workloads.
· Consider a higher memory-to-vCore ratio for mission-critical and data warehouse workloads.
· Collect the target workload's performance characteristics (IOPS) and use them to determine the appropriate VM size for your business.

vCPU is the abbreviation for virtual central processing unit and represents a portion, or share, of the underlying physical CPU assigned to a particular virtual machine (VM). A physical CPU can back multiple vCPUs, managed by hypervisor software such as Hyper-V. In the past, a common rule of thumb was eight vCPUs per core. Today, the vCPU count is primarily determined by the hardware: take the number of processing threads the chipset offers per core, multiply by the cores per CPU, and multiply by the number of occupied sockets. This is how it looks: (Threads x Cores) x Physical CPUs = Number of vCPUs. If you have small VMs that barely use CPU time, you could easily fit 20-30 VMs on an 8-core server; but if you have larger workloads, such as a database server, you will fit far fewer VMs on that same eight-core server. Thus, it is all about resource utilization and allocation.

A few questions to consider before we get started:
· Do our apps run at 100% CPU utilization all the time?
· Do they have periods where utilization bursts?
· Do they have maintenance windows?
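The vCPU arithmetic above can be sketched in a few lines; the server specs here are hypothetical examples, not a recommendation:

```python
def vcpu_count(threads_per_core: int, cores_per_cpu: int, physical_cpus: int) -> int:
    """(Threads x Cores) x Physical CPUs = number of vCPUs."""
    return threads_per_core * cores_per_cpu * physical_cpus

# A hypothetical 8-core, single-socket server with 2 threads per core:
print(vcpu_count(2, 8, 1))  # 16 vCPUs available to schedule across VMs
```

How many VMs those 16 vCPUs support then depends on each VM's utilization profile, which is exactly why the baseline questions above matter.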
Once you know the requirements, you can make an informed decision about the underlying hardware. There is a large array of machine families in the Azure cloud to select from; still, we will focus only on the D, E, and M families, since we are concerned only with SQL workloads. SQL Server database engine performance is driven more by memory than by the number of available cores, e.g.: Dsv2 11-15, Edsv4, Mv2, Msv2, M, Mdsv2, DCsv2, Dasv4, Dsv3, Ddsv4, Lsv2. These can be broadly categorized into General Purpose, Memory Optimized, and Storage Optimized machines.

Some of the naming conventions for machines are as follows:
A – AMD-based processor
D – Disk (a local temporary disk is present)
M – The most memory in a particular size
S – Premium Storage capable
I – Isolated size
V – Version

D series: Has a consistent memory-to-core ratio of 7:1, medium to large caches, and in-memory analytics. Supports Premium Storage; these are entry-level SQL Server virtual machines.
E series: Has a memory-to-core ratio of 8:1; ideal for memory-intensive enterprise applications, with sizable local SSD capacity and throughput. This acts as a solid all-purpose SQL Server virtual machine.
M series: Has a memory-to-core ratio of 30:1 without constrained cores and up to 122:1 with constrained cores. These offer the highest vCore counts and memory for the largest SQL Server workloads; they are the mission-critical and data warehouse virtual machines. Note that these machines are expensive, so they need to be managed accordingly.

It is also essential to understand the concept of constrained cores. Some database workloads, like SQL Server or Oracle, require high memory, storage, and I/O bandwidth, but not a high core count; in other words, many database workloads are not CPU-intensive. Azure offers specific VM sizes that constrain the VM's vCPU count to reduce software licensing costs while preserving the same memory, storage, and I/O bandwidth.
For example, the vCPU count can be constrained to one-half or one-quarter of the original VM size: Standard_M16ms comes with 16 cores and 437.5 GB of memory (RAM).

(Screenshot taken from M-series – Azure Virtual Machines | Microsoft Docs)
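Because SQL Server is licensed per core, the saving from a constrained-core size is easy to estimate. A minimal sketch, using a purely illustrative per-core rate (the real rate depends on your edition and licensing agreement):

```python
def license_cost(licensed_cores: int, per_core_rate: float) -> float:
    # SQL Server is licensed per core, so constraining the active vCPU
    # count cuts the license bill while memory, storage, and I/O
    # bandwidth stay the same as the parent VM size.
    return licensed_cores * per_core_rate

RATE = 200.0  # hypothetical per-core monthly rate, for illustration only

full = license_cost(16, RATE)        # e.g. Standard_M16ms: 16 active vCPUs
constrained = license_cost(4, RATE)  # a one-quarter constrained variant
print(full - constrained)            # 2400.0 saved per month at this rate
```

The compute cost of the VM itself does not drop; only the per-core software licensing does, which is why constrained cores pay off mainly for license-heavy workloads.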

Azure, Data/Performance Analytics, Decision Making, Expertise Data Warehousing, SQL

Analyze at the speed of thought with Azure Synapse

Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Using either serverless or dedicated resources at scale, you have the freedom to query data your way. Here we will compare performance at different scales.

Azure Synapse Performance Comparison

In order to compare the performance of tables of different data volumes at different scales, we have the below set of tables/views:
[dbo].[Sales_Order_2Years]
[dbo].[Sales_Order_4Years]
[dbo].[Sales_Order_6Years]
[dbo].[Sales_Order_8Years]
[dbo].[Sales_Order_10Years]

Process of Scaling the Azure Dedicated SQL Pool

By default, an Azure dedicated SQL pool is set to the DW100c specification. To change to a higher specification, click the Scale button as shown in the below screenshot of the dedicated SQL pool window in the Azure Portal.

In the above screenshot, we have changed the scale to DW200c; the highlighted section shows all the pricing details. Once changed, click the Save button on the same window to set the specification to DW200c.

SQL used in the process of performance analysis

1. SELECT COUNT(*) FROM Sales_Order_2Years;
2. SELECT DISTINCT Part_Code FROM Sales_Order_2Years;
3. SELECT SUM(CAST(Month_Primary_Amount AS float)) FROM Sales_Order_2Years;
4. SELECT SUM(CAST(Month_Primary_Amount AS float)), Part_Code FROM Sales_Order_2Years GROUP BY Part_Code;
5. SELECT SUM(CAST(Month_Primary_Amount AS float)), Part_Code, Organization_Name FROM Sales_Order_2Years GROUP BY Part_Code, Organization_Name;
6. SELECT SUM(CAST(Month_Primary_Amount AS float)), Part_Code FROM Sales_Order_2Years GROUP BY Part_Code ORDER BY Part_Code DESC;
7. SELECT SUM(CAST(Month_Primary_Amount AS float)), SUM(CAST(Primary_Quantity AS float)), Part_Code, Organization_Name FROM Sales_Order_2Years GROUP BY Part_Code, Organization_Name ORDER BY Part_Code DESC;

Similarly, we executed the queries against the other tables with different volumes.

Performance Comparison on Different DWU Specifications

The following is a comparison among the different DWU specifications with data sets of different volumes (in millions of rows).

Conclusion: The comparison clearly shows that when there is a large volume of data (in the millions of rows) and the dedicated SQL pool is on a higher specification, performance improves. On low data volumes, however, performance does not materially improve whether the specification is higher or lower.

Scale Up the Dedicated SQL Pool Using a Pipeline

Note: To allow the Synapse workspace to call the REST API, we need to give the workspace access to the SQL pool: in the Access control (IAM) blade of the SQL pool, assign the Contributor role to your Synapse workspace. Now, to scale up the SQL pool using a pipeline, we create a pipeline and set the parameters as below.

The above are the generic parameters used within the pipeline:
SubscriptionId: Subscription ID of the Synapse workspace
ResourceGroupName: Name of the resource group of the Synapse workspace
Region: Location of the Synapse workspace
ServerName: Synapse workspace name
DatabaseName: Name of the dedicated SQL pool
Command: Resume
PerformanceLevel: The level to which we need to scale up

Then in the pipeline we need a Web Activity that gets the status of the SQL pool (whether it is Paused or Resumed), an If Condition that checks whether the pool needs to be resumed, and another Web Activity that then scales up the SQL pool.
Here are the settings of the check-status Web Activity.

URL value: @concat('https://management.azure.com/subscriptions/',pipeline().parameters.SubscriptionId,'/resourceGroups/',pipeline().parameters.ResourceGroupName,'/providers/Microsoft.Synapse/workspaces/',pipeline().parameters.ServerName,'/sqlPools/',pipeline().parameters.DatabaseName,'?api-version=2019-06-01-preview')
Method: GET
Resource: https://management.azure.com/

Now, in the If Condition, we check whether the pool is paused; if it is (i.e., the condition is true), then in its True activity we resume the pool with another Web Activity.

The expression in the If Condition is:
@and(equals(activity('Check Azure Synapse Analytics Status').output.properties.status,'Paused'), equals(pipeline().parameters.Command,'Resume'))

To add the Web Activity, click the pencil icon beside True as below and drag a Web Activity onto that window from the left pane.

In its Settings we need to give the below information:
URL value: @concat('https://management.azure.com/subscriptions/',pipeline().parameters.SubscriptionId,'/resourceGroups/',pipeline().parameters.ResourceGroupName,'/providers/Microsoft.Synapse/workspaces/',pipeline().parameters.ServerName,'/sqlPools/',pipeline().parameters.DatabaseName,'/',pipeline().parameters.Command,'?api-version=2019-06-01-preview')
Method: POST
Body: {}
Resource: https://management.azure.com/

Once the status is Resumed (or it is confirmed that the pool is already resumed), the last Web Activity scales up the SQL pool per the given parameter.
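The same three ARM calls (status check, resume, scale-up PATCH) can be exercised outside the pipeline for testing. A minimal Python sketch using only the standard library, assuming you already hold an AAD bearer token for https://management.azure.com/ (token acquisition is not shown):

```python
import json
import urllib.request

API = "https://management.azure.com"

def pool_url(sub: str, rg: str, workspace: str, pool: str, action: str = "") -> str:
    # Mirrors the @concat(...) expressions used in the Web Activities.
    url = (f"{API}/subscriptions/{sub}/resourceGroups/{rg}"
           f"/providers/Microsoft.Synapse/workspaces/{workspace}"
           f"/sqlPools/{pool}")
    if action:
        url += f"/{action}"
    return url + "?api-version=2019-06-01-preview"

def call(method: str, url: str, token: str, body=None):
    # One ARM request; returns the parsed JSON response (if any).
    data = None if body is None else json.dumps(body).encode()
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or b"{}")

def resume_and_scale(token, sub, rg, workspace, pool, sku):
    """Check status -> resume if paused -> PATCH the sku, like the pipeline."""
    status = call("GET", pool_url(sub, rg, workspace, pool),
                  token)["properties"]["status"]
    if status == "Paused":  # the If Condition branch
        call("POST", pool_url(sub, rg, workspace, pool, "resume"), token, {})
    call("PATCH", pool_url(sub, rg, workspace, pool), token,
         {"sku": {"name": sku}})  # e.g. "DW200c"
```

This is a sketch of the same requests the pipeline issues, not a replacement for it; the pipeline version remains the right place for scheduled scale-up/scale-down.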
URL value: @concat('https://management.azure.com/subscriptions/',pipeline().parameters.SubscriptionId,'/resourceGroups/',pipeline().parameters.ResourceGroupName,'/providers/Microsoft.Synapse/workspaces/',pipeline().parameters.ServerName,'/sqlPools/',pipeline().parameters.DatabaseName,'?api-version=2019-06-01-preview')
Method: PATCH
Header: Name = Content-Type, Value = application/json
Body: {"sku": {"name": "@{pipeline().parameters.PerformanceLevel}"}}
Resource: https://management.azure.com/

You can schedule this pipeline before running your ETLs in order to scale up the SQL pool and load the data with better performance, and in the same way you can create a pipeline to scale it back down and schedule it after the load is done.

Azure Synapse offers an enterprise-grade analytical engine for datasets of any size, with a flexible pricing model that lets businesses start small and scale up as their needs grow. If you would like to explore Azure Synapse for your business, Accelerant offers a free demo environment customized to your data needs. Reach us at info@ymz.633.myftpupload.com for more information.

Author: Ritu Negi, Technical Lead, Accelerant

AI/ML, Azure, Data Infrastructure, Data Management, Data Science, Data/Performance Analytics, Decision Making, Microsoft Power BI

Data Democratization: Stage 6: Lock them up – Secure your data

Data needs to be protected just as networks are protected from malicious actors (internal and external). Data security should not be an afterthought but a design decision from the start. Enterprises have spent considerable resources securing their data centers and monitoring access to sensitive data, and yet there are significant data breaches every year. It has been reported that between January and September 2019, over 7.9 billion data records were exposed, a 33% increase over the same period in 2018! The year 2020 turned out to be worse than 2019, and 2021 is on pace to be the worst year so far.

Modern data architectures in the cloud make it easy for enterprises of any size to adopt strong security standards, but they need to be augmented with resources from the internal IT team. In addition, modern cloud infrastructures are monitored with the best processes, top-notch technology, and expert security analysts. Hence, everyone reaps the benefit of best practices deployed universally in the cloud and of continued fixes for any security flaws in the technology stack.

The innovation of modern cloud computing, such as Azure, has lowered the barrier to setting up world-class, secure decision support systems.

AI/ML, Azure, Data Infrastructure, Data Management, Data Science, Data/Performance Analytics, Decision Making

Data Democratization: Stage 4, Best is yet to come (Artificial Intelligence)

Top-performing organizations extract incremental value from various business processes. A combination of small incremental gains results in a significant change that benefits multiple departments across the organization. For example, incremental improvements in the supply chain reduce costs while also improving customer satisfaction through faster delivery; this brings more customers back for repeat business, thus delivering better sales. However, it must be emphasized that data initiatives need continuous effort from the various stakeholders responsible for business performance. It is a company culture in which every employee gets access to the data relevant to their job, and every employee understands that decisions and opinions need to be backed by data, not just "gut feel."

When data is harmonized and analyzed descriptively, organizations should start investing in the ultimate prize: achieving higher levels of automation via AI and machine learning. AI and machine learning open doors to opportunities that offer unprecedented efficiency gains, e.g., autonomous driving. Organizations must do a self-assessment of data maturity, without which such initiatives are bound to fail. AI/machine learning techniques provide exceptional value provided the training data sets are pristine, and such models need periodic adjustments to account for various forms of data drift.

Data is the rocket fuel that leapfrogs businesses ahead of the competition. Investments in data infrastructure, analysis capabilities, and a data-driven decision-making culture are vital to keeping a competitive edge.

AI/ML, Azure, Data Management, Data Science, Data/Performance Analytics

Data Democratization: Stage 3, The known unknowns and unknown unknowns

Data is a competitive edge only if we can harness knowledge from it. Successful businesses drive decision-making based on the trends, anomalies, and variances they see in the data. Unfortunately, many companies may not even know the underlying currents and hidden trends in their data that may be critical to their businesses. This results in missed opportunities or expensive mistakes.

According to the Harvard Business Review, in 1958, corporations listed in the S&P 500 had an average stay of 61 years. By 1980, that number had declined sharply to 25 years, and by 2011 the average tenure had dropped to 18 years. Failure to understand and act on hidden signals regarding the business can be catastrophic. Data is at the heart of digital disruption, and the disruption is accelerating at an unprecedented pace.

Businesses of all sizes need to measure the "as is" (the current state of affairs) before making future changes. There needs to be an agile approach to data harmonization and analysis at speed to understand data signals. Existing cloud data infrastructure makes it easy to start experimenting with data at a low start-up cost. Agile data operations, integrated into a continuous integration and continuous delivery (CI/CD) DevOps methodology, help businesses adapt to changing external factors in real time without expensive rework.

Thus, investment in cloud data infrastructure such as Microsoft Azure is key to uncovering the "known unknowns" and discovering the "unknown unknowns" in the data. Net net, investments in cloud data infrastructure increase the chances of success in the highly competitive and dynamic business climate of the new digital world.

AI/ML, Azure, Data Management, Data Science, Data/Performance Analytics

Data Democratization: Stage 2, Tear down this wall

On average, a typical mid-size enterprise has hundreds of applications, databases, and unstructured file sources, and a consumer-facing application may also have thousands to millions of end devices (IoT). The data residing in such applications creates ubiquitous data islands that do not capture a complete view of the business. To get a 360-degree view, enterprises need to combine data from various sources into a harmonized structure. Often, companies create data warehouses, data marts, etc., to integrate the applications and databases into this complete picture. While these traditional approaches lead to incremental value for the business, they are too slow and lose relevance over time. The need to harmonize data still exists, but traditional ETL techniques are no longer adequate: they put too much burden on moving extensive data through the network while performing computations, slowing the process down significantly. Businesses need to look into modern methods of ingesting data in raw form and defining analysis techniques at run time. In the new world, data is ingested as is (ELT), and compute is performed closer to the data, resulting in faster analysis at scale.

Enterprises can break down these silos of data and applications by adopting new data integration techniques deployed on on-demand, unlimited compute on scalable cloud platforms.

AI/ML, Azure, Data Management, Data Science, Data/Performance Analytics

Data Democratization: Stage 1, Drain the (data) swamp

The proliferation of applications (legacy and cloud), IoT devices, databases, and unstructured files (images, videos, etc.) has created an enormous gap between data creation and the ability to extract value from that data. Hence, enterprises started to store data on low-cost data lake storage. But without a proper way to catalog, clean, and prep such data, it is impossible to get value out of that storage. Some organizations embark on hiring big teams to curate these vast piles of data using various data governance tools, but those efforts continue to fall short. Thus, the swamp of structured and unstructured data keeps getting bigger and bigger, leaving a huge execution gap between data creation and data analysis.

There need to be automated and intelligent ways to label, clean, and curate data, and hence drain the data swamp. Data cleansing and governance tools should employ rule-based and artificial intelligence techniques for data cleansing and curation. The good news is that with the advent of cloud technologies, access to sophisticated machine learning algorithms is getting easier, with very little upfront investment. An enterprise data governance team can experiment with various machine learning techniques (supervised or unsupervised learning) to bridge the execution gap and drain the data swamp.

AI/ML, Azure, Data/Performance Analytics

Data democratization: An approach to building a better enterprise

It is often stated that a better democracy is built on broader civic participation. We can apply the same concept to building a better enterprise. The biggest asset in any organization is its PEOPLE. Once the leadership defines a company's vision, every employee needs to make smaller decisions that contribute to the enterprise's collective success. Hence, by democratizing data, we facilitate better data-driven decisions at every level. For some legacy organizations, this may be necessary just for survival in the 21st century.

Becoming a data-centric organization doesn't happen by buying the fanciest tool on the market. It takes an alignment of executive leadership, cross-functional teams, agile technologies, and persistence. This is not a project with an end date; it is a journey with essential milestones, and every next milestone provides exponential benefits to the competitive edge.

In this series, we will break down the various stages of the intelligent enterprise journey while poking fun at slogans from recent political discourse. Stay tuned: we will discuss these stages in our upcoming blog next week!
