title | description | author | ms.author | ms.date | ms.topic | ms.assetid | caps.latest.revision |
---|---|---|---|---|---|---|---|
Parallel Data Warehouse overview |
This topic explains the appliance software and the non-appliance software components of Analytics Platform System. |
barbkess |
barbkess |
01/05/2017 |
article |
db0c4a43-a66d-4c44-ab91-791c5785f71c |
20 |
This topic explains the appliance software and the non-appliance software components of Analytics Platform System.
MPP Engine
The MPP Engine is the brains of the Massively Parallel Processing (MPP) system. It does the following:
-
Creates parallel query plans and coordinates parallel query execution on the Compute nodes.
-
Stores and coordinates metadata and configuration data for all of the databases.
-
Manages SQL Server PDW database authentication and authorization.
-
Tracks hardware and software status.
Data Movement Service (DMS) is part of the “secret sauce” of PDW. It does the following:
-
Transfers data to and from the SQL Server PDW nodes.
-
Processes query operations that require transferring data among the nodes.
-
Improves query performance by optimizing data transfer speeds.
The Admin Console is a web application that presents the appliance state, health, and performance information.
The Configuration Manager (dwconfig.exe), is the tool that appliance administrators use to configure Analytics Platform System.
SQL Server manages all of the databases on the Control node.
-
The Shell database manages the metadata for all distributed user databases.
-
TempDB contains the metadata for all user temporary tables across the appliance.
-
Master is the master table for SQL Server on the Control node.
The Compute nodes are parallel data processing and storage units. They have direct attached storage and use SQL Server to manage user data.
Data Movement Service (DMS) runs on each Compute node to do the following:
-
As part of processing parallel queries, DMS transfer data to and from other Computer nodes and the Control node.
-
DMS, running on each Compute node, receives data loads in parallel. Data is loaded in parallel directly from the loading server to the Compute nodes
-
DMS transfers data from each Compute node directly to the backup server.
-
Using PolyBase, DMS transfers data to and from an external Hadoop cluster, or the HDInsight Region on the appliance.
Each Compute node runs an instance of SQL Server to process queries and manage user data.
The appliance fabric provides the operating system, services, and network infrastructure for the appliance.
Active Directory (AD) Domain Services (DS)
Analytics Platform System performs authentication among the Analytics Platform System nodes, and manages the authentication of SQL Server PDW Windows Authentication logins.
DNS service
Windows Domain Name Service (DNS) resolves domain names to IP addresses for the Analytics Platform System appliance.
Windows Deployment Service (WDS) deploys the Windows Server operating system onto the appliance. It is deployed on every host and virtual machine across the appliance.
The DHCP service creates IP addresses so that the hosts within the appliance domain can join the appliance network without having a pre-configured IP address.
Analytics Platform System uses virtualization to achieve high availability. The Virtual Machine Manager hosts System Center to deploy the operating system on the physical hosts.
Windows Server Update Services (WSUS) to apply or remove Windows Updates across all of the hosts and virtual machines.
All of the hosts and virtual machines in the appliance run Windows Server operating system.
Windows Failover Clustering provides the ability to restart processes on a passive host in the event that a host fails.
Windows Storage Spaces manages user data as a storage pool for a small group of Compute nodes. If a Compute node fails, the data is still accessible through another Compute node in the group.
Microsoft Hyper-V Server provides a simple and reliable virtualization solution. Analytics Platform System uses virtualizations to balance CPU resources and to provide high availability for the PDW nodes and appliance fabric components .
PolyBase technology to integrates SQL Server PDW data with external Hadoop data. The Hadoop data can be stored on any of these Hadoop Data sources:
-
Hortonworks for Linux or Windows Server
-
Cloudera on Linux
-
HDInsight running on APS
-
HDInsight data stored on Azure Storage Blob
Queries are written with Transact-SQL modified to fit the MPP nature of the queries. All queries are submitted to the Control node which generates a parallel query plan to run the query across the Compute nodes.
SQL Server Data Tools runs inside of Visual Studio and is our recommended GUI tool for submitting queries to SQL Server PDW. It is similar to SQL Server Management Studio by allowing you to navigate through an object explorer.
If you don’t already have Visual Studio, you can download the tools that you need for free.
sqlcmd is the SQL Server command-line tool for running Transact-SQL statements and system commands. It works with SQL Server PDW and is our recommended command-line tool for querying SQL Server PDW.With sqlcmd you can run Transact-SQL statements interactively from the command-line, as a batch file, or from Windows PowerShell.
You can use Integration Services to query SQL Server PDW.
By using a SQL Server linked server connection, you can use SQL Server to submit Transact-SQL statements to SQL Server PDW.
Analysis Services
SQL Server PDW is a valid data source for Analysis Services databases and Excel PowerPivot models. Using the OLE DB provider, you can configure an Analysis Services cube to use either multidimensional online analytical processing (MOLAP) or relational online analytical processing (ROLAP) storage.
You can use SQL Server PDW as a SQL Server data source for reports that you develop for Reporting Services by using SQL Server Report Builder. You can also use SQL Server PDW as a SQL Server source for report models. By using Report Manager or the report server API, you can generate a model from a SQL Server PDW database.
You can connect to SQL Server PDW with PowerPivot for Excel, a free download that significantly expands the data analysis capabilities of Excel.
Install PDW-specific destination adapters that allow you to use SQL ServerIntegration Services to load data into SQL Server PDW.
dwloader is a command-line loading tool that loads data in parallel from your loading server to the SQL Server PDW Compute nodes.
With PolyBase technology, you can load non-relational data from an Hadoop Cluster into a relational table in SQL Server PDW. The Hadoop data can be located in an external Hadoop Cluster, the HDI Region on APS, or in a Azure Storage Blob.
SQL Server PDW uses Transact-SQL database backup and restore commands to backup and restore user databases, in parallel, to and from a backup server. SQL Server PDW writes the backup to a directory in a Windows file share, and then likewise restores data from a Windows file share.
For more information, see Plan for Backup and Loading Hardware and Backup and Restore Overview
The Remote Table Copy feature allows you to copy tables from SQL Server PDW databases to remote (non-appliance) SMP SQL Server databases. This enables hub and spoke scenarios for SQL Server PDW.
Analytics Platform System has several ways to monitor appliance activity
The Admin Console allows you to view current status about the appliance health. This runs as a web application on the Control node and is accessible over https.
For more information, see Monitor the Appliance by Using the Admin Console (Analytics Platform System)
The Admin Console is based on system view queries. You can query the system views individually to get the specific piece of information you need.
For more information, see Monitor the Appliance by Using System Views (Analytics Platform System)
There are System Center Operations Manager (SCOM) Management Packs for SQL Server PDW.
To configure the appliance for SCOM, see Monitor the Appliance by Using System Center Operations Manager (Analytics Platform System)