Git Product home page Git Product logo

kromerm / adflab Goto Github PK

View Code? Open in Web Editor NEW
133.0 27.0 125.0 331.04 MB

Azure Data Factory hands-on lab, self-paced. Learn how to lift & shift SSIS packages to the Cloud with ADF. Build new ETL pipelines in ADF, transform data at scale, load Azure Data Warehouse data marts. Also walks through operationalizing ADF pipelines with scheduling and monitoring modules.

Home Page: http://aka.ms/adflab2

PowerShell 100.00%

adflab's Issues

FAAMaster

This file needs to be extracted for the script to work
C:\adflab\Deployment\Files\input\FAAmaster\FAAmaster.zip.

Please have it pre-extracted in github to make it easier for people doing a git clone

Lab 5 - WASB Paths require updates

The documentation should ask the developer to update the FAAMerge.hql hive script to reflect the automatically generated storage account name

Mine was adflabstorage7nuut for example.

This is required 3 times in the HQL.

Also, hard coding of storage accounts in HQL makes the script not portable between environments.

LAB03 References to LAB01 VSTS Git

Howdy, running through these. The LAB03 module starts out with setting up a code repository (VSTS Git) and references the repository set up in LAB01, but LAB01 does not have repository setup step.

"We will be using the Azure Data Factory we created in Lab 01 called adflab-adf. We also will use a VSTS Git (or Github) account to version code in the GUI pipeline tool. This was also setup in Lab01 but we will show how to connect it."

LabDeploy.ps1 Fails if files not unzipped

The deployment script fails because some of the files in github repo have not been unzipped and need to be.

To get it to work I Extracted
.\adflab\Deployment\Files\input\FAAmaster\FAAmaster.txt
From
.\adflab\Deployment\Files\input\FAAmaster\FAAmaster.zip

and Extracted
.\adflab\Deployment\Files\backups\AirlinePerformance-OLTP.bacpac
From
.\adflab\Deployment\Files\backups\AirlinePerformance-OLTP.zip

You really need to update the docs to reflect this if you can't store the files uncompressed in GH as it is an extra step the user needs to perform,.

Create an example for tumbling windows trigger

Right now there is just an example for the scheduled trigger in Lab 7. However there is a need to explain how to create a tumbling Windows trigger.
expected Content:

  • How to create a tumbling window trigger
  • How to Hand over the Parameters for window start time and window end time to an activity
  • Example: load data from a blob with a Directory structure like /yyyy/mm/dd/hh/Content.csv using a databricks Notebook for a given tumbling window

Lab 2 - SSIS Package Deploy Failed

Step 9 - Returns error:
TITLE: SQL Server Integration Services

Warning: Failed to decrypt an encrypted XML node. Verify that the project was created by the same user. Project load will attempt to continue without the encrypted information.

Warning: Failed to decrypt sensitive data in project with a user key. You may not be the user who encrypted this project, or you are not using the same machine that was used to save the project. If the sensitive data is a parameter value, the value may be required to run the package on the Integration Services server.

Step 12:
TITLE: SQL Server Integration Services

Value does not fall within the expected range. (Microsoft.SqlServer.ManagedDTS)

This was flagged during the validation step.
Error:

Then click back, then Next, then next then Deploy

TITLE: SQL Server Integration Services

Unable to create the type with the name 'AzureStorage'. (Microsoft.SqlServer.ManagedDTS)

@runincluster is not a parameter for procedure create_execution

Just in case anyone runs into this. In module 2, when I tried executing the SSIS package, I got this error message:

@runincluster is not a parameter for procedure create_execution. (Microsoft.SqlServer.IntegrationServices.UITasks)

It turned out that the Management Studio version 17.0 that I had was too old to run this properly. I installed a new MS and it worked. Info:
https://docs.microsoft.com/en-us/sql/integration-services/what-s-new-in-integration-services-in-sql-server-2017?view=sql-server-ver15

Module 2 doc omits VSTS git repo setup step

The Module 2 doc shows a VSTS repo in the Data Integration app, and references functionality that's only available if a repo is configured, such as "Save" and "Publish".

I worked through the doc as is, as far as setting up an hourly trigger. I could not publish the pipeline, so I used "Publish All" at the top of the Data Integration app and got an error. At that point I realized there was a VSTS repo in the doc. I created a VSTS project and repo, then configured in the Data Integration app. While the integration runtime and pipeline JSON files got pushed to the repo, my pipeline disappeared from the Data Integration app (I recognize ADFv2 is still in preview, no worries and I will recreate).

For the sake of others working through Module 2, it would be good to add the step of configuring a repo earlier in the document.

PowerShell Lab deployment

I am new to PowerShell scrips.
In LAB 1 -Part 1 , i am unable to link to PowerShell Run Script and connect to my Azure Account.
Getting the error
`PS SQLSERVER:>
#Prompt for Azure credentials
Login-AzureRmAccount
Login-AzureRmAccount : The term 'Login-AzureRmAccount' is not recognized as the name of a cmdlet, function, script file, or operable program. Check
the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:3 char:1

  • Login-AzureRmAccount
  •   + CategoryInfo          : ObjectNotFound: (Login-AzureRmAccount:String) [], CommandNotFoundException
      + FullyQualifiedErrorId : CommandNotFoundException
    

`

Unclear instructions on how to configure

image

The instruction are unclear about what to do here. It says for each SQL Server connection configure the connection string, server name, and password. I only see the password field empty so i added the password there. The connection string and server name are already present so do they have to be replaced by the setting of the azure version of the DW, ODS and OLTP from the Azure console?

Further, does adflabstaging need to be replaced by adflabstorage with the corresponding connection string and key?

Please help!

Changing protection level error in Module 2

When I try to deploy the ADFLab.isproj file in Module 2 I receive the following error:
Changing protection level - Value does not fall within expected range. (Microsoft.SqlServer.ManagedDTS)

LAB01 - AirlinePerformance-OLTP bacpac is corrupt

Hi

I might be doing something wrong here because it looks like other people have got past this point. The AirlinePerformance-OLTP database doesn't restore successfully. It returns a corrupt file error. I got this through both the Powershell script and by attempting to manually import the database from the bacpac file.

The shell of the database is created, with no user defined objects.

image
image

Thanks
Ian

Lab3: Send Success Email failed

I get the Azure Logic App URL set up in the LabModule 1 (my URL: https://prod-11.eastus2.logic.azure.com:443/workflows/237d5b5a01b24e03a937e38035c3274e/triggers/manual/paths/invoke?api-version=2016-10-01&sp=%2Ftriggers%2Fmanual%2Frun&sv=1.0&sig=tInhe9iSzhNMjmH82y96alwOpJRy0c_l6iLyK9ROf2k)

and followed the instruction to select post and fill in the body. But the Send email logic App component failed. The input of this component:
{
"url": "https://prod-11.eastus2.logic.azure.com:443/workflows/237d5b5a01b24e03a937e38035c3274e/triggers/manual/paths/invoke?api-version=2016-10-01&sp=%2Ftriggers%2Fmanual%2Frun&sv=1.0&sig=tInhe9iSzhNMjmH82y96alwOpJRy0c_l6iLyK9ROf2k",
"method": "POST",
"headers": {
"Content-type": "application/json"
},
"body": {
"dataFactoryName": "DataFactoryMei",
"successtype": "Succeeded",
"message": "159764059",
"pipelineName": "S3 to Blob Copy",
"recipient": "[email protected]"
}
}

The error:
{
"errorCode": "2108",
"message": "Error calling the endpoint. Response status code: ",
"failureType": "UserError",
"target": "Send Success Email"
}

I checked the adflab logicAppEmail and verified my account for the adflab Office365.
After I run the pipeline to send success email, I do not even see the activity in trigger history in the LogicAppEmail.

Can you tell me where is my problem?

Lab 3 - Creation of VSTS Git Project

The document refers to Lab 1 instructions that are non-existent

In Lab 01 we setup a VSTS Git project and will link it here. Github can also be used.

CreateAzureDW.sql not working

When executing the build script in powershell, it errors at the SQL DW Schema build. Error from Powershell below

"
Invoke-Sqlcmd : Parse error at line: 9, column: 1: Incorrect syntax near 'DimDate'.
At C:\Users\chris\azure\adflab\Deployment\LabDeploy.ps1:100 char:1
Invoke-Sqlcmd -inputFile "CreateAzureDW.sql" -ServerInstance "$sqlSer ...

CategoryInfo          : InvalidOperation: (:) [Invoke-Sqlcmd], SqlPowerShellSqlExecutionException
FullyQualifiedErrorId : SqlError,Microsoft.SqlServer.Management.PowerShell.GetScriptCommand

Invoke-Sqlcmd : Parse error at line: 110, column: 1: Incorrect syntax near 'END'.
At C:\Users\chris\azure\adflab\Deployment\LabDeploy.ps1:100 char:1
Invoke-Sqlcmd -inputFile "CreateAzureDW.sql" -ServerInstance "$sqlSer ...

CategoryInfo : InvalidOperation: (:) [Invoke-Sqlcmd], SqlPowerShellSqlExecutionException
FullyQualifiedErrorId : SqlError,Microsoft.SqlServer.Management.PowerShell.GetScriptCommand"
"

Tried executing in SQL, first few lines of error are here:

"Msg 15151, Level 16, State 1, Line 4
Cannot alter the role 'largerc', because it does not exist or you do not have permission.
Msg 102, Level 15, State 1, Line 32
Incorrect syntax near 'DISTRIBUTION'.
Msg 102, Level 15, State 1, Line 52
Incorrect syntax near 'DISTRIBUTION'. "

Update to DW Gen2

I'd recommend updating your lab to use SQL DW gen 2. The arm api has been updated to allow this. Your arm would need to update to the following:

    {
      "type": "databases",
      "name": "AirlinePerformance-DW",
      "apiVersion": "2017-10-01-preview",
      "location": "[resourceGroup().location]",
        "sku": {
            "name": "DW100c",
            "tier": "DataWarehouse"
        },
      "properties": {
        "collation": "SQL_Latin1_General_CP1_CI_AS"
      },

Unable to deploy ARM template on line 36 of LabDeploy.ps1

The primary error seems to be that the ARM template doesnt recognize the parameters event thought the JSON file is valid.

Main Error:
...
New-AzureRmResourceGroupDeployment : A parameter cannot be found that matches parameter name 'labNamePrefix'.
At C:\adflab\Deployment\LabDeploy.ps1:36 char:113

  • ... e $resourceGroupName -TemplateFile LabARM.json -labNamePrefix $labNam ...
  •                                                ~~~~~~~~~~~~~~
    
    • CategoryInfo : InvalidArgument: (:) [New-AzureRmResourceGroupDeployment], ParameterBindingException
    • FullyQualifiedErrorId : NamedParameterNotFound,Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGr
      oupDeploymentCmdlet
      ...

Configuration:
$dir = "C:\adflab" #Working directory of where your LabDeploy.ps1 file is located
$resourceGroupName = "adflab-test" #Name of Azure resource group to deploy the lab resrouces to, will create if it does not exist
$location = "East US 2" #Geo location of resource group, resources will use this as well
$labNamePrefix = "adflab" #prefix to append on to unique names such as SQLServer and Storage account
$sqlUsername = "labadmin" #SqlServer admin account
$sqlPassword = "*******" #SqlServer admin password
$logicAppEmail = "[email protected]" #O365 Account to send emails for lab
$subscriptionName = "Development" #Name of subscription to use for deployment

Full Log Attached as comment

Lab 5 - Running HiveQL Pipeline fails

I followed the instructions to setup the ondemand cluster and put the details about the HQL script to execute.

When doing a test run, I get the following error:

{
"errorCode": "105",
"message": "Object reference not set to an instance of an object.",
"failureType": "SystemError",
"target": "Run FAA Hive Merge"
}

Are there ways to troubleshoot pipelines in ADF? every error is pretty obscure.

JSON Code for pipeline:

{
"name": "Merge FAA Files",
"properties": {
"activities": [
{
"name": "Run FAA Hive Merge",
"type": "HDInsightHive",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 20
},
"typeProperties": {
"scriptPath": "input/FAAMerge.hql",
"arguments": [],
"defines": {},
"scriptLinkedService": {
"referenceName": "AzureStorage-Staging",
"type": "LinkedServiceReference"
}
},
"linkedServiceName": {
"referenceName": "HDI-ADFLab",
"type": "LinkedServiceReference"
}
}
]
}
}

JSON Code for Activity

{
"name": "Run FAA Hive Merge",
"type": "HDInsightHive",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 20
},
"typeProperties": {
"scriptPath": "input/FAAMerge.hql",
"arguments": [],
"defines": {},
"scriptLinkedService": {
"referenceName": "AzureStorage-Staging",
"type": "LinkedServiceReference"
}
},
"linkedServiceName": {
"referenceName": "HDI-ADFLab",
"type": "LinkedServiceReference"
}
}

LAB01 - Syntax Errors in CreateAzureDW.sql

The "GO" between TRUNCATE TABLE ADF.DimDate and INSERT INTO needs to be removed.

The "GO" between FROM dbo.DimDate and DECLARE @CurrentFactMaxDate needs to be removed.

Deployment fails: Gen1 Data Warehouse deprecated

Initial PS deployment fails with this as follows:
\Deployment> $ARMOutput = New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName -TemplateFile LabARM.json -labNamePrefix $labNamePrefix -sqlUsername $sqlUsername -sqlPassword $sqlPassword -logicAppEmail $logicAppEmail
New-AzureRmResourceGroupDeployment : 23:01:59 - Resource Microsoft.Sql/servers/databases 'xxxxxxxlwfnhc/AirlinePerform
ance-DW' failed with message '{
"code": "45122",
"message": "'Gen1 Data Warehouse has been deprecated in this region. Please leverage Gen2 Data Warehouse.'",
"target": null,
"details": [
{
"code": "45122",
"message": "'Gen1 Data Warehouse has been deprecated in this region. Please leverage Gen2 Data Warehouse.'",
"target": null,
"severity": "16"
}
],
"innererror": []
}'
At line:1 char:14

  • ... ARMOutput = New-AzureRmResourceGroupDeployment -ResourceGroupName $re ...
  •             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    • CategoryInfo : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    • FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDep
      loymentCmdlet

New-AzureRmResourceGroupDeployment : 23:02:46 - Template output evaluation skipped: at least one resource deployment op
eration failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage det
ails.
At line:1 char:14

  • ... ARMOutput = New-AzureRmResourceGroupDeployment -ResourceGroupName $re ...
  •             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    • CategoryInfo : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    • FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDep
      loymentCmdlet

New-AzureRmResourceGroupDeployment : 23:02:46 - Template output evaluation skipped: at least one resource deployment op
eration failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage det
ails.
At line:1 char:14

  • ... ARMOutput = New-AzureRmResourceGroupDeployment -ResourceGroupName $re ...
  •             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    • CategoryInfo : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
    • FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDep
      loymentCmdlet

Lab 2 Configuring Connection String

Hi Team!

I'm having these errors in Lab 2 specifically in setting up the connections. I'm having these error messages.

image

I am also bit confused on what the connection string format should be, and which of these should I use. Please see screenshot.

image

My current configuration settings are:

Data Source=adflabsqlfj477.database.windows.net;User ID=labadmin;password=L@bP@ss01;Initial Catalog=AirlinePerformance-ODS;Provider=SQLNCLI11.1;Persist Security Info=True;Auto Translate=False;

Data Source=adflabsqlfj477.database.windows.net;User ID=labadmin;password=L@bP@ss01;Initial Catalog=AirlinePerformance-OLTP;Provider=SQLNCLI11.1;Persist Security Info=True;Auto Translate=False;

Data Source=adflabsqlfj477.database.windows.net;User ID=labadmin;password=L@bP@ss01;Initial Catalog=AirlinePerformance-DW;Provider=SQLNCLI11.1;Persist Security Info=True;Auto Translate=False;

image

I have tried mixing and matching several stuffs unfortunately I still get the same error message.

Thank you for your help!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.