Batch Insights

PROJECT STATUS

This project is no longer actively maintained. Please see the main Azure Batch GitHub repository for more information about Azure Batch.

Azure Batch Insights is a tool used to get system statistics for your Azure Batch account nodes.

Usage (New)

Create Application Insights account

  1. Go to the Azure portal.
  2. Search for Application Insights.
  3. Create a new Application Insights resource or use an existing one (the application type you choose doesn't matter).

Configure your Azure Batch pool start task

Set three environment variables in your start task. Make sure they are set as Batch environment variables (the start task's environment settings) rather than exported inside the command line; otherwise they will not show up in Batch Explorer. Then set the start task user to Pool Admin (Task Admin might also work).

  • APP_INSIGHTS_INSTRUMENTATION_KEY: your Application Insights instrumentation key, found on the Application Insights blade in the Azure portal.

  • APP_INSIGHTS_APP_ID: your Application Insights application ID, also found on the Application Insights blade in the Azure portal.

  • BATCH_INSIGHTS_DOWNLOAD_URL: the URL of the executable to run. To find it, go to the project's releases page and copy the download link for the release you need.
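The sketch below shows how the settings above fit together. It assembles a start task definition in Python as a plain dict shaped like the startTask object of the Azure Batch REST API (commandLine, environmentSettings, userIdentity.autoUser, waitForSuccess). It passes the three variables as Batch environment settings and runs the task as the pool-scoped admin auto-user.

A few parts are assumptions, not taken from this README:
  • build_start_task is a hypothetical helper.
  • waitForSuccess is an extra choice the README does not mention.
  • The field names should be checked against the Batch API version you use.

The helper refuses to build a task when a variable is missing, because a missing instrumentation key is one of the crash causes reported in the issues.

```python
import json

# The three settings the README asks for.
REQUIRED_SETTINGS = (
    "APP_INSIGHTS_INSTRUMENTATION_KEY",
    "APP_INSIGHTS_APP_ID",
    "BATCH_INSIGHTS_DOWNLOAD_URL",
)


def build_start_task(command_line, settings):
    """Return a dict shaped like a Batch REST API startTask (hypothetical helper).

    The settings are passed as Batch environmentSettings rather than exported
    inside the command line, so that Batch Explorer can see them.
    """
    missing = [name for name in REQUIRED_SETTINGS if not settings.get(name)]
    if missing:
        raise ValueError("missing start task settings: " + ", ".join(missing))

    return {
        "commandLine": command_line,
        "environmentSettings": [
            {"name": name, "value": value} for name, value in settings.items()
        ],
        # Pool-scoped admin auto-user, i.e. "Pool Admin".
        "userIdentity": {"autoUser": {"scope": "pool", "elevationLevel": "admin"}},
        # Assumption: block tasks until the start task succeeds.
        "waitForSuccess": True,
    }


if __name__ == "__main__":
    task = build_start_task(
        "/bin/bash -c 'wget -O - https://raw.githubusercontent.com/Azure/batch-insights/master/scripts/run-linux.sh | bash'",
        {
            "APP_INSIGHTS_INSTRUMENTATION_KEY": "<instrumentation-key>",
            "APP_INSIGHTS_APP_ID": "<app-id>",
            "BATCH_INSIGHTS_DOWNLOAD_URL": "<release-download-url>",
        },
    )
    print(json.dumps(task, indent=2))
```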

For example:

Linux

Add this to your start task:

# For version 1.x of batch insights
/bin/bash -c 'wget  -O - https://raw.githubusercontent.com/Azure/batch-insights/master/scripts/1.x/run-linux.sh | bash'

# For latest version of batch insights
/bin/bash -c 'wget  -O - https://raw.githubusercontent.com/Azure/batch-insights/master/scripts/run-linux.sh | bash'

Windows

Add this to your start task:

# For version 1.x of batch insights
cmd /c @"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Azure/batch-insights/master/scripts/1.x/run-windows.ps1'))"

# For latest version of batch insights
cmd /c @"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Azure/batch-insights/master/scripts/run-windows.ps1'))"

Note: The script above only downloads the executable from BATCH_INSIGHTS_DOWNLOAD_URL and runs it in the background. You can download the executable some other way and start it separately.
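Below is a minimal Python sketch of downloading the executable yourself and starting it separately. It is not the content of run-linux.sh or run-windows.ps1, and download_executable and start_in_background are hypothetical helpers. It rests on three assumptions:
  • The file at BATCH_INSIGHTS_DOWNLOAD_URL can be run directly.
  • The tool picks up its pool ID, node ID and instrumentation key from the environment.
  • Any extra flags come from AZ_BATCH_INSIGHTS_ARGS, as described under Configuration.

```python
import os
import shlex
import stat
import subprocess
import urllib.request


def download_executable(url, dest):
    """Fetch the batch-insights binary from url into dest and mark it executable."""
    urllib.request.urlretrieve(url, dest)
    # Add execute permission bits; on Windows the .exe suffix is what matters.
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest


def start_in_background(exe_path, log_path):
    """Launch exe_path without waiting for it, appending stdout/stderr to log_path.

    Assumption: extra flags are read from AZ_BATCH_INSIGHTS_ARGS, and the tool
    finds its pool ID, node ID and instrumentation key on its own.
    """
    args = shlex.split(os.environ.get("AZ_BATCH_INSIGHTS_ARGS", ""))
    with open(log_path, "ab") as log:
        # The child keeps its own handle to the log after this block closes ours.
        return subprocess.Popen(
            [exe_path, *args],
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX: detach from the caller's session
        )
```

On a node, the two calls would be chained. For example:

    download_executable(os.environ["BATCH_INSIGHTS_DOWNLOAD_URL"], "./batch-insights")
    start_in_background("./batch-insights", "batch-insights.log")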

Python Usage (Old)

Ubuntu

Add this command in your start task commandLine:

/bin/bash -c 'wget  -O - https://raw.githubusercontent.com/Azure/batch-insights/master/ubuntu.sh | bash'

CentOS

Add this command in your start task commandLine:

/bin/bash -c 'wget  -O - https://raw.githubusercontent.com/Azure/batch-insights/master/centos.sh | bash'

Windows

cmd /c @"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Azure/batch-insights/master/windows.ps1'))"

Generic

If you already have Python installed, you only need to download nodestats.py and install its dependencies. You can add this to your main script:

pip install psutil python-dateutil applicationinsights==0.11.3
wget --no-cache https://raw.githubusercontent.com/Azure/batch-insights/master/nodestats.py
python --version
python nodestats.py > batch-insights.log 2>&1 &

Configuration

See available configuration options

You can set the AZ_BATCH_INSIGHTS_ARGS environment variable to pass parameters to the tool, e.g. set AZ_BATCH_INSIGHTS_ARGS to --disable networkIO --aggregation 5.
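To make the expected format concrete, here is a hypothetical Python parser that mirrors the two flags in the example. It is not the tool's actual argument parser.

The metric group names come from the tool's own configuration log (Disable: {"diskIO":..., "memory":...}). The aggregation value is treated as minutes, because the tool logs Aggregation: 1 as 1m0s. Whether --disable accepts a comma-separated list is an assumption.

```python
import argparse
import shlex

# Metric groups that appear in the tool's "Disable" configuration log line.
METRIC_GROUPS = ("diskIO", "diskUsage", "networkIO", "gpu", "cpu", "memory")


def parse_insights_args(value):
    """Parse an AZ_BATCH_INSIGHTS_ARGS-style string (illustrative mirror only)."""
    parser = argparse.ArgumentParser(prog="batch-insights", add_help=False)
    # Assumption: --disable may be repeated and may hold a comma-separated list.
    parser.add_argument("--disable", action="append", default=[])
    # Aggregation window in minutes (the tool logs "Aggregation: 1" as 1m0s).
    parser.add_argument("--aggregation", type=int, default=1)
    ns = parser.parse_args(shlex.split(value))

    disabled = {name for item in ns.disable for name in item.split(",") if name}
    unknown = disabled - set(METRIC_GROUPS)
    if unknown:
        raise ValueError("unknown metric groups: " + ", ".join(sorted(unknown)))
    return {
        "aggregation_minutes": ns.aggregation,
        # Same shape as the tool's "Disable" log line: group -> disabled?
        "disable": {group: group in disabled for group in METRIC_GROUPS},
    }
```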

View data

Option 1: Batch Explorer

Batch Explorer (formerly BatchLabs) is a desktop app used to manage, debug and monitor your Azure Batch accounts. You can download it here. If you followed the getting-started instructions, Batch Explorer should show the statistics for each of your pools.

Option 2: Azure portal

Use the Application Insights tools to build your own queries in the Azure portal.
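As a starting point outside the portal, the Application Insights REST API accepts Kusto queries at https://api.applicationinsights.io/v1/apps/{app-id}/query. Requests are authenticated with an API key sent in the x-api-key header.

The sketch below assumes that APP_INSIGHTS_APP_ID is the ID to use and that you have created an API key for the resource; this README does not cover authentication. It also assumes batch-insights writes to the customMetrics table. For that reason the default query only lists which metric names have arrived, rather than guessing their names.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.applicationinsights.io/v1/apps"

# Assumption: batch-insights metrics land in customMetrics. This query only
# lists which metric names arrived in the last hour.
DEFAULT_QUERY = "customMetrics | where timestamp > ago(1h) | summarize count() by name"


def build_query_request(app_id, api_key, kql=DEFAULT_QUERY):
    """Build (but do not send) a request to the Application Insights query API."""
    url = (
        f"{API_ROOT}/{urllib.parse.quote(app_id)}/query?"
        + urllib.parse.urlencode({"query": kql})
    )
    return urllib.request.Request(url, headers={"x-api-key": api_key})


def run_query(app_id, api_key, kql=DEFAULT_QUERY):
    """Send the query and return the decoded JSON response."""
    request = build_query_request(app_id, api_key, kql)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Once the metric names are known, the same query can be refined in the portal's Logs view.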

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

batch-insights's People

Contributors

dpwatrous, edwardsp, microsoft-github-policy-service[bot], microsoftopensource, msftgits, porges, smith1511, timotheeguerin

batch-insights's Issues

Crash in v1.1.0 using standard_d2s_v3 VMs

The app insights uploader crashes with the following stacktrace

Using v1.1.0, VMs are all standard_d2s_v3

File path: D:\batch\tasks\startup\wd\node-stats.err.log

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0x660465]

goroutine 1 [running]:
github.com/Azure/batch-insights/pkg.AppInsightsService.UploadStats(0x788a20, 0xc0000058a0, 0xc0000ee000, 0x0, 0x0, 0x0, 0xc000014ca0, 0x2, 0x2, 0x0, ...)
	D:/a/1/s/pkg/appinsights.go:43 +0x495
github.com/Azure/batch-insights/pkg.ListenForStats(0xc000024090, 0xb, 0xc00000a150, 0x26, 0xc00000a180, 0x24)
	D:/a/1/s/pkg/batchinsights.go:73 +0x3f4
main.main()
	D:/a/1/s/main.go:32 +0x321

batch insights crashes with "invalid memory address or nil pointer dereference"

I am getting the following error when starting batch insights on a node:

System information:
OS: windows
Pool ID: weid-log-poc-6
Node ID: tvmps_bf84e465f618415c5d708b5f482b966099754c72da408d20f3817451c0ac7676_p
Instrumentation Key: xxxxx
context deadline exceeded
context deadline exceeded
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0x6648d5]

goroutine 1 [running]:
github.com/Azure/batch-insights/pkg.AppInsightsService.UploadStats(0x752e40, 0xc0000052a0, 0xc0000d8090, 0x0, 0x0, 0x0, 0xc000030d20, 0x2, 0x2, 0x0, ...)
D:/a/1/s/pkg/appinsights.go:45 +0x465
github.com/Azure/batch-insights/pkg.ListenForStats(0xc0000140b0, 0xe, 0xc00000e140, 0x48, 0xc00000c150, 0x24)
D:/a/1/s/pkg/batchinsights.go:68 +0x353
main.main()
D:/a/1/s/main.go:32 +0x321

The pool was created with the cloud service configuration, and I am sure I have provided the correct environment variables in the pool start task's environment settings:
new EnvironmentSetting("APP_INSIGHTS_INSTRUMENTATION_KEY", "xxx"),
new EnvironmentSetting("APP_INSIGHTS_APP_ID", "xxx"),
new EnvironmentSetting("BATCH_INSIGHTS_DOWNLOAD_URL", "https://github.com/Azure/batch-insights/releases/download/v1.0.0/batch-insights.exe")

I'm stupid but

--2021-02-19 18:43:33-- https://raw.githubusercontent.com/Azure/batch-insights/master/scripts/1.x/run-linux.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 169 [text/plain]
Saving to: 'STDOUT'

 0K                                                       100% 18.5M=0s

2021-02-19 18:43:33 (18.5 MB/s) - written to stdout [169/169]

http://: Invalid host name.

Add support for disk usage

Should get disk usage for os disk and resource disk by default.
Maybe provide a way to specify which other disks to watch.
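Here is one way the proposed disk-usage metric could be sampled, sketched in Python with the standard library's shutil.disk_usage. Which paths correspond to the OS and resource disks depends on the image, so the caller passes them in rather than the sketch guessing them. disk_usage_report is a hypothetical helper, not part of batch-insights.

```python
import shutil


def disk_usage_report(paths):
    """Return size, used bytes and percent used for each path (hypothetical helper).

    Callers pass the mount points to watch, e.g. the OS disk root and the
    resource disk mount point, which vary by image.
    """
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        report[path] = {
            "total_bytes": usage.total,
            "used_bytes": usage.used,
            "percent_used": 100.0 * usage.used / usage.total if usage.total else 0.0,
        }
    return report
```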

Confusing CPU stats showing in Batch Labs

What is the CPU usage showing in BatchLabs? I have 4 dedicated nodes each with 16 cores and BatchLabs shows the following:

[Screenshot: CPU usage in BatchLabs]

I don't understand what the 10 CPUs are referring to.

Batch Insight crash if instrumentation key is missing

Hi, any ideas why it does not work on Ubuntu 16.04?

Here is a log:

2019-03-15 10:41:49 (12.1 MB/s) - './batch-insights' saved [7474080/7474080]

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x6719b9]

goroutine 1 [running]:
github.com/Azure/batch-insights/pkg.hideSecret(...)
	/home/vsts/work/1/s/pkg/config.go:148
github.com/Azure/batch-insights/pkg.UserConfig.Print(0xc000010d40, 0xc000010d50, 0x0, 0xc000010da0, 0x1, 0x1, 0xc000018578, 0xc000010db0, 0x1, 0x1)
	/home/vsts/work/1/s/pkg/config.go:32 +0x159
main.main()
	/home/vsts/work/1/s/main.go:94 +0x43e
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 

BatchLabs does not display metrics

Hi,

I have added the script to a Windows DSVM using C#:

var applicationInsightsId = Environment.GetEnvironmentVariable("APP_INSIGHTS_APP_ID");
var applicationInsightsInstrumentationKey = Environment.GetEnvironmentVariable("APPINSIGHTS_INSTRUMENTATIONKEY");
var environmentSettings = new List<EnvironmentSetting>
{
    new EnvironmentSetting("APP_INSIGHTS_APP_ID", applicationInsightsId),
    new EnvironmentSetting("APP_INSIGHTS_INSTRUMENTATION_KEY", applicationInsightsInstrumentationKey)
};

// Adding Application Insights to Azure Batch.
// See also https://github.com/Azure/batch-insights.
var startTask = new StartTask
{
    CommandLine = "cmd /c @\"%SystemRoot%\\System32\\WindowsPowerShell\\v1.0\\powershell.exe\" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command \"iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Azure/batch-insights/master/windows.ps1'))\"",
    UserIdentity = new UserIdentity(new AutoUserSpecification(elevationLevel: ElevationLevel.Admin)),
    WaitForSuccess = true,
    EnvironmentSettings = environmentSettings
};

pool.StartTask = startTask;

I can see that the script ran successfully:

Getting latest version of the Chocolatey package for download.
Getting Chocolatey from https://chocolatey.org/api/v2/package/chocolatey/0.10.8.
Downloading 7-Zip commandline tool prior to extraction.
Extracting D:\Users\_azbatchtask_start\AppData\Local\Temp\chocolatey\chocInstall\chocolatey.zip to D:\Users\_azbatchtask_start\AppData\Local\Temp\chocolatey\chocInstall...
Installing chocolatey on this machine
Creating ChocolateyInstall as an environment variable (targeting 'Machine') 
  Setting ChocolateyInstall to 'C:\ProgramData\chocolatey'
WARNING: It's very likely you will need to close and reopen your shell 
  before you can use choco.
Restricting write permissions to Administrators
We are setting up the Chocolatey package repository.
The packages themselves go to 'C:\ProgramData\chocolatey\lib'
  (i.e. C:\ProgramData\chocolatey\lib\yourPackageName).
A shim file for the command line goes to 'C:\ProgramData\chocolatey\bin'
  and points to an executable in 'C:\ProgramData\chocolatey\lib\yourPackageName'.

Creating Chocolatey folders if they do not already exist.

WARNING: You can safely ignore errors related to missing log files when 
  upgrading from a version of Chocolatey less than 0.9.9. 
  'Batch file could not be found' is also safe to ignore. 
  'The system cannot find the file specified' - also safe.
chocolatey.nupkg file not installed in lib.
 Attempting to locate it from bootstrapper.
PATH environment variable does not have C:\ProgramData\chocolatey\bin in it. Adding...
WARNING: Not setting tab completion: Profile file does not exist at 
'D:\Users\_azbatchtask_start\Documents\WindowsPowerShell\Microsoft.PowerShell_p
rofile.ps1'.
Chocolatey (choco.exe) is now ready.
You can call choco from anywhere, command line or powershell by typing choco.
Run choco /? for a list of functions.
You may need to shut down and restart powershell and/or consoles
 first prior to using choco.
Ensuring chocolatey commands are on the path
Ensuring chocolatey.nupkg is in the lib folder
Chocolatey v0.10.8
Installing the following packages:
python
By installing you accept licenses for the packages.

Progress: Downloading python3 3.6.3... 13%
Progress: Downloading python3 3.6.3... 39%
Progress: Downloading python3 3.6.3... 66%
Progress: Downloading python3 3.6.3... 93%
Progress: Downloading python3 3.6.3... 100%

Progress: Downloading python 3.6.3... 20%
Progress: Downloading python 3.6.3... 63%
Progress: Downloading python 3.6.3... 100%

python3 v3.6.3 [Approved]
python3 package files install completed. Performing other installation steps.
Downloading python3 64 bit
  from 'https://www.python.org/ftp/python/3.6.3/python-3.6.3-amd64.exe'

Progress: 0% - Saving 26.78 KB of 30.16 MB
Progress: 0% - Saving 160 KB of 30.16 MB
Progress: 1% - Saving 320 KB of 30.16 MB
Progress: 1% - Saving 480 KB of 30.16 MB
And some time later:
Progress: 99% - Saving 30 MB of 30.16 MB
Progress: 100% - Saving 30.16 MB of 30.16 MB
Progress: 100% - Completed download of D:\Users\_azbatchtask_start\AppData\Local\Temp\chocolatey\python3\3.6.3\python-3.6.3-amd64.exe (30.16 MB).
Download of python-3.6.3-amd64.exe (30.16 MB) completed.
Hashes match.
Installing python3...
python3 has been installed.
Installed to 'C:\Python36'
  python3 can be automatically uninstalled.
Environment Vars (like PATH) have changed. Close/reopen your shell to
 see the changes (or in powershell/cmd.exe just type `refreshenv`).
 The install of python3 was successful.
  Software installed as 'EXE', install location is likely default.

python v3.6.3 [Approved]
python package files install completed. Performing other installation steps.
 The install of python was successful.
  Software install location not explicitly set, could be in package or 
  default install location if installer.

Chocolatey installed 2/2 packages. 
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).

Enjoy using Chocolatey? Explore more amazing features to take your 
experience to the next level at
 https://chocolatey.org/compare
Current path: C:\Python36\Scripts\;C:\Python36\;C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin;C:\Anaconda;C:\Anaconda\Library\mingw-w64\bin;C:\Anaconda\Library\usr\bin;C:\Anaconda\Library\bin;C:\Anaconda\Scripts;C:\Program Files\Microsoft MPI\Bin\;C:\Program Files\Microsoft SQL Server\110\Tools\Binn;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Microsoft SQL Server\130\DTS\Binn\;C:\Program Files\Microsoft SQL Server\130\DTS\Binn\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\Microsoft SQL Server\120\Tools\Binn\;C:\Program Files\Microsoft\Web Platform Installer\;C:\Program Files (x86)\Microsoft SDKs\Azure\CLI\wbin;C:\dsvm\tools\mxnet\\lib;C:\dsvm\tools\mxnet\\3rdparty\cudnn\bin;C:\dsvm\tools\mxnet\\3rdparty\cudart;C:\dsvm\tools\mxnet\\3rdparty\vc;C:\dsvm\tools\mxnet\\3rdparty\gnuwin;C:\dsvm\tools\mxnet\\3rdparty\openblas\bin;C:\dsvm\tools\bin;C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy;C:\dsvm\tools\DataMovement\ADL;C:\dsvm\tools\DataMovement\DocumentDB;C:\Program Files\Microsoft\R Server\R_SERVER\bin\x64;C:\Program Files\Zulu\zulu8.17.0.3-jdk8.0.102-win_x64\jre\bin;c:\dsvm\tools\cntk2\cntk;C:\Program Files (x86)\Pandoc\;c:\Program Files\Zulu\zulu8.17.0.3-jdk8.0.102-win_x64\jre\bin\server;C:\JuliaPro-0.5.0.2\Julia-0.5.0\bin;C:\Program Files\dotnet\;C:\Program Files\nodejs\;C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\;C:\Program Files\Git\cmd;C:\Program Files\Git\usr\bin;C:\ProgramData\chocolatey\bin;;C:\Program Files (x86)\Microsoft 
SDKs\Azure\CLI2\wbin;C:\Anaconda;C:\Anaconda\Library\mingw-w64\bin;C:\Anaconda\Library\usr\bin;C:\Anaconda\Library\bin;C:\Anaconda\Scripts;C:\Program Files\Microsoft MPI\Bin\;C:\Program Files\Microsoft SQL Server\110\Tools\Binn;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Microsoft SQL Server\130\DTS\Binn\;C:\Program Files\Microsoft SQL Server\130\DTS\Binn\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\Microsoft SQL Server\120\Tools\Binn\;C:\Program Files\Microsoft\Web Platform Installer\;C:\Program Files (x86)\Microsoft SDKs\Azure\CLI\wbin;C:\dsvm\tools\mxnet\\lib;C:\dsvm\tools\mxnet\\3rdparty\cudnn\bin;C:\dsvm\tools\mxnet\\3rdparty\cudart;C:\dsvm\tools\mxnet\\3rdparty\vc;C:\dsvm\tools\mxnet\\3rdparty\gnuwin;C:\dsvm\tools\mxnet\\3rdparty\openblas\bin;C:\dsvm\tools\bin;C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy;C:\dsvm\tools\DataMovement\ADL;C:\dsvm\tools\DataMovement\DocumentDB;C:\Program Files\Microsoft\R Server\R_SERVER\bin\x64;C:\Program Files\Zulu\zulu8.17.0.3-jdk8.0.102-win_x64\jre\bin;c:\dsvm\tools\cntk2\cntk;C:\Program Files (x86)\Pandoc\;c:\Program Files\Zulu\zulu8.17.0.3-jdk8.0.102-win_x64\jre\bin\server;C:\JuliaPro-0.5.0.2\Julia-0.5.0\bin;C:\Program Files\dotnet\;C:\Program Files\nodejs\;C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\;C:\Program Files\Git\cmd;C:\Program Files\Git\usr\bin;D:\batch\tasks\shared;D:\batch\tasks\startup\wd
Python version:
Python 3.6.3
Collecting psutil
  Downloading psutil-5.4.3-cp36-cp36m-win_amd64.whl (226kB)
Collecting python-dateutil
  Downloading python_dateutil-2.7.0-py2.py3-none-any.whl (207kB)
Collecting applicationinsights
  Downloading applicationinsights-0.11.2.tar.gz (45kB)
Collecting six>=1.5 (from python-dateutil)
  Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: psutil, six, python-dateutil, applicationinsights
  Running setup.py install for applicationinsights: started
    Running setup.py install for applicationinsights: finished with status 'done'
Successfully installed applicationinsights-0.11.2 psutil-5.4.3 python-dateutil-2.7.0 six-1.11.0
Downloading nodestats.py
Starting App insights background process in D:\batch\tasks\startup\wd

TaskPath                                       TaskName                        
--------                                       --------                        
\                                              batchappinsights                
\                                              batchappinsights                

But when I try to view the metrics in BatchLabs, they do not appear:
[Screenshot: BatchLabs with no metrics displayed]

How can I troubleshoot the problem?

Ambiguous wording in Readme

In the section on setting up environment variables, it's unclear whether APP_INSIGHTS_APP_ID is meant to be set to the resource's "ApplicationId" or "AppId" property. This caused me (and possibly others who have filed issues) to get no monitoring data after following the instructions, because I guessed and initially chose the wrong one.

CPU consumption showing different values in Batch Explorer and App Insights, and a lock issue on the node after every job run

Hi Team,
While processing data, the per-node CPU consumption shown in Batch Explorer is around 40%, while the telemetry shows almost 100%.

I face this issue every time, and I have to reboot the node after every job.

--2020-10-21 10:15:28-- https://raw.githubusercontent.com/Azure/batch-insights/master/ubuntu.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.56.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.56.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 283 [text/plain]
Saving to: 'STDOUT'

 0K                                                       100% 16.2M=0s

2020-10-21 10:15:29 (16.2 MB/s) - written to stdout [283/283]

E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?

Pre-aggregate metrics to reduce cost

Uploading all of those metrics every 5 seconds from a large pool can result in a high Application Insights cost. We can aggregate data locally with a default window of 1 minute (the minimum granularity at which Application Insights aggregates anyway).
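The aggregation idea can be sketched in Python. Samples are taken every 5 seconds (the tool logs a sampling rate of 5000000000 ns). They are grouped into 1-minute windows and reduced to count, min, max and average before upload. This is an illustration, not the tool's Go implementation, and the choice of summary fields is an assumption.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, window_seconds=60):
    """Collapse (timestamp_seconds, value) samples into per-window summaries.

    With 5-second sampling, each 1-minute window holds up to 12 samples that
    become a single record, so roughly 12x fewer data points are uploaded.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window.
        buckets[int(ts // window_seconds) * window_seconds].append(value)
    return [
        {
            "window_start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
        }
        for start, values in sorted(buckets.items())
    ]
```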

How Do I view Batch-Insights metrics from the Azure Portal?

From the ReadMe:

Option 2:
Use the app insights tools to build your own query on the Azure Portal

Is there any documentation anywhere on how exactly to do that? Batch-Explorer is a nice tool, but the graphs aren't very flexible and the scaling is a little wonky. AppInsights in the Azure Portal seems to have a lot more options, but I can't seem to find a way to get to the data logged by the Batch-Insights tool.

batch-insights running but no metrics available in both BatchExplorer and Application Insights

Use case

We have dynamic pools of one server each that we use to run applications weekly.
We want to enable Application Insights for such servers but also have the metrics in BatchExplorer available.

Setup

I have a pool with the Start Task configured like this:
[Screenshot: start task configuration]

Inside ./scripts/start_task.sh I have a few commands to prepare the pool, and at the end:
[Screenshot: end of start_task.sh]

I mostly followed README.md, except that the run command is inside my custom script.
The value of BATCH_INSIGHTS_DOWNLOAD_URL is https://github.com/Azure/batch-insights/releases/download/v1.3.0/batch-insights.

Issue

There are no values in either BatchExplorer or Application Insights.

Diagnostic attempts

I looked into the contents of batch-insights.log:

User configuration:
   Pool ID: xxxxx
   Node ID: xxxxx
   Instrumentation Key: xxxxx
   Aggregation: 1
   Disable: []
   Monitoring processes: []
BatchInsights configuration:
   Pool ID: xxxxx
   Node ID: xxxxx
   Instrumentation Key: xxxxx
   Aggregation: 1m0s
   Sampling rate: 5000000000
   Disable: {"diskIO":false,"diskUsage":false,"networkIO":false,"gpu":false,"cpu":false,"memory":false}
   Monitoring processes: []
System information:
   OS: linux
No GPU detected. Nvidia driver might be missing. Error while initializing NVML could not load NVML library

If I ssh to the node I see batch-insights running:
[Screenshot: batch-insights process running on the node]

I am using ubuntuserver 18_04-lts-gen2 but I also tested with 20.04 with the same result.

Error when executing the ubuntu script

Here is the error I'm getting when running the ubuntu.sh script in a start task on a pool deployed from a custom image.

E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
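Both dpkg lock errors in these issues say that another process is holding the apt/dpkg lock; on a freshly booted node this is possibly an automatic update run. A common workaround is to retry while the lock is busy. Below is a Python sketch of that approach.

It is a hypothetical helper, not part of ubuntu.sh. It only matches the lock messages quoted in these issues, and it does not remove the underlying contention.

```python
import subprocess
import time

# Lock messages quoted in the issues above.
LOCK_MESSAGES = (
    "Could not get lock",
    "Unable to acquire the dpkg frontend lock",
    "Unable to lock the administration directory",
)


def run_with_lock_retry(cmd, attempts=10, delay_seconds=15, sleep=time.sleep):
    """Run cmd, retrying only while apt/dpkg reports that its lock is held."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Stop on success or when out of attempts.
        if result.returncode == 0 or attempt == attempts:
            return result
        # Any failure other than a busy lock is returned immediately.
        if not any(msg in result.stderr for msg in LOCK_MESSAGES):
            return result
        sleep(delay_seconds)
```

For example, run_with_lock_retry(["apt-get", "update"]) would wait up to about two and a half minutes for the lock with the defaults.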

Register-ScheduledTask : Access is denied

Hi,

I am getting the errors below when running "https://raw.githubusercontent.com/Azure/batch-insights/master/scripts/1.x/run-windows.ps1" from the start task.

At line:29 char:1
+ Register-ScheduledTask -Action $action -Principal $principal -TaskNam ...
    + CategoryInfo          : PermissionDenied: (PS_ScheduledTask:Root/Microsoft/...S_ScheduledTask) [Register-ScheduledTask], CimException
    + FullyQualifiedErrorId : HRESULT 0x80070005,Register-ScheduledTask

Start-ScheduledTask : The system cannot find the file specified.
At line:31 char:1
+ Start-ScheduledTask -TaskName "batchappinsights";
    + CategoryInfo          : ObjectNotFound: (PS_ScheduledTask:Root/Microsoft/...S_ScheduledTask) [Start-ScheduledTask], CimException
    + FullyQualifiedErrorId : HRESULT 0x80070002,Start-ScheduledTask

Get-ScheduledTask : No MSFT_ScheduledTask objects found with property 'TaskName' equal to 'batchappinsights'. Verify
the value of the property and retry.
At line:32 char:1
+ Get-ScheduledTask -TaskName "batchappinsights";
    + CategoryInfo          : ObjectNotFound: (batchappinsights:String) [Get-ScheduledTask], CimJobException
    + FullyQualifiedErrorId : CmdletizationQuery_NotFound_TaskName,Get-ScheduledTask

Regards,
Anshul Gupta

Misleading documentation

This is probably not an issue with this project but with BatchLabs (Azure/BatchExplorer#1348). However, it might be confusing on both sides, and I think people will look for a solution here as well, as I did, without finding it easily.

In BatchLabs, before I configured App Insights for my Windows pool, there was documentation describing how it should be configured (copied here without any modification):

  1. Add APP_INSIGHTS_KEY environment variable in the start task(App insights instrumentation key)
  2. Add APP_INSIGHTS_APP_ID environment variable in the start task(App insights application ID)
  3. Add os specific one liner in your start task see doc (https://github.com/Azure/batch-insights#ubuntu)

I did everything as described, but App Insights wasn't working. Only after investigating windows.ps1 did I see that the environment variable should be called APP_INSIGHTS_INSTRUMENTATION_KEY, not APP_INSIGHTS_KEY.
It seems like a minor issue, but issues like this are quite hard to spot.

For non-Windows setups it might not be an issue, because the tool is launched differently there and nodestats.py supports both variants.
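The reporter notes that nodestats.py accepts both variable names, while windows.ps1 reads only APP_INSIGHTS_INSTRUMENTATION_KEY. A launcher that wants the same tolerance could resolve the key as in the Python sketch below.

A few parts are assumptions:
  • resolve_instrumentation_key is a hypothetical helper.
  • Preferring the README's name over the older one is a choice made here.
  • This is not how nodestats.py itself is written.

```python
import os

# README name first, then the older name quoted from the BatchLabs
# instructions; the preference order is an assumption.
KEY_VARIABLES = ("APP_INSIGHTS_INSTRUMENTATION_KEY", "APP_INSIGHTS_KEY")


def resolve_instrumentation_key(env=None):
    """Return (variable_name, key) for the first non-empty candidate variable."""
    env = os.environ if env is None else env
    for name in KEY_VARIABLES:
        value = env.get(name)
        if value:
            return name, value
    raise KeyError("set APP_INSIGHTS_INSTRUMENTATION_KEY in the start task environment")
```

Returning the variable name alongside the key lets a launcher log which name was used, which makes a misnamed variable easier to spot.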

No metrics shown in batch explorer and app insights

Batch configured, but metrics are not collected

start\stdout.log

SUCCESS: Specified value was saved.
Starting App insights background process in D:\batch\tasks\startup\wd

TaskPath                                       TaskName                          State     
--------                                       --------                          -----     
\                                              batchappinsights                  Ready     
\                                              batchappinsights                  Running   

startup\wd\nodestats.log

WARN Using postional arguments for Node ID, PoolID, KEY and  Process names is deprecated. Use --poolID, --nodeID, --instKey, --process
WARN It will be removed in 2.0.0
User configuration:
   Pool ID: psl1
   Node ID: tvmps_******
   Instrumentation Key: xxxxx
   Aggregation: 1
   Disable: []
   Monitoring processes: [--poolID]
BatchInsights configuration:
   Pool ID: psl1
   Node ID: tvmps_966f3398e2dcc0f5778aa0650e26720476375e2eebd4dddbb14d1bbd4bc80110_d
   Instrumentation Key: xxxxx
   Aggregation: 1m0s
   Sampling rate: 5000000000
   Disable: {"diskIO":false,"diskUsage":false,"networkIO":false,"gpu":false,"cpu":false,"memory":false}
   Monitoring processes: [--poolID]
System information:
   OS: windows
No GPU detected. Nvidia driver might be missing. Error while initializing NVML NVIDIA driver is not loaded

Batch configuration

[Screenshots: Batch pool configuration]

Hitting chocolatey rate limits

I'm getting this error when running the startup command on windows low priority nodes

Exception calling "DownloadFile" with "2" argument(s): "The remote server returned an error: (429) Too Many Requests."
At line:167 char:3
+   $downloader.DownloadFile($url, $file)
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : WebException
 
ERROR: The system cannot find the file specified.
D:\Users\PoolAdmin1469656965\AppData\Local\Temp\chocolatey\chocInstall\chocolatey.zip

Looking at the Chocolatey troubleshooting docs, I see that there are per-IP rate limits, which appear to be the cause of this error.

Is there a solution to this?
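The README describes the newer run-windows.ps1 script as only downloading the executable. Moving off the Python-based windows.ps1 may therefore avoid Chocolatey entirely. Where a start task still has to download something, retrying HTTP 429 responses with exponential backoff and jitter is the usual mitigation.

The sketch below shows that technique in Python. It cannot be dropped into windows.ps1 as-is. download_with_backoff is a hypothetical helper; its fetch and sleep parameters are injectable so the retry logic can be tested.

```python
import random
import time
import urllib.error
import urllib.request


def download_with_backoff(url, dest, attempts=5, base_delay=2.0,
                          sleep=time.sleep, fetch=urllib.request.urlretrieve):
    """Download url to dest, backing off exponentially on HTTP 429 responses.

    A numeric Retry-After header, when present, overrides the computed delay.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(url, dest)
        except urllib.error.HTTPError as err:
            # Give up on other errors, or after the last attempt.
            if err.code != 429 or attempt == attempts - 1:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                # Jitter spreads retries so many nodes don't retry in lockstep.
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```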
