insights's Issues

Update package manifest record to split tags

The PackageManifestRecord contains raw package tags as a string. Ideally these should be pre-processed into a parsed list for easier analysis. Currently you need to split tags with something like:

JverPackageManifests 
| where ResultType == "Available"
| extend Tags = translate(
    "  ", " ", translate(
    ",", " ", translate(
    ";", " ", translate(
    "\t", " ", Tags))))
| mv-expand split(Tags, " ")
| where isnotempty(Tags)

Tag splitting algorithm: https://github.com/NuGet/NuGet.Jobs/blob/376ac06e6e07d4ee8d1e28f6b2346e3891487496/src/Catalog/Helpers/Utils.cs#L37-L48
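
As a rough illustration of the pre-processing this issue asks for, the Python sketch below splits a raw tag string on the delimiters the query above normalizes (space, comma, semicolon, tab) and drops empty entries. The name `split_tags` is hypothetical, and the delimiter set is an assumption taken from the query rather than from the linked NuGet.Jobs code, which may behave differently.

```python
import re

# Delimiters assumed from the Kusto query above: space, comma, semicolon, tab.
_TAG_DELIMITERS = re.compile(r"[ ,;\t]+")


def split_tags(raw_tags):
    """Split a raw tag string into a list of non-empty tags (hypothetical helper)."""
    if not raw_tags:
        return []
    # re.split can yield empty strings at the edges, so filter them out.
    return [tag for tag in _TAG_DELIMITERS.split(raw_tags) if tag]


print(split_tags("json, serializer;  linq\tdotnet"))
```

Storing the parsed list alongside the raw string would let queries expand tags directly instead of repeating the delimiter handling.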

Transient network error leads to the wrong return value when releasing the Blob lease

Version

18d694c

Bug description

We found that transient network errors (e.g., a network timeout or packet drop) may occur when releasing a blob lease via the ReleaseAsync SDK API, and these transient errors are not handled.

More precisely, if the release has actually taken effect on the server but the response to leaseClient.ReleaseAsync() does not reach the client in time because of a transient network error, the retry logic in the Azure Storage SDK sends the release request again. Since the first request already released the lease on the server, the second request fails with a Conflict error.

The default value of shouldThrow is false in TryReleaseAsync, so the catch block returns false without any further handling. However, the operation actually succeeded, so it should return true instead. The following lines of code show the details:

...
    try
    {
        await leaseClient.ReleaseAsync();
        return true;
    }
    catch (RequestFailedException ex) when (ex.Status == (int)HttpStatusCode.Conflict)
    {
        if (shouldThrow)
        {
            throw new InvalidOperationException(StorageLeaseResult.AcquiredBySomeoneElse, ex);
        }
        else
        {
            return false;
...

How to reproduce

As shown above, when the first release request sent by leaseClient.ReleaseAsync() in TryReleaseAsync succeeds on the server but a transient error drops the response, the retried request results in a Conflict error, which is handled incorrectly.

Discussion

We should distinguish between errors caused by contention and errors caused by transient failures. For transient ones, we can ignore the exception and return true. We believe a similar bug occurs when shouldThrow is true, because InvalidOperationException should not be thrown when a transient error happens.
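
One possible way to make that distinction is sketched below in Python, under the assumption that the caller controls the retry loop itself and can therefore see the transient failure (with SDK-internal retries this signal is not directly visible). `ConflictError`, `TransientError`, `FakeLeaseServer`, and `try_release` are all hypothetical names for illustration, not part of any Azure SDK.

```python
class ConflictError(Exception):
    """Stand-in for an HTTP 409 Conflict response (hypothetical type)."""


class TransientError(Exception):
    """Stand-in for a timeout or dropped response (hypothetical type)."""


def try_release(send_release, max_attempts=3):
    """Return True if the lease is believed released, False on real contention."""
    saw_transient_error = False
    for _ in range(max_attempts):
        try:
            send_release()
            return True
        except TransientError:
            # The request may or may not have been applied; remember and retry.
            saw_transient_error = True
        except ConflictError:
            # Heuristic: a conflict right after a lost response most likely
            # means the earlier attempt already released the lease.
            return saw_transient_error
    raise TransientError("release did not complete after retries")


class FakeLeaseServer:
    """Toy server that applies the first release but drops its response."""

    def __init__(self, leased=True, drop_first_response=True):
        self.leased = leased
        self.drop_next_response = drop_first_response

    def release(self):
        if not self.leased:
            raise ConflictError()
        self.leased = False
        if self.drop_next_response:
            self.drop_next_response = False
            raise TransientError()


server = FakeLeaseServer()
print(try_release(server.release))
```

This remains a heuristic: a conflict after a retry could still be genuine contention, for example if another client acquired the lease in between, so a real fix would probably need to inspect the lease state or error details as well.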

SAS token may expire before the Azure Blob operation finishes

Description

For the Blob service, GetUserDelegationKeyAsync is used in GetServiceClientsAsync to get the user delegation key, which is then used to sign the SAS token in GetBlobReadUrlAsync via BlobSasBuilder.

However, I noticed that the token expires after 1 hour and is refreshed at the halfway point (i.e., 30 minutes), so I wonder whether a single operation could take longer than 30 minutes. I think IKustoQueuedIngestClient.IngestFromStorageAsync uses the Blob SAS token to access the remote data. If a large file takes more than 30 minutes to ingest, it could encounter an unexpected HTTP 403 Forbidden error, right?
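
The arithmetic behind the concern can be sketched as follows. This Python snippet only models the numbers stated above (1-hour lifetime, refresh after 30 minutes); the function names are hypothetical, and whether the service checks the token only when the operation starts or throughout it is an open assumption here.

```python
from datetime import timedelta


def min_remaining_validity(lifetime, refresh_after):
    """Worst-case validity left on a token handed out just before a refresh."""
    return lifetime - refresh_after


def may_expire_mid_operation(
    operation_duration,
    lifetime=timedelta(hours=1),
    refresh_after=timedelta(minutes=30),
):
    """True if an operation started with the oldest possible token could outlive it."""
    return operation_duration > min_remaining_validity(lifetime, refresh_after)


print(min_remaining_validity(timedelta(hours=1), timedelta(minutes=30)))
print(may_expire_mid_operation(timedelta(minutes=45)))
```

Under these assumptions, any operation longer than the gap between the refresh point and the expiry is at risk, which is where the 30-minute figure comes from.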

Test integration with Azurite and update docs

Per @khalidabuhakmeh:
https://twitter.com/buhakmeh/status/1394990191740346370

Blocked by:

Intermittent transient errors trigger retry mechanism of Azure SDK APIs and lead to Conflict/NotFound errors

Version

18d694c

Description

We found that transient network errors (e.g., a network timeout or packet drop) may occur during the execution of Azure SDK APIs, and the SDK's retry mechanism can then lead to Conflict/Not Found errors. We found several places where these errors may happen: AddEntity, DeleteEntity, UpdateEntity, AddEntity, UpdateEntity, UpdateEntity, UpdateEntity.

How to reproduce

More precisely, if the operation has actually taken effect on the server but the response does not reach the client in time because of a transient network error, the retry logic in the Azure Storage SDK sends the request again. Since the first operation already succeeded on the server, the second request fails with a Conflict or Not Found error: DeleteEntity results in a 404, and AddEntity results in a 409.

Discussion

A better practice could be to wrap these APIs in try-catch blocks and handle RequestFailedException the same way the lease acquisition code does. We are willing to contribute this, but we are not sure how to do it elegantly. Is it acceptable to wrap each of them in a try-catch block?
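
Instead of a separate try-catch at every call site, one option is a single shared wrapper. The Python sketch below illustrates the idea with the two status codes named above (409 for add, 404 for delete); `RequestFailed`, `BENIGN_STATUS`, and `run_tolerating_replay` are hypothetical names, and the project's actual storage abstractions may call for a different shape.

```python
class RequestFailed(Exception):
    """Stand-in for the SDK's RequestFailedException (hypothetical Python type)."""

    def __init__(self, status):
        super().__init__(f"request failed with status {status}")
        self.status = status


# Status codes that may simply mean "an earlier attempt already succeeded".
BENIGN_STATUS = {"add": 409, "delete": 404}


def run_tolerating_replay(kind, operation):
    """Run operation(); report 'already-applied' when the failure looks like a replay."""
    try:
        operation()
        return "ok"
    except RequestFailed as ex:
        if BENIGN_STATUS.get(kind) == ex.status:
            return "already-applied"
        raise


# Toy in-memory table to exercise the wrapper.
table = {"row1": {"value": 1}}


def delete_row1():
    if "row1" not in table:
        raise RequestFailed(404)
    del table["row1"]


print(run_tolerating_replay("delete", delete_row1))
print(run_tolerating_replay("delete", delete_row1))
```

Note that a 409 on add can also mean the entity genuinely existed beforehand, so callers that care about the difference would still need to verify the stored entity.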

Add readme content to Kusto

Things to ensure:

  • Legacy readme changes get picked up from time to time
  • Readme content might be too big sometimes
  • Add projected text content for easy searching

Fix PackageAssemblyToCsv handling of non-Int Enums

See dotnet/runtime#57531 (comment) for details.

The code in question is here:

private class TypelessDecoder : ICustomAttributeTypeProvider<object>
{
    private int _arrayCount;

    public int ArrayCount => _arrayCount;

    public object GetPrimitiveType(PrimitiveTypeCode typeCode) => null;
    public object GetSystemType() => null;

    public object GetSZArrayType(object elementType)
    {
        Interlocked.Increment(ref _arrayCount);
        return null;
    }

    public object GetTypeFromDefinition(MetadataReader reader, TypeDefinitionHandle handle, byte rawTypeKind) => null;
    public object GetTypeFromReference(MetadataReader reader, TypeReferenceHandle handle, byte rawTypeKind) => null;
    public object GetTypeFromSerializedName(string name) => null;

    // Every enum is assumed to be Int32-backed here, which is the problem this issue describes.
    public PrimitiveTypeCode GetUnderlyingEnumType(object type) => PrimitiveTypeCode.Int32;
    public bool IsSystemType(object type) => false;
}
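
To illustrate why a fixed Int32 assumption matters, the Python sketch below decodes a small hand-made byte blob in which an enum value is stored using its underlying type, which is my understanding of how attribute arguments are encoded. The helper `read_enum_value` and the blob layout are hypothetical; this is not the System.Reflection.Metadata decoder.

```python
import struct

# Little-endian struct formats per underlying type (subset, for illustration).
_FORMATS = {"byte": "<B", "int16": "<h", "int32": "<i", "int64": "<q"}


def read_enum_value(blob, offset, underlying_type):
    """Read one enum value and return (value, next_offset)."""
    fmt = _FORMATS[underlying_type]
    (value,) = struct.unpack_from(fmt, blob, offset)
    return value, offset + struct.calcsize(fmt)


# A byte-backed enum value (2) followed by an Int32 argument (7).
blob = bytes([2]) + struct.pack("<i", 7)

# Correct: read one byte, then the Int32 starts at offset 1.
print(read_enum_value(blob, 0, "byte"))

# Incorrect: treating the enum as Int32 swallows part of the next argument.
print(read_enum_value(blob, 0, "int32"))
```

With the wrong width, both the enum value and the position of every following argument come out wrong, which is consistent with the failure mode discussed in the linked comment.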

Support macOS Builds

Currently running dotnet restore and dotnet build on macOS ends with the following failure.

➜ dotnet build
Microsoft (R) Build Engine version 16.10.0-preview-21181-07+073022eb4 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  All projects are up-to-date for restore.
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  SourceGenerator -> /Users/khalidabuhakmeh/Projects/Dotnet/Insights/artifacts/SourceGenerator/bin/Debug/netstandard2.0/NuGet.Insights.SourceGenerator.dll
  Logic -> /Users/khalidabuhakmeh/Projects/Dotnet/Insights/artifacts/Logic/bin/Debug/netcoreapp3.1/NuGet.Insights.Logic.dll
/Users/khalidabuhakmeh/Projects/Dotnet/Insights/src/Worker.Logic/CatalogScan/Drivers/NuGetPackageExplorerToCsv/NuGetPackageExplorerFileRecord.cs(5,15): error CS0234: The type or namespace name 'AssemblyMetadata' does not exist in the namespace 'NuGetPe' (are you missing an assembly reference?) [/Users/khalidabuhakmeh/Projects/Dotnet/Insights/src/Worker.Logic/Worker.Logic.csproj]
/Users/khalidabuhakmeh/Projects/Dotnet/Insights/src/Worker.Logic/CatalogScan/Drivers/NuGetPackageExplorerToCsv/NuGetPackageExplorerFileRecord.cs(42,16): error CS0246: The type or namespace name 'PdbType' could not be found (are you missing a using directive or an assembly reference?) [/Users/khalidabuhakmeh/Projects/Dotnet/Insights/src/Worker.Logic/Worker.Logic.csproj]
  Logic.Test -> /Users/khalidabuhakmeh/Projects/Dotnet/Insights/artifacts/Logic.Test/bin/Debug/netcoreapp3.1/NuGet.Insights.Logic.Test.dll

Build FAILED.

/Users/khalidabuhakmeh/Projects/Dotnet/Insights/src/Worker.Logic/CatalogScan/Drivers/NuGetPackageExplorerToCsv/NuGetPackageExplorerFileRecord.cs(5,15): error CS0234: The type or namespace name 'AssemblyMetadata' does not exist in the namespace 'NuGetPe' (are you missing an assembly reference?) [/Users/khalidabuhakmeh/Projects/Dotnet/Insights/src/Worker.Logic/Worker.Logic.csproj]
/Users/khalidabuhakmeh/Projects/Dotnet/Insights/src/Worker.Logic/CatalogScan/Drivers/NuGetPackageExplorerToCsv/NuGetPackageExplorerFileRecord.cs(42,16): error CS0246: The type or namespace name 'PdbType' could not be found (are you missing a using directive or an assembly reference?) [/Users/khalidabuhakmeh/Projects/Dotnet/Insights/src/Worker.Logic/Worker.Logic.csproj]
    0 Warning(s)
    2 Error(s)

Index VS Code extensions

The NuGet Insights project lets us understand the .NET ecosystem. This has been a huge win for the NuGet team.

We should also index VS Code extensions to better understand VS Code's ecosystem.

Add table for package signing certificate chains

Add a table to answer questions like:

  • How many package signatures use a certificate that is revoked? What is the revocation date? How many packages are affected?
  • How many package signatures use a certificate that chains to an untrusted root on Windows?
  • How many package signatures use a certificate that is invalid?

This table should contain all certificates used to author sign, repository sign, or timestamp packages. This certificate data should be joinable against JverPackageSignatures.

Consider reusing: https://github.com/NuGet/NuGet.Jobs/blob/main/src/Validation.PackageSigning.ValidateCertificate/OnlineCertificateVerifier.cs

Improve recovery from Kusto query validation steps

From time to time the ingestion pipeline gets blocked because a Kusto validation query fails. Example:

A Kusto validation query failed.
Validation label: full outer set comparison of NiCatalogLeafItems_Temp.Identity and NiPackageSignatures_Temp.Identity
Error: The set of values in the Identity columns in the NiCatalogLeafItems_Temp and NiPackageSignatures_Temp tables do not match.
Identity values in NiCatalogLeafItems_Temp but not NiPackageSignatures_Temp:
- Count: 1
- Sample: ["drewsubmissiontest/1.0.0"]
Identity values in NiPackageSignatures_Temp but not NiCatalogLeafItems_Temp:
- Count: 0
- Sample: []

NiCatalogLeafItems_Temp
| distinct Identity
| join kind=fullouter (
    NiPackageSignatures_Temp
    | distinct Identity
) on Identity
| where isempty(Identity) or isempty(Identity1)
| summarize
    LeftOnlyCount = countif(isnotempty(Identity)),
    LeftOnlySample = make_set_if(Identity, isnotempty(Identity), 5),
    RightOnlyCount = countif(isnotempty(Identity1)),
    RightOnlySample = make_set_if(Identity1, isnotempty(Identity1), 5)

I think there's some race condition that causes this to happen sometimes.

We should have an easy way to abort the current Kusto ingestion and re-run the whole workflow from the beginning.
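
For readers less familiar with Kusto, the validation query above amounts to a two-sided set difference with a capped sample on each side. A minimal Python sketch of the same comparison follows; the function name `compare_identity_sets` is hypothetical, and the sample is sorted here for determinism, which the Kusto query does not guarantee.

```python
def compare_identity_sets(left, right, sample_size=5):
    """Report identities present on only one side, with a capped sample of each."""
    left_only = sorted(set(left) - set(right))
    right_only = sorted(set(right) - set(left))
    return {
        "LeftOnlyCount": len(left_only),
        "LeftOnlySample": left_only[:sample_size],
        "RightOnlyCount": len(right_only),
        "RightOnlySample": right_only[:sample_size],
    }


# Identity taken from the error message above; "a/1.0.0" is made up for the demo.
catalog_leaf_items = ["a/1.0.0", "drewsubmissiontest/1.0.0"]
package_signatures = ["a/1.0.0"]
print(compare_identity_sets(catalog_leaf_items, package_signatures))
```

The validation passes only when both counts are zero.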

Ensure "package downloads" and other reports contain all package identities

Currently the "package downloads" report does not have all package identities. This report should be joined with kind=leftouter.

Ideally, all tables should have the same set of distinct Identity values. We can consider using the "package versions" report to fill in missing records with sensible default data. For example, if a package exists in "package versions" but not in "package downloads", we could insert a record for the missing package with a download count of 0.
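
The default-filling idea behaves like a left outer join from "package versions" onto "package downloads". A small Python sketch, with hypothetical names and made-up sample identities, assuming a default of 0 downloads:

```python
def fill_missing_downloads(all_identities, downloads, default=0):
    """Return a download count for every identity, using a default where none exists."""
    return {identity: downloads.get(identity, default) for identity in all_identities}


package_versions = ["a/1.0.0", "a/2.0.0", "b/1.0.0"]
package_downloads = {"a/1.0.0": 120, "b/1.0.0": 7}
print(fill_missing_downloads(package_versions, package_downloads))
```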

Add an integer that maps to SemVer order within a package ID

This is to ease the SemVer sorting complexity on the data query side. If there were a simple integer giving the position of a version in the ordered list of versions for that ID, things could be easier.

Version  VersionIndex
2.0.0    0
2.0.1    1
10.0.0   2
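
A minimal Python sketch of how such an index could be computed for one package ID is shown below. It follows SemVer 2.0.0 precedence rules in simplified form (numeric parts padded to four segments, build metadata ignored, prerelease identifiers compared case-sensitively); NuGet's own version comparison may differ in details, and `semver_key` and `version_index` are hypothetical names.

```python
def semver_key(version):
    """Build a sort key approximating SemVer 2.0.0 precedence."""
    core, _, _build = version.partition("+")  # build metadata does not affect order
    release, _, prerelease = core.partition("-")
    numbers = tuple(int(part) for part in release.split("."))
    numbers = (numbers + (0, 0, 0, 0))[:4]  # treat 1.0 and 1.0.0 alike
    if not prerelease:
        # A release sorts after any prerelease with the same numbers.
        return (numbers, 1, ())
    identifiers = tuple(
        # Numeric identifiers sort before alphanumeric ones.
        (0, int(part), "") if part.isascii() and part.isdigit() else (1, 0, part)
        for part in prerelease.split(".")
    )
    return (numbers, 0, identifiers)


def version_index(versions):
    """Map each distinct version string to its position in SemVer order."""
    ordered = sorted(set(versions), key=semver_key)
    return {version: index for index, version in enumerate(ordered)}


print(version_index(["10.0.0", "2.0.1", "2.0.0"]))
```

With the index stored per row, queries could sort or find the latest version with a plain integer comparison instead of parsing version strings.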

Create public dashboards

As we improve the .NET ecosystem, we need to understand the adoption of new features and best practices. This will help us make informed engineering investments. This could be done using public Power BI dashboards built on NuGet Insights data. For example:

image

Microsoft employees can play with my dashboard prototype here: https://msit.powerbi.com/groups/me/reports/0c673992-f323-44b5-be81-f8e75afbaee0/ReportSection24621557a494a04b6f43

This is similar to the ASP.NET Core team's public Power BI dashboards: https://aka.ms/aspnet/benchmarks

We can use Power BI's data refresh feature to keep these dashboards up-to-date: https://docs.microsoft.com/en-us/power-bi/connect-data/refresh-data#data-refresh

Errors met when running unit tests under "Insights\test\Logic.Test" folder

[xUnit.net 00:00:04.97]     Knapcode.ExplorePackages.CatalogCommitTimestampProviderTest.ReturnsExpectedFirstTimestamps [FAIL]
Data collector 'Blame' message: All tests finished running, Sequence file will not be generated.
  Failed Knapcode.ExplorePackages.CatalogCommitTimestampProviderTest.ReturnsExpectedFirstTimestamps [2 s]
  Error Message:
   System.InvalidProgramException : The JIT compiler encountered invalid IL code or an internal limitation.
  Stack Trace:
     at Azure.Data.Tables.Queryable.ExpressionWriter.ConvertExpressionToString(Expression e)
   at Azure.Data.Tables.Queryable.ExpressionWriter.ExpressionToString(Expression e)
   at Azure.Data.Tables.Queryable.ExpressionParser.VisitLambda(LambdaExpression lambda)
   at Azure.Data.Tables.Queryable.LinqExpressionVisitor.Visit(Expression exp)
   at Azure.Data.Tables.Queryable.ExpressionParser.Translate(Expression e)
   at Azure.Data.Tables.TableClient.Bind(Expression expression)
   at Azure.Data.Tables.TableServiceClient.QueryAsync(Expression`1 filter, Nullable`1 maxPerPage, CancellationToken cancellationToken)
   at Knapcode.ExplorePackages.TableExtensions.QueryAsync(TableServiceClient client, String prefix) in C:\Users\XXX\Insights-consistency\src\Logic\Storage\TableExtensions.cs:line 16
   at Knapcode.ExplorePackages.BaseLogicIntegrationTest.DisposeAsync() in C:\Users\XXX\Insights-consistency\test\Logic.Test\TestSupport\BaseLogicIntegrationTest.cs:line 289
  Standard Output Messages:
 [INF]   GET https://api.nuget.org/v3/catalog0/index.json
 [INF]   OK https://api.nuget.org/v3/catalog0/index.json 350ms
 [INF]   GET https://api.nuget.org/v3/catalog0/page0.json
 [INF]   OK https://api.nuget.org/v3/catalog0/page0.json 77ms
 [INF] Using the configured storage connection string.
 [INF] Blob endpoint: http://127.0.0.1:10000/devstoreaccount1
 [INF] Queue endpoint: http://127.0.0.1:10001/devstoreaccount1

This is one example of the error "System.InvalidProgramException : The JIT compiler encountered invalid IL code or an internal limitation." that I hit when running this release version. I have encountered many similar errors when running both the latest version and the old release version. I have not found any helpful solutions by searching on Google. Could you please give me some advice on how to solve this?
