
machinelearning's Introduction

Machine Learning for .NET

ML.NET is a cross-platform open-source machine learning (ML) framework for .NET.

ML.NET allows developers to easily build, train, deploy, and consume custom models in their .NET applications without requiring prior expertise in developing machine learning models or experience with other programming languages like Python or R. The framework provides data loading from files and databases, enables data transformations, and includes many ML algorithms.

With ML.NET, you can train models for a variety of scenarios, like classification, forecasting, and anomaly detection.

You can also consume both TensorFlow and ONNX models within ML.NET, which makes the framework more extensible and expands the number of supported scenarios.

Getting started with machine learning and ML.NET

Roadmap

Take a look at ML.NET's Roadmap to see what the team plans to work on in the next year.

Operating systems and processor architectures supported by ML.NET

ML.NET runs on Windows, Linux, and macOS using .NET Core, or Windows using .NET Framework.

ML.NET also runs on ARM64, Apple M1, and Blazor Web Assembly. However, there are some limitations.

64-bit is supported on all platforms. 32-bit is supported on Windows, except for TensorFlow and LightGBM related functionality.
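The support statement above can be encoded as a startup guard. The sketch below is an illustration, not part of ML.NET. `PlatformSupport` and its parameter names are hypothetical, and the guard covers only the 32-bit/64-bit rule stated here. It does not cover the separate ARM64, Apple M1, or Blazor limitations.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PlatformSupport
{
    // Encodes the rule as stated: 64-bit is supported everywhere; 32-bit only on Windows,
    // and not for TensorFlow- or LightGBM-related functionality.
    public static bool IsSupported(bool is64BitProcess, bool isWindows, bool usesTensorFlowOrLightGbm)
    {
        if (is64BitProcess)
            return true;
        return isWindows && !usesTensorFlowOrLightGbm;
    }

    // Applies the same rule to the current process.
    public static bool IsCurrentProcessSupported(bool usesTensorFlowOrLightGbm) =>
        IsSupported(Environment.Is64BitProcess,
                    RuntimeInformation.IsOSPlatform(OSPlatform.Windows),
                    usesTensorFlowOrLightGbm);
}
```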

ML.NET NuGet packages status


Release notes

Check out the release notes to see what's new. You can also read the blog posts for more details about each release.

Using ML.NET packages

First, ensure you have installed .NET Core 2.1 or later. ML.NET also works on the .NET Framework 4.6.1 or later, but 4.7.2 or later is recommended.

Once you have an app, you can install the ML.NET NuGet package from the .NET Core CLI using:

dotnet add package Microsoft.ML

or from the NuGet Package Manager Console:

Install-Package Microsoft.ML

Alternatively, you can add the Microsoft.ML package from within Visual Studio's NuGet package manager or via Paket.

Daily NuGet builds of the project are also available in our Azure DevOps feed:

https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json

Building ML.NET (For contributors building ML.NET open source code)

To build ML.NET from source please visit our developer guide.


Debug and Release builds run in CI on: CentOS, Ubuntu, macOS, Windows x64, Windows FullFramework, Windows x86, and Windows NetCore3.1.

Release process and versioning

Major releases of ML.NET are shipped once a year with the major .NET releases, starting with ML.NET 1.7 in November 2021 with .NET 6, then ML.NET 2.0 with .NET 7, etc. We will maintain release branches to optionally service ML.NET with bug fixes and/or minor features on the same cadence as .NET servicing.

Check out the Release Notes to see all of the past ML.NET releases.

Contributing

We welcome contributions! Please review our contribution guide.

Community

This project has adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community. For more information, see the .NET Foundation Code of Conduct.

Code examples

Here is a code snippet for training a model to predict sentiment from text samples. You can find complete samples in the samples repo.

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

var dataPath = "sentiment.csv";
var mlContext = new MLContext();
// Column 0 holds the Boolean label and column 1 the text; the file has a header row.
var loader = mlContext.Data.CreateTextLoader(new[]
    {
        new TextLoader.Column("SentimentText", DataKind.String, 1),
        new TextLoader.Column("Label", DataKind.Boolean, 0),
    },
    hasHeader: true,
    separatorChar: ',');
var data = loader.Load(dataPath);
// FastTree ships in the separate Microsoft.ML.FastTree NuGet package.
var learningPipeline = mlContext.Transforms.Text.FeaturizeText("Features", "SentimentText")
        .Append(mlContext.BinaryClassification.Trainers.FastTree());
var model = learningPipeline.Fit(data);

Now from the model we can make inferences (predictions):

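// SentimentData and SentimentPrediction are user-defined row classes, not types shipped in ML.NET.
// PredictionEngine is not thread-safe; use one instance per thread in multi-threaded apps.
var predictionEngine = mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(model);
var prediction = predictionEngine.Predict(new SentimentData
{
    SentimentText = "Today is a great day!"
});
Console.WriteLine("prediction: " + prediction.Prediction);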
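The snippets above use `SentimentData` and `SentimentPrediction` without defining them. They are user-defined row classes, not types shipped in the package. Below is a minimal sketch of both. It assumes the CSV layout declared in the loader above, where column 0 is the Boolean label and column 1 is the text. It also assumes the binary trainer writes its decision to ML.NET's standard `PredictedLabel` output column.

```csharp
using Microsoft.ML.Data;

// Input row: property names match the column names declared in the TextLoader.
public class SentimentData
{
    [LoadColumn(1)]
    public string SentimentText { get; set; }

    [LoadColumn(0)]
    public bool Label { get; set; }
}

// Output row: the binary classifier's decision is read from the "PredictedLabel" column.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    // Raw score and calibrated probability produced by calibrated binary trainers such as FastTree.
    public float Score { get; set; }
    public float Probability { get; set; }
}
```

The `[LoadColumn]` attributes only matter if you switch to `mlContext.Data.CreateTextLoader<SentimentData>(...)`. The explicit `TextLoader.Column` list above already maps columns by name.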

License

ML.NET is licensed under the MIT license, and it is free to use commercially.

.NET Foundation

ML.NET is a part of the .NET Foundation.

machinelearning's People

Contributors

abgoswam, anipik, antoniovs1029, artidoro, asmirnov82, codemzs, daholste, dmitry-a, dotnet-maestro[bot], eerhardt, ericstj, frank-dong-ms-zz, harishsk, ivanidzo4ka, littlelittlecloud, lynx1820, michaelgsharp, mstfbl, najeeb-kazmi, rogancarr, sfilipi, sharwell, shauheen, shmoradims, srsaggam, tomfinley, wschin, yaeldms, zeahmed, zruty0


machinelearning's Issues

The Gitter link in Readme.md points to the wrong chat

Line 47 of our README.md says:

Please join our community on Gitter [![Join the chat at https://gitter.im/dotnet/corefx](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/dotnet/?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

While it really should be:

Please join our community on Gitter [![Join the chat at https://gitter.im/dotnet/mlnet](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/dotnet/mlnet)

Eight tests fail in Microsoft.ML.Predictor.Tests

System information

.NET Command Line Tools (2.1.200)

Product Information:
Version: 2.1.200
Commit SHA-1 hash: 2edba8d7f1

Runtime Environment:
OS Name: Windows
OS Version: 10.0.17134
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\2.1.200\

Microsoft .NET Core Shared Framework Host

Version : 2.0.7
Build : 2d61d0b043915bc948ebf98836fefe9ba942be11

Issue

  1. Checked out the project.
  2. Ran .\build.cmd at a PowerShell prompt.
  3. Opened the project in VS 2017 15.7.
  4. Ran all tests; some of them failed (see attached image).

I expected all tests to pass.

[Image: the line containing the failing assert]
The assert here fails. I didn't dig further to check whether the cause is a programming error or something else, as this was more out of curiosity, but I'm noting it here. :)

These are all eight tests that fail due to the assert in the previous image:
[Image: list of the failed tests]

Need to get access to external data sets

There are some data sets we can't commit into the repository. We should download these data sets as part of the initial build, and then cache them in the bin directory (or similarly gitignored folder). That way we can use them in our tests.
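Here is one way to implement the download-then-cache step, sketched under some assumptions. The `DataSetCache` class, cache directory, and file names are hypothetical. The actual download is passed in as a delegate, so the sketch does not commit to any particular data set URL.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class DataSetCache
{
    // Returns the local path of a cached data set, calling `download` only when the file is absent.
    public static async Task<string> EnsureAsync(string cacheDir, string fileName, Func<Task<byte[]>> download)
    {
        Directory.CreateDirectory(cacheDir); // no-op if the folder already exists
        string path = Path.Combine(cacheDir, fileName);
        if (File.Exists(path))
            return path;

        byte[] bytes = await download();

        // Write to a temporary file first so an interrupted run never leaves a truncated cache entry.
        string tmp = path + ".tmp";
        await File.WriteAllBytesAsync(tmp, bytes);
        File.Move(tmp, path);
        return path;
    }
}
```

In a build step, the delegate could be `() => httpClient.GetByteArrayAsync(url)`, with `cacheDir` pointing at a gitignored folder under `bin`.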

EntryPointChainedCrossValMacros test fails in CI occasionally

System information

  • OS version/distro: Linux Debug
  • .NET Version (eg., dotnet --info): .NET Core

Issue

Source code / logs

MESSAGE:
                                        Assert failed: longIdx=328, invariants.Length=328
Expected: True
Actual:   False
                                        +++++++++++++++++++
                                        STACK TRACE:
                                           at Microsoft.ML.Runtime.Internal.Internallearn.Test.GlobalBase.AssertHandler(String msg, IExceptionContext ectx) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/test/Microsoft.ML.TestFramework/GlobalBase.cs:line 47
   at Microsoft.ML.Runtime.Contracts.DbgFailCore(String msg, IExceptionContext ctx) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.Core/Utilities/Contracts.cs:line 751
   at Microsoft.ML.Runtime.Contracts.DbgFail(String msg) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.Core/Utilities/Contracts.cs:line 764
   at Microsoft.ML.Runtime.Contracts.Assert(Boolean f, String msg) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.Core/Utilities/Contracts.cs:line 822
   at Microsoft.ML.Runtime.Learners.SdcaTrainerBase`1.TrainCore(IChannel ch, RoleMappedData data, LinearPredictor predictor) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.StandardLearners/Standard/LinearClassificationTrainer.cs:line 520
   at Microsoft.ML.Runtime.Learners.LinearTrainerBase`1.TrainEx(RoleMappedData data, LinearPredictor predictor) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.StandardLearners/Standard/LinearClassificationTrainer.cs:line 76
   at Microsoft.ML.Runtime.Learners.LinearTrainerBase`1.Train(RoleMappedData examples) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.StandardLearners/Standard/LinearClassificationTrainer.cs:line 84
   at Microsoft.ML.Runtime.Data.TrainUtils.TrainCore[TDataSet,TPredictor](IChannel ch, ITrainer trainer, Action`1 train, TDataSet data, TDataSet validData, TPredictor predictor) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.Data/Commands/TrainCommand.cs:line 324

Add mono support

From @alexanderkyte on Apr 27, 2018, 10:58 AM PDT

In order for our models to be useful on mobile platforms, we're going to need to get this working on mono. It'll probably simply be some infrastructure work.

I can address it, as I'm a mono runtime engineer.

Currently on backlog / low-priority

System.MachineLearning.Runtime namespaces cleanup

From @KrzysztofCwalina on Apr 16, 2018, 9:11 AM PDT

There are several problems with System.MachineLearning.Runtime namespaces. We should clean them up:

  • There are too many System.MachineLearning.Runtime subnamespaces. This will overwhelm users browsing the documentation on MSDN, IDE browsers, etc. We should consolidate the APIs into fewer subnamespaces.

  • We should not use "Api" in namespace or project names. All public things are "APIs" as far as .NET developers are concerned.

  • We have an "EntryPoints" namespace. We should rename it. All "entry point" APIs should simply be in the root namespace (or a subnamespace of the root).

  • We should not have "tools" APIs in the main assembly. Remove/rename Microsoft.MachineLearning.Runtime.Internal.Tools.

Doesn't support partitioned directories.

Storage formats such as Parquet allow data to be partitioned across multiple files in a structured directory hierarchy. This library has no way to load such partitioned files into a single IDataView.
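A loader for partitioned data first needs to recover the partition values encoded in the directory names. The sketch below assumes the Hive-style `key=value` layout commonly used with Parquet, such as `year=2018/month=05/part-0000.parquet`. `PartitionPaths` is a hypothetical helper. Turning these values into extra columns and concatenating the files into one IDataView is not shown.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionPaths
{
    // Extracts key=value pairs from the directory segments of a relative path,
    // e.g. "year=2018/month=05/part-0000.parquet" -> { year: "2018", month: "05" }.
    public static IReadOnlyDictionary<string, string> ParseValues(string relativePath)
    {
        var values = new Dictionary<string, string>(StringComparer.Ordinal);
        string[] segments = relativePath.Split(new[] { '/', '\\' }, StringSplitOptions.RemoveEmptyEntries);

        // The last segment is the file name, so only the directories before it are inspected.
        for (int i = 0; i < segments.Length - 1; i++)
        {
            int eq = segments[i].IndexOf('=');
            if (eq > 0)
                values[segments[i].Substring(0, eq)] = segments[i].Substring(eq + 1);
        }
        return values;
    }
}
```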

Hyperparameters reversed in Scenario3

From @justinormont on May 2, 2018, 8:51 AM PDT

The line below, in file machinelearning/test/Microsoft.ML.Tests/Scenarios/Scenario3_SentimentPrediction.cs:

CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true },

Currently (word trigrams & char unigrams+bigrams):

  CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true },
  WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false }

Should be (word unigrams+bigrams & char trigrams):

  CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false },
  WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true }
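The sketch below makes the difference between the two settings concrete by enumerating n-grams the way this issue describes them. `AllLengths = true` yields every length from 1 to `NgramLength`, while `AllLengths = false` yields only length `NgramLength`. This is a plain illustration with a hypothetical `NGrams` class, not ML.NET's featurizer, which also applies its own tokenization, normalization, and counting.

```csharp
using System.Collections.Generic;

public static class NGrams
{
    // Returns the n-grams of `tokens` for n = ngramLength only, or for every n in 1..ngramLength
    // when allLengths is true.
    public static List<string> Extract(IReadOnlyList<string> tokens, int ngramLength, bool allLengths, string joiner)
    {
        var result = new List<string>();
        int minLength = allLengths ? 1 : ngramLength;
        for (int n = minLength; n <= ngramLength; n++)
        {
            for (int start = 0; start + n <= tokens.Count; start++)
            {
                var parts = new string[n];
                for (int k = 0; k < n; k++)
                    parts[k] = tokens[start + k];
                result.Add(string.Join(joiner, parts));
            }
        }
        return result;
    }

    // Splits text into single-character tokens for character n-grams.
    public static List<string> Chars(string text)
    {
        var chars = new List<string>();
        foreach (char c in text)
            chars.Add(c.ToString());
        return chars;
    }
}
```

For example, `Extract(Chars("abc"), 2, true, "")` gives `a, b, c, ab, bc`, while `Extract(Chars("abc"), 3, false, "")` gives only `abc`.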

Remove/Rename LotusIR namespace

From @KrzysztofCwalina on Apr 16, 2018, 9:13 AM PDT

There is a public top level LotusIR namespace in our repo. We agreed that all our namespaces will be in Microsoft.MachineLearning, and so we should rename or remove this LotusIR namespace.

Nice work!

I haven't worked with C# in a while because I've been working on some machine learning projects in Python with Scikit-Learn. It's so awesome to see .NET is getting its own built-in, high performance machine learning package! 🎉

I have a question about the goals of this project: how exactly does it relate to existing high-quality frameworks like Accord.NET or all of the projects listed here? Thanks.

Use .NET Core Hardware Intrinsics to optimize the code?

It's great to see a C# machine learning framework!

It seems some linear algebra algorithms are implemented by calling into native C++ code. I assume this is because of the rich SIMD instruction support available in C++. Since .NET Core 2.1 includes Hardware Intrinsics as a preview feature, they offer another way to use SIMD instructions directly from C#.
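As a rough illustration of the hardware-intrinsics route, here is a dot product that uses AVX when the CPU reports it and falls back to a scalar loop otherwise. It assumes .NET Core 3.0 or later, where `System.Runtime.Intrinsics` ships as a supported API. In .NET Core 2.1 it was still a preview. This is a sketch, not a drop-in replacement for the native CpuMath code.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class DotProduct
{
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float sum = 0f;
        int i = 0;

        if (Avx.IsSupported)
        {
            // Reinterpret the float spans as spans of 8-wide vectors; any leftover tail is handled below.
            ReadOnlySpan<Vector256<float>> va = MemoryMarshal.Cast<float, Vector256<float>>(a);
            ReadOnlySpan<Vector256<float>> vb = MemoryMarshal.Cast<float, Vector256<float>>(b);

            Vector256<float> acc = Vector256<float>.Zero;
            for (int k = 0; k < va.Length; k++)
                acc = Avx.Add(acc, Avx.Multiply(va[k], vb[k])); // 8 multiply-adds per iteration

            // Horizontal sum of the 8 lanes.
            for (int lane = 0; lane < Vector256<float>.Count; lane++)
                sum += acc.GetElement(lane);

            i = va.Length * Vector256<float>.Count;
        }

        // Scalar loop: the whole computation when AVX is unavailable, otherwise just the tail.
        for (; i < a.Length; i++)
            sum += a[i] * b[i];

        return sum;
    }
}
```

The vector path sums lanes in a different order than the scalar loop, so results can differ in the last bits for non-integer inputs.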

ML.Net and Azure ML Relationship

It would be beneficial for potential users to understand the relationship between ML.NET and Azure ML.

Is ML.NET the library behind Azure ML? Is ML.NET going to be a building block inside Azure ML? How will the two work together, and will it be possible to seamlessly move models, flows, etc. between them?

Add support for training on a collection of objects

Right now, the LearningPipeline can only learn from data consumed via the TextLoader or any other Loader component. For many scenarios, it would be nice to allow for the consumption of collections of objects for training.
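Later ML.NET releases, built around the `MLContext` API used in the README snippet, cover this scenario with `mlContext.Data.LoadFromEnumerable`. It wraps an in-memory collection as an `IDataView` that can be passed to `Fit`. Below is a minimal round-trip sketch. `HouseRow` and `InMemoryData` are hypothetical names used for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical row type; any class with public fields or properties of supported types works.
public class HouseRow
{
    public float SqftLiving { get; set; }
    public float Price { get; set; }
}

public static class InMemoryData
{
    // Wraps an in-memory collection as an IDataView (no file or TextLoader involved),
    // then reads it back to show the round trip.
    public static List<HouseRow> RoundTrip(MLContext mlContext, IEnumerable<HouseRow> rows)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);
        return mlContext.Data.CreateEnumerable<HouseRow>(data, reuseRowObject: false).ToList();
    }
}
```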

Simplify Pipeline Initialization with Append() that returns the pipeline

Issue:
Currently we have to use the pipeline instance to append additional items to the pipeline. It looks something like this:

var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader<SentimentData>(dataPath, separator: ","));
pipeline.Add(new TextFeaturizer("Features", "SentimentText"));

We can add the ability to add pipeline items in a fluent fashion. The benefit would be less typing and a cleaner API.
This would require adding the following method to the pipeline:

public LearningPipeline Append(ILearningPipelineItem item);

The user code will look like this:

var pipeline = new LearningPipeline()
   .Append(new TextLoader<SentimentData>(dataPath, separator: ","))
   .Append(new TextFeaturizer("Features", "SentimentText"));

(optional) with extension methods, this can be simplified even further to:

var pipeline = new LearningPipeline()
   .AddTextLoader<SentimentData>(dataPath, separator: ",")
   .AddTextFeaturizer("Features", "SentimentText");
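The proposal is the standard fluent-builder pattern: each `Append` call adds an item and returns the pipeline. Below is a self-contained sketch. It uses hypothetical stand-in types (`IPipelineItem`, `NamedItem`, `Pipeline`) because the real `ILearningPipelineItem` types are not reproduced here.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for ILearningPipelineItem.
public interface IPipelineItem
{
    string Name { get; }
}

public sealed class NamedItem : IPipelineItem
{
    public NamedItem(string name) => Name = name;
    public string Name { get; }
}

public sealed class Pipeline
{
    private readonly List<IPipelineItem> _items = new List<IPipelineItem>();
    public IReadOnlyList<IPipelineItem> Items => _items;

    // Adds the item to this pipeline and returns the same instance so calls can be chained.
    public Pipeline Append(IPipelineItem item)
    {
        _items.Add(item);
        return this;
    }
}

public static class PipelineExtensions
{
    // Extension-method sugar in the spirit of the proposed AddTextFeaturizer(...).
    public static Pipeline AddStep(this Pipeline pipeline, string name) =>
        pipeline.Append(new NamedItem(name));
}
```

This version mutates and returns the same instance. The estimator chain in the README snippet (`FeaturizeText(...).Append(...)`) works differently: each `Append` returns a new chain, so a partially built chain can be reused safely. Either design gives the same call-site syntax.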

Refactor Scenario tests...

Issue: In Scenarios, there is a class named "Top5Scenarios"
Depending on how you count, there are either 2 or 4 scenarios.
We recommend renaming the class.


Let's also review the names of the files and make sure they are descriptive.

Fill out nupkg metadata completely

System information

  • OS version/distro: All
  • .NET Version (eg., dotnet --info): All

Issue

  • What did you do?

Inspect the NuGet metadata for Microsoft.ML

  • What happened?

Only some of the info is filled out. For example, for "description" it says "Package Description". It also doesn't have a link to release notes or license/project URL.

  • What did you expect?

I expected all the info to be filled out.

CpuMath: Sse code is executed when Avx is available, except on .NET Core 3.0

System information

  • Windows 10 Enterprise:
  • dotnet Version: 2.1.105:

Issue

A few questions

Hey,

It's great to see a machine learning framework for .NET 👍

I have a couple of questions about the rationale behind the project:

  • Apart from the difference between a pure .NET implementation and mixed C++/.NET bindings, what are the advantages/differences compared with CNTK?
  • Which training algorithms are supported (e.g., CNN, RNN)?
  • What is the plan for multi-machine, multi-CPU, GPU (and multi-GPU) support?
  • Is it aimed at providing an abstract API (and a default implementation) that could plug into any implementation behind it (e.g., CNTK)?

Thanks!

Binary Training Data

As shown during the .NET Overview session today at Build, and as discussed with the ML team, there is only a text-based reader for training data. A binary reader for training data would be highly beneficial for my use case.
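Later ML.NET releases added a binary format for `IDataView` through `mlContext.Data.SaveAsBinary` and `mlContext.Data.LoadFromBinary`. Below is a minimal round-trip sketch. `Measurement` and `BinaryDataRoundTrip` are hypothetical names. This is ML.NET's own binary format. Data stored in some other binary layout would still have to be read into objects first, for example with `LoadFromEnumerable`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;

// Hypothetical row type for illustration.
public class Measurement
{
    public float Value { get; set; }
    public bool Flag { get; set; }
}

public static class BinaryDataRoundTrip
{
    // Saves an IDataView in ML.NET's binary format, then loads it back from disk.
    public static List<Measurement> SaveAndReload(MLContext mlContext, IEnumerable<Measurement> rows, string path)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);
        using (var stream = File.Create(path))
        {
            mlContext.Data.SaveAsBinary(data, stream);
        }

        IDataView reloaded = mlContext.Data.LoadFromBinary(path);
        return mlContext.Data.CreateEnumerable<Measurement>(reloaded, reuseRowObject: false).ToList();
    }
}
```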

Need to support more types than string and float

From @eerhardt on Apr 12, 2018, 6:52 AM PDT

See code in machinelearning/src/Microsoft.MachineLearning.EntryPoints/TextLoader.cs

private string TypeToName(Type type)
{
    if (type == typeof(string))
        return "TX";
    else if (type == typeof(float) || type == typeof(double))
        return "R4";
    else
        throw new Exception("Type not implemented or supported."); //Add more types.
}

(Refers to Lines 70 to 78 in 6e74d72)

We should fill all supported types out and add tests.
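A table-driven version makes the supported set explicit and easy to test. The sketch below assumes the legacy TextLoader type codes. Only `TX` and `R4` appear in the quoted code. The other codes (`BL`, `I1` through `U8`, `R8`, `TS`, `DT`, `DZ`) and the class name are assumptions to verify against the loader. Unlike the quoted code, the sketch maps `double` to `R8` instead of `R4`. It also throws `NotSupportedException` rather than a bare `Exception`.

```csharp
using System;
using System.Collections.Generic;

public static class TextLoaderTypeNames
{
    // Assumed mapping from CLR types to legacy TextLoader type codes.
    private static readonly Dictionary<Type, string> Names = new Dictionary<Type, string>
    {
        [typeof(string)] = "TX",
        [typeof(bool)] = "BL",
        [typeof(sbyte)] = "I1",
        [typeof(byte)] = "U1",
        [typeof(short)] = "I2",
        [typeof(ushort)] = "U2",
        [typeof(int)] = "I4",
        [typeof(uint)] = "U4",
        [typeof(long)] = "I8",
        [typeof(ulong)] = "U8",
        [typeof(float)] = "R4",
        [typeof(double)] = "R8",
        [typeof(TimeSpan)] = "TS",
        [typeof(DateTime)] = "DT",
        [typeof(DateTimeOffset)] = "DZ",
    };

    public static string TypeToName(Type type)
    {
        if (type == null)
            throw new ArgumentNullException(nameof(type));
        if (Names.TryGetValue(type, out string name))
            return name;
        throw new NotSupportedException($"Type '{type}' is not supported by the text loader.");
    }
}
```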

Simple example using House Pricing Scenario

This is the first time I've looked into machine learning but I have a use case I'd like to test with it.

To get started I've created a simple example from the house pricing scenario which somewhat closely matches my use case but the results I'm getting are not at all close to what I expected. The data I'm providing is simply linear in terms of just the SqftLiving input parameter to the Price where Price = SqftLiving * 100. The SqftLot is held constant for training and prediction so it should be a non-factor.

I'm just trying to predict the Price when SqftLiving is 1500, which, given the linear relationship in the training data, should be about $150,000.

However, the results I get vary wildly, from negative to positive tens of millions, every time I run the program. Could someone look at this simple example and let me know what, if anything, I'm doing to cause these poor results?

class Program
{
    static void Main(string[] args)
    {
        var filePath = "C://Temp/kc_house_data.csv";

        File.WriteAllText(filePath, @"100000,1000,8000
200000,2000,8000
400000,4000,8000");

        var pipeline = new LearningPipeline
        {
            new TextLoader<HousePriceData>(filePath, separator: ","),
            new ColumnConcatenator("Features", "SqftLiving", "SqftLot"),
            new StochasticDualCoordinateAscentRegressor()
        };

        var model = pipeline.Train<HousePriceData, HousePricePrediction>();

        var prediction = model.Predict(new HousePriceData { SqftLiving = 1500, SqftLot = 8000 });

        Console.WriteLine(prediction.Price);
        Console.ReadLine();
    }
}

public class HousePriceData
{
    [Column(ordinal: "0", name: "Label")]
    public float Price;

    [Column(ordinal: "1")]
    public float SqftLiving;

    [Column(ordinal: "2")]
    public float SqftLot;
}

public class HousePricePrediction
{
    [ColumnName("Score")]
    public float Price;
}
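As a sanity check on the expected answer, note that `SqftLot` is constant in the training data. It therefore has zero variance and cannot explain any of the price, so the relationship reduces to a single feature. The sketch below fits ordinary least squares in closed form. It is a swapped-in baseline, not what `StochasticDualCoordinateAscentRegressor` does. It recovers the reporter's expectation of about 150,000 at 1,500 sq ft. `SimpleLinearRegression` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleLinearRegression
{
    // Ordinary least squares for y = intercept + slope * x on a single feature.
    public static (double Intercept, double Slope) Fit(IReadOnlyList<double> x, IReadOnlyList<double> y)
    {
        if (x.Count != y.Count || x.Count < 2)
            throw new ArgumentException("Need at least two paired points.");

        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.Count; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= x.Count;
        meanY /= y.Count;

        // slope = cov(x, y) / var(x); intercept makes the line pass through the means.
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.Count; i++)
        {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return (meanY - slope * meanX, slope);
    }
}
```

SDCA is an iterative optimizer, so on three unnormalized points its output need not match this closed-form fit. Comparing against a baseline like this helps separate a training problem from a data problem.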

Building in Visual Studio

From @terrajobst on Mar 26

On a clean machine, opening the solution file and building in VS fails because the Tools folder doesn't exist. Running init-tools.cmd doesn't fix it either, as the build then fails with malformed AssemblyInfo.cs files.
While I'm still trying to get the build working from the command line (where I get mostly actionable error messages, like "Install CMake"), this feels like the sort of thing that quickly discourages contributors. Ideally, you should be able to clone the repo, open the solution in VS, and build immediately.
Thoughts?

Build failed under OSX

System information

  • OS version/distro: OSX 10.12.6
  • .NET Version (eg., dotnet --info):
.NET Command Line Tools (2.1.4)

Product Information:
 Version:            2.1.4
 Commit SHA-1 hash:  5e8add2190

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.12
 OS Platform: Darwin
 RID:         osx.10.12-x64
 Base Path:   /usr/local/share/dotnet/sdk/2.1.4/

Microsoft .NET Core Shared Framework Host

  Version  : 2.0.5
  Build    : 17373eb129b3b05aa18ece963f8795d65ef8ea54

Issue

  • What did you do?
> git clone git@github.com:dotnet/machinelearning.git
> cd machinelearning
> ./build.sh
  • What happened?
    Build failed

  • What did you expect?
    ML goodness

Source code / logs

... truncated for brevity ...

  + cmake /Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native -G 'Unix Makefiles' -DCMAKE_BUILD_TYPE=Debug -DVERSION_FILE_PATH:STRING=/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/../../bin/obj/version.c
  -- The C compiler identification is Clang 3.5.1
  -- The CXX compiler identification is Clang 3.5.1
  -- Check for working C compiler: /usr/local/bin/clang-3.5
  -- Check for working C compiler: /usr/local/bin/clang-3.5 -- works
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Check for working CXX compiler: /usr/local/bin/clang++-3.5
  -- Check for working CXX compiler: /usr/local/bin/clang++-3.5 -- works
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Configuring done
  -- Generating done
EXEC : CMake warning (dev):  [/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj]
    Policy CMP0042 is not set: MACOSX_RPATH is enabled by default.  Run "cmake
    --help-policy CMP0042" for policy details.  Use the cmake_policy command to
    set the policy and suppress this warning.
  
    MACOSX_RPATH is not specified for the following targets:
  
     CpuMathNative
     FastTreeNative
  
  This warning is for project developers.  Use -Wno-dev to suppress it.
  
  -- Build files have been written to: /Users/justinormont/Documents/Microsoft/src/machinelearning/bin/obj/x64.Debug/Native
  + set +x
  Scanning dependencies of target CpuMathNative
  [  8%] Building CXX object CpuMathNative/CMakeFiles/CpuMathNative.dir/Sse.cpp.o
EXEC : error : unknown warning option '-Wno-unused-local-typedef' [-Werror,-Wunknown-warning-option] [/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj]
  make[2]: *** [CpuMathNative/CMakeFiles/CpuMathNative.dir/Sse.cpp.o] Error 1
  make[1]: *** [CpuMathNative/CMakeFiles/CpuMathNative.dir/all] Error 2
  make: *** [all] Error 2
/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj(33,5): error MSB3073: The command "/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.sh --configuration Debug --arch x64 " exited with code 2.

Build FAILED.

EXEC : CMake warning (dev):  [/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj]
EXEC : error : unknown warning option '-Wno-unused-local-typedef' [-Werror,-Wunknown-warning-option] [/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj]
/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj(33,5): error MSB3073: The command "/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.sh --configuration Debug --arch x64 " exited with code 2.
    1 Warning(s)
    2 Error(s)

Time Elapsed 00:01:28.38
Command execution failed with exit code 1.

Attempts

I first suspected an older version of clang or cmake:

> clang++ --version
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin16.7.0
Thread model: posix

> cmake --version
cmake version 3.10.1

After updating:

> clang++ --version
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

> cmake --version
cmake version 3.11.1

Still no luck after updating clang++ & cmake.

Where is SentimentData defined?

System information

  • OS version/distro: Win10
  • .NET Version (eg., dotnet --info): 2.1.200

Issue

  • What did you do?
    I tried the examples in the README.

  • What happened?

I installed the Microsoft.ML package in a newly created .NET Core console app.

And in the Program.cs file,

using System;
using Microsoft.ML;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

namespace MLTry
{
    class Program
    {
        static void Main(string[] args)
        {
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader<SentimentData>(dataPath, separator: ","));
//...

It doesn't compile because SentimentData is not defined. I also searched the repo for SentimentData but found nothing related.

  • What did you expect?
    To find the type SentimentData.

R and Python integration / interoperability

Will this library offer R and Python integration? Where is it on the roadmap? What kind of data transfer library/format will it use: Apache Arrow, or something else?

It is important for solution architects to understand how ML.NET will fit into the big data picture, especially since Java, R, and Python dominate this space.

Exception message missing parameter name

This message was probably meant to include the field name at the start of the string.

"System.ArgumentOutOfRangeException:  is missing ColumnAttribute
Parameter name: Name
   at Microsoft.ML.TextLoader`1.SetCustomStringFromType(Boolean useHeader, String separator, Boolean allowQuotedStrings, Boolean supportSparse, Boolean trimWhitespace)
   at Microsoft.ML.TextLoader`1..ctor(String inputFilePath, Boolean useHeader, String separator, Boolean allowQuotedStrings, Boolean supportSparse, Boolean trimWhitespace)
   at Microsoft.ML.NYCTaxiFare.Program.Train() in C:\Users\danmose\source\repos\ConsoleApp81\ConsoleApp81\Program.cs:line 25
   at Microsoft.ML.NYCTaxiFare.Program.Main(String[] args) in C:\Users\danmose\source\repos\ConsoleApp81\ConsoleApp81\Program.cs:line 17"

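This happens because it is thrown as:

throw Contracts.ExceptParam(nameof(field.Name), " is missing ColumnAttribute");

and ExceptParam does not prefix the message with anything:

public static Exception ExceptParam(string paramName, string msg)
    => Process(new ArgumentOutOfRangeException(paramName, msg));

Note also that nameof(field.Name) evaluates to the literal "Name", which is why the output shows "Parameter name: Name" instead of the offending field's name.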
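The sketch below shows a check that puts the field name into the message itself, instead of relying on the exception to prefix it. `ColumnMarkerAttribute` is a hypothetical stand-in for the library's `ColumnAttribute`, and `ColumnValidation` is an illustrative name. The exception type stays `ArgumentOutOfRangeException` to match the original.

```csharp
using System;
using System.Reflection;

// Hypothetical marker attribute standing in for the library's ColumnAttribute.
[AttributeUsage(AttributeTargets.Field)]
public sealed class ColumnMarkerAttribute : Attribute { }

public static class ColumnValidation
{
    // Throws if any public instance field lacks the marker attribute,
    // naming both the field and its declaring type in the message.
    public static void RequireColumnAttributes(Type rowType)
    {
        foreach (FieldInfo field in rowType.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            if (field.GetCustomAttribute<ColumnMarkerAttribute>() == null)
                throw new ArgumentOutOfRangeException(
                    nameof(rowType),
                    $"Field '{field.Name}' of type '{rowType.Name}' is missing ColumnAttribute.");
        }
    }
}
```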

Enable cross-validation for LearningPipeline

Cross-validation would make it easier to validate models if the data is not already split into train and test files (and generally helps give more precise metrics). This is already exposed as an entrypoint here, but needs to be enabled for LearningPipeline.
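For reference, k-fold cross-validation only needs a way to partition rows into folds. Each fold is held out once while the model trains on the rest, and the k metric values are then averaged. Below is a plain C# sketch of the split step, using a hypothetical `KFold` helper and a seeded Fisher-Yates shuffle. Later ML.NET versions expose the full procedure on the task catalogs, for example `mlContext.BinaryClassification.CrossValidate`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KFold
{
    // Yields (train, test) index sets: rows are shuffled once with a seeded RNG,
    // dealt round-robin into k folds, and each fold serves as the test set exactly once.
    public static IEnumerable<(int[] Train, int[] Test)> Split(int rowCount, int k, int seed)
    {
        if (k < 2 || k > rowCount)
            throw new ArgumentOutOfRangeException(nameof(k));

        var rng = new Random(seed);
        int[] order = Enumerable.Range(0, rowCount).ToArray();
        for (int i = order.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // Fisher-Yates: pick from the not-yet-fixed prefix
            (order[i], order[j]) = (order[j], order[i]);
        }

        for (int fold = 0; fold < k; fold++)
        {
            // Position p in the shuffled order belongs to fold (p % k).
            int[] test = order.Where((_, p) => p % k == fold).ToArray();
            int[] train = order.Where((_, p) => p % k != fold).ToArray();
            yield return (train, test);
        }
    }
}
```

Because the method is an iterator, the argument check runs on first enumeration rather than at the call site.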

Figure out a long term name for CpuMathNative

From @eerhardt on March 26th

from
- internal const string NativePath = @"Microsoft.MachineLearning.CpuMathNative.dll";
to
+ internal const string NativePath = "CpuMathNative";    

Glad this worked, and thanks for filing the CoreCLR issue. As far as I can tell this should work on netcoreapp2.0 and desktop as well, but we should test that since this codepath has undergone a few changes in 2.1. Also we may consider adding something more specific to the name since we won't be able to rely on a fix for dotnet/coreclr#17150. 

-rp.txt files are not generated on Linux and macOS

From @Anipik on May 2, 2018, 1:49 PM PDT

https://ci2.dot.net/job/Private/job/dotnet_machinelearning/job/master/job/linux_debug_prtest/388/console

13:43:16    Running as: TrainTest tr=LogisticRegression{l1=1.0 l2=0.1 ot=1e-3 nt=1} data=/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/test/data/breast-cancer.txt seed=1 test=/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/test/data/breast-cancer.txt out={/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/bin/AnyCPU.Debug/Microsoft.ML.Predictor.Tests/netcoreapp2.0/TestOutput/LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer-model.zip} dout={/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/bin/AnyCPU.Debug/Microsoft.ML.Predictor.Tests/netcoreapp2.0/TestOutput/LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer.txt}
13:43:16  Output matches baseline: 'LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer-out.txt'
13:43:16  *** Failure: Output file not found: /mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/bin/AnyCPU.Debug/Microsoft.ML.Predictor.Tests/netcoreapp2.0/TestOutput/LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer-rp.txt
13:43:16  Output matches baseline: 'LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer.txt'
13:43:16  Suffix of length 34 compared against sequence of length 42
13:43:16  Running 'LogisticRegression' on 'breast-cancer'
13:43:16    Running as: CV tr=LogisticRegression{l1=1.0 l2=0.1 ot=1e-3 nt=1} data=/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/test/data/breast-cancer.txt seed=1 dout=

cc @danmosemsft @eerhardt @codemzs

Enabling different settings for the TextLoader on train and test data might lead to incorrect metrics with no error

The train and test data can be read with different TextLoaders, which makes it easy to introduce bugs. If there is a difference in the settings, the metrics are likely to be wrong but there are no errors.
Example:

string trainDataPath = "sentiment_data.tsv";
pipeline.Add(new TextLoader<SentimentData>(trainDataPath, useHeader: true));

// Later in the file, after completing the pipeline and training the model
string testDataPath = "sentiment_test.tsv";
// The bug being illustrated: this loader uses "," while the training loader relies on the default separator.
var testData = new TextLoader<SentimentData>(testDataPath, useHeader: true, separator: ",");

var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

Evaluating on the test data will result in incorrect metrics as the test file will be parsed incorrectly. However, the experiment runs successfully with no errors, so this might be difficult to detect.
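One way to rule out this class of bug is to construct a single loader and reuse it for both files, so the separator and header settings cannot diverge. Below is a sketch that uses the later `MLContext` API from the README snippet. `ReviewRow`, `SharedLoader`, and the tab-separated, header-first file layout are assumptions.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type; LoadColumn indices assume a two-column file (label, text).
public class ReviewRow
{
    [LoadColumn(0)] public bool Label { get; set; }
    [LoadColumn(1)] public string Text { get; set; }
}

public static class SharedLoader
{
    // Builds ONE loader and uses it for both files, so separator and header settings
    // cannot drift apart between training and evaluation.
    public static (IDataView Train, IDataView Test) Load(MLContext mlContext, string trainPath, string testPath)
    {
        TextLoader loader = mlContext.Data.CreateTextLoader<ReviewRow>(separatorChar: '\t', hasHeader: true);
        return (loader.Load(trainPath), loader.Load(testPath));
    }
}
```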

Bring CNTK model evaluation to this library

I had a great conversation with some of the team at Build today. I hope this is the right place: I would love to see ML.NET support evaluating CNTK models. The team suggested I post a note here to help vote this up.
We use custom CNTK models for behind-the-scenes processing in our app. We run the evaluation in Azure Functions, and would love a pure-.NET deployment instead of the current one, where we have to bin-deploy lots of C++ libraries and manually work around getting them onto a path where they are callable from C#/managed code.

So, +1 from our team: we would love to see this feature added.

Add Evaluate overload that takes IDataView

From @eerhardt on May 1, 2018, 9:52 AM PDT

See the comment from @glebuk, created May 1, 2018, 8:52 AM PDT, for file src/Microsoft.ML/Models/BinaryClassificationEvaluator.cs on the line below:

public BinaryClassificationMetrics Evaluate(PredictionModel model, ILearningPipelineLoader testData)

We should add another overload to evaluate against a scored IDV. There you'd need to specify the columns for the actual label, the predicted label, and the score. (ok to ship after build)
