dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.

Home Page: https://dot.net/ml

License: MIT License

Batchfile 0.03% Shell 0.57% C# 97.01% CMake 0.15% C++ 1.35% Assembly 0.01% C 0.06% PowerShell 0.78% F# 0.04% Ruby 0.01%

machine-learning algorithms dotnet ml

machinelearning's Introduction

Machine Learning for .NET

ML.NET is a cross-platform open-source machine learning (ML) framework for .NET.

ML.NET allows developers to easily build, train, deploy, and consume custom models in their .NET applications without requiring prior expertise in developing machine learning models or experience with other programming languages like Python or R. The framework provides data loading from files and databases, enables data transformations, and includes many ML algorithms.

With ML.NET, you can train models for a variety of scenarios, like classification, forecasting, and anomaly detection.

You can also consume both TensorFlow and ONNX models within ML.NET which makes the framework more extensible and expands the number of supported scenarios.

Getting started with machine learning and ML.NET

Learn more about the basics of ML.NET.
Build your first ML.NET model by following our ML.NET Getting Started tutorial.
Check out our documentation and tutorials.
See the API Reference documentation.
Clone our ML.NET Samples GitHub repo and run some sample apps.
Take a look at some ML.NET Community Samples.
Watch some videos on the ML.NET videos YouTube playlist.

Roadmap

Take a look at ML.NET's Roadmap to see what the team plans to work on in the next year.

Operating systems and processor architectures supported by ML.NET

ML.NET runs on Windows, Linux, and macOS using .NET Core, or Windows using .NET Framework.

ML.NET also runs on ARM64, Apple M1, and Blazor WebAssembly. However, there are some limitations.

64-bit is supported on all platforms. 32-bit is supported on Windows, except for TensorFlow and LightGBM related functionality.

ML.NET NuGet packages status

Release notes

Check out the release notes to see what's new. You can also read the blog posts for more details about each release.

Using ML.NET packages

First, ensure you have installed .NET Core 2.1 or later. ML.NET also works on the .NET Framework 4.6.1 or later, but 4.7.2 or later is recommended.

Once you have an app, you can install the ML.NET NuGet package from the .NET Core CLI using:

dotnet add package Microsoft.ML

or from the NuGet Package Manager:

Install-Package Microsoft.ML

Alternatively, you can add the Microsoft.ML package from within Visual Studio's NuGet package manager or via Paket.

Daily NuGet builds of the project are also available in our Azure DevOps feed:

https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json

Building ML.NET (For contributors building ML.NET open source code)

To build ML.NET from source please visit our developer guide.

Build status is tracked for the Debug and Release configurations on: CentOS, Ubuntu, macOS, Windows x64, Windows FullFramework, Windows x86, and Windows NetCore3.1.

Release process and versioning

Major releases of ML.NET are shipped once a year with the major .NET releases, starting with ML.NET 1.7 in November 2021 with .NET 6, then ML.NET 2.0 with .NET 7, etc. We will maintain release branches to optionally service ML.NET with bug fixes and/or minor features on the same cadence as .NET servicing.

Check out the Release Notes to see all of the past ML.NET releases.

Contributing

We welcome contributions! Please review our contribution guide.

Community

Join our community on Discord.
Tune into the .NET Machine Learning Community Standup every other Wednesday at 10AM Pacific Time.

This project has adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community. For more information, see the .NET Foundation Code of Conduct.

Code examples

Here is a code snippet for training a model to predict sentiment from text samples. You can find complete samples in the samples repo.

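// Requires the Microsoft.ML and Microsoft.ML.FastTree NuGet packages.
using Microsoft.ML;
using Microsoft.ML.Data;

var dataPath = "sentiment.csv";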
var mlContext = new MLContext();
var loader = mlContext.Data.CreateTextLoader(new[]
    {
        new TextLoader.Column("SentimentText", DataKind.String, 1),
        new TextLoader.Column("Label", DataKind.Boolean, 0),
    },
    hasHeader: true,
    separatorChar: ',');
var data = loader.Load(dataPath);
var learningPipeline = mlContext.Transforms.Text.FeaturizeText("Features", "SentimentText")
        .Append(mlContext.BinaryClassification.Trainers.FastTree());
var model = learningPipeline.Fit(data);

Now from the model we can make inferences (predictions):

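// SentimentData and SentimentPrediction are user-defined input and output classes (see the samples repo).
var predictionEngine = mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(model);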
var prediction = predictionEngine.Predict(new SentimentData
{
    SentimentText = "Today is a great day!"
});
Console.WriteLine("prediction: " + prediction.Prediction);

License

ML.NET is licensed under the MIT license, and it is free to use commercially.

.NET Foundation

ML.NET is a part of the .NET Foundation.


machinelearning's Issues

Intellisense is not helpful with filling in pipeline components.

When adding transforms/trainers to the "LearningPipeline" object, there is no info/docs/help available through IntelliSense. It would be nice to add docs that explain what to add.

Enable/increase verbosity for LearningPipeline and its components (like TextFeaturizer)

Transforms like TextFeaturizer do not show sufficient logs. These logs exist but are currently not exposed.

In an experiment where the TextFeaturizer is the first transform, a user might not see any logs for several minutes, which might suggest there is an issue. Increasing verbosity will clarify what is happening.

Publish daily official builds to myget

We should publish our daily official builds to https://dotnet.myget.org. That way developers external to Microsoft can get access to the official daily packages.

Add release notes for ML.NET 0.1

Need to add release notes for ML.NET 0.1 with installation steps and available components.

The Gitter link in Readme.md points to the wrong chat

Line 47 of our README.md says:

Please join our community on Gitter [![Join the chat at https://gitter.im/dotnet/corefx](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/dotnet/?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

While it really should be:

Please join our community on Gitter [![Join the chat at https://gitter.im/dotnet/mlnet](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/dotnet/mlnet)

Verifying steps to build machinelearning repo on Linux and Mac

Need to verify that the instructions in the Documentation folder for building this repo on different distros have no missing steps.

Windows

Has been verified already.

Unix:

Linux: clang version 3.9+ is the minimum pre-requisite not 3.5+
macOS: to be verified.

cc: @danmosemsft @eerhardt

Some tests (eight) fail in Microsoft.ML.Predictor.Tests

System information

.NET Command Line Tools (2.1.200)

Product Information:
Version: 2.1.200
Commit SHA-1 hash: 2edba8d7f1

Runtime Environment:
OS Name: Windows
OS Version: 10.0.17134
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\2.1.200\

Microsoft .NET Core Shared Framework Host

Version : 2.0.7
Build : 2d61d0b043915bc948ebf98836fefe9ba942be11

Issue

I checked out the project,
Ran .\build.cmd from a PowerShell prompt
Opened the project in VS 2017 15.7
Ran all tests, some of which failed (see attached image).

I expected all tests to pass.

The assert here fails. I didn't go further to check if the cause is a programming error or something else as this was more of curiosity, but noted here. :)

These are all of the eight tests that fail due to the assert in the previous image:

Need to get access to external data sets

There are some data sets we can't commit into the repository. We should download these data sets as part of the initial build, and then cache them in the bin directory (or similarly gitignored folder). That way we can use them in our tests.

EntryPointChainedCrossValMacros test fails in CI occasionally

System information

OS version/distro: Linux Debug
.NET Version (eg., dotnet --info): .NET Core

Issue

What did you do?
Submitted a PR that only changed a markdown file: #48
What happened?
The Linux Debug leg failed. See https://ci2.dot.net/job/dotnet_machinelearning/job/master/job/linux_debug_prtest/1/
What did you expect?
I expected all the tests to pass on all legs since I didn't change any code.

Source code / logs

MESSAGE:
                                        Assert failed: longIdx=328, invariants.Length=328
Expected: True
Actual:   False
                                        +++++++++++++++++++
                                        STACK TRACE:
                                           at Microsoft.ML.Runtime.Internal.Internallearn.Test.GlobalBase.AssertHandler(String msg, IExceptionContext ectx) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/test/Microsoft.ML.TestFramework/GlobalBase.cs:line 47
   at Microsoft.ML.Runtime.Contracts.DbgFailCore(String msg, IExceptionContext ctx) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.Core/Utilities/Contracts.cs:line 751
   at Microsoft.ML.Runtime.Contracts.DbgFail(String msg) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.Core/Utilities/Contracts.cs:line 764
   at Microsoft.ML.Runtime.Contracts.Assert(Boolean f, String msg) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.Core/Utilities/Contracts.cs:line 822
   at Microsoft.ML.Runtime.Learners.SdcaTrainerBase`1.TrainCore(IChannel ch, RoleMappedData data, LinearPredictor predictor) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.StandardLearners/Standard/LinearClassificationTrainer.cs:line 520
   at Microsoft.ML.Runtime.Learners.LinearTrainerBase`1.TrainEx(RoleMappedData data, LinearPredictor predictor) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.StandardLearners/Standard/LinearClassificationTrainer.cs:line 76
   at Microsoft.ML.Runtime.Learners.LinearTrainerBase`1.Train(RoleMappedData examples) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.StandardLearners/Standard/LinearClassificationTrainer.cs:line 84
   at Microsoft.ML.Runtime.Data.TrainUtils.TrainCore[TDataSet,TPredictor](IChannel ch, ITrainer trainer, Action`1 train, TDataSet data, TDataSet validData, TPredictor predictor) in /mnt/resource/j/w/dotnet_machinelearning/master/linux_debug_prtest/src/Microsoft.ML.Data/Commands/TrainCommand.cs:line 324

Add mono support

From @alexanderkyte on Apr 27, 2018, 10:58 AM PDT

In order for our models to be useful on mobile platforms, we're going to need to get this working on mono. It'll probably simply be some infrastructure work.

I can address it, as I'm a mono runtime engineer.

Currently on backlog / low-priority

System.MachineLearning.Runtime namespaces cleanup

From @KrzysztofCwalina on Apr 16, 2018, 9:11 AM PDT

There are several problems with System.MachineLearning.Runtime namespaces. We should clean these problems out:

There are too many System.MachineLearning.Runtime subnamespaces. This will overwhelm users browsing the documentation on MSDN, IDE browsers, etc. We should combine the APIs into fewer subnamespaces.
We should not use "Api" in namespace or project names. All public things are "APIs" as far as .NET developers are concerned.
We have an "EntryPoints" namespace. We should rename it. All "entry point" APIs should simply be in the root namespace (or a subnamespace of the root).
We should not have "tools" APIs in the main assembly. Remove/rename Microsoft.MachineLearning.Runtime.Internal.Tools

Enable using OneVersusAll (OVA) in LearningPipeline

OneVersusAll would enable using more learners in multiclass classification problems (e.g. FastTree). OVA is currently available but not as part of LearningPipeline.

Doesn't support partitioned directories.

Storage formats such as Parquet allow partitioning their data through multiple files and structured directories. This library has no such way to load these partitioned files into one IDataView.

Add ML.NET Roadmap

Add ML.NET roadmap for near and long term

Hyperparameters reversed in Scenario3

From @justinormont on May 2, 2018, 8:51 AM PDT

Line below on file machinelearning/test/Microsoft.ML.Tests/Scenarios/Scenario3_SentimentPrediction.cs

CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true },

Currently: (trigrams & unichargrams+bichargrams)

  CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true },
  WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false }

Should be: (unigram+bigram & trichargrams)

  CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false },
  WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true }
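
To make the two settings concrete, here is a minimal sketch of how NgramLength and AllLengths interact, based only on the description in this issue (AllLengths = true emits every length up to NgramLength; false emits only NgramLength). NgramSketch and Extract are hypothetical names for illustration and are not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not an ML.NET API.
public static class NgramSketch
{
    // Emits n-grams over the given tokens (words or single characters).
    // allLengths = true:  lengths 1..ngramLength are all emitted.
    // allLengths = false: only n-grams of exactly ngramLength are emitted.
    public static List<string> Extract(
        IReadOnlyList<string> tokens, int ngramLength, bool allLengths, string separator)
    {
        if (ngramLength < 1) throw new ArgumentOutOfRangeException(nameof(ngramLength));

        var result = new List<string>();
        int minLength = allLengths ? 1 : ngramLength;
        for (int n = minLength; n <= ngramLength; n++)
        {
            for (int start = 0; start + n <= tokens.Count; start++)
            {
                result.Add(string.Join(separator, tokens.Skip(start).Take(n)));
            }
        }
        return result;
    }
}
```

Under these assumptions, calling Extract on the words of "nice day today" with ngramLength 2 and allLengths true yields unigrams plus bigrams, while calling it on the characters of a word with ngramLength 3 and allLengths false yields only character trigrams, which is the combination the issue asks for.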

Remove/Rename LotusIR namespace

From @KrzysztofCwalina on Apr 16, 2018, 9:13 AM PDT

There is a public top level LotusIR namespace in our repo. We agreed that all our namespaces will be in Microsoft.MachineLearning, and so we should rename or remove this LotusIR namespace.

Nice work!

I haven't worked with C# in a while because I've been working on some machine learning projects in Python with Scikit-Learn. It's so awesome to see .NET is getting its own built-in, high performance machine learning package! 🎉

I have a question about the goals of this project-- how exactly does it relate to existing high-quality frameworks like Accord.NET or all of the projects listed here? Thanks.

Use .NET Core Hardware Intrinsics to optimize the code?

It's great to see a C# machine learning framework!

It seems some linear algebra algorithms are implemented by calling into native cpp code. I assume this is due to the rich SIMD instructions in cpp. Since .NET Core 2.1 has the preview feature of Hardware Intrinsics, using hardware intrinsics is another option to use SIMD instructions.

ML.Net and Azure ML Relationship

It would be beneficial for potential users to understand the relationship between ml.net and azure ML.

Is ml.net the lib behind azure ml? Is ml.net going to be a block inside azure ml? How do those things work together? Is it going to be possible to seamlessly move models, flows, etc. between the two?

Add support for training on a collection of objects

Right now, the LearningPipeline can only learn from data consumed via the TextLoader or any other Loader component. For many scenarios, it would be nice to allow for the consumption of collections of objects for training.

Add/expose random seed in LearningPipeline to get deterministic results

It is important to be able to set a random seed for ML experiments so the results are reproducible. Add random seed to LearningPipeline (or elsewhere) to ensure a full experiment is deterministic.
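
As a small illustration of why an exposed seed matters, the sketch below shuffles row indices with a seeded System.Random. It does not use LearningPipeline; SeedSketch and ShuffledIndices are hypothetical names, and the point is only that a fixed seed reproduces the same order on the same runtime.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not an ML.NET API.
public static class SeedSketch
{
    // Fisher-Yates shuffle of the indices 0..count-1, driven by a seeded generator.
    // The same seed gives the same permutation when run on the same runtime.
    public static int[] ShuffledIndices(int count, int seed)
    {
        var rng = new Random(seed);
        var indices = Enumerable.Range(0, count).ToArray();
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }
        return indices;
    }
}
```

Any component that samples or shuffles (data splits, trainers with random initialization) would need to draw from a generator seeded this way for a whole experiment to be repeatable.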

Simplify Pipeline Initialization with Append() that returns the pipeline

Issue:
Currently we have to use the pipeline instance to append additional items to the pipeline. It looks something like this:

var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader<SentimentData>(dataPath, separator: ","));
pipeline.Add(new TextFeaturizer("Features", "SentimentText"));

We can add the ability to add a pipeline item in a fluent fashion. The benefit would be less typing and a cleaner API.
This would require pipeline to add the following method:

public LearningPipeline Append(ILearningPipelineItem item);

The user code will look like this:

var pipeline = new LearningPipeline()
   .Append(new TextLoader<SentimentData>(dataPath, separator: ","))
   .Append(new TextFeaturizer("Features", "SentimentText"));

(optional) with extension methods, this can be simplified even further to:

var pipeline = new LearningPipeline()
   .AddTextLoader<SentimentData>(dataPath, separator: ",")
   .AddTextFeaturizer("Features", "SentimentText");
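
The proposal boils down to returning the pipeline instance from the add method. A minimal sketch of that pattern follows; FluentPipeline, IPipelineItem, and NamedItem are stand-in types invented for illustration, not the actual LearningPipeline or ILearningPipelineItem.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types for illustration; these are not the ML.NET pipeline types.
public interface IPipelineItem
{
    string Name { get; }
}

public sealed class NamedItem : IPipelineItem
{
    public NamedItem(string name) { Name = name; }
    public string Name { get; }
}

public sealed class FluentPipeline
{
    private readonly List<IPipelineItem> _items = new List<IPipelineItem>();

    public IReadOnlyList<IPipelineItem> Items => _items;

    // Existing style: adds the item and returns nothing.
    public void Add(IPipelineItem item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));
        _items.Add(item);
    }

    // Proposed style: adds the item and returns the same instance,
    // so calls can be chained in a single expression.
    public FluentPipeline Append(IPipelineItem item)
    {
        Add(item);
        return this;
    }
}
```

Keeping Add alongside Append means existing call sites and collection-initializer syntax keep working while the chained form becomes available.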

Provide Scenario Sample Code for All Scenarios

From @KrzysztofCwalina on Apr 16, 2018, 9:07 AM PDT

Currently, we have only one scenario code sample (house price prediction). We should have top 5 scenario samples.

Refactor Scenario tests...

Issue: In Scenarios, there is a class named "Top5Scenarios"
Depending on how you count, there are either 2 or 4 scenarios.
We recommend renaming the class.

Let's also review the names of the files and make sure they are descriptive.

Fill out nupkg metadata completely

System information

OS version/distro: All
.NET Version (eg., dotnet --info): All

Issue

What did you do?

Inspect the NuGet metadata for Microsoft.ML

What happened?

Only some of the info is filled out. For example, for "description" it says "Package Description". It also doesn't have a link to release notes or license/project URL.

What did you expect?

I expected all the info to be filled out.

CpuMath: Sse code is executed when Avx is available, except on .NET Core 3.0

System information

Windows 10 Enterprise:
dotnet Version: 2.1.105:

Issue

What did you do?
I followed the tutorial https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet/get-started/windows to install the SDK and create the sample app. I ran this example in release mode (dotnet run -c Release) and profiled it.
What happened?
The profiling shows Sse code is executed as shown below.
What did you expect?
Since my machine is Avx capable and AddScale has an Avx version https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.CpuMath/Avx.cs#L721, I would expect the Avx version to be used.
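
For comparison, here is a sketch of a width-agnostic version of this kind of kernel using System.Numerics.Vector<T>, where the runtime picks the vector width for the current hardware. This is a swapped-in technique, not how CpuMath is implemented, and it assumes AddScale means dst[i] += scale * src[i]; AddScaleSketch is a hypothetical name.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not the CpuMath implementation.
public static class AddScaleSketch
{
    // Assumed semantics: dst[i] += scale * src[i] for every element.
    public static void AddScale(float scale, float[] src, float[] dst)
    {
        if (src == null) throw new ArgumentNullException(nameof(src));
        if (dst == null) throw new ArgumentNullException(nameof(dst));
        if (src.Length != dst.Length) throw new ArgumentException("Length mismatch.");

        // Vector<float>.Count is chosen by the runtime for the current hardware.
        int width = Vector<float>.Count;
        var vScale = new Vector<float>(scale);

        int i = 0;
        for (; i <= src.Length - width; i += width)
        {
            var updated = new Vector<float>(dst, i) + vScale * new Vector<float>(src, i);
            updated.CopyTo(dst, i);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < src.Length; i++)
        {
            dst[i] += scale * src[i];
        }
    }
}
```

Checking Vector.IsHardwareAccelerated and Vector<float>.Count at startup is a quick way to see which width the runtime selected on a given machine.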

A few questions

Hey,

That's a great initiative to see a machine learning FW for .NET 👍

I have a couple of questions about the rationale behind the project:

Apart from the difference between a pure .NET implementation and mixed C++ and .NET bindings, what are the advantages/differences with CNTK?
What are the training algorithms supported? (e.g CNN, RNN?)
What is the plan about multi-machine, multi-CPU, GPU (and multi-GPU) support?
Is it aimed at providing an abstract API (and a default implementation) that could plug to any implementation behind (e.g CNTK?)?

Thanks!

Binary Training Data

As shown during the .NET Overview Session today at BUILD, and as discussed with the ML Team, there is only a text-based reader for training data. Having a binary reader for training data would be highly beneficial to my use case.

URL to sample is incorrect in readme.md

The correct URL should be https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs

it changed with the latest pull request but the readme.md hasn't been updated.

Move Samples/UCI and ZBaseline Folders to Test/Data Folder

These folders should not be in the root of the repo.

Need to support more types than string and float

From @eerhardt on Apr 12, 2018, 6:52 AM PDT

See code in machinelearning/src/Microsoft.MachineLearning.EntryPoints/TextLoader.cs

private string TypeToName(Type type)
{
    if (type == typeof(string))
        return "TX";
    else if (type == typeof(float) || type == typeof(double))
        return "R4";
    else
        throw new Exception("Type not implemented or supported."); //Add more types.
}

(Refers to Lines 70 to 78 in 6e74d72)

We should fill all supported types out and add tests.
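
One way to extend the mapping is a lookup table, sketched below. Only "TX" and "R4" come from the code quoted above; the other codes ("R8", "I4", "I8", "BL") are assumptions for illustration, and mapping double to "R8" instead of "R4" is a deliberate change from the original that maintainers would need to confirm. TypeNameSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not the TextLoader implementation.
public static class TypeNameSketch
{
    // "TX" and "R4" are taken from the quoted code; every other code is an assumption.
    private static readonly Dictionary<Type, string> Codes = new Dictionary<Type, string>
    {
        { typeof(string), "TX" },
        { typeof(float),  "R4" },
        { typeof(double), "R8" }, // the original maps double to "R4"
        { typeof(int),    "I4" },
        { typeof(long),   "I8" },
        { typeof(bool),   "BL" },
    };

    public static string TypeToName(Type type)
    {
        if (type == null) throw new ArgumentNullException(nameof(type));
        if (Codes.TryGetValue(type, out var code)) return code;

        // Name the offending type so the caller can see what is unsupported.
        throw new NotSupportedException($"Type '{type.FullName}' is not supported.");
    }
}
```

A table keeps each supported type on one line, which makes it straightforward to write one test per entry plus one for the unsupported case.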

Samples for trainers

Looking for samples for supported trainers:

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.trainers?view=ml-dotnet

Thanks

Add performance tests for scenarios we care about

From @KrzysztofCwalina on Apr 19, 2018, 2:41 PM PDT

No description provided.

Simple example using House Pricing Scenario

This is the first time I've looked into machine learning but I have a use case I'd like to test with it.

To get started I've created a simple example from the house pricing scenario which somewhat closely matches my use case but the results I'm getting are not at all close to what I expected. The data I'm providing is simply linear in terms of just the SqftLiving input parameter to the Price where Price = SqftLiving * 100. The SqftLot is held constant for training and prediction so it should be a non-factor.

I'm just trying to predict the Price when the SqftLiving is 1500 which with the linear model created by the provided data should make it about $150,000.

However, the results I get vary wildly, from negative to positive tens of millions, every time I run the program, which is unexpected. Could someone look into this simple example and let me know what, if anything, I'm doing that causes these poor results?

class Program
{
    static void Main(string[] args)
    {
        var filePath = "C://Temp/kc_house_data.csv";

        File.WriteAllText(filePath, @"100000,1000,8000
200000,2000,8000
400000,4000,8000");

        var pipeline = new LearningPipeline
        {
            new TextLoader<HousePriceData>(filePath, separator: ","),
            new ColumnConcatenator("Features", "SqftLiving", "SqftLot"),
            new StochasticDualCoordinateAscentRegressor()
        };

        var model = pipeline.Train<HousePriceData, HousePricePrediction>();

        var prediction = model.Predict(new HousePriceData { SqftLiving = 1500, SqftLot = 8000 });

        Console.WriteLine(prediction.Price);
        Console.ReadLine();
    }
}

public class HousePriceData
{
    [Column(ordinal: "0", name: "Label")]
    public float Price;

    [Column(ordinal: "1")]
    public float SqftLiving;

    [Column(ordinal: "2")]
    public float SqftLot;
}

public class HousePricePrediction
{
    [ColumnName("Score")]
    public float Price;
}
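
As a sanity check on the expectation stated in this issue, the sketch below fits a least-squares line through the origin to the three rows written by the program (Price against SqftLiving, ignoring the constant SqftLot). It does not use ML.NET and does not diagnose the trainer's behavior; LinearBaselineSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper for illustration only; not an ML.NET API.
public static class LinearBaselineSketch
{
    // Least-squares slope for y = slope * x (no intercept): sum(x*y) / sum(x*x).
    public static double FitSlope(double[] x, double[] y)
    {
        if (x == null) throw new ArgumentNullException(nameof(x));
        if (y == null) throw new ArgumentNullException(nameof(y));
        if (x.Length != y.Length) throw new ArgumentException("Length mismatch.");

        double sumXY = 0, sumXX = 0;
        for (int i = 0; i < x.Length; i++)
        {
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        if (sumXX == 0) throw new ArgumentException("All x values are zero.");
        return sumXY / sumXX;
    }

    public static double Predict(double slope, double x) => slope * x;
}
```

With x = {1000, 2000, 4000} and y = {100000, 200000, 400000}, the slope is 2.1e9 / 2.1e7 = 100, so the prediction for 1500 square feet is 150,000, which matches the value the reporter expected.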

Building in Visual Studio

From @terrajobst on Mar 26

On a clean machine, opening the solution file and building in VS fails as the Tools folder doesn't exist. Just running init-tools.cmd doesn't fix it either as it now fails with malformed AssemblyInfo.cs files.
I'm still trying to get the build working from the command line (where I get mostly actionable error messages, like "Install CMake"), but this feels like the sort of thing that discourages contributors quickly. Ideally, you should be able to clone the repo, open the solution in VS, and build immediately.
Thoughts?

Build failed under OSX

System information

OS version/distro: OSX 10.12.6
.NET Version (eg., dotnet --info):

.NET Command Line Tools (2.1.4)

Product Information:
 Version:            2.1.4
 Commit SHA-1 hash:  5e8add2190

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.12
 OS Platform: Darwin
 RID:         osx.10.12-x64
 Base Path:   /usr/local/share/dotnet/sdk/2.1.4/

Microsoft .NET Core Shared Framework Host

  Version  : 2.0.5
  Build    : 17373eb129b3b05aa18ece963f8795d65ef8ea54

Issue

What did you do?

> git clone git@github.com:dotnet/machinelearning.git
> cd machinelearning
> ./build.sh

What happened?
Build failed
What did you expect?
ML goodness

Source code / logs

... truncated for brevity ...

  + cmake /Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native -G 'Unix Makefiles' -DCMAKE_BUILD_TYPE=Debug -DVERSION_FILE_PATH:STRING=/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/../../bin/obj/version.c
  -- The C compiler identification is Clang 3.5.1
  -- The CXX compiler identification is Clang 3.5.1
  -- Check for working C compiler: /usr/local/bin/clang-3.5
  -- Check for working C compiler: /usr/local/bin/clang-3.5 -- works
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Check for working CXX compiler: /usr/local/bin/clang++-3.5
  -- Check for working CXX compiler: /usr/local/bin/clang++-3.5 -- works
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Configuring done
  -- Generating done
EXEC : CMake warning (dev):  [/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj]
    Policy CMP0042 is not set: MACOSX_RPATH is enabled by default.  Run "cmake
    --help-policy CMP0042" for policy details.  Use the cmake_policy command to
    set the policy and suppress this warning.
  
    MACOSX_RPATH is not specified for the following targets:
  
     CpuMathNative
     FastTreeNative
  
  This warning is for project developers.  Use -Wno-dev to suppress it.
  
  -- Build files have been written to: /Users/justinormont/Documents/Microsoft/src/machinelearning/bin/obj/x64.Debug/Native
  + set +x
  Scanning dependencies of target CpuMathNative
  [  8%] Building CXX object CpuMathNative/CMakeFiles/CpuMathNative.dir/Sse.cpp.o
EXEC : error : unknown warning option '-Wno-unused-local-typedef' [-Werror,-Wunknown-warning-option] [/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj]
  make[2]: *** [CpuMathNative/CMakeFiles/CpuMathNative.dir/Sse.cpp.o] Error 1
  make[1]: *** [CpuMathNative/CMakeFiles/CpuMathNative.dir/all] Error 2
  make: *** [all] Error 2
/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj(33,5): error MSB3073: The command "/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.sh --configuration Debug --arch x64 " exited with code 2.

Build FAILED.

EXEC : CMake warning (dev):  [/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj]
EXEC : error : unknown warning option '-Wno-unused-local-typedef' [-Werror,-Wunknown-warning-option] [/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj]
/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.proj(33,5): error MSB3073: The command "/Users/justinormont/Documents/Microsoft/src/machinelearning/src/Native/build.sh --configuration Debug --arch x64 " exited with code 2.
    1 Warning(s)
    2 Error(s)

Time Elapsed 00:01:28.38
Command execution failed with exit code 1.

Attempts

I first suspected an older version of clang or cmake:

> clang++ --version
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin16.7.0
Thread model: posix

> cmake --version
cmake version 3.10.1

After updating:

> clang++ --version
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

> cmake --version
cmake version 3.11.1

Still no luck after updating clang++ & cmake.

Enabling LearningPipeline to use IEnumerable input data

ML.NET currently enables me to load in data through a file (e.g. CSV/TSV). However, I might have collected the dataset through another source and want to train on it without first saving it to a file.

Where is SentimentData defined?

System information

OS version/distro: Win10
.NET Version (eg., dotnet --info): 2.1.200

Issue

What did you do?
I'm trying the examples in the readme.
What happened?

I installed Microsoft.ML package to a newly created .Net core console app.

And in the Program.cs file,

using System;
using Microsoft.ML;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

namespace MLTry
{
    class Program
    {
        static void Main(string[] args)
        {
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader<SentimentData>(dataPath, separator: ","));
//...

It doesn't compile because SentimentData is not defined. And I searched for SentimentData in the repo but found nothing related.

What did you expect?
I expected to find the type SentimentData.

Add ML.NET samples to https://github.com/dotnet/samples

From @KrzysztofCwalina on Apr 23, 2018, 8:38 AM PDT

No description provided

R and Python integration / interoperability

Will this library offer R and Python integration? Where is it on the roadmap? What kind of data transfer library/format will it use, Apache arrow? Something else?

It is important for solution architects to understand how ml.net is going to fit into the big data picture. This is necessary these days, given that Java, R and Python are dominant in this space.

Exception message missing parameter name

This message was probably meant to have the parameter name in the string.

"System.ArgumentOutOfRangeException:  is missing ColumnAttribute
Parameter name: Name
   at Microsoft.ML.TextLoader`1.SetCustomStringFromType(Boolean useHeader, String separator, Boolean allowQuotedStrings, Boolean supportSparse, Boolean trimWhitespace)
   at Microsoft.ML.TextLoader`1..ctor(String inputFilePath, Boolean useHeader, String separator, Boolean allowQuotedStrings, Boolean supportSparse, Boolean trimWhitespace)
   at Microsoft.ML.NYCTaxiFare.Program.Train() in C:\Users\danmose\source\repos\ConsoleApp81\ConsoleApp81\Program.cs:line 25
   at Microsoft.ML.NYCTaxiFare.Program.Main(String[] args) in C:\Users\danmose\source\repos\ConsoleApp81\ConsoleApp81\Program.cs:line 17"

It's because it's thrown like
throw Contracts.ExceptParam(nameof(field.Name), " is missing ColumnAttribute");
and the ExceptParam does not prefix it for free:

public static Exception ExceptParam(string paramName, string msg)
    => Process(new ArgumentOutOfRangeException(paramName, msg));
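
The sketch below reproduces the shape of the problem with plain BCL types and shows one possible fix. Note that nameof(field.Name) evaluates to the literal string "Name", not to the field's own name, so the message ends up without a subject. FieldInfoStub and ExceptionMessageSketch are hypothetical stand-ins, and the suggested wording is an assumption rather than the project's chosen fix.

```csharp
using System;

// Hypothetical types for illustration only; not the ML.NET Contracts helpers.
public static class ExceptionMessageSketch
{
    // Stand-in for the reflected field being inspected.
    public sealed class FieldInfoStub
    {
        public string Name { get; set; }
    }

    // Same call shape as the reported throw: the message starts with
    // " is missing ColumnAttribute" and ParamName is the literal "Name".
    public static ArgumentOutOfRangeException Original(FieldInfoStub field)
        => new ArgumentOutOfRangeException(nameof(field.Name), " is missing ColumnAttribute");

    // One possible fix: put the field's actual name into the message text.
    public static ArgumentOutOfRangeException WithFieldName(FieldInfoStub field)
        => new ArgumentOutOfRangeException(
            nameof(field),
            $"Field '{field.Name}' is missing ColumnAttribute");
}
```

For a field called Price, the second form produces a message beginning with "Field 'Price' is missing ColumnAttribute", which tells the user which member to annotate.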

Adding dataset and license for NYC Taxi Fare

Per conversation with CELA adding license and datasets.

Microsoft.ML.Scenarios.Top5Scenarios.TrainAndPredictIrisModelTest fails intermittently

Enable cross-validation for LearningPipeline

Cross-validation would make it easier to validate models if the data is not already split into train and test files (and generally helps give more precise metrics). This is already exposed as an entrypoint here, but needs to be enabled for LearningPipeline.
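
For readers unfamiliar with the technique, here is a minimal sketch of how k-fold cross-validation partitions rows into train and test sets. It is independent of ML.NET and of the existing entry point; CrossValidationSketch and KFold are hypothetical names, and assigning row i to fold i % k is just one simple scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not an ML.NET API.
public static class CrossValidationSketch
{
    // Returns one (Train, Test) pair of row indices per fold.
    // Row i is placed in the test set of fold (i % k) and in the
    // training set of every other fold.
    public static List<(int[] Train, int[] Test)> KFold(int rowCount, int k)
    {
        if (k < 2 || k > rowCount) throw new ArgumentOutOfRangeException(nameof(k));

        var folds = new List<(int[] Train, int[] Test)>();
        for (int fold = 0; fold < k; fold++)
        {
            int current = fold;
            var test = Enumerable.Range(0, rowCount).Where(i => i % k == current).ToArray();
            var train = Enumerable.Range(0, rowCount).Where(i => i % k != current).ToArray();
            folds.Add((train, test));
        }
        return folds;
    }
}
```

A model would be trained once per fold on the Train indices and scored on the Test indices, and the per-fold metrics averaged; shuffling the rows first is advisable when the file is ordered by label.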

ML.NET Github tagging example from MS Build

I just attended the session at BUILD 2018 with Scott Hanselman, who showed an example of ML.NET for tagging GitHub pull requests. Is this example available somewhere?

Figure out a long term name for CpuMathNative

From @eerhardt on March 26th

from
- internal const string NativePath = @"Microsoft.MachineLearning.CpuMathNative.dll";
to
+ internal const string NativePath = "CpuMathNative";

Glad this worked, and thanks for filing the CoreCLR issue. As far as I can tell this should work on netcoreapp2.0 and desktop as well, but we should test that since this codepath has undergone a few changes in 2.1. Also we may consider adding something more specific to the name since we won't be able to rely on a fix for dotnet/coreclr#17150.

-rp.txt files are not getting generated on linux and mac

From @Anipik on May 2, 2018, 1:49 PM PDT

https://ci2.dot.net/job/Private/job/dotnet_machinelearning/job/master/job/linux_debug_prtest/388/console

13:43:16    Running as: TrainTest tr=LogisticRegression{l1=1.0 l2=0.1 ot=1e-3 nt=1} data=/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/test/data/breast-cancer.txt seed=1 test=/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/test/data/breast-cancer.txt out={/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/bin/AnyCPU.Debug/Microsoft.ML.Predictor.Tests/netcoreapp2.0/TestOutput/LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer-model.zip} dout={/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/bin/AnyCPU.Debug/Microsoft.ML.Predictor.Tests/netcoreapp2.0/TestOutput/LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer.txt}
13:43:16  Output matches baseline: 'LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer-out.txt'
13:43:16  *** Failure: Output file not found: /mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/bin/AnyCPU.Debug/Microsoft.ML.Predictor.Tests/netcoreapp2.0/TestOutput/LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer-rp.txt
13:43:16  Output matches baseline: 'LogisticRegression/LogisticRegression-norm-TrainTest-breast-cancer.txt'
13:43:16  Suffix of length 34 compared against sequence of length 42
13:43:16  Running 'LogisticRegression' on 'breast-cancer'
13:43:16    Running as: CV tr=LogisticRegression{l1=1.0 l2=0.1 ot=1e-3 nt=1} data=/mnt/resource/j/w/Private/dotnet_machinelearning/master/linux_debug_prtest/test/data/breast-cancer.txt seed=1 dout=

cc @danmosemsft @eerhardt @codemzs

Enabling different settings for the TextLoader on train and test data might lead to incorrect metrics with no error

The train and test data can be read with different TextLoaders, which makes it easy to introduce bugs. If there is a difference in the settings, the metrics are likely to be wrong but there are no errors.
Example:

string trainDataPath = "sentiment_data.tsv";
pipeline.Add(new TextLoader<SentimentData>(trainDataPath, useHeader: true));

// Later in the file, after completing the pipeline and training the model.
// The test loader below uses a different separator than the training loader.
string testDataPath = "sentiment_test.tsv";
var testData = new TextLoader<SentimentData>(testDataPath, useHeader: true, sep: ',');

var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

Evaluating on the test data will result in incorrect metrics as the test file will be parsed incorrectly. However, the experiment runs successfully with no errors, so this might be difficult to detect.
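The failure mode can be reproduced without ML.NET. The sketch below, which assumes nothing about TextLoader internals, shows how splitting a tab-separated row on a comma succeeds silently with the wrong columns, and how sharing one settings object plus a column-count check turns the mismatch into an error. `LoaderSettings`, `LoaderSketch`, and the sample row are hypothetical and are not ML.NET APIs or data.

```csharp
using System;

// Hypothetical settings object: defining it once and reusing it for both the
// train and test loaders is one way to keep the two from drifting apart.
public sealed class LoaderSettings
{
    public char Separator { get; }
    public int ExpectedColumns { get; }

    public LoaderSettings(char separator, int expectedColumns)
    {
        Separator = separator;
        ExpectedColumns = expectedColumns;
    }
}

public static class LoaderSketch
{
    // Lenient parsing: a wrong separator yields the wrong columns without any error.
    public static string[] ParseLenient(string line, char separator)
        => line.Split(separator);

    // Strict parsing: fail loudly when the column count does not match the schema.
    public static string[] ParseStrict(string line, LoaderSettings settings)
    {
        string[] fields = line.Split(settings.Separator);
        if (fields.Length != settings.ExpectedColumns)
            throw new FormatException(
                $"Expected {settings.ExpectedColumns} columns but found {fields.Length}.");
        return fields;
    }

    public static void Main()
    {
        string row = "1\tgreat movie"; // made-up tab-separated row: label, text

        // Wrong separator: the whole row comes back as a single field, with no error.
        Console.WriteLine(ParseLenient(row, ',').Length);

        // Shared settings with a column-count check parse the row as intended.
        var shared = new LoaderSettings('\t', 2);
        Console.WriteLine(ParseStrict(row, shared)[1]);
    }
}
```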

Bring CNTK model evaluation to this library

I had a great conversation with some of the team at Build today. I hope this is the right place: we would love to see ML.NET support evaluation of CNTK models. The team suggested I post a note here so that others can vote it up.
We use custom CNTK models for behind-the-scenes processing in our app. We do the evaluation in an Azure Function and would love a pure .NET deployment instead of the current one, where we have to bin-deploy many C++ libraries and manually work around getting them onto a path where they are callable from C#/managed code.

So, +1 from our team. We would love to see this feature added.

Add Evaluate overload that takes IDataView

From @eerhardt on May 1, 2018, 9:52 AM PDT

See comment From @glebuk created May 1, 2018, 8:52 AM PDT for file src/Microsoft.ML/Models/BinaryClassificationEvaluator.cs on line below:

public BinaryClassificationMetrics Evaluate(PredictionModel model, ILearningPipelineLoader testData)

We should add another overload that evaluates against an already scored IDataView (IDV). The caller would need to specify the columns for the actual label, the predicted label, and the score. (OK to ship after Build.)
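To make the request concrete, here is a sketch of computing binary classification metrics directly from already scored rows, given the actual-label and predicted-label columns. `ScoredRow` and `ScoredEvaluationSketch` are hypothetical; they do not reflect the real IDataView interface or the overload that was eventually designed, and the score column is carried along without being used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row of already scored data. The property names stand in for the
// column names a caller would have to specify.
public sealed class ScoredRow
{
    public bool Label { get; }
    public bool PredictedLabel { get; }
    public float Score { get; } // would feed threshold-free metrics, omitted here

    public ScoredRow(bool label, bool predictedLabel, float score)
    {
        Label = label;
        PredictedLabel = predictedLabel;
        Score = score;
    }
}

public static class ScoredEvaluationSketch
{
    // Builds a confusion matrix from the rows and derives three basic metrics.
    public static (double Accuracy, double Precision, double Recall) Evaluate(
        IEnumerable<ScoredRow> rows)
    {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        foreach (var r in rows)
        {
            if (r.PredictedLabel && r.Label) tp++;
            else if (r.PredictedLabel) fp++;
            else if (r.Label) fn++;
            else tn++;
        }

        int total = tp + fp + tn + fn;
        double accuracy = total == 0 ? 0 : (double)(tp + tn) / total;
        double precision = tp + fp == 0 ? 0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0 : (double)tp / (tp + fn);
        return (accuracy, precision, recall);
    }

    public static void Main()
    {
        var rows = new List<ScoredRow>
        {
            new ScoredRow(true, true, 0.9f),   // made-up scores
            new ScoredRow(true, false, 0.4f),
            new ScoredRow(false, true, 0.6f),
            new ScoredRow(false, false, 0.1f),
        };
        var metrics = Evaluate(rows);
        Console.WriteLine($"{metrics.Accuracy} {metrics.Precision} {metrics.Recall}");
    }
}
```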
