
azureml's Issues

publish/consume data frame input example fails

This example fails, and I'm not sure yet why:

  f <- function(a,b,c,d) list(sum = a+b+c+d, prod = a*b*c*d)
  ep <-  publishWebService(ws,
                           f,
                           name = "rowSums",
                           inputSchema = list(
                             a="numeric",
                             b="numeric",
                             c="numeric",
                             d="numeric"
                           ),
                           outputSchema = list(
                             sum ="numeric",
                             prod = "numeric")
  )
  x <- head(iris[,1:4])  # First four columns of iris

  # Note the following will FAIL because of a name mismatch in the arguments
  # (with an informative error):
  consume(ep, x, retryDelay=1)
  # We need the columns of the data frame to match the inputSchema:
  names(x) <- letters[1:4]
  # Now we can evaluate all the rows of the data frame in one call:
  consume(ep, x)
  # output should look like:
  #    sum    prod
  #1 10.2   4.998
  #2  9.5   4.116
  #3  9.4  3.9104
  #4  9.4   4.278
  #5 10.2    5.04
  #6 11.4 14.3208

Add mechanism for rate limiting

As noted in #44, if requests are sent too frequently to the AzureML API, it is very easy to exceed the rate limit.

One possibility is to add multiple retries, possibly with exponential back-off (#48), to all functions that call the API.
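A minimal sketch of such a retry wrapper, assuming every API-calling function can be wrapped in a closure (`try_request` is a hypothetical stand-in, not an existing function in the package):

```r
# Sketch only: retry a failing API call with exponential back-off.
# 'try_request' is any zero-argument function that signals an error on failure.
with_retries <- function(try_request, max_tries = 5, base_delay = 1) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(try_request(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i == max_tries) stop(result)
    Sys.sleep(base_delay * 2 ^ (i - 1))  # 1s, 2s, 4s, ...
  }
}
```

The back-off doubles on each failure, which spreads out bursts of requests and gives the rate limiter time to recover.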

experiments() throws POSIXct warnings

Sample code:

ws <- workspace()
experiments(ws)

Result:

Warning messages:
1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  unable to identify current timezone 'C':
please set environment variable 'TZ'
2: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  unknown timezone 'localtime'
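A possible workaround until the package handles this itself: set the `TZ` environment variable before any date-time parsing happens, e.g. at the top of the session or in `.Rprofile`:

```r
# Workaround sketch: give strptime() an explicit timezone so it stops warning.
Sys.setenv(TZ = "UTC")   # or your local zone, e.g. "America/Chicago"
x <- strptime("2015-09-01 12:00:00", "%Y-%m-%d %H:%M:%OS")
```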

Warn if trying to download a zipped dataset

Right now, download.dataset() throws an error when trying to download a Zip file. However, this error only happens after the API call returns.

We have two options:

  1. Display a helpful message and stop before making the API call
  2. Add ability to download Zip to local file system.

We should definitely do 1.

Should we also support the ability to download a Zip file?
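A sketch of option 1, assuming the dataset metadata exposes a field identifying Zip datasets (`data_type_id` here is an illustrative name, not a confirmed field):

```r
# Sketch: fail fast with a helpful message before making the API call.
download.dataset <- function(dataset) {
  if (identical(dataset$data_type_id, "Zip")) {
    stop("Zip datasets are not supported by download.dataset(). ",
         "Consider downloading the file manually from ML Studio.")
  }
  # ... proceed with the API call ...
}
```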

License & copyright

What are the appropriate statements? Where do we need to place them? Thanks
David, please answer or re-assign as appropriate. I don't have access to OSS training.

Specify geolocation url in workspace object

At the moment, the user can specify the URL in each function.

It would be a cleaner implementation to specify this in the workspace object, and then each function reads the url from the workspace object.
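A minimal sketch of that design; the field names and default endpoint are illustrative, not the package's actual implementation:

```r
# Sketch: the workspace object carries the endpoint once; every function
# derives its URL from the workspace object instead of taking a url argument.
workspace <- function(id, auth,
                      api_endpoint = "https://studioapi.azureml.net") {
  structure(list(id = id, auth = auth, api_endpoint = api_endpoint),
            class = "Workspace")
}

datasets_url <- function(ws) {
  # no per-call url parameter needed; path shown here is illustrative
  paste0(ws$api_endpoint, "/workspaces/", ws$id, "/datasources")
}
```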

In callAPI() break immediately for certain types of errors

Not all errors returned with a 400 response are because the service is unavailable.

Sometimes the error code contains helpful debugging information.

In these cases, callAPI() should break immediately, rather than retrying several times.
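One way to sketch the decision; the set of non-retryable error codes below is illustrative (only `LibraryExecutionError` appears in the reports above):

```r
# Sketch: classify errors before retrying, so informative failures
# surface immediately instead of being retried several times.
non_retryable <- c("LibraryExecutionError", "BadArgument")  # illustrative list

should_retry <- function(status, error_code) {
  if (error_code %in% non_retryable) return(FALSE)  # fail fast with the message
  status %in% c(429, 503) || status >= 500          # transient conditions
}
```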

publishWebService() should throw an error if fun is not a function

I just spent several hours tracing some unexpected behaviour.

If the fun argument to publishWebService() is a character string (allowed in the previous incarnation of AzureML), then consume() always throws an error.

This throws an error:

ws <- workspace()
api <- publishWebService(
  ws,
  fun = "add", 
  name = "add",
  inputSchema = list(
    x = "numeric", 
    y = "numeric"
  ), 
  outputSchema = list(
    ans = "numeric"
  )
)

consume(api, df, retryDelay = 0.1)

Error:

> consume(api, df, retryDelay = 0.1)
Warning: Request failed with status 400. Retrying request...
List of 1
 $ error:List of 3
  ..$ code   : chr "LibraryExecutionError"
  ..$ message: chr "Module execution encountered an internal library error."
  ..$ details:'data.frame': 1 obs. of  3 variables:
  .. ..$ code   : chr "FailedToEvaluateRScript"
  .. ..$ target : chr "Execute R Script Piped (RPackage)"
  .. ..$ message: chr "The following error occurred during evaluation of R script: R_tryEval: return error: Error in do.call(\"..fun\", as.list(inputD"| __truncated__
NULL
 Error in callAPI(apiKey, requestUrl, requestsLists, globalParam, retryDelay) : 
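A guard at the top of publishWebService() would catch this at publish time instead. A sketch, with the argument list abbreviated:

```r
# Sketch of the proposed guard; not the package's current implementation.
publishWebService <- function(ws, fun, name, inputSchema, outputSchema, ...) {
  if (!is.function(fun)) {
    stop("'fun' must be a function, not a ", class(fun)[1],
         ". Did you pass the function's name as a string?")
  }
  # ... existing publication logic ...
}
```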

Give more descriptive error message when zip isn't available

User report:

I’m trying this on Windows… when I try to use publishWebService, I get:

Error in publishWebService(ws, fun = add, name = "AzureML-vignette-add",  : 
  Requires external zip utility. Please install zip and try again.

But Windows already has a zip utility, doesn’t it? I am able to zip and unzip files from the File Explorer.

I installed http://www.7-zip.org/ but that didn’t work either. Any ideas what zip utility must be installed and how to make it work?


Since the zip utility must also be on the PATH, we can make this message clearer and more explicit.
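A sketch of the check and a more explicit message. Note that R's `zip()` needs a command-line `zip` executable on the PATH; the GUI zip support in File Explorer or 7-Zip does not provide one under that name (on Windows, Rtools ships a command-line zip):

```r
# Sketch: detect a command-line zip up front and explain how to fix it.
zip_available <- function() nzchar(Sys.which("zip"))

require_zip <- function() {
  if (!zip_available()) {
    stop("Requires the external 'zip' command-line utility on the PATH.\n",
         "GUI zip tools (File Explorer, 7-Zip) are not sufficient; install\n",
         "a command-line zip (e.g. from Rtools on Windows), add its\n",
         "directory to the PATH, and restart R.")
  }
  invisible(TRUE)
}
```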

Add MIT license text to the top of each file

As required by Microsoft policy. The text is:

------------------------------------------ START OF LICENSE -----------------------------------------
RA-Internal-azureml ver. 0.1
Copyright (c) Microsoft Corporation
All rights reserved. 
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ""Software""), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
----------------------------------------------- END OF LICENSE ------------------------------------------

Automatic discovery of data frame classes in publishWebService() throws an error

Example:

ws <- workspace()

library(lme4)
set.seed(1)
model <- lmer(Reaction ~ Days + (Days | Subject), data=sleepstudy[sample(nrow(sleepstudy), 120),])
p <- function(newdata){
  predict(model, newdata=newdata)
}


ep <- publishWebService(ws, fun = p, name = "sleepy lmer 2",
                        inputSchema = sleepstudy,
                        data.frame=TRUE
)

Result:

 Error: sum(nb) == q is not TRUE 

Use roxygen for help documentation

Andrie said:

Do you think it’s possible to use roxygen to document the package? I think this is easier in the long run, but I have no idea how this will work with R6 functions.

Antonio said:

I think it would clash with the export stuff, but I am not an R6 expert, so I am not sure. The problem is that roxygen doesn't work with R unless you use a subset of it. Since I use higher order functions, and there is no way of documenting a dynamically generated one, it can't be used with most of my code, azureml being no exception. The whole design of roxygen is anti-modular and inspired by an inferior static language that forces people to use preprocessors. I am just waiting patiently for the moment Roxygen fades away.

Andrie says: I think there is a way to get the best of both worlds. Not having to learn latex must be a benefit. I'll do some experiments to see if I can convert your .rd files into sensible roxygen.

Use public workspace in examples and tests

API URL:

https://ussouthcentral.services.azureml.net/workspaces/f5e8e9bc4eed4034b78567449cfca779/services/b4caf0f0ebfd451bbc187741894e213b/execute?api-version=2.0&details=true

scheme <- "https"
host <- "ussouthcentral.services.azureml.net"
endpointId <- "b4caf0f0ebfd451bbc187741894e213b"
workspaceId <- "f5e8e9bc4eed4034b78567449cfca779"

Add minimal unit test framework

I propose a method like the following to include tests in the package:

  • Create a config.txt file that contains placeholder values for the API keys. (Or just instructions on setting up the keys)
  • Add config.R to .gitignore
  • The test user has to manually create config.R on his own machine. (The .gitignore ensures this never gets uploaded to github.)
  • The unit tests read values from config.R – if this file doesn’t exist, then no tests are run.
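The gating logic could be sketched as a small helper (the file name and the variables it defines are the proposed, not existing, conventions):

```r
# Sketch: run live API tests only when the developer has created config.R.
run_live_tests <- function(config = "config.R") {
  if (!file.exists(config)) {
    message("'", config, "' not found; skipping live API tests")
    return(FALSE)
  }
  # In a real test file this would source into the test environment;
  # config.R is expected to define e.g. workspace_id and authorization_token.
  source(config, local = TRUE)
  TRUE
}
```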

Remove dependency on purrr ?

Andrie said:

I notice that your toolchain depends on the packages R6 and purrr. The purrr dependency worries me a little bit. The package isn’t on CRAN yet, so presumably still unstable? (minor concern)

Antonio said:

It seems stable enough to me but I am worried about distribution. What will happen when people try to install azureml from CRAN? I don't think there's a way the dependency will be fulfilled automatically. Let's think for a moment if there is an alternative to dropping the dependency. It improves code quality by quite a bit. It's a great package.

Update vignette

The vignette still uses the original publishWebService() API.

Use idiomatic R for API

Andrie said

Are you open to suggestions for what the API looks like?

For example, your current code has this:

irisaz1 = ws$datasets$get.item("Iris Two Class Data")
irisaz1$as.data.frame()

Perhaps we could define S3 methods to simplify the user experience to something like:

irisaz1 = ws$datasets["Iris Two Class Data"]
as.data.frame(irisaz1)

Antonio responded:

Absolutely. The goal right now is python parity, and being aligned with the equivalent python package has cut dev time and it's going to be easier to track what they do. But as far as the API, you noticed everything is public. That can't stay that way. It's more than accepting suggestions. Let's write an API

I just found out that s3 dispatch works on r6 classes with little fuss (http://stackoverflow.com/questions/28117585/proper-way-to-implement-s3-dispatch-on-r6-classes) so this is possible. As far as complexity:

irisaz1$as.data.frame()
        # VS
as.data.frame(irisaz1)

I have seen code complexity. That is not it.

Maybe more idiomatic R, yes. But that was a method that clearly lent itself to be associated with a known and loved generic. What should we do with get.item, datasets etc? Maybe there is a mapping, but it's not so clear to me. Using some generics and some methods, maybe, let me think about that. As far as following python, I think we can keep mimicking the object structure there with R6 objects and then layer generics on top of them as in

as.data.frame.Dataset = function(data) data$as.data.frame()

It is that simple, courtesy Winston Chang. Let's just keep these generics in a separate file because even the file structure follows the python original. You even find python in the comments (gradually being removed). Then when we restrict the exports, we can point mostly at the generic functions. Right now my focus is still on getting the functionality in, so if you want to lay it out, even just the names, please do; I have to focus on uploads.
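The pattern Antonio describes (an S3 generic layered over an object's own method) can be sketched without pulling in R6 at all, since an R6 object is essentially an environment with a class attribute; `Dataset` and `new_dataset` here are minimal stand-ins, not the package's classes:

```r
# Sketch: S3 dispatch layered over an R6-style object.
new_dataset <- function(df) {
  self <- new.env()
  self$df <- df
  self$as.data.frame <- function() self$df  # the object's own method
  class(self) <- "Dataset"
  self
}

# One-line S3 wrapper, as in the comment above:
as.data.frame.Dataset <- function(x, ...) x$as.data.frame()
```

With this in place, both `d$as.data.frame()` and `as.data.frame(d)` work on the same object.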

Read workspace data from json file by default

The data is:

{"workspace":{
  "id":"test_id",
  "authorization_token": "test_token",
  "api_endpoint":"api_endpoint",
  "management_endpoint":"management_endpoint"
}}

The idea here is that if workspace() is called with no parameters we should try and read ~/.azureml/settings.json and get the workspace id/authorization token/end points and use the information there. That way we can keep the tokens out of the notebooks.
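A sketch of the reader, using jsonlite (an assumption; any JSON parser would do) and the field names from the JSON above:

```r
# Sketch: default workspace() settings from ~/.azureml/settings.json.
read_settings <- function(path = "~/.azureml/settings.json") {
  if (!file.exists(path)) return(NULL)
  jsonlite::fromJSON(path)$workspace
}
```

`workspace()` could then fall back to `read_settings()` whenever `id` or the token is missing, keeping credentials out of notebooks.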

read settings file

equivalent to get_workspace_info in python. Not user accessible, provides default for Workspace constructor

Add ability to read .dataset formats (deserialization)

The .dataset format is used as the output of most modules in ML Studio (intermediate datasets). For example, the Split module results are in that format.

Studio currently disables the Generate Data Access Code and Open in Notebook features on those output nodes due to lack of deserialization support for that format in Python.

To access those intermediate datasets from Python code, the user needs to insert a Convert to CSV module. Note that this conversion loses some metadata, such as column type information. Pandas can infer the types most of the time, but sometimes it requires user post-processing.

Modify mechanism to read JSON file in unit tests

The current mechanism is:

keyfile <- system.file("tests/testthat/config.json", package = "AzureML")

This is flawed since this json file can't be found during package build.

The fix might be to put this json file in /inst/...
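With the file shipped under `inst/`, installation places it at the package root and the lookup drops the `tests/` prefix; a sketch:

```r
# Sketch of the fixed lookup once the file lives in inst/config.json.
keyfile <- system.file("config.json", package = "AzureML")
have_config <- nzchar(keyfile)  # system.file() returns "" when not found
```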

Use exponential backoff when encountering 503 errors in consumeWebservice

A response of 503 means either:

  1. The webservice is receiving more requests than it is configured for (steady-state error)
  2. The webservice is initializing containers and AzureML is serializing the initialization (because initializing new containers takes time). This happens when there is a spike.

If a request returns 503, the client should retry with exponential back-off (up to n times, per your business logic).

Note that the time it takes to initialize a container is dependent on the nature of the model that you are uploading.
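The retry schedule itself is easy to compute; a sketch with illustrative parameters (base delay, retry count, and cap are business-logic choices):

```r
# Sketch: delays for successive 503 retries, doubling up to a cap.
backoff_delays <- function(n, base = 1, cap = 60) {
  pmin(base * 2 ^ (seq_len(n) - 1), cap)
}

backoff_delays(6)  # 1 2 4 8 16 32
```

Because container initialization time depends on the model, the cap should be generous enough to cover a cold start.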
