claimed-framework / component-library
The goal of CLAIMED is to enable low-code/no-code rapid prototyping style programming to seamlessly CI/CD into production.
License: Apache License 2.0
I wanted to push an update for the FAQ in the Wiki section. I just cleaned up and added a little bit to the section about Lambda notation. Below is the updated replacement.
# Syntax error: lambda notation in Python 2.7 vs 3.x
<style type="text/css">
.container {
display: inline-grid;
align-items: center;
justify-content: center;
}
.cap-map {
text-align: center;
font-size: 24px;
text-decoration: none;
vertical-align: middle;
}
.header-cell {
text-align: center;
font-weight: bold;
vertical-align: middle;
}
.center-cell {
text-align: center;
vertical-align: middle;
}
.function-cell {
text-align: left;
font-family: monospace;
white-space: pre;
vertical-align: middle;
}
</style>
<div class="container">
<table>
<caption class="cap-map">Mapping RDDs with Single Values</caption>
<thead>
<tr>
<th class="header-cell">Python Version</th>
<th class="header-cell">Solution #</th>
<th class="header-cell">Lambda Function</th>
</tr>
</thead>
<tbody>
<tr>
<td class="center-cell">2.7</td>
<td class="center-cell">1</td>
<td class="function-cell">result_array_ts = result_rdd.map(lambda (ts, voltage): ts).collect()</td>
</tr>
<tr>
<td class="center-cell">3.x</td>
<td class="center-cell">1</td>
<td class="function-cell">result_array_ts = result_rdd.map(lambda ts_voltage: ts_voltage[0]).collect()</td>
</tr>
<tr>
<td class="center-cell">3.x</td>
<td class="center-cell">2</td>
<td class="function-cell">result_array_voltage = result_rdd.map(lambda ts_voltage: ts_voltage[1]).collect()</td>
</tr>
</tbody>
</table>
<br />
<table>
<caption class="cap-map">Mapping RDDs with Multiple Values</caption>
<thead>
<tr>
<th class="header-cell">Python Version</th>
<th class="header-cell">Lambda Function</th>
</tr>
</thead>
<tbody>
<tr>
<td class="center-cell">2.7</td>
<td class="function-cell">rdd.map(lambda (x, y): pow(x * y, 2)).sum()</td>
</tr>
<tr>
<td class="center-cell">3.x</td>
<td class="function-cell">rdd.map(lambda xy: pow(xy[0] * xy[1], 2)).sum()</td>
</tr>
</tbody>
</table>
</div>
The less-fancy Markdown versions are:

**Mapping RDDs with Single Values**

| Python Version | Solution # | Lambda Function |
|---|---|---|
| 2.7 | 1 | `result_array_ts = result_rdd.map(lambda (ts, voltage): ts).collect()` |
| 3.x | 1 | `result_array_ts = result_rdd.map(lambda ts_voltage: ts_voltage[0]).collect()` |
| 3.x | 2 | `result_array_voltage = result_rdd.map(lambda ts_voltage: ts_voltage[1]).collect()` |

**Mapping RDDs with Multiple Values**

| Python Version | Lambda Function |
|---|---|
| 2.7 | `rdd.map(lambda (x, y): pow(x * y, 2)).sum()` |
| 3.x | `rdd.map(lambda xy: pow(xy[0] * xy[1], 2)).sum()` |
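Python 3 removed tuple parameter unpacking (PEP 3113), which is why the 2.7-style lambdas fail with a syntax error. A minimal sketch of the 3.x pattern without Spark, using a plain list as a stand-in for an RDD of tuples:

```python
# Python 3 removed tuple parameter unpacking (PEP 3113), so
#   lambda (x, y): ...
# is a syntax error. Index into the tuple instead.
pairs = [(1, 2), (3, 4)]  # stand-in for an RDD of (x, y) tuples

# Python 2.7 style (no longer valid):  map(lambda (x, y): pow(x * y, 2), pairs)
result = sum(map(lambda xy: pow(xy[0] * xy[1], 2), pairs))
print(result)  # (1*2)^2 + (3*4)^2 = 4 + 144 = 148
```

The same indexing works unchanged inside `rdd.map(...)` on a real RDD.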
The idea of "write once, re-use anywhere" is good.
I think it would be good to remove the dependency on Jupyter and only allow:
- plain Python code
- CLI
to ensure more portability.
I have a similar idea:
pip install utilmy
A collection of one-liners for common tasks:
df = pd_read_file(...)  # from anywhere (S3, local, Hadoop, ...), any format
At first I intended to install Spark on my local computer, so I ran the code provided in spark_setup_anaconda.ipynb. However, I got "IndexError: list index out of range":

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
in
      8 os.environ["SPARK_HOME"] = "/home/dsxuser/work/spark-2.3.4-bin-hadoop2.7"
      9 import findspark
---> 10 findspark.init()
     11 from pyspark import SparkContext, SparkConf
     12 from pyspark.sql import SQLContext, SparkSession
~\Anaconda3\lib\site-packages\findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    133     # add pyspark to sys.path
    134     spark_python = os.path.join(spark_home, 'python')
--> 135     py4j = glob(os.path.join(spark_python, 'lib', 'py4j-*.zip'))[0]
    136     sys.path[:0] = [spark_python, py4j]
    137
IndexError: list index out of range
```
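The traceback means findspark found no `py4j-*.zip` under `$SPARK_HOME/python/lib`, which usually happens when SPARK_HOME points at a directory where Spark was never actually extracted. A small diagnostic sketch (the path is the one from the traceback, adjust to your machine):

```python
import glob
import os

def check_spark_home(spark_home):
    """Return the py4j zips findspark would find, or [] if SPARK_HOME is wrong."""
    return glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))

# path taken from the traceback above; change it if you extracted Spark elsewhere
spark_home = "/home/dsxuser/work/spark-2.3.4-bin-hadoop2.7"
if not check_spark_home(spark_home):
    print("No py4j zip under", spark_home,
          "- findspark.init() will raise IndexError here")
```

If the list is empty, re-check that the Spark tarball was downloaded and extracted to that exact directory before calling `findspark.init()`.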
@romeokienzler I am unable to select the Free environment anymore on Watson Studio for notebooks. I get the error message "Watson Studio no longer offers free environments. Select another environment." when I try to open my previous notebooks created in the free Python instance.
You can get rid of the 2nd map if you use square brackets instead of parentheses in the first map.
data = column1.zip(column2).zip(column3).zip(column4).map(lambda a_b_c_d: [a_b_c_d[0][0][0], a_b_c_d[0][0][1], a_b_c_d[0][1], a_b_c_d[1]])
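Without Spark, the same nested-zip flattening can be sketched with plain lists (column names and values are illustrative). Chained zips nest to the left, so the lambda has to dig out each element by index:

```python
column1, column2, column3, column4 = [1, 2], [3, 4], [5, 6], [7, 8]

# chained zips nest left: (((a, b), c), d)
nested = zip(zip(zip(column1, column2), column3), column4)
data = list(map(lambda t: [t[0][0][0], t[0][0][1], t[0][1], t[1]], nested))
print(data)  # [[1, 3, 5, 7], [2, 4, 6, 8]]
```

Returning a list (square brackets) instead of a tuple per row is what makes the second map unnecessary.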
Describe the issue
Components are added at a faster pace than the corresponding automated tests. We are in the process of accepting new components only when they come with tests, but there is still a gap to be filled.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
100% test coverage at the statement coverage level
The Python 3.5 environment with 1 vCPU and 4 GB RAM isn't an option anymore.
Either I am doing something wrong or maybe the setup tutorial needs to be updated.
Regards,
Arjun
New components are always welcome. Feel free to create a feature request describing what you are planning to deliver. If you feel confident, you can also just contribute new components via a PR.
Hello, I cannot open the IBM Cloud setup link. Could you correct it?
On the Watson Studio Setup wiki https://github.com/IBM/coursera/wiki/Watson-Studio-Setup,
dataplatform.cloud.ibm.com links to https://github.com/IBM/coursera/wiki/dataplatform.cloud.ibm.com
I watched the video, and I am having a hard time adding a new notebook to submit my assignment.
I have enrolled in the Advanced Data Science with IBM specialization and partially completed weeks 1 through 4. https://www.coursera.org/specializations/advanced-data-science-ibm
Please note, I am new to coding, with only minimal knowledge of Python based on my course on www.cognitiveclass.ai.
In weeks 2 and 3, when trying to complete the programming assignments on Spark and set up the environment, I got stuck. Unfortunately, the guidelines video hosted here: is also not available / working.
I also posted this query on the Coursera discussion forum last week and am awaiting advice.
Ref: https://www.coursera.org/learn/ds/discussions/weeks/2/threads/Y40fZ4hDEemB1RILjckZRA
However, I am worried that I might lose time in this process.
In fact, I need advice on how and where to start the environment and continue.
Could you please guide and advise me so I can successfully complete this course?
mlkil
I'm currently working through IBM's coursera notebooks, and there appear to be some errors in the .ipynb's for certain transformations. Specifically:
"claimed/component-library/transform/spark-csv-to-parquet.ipynb": the destination path and parquet filename are stored in a variable "output_data_parquet" (third code cell). In code cell 5, data_dir + data_parquet fails to run because data_parquet is not defined. I think this should be output_data_parquet, as appears in the eighth code cell.
"claimed/component-library/transform/spark-sql.ipynb": in cell 4, where the environment variables are defined, "data_dir" is defined twice. The first occurrence appears to be correct based on the comment. The second occurrence appears to be incorrect, as the comment suggests it should be a SQL query. As a result, in cell 7, the variable "sql" is not defined. I think the second occurrence of data_dir should really be a line along the lines of: sql = os.environ.get('sql_query', 'select * from df')
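For reference, the env-var pattern these notebook cells follow can be sketched as below (the default values here are assumptions for illustration, not necessarily the notebook's actual defaults):

```python
import os

# configuration via environment variables, with fallbacks when unset
data_dir = os.environ.get('data_dir', '../../data/')          # input/output directory
sql = os.environ.get('sql_query', 'select * from df')         # query to run against the temp view
output_data_parquet = os.environ.get('output_data_parquet', 'data.parquet')  # target file name

print(data_dir, sql, output_data_parquet)
```

Because each variable has its own env-var name, defining "data_dir" twice silently shadows the first value and leaves "sql" undefined, which is exactly the failure described above.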
@romeokienzler week 4 assignment of the Advanced Machine Learning and Signal Processing course on Coursera needs some updates. Many students and I are stuck on it and can't validate the course because of it. It seems that the provided SystemML version is not compatible with Python 3.9 and only works up to Python 3.8, which for some reason can't be selected in Watson Studio, as it always says "Notebook environments with Python 3.8 are restricted to notebooks that are already using them. Choose an environment with Python 3.9 instead."
Here are the available versions (the notebook doesn't work with the option Default Python 3.8 + Watson NLP (beta)):
The error occurs in cell 6 when calling MLContext(spark). This is the error: TypeError: 'JavaPackage' object is not callable
Thank you.
Describe the issue
The new C3 doesn't need inline Docker code
To Reproduce
Open a notebook and search for 'docker': it should not find anything
I tried to create a run chart using two lists, but was held back because the sample data returned by the second query is not consistent:

```python
result = spark.sql("select temperature from washing where temperature is not null")
print(result.count())
result_array = result.rdd.map(lambda row: row.temperature).sample(False, 0.1).collect()
len(result_array)
```

This printed as expected:
1342
134

```python
result_ts = spark.sql("select ts from washing where temperature is not null")
print(result_ts)
result_array_ts = result_ts.rdd.map(lambda row: row.ts).sample(False, 0.1).collect()
len(result_array_ts)
```

I was able to verify result_ts has a count of 1342, but the sample is not giving 10% of the data: result_array_ts returned 118 records.

```python
plt.plot(result_array_ts, result_array)
plt.xlabel("time")
plt.ylabel("temperature")
plt.show()
```

This failed to create the run chart with the error:
"ValueError: x and y must have same first dimension, but have shapes (118,) and (134,)"
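The mismatch happens because sample() draws an independent random subset on each call, so two separately sampled columns cannot stay paired (or even equal in length). The fix is to sample the paired rows once; a minimal sketch without Spark, with the washing table simulated by a plain list:

```python
import random

# simulated (ts, temperature) rows standing in for the washing table
rows = [(ts, 20.0 + (ts % 5)) for ts in range(1342)]

# sample the PAIRED rows once, instead of sampling each column separately
random.seed(42)
sampled = [row for row in rows if random.random() < 0.1]
result_array_ts = [ts for ts, _ in sampled]
result_array = [temp for _, temp in sampled]

# both arrays now have the same length, so plotting one against the other works
print(len(result_array_ts), len(result_array))
```

In Spark the same idea applies: select both columns in one query, call `.sample(False, 0.1)` once on that RDD, and only then split into the two lists.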
Found this error (due to branch protection)
Describe the issue
The components and also the component compiler are evolving over time. This means not all components (e.g., notebooks) stick to the latest standards; therefore, the CLAIMED component compiler fails in the automated build for some of the components.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
All components build successfully
Is your feature request related to a problem? Please describe.
The image tiling problem is part of nearly every computer vision pipeline. This component should read arbitrary image formats and create the desired tiles (also using sliding and tumbling windows, with configurable stride size).
For geospatial image formats like GeoTIFF/COG, the tile's metadata (e.g., coordinates) should be kept and adjusted in the tiles.
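The sliding/tumbling-window tiling described above can be sketched in pure Python (no image library; tile size and stride values are illustrative, and a real component would operate on pixel arrays):

```python
def tile(image, tile_h, tile_w, stride_h, stride_w):
    """Yield (row, col, tile) for sliding-window tiles of a 2D list.

    stride equal to the tile size gives tumbling (non-overlapping) windows;
    a smaller stride gives overlapping sliding windows.
    """
    h, w = len(image), len(image[0])
    for r in range(0, h - tile_h + 1, stride_h):
        for c in range(0, w - tile_w + 1, stride_w):
            yield r, c, [row[c:c + tile_w] for row in image[r:r + tile_h]]

# 4x4 toy image, 2x2 tumbling tiles -> 4 tiles
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = list(tile(img, 2, 2, 2, 2))
print(len(tiles))  # 4
```

Yielding the (row, col) offset alongside each tile is what later makes the geospatial metadata adjustment possible.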
Describe the issue
Currently, pushing the components to the container registries doesn't work automatically
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Repository and container registry are in sync
When creating an account using the given link, it still requires credit card information.
First of all, you do need credit card information to register an account.
Also, the credit card verification doesn't work and I can't register the account; I tried 3 different cards.
I sent a message to support, but they keep sending replies to my IBM account, where I can't log in, and therefore I can't read the reply.
So please let me know how I can register without credit card information, or when you are going to resolve the credit card verification issue.
After using the CLAIMED library for practice purposes in a Coursera course, I got the error "NameError: name 'logging' is not defined", which I solved by importing that module in "upload-to-cos.ipynb".
By the way thank you guys for the "Data Engineering and Machine Learning using Spark" course ;)
Learning Github is a dream come true
The installation is successful, but when trying to execute the code I get the following error:
Java gateway process exited before sending the driver its port number
Describe the issue
Remove unnecessary files and folders and move them to a branch
Is your feature request related to a problem? Please describe.
When using the image tiling operator on GeoTIFF or Cloud Optimized GeoTIFF, the tiled images no longer contain any geospatial metadata
Describe the solution you'd like
Copy all metadata over to the tiles and adjust the relevant information (e.g., lon/lat coordinates must be re-computed)
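Re-computing a tile's georeference is mostly a shift of the affine geotransform by the tile's pixel offset. A hedged pure-Python sketch (the six-tuple ordering and the sample values are illustrative, not a specific library's API):

```python
def tile_transform(transform, col_off, row_off):
    """Shift an affine geotransform to a tile's origin.

    transform is (a, b, c, d, e, f) with
        x = a*col + b*row + c
        y = d*col + e*row + f
    so the tile's transform keeps the pixel scale/rotation terms and only
    moves the translation terms (c, f) to the tile's top-left corner.
    """
    a, b, c, d, e, f = transform
    return (a, b, c + a * col_off + b * row_off,
            d, e, f + d * col_off + e * row_off)

# 10 m pixels, origin at (500000, 4649776); tile starting at col 256, row 256
src = (10.0, 0.0, 500000.0, 0.0, -10.0, 4649776.0)
print(tile_transform(src, 256, 256))
# -> (10.0, 0.0, 502560.0, 0.0, -10.0, 4647216.0)
```

All other metadata (CRS, band info, nodata value) can be copied to the tile unchanged; only the transform and the width/height need adjusting.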
I'm having trouble accessing the dataset for the exercise. I executed the command "spark.catalog.listTables()", but it is empty. How can I import the data to complete the exercise?
Hello, I encountered the same problem as in the following post. There is no option for me to create a notebook with Spark 2.1 on IBM Watson. What is the workaround for it?
https://www.coursera.org/learn/advanced-machine-learning-signal-processing/discussions/weeks/4/threads/gL2TijzcEemtzA4VZFkW2g
Notebook: https://github.com/IBM/claimed/blob/master/coursera_ds/assignment1.2.ipynb (and later assignments in the coursera_ds directory) uses !pip install pyspark==2.4.5
under the assumption that the user set up an environment with Python 3.6 (https://github.com/IBM/claimed/wiki/Watson-Studio-Setup). Python 3.6 is not available... the only Python environment that I'm able to create is 3.9, which is incompatible with 2.4.5. It's not clear whether I'm okay commenting out the version on pyspark and using the current version (3.3). Given I don't know pyspark (hence, why I'm taking the coursera class), I don't know if things down the road will break by switching to 3.3. At any rate, the notebook should not force the installation of an outdated pyspark version if such a version is not compatible with the Python environments that are available.
Hi, the questionnaires for weeks two and three are locked. Please, I need someone to unlock them so I can continue with the course.
I have no pending tasks from the first week, and I have an automatic payment set up for a deep learning course.
Regards!
@romeokienzler pip installing this package isn't leading to a SystemML version upgrade in Watson Studio.
https://github.com/IBM/coursera/blob/master/systemml-1.3.0-SNAPSHOT-python.tar.gz
Please refer to the following thread on coursera
Originally posted by @Scottyoun in #180
The CLAIMED CLI allows all CLAIMED components to be used from the command line in the form:
claimed <component_name:version> <parameters, ...>
claimed claimed-util-cos:0.3 access_key_id=xxx secret_access_key=xxx endpoint=https://s3.us-east.cloud-object-storage.appdomain.cloud bucket_name=era5-cropscape-zarr path=/ recursive=True operation=ls
This is implemented as an exemplar for the claimed-util-cos component but is not yet generic.
Making it generic (so the CLI can work with any CLAIMED component and expose it automatically) will allow any CLAIMED component to be used from the terminal and in shell scripts.
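A generic CLI would first have to parse the `<component_name:version> <parameters, ...>` invocation shown above. A minimal sketch of that parsing step (component resolution and execution are omitted; the function name is illustrative):

```python
def parse_invocation(argv):
    """Split ['<name:version>', 'k=v', ...] into (component, version, params)."""
    component, _, version = argv[0].partition(":")
    params = dict(arg.split("=", 1) for arg in argv[1:])
    return component, version, params

component, version, params = parse_invocation(
    ["claimed-util-cos:0.3", "operation=ls", "path=/"])
print(component, version, params)  # claimed-util-cos 0.3 {'operation': 'ls', 'path': '/'}
```

The parsed key=value pairs map naturally onto the environment-variable parameters that CLAIMED components already read, which is what would make the dispatcher component-agnostic.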
Hi,
I have a problem downloading the file into my local Jupyter notebook using PySpark. Can you please help me?
Thanks,
I didn't understand the test in this part (###).
Is the logic a field I should fill in, or is the code generated on the Coursera site?
https://colab.research.google.com/drive/1XmH5pgmLDKqRNcl-4VkaK6wHDv5czwmM
Please help
"The resource group xxxxxx is inactive with state: suspended. Open a support case and include the resource group name in the case details."
How to use it specifically?
Notebook: https://github.com/IBM/coursera/blob/master/coursera_ds/assignment2.1_spark2.3_python3.6.ipynb
The link: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame is dead.
Possibly the intent is to point to here: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/dataframe.html.