claimed-framework / component-library
The goal of CLAIMED is to enable low-code/no-code rapid prototyping style programming to seamlessly CI/CD into production.
License: Apache License 2.0
I wanted to push an update for the FAQ in the Wiki section. I just cleaned up and added a little bit to the section about Lambda notation. Below is the updated replacement.
# Syntax error: lambda notation in Python 2.7 vs 3.x
<style type="text/css">
.container {
display: inline-grid;
align-items: center;
justify-content: center;
}
.cap-map {
text-align: center;
font-size: 24px;
text-decoration: none;
vertical-align: middle;
}
.header-cell {
text-align: center;
font-weight: bold;
vertical-align: middle;
}
.center-cell {
text-align: center;
vertical-align: middle;
}
.function-cell {
text-align: left;
font-family: monospace;
white-space: pre;
vertical-align: middle;
}
</style>
<div class="container">
<table>
<caption class="cap-map">Mapping RDDs with Single Values</caption>
<thead>
<tr>
<th class="header-cell">Python Version</th>
<th class="header-cell">Solution #</th>
<th class="header-cell">Lambda Function</th>
</tr>
</thead>
<tbody>
<tr>
<td class="center-cell">2.7</td>
<td class="center-cell">1</td>
<td class="function-cell">result_array_ts = result_rdd.map(lambda (ts, voltage): ts).collect()</td>
</tr>
<tr>
<td class="center-cell">3.x</td>
<td class="center-cell">1</td>
<td class="function-cell">result_array_ts = result_rdd.map(lambda ts_voltage: ts_voltage[0]).collect()</td>
</tr>
<tr>
<td class="center-cell">3.x</td>
<td class="center-cell">2</td>
<td class="function-cell">result_array_voltage = result_rdd.map(lambda ts_voltage: ts_voltage[1]).collect()</td>
</tr>
</tbody>
</table>
<br />
<table>
<caption class="cap-map">Mapping RDDs with Multiple Values</caption>
<thead>
<tr>
<th class="header-cell">Python Version</th>
<th class="header-cell">Lambda Function</th>
</tr>
</thead>
<tbody>
<tr>
<td class="center-cell">2.7</td>
<td class="function-cell">rdd.map(lambda (x, y): pow(x * y, 2)).sum()</td>
</tr>
<tr>
<td class="center-cell">3.x</td>
<td class="function-cell">rdd.map(lambda xy: pow(xy[0] * xy[1], 2)).sum()</td>
</tr>
</tbody>
</table>
</div>
The less-fancy Markdown versions are:

**Mapping RDDs with Single Values**

| Python Version | Solution # | Lambda Function |
|---|---|---|
| 2.7 | 1 | `result_array_ts = result_rdd.map(lambda (ts, voltage): ts).collect()` |
| 3.x | 1 | `result_array_ts = result_rdd.map(lambda ts_voltage: ts_voltage[0]).collect()` |
| 3.x | 2 | `result_array_voltage = result_rdd.map(lambda ts_voltage: ts_voltage[1]).collect()` |

**Mapping RDDs with Multiple Values**

| Python Version | Lambda Function |
|---|---|
| 2.7 | `rdd.map(lambda (x, y): pow(x * y, 2)).sum()` |
| 3.x | `rdd.map(lambda xy: pow(xy[0] * xy[1], 2)).sum()` |
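Python 3 removed tuple parameter unpacking (PEP 3113), which is why the 2.7-style lambdas fail with a syntax error. A minimal sketch of the 3.x pattern without Spark, using a plain list as a stand-in for an RDD of tuples:

```python
# Python 3 removed tuple parameter unpacking (PEP 3113), so
#   lambda (x, y): ...
# is a syntax error. Index into the tuple instead.
pairs = [(1, 2), (3, 4)]  # stand-in for an RDD of (x, y) tuples

# Python 2.7 style (no longer valid):  map(lambda (x, y): pow(x * y, 2), pairs)
result = sum(map(lambda xy: pow(xy[0] * xy[1], 2), pairs))
print(result)  # (1*2)^2 + (3*4)^2 = 4 + 144 = 148
```

The same indexing works unchanged inside `rdd.map(...)` on a real RDD.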
The idea of "write once, re-use anywhere" is good.
I think it would be good to remove the dependency on Jupyter and only allow:
- plain Python code
- CLI
to ensure more portability.
I have a similar idea:
pip install utilmy
A collection of one-liners for common tasks:
df = pd_read_file(...)  # from anywhere (S3, local, Hadoop, ...), any format
At first I intended to install Spark on my local computer, so I ran the code provided in spark_setup_anaconda.ipynb. However, I got "IndexError: list index out of range":

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
in
      8 os.environ["SPARK_HOME"] = "/home/dsxuser/work/spark-2.3.4-bin-hadoop2.7"
      9 import findspark
---> 10 findspark.init()
     11 from pyspark import SparkContext, SparkConf
     12 from pyspark.sql import SQLContext, SparkSession
~\Anaconda3\lib\site-packages\findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    133     # add pyspark to sys.path
    134     spark_python = os.path.join(spark_home, 'python')
--> 135     py4j = glob(os.path.join(spark_python, 'lib', 'py4j-*.zip'))[0]
    136     sys.path[:0] = [spark_python, py4j]
    137
IndexError: list index out of range
```
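The traceback means findspark found no `py4j-*.zip` under `$SPARK_HOME/python/lib`, which usually happens when SPARK_HOME points at a directory where Spark was never actually extracted. A small diagnostic sketch (the path is the one from the traceback, adjust to your machine):

```python
import glob
import os

def check_spark_home(spark_home):
    """Return the py4j zips findspark would find, or [] if SPARK_HOME is wrong."""
    return glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))

# path taken from the traceback above; change it if you extracted Spark elsewhere
spark_home = "/home/dsxuser/work/spark-2.3.4-bin-hadoop2.7"
if not check_spark_home(spark_home):
    print("No py4j zip under", spark_home,
          "- findspark.init() will raise IndexError here")
```

If the list is empty, re-check that the Spark tarball was downloaded and extracted to that exact directory before calling `findspark.init()`.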
@romeokienzler I am unable to select the Free environment anymore on Watson Studio for notebooks. I get the error message "Watson Studio no longer offers free environments. Select another environment." when I try to open my previous notebooks created in the free Python instance.
You can get rid of the 2nd map if you use square brackets instead of parentheses in the first map.
data = column1.zip(column2).zip(column3).zip(column4).map(lambda a_b_c_d: [a_b_c_d[0][0][0], a_b_c_d[0][0][1], a_b_c_d[0][1], a_b_c_d[1]])
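Without Spark, the same nested-zip flattening can be sketched with plain lists (column names and values are illustrative). Chained zips nest to the left, so the lambda has to dig out each element by index:

```python
column1, column2, column3, column4 = [1, 2], [3, 4], [5, 6], [7, 8]

# chained zips nest left: (((a, b), c), d)
nested = zip(zip(zip(column1, column2), column3), column4)
data = list(map(lambda t: [t[0][0][0], t[0][0][1], t[0][1], t[1]], nested))
print(data)  # [[1, 3, 5, 7], [2, 4, 6, 8]]
```

Returning a list (square brackets) instead of a tuple per row is what makes the second map unnecessary.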
Describe the issue
Components are added at a faster pace than the corresponding automated tests. We are in the process of accepting new components only when they come with tests, but there is still a gap to be filled.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
100% test coverage at the statement coverage level
The Python 3.5 environment with 1 vCPU and 4 GB RAM isn't an option anymore.
Either I am doing something wrong or maybe the setup tutorial needs to be updated.
Regards,
Arjun
New components are always welcome. Feel free to create a feature request describing what you are planning to deliver. If you feel confident, you can also just contribute new components via a PR.
Hello, I cannot open the IBM Cloud setup link. Could you correct it?
On the Watson Studio Setup wiki https://github.com/IBM/coursera/wiki/Watson-Studio-Setup,
dataplatform.cloud.ibm.com links to https://github.com/IBM/coursera/wiki/dataplatform.cloud.ibm.com
I watched the video, and I am having a hard time adding a new notebook to submit my assignment.
I have enrolled in the Advanced Data Science with IBM specialization and partially completed weeks 1 through 4. https://www.coursera.org/specializations/advanced-data-science-ibm
Please note, I am new to coding, with only minimal knowledge of Python based on my course on www.cognitiveclass.ai.
In weeks 2 and 3, when trying to complete the programming assignments on Spark and set up the environment, I got stuck. Unfortunately, the guidelines video hosted here: is also not available / working.
I also posted this query on the Coursera discussion forum last week and am awaiting advice.
Ref: https://www.coursera.org/learn/ds/discussions/weeks/2/threads/Y40fZ4hDEemB1RILjckZRA
However, I am worried that I might lose time in this process.
In fact, I need advice on how and where to start the environment and continue.
Could you please guide and advise me so I can successfully complete this course?
mlkil
I'm currently working through IBM's coursera notebooks, and there appear to be some errors in the .ipynb's for certain transformations. Specifically:
"claimed/component-library/transform/spark-csv-to-parquet.ipynb": the destination path and parquet filename are stored in a variable "output_data_parquet" (third code cell). In code cell 5, data_dir + data_parquet fails to run because data_parquet is not defined. I think this should be output_data_parquet, as appears in the eighth code cell.
"claimed/component-library/transform/spark-sql.ipynb": in cell 4, where the environment variables are defined, "data_dir" is defined twice. The first occurrence appears to be correct based on the comment. The second occurrence appears to be incorrect, as the comment suggests it should be a SQL query. As a result, in cell 7, the variable "sql" is not defined. I think the second occurrence of data_dir should really be a line along the lines of: sql = os.environ.get('sql_query', 'select * from df')
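For reference, the env-var pattern these notebook cells follow can be sketched as below (the default values here are assumptions for illustration, not necessarily the notebook's actual defaults):

```python
import os

# configuration via environment variables, with fallbacks when unset
data_dir = os.environ.get('data_dir', '../../data/')          # input/output directory
sql = os.environ.get('sql_query', 'select * from df')         # query to run against the temp view
output_data_parquet = os.environ.get('output_data_parquet', 'data.parquet')  # target file name

print(data_dir, sql, output_data_parquet)
```

Because each variable has its own env-var name, defining "data_dir" twice silently shadows the first value and leaves "sql" undefined, which is exactly the failure described above.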
@romeokienzler week 4 assignment of the Advanced Machine Learning and Signal Processing course on Coursera needs some updates. Many students and I are stuck on it and can't validate the course because of it. It seems that the provided SystemML version is not compatible with Python 3.9 and only works up to Python 3.8, which for some reason can't be selected in Watson Studio, as it always says "Notebook environments with Python 3.8 are restricted to notebooks that are already using them. Choose an environment with Python 3.9 instead."
Here are the available versions (the notebook doesn't work with the option Default Python 3.8 + Watson NLP (beta)):
The error occurs in cell 6 when calling MLContext(spark). This is the error: TypeError: 'JavaPackage' object is not callable
Thank you.
Describe the issue
The new C3 doesn't need inline Docker code
To Reproduce
Open a notebook and search for 'docker': it should not find anything
I tried to create a run chart using two lists, but was held back because the sample data returned by the second query is not consistent:

```python
result = spark.sql("select temperature from washing where temperature is not null")
print(result.count())
result_array = result.rdd.map(lambda row: row.temperature).sample(False, 0.1).collect()
len(result_array)
```

This printed as expected:
1342
134

```python
result_ts = spark.sql("select ts from washing where temperature is not null")
print(result_ts)
result_array_ts = result_ts.rdd.map(lambda row: row.ts).sample(False, 0.1).collect()
len(result_array_ts)
```

I was able to verify result_ts has a count of 1342, but the sample is not giving 10% of the data: result_array_ts returned 118 records.

```python
plt.plot(result_array_ts, result_array)
plt.xlabel("time")
plt.ylabel("temperature")
plt.show()
```

This failed to create the run chart with the error:
"ValueError: x and y must have same first dimension, but have shapes (118,) and (134,)"
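The mismatch happens because sample() draws an independent random subset on each call, so two separately sampled columns cannot stay paired (or even equal in length). The fix is to sample the paired rows once; a minimal sketch without Spark, with the washing table simulated by a plain list:

```python
import random

# simulated (ts, temperature) rows standing in for the washing table
rows = [(ts, 20.0 + (ts % 5)) for ts in range(1342)]

# sample the PAIRED rows once, instead of sampling each column separately
random.seed(42)
sampled = [row for row in rows if random.random() < 0.1]
result_array_ts = [ts for ts, _ in sampled]
result_array = [temp for _, temp in sampled]

# both arrays now have the same length, so plotting one against the other works
print(len(result_array_ts), len(result_array))
```

In Spark the same idea applies: select both columns in one query, call `.sample(False, 0.1)` once on that RDD, and only then split into the two lists.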
Found this error (due to branch protection)
Describe the issue
The components and also the component compiler are evolving over time. This means not all components (e.g., notebooks) stick to the latest standards; therefore, the CLAIMED component compiler fails in the automated build for some of the components.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
All components build successfully
Is your feature request related to a problem? Please describe.
The image tiling problem is part of nearly every computer vision pipeline. This component should read arbitrary image formats and create the desired tiles (also using sliding and tumbling windows, with configurable stride size).
For geospatial image formats like GeoTIFF/COG, the tile's metadata (e.g., coordinates) should be kept and adjusted in the tiles.
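The sliding/tumbling-window tiling described above can be sketched in pure Python (no image library; tile size and stride values are illustrative, and a real component would operate on pixel arrays):

```python
def tile(image, tile_h, tile_w, stride_h, stride_w):
    """Yield (row, col, tile) for sliding-window tiles of a 2D list.

    stride equal to the tile size gives tumbling (non-overlapping) windows;
    a smaller stride gives overlapping sliding windows.
    """
    h, w = len(image), len(image[0])
    for r in range(0, h - tile_h + 1, stride_h):
        for c in range(0, w - tile_w + 1, stride_w):
            yield r, c, [row[c:c + tile_w] for row in image[r:r + tile_h]]

# 4x4 toy image, 2x2 tumbling tiles -> 4 tiles
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = list(tile(img, 2, 2, 2, 2))
print(len(tiles))  # 4
```

Yielding the (row, col) offset alongside each tile is what later makes the geospatial metadata adjustment possible.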
Describe the issue
Currently, pushing the components to the container registries doesn't work automatically
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Repository and container registry are in sync
When creating an account using the given link, it still requires credit card information.
First of all, you do need credit card information to register an account.
Also, the credit card verification doesn't work and I can't register the account; I tried 3 different cards.
I sent a message to support, but they keep sending replies to my IBM account, where I can't log in, and therefore I can't read the reply.
So please let me know how I can register without credit card information, or when you are going to resolve the credit card verification issue.
After using the CLAIMED library for practice purposes in a Coursera course, I got the error "NameError: name 'logging' is not defined", which I solved by importing that module in "upload-to-cos.ipynb".
By the way thank you guys for the "Data Engineering and Machine Learning using Spark" course ;)
Learning Github is a dream come true
The installation is successful, but when trying to execute the code I get the following error:
Java gateway process exited before sending the driver its port number
Describe the issue
Remove unnecessary files and folders and move them to a branch
Is your feature request related to a problem? Please describe.
When using the image tiling operator on GeoTIFF or Cloud Optimized GeoTIFF, the tiled images no longer contain any geospatial metadata
Describe the solution you'd like
Copy all metadata over to the tiles and adjust the relevant information (e.g., lon/lat coordinates must be re-computed)
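Re-computing a tile's georeference is mostly a shift of the affine geotransform by the tile's pixel offset. A hedged pure-Python sketch (the six-tuple ordering and the sample values are illustrative, not a specific library's API):

```python
def tile_transform(transform, col_off, row_off):
    """Shift an affine geotransform to a tile's origin.

    transform is (a, b, c, d, e, f) with
        x = a*col + b*row + c
        y = d*col + e*row + f
    so the tile's transform keeps the pixel scale/rotation terms and only
    moves the translation terms (c, f) to the tile's top-left corner.
    """
    a, b, c, d, e, f = transform
    return (a, b, c + a * col_off + b * row_off,
            d, e, f + d * col_off + e * row_off)

# 10 m pixels, origin at (500000, 4649776); tile starting at col 256, row 256
src = (10.0, 0.0, 500000.0, 0.0, -10.0, 4649776.0)
print(tile_transform(src, 256, 256))
# -> (10.0, 0.0, 502560.0, 0.0, -10.0, 4647216.0)
```

All other metadata (CRS, band info, nodata value) can be copied to the tile unchanged; only the transform and the width/height need adjusting.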
I'm having trouble accessing the dataset for the exercise. I executed the command "spark.catalog.listTables()", but it is empty. How can I import the data to complete the exercise?
Hello, I encountered the same problem as in the following post. There is no option for me to create a notebook with Spark 2.1 on IBM Watson. What is the workaround for it?
https://www.coursera.org/learn/advanced-machine-learning-signal-processing/discussions/weeks/4/threads/gL2TijzcEemtzA4VZFkW2g
Notebook: https://github.com/IBM/claimed/blob/master/coursera_ds/assignment1.2.ipynb (and later assignments in the coursera_ds directory) uses !pip install pyspark==2.4.5
under the assumption that the user set up an environment with Python 3.6 (https://github.com/IBM/claimed/wiki/Watson-Studio-Setup). Python 3.6 is not available... the only Python environment that I'm able to create is 3.9, which is incompatible with 2.4.5. It's not clear whether I'm okay commenting out the version on pyspark and using the current version (3.3). Given I don't know pyspark (hence, why I'm taking the coursera class), I don't know if things down the road will break by switching to 3.3. At any rate, the notebook should not force the installation of an outdated pyspark version if such a version is not compatible with the Python environments that are available.
Hi, the questionnaires for weeks two and three are locked. Please, I need someone to unlock them so I can continue with the course.
I have no pending tasks from the first week, and I have an automatic payment set up for a deep learning course.
Regards!
@romeokienzler pip installing this package isn't leading to a SystemML version upgrade in Watson Studio.
https://github.com/IBM/coursera/blob/master/systemml-1.3.0-SNAPSHOT-python.tar.gz
Please refer to the following thread on coursera
Originally posted by @Scottyoun in #180
The CLAIMED CLI allows all CLAIMED components to be used from the command line in the form:
claimed <component_name:version> <parameters, ...>
claimed claimed-util-cos:0.3 access_key_id=xxx secret_access_key=xxx endpoint=https://s3.us-east.cloud-object-storage.appdomain.cloud bucket_name=era5-cropscape-zarr path=/ recursive=True operation=ls
This is implemented as an exemplar for the claimed-util-cos component but is not yet generic.
Making it generic (so the CLI can work with any CLAIMED component and expose it automatically) will allow any CLAIMED component to be used from the terminal and in shell scripts.
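A generic CLI would first have to parse the `<component_name:version> <parameters, ...>` invocation shown above. A minimal sketch of that parsing step (component resolution and execution are omitted; the function name is illustrative):

```python
def parse_invocation(argv):
    """Split ['<name:version>', 'k=v', ...] into (component, version, params)."""
    component, _, version = argv[0].partition(":")
    params = dict(arg.split("=", 1) for arg in argv[1:])
    return component, version, params

component, version, params = parse_invocation(
    ["claimed-util-cos:0.3", "operation=ls", "path=/"])
print(component, version, params)  # claimed-util-cos 0.3 {'operation': 'ls', 'path': '/'}
```

The parsed key=value pairs map naturally onto the environment-variable parameters that CLAIMED components already read, which is what would make the dispatcher component-agnostic.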
Hi,
I have a problem downloading the file into my local Jupyter notebook using PySpark. Can you please help me?
Thanks,
I didn't understand the test in this part (###).
Is the logic a field I should fill in, or is the code generated on the Coursera site?
https://colab.research.google.com/drive/1XmH5pgmLDKqRNcl-4VkaK6wHDv5czwmM
Please help
"The resource group xxxxxx is inactive with state: suspended. Open a support case and include the resource group name in the case details."
How to use it specifically?
Notebook: https://github.com/IBM/coursera/blob/master/coursera_ds/assignment2.1_spark2.3_python3.6.ipynb
The link: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame is dead.
Possibly the intent is to point to here: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/dataframe.html.