Comments (7)
Same question.
It seems SparseLR in the project also fetches all feature values when it performs a getRow op.
I wonder how many rows PSModel supports?
How about implementing it with one row per feature, like:
val model = PSModel[DenseDoubleVector](FTRL_WEIGHT, feaNum, 1)
and
val featureValues = model.getRows(indexArr)
Will it be efficient?
Also, what is the difference between rows and columns in the matrix storage? What is their capacity?
from angel.
@piaoxijun Great idea. Using multiple rows to fetch parameters flexibly sounds reasonable if only row operators are supported.
from angel.
@facaiy Thanks for your issue.
- Dense data:
For gradient descent LR, we use DenseDoubleVector to represent the data and the model. Each vector is a double array, so the LR model is just one long double array. Even a super-large model doesn't cost much memory, so dense LR can support super-large-dimension data.
Besides, in practice, many elements of the LR model are nonzero. We have implemented a DenseFloat data type; you can change the LR data type to DenseFloat to save more memory.
- Sparse data:
For the sparse data type, we use a map to store keys and values, so it costs more memory when the data and model are not very sparse, and it costs even more time to compute.
- Optimizer:
For LR, we implemented 2 optimizers: gradient descent (as you see) and the ADMM optimizer: SparseLogisticRegression, line 65
val (history, z) = ADMM.runADMM(train, ...
In the ADMM optimizer, we use LBFGS. Note that LBFGS is not a memory-friendly optimization algorithm.
We also implemented LR on Spark on Angel, with gradient descent, OWLQN, LBFGS and ADMM.
- LR and SparseLR:
In LR, we use the dense data type and gradient descent; in SparseLR, we use the sparse data type and ADMM. The separation between small and huge models is just a matter of implementation. We have implemented DenseDouble, DenseInt, DenseFloat, SparseDouble, SparseFloat, ... data types, so you can choose your data type and operation type according to your data. You can edit LR's code to change the data type to sparse.
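To make the dense versus sparse trade-off above concrete, here is a rough back-of-the-envelope sketch in plain Scala. It does not use any Angel API. The object name MemoryEstimate and the 24 bytes per map entry are assumptions chosen for illustration, not measured figures from Angel; the real overhead depends on the map implementation.

```scala
object MemoryEstimate {
  // Bytes for a dense double array: 8 bytes per element (array header ignored).
  def denseBytes(dim: Long): Long = dim * 8L

  // Rough bytes for a map-backed sparse vector.
  // bytesPerEntry is an ASSUMPTION (key + value + hash-table overhead),
  // not a figure measured on Angel.
  def sparseBytes(nonZeros: Long, bytesPerEntry: Long = 24L): Long =
    nonZeros * bytesPerEntry

  // Fraction of nonzero elements above which the dense layout is smaller
  // than the map-backed one, under the same per-entry assumption.
  def breakEvenDensity(bytesPerEntry: Long = 24L): Double =
    8.0 / bytesPerEntry
}
```

Under these assumed numbers, a model with more than about one third of its elements nonzero would be smaller in the dense layout, which matches the point that a map costs more memory when the model is not very sparse.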
from angel.
- You can define your PSModel with the sparse double data type to pull only nonzero elements.
- You can use psFunc to customize which elements you pull.
- About @piaoxijun's idea: it's a great idea, but not efficient. Because we store the matrix by rows, defining too many rows in a PSModel will cost a huge amount of memory.
from angel.
@TAAAN Thanks for your quick reply.
- Perhaps you misunderstood my question. I agree that the parameters in LR might be dense, while the training data is not. In practice, high-dimensional data is always very sparse. Hence, for the nonzero entries of the data, it is more efficient to fetch only the corresponding entries (about 10~15% or less of the total) of lrModel.weight, and then calculate the gradient and update the parameters.
- I agree that multiple rows might not be efficient, which depends on the partition strategy. If, by default, a PS parameter is always partitioned by column, all rows of a 1-dim vector will be stored on a single machine. That is the worst case.
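The worst case described above can be illustrated with a small sketch. It assumes a hypothetical column-range partitioner (contiguous, near-equal column ranges, one per server); this is only the "if by default" scenario from the comment, not a description of how Angel actually partitions matrices. The names ColumnPartitionSketch, serverFor and serversUsed are made up for the example.

```scala
object ColumnPartitionSketch {
  // HYPOTHETICAL layout: columns are split into contiguous, near-equal ranges,
  // one range per server; every row of a given column lives on the same server.
  def serverFor(col: Long, numCols: Long, numServers: Int): Int = {
    require(numCols > 0 && numServers > 0 && col >= 0 && col < numCols)
    val colsPerServer = (numCols + numServers - 1) / numServers // ceiling division
    (col / colsPerServer).toInt
  }

  // Number of distinct servers that hold any part of a matrix with numCols
  // columns. The row count does not matter under this layout.
  // Intended for small numCols, since it enumerates every column.
  def serversUsed(numCols: Long, numServers: Int): Int =
    (0L until numCols).map(serverFor(_, numCols, numServers)).distinct.size
}
```

Under this assumption, a model declared as feaNum rows by 1 column gives serversUsed(1, n) == 1 for any n, so every row would sit on one server regardless of how many servers are available.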
from angel.
- For the model, you can set it to dense or sparse.
- For the training data, it's sparse.
val batchGD = GradientDescent.miniBatchGD(trainData, lrModel.weight, lr, logLoss,
batchSize, batchNum)
The feature size of LabeledData on the worker (TrainData) has no direct relationship with the model on the PSServer.
from angel.
@TAAAN Thanks for the detailed explanation. However, my point is not really about the model or the data, but about the cost of communication between worker and PS. Perhaps we could discuss this further in the future. Anyway, thank you all the same, TAAAN.
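The communication-cost argument can be sketched in plain Scala, without any Angel API. The sketch collects the distinct feature indices a mini-batch touches and compares the bytes needed to pull the full weight vector with the bytes needed to pull only those entries. The names PullCostSketch, SparseSample and pullEntries are hypothetical, and the 4-byte index plus 8-byte value per entry is an assumed wire format that ignores message headers.

```scala
object PullCostSketch {
  // A sparse training sample: parallel arrays of feature indices and values.
  final case class SparseSample(indices: Array[Int], values: Array[Double])

  // Distinct feature indices that a mini-batch actually touches, sorted.
  def neededIndices(batch: Seq[SparseSample]): Array[Int] =
    batch.flatMap(_.indices).distinct.sorted.toArray

  // Bytes to pull the whole dense weight vector (8 bytes per double).
  def fullPullBytes(dim: Long): Long = dim * 8L

  // Bytes to pull only the needed entries. ASSUMPTION: each entry costs a
  // 4-byte index in the request plus an 8-byte value in the response.
  def sparsePullBytes(numIndices: Long): Long = numIndices * (4L + 8L)

  // Local stand-in for a server-side lookup of the requested entries.
  def pullEntries(weights: Array[Double], idx: Array[Int]): Array[Double] =
    idx.map(weights(_))
}
```

With these assumed sizes, the index-based pull is cheaper whenever the batch touches fewer than two thirds of the features, so at the 10~15% figure mentioned earlier in the thread it would move far fewer bytes than a full getRow.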
from angel.