rapidsai / cudf
cuDF - GPU DataFrame Library
Home Page: https://docs.rapids.ai/api/cudf/stable/
License: Apache License 2.0
Getting the following error when I try to import pygdf. I would appreciate any and all guidance!
error: symbol 'gdf_left_join_generic' not found in library 'libgdf.so': /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/../../libgdf.so: undefined symbol: gdf_left_join_generic
Would it be possible to add pymapd to notebook_py35.yml? It would remove the extra step of installing pymapd after creating the pygdf environment.
Implement support for filter(), using Numba to JIT the implementation on the fly.
So that dask_gdf doesn't need to use the internal API: https://github.com/gpuopenanalytics/dask_gdf/pull/5/files#diff-60c51fb7fec020591436a7b48b60823cR246
I am wondering if it would be faster and more maintainable to handle serialization, data movement, and other interactions between this system and other platforms (e.g. MapD) in a C++ library that is not Python specific. This would add some complexity to the build system for this project, but that seems inevitable given the nature of the problem (that you'll end up needing to use nvcc at some point to create some support libraries).
Currently, you can only join on a single column by setting that column as the index on both dataframes.
Creating this issue to open discussion and track progress for the implementation of multi-column joins.
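For context, a minimal sketch of the single-column workaround described above; the frames and column names are made up, and the set_index/join pattern follows the other examples in these issues:
import pandas as pd
from pygdf.dataframe import DataFrame
left = DataFrame.from_pandas(pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]}))
right = DataFrame.from_pandas(pd.DataFrame({"key": [2, 3, 4], "b": [200, 300, 400]}))
# Today the join key must be the index on both frames; there is no way to
# pass a second join column.
joined = left.set_index("key").join(right.set_index("key"))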
It appears that instead of casting to a common type, the bytes of other in a binary operation (e.g. __add__(self, other)) are viewed in the dtype of self:
In [54]: df = pd.DataFrame({'x': range(10), 'y': list(map(float, range(10)))})
In [55]: gdf = gd.DataFrame.from_pandas(df)
In [56]: gdf
Out[56]:
x y
0 0 0.0
1 1 1.0
2 2 2.0
3 3 3.0
4 4 4.0
5 5 5.0
6 6 6.0
7 7 7.0
8 8 8.0
9 9 9.0
In [57]: gdf.x + gdf.y # Int + Float
Out[57]:
0 0
1 4607182418800017409
2 4611686018427387906
3 4613937818241073155
4 4616189618054758404
5 4617315517961601029
6 4618441417868443654
7 4619567317775286279
8 4620693217682128904
9 4621256167635550217
In [58]: gdf.y + gdf.x # Float + Int
Out[58]:
0 0.0
1 1.0
2 2.0
3 3.0
4 4.0
5 5.0
6 6.0
7 7.0
8 8.0
9 9.0
In [59]: df.y + df.x.view('f8') # Comparison using pandas showing it's view instead of astype
Out[59]:
0 0.0
1 1.0
2 2.0
3 3.0
4 4.0
5 5.0
6 6.0
7 7.0
8 8.0
9 9.0
dtype: float64
In [60]: df.x + df.y.view('i8')
Out[60]:
0 0
1 4607182418800017409
2 4611686018427387906
3 4613937818241073155
4 4616189618054758404
5 4617315517961601029
6 4618441417868443654
7 4619567317775286279
8 4620693217682128904
9 4621256167635550217
dtype: int64
Based on gpuopenanalytics/libgdf#9.
While there's no obligation to do so, it would be positive for the OSS community to mention that Arrow is an important component of how the GDF works.
For example, I have had people ask me about this article https://devblogs.nvidia.com/parallelforall/goai-open-gpu-accelerated-data-analytics/, and the lines
from pygdf.gpuarrow import GpuArrowReader
reader = GpuArrowReader(darr)
e.g. "is that the same Arrow?". It's mentioned nowhere in that article. In the blog post it says "Without shared GPU data structures provided by GOAI". This is a little bit misleading. It would be good to acknowledge that this isn't all an endogenous creation and there is a broader community at work on these problems (zero-copy columnar data interchange).
Wrapper around the sort functionality provided by rapidsai/libgdf#8.
Currently these only work for comparison operators
In [31]: df = pd.DataFrame({'x': range(10), 'y': list(map(float, range(10)))})
In [32]: gdf = gd.DataFrame.from_pandas(df)
In [33]: gdf.x > 0
Out[33]:
0 False
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True
In [34]: gdf.x + 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-7eff5c11afd0> in <module>()
----> 1 gdf.x + 0
TypeError: unsupported operand type(s) for +: 'Series' and 'int'
When I tried to build the documentation on CentOS 7 using the packaged version of python-sphinx, the toolchain didn't work because the sphinx-build distributed with the RPM package was too old (python-sphinx-1.1.3-11.el7.noarch.rpm does not support the -M option). It would be helpful to document the minimum required Sphinx version.
[kaigai@namazu docs]$ make html
Sphinx v1.1.3
Usage: /usr/bin/sphinx-build [options] sourcedir outdir [filenames...]
Options: -b <builder> -- builder to use; default is html
-a -- write all files; default is to only write new and changed files
-E -- don't use a saved environment, always read all files
-t <tag> -- include "only" blocks with <tag>
-d <path> -- path for the cached environment and doctree files
(default: outdir/.doctrees)
-c <path> -- path where configuration file (conf.py) is located
(default: same as sourcedir)
-C -- use no config file at all, only -D options
-D <setting=value> -- override a setting in configuration
-A <name=value> -- pass a value into the templates, for HTML builder
-n -- nit-picky mode, warn about all missing references
-N -- do not do colored output
-q -- no output on stdout, just warnings on stderr
-Q -- no output at all, not even warnings
-w <file> -- write warnings (and errors) to given file
-W -- turn warnings into errors
-P -- run Pdb on exception
Modi:
* without -a and without filenames, write new and changed files.
* with -a, write all files.
* with filenames, write these.
make: *** [html] Error 1
The documentation can be built by downloading the latest Sphinx with pip and overwriting the packaged version; however, that was a little inconvenient in a CentOS/RHEL environment.
Thanks,
Depends on rapidsai/libgdf#7.
I am re-installing pygdf on a Linux Ubuntu 16.04 instance with one NVIDIA K80 (driver version 384.111). I started getting a pygdf 'invalid shared memory key' error and decided to reinstall pygdf. I am now trying to re-create an environment from notebook_py35.yml. The steps to remove and re-create the conda environment are shown below:
Steps taken to reinstall:
Not sure why it won't pass py.test anymore.
Downloading the packages while creating the pygdf env produced no warnings or errors.
Depends on rapidsai/libgdf#6.
Currently it fills it with NaN, which requires casting to float64.
Hello, I am trying to use pygdf for big data analytics (billions of rows), but when I try to convert a pandas dataframe with millions of rows, an error occurs with the message CUDA_ERROR_OUT_OF_MEMORY. It seems that pygdf memory is limited by VRAM. Is there a solution to this problem?
Also, every time I display a DataFrame it uses more VRAM. Is there a way to free VRAM?
Right now there is no way to add categories to a categorical column without defining a new column. This would be useful for UDFs, fillna, etc. where a user may want to split a categorical column based on another column or fill in nulls.
Example:
import pandas as pd
from pygdf.dataframe import DataFrame
pdf = pd.DataFrame({"cat_key": ["a", "b", None, "c", None], "value": [1, 2, 3, 4, 5]})
pdf['cat_key'] = pdf['cat_key'].astype("category")
gdf = DataFrame.from_pandas(pdf)
In the above, the only values we can fillna to are "a", "b", or "c". There should be an add_categories function similar to pandas.Series.cat.add_categories.
When I use
python setup.py install
python setup.py build
to install and build pygdf, and then run py.test, an OSError occurs:
cannot load library 'libgdf.so': libgdf.so: cannot open shared object file: No such file or directory
One-hot encoding is not respecting the null mask and considers invalid locations. This will lead to incorrect results if an invalid location happens to contain a value that matches one of the categories.
Current workaround:
masked_series.fillna(something).one_hot_encoding()
Most pygdf.Series methods return a pygdf.Series, but Series.unique_k returns a numpy array. Is there a good reason for this, or can the data remain on the GPU?
In [15]: df = pd.DataFrame({'x': range(10), 'y': list(map(float, range(10)))})
In [16]: gdf = gd.DataFrame.from_pandas(df)
In [17]: x1 = gdf.x[:1]
In [18]: x2 = gdf.x[:2]
In [19]: x1
Out[19]:
0 0
In [20]: x2
Out[20]:
0 0
1 1
In [21]: x2 == 0
Out[21]:
0 True
1 False
In [22]: x1 == 0
<long traceback>
...
~/miniconda/envs/gdf/lib/python3.5/site-packages/numba/cuda/cudadrv/driver.py in host_pointer(obj)
1505
1506 forcewritable = isinstance(obj, np.void)
-> 1507 return mviewbuf.memoryview_get_buffer(obj, forcewritable)
1508
1509
TypeError: expected a writable bytes-like object
In [23]: x1 == x1 # Does work with series
Out[23]:
0 True
A user reported that one_hot_encoding with float64 uses too much memory. It was observed that float32 uses the correct amount of memory.
I just did a fresh conda install (OS X) and tried to install pygdf, but hit ResolvePackageNotFound on libgdf. Pointers / any recommendations on things to try / report back?
Thanks!
Leos-MBP:pygdf lmeyerov$ conda --version
conda 4.3.30
Leos-MBP:pygdf lmeyerov$ conda env create --name pygdf_dev --file conda_environments/testing_py35.yml
Fetching package metadata .................
ResolvePackageNotFound:
- libgdf 0.1.0a2.dev hb999fd6_2
Reproducible use case:
import pandas as pd
from pygdf.dataframe import DataFrame
pdf1 = pd.DataFrame({"join_col": ["a", "b", "c", "d", "e"], "data_col_left": [1, 2, 3, 4, 5]})
pdf2 = pd.DataFrame({"join_col": ["c", "e", "f"], "data_col_right": [6, 7, 8]})
pdf1["join_col"] = pdf1["join_col"].astype("category")
pdf2["join_col"] = pdf2["join_col"].astype("category")
gdf1 = DataFrame.from_pandas(pdf1)
gdf2 = DataFrame.from_pandas(pdf2)
gdf1 = gdf1.set_index("join_col")
gdf2 = gdf2.set_index("join_col")
join_gdf = gdf1.join(gdf2)
join_pdf = pdf1.join(pdf2)
GDF incorrect output:
data_col_left data_col_right
a 1 6
b 2 7
c 3 8
d 4 -1
e 5 -1
PDF correct output:
data_col_left data_col_right
join_col
a 1 NaN
b 2 NaN
c 3 6.0
d 4 NaN
e 5 7.0
Much of the current functionality in PyGDF should be delegated to the C library libgdf. This will allow the functionality to be reused in other projects. PyGDF can then just provide a Python wrapper around those functions.
In [1]: %load test.py
In [2]: # %load test.py
...: import pygdf as gd
...: import pandas as pd
...: import dask_gdf as dgd
...: import numpy as np
...:
...: df = pd.DataFrame({'x': np.random.randint(0, 5, size=10000),
...: 'y': np.random.normal(size=10000)})
...:
...: gdf = gd.DataFrame.from_pandas(df)
...:
...: works = (dgd.from_pygdf(gdf, npartitions=2)
...: .query('x > 2'))
...:
...: fails = works.to_dask_dataframe()
...:
In [3]: works.head()
Out[3]:
x y
2 4 -1.73270757966
4 3 -0.308836664379
6 3 -0.241514128025
8 3 -0.348121287014
15 3 0.377489009207
In [4]: fails.head()
<long traceback>
...
~/miniconda/envs/gdf/lib/python3.5/site-packages/numba/cuda/cudadrv/driver.py in _check_error(self, fname, retcode)
321 _logger.critical(msg, _getpid(), self.pid)
322 raise CudaDriverError("CUDA initialized before forking")
--> 323 raise CudaAPIError(retcode, msg)
324
325 def get_device(self, devnum=0):
CudaAPIError: [201] Call to cuMemcpyDtoH results in CUDA_ERROR_INVALID_CONTEXT
In [5]: import dask
In [6]: fails.compute(get=dask.get).head() # Using no threads
Out[6]:
x y
2 4 -1.732708
4 3 -0.308837
6 3 -0.241514
8 3 -0.348121
15 3 0.377489
The only difference between works and fails is that fails calls to_pandas in parallel, resulting in pulling data off the GPU and back onto the host.
What kind of groupby functionality is actually useful from the Python API? (This will almost certainly require Numba to JIT some operations.)
Hi,
I am trying to do some basic add and multiply operations on dataframe columns, but it throws GDF_CUDA_ERROR every time.
Below is the code and stacktrace.
from pygdf.dataframe import DataFrame
import numpy as np
size = 100000000
df = DataFrame([('a', np.random.random(size)),('b', np.random.random(size))])
print('some rows',df.loc[:5])
df['a'] = df['a'] + df['b']
print('some rows',df.loc[:5])
Traceback (most recent call last):
File "test.py", line 9, in
df['a'] = df['a'] + df['b']
File "/home/test_proj/pygdf/pygdf/series.py", line 234, in add
return self._binaryop(other, 'add')
File "/home/test_proj/pygdf/pygdf/series.py", line 221, in _binaryop
outcol = self._column.binary_operator(fn, other._column)
File "/home/test_proj/pygdf/pygdf/numerical.py", line 65, in binary_operator
out_dtype=self.dtype)
File "/home/test_proj/pygdf/pygdf/numerical.py", line 180, in numeric_column_binop
null_count = _gdf.apply_binaryop(op, lhs, rhs, out)
File "/home/test_proj/pygdf/pygdf/_gdf.py", line 68, in apply_binaryop
binop(*args)
File "/root/miniconda3/envs/pygdf_dev/lib/python3.5/site-packages/libgdf_cffi/wrapper.py", line 28, in wrap
raise GDFError(errcode, errname)
libgdf_cffi.wrapper.GDFError: GDF_CUDA_ERROR
While working to implement things like concat, I've noticed a few (potential) issues with the current design that may bite us in the future. I'll lay them out here. Feel free to ignore; I'm very new to working with this codebase and may be missing reasons for the existing layout.
A DataFrame contains:
A Series contains:
An index could be many things, but the generic one just wraps a series.
A numeric implementation contains:
A categorical implementation contains:
This structure results in a few odd things:
A DataFrame contains:
A series contains:
A data object contains:
An Index is basically the same as before, but contains a data object instead of a series.
This new layout is nicer because:
def __getitem__(self, key):
    return Series(self._columns[key], index=self.index)

concat becomes easier, as all necessary state is located on the same object.
Potential Example Class Hierarchy
class Data(object):
    pass

class NumericData(Data):
    def __init__(self, buffer, mask, dtype):
        pass

class CategoricalData(Data):
    def __init__(self, codes: NumericData, categories, ordered=False):
        pass

class Series(object):
    def __init__(self, data, index=None):
        pass

class DataFrame(object):
    def __init__(self, cols_and_data, index=None):
        pass
I don't think rearranging this would be that tricky. It basically amounts to moving the buffer storage into the implementation classes, and fixing existing code as necessary.
You say to run
conda env create --name pygdf_dev --file conda_environments/testing_py35.yml
but it seems to produce a conflict (Mac OSX), since the name field in the yaml file contains a different string. Using
conda env create --file conda_environments/testing_py35.yml
just works fine.
(Will need option to convert categorical column during export to fixed width string or integer.)
The tests in https://github.com/gpuopenanalytics/pygdf/blob/master/pygdf/tests/test_gpu_arrow_parser.py use hard-coded binary data. This makes it more difficult to programmatically extend these tests.
I opened https://issues.apache.org/jira/browse/ARROW-1406 to ensure that this is easy to do and documented. So you will be able to obtain a memoryview containing the serialized schema and record batch in the form that you have currently.
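As a hedged sketch of what that could look like with pyarrow (the exact serialization calls depend on the pyarrow version; schema.serialize() and RecordBatch.serialize() are assumed here, and the column names are made up):
import pyarrow as pa
# Build a small record batch programmatically instead of hard-coding bytes.
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array([0.1, 0.2, 0.3])],
    names=['ints', 'floats'])
schema_buf = batch.schema.serialize()  # Buffer holding the serialized schema
batch_buf = batch.serialize()          # Buffer holding the serialized record batch
schema_view = memoryview(schema_buf)   # the memoryview described above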
If you currently want to append GDFs, you can create an empty GDF and build each column by appending the corresponding columns of the source dataframes.
Example:
import pandas as pd
from pygdf.dataframe import DataFrame
pdf1 = pd.DataFrame({"col1": [1,2,3], "col2": [11,12,13]})
pdf2 = pd.DataFrame({"col1": [4,5,6], "col2": [14,15,16]})
pdf3 = pd.DataFrame({"col1": [7,8,9], "col2": [17,18,19]})
gdf1 = DataFrame.from_pandas(pdf1)
gdf2 = DataFrame.from_pandas(pdf2)
gdf3 = DataFrame.from_pandas(pdf3)
append_gdfs = [gdf1, gdf2, gdf3]
newgdf = DataFrame()
for col in gdf1.columns:
    new_col = None
    for gdf in append_gdfs:
        if new_col is None:
            new_col = gdf[col]
        else:
            new_col = new_col.append(gdf[col])
    newgdf[col] = new_col
There should be append and concat functions similar to pandas.DataFrame.append and pandas.concat for appending dataframe(s).
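For reference, the pandas equivalents this request would mirror (existing pandas APIs, shown on the pdf1/pdf2/pdf3 frames above):
appended = pdf1.append([pdf2, pdf3], ignore_index=True)          # pandas.DataFrame.append
concatenated = pd.concat([pdf1, pdf2, pdf3], ignore_index=True)  # pandas.concat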
More docs for #96
Minimal case to reproduce:
import pandas as pd
from pygdf.dataframe import DataFrame
pdf1 = pd.DataFrame({"join_col": [1,2,3,4,5], "data_col_left": [10, 11, 12, 13, 14]})
pdf2 = pd.DataFrame({"join_col": [1,2], "data_col_right1": [15, 16]})
pdf3 = pd.DataFrame({"join_col": [1,2,3,4,5], "data_col_right2": [17, 18, 19, 20, 21]})
gdf1 = DataFrame.from_pandas(pdf1)
gdf2 = DataFrame.from_pandas(pdf2)
gdf3 = DataFrame.from_pandas(pdf3)
gdf1 = gdf1.set_index("join_col")
gdf2 = gdf2.set_index("join_col")
gdf3 = gdf3.set_index("join_col")
test = gdf1.join(gdf2).join(gdf3)
print(test.head().to_pandas())
Returns:
data_col_left data_col_right1 data_col_right2
1 10 15 17
2 11 16 18
3 12 12 19
4 13 13 20
5 14 14 21
Instead of the expected:
data_col_left data_col_right1 data_col_right2
1 10 15 17
2 11 16 18
3 12 NaN 19
4 13 NaN 20
5 14 NaN 21
Executing more code and doing other operations ends up returning arbitrary values for data_col_right1 at keys 3, 4, and 5.
Support for apply()'ing a function to each row of a GDF. Will use Numba JIT to generate code.
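A minimal sketch of the kind of row-wise, Numba-JIT'd execution this could build on, using numba.cuda directly on device arrays; the eventual pygdf API is not defined here, so everything below is only an illustration:
import numpy as np
from numba import cuda

@cuda.jit
def row_kernel(x, y, out):
    # One thread per row; the body stands in for the user's row function.
    i = cuda.grid(1)
    if i < x.size:
        out[i] = x[i] * 2.0 + y[i]

x = cuda.to_device(np.arange(10, dtype=np.float64))
y = cuda.to_device(np.ones(10, dtype=np.float64))
out = cuda.device_array(10, dtype=np.float64)
row_kernel.forall(x.size)(x, y, out)
print(out.copy_to_host())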
conda env create --name pygdf_dev --file conda_environments/testing_py35.yml
fails with an unexpected error:
quant@quantaxis:~$ conda env create --name pygdf_dev --file conda_environments/testing_py35.yml
conda: command not found
quant@quantaxis:~$
Minimal case to reproduce:
import pandas as pd
from pygdf.dataframe import DataFrame
pdf = pd.DataFrame({"key": [1,1,1,2,2,2,3,3,3,4,4,4], "value": [1,2,3,4,5,6,7,8,9,10,11,12]})
gdf = DataFrame.from_pandas(pdf)
gdf['newcol'] = 100
pdf['newcol'] = 100
Incorrect GDF result:
len(gdf['newcol'])
1
Correct PDF result:
len(pdf['newcol'])
12
This works fine if the frame is sliced from the start, but fails if the slice is in the middle:
In [29]: import pandas as pd, pygdf as gd
In [30]: df = pd.DataFrame({'x': range(100), 'y': list(map(float, range(100)))})
In [31]: gdf = gd.DataFrame.from_pandas(df)
In [32]: gdf[:10].nlargest(5, 'x')
Out[32]:
x y
9 9 9.0
8 8 8.0
7 7 7.0
6 6 6.0
5 5 5.0
In [33]: gdf[10:20].nlargest(5, 'x')
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-33-51dda14cf36d> in <module>()
----> 1 gdf[10:20].nlargest(5, 'x')
~/Code/pygdf/pygdf/dataframe.py in nlargest(self, n, columns, keep)
439 * Only a single column is supported in *columns*
440 """
--> 441 return self._n_largest_or_smallest('nlargest', n, columns, keep)
442
443 def nsmallest(self, n, columns, keep='first'):
~/Code/pygdf/pygdf/dataframe.py in _n_largest_or_smallest(self, method, n, columns, keep)
465 df[k] = sorted_series
466 else:
--> 467 df[k] = self[k].take(df.index.gpu_values)
468 return df
469
~/Code/pygdf/pygdf/dataframe.py in __setitem__(self, name, col)
144 self._cols[name] = self._prepare_series_for_add(col)
145 else:
--> 146 self.add_column(name, col)
147
148 def __delitem__(self, name):
~/Code/pygdf/pygdf/dataframe.py in add_column(self, name, data)
304 if name in self._cols:
305 raise NameError('duplicated column name {!r}'.format(name))
--> 306 series = self._prepare_series_for_add(data)
307 self._cols[name] = series
308
~/Code/pygdf/pygdf/dataframe.py in _prepare_series_for_add(self, col)
290 return series
291 else:
--> 292 raise NotImplementedError("join needed")
293
294 def add_column(self, name, data):
NotImplementedError: join needed
Currently this only happens if the dispatch hits categorical first:
In [21]: df = pd.DataFrame({'x': range(10), 'y': list(map(float, range(10))), 'z': list('abcde')*2})
In [22]: df.z = df.z.astype('category')
In [23]: gdf = gd.DataFrame.from_pandas(df)
In [24]: gdf.x + gdf.z
Out[24]:
0 144396680282898688
1 1028
2 7
3 8
4 9
5 10
6 11
7 12
8 13
9 14
In [25]: gdf.z + gdf.x
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-25-01cf7f26eb92> in <module>()
----> 1 gdf.z + gdf.x
~/Code/pygdf/pygdf/series.py in __add__(self, other)
356
357 def __add__(self, other):
--> 358 return self._binaryop(other, 'add')
359
360 def __sub__(self, other):
~/Code/pygdf/pygdf/series.py in _binaryop(self, other, fn)
345 if not isinstance(other, Series):
346 return NotImplemented
--> 347 return self._impl.binary_operator(fn, self, other)
348
349 def _unaryop(self, fn):
~/Code/pygdf/pygdf/categorical.py in binary_operator(self, binop, lhs, rhs)
73 def binary_operator(self, binop, lhs, rhs):
74 msg = 'Categorical cannot perform the operation: {}'.format(binop)
---> 75 raise TypeError(msg)
76
77 def unary_operator(self, unaryop, series):
TypeError: Categorical cannot perform the operation: add
If you are using a categorical column and try to use .query() or .fillna(), you need to use the categorical code rather than the value of the category.
Example:
import pandas as pd
from pygdf.dataframe import DataFrame
pdf = pd.DataFrame({"key": ["a", "b", "c", "d", "e", None, "f", None, "null_key"], "value": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
pdf['key'] = pdf['key'].astype('category')
gdf = DataFrame.from_pandas(pdf)
These work by giving the index of null_key in gdf['key'].cat.categories:
gdf['key'] = gdf['key'].fillna(6)
gdf.query('key == 6').head().to_pandas()
These do not work and fail with "Failed at nopython (nopython frontend)". I assume this is due to numba being unable to compile the function with a string type?
gdf['key'] = gdf['key'].fillna("null_key")
gdf.query('key == "null_key"').head().to_pandas()
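For comparison, pandas accepts the category value directly; this is the existing pandas behaviour on the pdf frame defined above, not a pygdf API:
pdf['key'] = pdf['key'].fillna("null_key")  # "null_key" is already one of the categories
pdf.query('key == "null_key"').head()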
Excited to see this new org created. I am interested to see if Apache Arrow (i.e. contiguous columnar data, validity bitmap for nulls) is the appropriate data model for data on the GPU, and if we can collaborate on some aspects of the code. It seems that CUDA 7 now supports C++11, so in theory we could compile the Arrow C++ libraries with nvcc and provide necessary APIs to enable Numba to interact with the raw memory buffers. This might simplify IPC with GPU main memory (record batch loading and unloading) and make less work for you here. I have an NVIDIA GPU on my home desktop, so I could help with testing.
Add support for label encoding for both series and dataframe operations.
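As a point of reference only, pandas already exposes label-encoding semantics via factorize; a hedged sketch of the behaviour this could mirror:
import pandas as pd
codes, uniques = pd.factorize(pd.Series(["a", "b", "a", "c"]))
# codes   -> array([0, 1, 0, 2])
# uniques -> Index(['a', 'b', 'c'], dtype='object')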
I use Linux at work, but at home I have Windows and would like to be able to run it on my main machine with conda. Currently when I run:
conda env create --name pygdf_dev --file conda_environments/testing_py35.yml
I am seeing this error:
NoPackagesFoundError: Package missing in current win-64 channels:
- libgdf_cffi >=0.1.0a1.dev
I am hoping that this could be easily added to the win-64 channels?