bodywork-ml / bodywork-core
ML pipeline orchestration and model deployments on Kubernetes.
Home Page: https://bodywork.readthedocs.io/en/latest/
License: GNU Affero General Public License v3.0
Change all optional parameter declarations from namespace: Optional[str] = None to namespace: str = None. These are treated as equivalent by Mypy (implicit Optional), with the latter declaration being more concise, thus improving the readability of the code.
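For illustration, a minimal sketch of the two styles side-by-side (note that Mypy only accepts the concise form when its implicit-optional mode is enabled, which was the default at the time of writing):

from typing import Optional

# explicit optional - the style currently used in the codebase
def delete_deployment(namespace: Optional[str] = None) -> None:
    ...

# concise, implicit optional - equivalent under Mypy's implicit-optional mode
def delete_deployment(namespace: str = None) -> None:
    ...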
Description
As a machine learning engineer, I would like to be able to use public and private repos hosted on GitLab with Bodywork, because I cannot use GitHub at my place of work and this prevents me from adopting Bodywork.
Tasks
- Extend the bodywork.git module to work with GitLab repos, either public or private (via SSH).
- SSH_GITLAB_KEY_ENV_VAR will be injected as an environment variable in bodywork.k8s.batch_jobs.configure_batch_stage and bodywork.k8s.service_deployments.configure_service_stage_deployment; or consider refactoring SSH_GITLAB_KEY_ENV_VAR into SSH_GIT_KEY_ENV_VAR, so that it can be used with any remote Git repository host.
Story
As a Bodywork Developer, I would like to be able to manage Ingress resources, to enable high-level Ingress functionality for ML engineers.
Tasks
- Create a k8s.service_deployments.create_ingress_to_cluster_service function that will create an Ingress resource for a ClusterIP Service.
- Create a k8s.service_deployments.delete_ingress_to_cluster_service function that will delete an Ingress resource for a ClusterIP Service.
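A minimal sketch of what create_ingress_to_cluster_service could look like, using the official Kubernetes Python client's networking/v1 API (the function signature and path scheme are assumptions, not Bodywork's actual API):

from kubernetes import client

def create_ingress_to_cluster_service(
    namespace: str, service_name: str, port: int
) -> None:
    # route /<service_name>/* to the ClusterIP service, assuming an
    # NGINX ingress controller is deployed in the cluster
    ingress = client.V1Ingress(
        metadata=client.V1ObjectMeta(
            name=f"{service_name}--ingress",
            annotations={"kubernetes.io/ingress.class": "nginx"},
        ),
        spec=client.V1IngressSpec(
            rules=[
                client.V1IngressRule(
                    http=client.V1HTTPIngressRuleValue(
                        paths=[
                            client.V1HTTPIngressPath(
                                path=f"/{service_name}",
                                path_type="Prefix",
                                backend=client.V1IngressBackend(
                                    service=client.V1IngressServiceBackend(
                                        name=service_name,
                                        port=client.V1ServiceBackendPort(number=port),
                                    )
                                ),
                            )
                        ]
                    )
                )
            ]
        ),
    )
    client.NetworkingV1Api().create_namespaced_ingress(
        namespace=namespace, body=ingress
    )

The delete function would mirror this, via NetworkingV1Api().delete_namespaced_ingress(name, namespace).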
Description
The bodywork.workflow_execution.run_workflow function should have repo cloning and config parsing refactored into a separate function, such that run_workflow is responsible solely for managing workflow execution (and is called by the new function). This will facilitate future development - e.g. a Bodywork REST API server.
Tasks
- Refactor bodywork.workflow_execution.run_workflow purely for workflow management.
- Modify bodywork.cli.cli.workflow to use the new function.

"As an ML Engineer I want to be able to delete a deployment"
Implement a CLI command to delete a namespace (effectively kubectl delete ns project-name).
N.B. We need to update the docs to make it clear that if a user has specified a particular namespace to deploy to in the Bodywork YAML, then this is the name they will have to give to the CLI, or they will need to specify it as a parameter (which would make the usage more consistent).
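A minimal sketch of what the underlying operation could look like with the official Kubernetes Python client (the function name is hypothetical):

from kubernetes import client, config

def delete_project_namespace(project_name: str) -> None:
    # equivalent to `kubectl delete ns project-name` - deleting the
    # namespace also deletes every resource deployed within it
    config.load_kube_config()
    client.CoreV1Api().delete_namespace(name=project_name)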
"As an ML Engineer I want to see the deployment history for a project"
Need to refine what this is exactly before going ahead with this ticket.
Is this really required? What useful information does this give the user, given that it is just a list of undated deployment names for a project?
"As an ML Engineer I would like to update an existing Cronjob"
Create the CLI command option to update an existing cronjob, e.g. bodywork cronjob update --ns project-name --name cronjob-name --schedule "* * * * 5"
First of all, great project folks.
Secondly, it's not conventional to use print statements - logging is used instead - so I am curious to know why you folks opted for print.
I can make a PR and introduce logging (using structlog), which will help this project to grow and make it easier for people to use the API instead of the CLI.
Please let me know :)
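For reference, a minimal sketch of what structlog-based logging could look like, using standard structlog processors:

import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()
log.info("workflow_started", repo="bodywork-core", branch="main")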
Story
As a ML engineer, I would like to be able to write stages that reference files relative to the working directory containing the executable Python module for the stage, so that working with file paths is as easy as it would be when developing locally.
Task
Make bodywork.stage.run_stage use the optional cwd argument in subprocess.run to change the working directory (a sketch follows).
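A minimal sketch of the idea, assuming run_stage receives the directory containing the stage's executable module (names simplified from the actual implementation):

import subprocess
from pathlib import Path

def run_stage(stage_dir: Path, executable_module: str) -> None:
    # run the stage with stage_dir as the working directory, so that
    # relative file paths behave exactly as they do when developing locally
    subprocess.run(
        ["python", executable_module],
        cwd=stage_dir,
        check=True,
    )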
As a Machine Learning Engineer, I would like the creation and specification of k8s namespaces to be handled automatically by Bodywork, so that I do not have to know about k8s namespaces or what to do with them (I find this all very intimidating).
Tasks
- Allow the namespace to be specified in bodywork.yaml - e.g. in a project.namespace field - but if this isn't provided, it should default to project.name.
- Run all deployment (workflow) jobs in the bodywork-deployment-jobs namespace, which will be created automatically when the cluster is configured for Bodywork. This means that the --namespace flag can be dropped for all bodywork deployment commands.
- Drop the --name flag and instead use one based on the git-repo URL and the current timestamp, so that the resulting command looks like bodywork deployment create MY_REPO MY_BRANCH.
- Refactor bodywork.workflow_execution.run_workflow, so that it doesn't take namespace as an argument and instead creates the namespace, if it needs to, based on the config parameters. If the workflow does not deploy any services, then the namespace should be deleted when the workflow has successfully completed.
- Refactor bodywork.k8s.workflow_jobs and bodywork.cli.cli.workflow, to reflect the fact that namespace no longer needs to be thrown around as an argument.

As a Machine Learning Engineer, I would like all k8s cluster setup to be done for me, so that I don't need to understand k8s before deploying my pipelines.
Tasks
- Create a bodywork.cli.configure_cluster module.
- Create a setup_cluster_for_bodywork function that will perform all the necessary steps to configure k8s for use with Bodywork - e.g., creating the bodywork-deployment-jobs namespace, into which all deployment (workflow) jobs will be run, and setting up service accounts for bodywork-deployment-jobs, so that workflow-jobs will be able to create resources. This could be done with bodywork.k8s.auth.setup_workflow_service_account, but it may have to be modified with the option to grant permission for namespaces to be created, which will be required by workflow-jobs in bodywork-deployment-jobs (as we want to automate namespace creation).

Currently, a secret in group dev with name api-password will be given a k8s resource name of dev-api-password, which makes it hard to figure out what group it belongs to, if you've forgotten what groups are in existence and you have other k8s secrets floating around. Maybe this would be easier if the secret name were something like dev--api-password?

Additionally, it would be useful to have a command along the lines of bodywork secret display groups, that lists the secret groups in existence by looking at the labels of all secrets.
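A minimal sketch of how bodywork secret display groups could work, assuming secrets carry a (hypothetical) group label:

from kubernetes import client, config

def display_secret_groups(namespace: str = "bodywork-deployment-jobs") -> None:
    # derive the set of distinct groups from the labels on all secrets
    config.load_kube_config()
    secrets = client.CoreV1Api().list_namespaced_secret(namespace)
    groups = {
        secret.metadata.labels["group"]
        for secret in secrets.items
        if secret.metadata.labels and "group" in secret.metadata.labels
    }
    for group in sorted(groups):
        print(group)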
I have been exploring Bodywork for the last few days.
We have multiple models running in production. I want to configure multiple model workflows in the same git repo without changing branch. Can you please help me?
When trying to deploy a private repo via SSH using,
$ bodywork deployment create \
--namespace=arc-cpre \
--name=d1 \
[email protected]/everlution/arc-cpre.git \
--git-repo-branch=rest-api-definition \
-L
I got the following unhandled exception:
testing with local workflow-controller - retries are inactive
namespace=arc-cpre is setup for use by Bodywork
2021-06-22 19:48:29,882 - INFO - workflow_execution.run_workflow - attempting to run workflow for [email protected]/everlution/arc-cpre.git on branch=rest-api-definition in kubernetes namespace=arc-cpre
2021-06-22 19:48:29,939 - ERROR - workflow_execution.run_workflow - failed to execute workflow for rest-api-definition branch of project repository at [email protected]/everlution/arc-cpre.git: Unable to setup SSH for Github and you are trying to connect via SSH: 'failed to setup SSH for github.com - cannot find BODYWORK_GIT_SSH_PRIVATE_KEY environment variable'
Traceback (most recent call last):
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/git.py", line 63, in download_project_code_from_repo
setup_ssh_for_git_host(hostname)
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/git.py", line 137, in setup_ssh_for_git_host
raise KeyError(msg)
KeyError: 'failed to setup SSH for github.com - cannot find BODYWORK_GIT_SSH_PRIVATE_KEY environment variable'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/workflow_execution.py", line 81, in run_workflow
download_project_code_from_repo(repo_url, repo_branch, cloned_repo_dir)
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/git.py", line 74, in download_project_code_from_repo
raise BodyworkGitError(msg)
bodywork.exceptions.BodyworkGitError: Unable to setup SSH for Github and you are trying to connect via SSH: 'failed to setup SSH for github.com - cannot find BODYWORK_GIT_SSH_PRIVATE_KEY environment variable'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/workflow_execution.py", line 156, in run_workflow
if config.project.run_on_failure and type(e) not in [
UnboundLocalError: local variable 'config' referenced before assignment
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/workflow_execution.py", line 167, in run_workflow
f"Error executing failure stage: {config.project.run_on_failure}"
UnboundLocalError: local variable 'config' referenced before assignment
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/bin/bodywork", line 8, in <module>
sys.exit(cli())
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/cli/cli.py", line 302, in cli
args.func(args)
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/cli/cli.py", line 324, in wrapper
func(*args, **kwargs)
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/cli/cli.py", line 393, in deployment
workflow(pass_through_args)
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/cli/cli.py", line 324, in wrapper
func(*args, **kwargs)
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/cli/cli.py", line 562, in workflow
run_workflow(
File "/Users/alexioannides/Dropbox/bodywork_client_repos/arc-CPRE/.venv/lib/python3.8/site-packages/bodywork/workflow_execution.py", line 176, in run_workflow
if config is not None and config.project.usage_stats:
UnboundLocalError: local variable 'config' referenced before assignment
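The final frames show the root cause: config is referenced in run_workflow's exception handlers before it has been assigned, because the repo clone failed before the config could be loaded. A minimal sketch of one possible fix (names abbreviated; load_config stands in for the actual config-parsing step):

def run_workflow(repo_url, repo_branch, cloned_repo_dir):
    config = None  # bind the name up-front, so exception handlers can test it
    try:
        download_project_code_from_repo(repo_url, repo_branch, cloned_repo_dir)
        config = load_config(cloned_repo_dir)
        # ... execute the workflow ...
    except Exception as e:
        # these checks can no longer raise UnboundLocalError when cloning fails
        if config is not None and config.project.run_on_failure:
            pass  # ... run the failure stage ...
        raise e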
"As an ML Engineer I would like to see all the deployed services (on the cluster)".
Extend the existing 'Services' CLI command with the option to view all the services on the cluster, which is equivalent to kubectl get services --all-namespaces. The Bodywork CLI command will be bodywork service display.
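A minimal sketch of the underlying query, using the official Kubernetes Python client:

from kubernetes import client, config

def display_all_services() -> None:
    # equivalent to `kubectl get services --all-namespaces`
    config.load_kube_config()
    for svc in client.CoreV1Api().list_service_for_all_namespaces().items:
        print(f"{svc.metadata.namespace}/{svc.metadata.name}")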
"As a ML engineering manager, I want to ensure that projects from independent repos cannot be deployed into the same namespace, so that separate teams cannot interfere with one another's work (e.g accidentally override or delete another team's services)."
Tasks
- Use bodywork.k8s.utils.make_valid_k8s_name.
- Ensure that bodywork service display isn't reliant on namespace as an argument.

"As a ML engineer, I would like all redundant services to be automatically deleted at the end of a successful workflow execution."
At the end of bodywork.workflow.run_workflow, execute the equivalent of the bodywork service delete command from the CLI, for any services that are no longer required (a sketch follows). This now means that Bodywork is doing 100% GitOps.
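A minimal sketch of the cleanup step, assuming Bodywork services map to k8s Deployments and that the set of services declared in the current config is available (function and parameter names are hypothetical):

from typing import Set

from kubernetes import client

def delete_redundant_services(namespace: str, configured_services: Set[str]) -> None:
    # delete any deployed service that is no longer declared in the project config
    apps = client.AppsV1Api()
    deployed = {
        deployment.metadata.name
        for deployment in apps.list_namespaced_deployment(namespace).items
    }
    for name in deployed - configured_services:
        apps.delete_namespaced_deployment(name=name, namespace=namespace)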
There is currently no command for,
$ bodywork deployment delete \
--ns NAMESPACE \
--name DEPLOYMENT_NAME
It was never implemented, as it was our intention to rely on the Time To Live (TTL) settings and controller to clean up finished jobs after they have completed (successfully, or otherwise).
Over time, I've noticed that this hasn't been working. After some digging, it transpires that this feature of k8s hasn't been enabled on AWS EKS, as it's only in beta (where it has been for a few years). So, some clusters may not be able to rely on it, and we should provide a simple method for deleting these deployment jobs...
Tasks
- Create a delete_workflow_job method in the bodywork.cli.workflow_jobs module; and,
- Create a bodywork deployment delete command in the bodywork.cli.cli module.
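A minimal sketch of the job deletion, using the official Kubernetes Python client (function name taken from the task above; the body is an assumption):

from kubernetes import client

def delete_workflow_job(namespace: str, job_name: str) -> None:
    # delete the finished job and, via the propagation policy, its pods
    client.BatchV1Api().delete_namespaced_job(
        name=job_name,
        namespace=namespace,
        propagation_policy="Background",
    )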
The bodywork.k8s.cronjobs.configure_cronjob function accepts successful_jobs_history_limit and failed_jobs_history_limit arguments, for configuring how many historical runs to keep.
The bodywork.cli.create_cronjob_in_namespace function, however, does not support these arguments (and likewise in bodywork.cli.cli.py), which is an oversight that needs to be corrected.
Ensure that the docs are updated to reflect any changes.
Story
As a ML engineer, I would like service ingress to be setup (or updated) as required, so that I can expose my services to the world beyond the k8s cluster.
Tasks
- Extend workflow.py to create Ingress resources alongside Service resources and apply updates accordingly.
- Ensure that a WARNING is logged when ingress is configured, but there is no NGINX ingress controller deployed in the cluster.

Hi guys,
I set up my own project to test Bodywork (started with pipelines). Now I have three simple stages:
Step no. 1 downloads data and saves it to a file, which should then be used in step no. 3, i.e. training.
Unfortunately, step 3 cannot find the data (and neither can I). Logs below.
Is it because each stage is executed in its own container and everything gets wiped out after stage execution? Is there any recommended way of sharing data between stages so that I don't have to upload data to S3 and then download it back?
Code: https://github.com/mtszkw/turbo_waffle
PS. When executed manually (python), step 1 leaves a data_files dir, so let's assume that the script works as intended.
Thanks in advance!
---- pod logs for turbo-waffle--3-train-random-forest
2021-07-12 11:58:16,798 - INFO - stage_execution.run_stage - attempting to run stage=3_train_random_forest from main branch of repo at https://github.com/mtszkw/turbo_waffle
2021-07-12 11:58:16,801 - WARNING - git.download_project_code_from_repo - Not configured for use with private GitHub repos
2021-07-12 11:58:28.952 | INFO | main:_load_preprocessed_data:9 - Reading training data from /tmp/data_files/breast_cancer_data.npy and /tmp/data_files/breast_cancer_target.npy...
Traceback (most recent call last):
File "train_random_forest.py", line 32, in
X_train, y_train = _load_preprocessed_data(X_train_full_path, y_train_full_path)
File "train_random_forest.py", line 10, in _load_preprocessed_data
X_train = np.load(X_train_full_path)
File "/usr/local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 417, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/data_files/breast_cancer_data.npy'
2021-07-12 11:58:29,025 - ERROR - stage_execution.run_stage - Stage 3_train_random_forest failed - CalledProcessError(1, ['python', 'train_random_forest.py', '/tmp/data_files/breast_cancer_data.npy', '/tmp/data_files/breast_cancer_target.npy'])
The --retries flag for the bodywork cronjob command has not been documented - e.g.,
$ bodywork cronjob create \
--namespace=MY_NAMESPACE \
--name=MY_PIPELINE \
--git-repo-url=https://github.com/MY_USERNAME/MY_REPO \
--git-repo-branch=BRANCH \
--retries=N
This came up during Discussion #18.
Description
As a Bodywork Product Manager, I would like to know how Users are using Bodywork, so I can accurately guide Bodywork's roadmap.
Tasks
- Add a config parameter - logging.usage_stats - that enables Users to opt-out of usage tracking.
- Add the tracking server details to the bodywork.constants module.
- Modify bodywork.workflow_execution.run_workflow to ping the tracking server when it is called, failing gracefully (see the sketch after the next paragraph).

It makes no sense for the workflow controller to be run anywhere other than locally - e.g. even if running on a CI/CD box, it needs to run locally. This should become the default option and we should remove the asynchronous option from the CLI, but not from the internal k8s sub-module, as we'll need this for the REST API.
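For the run_workflow usage-stats ping in the tasks above, a minimal sketch of a gracefully-failing ping (the server URL is a placeholder for whatever ends up in bodywork.constants):

import requests

USAGE_STATS_SERVER_URL = "https://example.com/bodywork-usage"  # placeholder

def ping_usage_stats_server() -> None:
    # usage tracking must never break a deployment, so fail silently
    try:
        requests.get(USAGE_STATS_SERVER_URL, timeout=2)
    except requests.exceptions.RequestException:
        pass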
As a Machine Learning Engineer I would like to use the same set of secrets across multiple workflows/deployments and not deal with namespaces.
Tasks
Currently, Secrets are created for each individual namespace; instead, they should be arranged into groups and be namespace agnostic.
- In the bodywork secret commands, replace the namespace argument with one that represents the name of the secrets group this secret is in, e.g. --group.
- Store all secrets in the bodywork-deployment-jobs namespace.
- Add a secrets_group item to the project section of bodywork.yaml.
- Use the relevant group of secrets when secrets_group is specified in the config.

N.B. Remember to remove the namespace setup and amend the Secret creation in test_workflow_and_service_management_end_to_end_from_cli
As a Machine Learning Engineer I would like to use Bitbucket and Azure DevOps to host my code.
Currently, Bodywork does not support Bitbucket or Azure DevOps hosted repositories. Extend git.py to allow connection to these repositories and then test that this works by running Bodywork against a repo hosted on each of these sites.
Story
As a ML engineer, I would like to be able to trigger workflows on an ad hoc basis, without having to have the workflow-controller run locally, so that my machines (e.g. CICD runners) do not have to dedicate non-cluster resources for running workflows.
Tasks
- Create a bodywork.k8s.jobs.configure_workflow_job function for defining a workflow execution job, using bodywork.k8s.cronjobs.configure_cronjob as a reference.
- Create a bodywork.cli.deploy module that contains a function for triggering a workflow job.
- Create a bodywork deploy command in bodywork.cli.cli for providing this functionality from the CLI.

Currently, the Bodywork k8s package 'Get' methods return dictionaries. Now that we have started to return objects - e.g. display_secrets - these should instead be returned as lists of objects, because this is best practice for the retrieval of objects in the data layer.
However, at this moment in time, Secrets is the only object returned from the data layer, with the rest being dictionaries returning a specific value; therefore, this should not be done now, in order to maintain consistency in the data layer. These will become objects too at some point, as we expand the data that is returned for each of these items. When this occurs, we should refactor all of these methods to return lists.
"As an ML Engineer I would like to update an existing Secret"
Extend the Secrets CLI command with a sub-command to update a secret, i.e. bodywork secret update --ns group-name --name mysecret --data USERNAME=marios PASSWORD=xyz
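A minimal sketch of the underlying update, using the official Kubernetes Python client (the target namespace is whichever namespace the secrets group lives in):

from typing import Dict

from kubernetes import client, config

def update_secret(namespace: str, secret_name: str, data: Dict[str, str]) -> None:
    # patch only the supplied keys; stringData lets k8s handle base64 encoding
    config.load_kube_config()
    client.CoreV1Api().patch_namespaced_secret(
        name=secret_name,
        namespace=namespace,
        body={"stringData": data},
    )

update_secret("group-name", "mysecret", {"USERNAME": "marios", "PASSWORD": "xyz"})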
Some parameter names for CLI commands are unnecessarily long; shorten these to improve the UX. Also make sure they are consistent across commands.
e.g. git-repo-url -> git-url
The deployment create command is equivalent to the workflow command, therefore the workflow command should be removed to keep things clean and simple.
This will involve updating the bodywork command used to run the workflows in bodywork.k8s.workflow_jobs.configure_workflow_job.
"As an ML Engineer I would like to delete a whole group of secrets"
Expand the CLI delete secret function so that it is possible to delete a whole group of secrets by just providing the group name. This will involve amending the existing 'delete secret' methods in both k8s.secrets.py and cli.secrets.py.
Story
As a ML engineer, I would like to be able to see ingress information and use it to manage ingress, so that I have control over how my services are exposed beyond the k8s cluster.
Tasks
- Extend cli.service_deployments.display_service_deployments to include ingress information.
- Extend cli.service_deployments.delete_service_deployment_in_namespace to handle ingress alongside deployments.

Hi,
is there the possibility of using the CLI to generate YAML files, instead of interacting directly with the K8s installation through kubeconfig?
Since I usually apply GitOps pipelines to my MLOps clusters, I would like to test it by putting deployments in a Helm chart or Kustomize YAML.
AFAIK, I didn't find any option to achieve this.
Ty.
Description
As a Machine Learning Engineer, I would like to be able to request GPU resources for stages in a workflow, so that tensor-based machine learning (e.g. PyTorch), can benefit from hardware based acceleration for training and serving.
Tasks
- Add a gpu_request config parameter for all stage types.
- Extend bodywork.k8s.batch_jobs and bodywork.k8s.service_deployments to request GPU resources.
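A minimal sketch of how a gpu_request could translate into a container spec, using the official Kubernetes Python client (GPUs are requested via resource limits, using the device plugin's resource name - nvidia.com/gpu for NVIDIA GPUs; the image name is illustrative):

from kubernetes import client

container = client.V1Container(
    name="train-model",
    image="bodyworkml/bodywork-core:latest",
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"},  # the stage's gpu_request value
    ),
)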
Description
As a Machine Learning Engineer, I would like the number of replicas standing behind my services to scale automatically, based on CPU utilisation, so that I do not have to frequently monitor CPU utilisation and manually change the number of replicas.
Tasks
- Add a scale_out_replicas parameter, that will represent the number of replicas, above those specified in replicas, that Kubernetes can scale the deployment up to.
- Extend bodywork.k8s.service_deployments to enable CRUD operations for HorizontalPodAutoscaler resources.
- If scale_out_replicas is present, then the bodywork.config.StageConfig object should flag to bodywork.workflow_execution.run_workflow that a HorizontalPodAutoscaler resource should be created.

Resources
Section 15.1.2 and listing 15.2
of 'Kubernetes in Action'.
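A minimal sketch of the HorizontalPodAutoscaler creation, using the autoscaling/v1 API of the official Kubernetes Python client (names and target utilisation are illustrative assumptions):

from kubernetes import client

def create_horizontal_pod_autoscaler(
    namespace: str, deployment_name: str, replicas: int, scale_out_replicas: int
) -> None:
    hpa = client.V1HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name=f"{deployment_name}--hpa"),
        spec=client.V1HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V1CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name=deployment_name
            ),
            min_replicas=replicas,
            max_replicas=replicas + scale_out_replicas,
            target_cpu_utilization_percentage=75,
        ),
    )
    client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
        namespace=namespace, body=hpa
    )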
Story
As a Bodywork Developer, I would like to be able to run Bodywork's integration tests locally, using Minikube, so that I can easily test Bodywork on different versions of Kubernetes, in addition to the official development cluster (AWS EKS).
Background
Currently, Bodywork's integration tests expect the ingress load balancer URL to be in the location where they are installed on our AWS EKS dev cluster, as determined here. This means that Bodywork cannot be tested using local clusters, such as Minikube, which install the ingress controller elsewhere.
Tasks
Description
As a Machine Learning Engineer, I would like access to the Git commit hash of the running pipeline, so that I can use it to tag the artefacts generated in my pipelines.
Tasks
- Add a function to the bodywork.git module, that uses git rev-parse --short HEAD to get the commit hash (a sketch follows).
- Extend bodywork.k8s.batch_jobs and bodywork.k8s.service_deployments to take the git commit hash as an argument, which is then injected as an environment variable, in much the same way that GitHub SSH credentials are injected.
- Modify workflow_execution.py to retrieve and then inject the git commit hash into jobs and deployments.
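A minimal sketch of the hash-retrieval function (the function name is hypothetical):

import subprocess

def get_git_commit_hash(repo_dir: str = ".") -> str:
    # return the short commit hash of HEAD in the cloned project repo
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()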
The config.ini files in each of the deployment templates will require the INGRESS parameter to be set for all service stages, in order to be compatible with Bodywork v0.3.0.
This is also a good opportunity to add a license (MIT for these?), a version file and possibly a CHANGELOG.md file.
I am following your tutorial. It is amazing, but I have a few questions.
In part 1 of the tutorial, when I wanted to manually test the deployed prediction endpoint, I typed this in my terminal:
curl http://CLUSTER_IP/pipelines/time-to-dispatch--serve-model/api/v0.1/time_to_dispatch
The CLUSTER_IP is not defined and it gives an error. So, I understand that the CLUSTER_IP might not be accessible if I use minikube, but would this be a problem when I deploy on AWS?
And also, where does the time-to-dispatch--serve-model in the endpoint come from?
As a MLOps Engineer, I would like to be able to run custom code (e.g. send notifications) if a workflow has failed to execute, so that I can handle errors appropriately.
Tasks
- Add a project.run_on_failure parameter, which takes the name of a stage that is only to be run if the workflow raises an exception.
- If an exception is raised in bodywork.workflow_execution.run_workflow, and project.run_on_failure has been set, start a job to run the Python module specified in project.run_on_failure, using the exception as an argument to the module.

We should be able to use existing test repos that raise errors, in order to build an integration test for this.
Use this to implement the equivalent of,
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-scoring-service
  namespace: ml-workflow
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
  - http:
      paths:
      - path: /scoring-service(/|$)(.*)
        backend:
          serviceName: bodywork-ml-pipeline-project--stage-2-deploy-scoring-service
          servicePort: 5000
When deploying the bodywork-ml-pipeline-project template project, using the Kubernetes NGINX ingress controller as deployed using this.
Story
As a ML engineer, I would like to know when a workflow has failed because the repository is private and the SSH credentials are missing, so that I can take the necessary steps to correct the problem.
Task
Improve the exception handling in git.py, so that users are better informed.
Create a CHANGELOG.md that describes what has changed between 0.2.* and 0.3.0, and which will be kept up-to-date from this point onwards (can we modify the CICD template to check for this?).
The tests for:
- bodywork.cli.cronjobs.display_cronjobs_in_namespace
- bodywork.cli.service_deployments.display_service_deployments_in_namespace
- bodywork.cli.secrets.display_secrets_in_namespace
are rough-and-dirty (approximate at best), relying on looking for simple strings. These could be made more precise by using regex to look for exact matches (a sketch follows).
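A minimal sketch of the idea for one of these tests, using pytest's capsys fixture (the expected output line is illustrative):

import re

def test_display_secrets_in_namespace(capsys):
    display_secrets_in_namespace(namespace="bodywork-dev")  # function under test
    stdout = capsys.readouterr().out
    # exact match on the secret's name, instead of a loose substring check
    assert re.search(r"^dev-api-password$", stdout, flags=re.MULTILINE)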
The arguments to BodyworkStageConfigError are passed in the wrong order in places, leading to log messages that make no sense.
For example, in line 161 of stage.py we have:
time_param_error = BodyworkStageConfigError(
    'MAX_STARTUP_TIME_SECONDS',
    'batch',
    name
)

Which should actually be,

time_param_error = BodyworkStageConfigError(
    name,
    'service',
    'MAX_STARTUP_TIME_SECONDS'
)
Check all uses of BodyworkStageConfigError in stage.py and fix where necessary.
Story
As a ML engineer, I would like to know how to setup and manage ingress, so that I can expose my services to the world beyond the k8s cluster.
Task
Document all the extra functionality introduced in the ingress epic.
Story
As a ML engineer, I would like to be able to configure service stages for cluster ingress, so that I can expose my services to the world beyond the k8s cluster.
Tasks
Extend the bodywork.stage.ServiceStage class to be able to parse, validate and store the following config.ini parameters:
...
[service]
CREATE_INGRESS=True
Use the Rich package to improve the rendering of Bodywork deployment information for end users.
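A minimal sketch of what this could look like, using Rich's Table and Console (columns and values are illustrative):

from rich.console import Console
from rich.table import Table

table = Table(title="Bodywork Deployments")
table.add_column("Namespace")
table.add_column("Service")
table.add_column("Replicas", justify="right")
table.add_row("ml-pipeline", "scoring-service", "2")

Console().print(table)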