shulietech / takin

Takin is a Java-based, open-source system for full-link performance testing in online environments, especially for microservices. With Takin, middleware and applications can distinguish real online traffic from test traffic and ensure that each reaches the right databases.

License: Apache License 2.0

performance-testing performance-analysis takin

takin's Introduction

Takin

English / 中文

What is Takin?

Takin is a Java-based, open-source system for full-link performance testing in online or test environments, especially for microservices. With Takin, middleware and applications can distinguish real online traffic from test traffic and ensure that each reaches the right databases.
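
To illustrate the idea of keeping test traffic away from business data, the sketch below marks a request as test traffic and picks a shadow database for it. This is not Takin's implementation: the header name `X-Pressure-Test`, the class `TrafficRouter`, and the `shadow_` naming rule are all hypothetical and chosen only for illustration.

```java
import java.util.Map;

/** Hypothetical sketch: routes a request to a business or a shadow database. */
final class TrafficRouter {
    // Assumed marker header; a real agent may mark test traffic differently.
    static final String TEST_HEADER = "X-Pressure-Test";
    // Assumed naming rule for the shadow database.
    static final String SHADOW_PREFIX = "shadow_";

    /** Returns true when the request headers carry the test-traffic marker. */
    static boolean isTestTraffic(Map<String, String> headers) {
        return "true".equalsIgnoreCase(headers.get(TEST_HEADER));
    }

    /** Picks the database name so that test data never lands in business tables. */
    static String resolveDatabase(String businessDb, Map<String, String> headers) {
        return isTestTraffic(headers) ? SHADOW_PREFIX + businessDb : businessDb;
    }

    public static void main(String[] args) {
        // A marked request goes to the shadow database, an unmarked one does not.
        System.out.println(resolveDatabase("orders", Map.of(TEST_HEADER, "true")));
        System.out.println(resolveDatabase("orders", Map.of()));
    }
}
```

The routing decision is kept in one pure function so that the same check could be reused by any middleware wrapper (database, cache, message queue) that needs to separate the two kinds of traffic.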

Why test performance in the online environment?

Microservices architecture is widely used today, and it tends to make systems hard for people to understand. In large systems, the business logic is also very complex. Together, business complexity and system complexity make it difficult to:

Keep the entire system highly available
Maintain research and development efficiency

To keep a system highly available, we usually run performance tests in a test environment or against a single service online. However, a test environment differs greatly from the online environment, and a single service cannot represent the whole service link. Neither approach can guarantee system performance.

Microservices Are Complex
Compared with a monolithic application, a microservices architecture adds complexity to a business system. Teams may have to maintain multiple tools and frameworks.

Business Systems Are Complex
Businesses span many different areas, and many of them involve long, complicated processes, as in e-commerce.

The Microservices Relation Is Complex
In a microservices system with many business services, the call relationships between services are very complicated. Every change may affect the availability of the entire system, which makes it hard for developers to release new versions frequently.

Quick Start Instruction

docker:

VM memory requirement: 8 GB
Docker image size: 2.1 GB

If the Docker configuration does not already use the AliYun registry mirror:

vim /etc/docker/daemon.json

Add the following configuration:

{
  "registry-mirrors": ["https://q2gr04ke.mirror.aliyuncs.com"]
}

Restart the service

systemctl daemon-reload
systemctl restart docker

Pull the Docker image

# docker url : registry.cn-hangzhou.aliyuncs.com/shulie-takin/takin:v1.0.0
docker pull registry.cn-hangzhou.aliyuncs.com/shulie-takin/takin:v1.0.1
docker run -e APPIP=<your-ip-address> -p 80:80 -p 2181:2181 -p 29900-29999:29900-29999 registry.cn-hangzhou.aliyuncs.com/shulie-takin/takin:v1.0.1

Parameters: -d runs the container in the background; -p publishes a port.
Docker startup takes about 10 minutes because the necessary components have to be installed. With -d, the component installation output stays in the background. If you don't want to publish your server's ports, you can use --net=host and make sure the container and the host server are on the same network.
Open http://APPIP/web
Note: If Nginx returns 502, the most likely cause is that the Docker container has only just started. Make sure the configuration is correct, wait 1-2 minutes, then refresh and try again.

After installation:

Use in Test Environment : Document
Use in Online Environment : Document
Video Instruction: Video

Instruction

Takin Architecture

Takin consists of Agent, Web App and Surge Data.

Agent

see Agent

Surge Data

see Takin-surge-deploy
see Takin-amdb

Takin Web

Takin Engine

see Takin-pressure-engine
see Takin-jmeter

Community

Mailing List: Mail to [email protected]
Wechat group

QQ group: **118098566**
QR code:

DingTalk group:

WeChat Official Account:

Ask Questions in Official Forum

Official Forum

Who uses Takin

License

Takin is under the Apache 2.0 license. See the LICENSE file for details.

takin's Issues

When an application is deleted, the contents of its associated tables are not deleted, so a query by name still returns the old data

How to open the web

How to open the web.
Port 80 returns 403 Forbidden.

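Compatibility plan for the SkyWalking agent

Our application already has the SkyWalking agent installed, with a performance overhead of about 10%. Although the Shulie agent's overhead can reportedly be tuned to below 5%, running two agents still raises performance and compatibility concerns, so a single agent would be best. There are several possible approaches: the Shulie agent could support the SkyWalking protocol, or Shulie's product could cover SkyWalking's core features. I would like to know the roadmap for this.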

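Open-source version, link mapping: a business activity was created successfully, but there is no "Details" button, only "Edit" and "Delete". Without a "Details" button, how can the link be viewed?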

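Discussion of code branching conventions

1. Do release branches need to be kept alive?
Release branch: a branch is created for every release and named after the version number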

When a load test task is started, the compare method in the comparator fails while sorting the listener methods

、、、
public void doEvents(Event event) {
    Map<String,ListenerContainer.Listener> map = listenerContainer.getListeners().get(event.getEventName());
    List<ListenerContainer.Listener> list = new ArrayList(map.values());
    Collections.sort(list, new Comparator<ListenerContainer.Listener>() {
        @Override
        public int compare(ListenerContainer.Listener o1, ListenerContainer.Listener o2) {
            // Reported problem: o1 is compared with itself, so this always returns -1 and never 0.
            return o1.getIntrestFor().order() > o1.getIntrestFor().order() ? 1 : -1;
        }

        @Override
        public boolean equals(Object obj) {
            return false;
        }
    });
    for (ListenerContainer.Listener entry : list) {
        try {
            entry.getMethod().invoke(entry.getObject(), event);
        } catch (IllegalAccessException e) {
            e.printStackTrace();
        } catch (InvocationTargetException e) {
            e.printStackTrace();
        }
    }
}

、、、
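
The comparator quoted above compares `o1` with itself and never returns 0, which breaks the `Comparator` contract and can make `Collections.sort` throw an `IllegalArgumentException`. A minimal sketch of a consistent ordering follows. The `Listener` class here is a simplified, hypothetical stand-in for `ListenerContainer.Listener`, and its `order` field is an assumption made for illustration, not the project's actual type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ListenerOrdering {
    /** Simplified stand-in for ListenerContainer.Listener (hypothetical shape). */
    static final class Listener {
        final String name;
        final int order;

        Listener(String name, int order) {
            this.name = name;
            this.order = order;
        }
    }

    // Compares o1 with o2 (not o1 with itself) and returns 0 for equal orders.
    static final Comparator<Listener> BY_ORDER = Comparator.comparingInt(l -> l.order);

    /** Returns a new list sorted by ascending order; the input is left untouched. */
    static List<Listener> sortByOrder(List<Listener> listeners) {
        List<Listener> sorted = new ArrayList<>(listeners);
        sorted.sort(BY_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Listener> sorted = sortByOrder(List.of(
                new Listener("c", 3), new Listener("a", 1), new Listener("b", 2)));
        for (Listener l : sorted) {
            System.out.println(l.order + " " + l.name);
        }
    }
}
```

Using `Comparator.comparingInt` avoids hand-written ternaries, so the symmetric and equal cases are handled by the library rather than by the caller.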

Does load testing support cluster-mode load generation?

Log plugin isolation

1. Performance optimization for logback isolation
2. Support for log4j isolation

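Patrol inspection improvements

1. After a technical node is added, if several upstream nodes are associated with it, there are so many lines between nodes that the dashboard display becomes very cluttered

2. Currently, when a patrol task involves relationships across several business activities, several patrol scenarios have to be configured, and the related technical nodes have to be configured repeatedly, which is tedious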

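Has "Create multi-link scenario" not been open-sourced yet

May I ask whether the "Create multi-link scenario" feature has not been open-sourced yet?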

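Some suggestions for Takin

SkyWalking uses Slack and Gitter; I think Takin could do the same to increase communication among developers and keep an archive of discussions
For code formatting, there could be a unified code style
The Shulie official website has relatively little content about the open-source project; there could be a dedicated open-source website
Produce a minimal Takin edition for testers (load-testing features only, usable for offline testing) alongside the full edition for production environments
Make amdb data processing depend only on the database
Provide agent (probe) design documentation, sample documentation for secondary plugin development, and an explanation of the plugin development approach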

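Exception log recording and notification

1. Persist log.error entries from the code to the database and send notifications, to make troubleshooting easier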

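Who is using Takin? (Individuals and companies using Takin are welcome to leave a comment here)

# Who is following and using Takin?

Thanks to the developers and users who follow and use Takin; your usage greatly encourages us. We will keep investing so that the Takin project and community thrive and give developers and users a better experience.

# Why does this issue exist?

To learn more about Takin's real usage scenarios, to inform future release planning
To attract more developers to contribute to the project

# What we hope you will provide to the Takin community

Submit a comment here that includes:

Your company, school, or organization
Your city and country
Your contact information: email or WeChat (at least one)
The business scenarios in which you use Takin

You can refer to the following example:

Company: Shulie Technology
Location: **Hangzhou
Contact: [email protected]
Use case: Used as the company's production load-testing platform, providing accurate system capacity assessments for company campaigns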

Support the two WebService middleware Apache-Axis and Apache-Cxf

Support cxf-3.2.5 and axis-1.4
Only trace tracking is done for these two middleware

Error when importing into IDEA

<dependency>
    <groupId>io.shulie.amdb</groupId>
    <artifactId>amdb-common</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>

is missing

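A few suggestions from a front-end perspective

The dependency versions used by the front end are too old and will become harder and harder to maintain; an upgrade is recommended

Some dependencies are still on versions from two or three years ago. Introducing other dependencies may cause incompatibilities, and documentation may no longer be available when fixing bugs.

Dependency	Version in use	Latest version	Notes
react	16.8.6	17.0.2	Core dependency
umi	2.13.7	3.5.20	Scaffolding
antd	3.26.13	4.16.13	UI component library
racc	0.4.5	0.4.5	Components wrapped around antd@3 by a former colleague; no longer maintained and should be considered for removal

The overall UI style is rather dated; a redesign is recommended

Repository management

The front-end repository should preferably be maintained independently, with the core repository's documentation listing the sub-projects' repository URLs. Consider using GitHub Actions to automatically merge the built front-end files with the back end, so that users can run the project right after pulling it and only need to visit the front-end repository when they want to read the source

Establish conventions, improve the documentation, and handle issues through automation wherever possible

For example, a commonly used convention for version numbers is semantic versioning
Automated packaging, automated changelog generation, and so on

Upgrading and refactoring the front end would still be a very large amount of work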

Dependency org.apache.zookeeper:zookeeper, leading to CVE problem

Hi, in Takin/takin-data/surge-data/common there is a dependency, org.apache.zookeeper:zookeeper:3.4.9, that calls the risk method.

CVE-2019-0201

The affected version range reported for this CVE is [11.0, 24.1.1-android),(24.1.1-android, 24.1.1-jre)

After further analysis, the main API called in this project is <org.apache.zookeeper.server.FinalRequestProcessor: void processRequest(org.apache.zookeeper.server.Request)>

Risk method repair link: GitHub

CVE Bug Invocation Path--

Path Length : 4

<org.apache.zookeeper.server.FinalRequestProcessor: void processRequest(org.apache.zookeeper.server.Request)>
at <org.apache.zookeeper.server.quorum.CommitProcessor: void run()> (org.apache.zookeeper.server.quorum.CommitProcessor.java:[77]) in /.m2/repository/org/apache/zookeeper/zookeeper/3.4.9/zookeeper-3.4.9.jar
at <io.shulie.surge.data.common.zk.impl.CuratorZkPathChildrenCache: void setNewData(java.util.List)> (io.shulie.surge.data.common.zk.impl.CuratorZkPathChildrenCache.java:[185, 187]) in /detect/unzip/Takin-1.0.1/takin-data/surge-data/common/target/classes
at <io.shulie.surge.data.common.zk.impl.CuratorZkPathChildrenCache: void access$1000(io.shulie.surge.data.common.zk.impl.CuratorZkPathChildrenCache,java.util.List)> (io.shulie.surge.data.common.zk.impl.CuratorZkPathChildrenCache.java:[49]) in /detect/unzip/Takin-1.0.1/takin-data/surge-data/common/target/classes

Dependency tree--

[INFO] io.shulie.surge.data:common:jar:1.0
[INFO] +- ch.qos.logback:logback-classic:jar:1.2.3:compile
[INFO] |  +- ch.qos.logback:logback-core:jar:1.2.3:compile
[INFO] |  \- org.slf4j:slf4j-api:jar:1.7.25:compile
[INFO] +- com.github.stephenc.high-scale-lib:high-scale-lib:jar:1.1.4:compile
[INFO] +- com.github.sgroschupf:zkclient:jar:0.1:compile
[INFO] +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.9:compile
[INFO] |  +- log4j:log4j:jar:1.2.16:compile
[INFO] |  +- jline:jline:jar:0.9.94:compile
[INFO] |  \- io.netty:netty:jar:3.10.5.Final:compile
[INFO] +- commons-codec:commons-codec:jar:1.6:compile
[INFO] +- com.alibaba:fastjson:jar:1.2.72:compile
[INFO] +- com.netflix.curator:curator-framework:jar:1.3.3:compile
[INFO] |  \- com.netflix.curator:curator-client:jar:1.3.3:compile
[INFO] +- com.netflix.curator:curator-recipes:jar:1.3.3:compile
[INFO] +- org.apache.commons:commons-lang3:jar:3.11:compile
[INFO] +- commons-io:commons-io:jar:1.3.2:compile
[INFO] \- com.google.guava:guava:jar:15.0:compile

Suggested solutions:

Update dependency version

Thank you very much.

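Suggested additions to the Docker installation section

Currently, in the Docker installation section, "enter the container" is only a comment with no accompanying command. Users unfamiliar with Docker may mistake the first command for the one that enters the container.

Suggestions:
1. Add the --name takin parameter to the docker run command to name the container
2. Add a command for entering the container to the documentation below (for example, docker exec -it takin sh)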

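Password redaction

Passwords are best redacted by rewriting history with git filter-branch and pushing to the remote with git push --force. Otherwise, the real passwords remain in the commit history.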

874799a

Does the agent support OpenJDK? With OpenJDK there is no tools.jar in the lib directory, so the -Xbootclasspath/a:/$JAVA_HOME/lib/tools.jar parameter was not passed, and the agent failed to start.
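
For context, tools.jar was removed from the JDK in version 9, and on earlier versions it ships with the JDK rather than the JRE. A launcher could check for the file before deciding whether to pass the option, as in the hedged sketch below. The class `ToolsJarLocator` is hypothetical, and whether the agent can run without tools.jar is not confirmed here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class ToolsJarLocator {
    /** Returns the -Xbootclasspath/a option only if <javaHome>/lib/tools.jar exists. */
    static Optional<String> bootClasspathOption(Path javaHome) {
        Path toolsJar = javaHome.resolve("lib").resolve("tools.jar");
        if (Files.isRegularFile(toolsJar)) {
            return Optional.of("-Xbootclasspath/a:" + toolsJar);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null) {
            System.out.println("JAVA_HOME is not set");
            return;
        }
        // Print the option to pass, or a hint when the JDK has no tools.jar.
        System.out.println(bootClasspathOption(Paths.get(javaHome))
                .orElse("tools.jar not found under " + javaHome));
    }
}
```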

Support tenant isolation

1. Thorough tenant isolation, covering application configuration, Redis, ZooKeeper, and so on

How to build demo quickly?

I want to build the demo quickly, but I can't download the Docker image.

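Statistical report analysis feature

1. For load test statistics and analysis, results should be based on multiple test runs; a single run should not be taken as the result
2. Statistics for page UV and PV metrics

Load test results are inaccurate after enabling method mocks

More than half of the CPU overhead is spent in Takin functions; is there room for optimization?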

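Do not block load test traffic

To make integration debugging simpler, add a new debugging mode that lets load test traffic flow through as far as possible. Information that needs to be configured should not be reported by throwing exceptions, because any exception blocks the traffic immediately and interrupts debugging. The current approach also places demands on users: for example, the database account may lack permission to create tables, or the user needs to know whether an interface can be added to the whitelist. Such information should instead be reported in the same way as link data

The database does not raise an error directly; the executed SQL statements are reported instead
The whitelist also lets traffic pass directly; the whitelist entries that were traversed are reported
MQ supports automatic creation of shadow topics through an API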

Must not modify the content in the container.

It confuses me a bit: Why don't we use --env.

We must not modify the content in the container directly!!!

What's the roadmap?

Duplicated code: the repeated assignment is unnecessary

、、、
private void notifyTaskResult(ScheduleRunRequest request) {
    SceneTaskNotifyParam notify = new SceneTaskNotifyParam();
    notify.setSceneId(request.getRequest().getSceneId());
    notify.setTaskId(request.getRequest().getTaskId());
    notify.setCustomerId(request.getRequest().getCustomerId());
    // Duplicate of the previous line; this is the repeated assignment the issue reports.
    notify.setCustomerId(request.getRequest().getCustomerId());
    notify.setStatus("started");
    sceneTaskService.taskResultNotify(notify);
}
、、、

Squash can be used to consolidate duplicate commits

For example, for duplicate commit records, consider consolidating them with Squash.

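The load test debugging feature is too weak and very hard to use

1. There is no detailed technical documentation to guide load test integration during integration debugging

2. The product is still very hard to use, for example:

How to handle errors when attaching the agent
How to seed data when debugging a load test
How to identify middleware that is not supported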

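Table schema missing for the amdb-receiver-service project

The amdb-receiver-service project depends on MySQL tables, but no table schema is provided, and the database configured in the project cannot be reached either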

Not very friendly to local debugging on Windows

Not very friendly to local debugging on Windows: LinuxHelper.executeLinuxCmd, File.separator, and so on.
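
One possible direction, assuming the helper named above shells out to a Linux command interpreter, is to pick the shell from the `os.name` system property and to join paths with `java.nio.file.Paths` instead of concatenating separators. The class `PortableShell` and its method names are hypothetical and are not part of the Takin code base.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class PortableShell {
    /** Builds the argument list that runs a command line through the platform shell. */
    static List<String> shellCommand(String osName, String commandLine) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows
                ? Arrays.asList("cmd.exe", "/c", commandLine)
                : Arrays.asList("sh", "-c", commandLine);
    }

    /** Joins path segments without hard-coding "/" or "\\". */
    static Path join(String first, String... more) {
        return Paths.get(first, more);
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = shellCommand(System.getProperty("os.name"), "echo hello");
        // inheritIO() forwards the child's output to this process's console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("exit code: " + process.waitFor());
        System.out.println(join("data", "scripts", "demo.jmx"));
    }
}
```

Passing the OS name in as a parameter keeps the decision testable on any machine, which matters when contributors develop on Windows but deploy on Linux.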

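Support mixed-scenario load testing

Different scenarios need different load settings, such as concurrency and target TPS, which requires multiple thread groups
Stay compatible with users' existing JMeter scripts to reduce the work of re-editing scripts
Reports should support statistics by thread group and by logic controller
Detect whether user files have changed and handle the change accordingly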

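Integration with development pipelines

In day-to-day environments, most uses of Takin are integrated with DevOps into the development pipeline, but there is currently no dedicated API or tutorial to help users integrate quickly with their existing pipelines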

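RabbitMQ isolation issue

The fanout exchange type is currently not fully supported, and configuration is complex. The implementation needs to change so that consumer information is obtained through the RabbitMQ admin API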

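Increase unit test coverage

Currently none of the projects has a test module, so after a code change there is no testing foundation for verifying that the change is correct or whether it affects other modules.
For external contributors, internal test collaboration, and code merges, code submitted according to the conventions must come with reasonably complete unit test cases
Unit test coverage needs to be increased gradually in subsequent releases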

Errors when running the docker command

Running the following docker command as instructed produces errors:
docker run -d -p 80:80 -p 2181:2181 -p 3306:3306 -p 6379:6379 -p 8086:8086 -p 9000:9000 -p 10032:10032 -p 6628:6628 -p 8000:8000 -p 6627:6627 -p 8888:8888 -p 29900-29999:29900-29999 registry.cn-hangzhou.aliyuncs.com/forcecop/forcecop:v1.0.0

Partial error log:
Archive: /data/apps.zip

replace apps/tro-web/trodb_web_base.sql? [y]es, [n]o, [A]ll, [N]one, [r]ename: NULL

(EOF or read error, treating as "[N]one" ...)

apps_install.sh: line 4: /usr/local/nginx/conf/nginx.conf: No such file or directory

pid not found

apps_install.sh: line 80: nginx: command not found

/usr/bin/tail: inotify cannot be used, reverting to polling: Function not implemented

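cloud can only be deployed as a single instance; multiple instances cause problems

For example:
1. Data stored in memory in several places needs to be moved to Redis
2. Deleting files when a load test ends fails because the path cannot be found
3. File uploads are currently stored locally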
...

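No results in the live analysis details

Use total TPS as the load test target

The load test target is currently set as the test's own TPS, but users typically only know what total TPS is safe for the current system. The suggestion is therefore to set the target to the application's total TPS; the actual load test TPS would then be the configured target TPS minus the current business TPS. This keeps the system's total TPS within the expected range whenever the test runs and avoids additional risk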
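
The arithmetic proposed above can be sketched as follows. The class `PressureTarget` and the method `pressureTps` are hypothetical names, and clamping the result at zero when business traffic already exceeds the target is an assumption that the issue does not state.

```java
final class PressureTarget {
    /**
     * Load to generate = total target TPS - current business TPS.
     * Assumption: never returns a negative value; if business traffic already
     * meets or exceeds the target, no extra load is generated.
     */
    static double pressureTps(double totalTargetTps, double currentBusinessTps) {
        if (totalTargetTps < 0 || currentBusinessTps < 0) {
            throw new IllegalArgumentException("TPS values must be non-negative");
        }
        return Math.max(0.0, totalTargetTps - currentBusinessTps);
    }

    public static void main(String[] args) {
        // Target of 1000 TPS in total while the business already serves 350 TPS.
        System.out.println(pressureTps(1000.0, 350.0));
    }
}
```

Because business TPS changes over time, a scheduler following this idea would need to recompute the value periodically during the test rather than once at the start.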

Travis CI

It is recommended to add Travis CI to this project, in line with other open source projects. Thanks.

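Some suggestions for Takin

1. The architecture should support SaaS: tenants, user organizations, and row-level data permissions (propagated downstream to amdb and so on)
2. Standards and specifications for data security in big data and the agent (agent collection and reporting, big data storage and analysis, identification of sensitive data, and so on)
3. Big data computation and analysis should offer more industry scenarios (for example, extracting more valuable data based on a specific industry background, technical architecture support capabilities, separation by tenant, running different analysis tasks according to configuration); license-based loading and an open plugin marketplace to attract more commercial plugin developers
4. A data verification tool for quickly validating data, because Takin's data generation and processing pipeline is long (to improve delivery and troubleshooting)
5. Get the official account running: users under a tenant can bind their accounts so that alerts and reminders are delivered through the official account

In the load test report, the total number of TraceIDs under "Request traffic details" does not match the total number of requests: 57107 requests versus 57435 TraceIDs. I also confirmed that the start time of the earliest TraceID is after the load test began, which means all 57435 TraceIDs were generated during the test. I suspect the total request count is inaccurate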

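Post-test performance analysis is hard to use and needs improvement

After a load test ends, there are still few ways to analyze system performance bottlenecks

How can database bottlenecks be found?
How should gateway performance problems be located?
The logic of the trace display is hard to understand

Currently the live load test view shows the latency of every link call, which is unnecessary (users only care about the link calls that take a long time during the test)
The link call latency details are not clear enough (mainly, the response time does not match the latencies within the link, and the latency UI does not highlight the key points)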