Comments (2)
16:36:34 [Model Analyzer] DEBUG:
{'always_report_gpu_metrics': False,
'batch_sizes': [1],
'bls_composing_models': [],
'checkpoint_directory': '/app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/checkpoints',
'client_max_retries': 50,
'client_protocol': 'grpc',
'collect_cpu_metrics': False,
'concurrency': [],
'config_file': 'config.yaml',
'constraints': {},
'cpu_only_composing_models': [],
'duration_seconds': 3,
'early_exit_enable': False,
'export_path': './profile_results_reranker1',
'filename_model_gpu': 'metrics-model-gpu.csv',
'filename_model_inference': 'metrics-model-inference.csv',
'filename_server_only': 'metrics-server-only.csv',
'genai_perf_flags': {},
'gpu_output_fields': ['model_name',
'gpu_uuid',
'batch_size',
'concurrency',
'model_config_path',
'instance_group',
'satisfies_constraints',
'gpu_used_memory',
'gpu_utilization',
'gpu_power_usage'],
'gpus': ['all'],
'inference_output_fields': ['model_name',
'batch_size',
'concurrency',
'model_config_path',
'instance_group',
'max_batch_size',
'satisfies_constraints',
'perf_throughput',
'perf_latency_p99'],
'latency_budget': None,
'min_throughput': None,
'model_repository': '/app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/reranker',
'model_type': 'generic',
'monitoring_interval': 1.0,
'num_configs_per_model': 2,
'num_top_model_configs': 0,
'objectives': {'perf_throughput': 10},
'output_model_repository_path': './rerenker_output1',
'override_output_model_repository': True,
'perf_analyzer_cpu_util': 5120.0,
'perf_analyzer_flags': {},
'perf_analyzer_max_auto_adjusts': 10,
'perf_analyzer_path': 'perf_analyzer',
'perf_analyzer_timeout': 600,
'perf_output': False,
'perf_output_path': None,
'plots': [{'name': 'throughput_v_latency', 'title': 'Throughput vs. Latency', 'x_axis': 'perf_latency_p99', 'y_axis': 'perf_throughput', 'monotonic': True},
{'name': 'gpu_mem_v_latency', 'title': 'GPU Memory vs. Latency', 'x_axis': 'perf_latency_p99', 'y_axis': 'gpu_used_memory', 'monotonic': False}],
'profile_models': [{'model_name': 'bge_reranker_v2_onnx', 'cpu_only': False, 'objectives': {'perf_throughput': 10}, 'parameters': {'batch_sizes': [1], 'concurrency': [], 'request_rate': []}, 'weighting': 1},
{'model_name': 'reranker', 'cpu_only': False, 'objectives': {'perf_throughput': 10}, 'parameters': {'batch_sizes': [1], 'concurrency': [], 'request_rate': []}, 'weighting': 1}],
'reload_model_disable': False,
'request_rate': [],
'request_rate_search_enable': False,
'run_config_profile_models_concurrently_enable': True,
'run_config_search_disable': False,
'run_config_search_max_binary_search_steps': 5,
'run_config_search_max_concurrency': 2,
'run_config_search_max_instance_count': 4,
'run_config_search_max_model_batch_size': 4,
'run_config_search_max_request_rate': 8192,
'run_config_search_min_concurrency': 1,
'run_config_search_min_instance_count': 1,
'run_config_search_min_model_batch_size': 1,
'run_config_search_min_request_rate': 16,
'run_config_search_mode': 'quick',
'server_output_fields': ['model_name',
'gpu_uuid',
'gpu_used_memory',
'gpu_utilization',
'gpu_power_usage'],
'skip_detailed_reports': False,
'skip_summary_reports': False,
'triton_docker_args': {},
'triton_docker_image': 'nvcr.io/nvidia/tritonserver:24.04-py3',
'triton_docker_labels': {},
'triton_docker_mounts': [],
'triton_docker_shm_size': None,
'triton_grpc_endpoint': 'localhost:8001',
'triton_http_endpoint': 'localhost:8000',
'triton_install_path': '/opt/tritonserver',
'triton_launch_mode': 'local',
'triton_metrics_url': 'http://localhost:8002/metrics',
'triton_output_path': None,
'triton_server_environment': {},
'triton_server_flags': {},
'triton_server_path': 'tritonserver',
'weighting': None}
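
For reference, the resolved options dumped above would correspond to a `config.yaml` roughly like the sketch below. This is reconstructed from the dump only; the actual file isn't shown in the log, so treat the exact layout as an assumption:

```yaml
# Sketch of the Model Analyzer config implied by the resolved options above.
model_repository: /app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/reranker
checkpoint_directory: /app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/checkpoints
export_path: ./profile_results_reranker1
output_model_repository_path: ./rerenker_output1
override_output_model_repository: true
triton_launch_mode: local
run_config_search_mode: quick
run_config_profile_models_concurrently_enable: true
duration_seconds: 3
num_configs_per_model: 2
profile_models:
  - bge_reranker_v2_onnx
  - reranker
```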
16:36:34 [Model Analyzer] Initializing GPUDevice handles
16:36:35 [Model Analyzer] Using GPU 0 Tesla V100-SXM2-32GB with UUID GPU-c898354c-1e75-3b40-3c84-2a272ee206c2
16:36:36 [Model Analyzer] WARNING: Overriding the output model repo path "./rerenker_output1"
16:36:36 [Model Analyzer] Starting a local Triton Server
16:36:36 [Model Analyzer] No checkpoint file found, starting a fresh run.
16:36:36 [Model Analyzer] Profiling server only metrics...
16:36:36 [Model Analyzer] DEBUG: Triton Server started.
16:36:46 [Model Analyzer] DEBUG: Stopped Triton Server.
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Starting quick mode search to find optimal configs
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Creating model config: bge_reranker_v2_onnx_config_default
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Creating model config: reranker_config_default
16:36:46 [Model Analyzer]
16:36:58 [Model Analyzer] DEBUG: Triton Server started.
16:37:07 [Model Analyzer] DEBUG: Model bge_reranker_v2_onnx_config_default loaded
16:37:22 [Model Analyzer] DEBUG: Model reranker_config_default loaded
16:37:22 [Model Analyzer] Profiling bge_reranker_v2_onnx_config_default: client batch size=1, concurrency=8
16:37:22 [Model Analyzer] Profiling reranker_config_default: client batch size=1, concurrency=16
16:37:22 [Model Analyzer]
16:37:22 [Model Analyzer] DEBUG: Running ['mpiexec', '--allow-run-as-root', '--tag-output', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'bge_reranker_v2_onnx', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'bge_reranker_v2_onnx-results.csv', '--verbose-csv', '--concurrency-range', '8', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000', ':', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'reranker', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'reranker-results.csv', '--verbose-csv', '--concurrency-range', '16', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000']
16:37:26 [Model Analyzer] Running perf_analyzer failed with exit status 99:
[1,1]:*** Measurement Settings ***
[1,1]: Batch size: 1
[1,1]: Service Kind: Triton
[1,1]: Using "count_windows" mode for stabilization
[1,1]: Minimum number of samples in each window: 50
[1,1]: Using synchronous calls for inference
[1,1]: Stabilizing using average latency
[1,1]:
[1,0]:*** Measurement Settings ***
[1,0]: Batch size: 1
[1,0]: Service Kind: Triton
[1,0]: Using "count_windows" mode for stabilization
[1,0]: Minimum number of samples in each window: 50
[1,0]: Using synchronous calls for inference
[1,0]: Stabilizing using average latency
[1,0]:
[1,0]:Request concurrency: 8
[1,1]:Request concurrency: 16
[1,1]:Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
[1,1]:Thread [0] had error: Failed to process the request(s) for model instance 'reranker_0_4', mes
16:37:26 [Model Analyzer] DEBUG: Measurement for [0, 0, 0, 0]: None.
16:37:26 [Model Analyzer] Saved checkpoint to /app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/checkpoints/0.ckpt
16:37:26 [Model Analyzer] Creating model config: bge_reranker_v2_onnx_config_0
16:37:26 [Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
16:37:26 [Model Analyzer] Setting max_batch_size to 1
16:37:26 [Model Analyzer] Enabling dynamic_batching
16:37:26 [Model Analyzer]
16:37:26 [Model Analyzer] Creating model config: reranker_config_0
16:37:26 [Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
16:37:26 [Model Analyzer] Setting max_batch_size to 1
16:37:26 [Model Analyzer] Enabling dynamic_batching
16:37:26 [Model Analyzer]
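
For context, the `reranker_config_0` variant described just above corresponds to a generated `config.pbtxt` along these lines. This is a sketch based only on the three settings logged; the name, backend, and input/output fields of the original model config are omitted:

```protobuf
# Sketch of the config.pbtxt changes Model Analyzer reports making
# for reranker_config_0 (only the logged fields are shown).
max_batch_size: 1
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
dynamic_batching { }
```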
16:37:31 [Model Analyzer] DEBUG: Stopped Triton Server.
16:37:31 [Model Analyzer] DEBUG: Triton Server started.
16:37:34 [Model Analyzer] DEBUG: Model bge_reranker_v2_onnx_config_0 loaded
16:37:47 [Model Analyzer] DEBUG: Model reranker_config_0 loaded
16:37:47 [Model Analyzer] Profiling bge_reranker_v2_onnx_config_0: client batch size=1, concurrency=2
16:37:47 [Model Analyzer] Profiling reranker_config_0: client batch size=1, concurrency=2
16:37:47 [Model Analyzer]
16:37:47 [Model Analyzer] DEBUG: Running ['mpiexec', '--allow-run-as-root', '--tag-output', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'bge_reranker_v2_onnx', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'bge_reranker_v2_onnx-results.csv', '--verbose-csv', '--concurrency-range', '2', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000', ':', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'reranker', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'reranker-results.csv', '--verbose-csv', '--concurrency-range', '2', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000']
16:37:51 [Model Analyzer] Running perf_analyzer failed with exit status 99:
[1,0]:*** Measurement Settings ***
[1,0]: Batch size: 1
[1,0]: Service Kind: Triton
[1,0]: Using "count_windows" mode for stabilization
[1,0]: Minimum number of samples in each window: 50
[1,0]: Using synchronous calls for inference
[1,0]: Stabilizing using average latency
[1,0]:
[1,1]:*** Measurement Settings ***
[1,1]: Batch size: 1
[1,1]: Service Kind: Triton
[1,1]: Using "count_windows" mode for stabilization
[1,1]: Minimum number of samples in each window: 50
[1,1]: Using synchronous calls for inference
[1,1]: Stabilizing using average latency
[1,1]:
[1,0]:Request concurrency: 2
[1,1]:Request concurrency: 2
[1,1]:Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
[1,1]:Thread [0] had error: [request id: <id_unknown>] Exceeds maximum queue size
[1,1]:
[1,
16:37:51 [Model Analyzer] No changes made to analyzer data, no checkpoint saved.
16:37:56 [Model Analyzer] DEBUG: Stopped Triton Server.
Traceback (most recent call last):
  File "/opt/app_venv/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/entrypoint.py", line 278, in main
    analyzer.profile(
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 124, in profile
    self._profile_models()
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 233, in _profile_models
    self._model_manager.run_models(models=models)
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 145, in run_models
    self._stop_ma_if_no_valid_measurement_threshold_reached()
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 239, in _stop_ma_if_no_valid_measurement_threshold_reached
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The first 2 attempts to acquire measurements have failed. Please examine the Tritonserver/PA error logs to determine what has gone wrong.
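
One observation on the failure above: the second attempt died with "Exceeds maximum queue size" in the perf_analyzer output, which Triton returns when a request arrives while the model's pending queue is already at its configured limit. If the `reranker` model (or one of its composing models) sets a bounded queue, relaxing the queue policy in its `config.pbtxt` is one thing worth trying; whether that applies to this particular setup is an assumption:

```protobuf
# Hedged sketch: relax the pending-queue bound that produces
# "Exceeds maximum queue size". In Triton's queue policy, 0 means unlimited.
dynamic_batching {
  default_queue_policy {
    max_queue_size: 0
  }
}
```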
@riyajatar37003
Can you please provide the details of the bug using our bug report template here: https://github.com/triton-inference-server/server/issues/new/choose