Comments (4)
I wasn't able to reproduce this issue. Please make sure the indexSpec is being submitted as part of the tuningConfig, as detailed on the Supervisor API page; otherwise Druid will pick the defaults.
{
  "type": "kafka",
  "spec": {
    "tuningConfig": {
      "indexSpec": {
        "bitmap": {
          "type": "roaring"
        },
        "dimensionCompression": "lz4",
        "metricCompression": "none",
        "longEncoding": "auto",
        "stringDictionaryEncoding": {
          "type": "frontCoded",
          "formatVersion": 1,
          "bucketSize": 8
        },
        "jsonCompression": "lz4"
      }
    }
  }
}
from druid.
Thanks for looking at this, @gargvishesh! My team can retest on a newer version of Druid. Separate but related: do you know how we can verify that the front coding actually took effect? In the unified console, the segment view doesn't indicate the encoding for string columns (last I checked). If we need to open up the segment to confirm frontCoded versus UTF-8, that is acceptable.
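One way to check this (a sketch, not verified against a current release): Druid ships a dump-segment tool that prints per-column metadata from a locally cached segment, and the column descriptor for a front-coded string column should reference the frontCoded encoding there. The segment-cache path below is hypothetical; adjust classpath and paths for your deployment, and note that exactly how the encoding name appears in the metadata dump may vary by version.

```shell
# Dump column metadata from a cached segment directory (path is a placeholder).
java -classpath "lib/*" -Ddruid.extensions.loadList='[]' \
  org.apache.druid.cli.Main tools dump-segment \
  --directory /var/druid/segment-cache/druid_streaming_source/<segment-id>/0/ \
  --out /tmp/segment-meta.json \
  --dump metadata

# Look for the string dictionary encoding in the dumped descriptors.
grep -i frontCoded /tmp/segment-meta.json
```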
Update: the spec below has the same features as the original spec but fewer columns (fewer dimensions, metrics, flattenSpec fields, and transforms). It appears to work correctly, so the problem shows up only with the larger spec.
Example test:
- tested against the late May release
$ echo "$spec" | jq '.spec.tuningConfig.indexSpec'
{
  "bitmap": {
    "type": "roaring"
  },
  "dimensionCompression": "lz4",
  "stringDictionaryEncoding": {
    "type": "frontCoded",
    "bucketSize": 8,
    "formatVersion": 1
  },
  "metricCompression": "none",
  "longEncoding": "auto",
  "jsonCompression": "lz4"
}
$ curl -k -XPOST -H content-type:application/json -H "Authorization: Basic $pwd" 'https://router-lb:9088/druid/indexer/v1/supervisor' -d "$spec"
{"id":"druid_streaming_source"}
$ curl -XGET -H content-type:application/json -H "Authorization: Basic $pwd" 'https://router-lb:9088/druid/indexer/v1/supervisor/druid_streaming_source' | jq '.spec.tuningConfig.indexSpec' #-d "$spec"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7228 100 7228 0 0 19790 0 --:--:-- --:--:-- --:--:-- 19802
{
  "bitmap": {
    "type": "roaring"
  },
  "dimensionCompression": "lz4",
  "stringDictionaryEncoding": {
    "type": "frontCoded",
    "bucketSize": 8,
    "formatVersion": 1
  },
  "metricCompression": "none",
  "longEncoding": "auto",
  "jsonCompression": "lz4"
}
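The round-trip check above can also be scripted. `jq -S` sorts object keys, so the submitted and returned indexSpec compare equal even if the server reorders fields; the two inline JSON strings below are stand-ins for the real spec file and supervisor response (host and auth reused from the commands above):

```shell
# Stand-ins for the submitted spec fragment and the fragment the overlord returns;
# note the field order differs but the content is identical.
submitted='{"stringDictionaryEncoding":{"type":"frontCoded","bucketSize":8,"formatVersion":1}}'
returned='{"stringDictionaryEncoding":{"formatVersion":1,"bucketSize":8,"type":"frontCoded"}}'

# jq -cS normalizes key order, so an order-insensitive comparison is a string compare.
if [ "$(echo "$submitted" | jq -cS .)" = "$(echo "$returned" | jq -cS .)" ]; then
  echo "indexSpec round-tripped intact"
fi

# Against a live cluster, the same idea:
#   diff <(echo "$spec" | jq -S '.spec.tuningConfig.indexSpec') \
#        <(curl -s -H "Authorization: Basic $pwd" \
#            'https://router-lb:9088/druid/indexer/v1/supervisor/druid_streaming_source' \
#          | jq -S '.spec.tuningConfig.indexSpec')
```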
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "druid_streaming_source",
      "timestampSpec": {
        "column": "ts_column",
        "format": "millis",
        "missingValue": null
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "type": "string",
            "name": "concat_dimension",
            "multiValueHandling": "SORTED_ARRAY",
            "createBitmapIndex": true
          },
          {
            "type": "string",
            "name": "primitive_column_a",
            "multiValueHandling": "SORTED_ARRAY",
            "createBitmapIndex": true
          }
        ],
        "dimensionExclusions": [
          "__time",
          "ts_column",
          "event_count",
          "sketch_column_a",
          "max_column_a"
        ],
        "includeAllDimensions": false,
        "useSchemaDiscovery": false
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "event_count"
        },
        {
          "type": "quantilesDoublesSketch",
          "name": "sketch_column_a",
          "fieldName": "numeric_source_column_a",
          "k": 128,
          "maxStreamLength": 1000000000
        },
        {
          "type": "longMax",
          "name": "max_column_a",
          "fieldName": "numeric_source_column_a"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": {
          "type": "duration",
          "duration": 300000,
          "origin": "1970-01-01T00:00:00.000Z"
        },
        "rollup": true,
        "intervals": []
      },
      "transformSpec": {
        "filter": {
          "type": "and",
          "fields": [
            {
              "type": "in",
              "dimension": "filter_dim_a",
              "values": [
                "value-1",
                "value-2"
              ]
            },
            {
              "type": "or",
              "fields": [
                {
                  "type": "selector",
                  "dimension": "filter_dim_b",
                  "value": "value-3"
                },
                {
                  "type": "selector",
                  "dimension": "filter_dim_c",
                  "value": "value-4"
                }
              ]
            }
          ]
        },
        "transforms": [
          {
            "type": "expression",
            "name": "concat_dimension",
            "expression": "concat(\"field_a\", '_', \"field_b\")"
          }
        ]
      }
    },
    "ioConfig": {
      "topic": "kafka_topic_a",
      "topicPattern": null,
      "inputFormat": {
        "type": "avro_stream",
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": [
            {
              "type": "path",
              "name": "primitive_column_a",
              "expr": "$.primitive_column_a",
              "nodes": null
            }
          ]
        },
        "avroBytesDecoder": {
          "type": "schema_registry",
          "url": "https://schema-registry",
          "capacity": 2147483647,
          "urls": null,
          "config": null,
          "headers": null
        },
        "binaryAsString": false,
        "extractUnionsByType": false
      },
      "replicas": 1,
      "taskCount": 2,
      "taskDuration": "PT3600S",
      "consumerProperties": {
        "bootstrap.servers": "kafka-1,kafka-2,kafka-3"
      },
      "autoScalerConfig": null,
      "pollTimeout": 100,
      "startDelay": "PT5S",
      "period": "PT30S",
      "useEarliestOffset": false,
      "completionTimeout": "PT1800S",
      "lateMessageRejectionPeriod": null,
      "earlyMessageRejectionPeriod": null,
      "lateMessageRejectionStartDateTime": null,
      "configOverrides": null,
      "idleConfig": null,
      "stopTaskCount": null,
      "stream": "kafka_topic_a",
      "useEarliestSequenceNumber": false
    },
    "tuningConfig": {
      "type": "kafka",
      "appendableIndexSpec": {
        "type": "onheap",
        "preserveExistingMetrics": false
      },
      "maxRowsInMemory": 60000,
      "maxBytesInMemory": -1,
      "skipBytesInMemoryOverheadCheck": false,
      "maxRowsPerSegment": 2000000,
      "maxTotalRows": null,
      "intermediatePersistPeriod": "PT1M",
      "maxPendingPersists": 0,
      "indexSpec": {
        "bitmap": {
          "type": "roaring"
        },
        "dimensionCompression": "lz4",
        "stringDictionaryEncoding": {
          "type": "frontCoded",
          "bucketSize": 8,
          "formatVersion": 1
        },
        "metricCompression": "none",
        "longEncoding": "auto",
        "jsonCompression": "lz4"
      },
      "indexSpecForIntermediatePersists": null,
      "reportParseExceptions": false,
      "handoffConditionTimeout": 900000,
      "resetOffsetAutomatically": false,
      "segmentWriteOutMediumFactory": null,
      "workerThreads": null,
      "chatRetries": 8,
      "httpTimeout": "PT10S",
      "shutdownTimeout": "PT80S",
      "offsetFetchPeriod": "PT30S",
      "intermediateHandoffPeriod": "P2147483647D",
      "logParseExceptions": false,
      "maxParseExceptions": 2147483647,
      "maxSavedParseExceptions": 0,
      "numPersistThreads": 1,
      "skipSequenceNumberAvailabilityCheck": false,
      "repartitionTransitionDuration": "PT120S"
    }
  },
  "context": {
    "taskLockType": "APPEND",
    "useSharedLock": true
  },
  "suspended": false
}
This was retested using the late May release. The new Kafka data sources had no problem accepting frontCoded version 1. I also tried submitting frontCoded version 0, pushing the segments, letting them propagate, and then resubmitting as version 1, which was also accepted -- good (no apparent backwards-compatibility logic rewriting version 1 to version 0).
It looks like submitting a spec that already includes the extra properties populated by the system (i.e., the defaults) retains the frontCoded formatVersion, but submitting a supervisor spec that the system still needs to modify results in version 1 being changed to version 0.
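If that's the case, a possible workaround (untested sketch) is to take the fully-populated spec the overlord returns, force formatVersion back to 1, and resubmit that, so the defaults-filling step has nothing left to rewrite. The inline JSON below is a stand-in, trimmed to the relevant fragment:

```shell
# Stand-in for the fully-populated spec returned by
# GET /druid/indexer/v1/supervisor/druid_streaming_source (trimmed to the relevant part),
# where formatVersion has already been downgraded to 0.
populated='{"spec":{"tuningConfig":{"indexSpec":{"stringDictionaryEncoding":{"type":"frontCoded","bucketSize":8,"formatVersion":0}}}}}'

# Force the version back to 1 before resubmitting.
patched=$(echo "$populated" | jq '.spec.tuningConfig.indexSpec.stringDictionaryEncoding.formatVersion = 1')
echo "$patched" | jq '.spec.tuningConfig.indexSpec.stringDictionaryEncoding.formatVersion'  # prints 1

# Resubmit the patched, fully-populated spec:
#   echo "$patched" | curl -XPOST -H content-type:application/json \
#     -H "Authorization: Basic $pwd" \
#     'https://router-lb:9088/druid/indexer/v1/supervisor' -d @-
```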