tarilabs commented on May 19, 2024

Reproducer

With a PostgreSQL instance running, for example:

docker run --name some-postgres \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=mypsw \
-e PGUSER=postgres \
-e PGPASSWORD=mypsw \
-p 5432:5432 \
-d "postgres:12"

Run the following Python script:

from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2
from faker import Faker
fake = Faker()

def create_store():
    connection_config = metadata_store_pb2.ConnectionConfig()
    connection_config.postgresql.host = 'localhost'
    connection_config.postgresql.port = '5432'
    connection_config.postgresql.user = 'postgres'
    connection_config.postgresql.password = 'mypsw'
    connection_config.postgresql.dbname = 'pgserver'
    # connection_config.postgresql.skip_db_creation = False
    # connection_config.postgresql.ssloption.sslmode = '...' # disable, allow, verify-ca, verify-full, etc.
    # connection_config.postgresql.ssloption.sslcert = '...'
    # connection_config.postgresql.ssloption.sslkey = '...'
    # connection_config.postgresql.ssloption.sslpassword = '...'
    # connection_config.postgresql.ssloption.sslrootcert = '...'
    store = metadata_store.MetadataStore(connection_config, enable_upgrade_migration=True)
    return store

def create_ctx_type(store):
    experiment_type = metadata_store_pb2.ContextType()
    experiment_type.name = "Experiment"
    experiment_type.properties["note"] = metadata_store_pb2.STRING
    experiment_type_id = store.put_context_type(experiment_type)
    results = store.get_context_types()
    print(results)
    return experiment_type_id

def working(experiment_type_id):
    my_experiment = metadata_store_pb2.Context()
    my_experiment.type_id = experiment_type_id
    my_experiment.name = "exp1"
    my_experiment.properties["note"].string_value = "My first experiment."
    [experiment_id] = store.put_contexts([my_experiment])
    print(experiment_id)
    result = store.get_contexts_by_id([experiment_id])
    print(result)

def lorem(max_chars):
    result = ""
    while len(result) < max_chars:
        result += fake.sentence() + " "
    return result[:max_chars]

def not_working(experiment_type_id):
    my_experiment = metadata_store_pb2.Context()
    my_experiment.type_id = experiment_type_id
    my_experiment.name = "exp_longnote"
    my_experiment.properties["note"].string_value = lorem(2690) # up to ~2680 works
    [experiment_id] = store.put_contexts([my_experiment])
    print(experiment_id)
    result = store.get_contexts_by_id([experiment_id])
    print(result)

if __name__ == '__main__':
    store = create_store()
    experiment_type_id = create_ctx_type(store)
    working(experiment_type_id)
    not_working(experiment_type_id)

This results in the following output:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0303 16:19:42.571434 13378 postgresql_metadata_source.cc:208] Connecting to database. 
I0303 16:19:42.576880 13378 postgresql_metadata_source.cc:215] Connection to database succeed.
I0303 16:19:43.322311 13378 postgresql_metadata_source.cc:208] Connecting to database. 
I0303 16:19:43.327381 13378 postgresql_metadata_source.cc:215] Connection to database succeed.
[id: 10
name: "Experiment"
properties {
  key: "note"
  value: STRING
}
]
1
[id: 1
type_id: 10
name: "exp1"
properties {
  key: "note"
  value {
    string_value: "My first experiment."
  }
}
type: "Experiment"
create_time_since_epoch: 1709479183548
last_update_time_since_epoch: 1709479183548
]
E0303 16:19:43.555275 13378 postgresql_metadata_source.cc:128] Execution failed: ERROR:  index row size 2712 exceeds btree version 4 maximum 2704 for index "idx_context_property_string"
DETAIL:  Index row references tuple (0,2) in relation "contextproperty".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
WARNING:absl:mlmd client InternalError: PostgreSQL metadata source error: ERROR:  index row size 2712 exceeds btree version 4 maximum 2704 for index "idx_context_property_string"
DETAIL:  Index row references tuple (0,2) in relation "contextproperty".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

Traceback (most recent call last):
  File "/home/tarilabs/git/demo20231009-mlmdpg/python/test3.py", line 61, in <module>
    not_working(experiment_type_id)
  File "/home/tarilabs/git/demo20231009-mlmdpg/python/test3.py", line 52, in not_working
    [experiment_id] = store.put_contexts([my_experiment])
  File "/home/tarilabs/git/demo20231009-mlmdpg/venv/lib64/python3.10/site-packages/ml_metadata/metadata_store/metadata_store.py", line 520, in put_contexts
    self._call('PutContexts', request, response)
  File "/home/tarilabs/git/demo20231009-mlmdpg/venv/lib64/python3.10/site-packages/ml_metadata/metadata_store/metadata_store.py", line 203, in _call
    return self._call_method(method_name, request, response)
  File "/home/tarilabs/git/demo20231009-mlmdpg/venv/lib64/python3.10/site-packages/ml_metadata/metadata_store/metadata_store.py", line 224, in _call_method
    self._pywrap_cc_call(cc_method, request, response)
  File "/home/tarilabs/git/demo20231009-mlmdpg/venv/lib64/python3.10/site-packages/ml_metadata/metadata_store/metadata_store.py", line 255, in _pywrap_cc_call
    raise errors.make_exception(error_message.decode('utf-8'), status_code)
ml_metadata.errors.InternalError: PostgreSQL metadata source error: ERROR:  index row size 2712 exceeds btree version 4 maximum 2704 for index "idx_context_property_string"
DETAIL:  Index row references tuple (0,2) in relation "contextproperty".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

This suggests a possible typo, oversight, or misconfiguration of the index, which limits the number of characters that can be stored in an MLMD string property:

PostgreSQL metadata source error: ERROR: index row size 2712 exceeds btree version 4 maximum 2704 for index "idx_context_property_string"
DETAIL: Index row references tuple (0,2) in relation "contextproperty".
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
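
To illustrate the HINT above, here is a minimal sketch of why an MD5 hash of the value would sidestep the row-size limit: the digest has a fixed length no matter how long the input is. This is only an illustration of the idea PostgreSQL suggests, not MLMD's actual schema or a proposed patch; `long_note` and `md5_hex` are hypothetical names introduced for the example.

```python
import hashlib

# Hypothetical stand-in for a long MLMD string property value (3600 characters).
long_note = "lorem ipsum " * 300


def md5_hex(value: str) -> str:
    """Return the MD5 hex digest of the UTF-8 encoding of value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


short_digest = md5_hex("My first experiment.")
long_digest = md5_hex(long_note)

# The raw value grows with the note, but both digests have the same length.
print(len(long_note.encode("utf-8")))
print(len(short_digest), len(long_digest))
```

An index built on such a digest can only serve equality lookups, so whether it is a suitable replacement for `idx_context_property_string` depends on how MLMD queries string properties, which this sketch does not examine.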

from ml-metadata.
