
model_search's Issues

RuntimeError: Cannot connect sqlite3 database: unable to open database file

To get started, I just ran the example code. Unfortunately I get a RuntimeError. Did anyone get the same error, or does anyone have an idea how to solve it?
Thanks in advance!

============================CODE OUTPUT========================================

2021-04-15 08:37:31.183727: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-04-15 08:37:31.184042: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "demo.py", line 55, in <module>
    experiment_owner="model_search_user")
  File "C:\...\model_search\model_search\single_trainer.py", line 65, in try_models
    metadata=None)
  File "C:\...t\model_search\model_search\phoenix.py", line 240, in __init__
    study_owner)
  File "C:\...\model_search\model_search\metadata\ml_metadata_db.py", line 105, in __init__
    self._store = metadata_store.MetadataStore(self._connection_config)
  File "C:\...7\AppData\Local\Programs\Python\Python37\lib\site-packages\ml_metadata\metadata_store\metadata_store.py", line 92, in __init__
    config.SerializeToString(), migration_options.SerializeToString())
RuntimeError: Cannot connect sqlite3 database: unable to open database file

EDIT1: Formatting
EDIT2: Changed the venv from Python 3.7 to Python 3.8, but that didn't solve the problem.
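One hedged lead: ml_metadata_db.py defaults the MLMD sqlite file to /tmp/filedb-<N> (see the "sqlite default filename is not Windows compatible" issue below), and /tmp is not a valid path on Windows, which would produce exactly this "unable to open database file" error. Overriding --mlmd_default_sqllite_filename with a writable path (sketch under that issue) may help.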

Some questions

  1. How should I preprocess the dataset? In all of the examples the features are integers; I tried floats and got an error.
  2. Is it possible to specify the target metric?
  3. How to deploy the trained model? Are there any docs?

Can't find module: ModuleNotFoundError: No module named 'ml_metadata'

Hello,

Well... I can't find the ml_metadata module or package anywhere.
The weird thing is that I didn't get this error on another PC (it may have a different Python version and environment).
Anyway, where does the ml_metadata module come from? Is this a known problem?
Please advise as necessary.

=================== Error message ===================================================

(modelsch) PS D:\PROGRAMMING\model_search> python ms.py
2021-02-27 07:05:53.471047: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Traceback (most recent call last):
  File "ms.py", line 3, in <module>
    from model_search import single_trainer
  File "D:\PROGRAMMING\model_search\model_search\single_trainer.py", line 17, in <module>
    from model_search import oss_trainer_lib
  File "D:\PROGRAMMING\model_search\model_search\oss_trainer_lib.py", line 29, in <module>
    from model_search import phoenix
  File "D:\PROGRAMMING\model_search\model_search\phoenix.py", line 33, in <module>
    from model_search.metadata import ml_metadata_db
  File "D:\PROGRAMMING\model_search\model_search\metadata\ml_metadata_db.py", line 24, in <module>
    from ml_metadata import metadata_store
ModuleNotFoundError: No module named 'ml_metadata'
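A hedged note: ml_metadata is a separate Google package (ML Metadata, used by TFX) distributed on PyPI, so installing it into the active environment with pip install ml-metadata is the usual fix; an environment that already had it installed would explain why the other PC didn't hit the error.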

NewRandomAccessFile failed to Create/Open

NotFoundError: NewRandomAccessFile failed to Create/Open: model_search/model_search/configs/dnn_config.pbtxt : The system cannot find the path specified. ; No such process

It seems like the path segment "model_search" was duplicated... Any idea how to solve it? Thanks in advance!
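A hedged workaround sketch, since single_trainer.py parses the spec from a file path (see the text_format.Parse call in the UnicodeDecodeError traceback below): build absolute paths from the clone location instead of relying on the working directory. REPO_ROOT is a hypothetical placeholder.

import os

from model_search import single_trainer
from model_search.data import csv_data

REPO_ROOT = "/path/to/your/clone"  # hypothetical: the cloned model_search repo
trainer = single_trainer.SingleTrainer(
    data=csv_data.Provider(
        label_index=0,
        logits_dimension=2,
        record_defaults=[0, 0, 0, 0],
        filename=os.path.join(
            REPO_ROOT, "model_search/data/testdata/csv_random_data.csv")),
    spec=os.path.join(REPO_ROOT, "model_search/configs/dnn_config.pbtxt"))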

UnparsedFlagAccessError: Trying to access flag --phoenix_master before flags were parsed.

I'm trying to run Model Search, but get back this error:

UnparsedFlagAccessError: Trying to access flag --phoenix_master before flags were parsed.

ERROR:absl:Flags are not parsed. Using default in file mlmd database. Please run main with absl.app.run(main) to fix this. If running in distributed mode, this means that the trainers are not sharing information between one another.
---------------------------------------------------------------------------
UnparsedFlagAccessError                   Traceback (most recent call last)
<ipython-input-10-93cbf25f70e3> in <module>
     19     batch_size=32,
     20     experiment_name="root",
---> 21     experiment_owner="root")
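For reference, a minimal sketch of the fix the error text itself suggests: move the demo code into a main() and hand it to absl.app.run so that flags are parsed before the trainer touches them.

from absl import app

from model_search import constants
from model_search import single_trainer
from model_search.data import csv_data


def main(argv):
    del argv  # unused
    trainer = single_trainer.SingleTrainer(
        data=csv_data.Provider(
            label_index=0,
            logits_dimension=2,
            record_defaults=[0, 0, 0, 0],
            filename="model_search/data/testdata/csv_random_data.csv"),
        spec=constants.DEFAULT_DNN)
    trainer.try_models(
        number_models=200,
        train_steps=1000,
        eval_steps=100,
        root_dir="/tmp/run_example",
        batch_size=32,
        experiment_name="example",
        experiment_owner="model_search_user")


if __name__ == "__main__":
    app.run(main)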

Inputs to the model

I have a problem with understanding the inputs to the model. I ran my experiments on a dataset with 2496 columns, using a csv file. I provided label_index and record_defaults via a list.

The parameters that I set in single_trainer.SingleTrainer (per the readme) were label_index, logits_dimension, record_defaults, filename, and spec.

After the experiments were done, I started looking at the graphs to understand how to use them in a pipeline where Keras models are used (I wanted to wrap a selected graph with a Keras Lambda layer and use it for inference).

When looking at the graph I see

# Imports needed to run this snippet (TF1-style graph inspection):
import tensorflow.compat.v1 as tf
from tensorflow.core.framework import graph_pb2 as gpb
from google.protobuf import text_format as pbtf

gdef = gpb.GraphDef()
with open('/Users/tomasz.p/Desktop/GOOGLE_SEARCH_RESULTS/r524xlarge-2/RESULTS/1/graph.pbtxt', 'r') as fh:
    graph_str = fh.read()
pbtf.Parse(graph_str, gdef)
tf.import_graph_def(gdef)
for op in tf.get_default_graph().get_operations():
    print(str(op.name))

import/record_defaults_0
import/record_defaults_1
import/record_defaults_2
import/record_defaults_3
import/record_defaults_4
...
up to 2496 as it should be. I also see:

import/Phoenix/search_generator_0/Input/input_layer/1_1/ExpandDims/dim
import/Phoenix/search_generator_0/Input/input_layer/1_1/ExpandDims
import/Phoenix/search_generator_0/Input/input_layer/1_1/Shape
import/Phoenix/search_generator_0/Input/input_layer/1_1/strided_slice/stack
import/Phoenix/search_generator_0/Input/input_layer/1_1/strided_slice/stack_1
import/Phoenix/search_generator_0/Input/input_layer/1_1/strided_slice/stack_2
import/Phoenix/search_generator_0/Input/input_layer/1_1/strided_slice
import/Phoenix/search_generator_0/Input/input_layer/1_1/Reshape/shape/1
import/Phoenix/search_generator_0/Input/input_layer/1_1/Reshape/shape
import/Phoenix/search_generator_0/Input/input_layer/1_1/Reshape

for all 2496 inputs.

but I only see:

input_1
input_2
input_3
input_4
...
input_21

21 inputs instead of 2496. Could you please help me understand this situation?

My final goal is to do something like this:

import tensorflow as tf
#import tensorflow.compat.v1 as tf
#To make tf 2.0 compatible with tf1.0 code, we disable the tf2.0 functionalities
#tf.disable_eager_execution()
debug = False
if debug:
    tf.autograph.set_verbosity(3, True)
else:
    tf.autograph.set_verbosity(0, True)
from tensorflow.core.framework.graph_pb2 import GraphDef
from google.protobuf import text_format as pbtf
import numpy as np
@tf.autograph.experimental.do_not_convert
@tf.function
def my_model(x):
    #tf.get_default_graph()
    #tensor_input = ['inputs_'+str(i+1) for i in range(2496)]
    tensor_input = [str(i+1) for i in range(2496)]
    tensor_input_sample = [x[:,i] for i in range(2496)]
    dict_input = {tensor_input[i]: tensor_input_sample[i] for i in range(len(tensor_input))}
    input_map_ = dict_input
    y, z = tf.graph_util.import_graph_def(
        gd, name='', input_map=input_map_, return_elements=['Phoenix/Trainer/ArgMax:0', 'Phoenix/Trainer/Softmax:0'])
    return [y, z]
x = tf.keras.Input(shape=2496)
print(x)
gd = GraphDef()
print("open tf graph")
with open('/Users/tomasz.p/Desktop/GOOGLE_SEARCH_RESULTS/r524xlarge-2/RESULTS/11/graph.pbtxt', 'rb') as f:
    print("read file")
    graph_str = f.read()
print("parse")
pbtf.Parse(graph_str, gd)
print("import graph")
tf.import_graph_def(gd)
y, z = tf.keras.layers.Lambda(my_model)(x)
model = tf.keras.Model(x, [y, z])
model.summary()
y_out, z_out = model.predict(np.ones((5, 2496), dtype=np.float32))
print(y_out.shape, z_out.shape)
print(y_out, z_out)

but unfortunately I do not understand the inputs at this point.
Thank you for any help!

How to predict from model_search saved_model.pb

Hi,
I tried the simple classification dataset "German credit data" to make predictions.
I loaded the model_search saved_model.pb model:
importedModel = tf.saved_model.load(saved_model_dir)

But when I try to predict or call summary() on the model, I get the following error:

AttributeError: 'AutoTrackable' object has no attribute 'summary'

When I print gs_model.signatures, I get "['serving_default']".

How can I predict, and get metrics like f1_score, accuracy, R2, etc., from the model?

Is it possible to store the model_search model as a TensorFlow v2 compatible model in oss_trainer_lib.py instead of an "estimator" model?

Please suggest on this.
I am using tensorflow 2.4.0.
How can I make the model work like a Keras model?

Thanks,
SJRam
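A hedged sketch of inference through the signature instead of a Keras API (a loaded SavedModel is an AutoTrackable, so summary() does not exist on it): call the 'serving_default' signature reported above, and compute metrics such as f1_score, accuracy, or R2 yourself from its outputs.

import tensorflow as tf

imported = tf.saved_model.load(saved_model_dir)  # saved_model_dir as above
infer = imported.signatures["serving_default"]
print(infer.structured_input_signature)  # expected input names/dtypes/shapes
print(infer.structured_outputs)          # output tensor names
# Then call the signature with keyword arguments matching the input names:
# outputs = infer(<input_name>=tf.constant(...))
# and feed outputs/labels into e.g. sklearn.metrics for f1/accuracy/R2.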

Cannot connect sqlite3 database

Hi,

I cannot connect to the database while running the "Getting started" example.
How can I fix this? Thank you.

Here is the full trace:

Traceback (most recent call last):
  File "C:\Users\e173196\Anaconda projects\model_search\load_flags.py", line 32, in <module>
    trainer.try_models(
  File "C:\Users\e173196\Anaconda projects\model_search\model_search\single_trainer.py", line 57, in try_models
    phoenix_instance = phoenix.Phoenix(
  File "C:\Users\e173196\Anaconda projects\model_search\model_search\phoenix.py", line 239, in __init__
    self._metadata = ml_metadata_db.MLMetaData(phoenix_spec, study_name,
  File "C:\Users\e173196\Anaconda projects\model_search\model_search\metadata\ml_metadata_db.py", line 100, in __init__
    self._store = metadata_store.MetadataStore(self._connection_config)
  File "C:\Users\e173196\Anaconda3\envs\Model_search\lib\site-packages\ml_metadata\metadata_store\metadata_store.py", line 91, in __init__
    self._metadata_store = metadata_store_serialized.CreateMetadataStore(
RuntimeError: Cannot connect sqlite3 database: unable to open database file

Evaluating Model Search Results

As a test, I just ran the supplied code with all of the default parameters for the "Getting Started" example, to make sure I had installed everything correctly and that the code would run smoothly. The test has now finished, and I saw that the model search put all the outputs in a 'tmp\run_example' directory, with each model getting its own folder numbered from 1 to 200. I would now like to evaluate the results of this run, but I have no idea where to start. I would like to see things like the optimal model architecture, the architectures evaluated, the architecture with the highest ratio of accuracy to number of parameters, etc., but I am lost as to how to go about doing that. Would anyone be able to offer any guidance in that area?

I tried loading in a model, taking inspiration from a tutorial on TensorFlow, but ultimately I got an error after running the code below:

M = tf.keras.models.load_model("C:\\tmp\\run_example\\tuner-1\\200\\saved_model\\1616871647")
M.summary()

Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'AutoTrackable' object has no attribute 'summary'

Does anyone have any ideas how to get information about the model architectures from these runs?
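A hedged starting point rather than official guidance: the replay_config.pbtxt written into each trial directory (mentioned in the inference issue further down) appears to be a text proto recording what that trial trained, and graph.pbtxt can be walked op by op as in the "Inputs to the model" issue above. For example:

# A minimal sketch, assuming the run layout described above.
trial_dir = "/tmp/run_example/tuner-1/1"
with open(trial_dir + "/replay_config.pbtxt") as f:
    print(f.read())  # text proto describing the trial's architecture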

FailedPreconditionError: No completed trials to perform ensembling.

I'm using the Colab example https://colab.research.google.com/drive/1k1EaKDCTB2fU9XtIdiiXEDyDOvrU6fmD?usp=sharing

Using my own data (composed of two attributes, both of them classes, for regression), I got the following error:


I0313 17:21:15.287168 140503459874688 metadata_store.py:93] MetadataStore with DB connection initialized
I0313 17:21:15.307923 140503459874688 oss_trainer_lib.py:290] creating directory: /tmp/run_example/tuner-1/5
I0313 17:21:15.309459 140503459874688 oss_trainer_lib.py:337] Tuner id: tuner-1
I0313 17:21:15.310938 140503459874688 oss_trainer_lib.py:338] Training with the following hyperparameters: 
I0313 17:21:15.312628 140503459874688 oss_trainer_lib.py:339] {'learning_rate': 1.0536239272270075e-05, 'new_block_type': 'FULLY_CONNECTED_RESIDUAL_FORCE_MATCH_SHAPES', 'optimizer': 'adam', 'initial_architecture_0': 'FULLY_CONNECTED_RESIDUAL_CONCAT', 'exponential_decay_rate': 0.8225550864907855, 'exponential_decay_steps': 250, 'gradient_max_norm': 2, 'dropout_rate': 0.20000000596046447, 'initial_architecture': ['FULLY_CONNECTED_RESIDUAL_CONCAT']}
I0313 17:21:15.314630 140503459874688 run_config.py:550] TF_CONFIG environment variable: {'model_dir': '/tmp/run_example/tuner-1/5', 'session_master': ''}
I0313 17:21:15.316400 140503459874688 run_config.py:973] Using model_dir in TF_CONFIG: /tmp/run_example/tuner-1/5
I0313 17:21:15.318898 140503459874688 estimator.py:191] Using config: {'_model_dir': '/tmp/run_example/tuner-1/5', '_tf_random_seed': None, '_save_summary_steps': 2000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 120, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0313 17:21:15.373794 140503459874688 estimator.py:1169] Calling model_fn.
I0313 17:21:15.389836 140503459874688 controller.py:160] trial id: 5
I0313 17:21:15.391364 140503459874688 controller.py:239] intermix ensemble search mode
I0313 17:21:15.406418 140503459874688 phoenix.py:371] {'prior_generator': GeneratorWithTrials(instance=<model_search.generators.prior_generator.PriorGenerator object at 0x7fc8eed807d0>, relevant_trials=[])}
---------------------------------------------------------------------------
FailedPreconditionError                   Traceback (most recent call last)
<ipython-input-17-bfaafc6a9348> in <module>()
      6     batch_size=32,
      7     experiment_name="example",
----> 8     experiment_owner="model_search_user")

11 frames
/content/model_search/model_search/generators/prior_generator.py in _nonadaptive_ensemble(self, features, input_layer_fn, shared_input_tensor, shared_lengths, logits_dimension, relevant_trials, is_training, num_trials_to_consider, width, my_model_dir)
     64     if not best_trials:
     65       raise tf.errors.FailedPreconditionError(
---> 66           None, None, "No completed trials to perform ensembling.")
     67 
     68     if len(best_trials) < width:

FailedPreconditionError: No completed trials to perform ensembling.

Any clue?
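A hedged observation: the log shows the prior generator entering intermix ensemble search mode on trial 5 with relevant_trials=[], and prior_generator.py raises exactly this error when it finds no completed trials to ensemble. So the precondition being named is that at least one earlier trial must have finished; checking why trials 1-4 did not complete (or searching more models before ensembling kicks in) seems like the place to start.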

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 100: invalid continuation byte

I ran the demo test.py:

import model_search
from model_search import constants
from model_search import single_trainer
from model_search.data import csv_data

trainer = single_trainer.SingleTrainer(
    data=csv_data.Provider(
        label_index=0,
        logits_dimension=2,
        record_defaults=[0, 0, 0, 0],
        filename="model_search/data/testdata/csv_random_data.csv"),
    spec=constants.DEFAULT_DNN)

trainer.try_models(
    number_models=200,
    train_steps=1000,
    eval_steps=100,
    root_dir="/tmp/run_example",
    batch_size=32,
    experiment_name="example",
    experiment_owner="model_search_user")

but there is an error and I can't find the reason. The error information is:

(gpu) E:\src\model_search>python test.py
2021-02-23 16:24:06.170523: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-02-23 16:24:06.175639: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    spec=constants.DEFAULT_DNN)
  File "E:\src\model_search\model_search\single_trainer.py", line 33, in __init__
    text_format.Parse(f.read(), self._spec)
  File "d:\Documents\Application Data\Python\Python37\site-packages\tensorflow\python\lib\io\file_io.py", line 116, in read
    self._preread_check()
  File "d:\Documents\Application Data\Python\Python37\site-packages\tensorflow\python\lib\io\file_io.py", line 79, in _preread_check
    self.__name, 1024 * 512)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 100: invalid continuation byte

InvalidArgumentError: Field 1 in record is not a valid int32

I was able to get the code running with the test data supplied with the module. Now I am testing the code with a custom dataset. However, the dataset isn't composed of just int32 entries; it also has float entries. This is probably causing the following error:
InvalidArgumentError: Field 1 in record is not a valid int32: 1.0000000007063299
Could you help me fix it? Is there a keyword in the csv_data module that will accommodate floating point numbers in the inputs?
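A hedged sketch: record_defaults follows tf.io.decode_csv semantics, so each entry fixes both the default value and the parsed dtype of its column. Float defaults (0.0) for the float columns should avoid the int32 parse error, while the label column keeps an integer default. The count and order must match your CSV (shown here for a hypothetical 4-column file with the label first).

from model_search import constants
from model_search import single_trainer
from model_search.data import csv_data

trainer = single_trainer.SingleTrainer(
    data=csv_data.Provider(
        label_index=0,
        logits_dimension=2,
        record_defaults=[0, 0.0, 0.0, 0.0],  # float defaults -> float32 fields
        filename="path/to/your_data.csv"),   # hypothetical path
    spec=constants.DEFAULT_DNN)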

The full error trace is below:

InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1374 try:
-> 1375 return fn(*args)
1376 except errors.OpError as e:

20 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1359 return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1360 target_list, run_metadata)
1361

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1452 fetch_list, target_list,
-> 1453 run_metadata)
1454

InvalidArgumentError: Field 1 in record is not a valid int32: 1.0000000007063299
[[{{node IteratorGetNext}}]]

During handling of the above exception, another exception occurred:

InvalidArgumentError Traceback (most recent call last)
in ()
31 batch_size=32,
32 experiment_name="example",
---> 33 experiment_owner="model_search_user")

/content/drive/My Drive/oqmd_structures/sony/search/model_search/single_trainer.py in try_models(self, number_models, train_steps, eval_steps, root_dir, batch_size, experiment_name, experiment_owner)
84 train_steps=train_steps,
85 eval_steps=eval_steps,
---> 86 batch_size=batch_size):
87 pass

/content/drive/My Drive/oqmd_structures/sony/search/model_search/oss_trainer_lib.py in run_parameterized_train_and_eval(phoenix_instance, oracle, tuner_id, root_dir, max_trials, data_provider, train_steps, eval_steps, batch_size)
338 train_steps=train_steps,
339 eval_steps=eval_steps,
--> 340 batch_size=batch_size)
341
342 oracle.update_trial(

/content/drive/My Drive/oqmd_structures/sony/search/model_search/oss_trainer_lib.py in run_train_and_eval(hparams, model_dir, phoenix_instance, data_provider, train_steps, eval_steps, batch_size)
240 mode=tf.estimator.ModeKeys.TRAIN,
241 batch_size=batch_size),
--> 242 max_steps=train_steps)
243 tf.compat.v1.reset_default_graph()
244 tf.keras.backend.clear_session()

/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
347
348 saving_listeners = _check_listeners_type(saving_listeners)
--> 349 loss = self._train_model(input_fn, hooks, saving_listeners)
350 logging.info('Loss for final step: %s.', loss)
351 return self

/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py in _train_model(self, input_fn, hooks, saving_listeners)
1173 return self._train_model_distributed(input_fn, hooks, saving_listeners)
1174 else:
-> 1175 return self._train_model_default(input_fn, hooks, saving_listeners)
1176
1177 def _train_model_default(self, input_fn, hooks, saving_listeners):

/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py in _train_model_default(self, input_fn, hooks, saving_listeners)
1206 return self._train_with_estimator_spec(estimator_spec, worker_hooks,
1207 hooks, global_step_tensor,
-> 1208 saving_listeners)
1209
1210 def _train_model_distributed(self, input_fn, hooks, saving_listeners):

/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py in _train_with_estimator_spec(self, estimator_spec, worker_hooks, hooks, global_step_tensor, saving_listeners)
1512 any_step_done = False
1513 while not mon_sess.should_stop():
-> 1514 _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
1515 any_step_done = True
1516 if not any_step_done:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/monitored_session.py in run(self, fetches, feed_dict, options, run_metadata)
776 feed_dict=feed_dict,
777 options=options,
--> 778 run_metadata=run_metadata)
779
780 def run_step_fn(self, step_fn):

/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/monitored_session.py in run(self, fetches, feed_dict, options, run_metadata)
1281 feed_dict=feed_dict,
1282 options=options,
-> 1283 run_metadata=run_metadata)
1284 except _PREEMPTION_ERRORS as e:
1285 logging.info(

/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/monitored_session.py in run(self, *args, **kwargs)
1382 raise six.reraise(*original_exc_info)
1383 else:
-> 1384 raise six.reraise(*original_exc_info)
1385
1386

/usr/local/lib/python3.7/dist-packages/six.py in reraise(tp, value, tb)
701 if value.__traceback__ is not tb:
702 raise value.with_traceback(tb)
--> 703 raise value
704 finally:
705 value = None

/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/monitored_session.py in run(self, *args, **kwargs)
1367 def run(self, *args, **kwargs):
1368 try:
-> 1369 return self._sess.run(*args, **kwargs)
1370 except _PREEMPTION_ERRORS:
1371 raise

/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/monitored_session.py in run(self, fetches, feed_dict, options, run_metadata)
1440 feed_dict=feed_dict,
1441 options=options,
-> 1442 run_metadata=run_metadata)
1443
1444 for hook in self._hooks:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/monitored_session.py in run(self, *args, **kwargs)
1198
1199 def run(self, *args, **kwargs):
-> 1200 return self._sess.run(*args, **kwargs)
1201
1202 def run_step_fn(self, step_fn, raw_session, run_with_hooks):

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
966 try:
967 result = self._run(None, fetches, feed_dict, options_ptr,
--> 968 run_metadata_ptr)
969 if run_metadata:
970 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1189 if final_fetches or final_targets or (handle and feed_dict_tensor):
1190 results = self._do_run(handle, final_targets, final_fetches,
-> 1191 feed_dict_tensor, options, run_metadata)
1192 else:
1193 results = []

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1367 if handle is None:
1368 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369 run_metadata)
1370 else:
1371 return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1392 '\nsession_config.graph_options.rewrite_options.'
1393 'disable_meta_optimizer = True')
-> 1394 raise type(e)(node_def, op, message)
1395
1396 def _extend_graph(self):

InvalidArgumentError: Field 1 in record is not a valid int32: 1.0000000007063299
[[node IteratorGetNext (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/util.py:61) ]]

Errors may have originated from an input operation.
Input Source operations connected to node IteratorGetNext:
IteratorV2 (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/util.py:59)

Original stack trace for 'IteratorGetNext':
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py", line 16, in
app.launch_new_instance()
File "/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py", line 845, in launch_instance
app.start()
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py", line 499, in start
self.io_loop.start()
File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/usr/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 122, in _handle_events
handler_func(fileobj, events)
File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py", line 451, in _handle_events
self._handle_recv()
File "/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py", line 434, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2828, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 33, in
experiment_owner="model_search_user")
File "search/model_search/single_trainer.py", line 86, in try_models
batch_size=batch_size):
File "search/model_search/oss_trainer_lib.py", line 340, in run_parameterized_train_and_eval
batch_size=batch_size)
File "search/model_search/oss_trainer_lib.py", line 242, in run_train_and_eval
max_steps=train_steps)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1201, in _train_model_default
self._get_features_and_labels_from_input_fn(input_fn, ModeKeys.TRAIN))
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1037, in _get_features_and_labels_from_input_fn
self._call_input_fn(input_fn, mode))
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/util.py", line 61, in parse_input_fn_result
result = iterator.get_next()
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 419, in get_next
name=name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2601, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 3536, in _create_op_internal
op_def=op_def)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 1990, in init
self._traceback = tf_stack.extract_stack()`

sqlite default filename is not Windows compatible

File model_search/metadata/ml_metadata_db.py, lines 102-103 are:

        self._connection_config.sqlite.filename_uri = (
            "/tmp/filedb-%d" % random.randint(0, 1000000))

/tmp/... is of course not a valid path for Windows.
Can you update this path such that it chooses a valid path when running on Windows?
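A hedged workaround sketch until the default is fixed upstream: override the sqlite location through the --mlmd_default_sqllite_filename flag (the flag name that appears in ml_metadata_db.py per the tracebacks in this tracker) with a path that is valid on Windows. flags.FLAGS(argv) performs the parse.

import os
import sys
import tempfile

from absl import flags
from model_search import single_trainer  # importing this registers the flag

flags.FLAGS(sys.argv[:1] + [
    "--mlmd_default_sqllite_filename="
    + os.path.join(tempfile.gettempdir(), "model_search_mlmd.db"),
])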

Predictions/Inference using the searched model

I tried to utilize this framework to find the best possible model architecture for a multi-class classification problem. As a result, the respective files were created, e.g. under /tmp/run_example/tuner-1/1/. The files include a checkpoint file, graph.pbtxt, and replay_config.pbtxt, along with .index, .meta and .data-000000-of-00001 files.

Given that the framework does not store the searched model in a .h5 or SavedModel format, I wanted to know how I can utilize the produced files to load/restore the searched model, such that I can use it to make predictions/inference.

I would really appreciate any help / suggestions in the right direction!
Cheers.
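A hedged sketch, assuming TF1-style checkpoint files (.meta/.index/.data) in a trial directory; the output tensor name below is taken from the graph.pbtxt listings elsewhere in this tracker and may differ in your run.

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

trial_dir = "/tmp/run_example/tuner-1/1"
ckpt = tf.train.latest_checkpoint(trial_dir)
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(ckpt + ".meta")
    saver.restore(sess, ckpt)
    # Identify the input placeholders in the restored graph, then fetch by
    # tensor name, e.g.:
    # preds = sess.run("Phoenix/Trainer/Softmax:0", feed_dict={...})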

absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --mlmd_default_sqllite_filename before flags were parsed

I get the error when I run the code snippet from the project page:

Traceback (most recent call last): File "btc.py", line 14, in <module> trainer.try_models( File "/home/lasse/Development/projects/btcpred/model_search/model_search/single_trainer.py", line 56, in try_models phoenix_instance = phoenix.Phoenix( File "/home/lasse/Development/projects/btcpred/model_search/model_search/phoenix.py", line 239, in __init__ self._metadata = ml_metadata_db.MLMetaData(phoenix_spec, study_name, File "/home/lasse/Development/projects/btcpred/model_search/model_search/metadata/ml_metadata_db.py", line 84, in __init__ if FLAGS.mlmd_default_sqllite_filename: File "/home/lasse/Development/projects/btcpred/.env/lib/python3.8/site-packages/absl/flags/_flagvalues.py", line 498, in __getattr__ raise _exceptions.UnparsedFlagAccessError(error_message) absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --mlmd_default_sqllite_filename before flags were parsed.

RuntimeError: Error when executing query: file is not a database query

File "C:...\lib\site-packages\ml_metadata\metadata_store\metadata_store.py", line 92, in init
self._metadata_store = metadata_store_serialized.CreateMetadataStore(
RuntimeError: Error when executing query: file is not a database query: CREATE TABLE IF NOT EXISTS Type ( id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(255) NOT NULL, version VARCHAR(255), type_kind TINYINT(1) NOT NULL, description TEXT, input_type TEXT, output_type TEXT );
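A hedged note: sqlite reports "file is not a database" when the configured filename_uri points at an existing file that is not a sqlite database, so deleting the stale file or switching --mlmd_default_sqllite_filename to a fresh path is worth trying.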

Possible beta for Regression Modeling?

The task that I'm currently trying to run requires regression modeling, and I'm wondering whether there has been any progress on creating models for regression analysis? For example, if there is some beta code or something similar that I could use, it would be greatly appreciated.

ImportError: cannot import name 'hparam_pb2' from 'model_search.proto' (unknown location)

from model_search import oss_trainer_lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "E:\src\model_search\model_search\oss_trainer_lib.py", line 28, in <module>
    from model_search import hparam as hp
  File "E:\src\model_search\model_search\hparam.py", line 26, in <module>
    from model_search.proto import hparam_pb2
ImportError: cannot import name 'hparam_pb2' from 'model_search.proto' (unknown location)
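A hedged guess at the cause: the *_pb2 modules are generated from the repo's .proto files by the protobuf compiler rather than checked in, so running protoc from the repo root (for example, protoc --python_out=. model_search/proto/*.proto, with protoc installed) should produce model_search/proto/hparam_pb2.py.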

Model Architecture Output?

I have just finished a test run and obtained a new directory containing the results from 200 separate models. For a given model, I noticed there is a "saved_model.pb" file, but I get an error every time I try to load it into Python. All I would like to do is see the model layers, similar to the output of a typical model.summary() in Keras:

[image: example Keras model.summary() output]

Is it possible to see this output with the models that have been saved?

UnicodeDecodeError

Hi guys, I'm facing this error when trying to reproduce the example you provided, using the testdata provided here as well...
Does anyone know what could be happening?
[image: screenshot of the UnicodeDecodeError]

Thank you very much!

Is it feasible to use MirroredStrategy for training?

Thank you for making this work open source and letting us glimpse some of Google's achievements on the NAS platform. After going through the materials and issues, I have successfully run feature-data, binary, and multi-class image network searches.

The current problem: model_search only trains on a single GPU card, and the computational efficiency is not satisfactory. Is there a way to modify model_search (via distribution configs?) for multi-GPU synchronous/asynchronous training?
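A hedged sketch, not a supported model_search switch: the logs in other issues here show training going through a standard tf.estimator.RunConfig, and Estimator accepts a tf.distribute strategy via train_distribute/eval_distribute. One experiment is to patch the RunConfig construction (oss_trainer_lib.py in this repo) along these lines:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
run_config = tf.estimator.RunConfig(
    train_distribute=strategy,  # synchronous multi-GPU training
    eval_distribute=strategy,
)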

Training on Multiclass Image Dataset

Is there already a way to use the code for multiclass image datasets? The documentation only shows binary image datasets. I tried changing the "label_mode" variable in image_data.py to "categorical" and changing the return value of the "number_of_classes" function to the number of classes, but I still get an error.

Documentation for Utilizing Output of Model Search?

Is there anywhere we can find additional documentation? I've run Google Model Search and obtained a saved_model.pb, but I'm not sure how to load it back into Python.

Here is some code that I've found for converting the output, but it doesn't seem quite right, so documentation on how to actually implement this would be appreciated.

[image: code snippet found for converting the output]

UnparsedFlagAccessError

Hi - What could be causing this?


UnparsedFlagAccessError Traceback (most recent call last)
in ()
14 batch_size=32,
15 experiment_name="animalfaces",
---> 16 experiment_owner="myname")

3 frames
/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py in __getattr__(self, name)
496 # get too much noise.
497 logging.error(error_message)
--> 498 raise _exceptions.UnparsedFlagAccessError(error_message)
499
500 def __setattr__(self, name, value):

UnparsedFlagAccessError: Trying to access flag --mlmd_default_sqllite_filename before flags were parsed.

How to know which model is the best and what is its accuracy?

I have trained 200 models using trainer.try_models(), but now I need to know which model performed best and what its accuracy is.

Does anyone know how to get details such as: which evaluation score was used? What is the best model? And how can we use it later for prediction?

Thank you.
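A hedged sketch for digging the numbers out of the run directory yourself, assuming Estimator-style event files under each trial directory (the scalar tag and the exact eval subdirectory may differ in your run; inspect one events file first):

import glob

import tensorflow as tf

best_trial, best_value = None, float("-inf")
for events_file in glob.glob(
        "/tmp/run_example/tuner-1/*/eval*/events.out.tfevents.*"):
    for event in tf.compat.v1.train.summary_iterator(events_file):
        for value in event.summary.value:
            if value.tag == "accuracy" and value.simple_value > best_value:
                best_value, best_trial = value.simple_value, events_file
print(best_trial, best_value)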

hparam_pb2 missing

There is no hparam_pb2 under proto.

Here is the full trace:

ImportError                               Traceback (most recent call last)
<ipython-input-8-14ac9b22f6ac> in <module>
      1 import model_search
      2 from model_search import constants
----> 3 from model_search import single_trainer
      4 from model_search.data import csv_data
      5 

~/Projects/model_search/model_search/single_trainer.py in <module>
     15 
     16 import kerastuner
---> 17 from model_search import oss_trainer_lib
     18 from model_search import phoenix
     19 from model_search.proto import phoenix_spec_pb2

~/Projects/model_search/model_search/oss_trainer_lib.py in <module>
     26 import kerastuner
     27 
---> 28 from model_search import hparam as hp
     29 from model_search import phoenix
     30 from model_search import registry

~/Projects/model_search/model_search/hparam.py in <module>
     24 import re
     25 
---> 26 from model_search.proto import hparam_pb2
     27 import six
     28 

ImportError: cannot import name 'hparam_pb2' from 'model_search.proto' (unknown location)

Neural Network Architecture Details

I ran the default settings on my dataset and I am trying to figure out what the architecture of the resulting model looks like (i.e. how many layers and their sizes) and which hyperparameter values it chose. Where can I find this information?

Time series-structured data

Is it possible to perform training and evaluation with structured multi-dimensional data (like time-series data, image data, etc.) that cannot be represented as a csv?

Frustrated when using

I am trying to use it for regression tasks but found it is classification-only. After trying to revise the code to apply it to regression tasks, I gave up; the code is too hard to understand when you keep moving from one unfamiliar class to another... Any ideas about (1) how to revise it to apply it to regression tasks, and (2) how to understand the Phoenix class?

KeyError: 0

I'm trying to run the example provided in the README file. After fixing many errors, I have hit this one and have no solution.

The error

Traceback (most recent call last):
  File "d:/Tesi_Magistrale/google_model_search/test.py", line 14, in <module>
    trainer.try_models(
  File "d:\Tesi_Magistrale\google_model_search\model_search\single_trainer.py", line 56, in try_models
    phoenix_instance = phoenix.Phoenix(
  File "d:\Tesi_Magistrale\google_model_search\model_search\phoenix.py", line 244, in __init__
    self._controller = controller.InProcessController(
  File "d:\Tesi_Magistrale\google_model_search\model_search\controller.py", line 147, in __init__
    self._search_candidate_generator = SearchCandidateGenerator(
  File "d:\Tesi_Magistrale\google_model_search\model_search\generators\search_candidate_generator.py", line 57, in __init__
    self._search_algorithm = search_algorithms[phoenix_spec.search_type]
KeyError: 0

The code I'm running

import model_search
from model_search import constants
from model_search import single_trainer
from model_search.data import csv_data

trainer = single_trainer.SingleTrainer(
    data=csv_data.Provider(
    label_index=0,
    logits_dimension=2,
    record_defaults=[0, 0, 0, 0],
    filename="model_search/data/testdata/csv_random_data.csv"),
    spec=constants.DEFAULT_DNN)

trainer.try_models(
    number_models=200,
    train_steps=1000,
    eval_steps=100,
    root_dir="/tmp/run_example",
    batch_size=32,
    experiment_name="example",
    experiment_owner="model_search_user")

I have no idea what I need to check to resolve this.
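A hedged reading of the trace: search_algorithms is keyed by phoenix_spec.search_type, and proto enum fields default to 0 when unset, so KeyError: 0 suggests the parsed spec never set search_type. It may be worth verifying that constants.DEFAULT_DNN actually resolves to dnn_config.pbtxt on disk (compare the NewRandomAccessFile issue above).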

setup.py

Hello,
Is setup.py missing, or am I missing something?

Thanks,
Vic
