
lambeq's People

Contributors

ace07-sev, dimkart, gopal-dahale, ianyfan, kentaroaoki, kinianlo, le-big-mac, nikhilkhatri, shiro-raven, thommy257, wingcode, y-richie-y

lambeq's Issues

Method or class for composing more than one free wire into one

Description

Many people who want to use a syntax-based model such as DisCoCat are confused by the fact that the parser can in some cases return more than one output wire, e.g. for an imperative sentence it would return n.r s, as below:

    Do      your  homework                                                                                           
─────────  ─────  ────────
n.r·s·n.l  n·n.l     n
 │  │  ╰───╯  ╰──────╯

This is actually not a bug but the expected behaviour, since in cases like the above the subject of the sentence is considered missing. However, it makes it impossible for the user to apply a unified treatment to all sentences in their dataset, e.g. during training (see for example Discussion #88).

A simple solution would be to write a method or, preferably, a new class (e.g. DiagramRewriter) that goes through the generated diagrams and adds a box that merges any multiple output wires into a single wire. In this way, all sentence states would live in the same space, allowing easy training and post-processing. For the above example, we would expect the following outcome:

[image: the example diagram with an added box merging the free output wires into a single wire]

Notes

  • You can start by adding an abstract class DiagramRewriter with a basic interface, and subclass it to create a MergeWiresRewriter (see the sketch after this list).
  • The rewriter should be generic enough to apply to all diagrams with more than one free wire (not just a specific type as codomain).
  • Please accompany your code with meaningful tests.
  • Note that lambeq's usual rewrite rules apply at the box level, not the diagram level, so there is no straightforward way to use them for this task.
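
A minimal sketch of what such an interface could look like, assuming a discopy-style Diagram whose codomain (cod) can be extended with a new Box via >>; all names here are illustrative, not part of the current API:

from abc import ABC, abstractmethod

from discopy.rigid import Box, Diagram, Ty

class DiagramRewriter(ABC):
    """Base class for rewriters that act on whole diagrams."""

    @abstractmethod
    def matches(self, diagram: Diagram) -> bool:
        """Decide whether this rewriter applies to the given diagram."""

    @abstractmethod
    def rewrite(self, diagram: Diagram) -> Diagram:
        """Return the transformed diagram."""

    def __call__(self, diagram: Diagram) -> Diagram:
        return self.rewrite(diagram) if self.matches(diagram) else diagram

class MergeWiresRewriter(DiagramRewriter):
    """Merge all free output wires into a single sentence wire."""

    def matches(self, diagram: Diagram) -> bool:
        return len(diagram.cod) > 1

    def rewrite(self, diagram: Diagram) -> Diagram:
        return diagram >> Box('MERGE', diagram.cod, Ty('s'))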

Numpy int32 error

Hi, I ran the Quantum_trainer and have been getting the error below; I can't seem to find where the issue is.

Traceback (most recent call last):
File "C:\Users\elmm\Desktop\CQM\QNLP Depression.py", line 103, in
trainer.fit(train_dataset, val_dataset, logging_step=12)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\trainer.py", line 365, in fit
y_hat, loss = self.training_step(batch)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\quantum_trainer.py", line 149, in training_step
y_hat, loss = self.optimizer.backward(batch)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\spsa_optimizer.py", line 125, in backward
y0 = self.model(diagrams)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\model.py", line 59, in call
return self.forward(*args, **kwds)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\tket_model.py", line 131, in forward
return self.get_diagram_output(x)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\tket_model.py", line 103, in get_diagram_output
seed=self._randint()
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\tket_model.py", line 71, in _randint
return np.random.randint(low, high)
File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1334, in numpy.random._bounded_integers._rand_int32
ValueError: low is out of bounds for int32
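
For context, this failure comes from NumPy's legacy randint defaulting to the platform's C long, which is 32 bits on Windows, so lambeq's 64-bit seed bounds overflow. A minimal reproduction, independent of lambeq:

import numpy as np

np.random.randint(-1 << 63, (1 << 63) - 1)                  # ValueError on Windows: int32 default
np.random.randint(-1 << 63, (1 << 63) - 1, dtype=np.int64)  # explicit dtype works on all platforms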

Bobcat parser crashes on English contractions

Hi!

When passing tokenised data containing English contractions, the parser crashes. Passing non-tokenised data seems wrong, as the parser does not perform tokenisation internally (all punctuation gets attached to words, and contractions are attached to the verb).

E.g.:

from lambeq import BobcatParser
bobcat_parser = BobcatParser()
diagram = bobcat_parser.sentence2diagram("Baby didn 't like it")
diagram.draw()

results in:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File .../site-packages/lambeq/text2diagram/bobcat_parser.py:382, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    381     result = self.parser(sentence_input)
--> 382     trees.append(self._build_ccgtree(result[0]))
    383 except Exception:
File .../site-packages/lambeq/bobcat/parser.py:258, in ParseResult.__getitem__(self, index)
    256 def __getitem__(self, index: Union[int, slice]) -> Union[ParseTree,
    257                                                          list[ParseTree]]:
--> 258     return self.root[index]

IndexError: list index out of range

During handling of the above exception, another exception occurred:

BobcatParseError                          Traceback (most recent call last)
Cell In[2], line 1
----> 1 diagram = bobcat_parser.sentence2diagram("Baby didn 't like it")
      2 diagram.draw()

File .../site-packages/lambeq/text2diagram/ccg_parser.py:231, in CCGParser.sentence2diagram(self, sentence, tokenised, planar, suppress_exceptions)
    228 if not isinstance(sentence, str):
    229     raise ValueError('`tokenised` set to `False`, but variable '
    230                      '`sentence` does not have type `str`.')
--> 231 return self.sentences2diagrams(
    232                 [sentence],
    233                 planar=planar,
    234                 suppress_exceptions=suppress_exceptions,
    235                 tokenised=tokenised,
    236                 verbose=VerbosityLevel.SUPPRESS.value)[0]

File .../site-packages/lambeq/text2diagram/ccg_parser.py:161, in CCGParser.sentences2diagrams(self, sentences, tokenised, planar, suppress_exceptions, verbose)
    125 def sentences2diagrams(
    126         self,
    127         sentences: SentenceBatchType,
   (...)
    130         suppress_exceptions: bool = False,
    131         verbose: Optional[str] = None) -> list[Optional[Diagram]]:
    132     """Parse multiple sentences into a list of discopy diagrams.
    133 
    134     Parameters
   (...)
    159 
    160     """
--> 161     trees = self.sentences2trees(sentences,
    162                                  suppress_exceptions=suppress_exceptions,
    163                                  tokenised=tokenised,
    164                                  verbose=verbose)
    165     diagrams = []
    166     if verbose is None:

File .../site-packages/lambeq/text2diagram/bobcat_parser.py:387, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    385                 trees.append(None)
    386             else:
--> 387                 raise BobcatParseError(' '.join(sent.words))
    389 for i in empty_indices:
    390     trees.insert(i, None)

BobcatParseError: Bobcat failed to parse "Baby didn 't like it".

No module named BobcatParser

I'm trying to run the following code from your tutorials website, but I am unable to install BobcatParser. I am working in a Colab environment. Is there a dependency that I may be missing?

from lambeq import BobcatParser

parser = BobcatParser(root_cats=('NP', 'N'), verbose='text')

raw_train_diagrams = parser.sentences2diagrams(train_data, suppress_exceptions=True)
raw_val_diagrams = parser.sentences2diagrams(val_data, suppress_exceptions=True) 

Feature Request : Default SpaCy and UNK tokenization enabled for the diagrams.

Given that tokenisation will be present in all use cases of NLP models, it would be efficient to have it enabled by default, along with a SpaCy tokenizer, since that would provide a more generalised model (e.g. indicating to the model that he's and he is, they're and they are, and I'm and I am are the same). These tokenizers are used in all use cases, so having them built in would provide a more efficient and enjoyable experience when using the models.

Furthermore, tokenizers for non-ASCII letters, numbers, and acronyms (such as idk, tbh, rn, etc.) would make useful additional tokenization features, which could be exposed as additional bool params inside the parser.

Normalisation of quantum state in QuantumModel

Hi,

I've noticed that the normalisation of the quantum state done in the following lines of the QuantumModel._normalise_vector function

predictions = backend.abs(predictions) + self.SMOOTHING
return predictions / predictions.sum()

seems to be missing a square along with the absolute value, i.e. it should be something like

predictions = backend.abs(predictions)**2 + self.SMOOTHING
return predictions / predictions.sum()

so that the resulting probabilities match those one would get from measuring the quantum state in the specified basis.

Error with Bobcat Parser

Hi,

I am having a relatively difficult-to-understand error. I am trying to look at the sentence "Senate Advances Bill To Approve Keystone Pipeline Despite Obamas Veto Threat", and can't figure out why it is giving me the error below. Does BobcatParser throw an error if it doesn't recognize a word? I am unclear what I would need to do to fix the sentence.

from lambeq import BobcatParser

parser = BobcatParser()
diagram = parser.sentence2diagram('Senate Advances Bill To Approve Keystone Pipeline Despite Obamas Veto Threat')
diagram.draw()

Traceback (most recent call last):

File "/Users/dabeaulieu/opt/anaconda3/envs/qcware/lib/python3.9/site-packages/lambeq/text2diagram/bobcat_parser.py", line 382, in sentences2trees
trees.append(self._build_ccgtree(result[0]))

File "/Users/dabeaulieu/opt/anaconda3/envs/qcware/lib/python3.9/site-packages/lambeq/bobcat/parser.py", line 258, in getitem
return self.root[index]

IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/Users/dabeaulieu/opt/anaconda3/envs/qcware/lib/python3.9/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)

File "/Users/dabeaulieu/Documents/Initiatives/quantum/machine learning/notebooks/qnlp/ankush/Quantum_NLP/testcode.py", line 12, in
diagram = parser.sentence2diagram('Senate Advances Bill To Approve Keystone Pipeline Despite Obamas Veto Threat')

File "/Users/dabeaulieu/opt/anaconda3/envs/qcware/lib/python3.9/site-packages/lambeq/text2diagram/ccg_parser.py", line 231, in sentence2diagram
return self.sentences2diagrams(

File "/Users/dabeaulieu/opt/anaconda3/envs/qcware/lib/python3.9/site-packages/lambeq/text2diagram/ccg_parser.py", line 161, in sentences2diagrams
trees = self.sentences2trees(sentences,

File "/Users/dabeaulieu/opt/anaconda3/envs/qcware/lib/python3.9/site-packages/lambeq/text2diagram/bobcat_parser.py", line 387, in sentences2trees
raise BobcatParseError(' '.join(sent.words))

BobcatParseError: Bobcat failed to parse 'Senate Advances Bill To Approve Keystone Pipeline Despite Obamas Veto Threat'.

KeyError : am__n.r@[email protected]@n_3

Traceback (most recent call last):
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\numpy_model.py", line 154, in get_diagram_output
values.append(parameters[sym])
KeyError: am__n.r@[email protected]@n_3

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\elmm\Desktop\CQM\QNLP Depression 2.py", line 125, in
prediction = model.forward([diagram])
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\numpy_model.py", line 185, in forward
return self.get_diagram_output(x)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\numpy_model.py", line 156, in get_diagram_output
raise KeyError(f'Unknown symbol {sym!r}.')
KeyError: 'Unknown symbol am__n.r@[email protected]@n_3.'

I ran "I am depressed" and got this error. Any resolves?

NumpyModel not converging

Hi, I have an issue with the NumpyModel: it doesn't converge to a good accuracy and low loss. I thought that might be due to my dataset, so I tried the previously tested food/IT dataset, with the same issue. I thought it might be my code, so I reran the sample notebook for food/IT, and still got the same result. Regardless of the dataset and the model, it always starts with a train and validation loss of approximately 0.76-0.8 and an accuracy of around 40 percent, and just wobbles back and forth there.
Could you kindly assist with how I can resolve this issue? I have also posted my code for the entire thing on my GitHub repo. Here is the link:
https://github.com/ACE07-Sev/QNLP

integration with constructive tagger

Hey folks,
would there be any interest in integrating the tagger currently used in Bobcat with the constructive one we've recently cooked up in Utrecht? See the arXiv preprint.
The code is available here, and I'd be happy to discuss and/or assist if desired.

Lambeq Installation

ERROR: Cannot install lambeq[depccg]==0.1.0, lambeq[depccg]==0.1.1, lambeq[depccg]==0.1.2 because these package versions have conflicting dependencies. How can I solve this?

Add more tutorials and example notebooks in the documentation

Description

In an open-source project like lambeq, you can never have too many tutorials or example notebooks. This open-ended task asks for contributions on this matter. Any meaningful tutorial is welcome, but some concrete suggestions are also provided below:

  • Write a tutorial for a multi-class classification task.
  • Write a tutorial for a regression task.
  • Write a tutorial that explains basic quantum concepts such as qubits, gates, entanglement etc using lambeq circuits or string diagrams.

Notes

  • A good option for writing your training tutorials is the PennyLane model, since it's flexible enough to cover most of the important use cases.
  • lambeq documentation is generated with Sphinx, and is written in restructured text (RST) format.
  • lambeq tutorials that contain code are Jupyter notebooks, whose text cells are written in RST format so they can be integrated easily with the rest of the documentation. To mark a cell as RST in Jupyter notebook's web editor, do the following:
    1. Display the raw cell format toolbar by selecting View > Cell Toolbar > Raw cell format.
    2. Select the cell and then the option Cell > Cell type > Raw NBConvert.
    3. Select the "reST" option from the cell's "Raw NBConvert Format" menu.

UnitaryHACK: Replace unknown words in diagrams with `UNK` token

Task description

One of the most common challenges in NLP is the handling of unknown words, or out-of-vocabulary (OOV) words. The term refers to words that may appear during evaluation and testing but were not present in the training data of the model. A common technique to handle unknown words is to introduce a special token UNK. In the simplest possible case, this works as follows:

  1. Replace every rare word in the training data (e.g. every word that occurs fewer times than a specified threshold, for example 3) with a special token UNK.
  2. During training, learn a representation for UNK as if it were any other token.
  3. During evaluation, when you meet an unknown word, use the representation of UNK instead.

However, in the syntax-based models of lambeq (such as DisCoCat) this method would not work, for two reasons:

  • The sentences are used as input to a parser, which has no way to recognise the special token UNK and assign it the proper part of speech in each case.
  • In order for the produced diagram to be valid, each part of speech (in lambeq, defined by the pregroup type of each word) needs a different UNK token.

This task is about adding a feature to lambeq that handles unknown words. For the reasons explained above, in lambeq the unknown words need to be replaced after the diagrams have been generated. You will have to learn how to construct DisCoPy diagrams in lambeq and manipulate them with functors. The overall goal of this task is to develop a DisCoPy functor (much like how a lambeq RewriteRule is implemented) that takes a list of unknown words to be replaced with UNK and that, when passed a diagram, replaces all the boxes containing an unknown word with an UNK box of the same pregroup type.
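
A minimal sketch of the idea, assuming discopy 0.x conventions; keying on box names and the helper name unk_functor are illustrative assumptions, not the required design:

from discopy.rigid import Box, Functor

def unk_functor(unknown_words):
    # Swap any box named after an unknown word for an UNK box with the
    # same pregroup type (dom and cod preserved), leaving others intact.
    def ar(box):
        if box.name in unknown_words:
            return Box('UNK', box.dom, box.cod)
        return box
    return Functor(ob=lambda ty: ty, ar=ar)

# Usage sketch: diagrams = [unk_functor({'foo'})(d) for d in diagrams]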

Notes

  • lambeq also contains compositional schemes that do not support syntax, such as the CupsReader and StairsReader. For these cases, the simple algorithm proposed above would suffice, and it could be applied directly at the sentence level. For this task, however, we are interested in providing handling of unknown words for the syntax-based models of lambeq, such as DisCoCat and TreeReader.
  • In lambeq's pipeline, the replacement should take place after the generation of the string diagrams and before the application of any rewrite rule or ansatz.
  • In case you are not familiar with functors, a less preferred way to implement this task is to create a function that processes a passed list of diagrams in a simple imperative way.

Problem with trainer.fit(), operands of different shape

Hi,
I am trying to run the quantum trainer algorithm. When running the following line:

trainer.fit(train_dataset, val_dataset, evaluation_step=1, logging_step=100)

I get the following error:

ValueError                          Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 trainer.fit(train_dataset, val_dataset, evaluation_step=1, logging_step=100)

File c:\python38\lib\site-packages\lambeq\training\trainer.py:365, in Trainer.fit(self, train_dataset, val_dataset, evaluation_step, logging_step)
    363 step += 1
    364 x, y_label = batch
--> 365 y_hat, loss = self.training_step(batch)
    366 if (self.evaluate_on_train and
    367         self.evaluate_functions is not None):
    368     for metr, func in self.evaluate_functions.items():

File c:\python38\lib\site-packages\lambeq\training\quantum_trainer.py:149, in QuantumTrainer.training_step(self, batch)
    133 def training_step(
    134         self,
    135         batch: tuple[list[Any], np.ndarray]) -> tuple[np.ndarray, float]:
    136     """Perform a training step.
    137 
    138     Parameters
   (...)
    147 
    148     """
--> 149     y_hat, loss = self.optimizer.backward(batch)
    150     self.train_costs.append(loss)
    151     self.optimizer.step()

File c:\python38\lib\site-packages\lambeq\training\spsa_optimizer.py:126, in SPSAOptimizer.backward(self, batch)
    124 self.model.weights = xplus
    125 y0 = self.model(diagrams)
--> 126 loss0 = self.loss_fn(y0, targets)
    127 xminus = self.project(x - self.ck * delta)
    128 self.model.weights = xminus

Input In [13], in <lambda>(y_hat, y)
----> 1 loss = lambda y_hat, y: -np.sum(y * np.log(y_hat)) / len(y)  # binary cross-entropy loss
      3 acc = lambda y_hat, y: np.sum(np.round(y_hat) == y) / len(y) / 2  # half due to double-counting
      4 eval_metrics = {"acc": acc}

ValueError: operands could not be broadcast together with shapes (30,2) (30,)

I have just fixed the .py file in the lib following #12. The algorithm raised an error even before; I can't recall exactly, but I don't think it was the same error.

What can I do to solve this?
Thank you for your time.
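
For context, the shape mismatch (30,2) vs (30,) suggests the model outputs two-class probabilities while the labels are stored as a flat 0/1 array. A hedged fix, assuming binary labels, is to one-hot encode the targets before building the dataset:

import numpy as np

labels = np.array([0, 1, 1, 0])   # flat binary labels, shape (N,)
targets = np.eye(2)[labels]       # one-hot targets, shape (N, 2), matching y_hat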

accuracy performance

I'm trying to reproduce the example described in section 4 of the documentation, the classifier based on Lorenz et al., 2021, using their resources for training and testing.

I'm using pytket instead of IBMQ, but the final value is always much lower than the one described in the documentation (around 0.5 compared to 0.9 expected).

How is this behavior possible if the settings and resources are the same?

An Error while running Classical Pipeline Example given in docs/examples

Hi @dimkart, I hope you are doing well.

I am trying to run the code given here on my Google Colab account - https://github.com/CQCL/lambeq/blob/main/docs/examples/classical_pipeline.ipynb

I am installing lambeq directly on Colab, and it is picking up the latest version of DisCoPy.

But I am continuously getting the error below. I have pasted the full stack trace here:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-84634b74856a> in <module>()
     39 dev_cost_fn, dev_costs, dev_accs = make_cost_fn(dev_pred_fn, dev_labels)
     40 
---> 41 result = train(train_cost_fn, x0, niter=20, callback=dev_cost_fn, optimizer_fn=torch.optim.AdamW, lr=0.1)

10 frames
<ipython-input-11-84634b74856a> in train(func, x0, niter, callback, optimizer_fn, lr)
      3     optimizer = optimizer_fn(x, lr=lr)
      4     for _ in range(niter):
----> 5         loss = func(x)
      6 
      7         optimizer.zero_grad()

<ipython-input-11-84634b74856a> in cost_fn(params, **kwargs)
     16 def make_cost_fn(pred_fn, labels):
     17     def cost_fn(params, **kwargs):
---> 18         predictions = pred_fn(params)
     19 
     20         logits = predictions[:, 1] - predictions[:, 0]

<ipython-input-10-dbb8534e3157> in predict(params)
      1 def make_pred_fn(circuits):
      2     def predict(params):
----> 3         return torch.stack([c.lambdify(*parameters)(*params).eval(contractor=tn.contractors.auto).array for c in circuits])
      4     return predict
      5 

<ipython-input-10-dbb8534e3157> in <listcomp>(.0)
      1 def make_pred_fn(circuits):
      2     def predict(params):
----> 3         return torch.stack([c.lambdify(*parameters)(*params).eval(contractor=tn.contractors.auto).array for c in circuits])
      4     return predict
      5 

/usr/local/lib/python3.7/dist-packages/discopy/tensor.py in eval(self, contractor)
    448         if contractor is None:
    449             return Functor(ob=lambda x: x, ar=lambda f: f.array)(self)
--> 450         array = contractor(*self.to_tn()).tensor
    451         return Tensor(self.dom, self.cod, array)
    452 

/usr/local/lib/python3.7/dist-packages/tensornetwork/contractors/opt_einsum_paths/path_contractors.py in auto(nodes, output_edge_order, memory_limit, ignore_edge_order)
    262         output_edge_order=output_edge_order,
    263         nbranch=1,
--> 264         ignore_edge_order=ignore_edge_order)
    265   return greedy(nodes, output_edge_order, memory_limit, ignore_edge_order)
    266 

/usr/local/lib/python3.7/dist-packages/tensornetwork/contractors/opt_einsum_paths/path_contractors.py in branch(nodes, output_edge_order, memory_limit, nbranch, ignore_edge_order)
    160   alg = functools.partial(
    161       opt_einsum.paths.branch, memory_limit=memory_limit, nbranch=nbranch)
--> 162   return base(nodes, alg, output_edge_order, ignore_edge_order)
    163 
    164 

/usr/local/lib/python3.7/dist-packages/tensornetwork/contractors/opt_einsum_paths/path_contractors.py in base(nodes, algorithm, output_edge_order, ignore_edge_order)
     86   path, nodes = utils.get_path(nodes_set, algorithm)
     87   for a, b in path:
---> 88     new_node = contract_between(nodes[a], nodes[b], allow_outer_product=True)
     89     nodes.append(new_node)
     90     nodes = utils.multi_remove(nodes, [a, b])

/usr/local/lib/python3.7/dist-packages/tensornetwork/network_components.py in contract_between(node1, node2, name, allow_outer_product, output_edge_order, axis_names)
   2083     axes1 = [axes1[i] for i in ind_sort]
   2084     axes2 = [axes2[i] for i in ind_sort]
-> 2085     new_tensor = backend.tensordot(node1.tensor, node2.tensor, [axes1, axes2])
   2086     new_node = Node(tensor=new_tensor, name=name, backend=backend)
   2087     # node1 and node2 get new edges in _remove_edges

/usr/local/lib/python3.7/dist-packages/tensornetwork/backends/pytorch/pytorch_backend.py in tensordot(self, a, b, axes)
     44   def tensordot(self, a: Tensor, b: Tensor,
     45                 axes: Union[int, Sequence[Sequence[int]]]) -> Tensor:
---> 46     return torchlib.tensordot(a, b, dims=axes)
     47 
     48   def reshape(self, tensor: Tensor, shape: Tensor) -> Tensor:

/usr/local/lib/python3.7/dist-packages/torch/functional.py in tensordot(a, b, dims, out)
   1032 
   1033     if out is None:
-> 1034         return _VF.tensordot(a, b, dims_a, dims_b)  # type: ignore[attr-defined]
   1035     else:
   1036         return _VF.tensordot(a, b, dims_a, dims_b, out=out)  # type: ignore[attr-defined]

RuntimeError: expected scalar type Float but found Double

I was able to successfully carry out experiments using the Quantum Pipeline code on Google Colab and did not face any issues, but for this one I am getting an error. I have tried to fix the issue by converting variables or some function outputs to float(), but I was unable to rectify this.

Can you please help me fix this issue?

Thank you so much!
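
For context, this RuntimeError usually means float64 arrays (the NumPy default that discopy and tensornetwork produce) are meeting torch's default float32 parameters inside tensordot. A hedged workaround, assuming the dtype mismatch is indeed the cause, is to align torch's default dtype before creating the parameters:

import torch

# Make float64 the default so new torch parameters match the float64
# tensors derived from NumPy arrays during contraction.
torch.set_default_dtype(torch.float64)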

<unk> token feature in the forward() function

Considering the necessity of the <unk> token for never-seen-before entities, how can I implement the token in the forward function, to allow the model to calculate probabilities for instances which have unknown symbols? Based on my understanding and guidance from one of the moderators, I think it's supposed to be in the forward function.

Could you kindly assist me in implementing this?

Question : What does the Quantum_trainer output?

Hi,
I wish to use the Quantum_trainer to do depression detection using a chatbot (the sentences would be the input for the QNLP module), and then wish to classify whether the person has depression or not.

May I ask what the input and the output are in the sample trainer for the quantum module? Does it do binary classification, or is that something I need to add as an additional layer?

Ty('P') Error

Traceback (most recent call last):
File "C:\Users\elmm\Desktop\CQM\QNLP Depression.py", line 141, in
train_circuits = [ansatz(diagram) for diagram in train_diagrams]
File "C:\Users\elmm\Desktop\CQM\QNLP Depression.py", line 141, in
train_circuits = [ansatz(diagram) for diagram in train_diagrams]
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\ansatz\circuit.py", line 59, in call
return self.functor(diagram)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\discopy\rigid.py", line 651, in call
return super().call(diagram)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\discopy\monoidal.py", line 900, in call
result = result >> id_l @ self(box) @ id_r
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\discopy\rigid.py", line 642, in call
return super().call(diagram)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\discopy\monoidal.py", line 894, in call
return super().call(diagram)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\discopy\cat.py", line 911, in call
return self.ar[arrow]
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\discopy\cat.py", line 961, in getitem
return self._func(box)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\ansatz\circuit.py", line 118, in _ar
dom, cod = self._ob(box.dom), self._ob(box.cod)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\ansatz\circuit.py", line 63, in _ob
return sum(self.ob_map[Ty(factor.name)] for factor in pg_type)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\ansatz\circuit.py", line 63, in
return sum(self.ob_map[Ty(factor.name)] for factor in pg_type)
KeyError: Ty('p')

How can I resolve this?
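
For context, Ty('p') is the pregroup type behind lambeq's AtomicType.PREPOSITIONAL_PHRASE, so the KeyError suggests the ansatz's ob_map has no entry for it. A hedged fix (the qubit counts below are illustrative):

from lambeq import AtomicType, IQPAnsatz

# Mapping AtomicType.PREPOSITIONAL_PHRASE (i.e. Ty('p')) to a qubit count
# lets the ansatz translate boxes of that type instead of raising KeyError.
ansatz = IQPAnsatz({AtomicType.NOUN: 1,
                    AtomicType.SENTENCE: 1,
                    AtomicType.PREPOSITIONAL_PHRASE: 1},
                   n_layers=1)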

QuantumTrainer.fit(train_dataset) error - examples / quantum_pipeline.ipynb

For either the MC or RP dataset, with training and test data, calling fit(train_dataset, logging_step=12) generates the following error:


ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_34608\45519830.py in <module>
...

C:\Anaconda3\lib\site-packages\lambeq\training\quantum_trainer.py in fit(self, train_dataset, val_dataset, evaluation_step, logging_step)
197 self.model._training = True
198
--> 199 super().fit(train_dataset, val_dataset, evaluation_step, logging_step)
200
201 self.model._training = False

...

C:\Anaconda3\lib\site-packages\discopy\quantum\tk.py in get_counts(self, backend, *others, **params)
131 if compilation is not None:
132 for circuit in (self, ) + others:
--> 133 compilation.apply(circuit)
134 handles = backend.process_circuits(
135 (self, ) + others, n_shots=n_shots, seed=seed)

ValueError: Non-unitary matrix passed to two_qubit_canonical

inference

I'm trying to run the quantum pipeline using the JAX backend. In order to better show results (how each sentence was classified according to the two categories), is there an example of code available for performing inference, as in classic NLP deep-learning approaches (i.e. transformer-based approaches or similar)?

Permission error when using checkpoint

Traceback (most recent call last):
File "C:\Users\elmm\Desktop\CQM\QNLP Depression 2.py", line 20, in
model = TketModel.from_checkpoint('C:/Users/elmm/Desktop/CQM/runs/Jun08_15-09-45_LT-ELMM-T-MY', backend_config=backend_config)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\quantum_model.py", line 106, in from_checkpoint
checkpoint = Checkpoint.from_file(checkpoint_path)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\training\checkpoint.py", line 110, in from_file
with open(path, 'rb') as ckp:
PermissionError: [Errno 13] Permission denied: 'C:/Users/elmm/Desktop/CQM/runs/Jun08_15-09-45_LT-ELMM-T-MY'

I am trying to use a trained model with the checkpoint. I may have done it wrongly; could I ask for some assistance?
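
For context, the traceback shows open() being handed the run directory itself, while from_checkpoint expects the path of the checkpoint file inside it. A hedged fix, assuming the file has the usual model.lt name:

model = TketModel.from_checkpoint(
    'C:/Users/elmm/Desktop/CQM/runs/Jun08_15-09-45_LT-ELMM-T-MY/model.lt',
    backend_config=backend_config)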

Adjectival verbs in Japanese lambeq

Originally posted by @masakiowari in #24 (comment)

Hello! We are now working on Japanese QNLP with lambeq, following the installation instructions in the description of PR #24. We found that the present version of depccg_jp and lambeq cannot handle sentences in which an adjectival verb (Keiyo-Do-Shi) modifies a noun. We give a list of sentences for which lambeq + depccg_jp cannot create any string diagram, e.g.:

感動的な映画を見る
曖昧な表現をする
静かな海を見る
健康な男性が歩く
親切な男性がいる
元気な男性が歩く
上品な表現をする
きれいな海を見る
健やかな男性が歩く
和やかな雰囲気を感じる
穏やかな笑顔を浮かべる
正直な男性がいる
有名な男性がいる
にぎやかな雰囲気を感じる
特別な表現をする
複雑な表現をする
まじめな男性がいる
下手な表現をする
便利な本を買う
朗らかな笑顔を浮かべる
幸せな笑顔を浮かべる
好きなスープを食べる
無理な計画を立てる
暇な男性がいる
必要な計画を立てる
邪魔なものをどかす
変な表現をする
自由な表現をする

We would like to hear from anyone who knows how to solve this problem.

By the way, this problem occurs when we use lambeq versions 0.2.6 and 0.3.1. We installed depccg_jp following the above instructions. Except for sentences including adjectival verbs, depccg_jp + lambeq works very well.

Feature Request : Dataset visualisation

Given the dynamic nature of NLP data points, it would be really useful if we could visualize a dataset based on how noisy and/or rugged it is, to provide a way of countering caveats. Another approach would be to enable a regularization term to allow for a better fit to the dataset.

Issue when running the quantum training example

Hi ,

Ran into an issue when running the quantum trainer: "ValueError: low is out of bounds for int32"

stack trace:

File ~\AppData\Local\Continuum\anaconda3\envs\qnlp\lib\site-packages\lambeq\training\tket_model.py:103, in TketModel.get_diagram_output(self, diagrams)
93 raise ValueError('Weights and/or symbols not initialised. '
94 'Instantiate through '
95 'TketModel.from_diagrams() first, '
96 'then call initialise_weights(), or load '
97 'from pre-trained checkpoint.')
99 lambdified_diagrams = [self._make_lambda(d) for d in diagrams]
100 tensors = Circuit.eval(
101 *[diag_f(*self.weights) for diag_f in lambdified_diagrams],
102 **self.backend_config,
--> 103 seed=self._randint()
104 )
105 self.backend_config['backend'].empty_cache()
106 # discopy evals a single diagram into a single result
107 # and not a list of results

File ~\AppData\Local\Continuum\anaconda3\envs\qnlp\lib\site-packages\lambeq\training\tket_model.py:71, in TketModel._randint(self, low, high)
70 def _randint(self, low=-1 << 63, high=(1 << 63)-1):
---> 71 return np.random.randint(low, high)

Slow performance of trainer

Hi,
I am running the quantum trainer example from the documentation (with AerBackend and TketModel), and the training time for each epoch is almost 1 minute. I would like to know how I can speed it up, and what the most probable bottleneck may be.
I am using 100 samples for the training set and 30 for testing.
Thanks.

UnitaryHACK: Serialise ansatz and include in model

Task Description

lambeq's training pipeline shows that QNLP model training does not occur on the string diagrams themselves, but on the concrete quantum circuits or tensor networks which are created from the string diagrams using ansätze. Akin to classical ML training, the Model and Trainer classes (and subclasses) use multiple hyperparameters to adjust the training procedure.

We would like to treat the choice of ansatz as a training hyperparameter.

Since the choice of ansätze plays a significant role in training, it is something we would like to be able to adjust from Model automatically. Currently, a change in ansatz requires converting the diagrams to circuits / tensor networks using the new ansatz, and instantiating a new Model.

Furthermore, the models have a checkpoint system that allows restoring the model from a previous training run. If we then wanted to use a different ansatz, it may not be possible, since the model only stores the circuits and not the original diagrams.

Completion of this task involves the following changes:

  1. Rewrite the interface of the trainers so that it has an ansatz attribute.
  2. Wherever a (public) method takes circuit or tensor-network input, rewrite it to take in a string diagram input instead. Then, use the ansatz to transform these into circuits. Relevant Model methods include from_diagrams and get_diagram_output.
  3. Rewrite checkpoint saving and loading code to use string diagrams, and to save the ansatz. This will require writing a method to serialise and deserialise ansätze.

Bonus: instead of passing the ansatz as the parameter, rewrite the interface so that we can pass in the ansatz class and the arguments for initialising the ansatz (see the sketch below).
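
A minimal sketch of what ansatz serialisation under the bonus interface could look like; the function names and the dict format are illustrative assumptions, not a proposed final design:

import importlib

def serialise_ansatz(ansatz_cls, init_kwargs):
    # Record the ansatz class by its dotted path together with its init args.
    return {'class': f'{ansatz_cls.__module__}.{ansatz_cls.__qualname__}',
            'kwargs': init_kwargs}

def deserialise_ansatz(data):
    # Re-import the class and rebuild the ansatz from the saved arguments.
    module_name, _, cls_name = data['class'].rpartition('.')
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls(**data['kwargs'])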

Notes

  • Read the training tutorial to learn about lambeq's training infrastructure.
  • lambeq uses circuits (discopy.quantum.Circuit) for quantum models, and tensor networks (discopy.tensor.Diagram) for classical models. Code that is common to both cases uses discopy.tensor.Diagram as the common superclass, since quantum circuits are implemented as a subclass of tensor diagrams. These should be rewritten to use discopy.rigid.Diagram.

real life usage

I'm trying to replicate the binary classifier that is described in the documentation (step 4, complete use case). The accuracy is much lower than what is shown in the documentation, using the training and testing datasets provided by Lorenz et al., 2021 (QNLP).

As suggested, I also tried to run the scripts proposed on their GitHub, in particular rp_task_simulation.ipynb using Google Colab, but I am getting the following error message back:

UnfilteredStackTrace                      Traceback (most recent call last)

<ipython-input-20-0fff64bd88f1> in <module>()
      4     start = time()
----> 5     print('Cost: ', get_cost_jit(rand_unshaped_pars))
      6     print('Time taken for this iteration: ', time()-start)

32 frames

UnfilteredStackTrace: jax._src.errors.TracerArrayConversionError: The numpy.ndarray conversion method __array__() was called on the JAX Tracer object Traced<ShapedArray(float32[])>with<DynamicJaxprTrace(level=0/1)>
While tracing the function get_cost at <ipython-input-18-b448627ced96>:8 for jit, this concrete value was not available in Python because it depends on the value of the argument 'unshaped_params'.
See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.TracerArrayConversionError

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------


The above exception was the direct cause of the following exception:

TracerArrayConversionError                Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/discopy/quantum/gates.py in array(self)
    410     def array(self):
    411         half_theta = self.modules.pi * self.phase
--> 412         sin, cos = self.modules.sin(half_theta), self.modules.cos(half_theta)
    413         return Tensor.np.array([[cos, -1j * sin], [-1j * sin, cos]])
    414 

TracerArrayConversionError: The numpy.ndarray conversion method __array__() was called on the JAX Tracer object Traced<ShapedArray(float32[])>with<DynamicJaxprTrace(level=0/1)>
While tracing the function get_cost at <ipython-input-18-b448627ced96>:8 for jit, this concrete value was not available in Python because it depends on the value of the argument 'unshaped_params'.
See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.TracerArrayConversionError

I have already set the variable IMPORT_JAX = True in discopy's config.py. Any suggestions on how to fix it?

Ansatz for performing amplitude encoding - Enhancement

Description

The choice of the feature map is perhaps the most important choice a QML engineer can make, as it directly dictates the size of the circuit required. There are four common encoding algorithms:

  • Basis Encoding
  • Angle Encoding
  • Amplitude Encoding
  • Arbitrary Encoding

There is a wide array of literature on each of the approaches mentioned, each having its own list of algorithms. However, as a rule of thumb, amplitude encoders are more appropriate for large models, given their logarithmic scaling in the number of qubits with respect to the number of features to be embedded. Currently, lambeq uses IQPAnsatz, based on IQP encoding, which, like angle encoding, requires N qubits to encode N features. For now, we will ignore the depth scaling.

The challenge is to create a new encoder class called AmplitudeAnsatz based on amplitude encoding, motivated by the efficiency of amplitude encoders for high-dimensional data.

Notes

  • You can start by creating an ansatz that does amplitude encoding (e.g., RFV, FRQI, etc.).
  • The amplitude encoder must take into consideration how lambeq currently performs the feature embedding (i.e., using a qubit count dedicated to each AtomicType).
  • The implementation must take into consideration how Model is trained given the embedded circuits (going back to what Model is exactly).

See Also

https://cqcl.github.io/lambeq/tutorials/extend-lambeq.html#Creating-ans%C3%A4tze
https://arxiv.org/pdf/1804.11326.pdf
https://docs.pennylane.ai/en/stable/code/api/pennylane.IQPEmbedding.html
https://qiskit.org/documentation/stubs/qiskit.circuit.library.IQP.html
https://qiskit.org/ecosystem/machine-learning/tutorials/01_neural_networks.html

[Question] What's the purpose of the classical parametrization and classical training ?

Hello, I was wondering what the purpose is of including the classical training and parametrization facilities in a quantum natural language processing library.

I'm not sure how to properly elaborate more on this question. The only thing that comes to mind is that having a classical equivalent of those two steps of the pipeline is useful to properly compare the results, but I'm fairly sure I'm missing something.

P.S. I suggest enabling "Discussions" in the repository's settings, so that people can post their questions there instead of opening an issue.

Feature request: support for texts with multiple sentences

Many NLP tasks require the handling of texts with multiple sentences, with each text treated as a unit and the text meaning formed from the sentence meanings within it. An example is the sentiment classification of movie reviews, each of which is expressed as a text with multiple sentences.

Bobcat fails with extra space tokens

This happens only when tokenising, not with raw strings:

sentence = "Alice  sleeps"

from lambeq import SpacyTokeniser, BobcatParser
tokeniser, parser = SpacyTokeniser(), BobcatParser()
tokens = tokeniser.tokenise_sentence(sentence)
tree = parser.sentence2tree(tokens, tokenised=True)

I get the following error:

ValueError                                Traceback (most recent call last)
File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:291, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    290 try:
--> 291     sentence_input = self._prepare_sentence(sent, tags)
    292     result = self.parser(sentence_input)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:214, in BobcatParser._prepare_sentence(sent, tags)
    212 spans = {(start, end): {id: score for id, score in scores}
    213          for start, end, scores in sent.spans}
--> 214 return Sentence(sent.words, sent_tags, spans)

File <string>:6, in __init__(self, words, input_supertags, span_scores)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/bobcat/parser.py:62, in Sentence.__post_init__(self)
     61 if len(self.words) != len(self.input_supertags):
---> 62     raise ValueError(
     63             '`words` must be the same length as `input_supertags`')

ValueError: `words` must be the same length as `input_supertags`

The above exception was the direct cause of the following exception:

BobcatParseError                          Traceback (most recent call last)
Cell In[17], line 1
----> 1 tree = parser.sentence2tree(tokens, tokenised=True)

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/ccg_parser.py:108, in CCGParser.sentence2tree(self, sentence, tokenised, suppress_exceptions)
    104         raise ValueError('`tokenised` set to `True`, but variable '
    105                          '`sentence` does not have type '
    106                          '`list[str]`.')
    107     sent: list[str] = [str(token) for token in sentence]
--> 108     return self.sentences2trees(
    109                     [sent],
    110                     suppress_exceptions=suppress_exceptions,
    111                     tokenised=tokenised,
    112                     verbose=VerbosityLevel.SUPPRESS.value)[0]
    113 else:
    114     if not isinstance(sentence, str):

File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/lambeq/text2diagram/bobcat_parser.py:298, in BobcatParser.sentences2trees(self, sentences, tokenised, suppress_exceptions, verbose)
    296                 trees.append(None)
    297             else:
--> 298                 raise BobcatParseError(' '.join(sent.words)) from e
    300 for i in empty_indices:
    301     trees.insert(i, None)

BobcatParseError: Bobcat failed to parse 'Alice   sleeps'.
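
A hedged workaround until this is fixed upstream, assuming tokenise_sentence returns a list of strings: drop the whitespace-only tokens the tokeniser produces for runs of spaces before passing the list to Bobcat:

from lambeq import SpacyTokeniser, BobcatParser

tokeniser, parser = SpacyTokeniser(), BobcatParser()
tokens = tokeniser.tokenise_sentence("Alice  sleeps")
tokens = [t for t in tokens if t.strip()]  # remove whitespace-only tokens
tree = parser.sentence2tree(tokens, tokenised=True)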

BobCat fails to parse with extra addition of "the" to a sentence.

I have been trying to run this code:

import warnings
warnings.filterwarnings("ignore")
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from discopy.grammar.pregroup import Spider, Ty, Id, Box, Diagram, Word
from lambeq import DepCCGParser, pregroups

from lambeq import TreeReader, TreeReaderMode

reader = TreeReader(mode=TreeReaderMode.RULE_ONLY)

sent = "Avoiding processed and sugary foods is important for reducing the risk of chronic diseases such as diabetes , obesity , and heart disease ."

tree_diagram = reader.sentence2diagram(sent,suppress_exceptions=False)
print(tree_diagram)

which gives the error: Illegal use of UNK: unknown CCG rule.

However, if we run the same program with the following sentence, after removing "the", it runs successfully.

...
sent = "Avoiding processed and sugary foods is important for reducing risk of chronic diseases such as diabetes , obesity , and heart disease ."
...

My attempt to debug the issue has reached a point where the conjunction in "and sugary food" was causing the error, by having a mismatched CCGRule type in the __call__ function of the CCGRule class. Not sure why this is happening or how to solve it. Any ideas?

Thanks!

UnitaryHACK: Implement and test gradient free optimizer for circuit models

Task description

To enhance the performance of our quantum models, we are currently investigating new optimizers for lambeq. One promising candidate is the Rotosolve algorithm (Ostaszewski et al.). Rotosolve is a gradient-free optimizer that leverages the fact that the expectation values of circuit measurements are sinusoidal with respect to the circuit parameters. Hence, Rotosolve can find the optimal parameters without calculating gradients, making it particularly well suited for optimization in the context of variational quantum algorithms in the NISQ era.

Your task is to implement the Rotosolve algorithm in lambeq, using the Optimizer interface. The following steps can be used as a guideline:

  1. Familiarise yourself with the theory behind Rotosolve by reading the paper by Ostaszewski et al.
  2. Investigate other optimizers in lambeq, e.g. the SPSAOptimizer
  3. Implement a new class called RotosolveOptimizer which inherits from Optimizer (the per-parameter update is sketched after this list).
  4. Write unit tests to validate the functionality and performance of the Rotosolve optimizer.
  5. Provide clear documentation and examples on how to use the Rotosolve optimizer within lambeq.
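
For orientation, a minimal NumPy sketch of the closed-form update Rotosolve performs, standalone and not tied to the Optimizer interface the task asks for; three loss evaluations per parameter give the exact minimiser of the fitted sinusoid:

import numpy as np

def rotosolve_sweep(loss, params):
    # One sweep: update each parameter to the closed-form minimiser of the
    # loss restricted to that parameter (Ostaszewski et al.).
    params = np.array(params, dtype=float)
    for d in range(len(params)):
        phi = params[d]
        m_0 = loss(params)                 # loss at theta_d = phi
        params[d] = phi + np.pi / 2
        m_plus = loss(params)              # loss at phi + pi/2
        params[d] = phi - np.pi / 2
        m_minus = loss(params)             # loss at phi - pi/2
        params[d] = phi - np.pi / 2 - np.arctan2(
            2 * m_0 - m_plus - m_minus, m_plus - m_minus)
    return params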

Notes

  • Rotosolve is designed to minimise the expectation values of Hamiltonian measurements. Therefore, it won't work with classical losses like binary cross-entropy. Assume that we are only optimizing a small number of variational quantum circuits where the loss is a linear combination of measurement expectation values.
  • Rotosolve is an optimizer that does not keep an internal state. Hence, there is no need to provide any functionality for the state_dict and load_state_dict methods.
  • The Optimizer interface has an attribute called gradient. Although Rotosolve is gradient-free, you can use the attribute to store the parameter updates.

Alternative

If you don't manage to implement and test the RotosolveOptimizer, you can also implement a different (gradient-free) optimizer of your choice. Some ideas are COBYLA, Nelder-Mead, conjugate gradient...

error during installation

When I try installing using sh <(curl 'https://cqcl.github.io/lambeq/install.sh'), I get the following error:

ERROR: Cannot install lambeq[depccg]==0.1.0, lambeq[depccg]==0.1.1 and lambeq[depccg]==0.1.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    lambeq[depccg] 0.1.2 depends on depccg==1.1.0; extra == "depccg"
    lambeq[depccg] 0.1.1 depends on depccg==1.1.0; extra == "depccg"
    lambeq[depccg] 0.1.0 depends on depccg==1.1.0; extra == "depccg"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

I am on a 2020 MacBook Air (with an Apple M1 chip), and using conda with python=3.8.11. Could any of that be causing the problem?

Error from_tk

Discussed in #49

Originally posted by JVM1982 October 10, 2022
Hello.

Why does the following code not work?

sentence = 'person runs program .'
diagram = remove_cups( parser.sentence2diagram( sentence ) )
circuit = ansatz( diagram )
print( model( [ circuit ] ) ) # OK #
print( model( [ from_tk( circuit.to_tk() ) ] ) ) # ERROR #

[[0.14473685 0.85526315]]
Unexpected exception formatting exception. Falling back to standard exception

Traceback (most recent call last):
File "/home/javier.valera/.local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_16444/2916265434.py", line 9, in <module>
print( model( [ from_tk( circuit.to_tk() ) ] ) )
File "/home/javier.valera/.local/lib/python3.8/site-packages/lambeq/training/model.py", line 59, in __call__
return self.forward(*args, **kwds)
File "/home/javier.valera/.local/lib/python3.8/site-packages/lambeq/training/tket_model.py", line 131, in forward
return self.get_diagram_output(x)
File "/home/javier.valera/.local/lib/python3.8/site-packages/lambeq/training/tket_model.py", line 101, in get_diagram_output
*[diag_f(*self.weights) for diag_f in lambdified_diagrams],
File "/home/javier.valera/.local/lib/python3.8/site-packages/lambeq/training/tket_model.py", line 101, in <listcomp>
*[diag_f(*self.weights) for diag_f in lambdified_diagrams],
File "/home/javier.valera/.local/lib/python3.8/site-packages/discopy/monoidal.py", line 509, in <lambda>
return lambda xs: self.id(self.dom).then((
File "/home/javier.valera/.local/lib/python3.8/site-packages/discopy/monoidal.py", line 510, in <genexpr>
self.id(left) @ box.lambdify(*symbols, **kwargs)(*xs)
File "/home/javier.valera/.local/lib/python3.8/site-packages/discopy/quantum/gates.py", line 321, in <lambda>
return lambda *xs: type(self)(c_fn(*xs), distance=self.distance)
File "/home/javier.valera/.local/lib/python3.8/site-packages/discopy/quantum/gates.py", line 448, in <lambda>
return lambda *xs: type(self)(data(*xs))
File "", line 2, in _lambdifygenerated
return runs__n.r@[email protected]_0
NameError: name 'runs__n' is not defined

Thanks.

Problem with BinaryCrossEntropyLoss

Originally posted by @yousrabou in #83 (comment)

Using:

  • lambeq : 0.3.1
  • pytket : 1.14.0
  • pytket-qiskit : 0.38.0
  • and the BinaryCrossEntropyLoss function for training the RP dataset in a Google Colab environment

the following error is produced:

ValueError                                Traceback (most recent call last)
<ipython-input-16-04c3311cae81> in <cell line: 1>()
----> 1 trainer.fit(train_dataset, logging_step=12)

7 frames
/usr/local/lib/python3.10/dist-packages/lambeq/training/loss.py in _match_shapes(self, y1, y2)
     64                       y2: np.ndarray | jnp.ndarray) -> None:
     65         if y1.shape != y2.shape:
---> 66             raise ValueError('Provided arrays must be of equal shape. Got '
     67                              f'arrays of shape {y1.shape} and {y2.shape}.')
     68 

ValueError: Provided arrays must be of equal shape. Got arrays of shape (30,) and (30, 2).

cfg: other languages compatibility

I am interested in adapting lambeq (and consequently DisCoCat) to languages other than English, particularly Italian. As far as I can read from the documentation, from a linguistic point of view the framework is based on CFG grammars, a theoretical formalism I know very well. How is it possible to visualize and extract the structures and formalisms of these grammars from the library, so that they can be extended/modified?

Please migrate to DisCoPy 1.0

DisCoPy 0.5 is old and is not compatible with DisCoPy 1.0, the current, officially supported version. lambeq breaks when it is used with DisCoPy 1.0.

Feature Request : GPU-based dedicated environment to run the Lambeq models

Motivation:
Given that none of the currently existing platforms support the lambeq packages out of the box, it would be nice if there were a dedicated environment with a GPU-enabled compiler to allow users to train models using TKET, and a WSL interpreter for running WebParser. This environment would have the imports pre-installed by default and allow .txt or .csv file imports for the dataset.

Downloadable Checkpoints:
This way, users can train models with high speed according to their preferences, and eliminate any variability due to different IDEs and/or interpreters which may potentially disrupt the model. This would allow users to outsource the training process to a GPU-based environment and simply download the model.lt checkpoint to use locally. Given the necessity of a WSL interpreter for running WebParser, the user could also download a .pkl of the WebParsed diagrams to use locally, as an additional feature.

Error Suggestion :
There could also be a suggestion system based on errors, like KeyError: Ty('p'), which essentially indicates the need for an AtomicType.PREPOSITIONAL_PHRASE, or the trainer operand issue, which indicates that some diagrams have a different number of output wires, and provide suggestions for changing the dataset accordingly.

Proposed Feature Summary:
Given how each online environment requires users to pip install the libraries (currently an issue with platforms such as Colab and Deepnote), the environment would have the lambeq and DisCoPy packages pre-installed, and run on a Linux OS with WSL as the default interpreter. Since TKET is currently very slow for intermediate-scale models, a GPU feature would be incredibly useful. And lastly, it would have a function to generate downloadable files for the model.lt checkpoint and the WebParser diagrams.

BobCatParser didn't get downloaded!!

Greetings, lambeq team.
I was able to use BobcatParser until yesterday, when I started getting this problem every time I tried to execute this line of code.

What should I do to correct this error?

----- My code -----
1 from lambeq import BobcatParser
2
----> 3 parser = BobcatParser(verbose='text')
4
5 raw_train_diagrams = parser.sentences2diagrams(train_data, suppress_exceptions=True)

----- The error -----
2 frames
/usr/local/lib/python3.9/dist-packages/lambeq/text2diagram/bobcat_parser.py in __init__(self, model_name_or_path, root_cats, device, cache_dir, force_download, verbose, **kwargs)
265 raise ValueError('Invalid model name or path: '
266 f'{model_name_or_path!r}')
--> 267 download_model(model_name_or_path, model_dir, verbose)
268
269 with open(model_dir / 'pipeline_config.json') as f:

/usr/local/lib/python3.9/dist-packages/lambeq/text2diagram/bobcat_parser.py in download_model(model_name, model_dir, verbose)
144 if verbose != VerbosityLevel.SUPPRESS.value:
145 print('Extracting model...')
--> 146 tar = tarfile.open(fileobj=model_file)
147 tar.extractall(model_dir)
148 model_file.close()

/usr/lib/python3.9/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
1623 fileobj.seek(saved_pos)
1624 continue
-> 1625 raise ReadError("file could not be opened successfully")
1626
1627 elif ":" in mode:

ReadError: file could not be opened successfully
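
For context, the ReadError suggests the cached model archive is corrupted or was only partially downloaded. The __init__ signature visible in the traceback includes a force_download flag; a hedged fix is to re-download the model:

from lambeq import BobcatParser

# Re-fetch the model archive, replacing any corrupted cached copy.
parser = BobcatParser(verbose='text', force_download=True)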

Error with Japanese Sentence Parsing Using DepCCG in Lambeq

Hello,

I apologize in advance for any mistakes in my English, as it is not my native language.

I am using lambeq with DepCCG parser to process Japanese sentences and I encountered an error similar to what was reported in issue #99.

While attempting to resolve this, I noticed a discrepancy in the rule symbols between ccg_rule.py in lambeq and ja.py in DepCCG. For instance, the backward_composition rule in ccg_rule.py is denoted as BC or <B, whereas in ja.py it's represented as bx or <b1. To address this, I adjusted ja.py to match the notation in ccg_rule.py. As a result, I was able to process the sentence "ボブはおいしくないカレーが嫌いではない" (Bob does not dislike curry that is not delicious).

However, when I tried to process another sentence, "親切な男性がいる" (There is a kind man), I encountered a CCGRuleUseError with the message 'unknown CCG rule'.

Upon investigating this issue further by inputting the sentence directly into DepCCG, I believe the problem lies with the ADNint rule, which is unique to Japanese and maps S\NP to NP/NP.

I would like to know if it's possible to add this rule to lambeq, and if so, where should I add it? Any guidance on this would be greatly appreciated.

Thank you in advance for your help.

Error when running Parser

Below is the code I ran to test the parser:
from lambeq import BobcatParser

parser = BobcatParser()
diagram = parser.sentence2diagram('This is a test sentence')
diagram.draw()

and this is the error I received when running it:
2022-05-22 20:19:08.041411: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-05-22 20:19:08.042271: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1342, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1010, in _send_output
self.send(msg)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 950, in send
self.connect()
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1424, in connect
self.sock = self._context.wrap_socket(self.sock,
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1040, in _create
self.do_handshake()
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\elmm\Desktop\CQM\QNLP Depression 2.py", line 3, in
parser = BobcatParser()
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\text2diagram\bobcat_parser.py", line 258, in init
download_model(model_name_or_path, model_dir, verbose)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\text2diagram\bobcat_parser.py", line 130, in download_model
model_file, headers = urlretrieve(url, reporthook=progress_bar.update)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 239, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 517, in open
response = self._open(req, data)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 534, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1385, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1345, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)>

How can I resolve this?
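A hedged workaround, assuming the root cause is an outdated certificate store on the local machine (this error became common after the Let's Encrypt root certificate expired): either refresh the machine's CA certificates, or, as an insecure last resort, disable verification just for the model download:

import ssl

# INSECURE stop-gap: skip certificate verification for urllib downloads.
# Use only to fetch the model; prefer updating the system certificates.
ssl._create_default_https_context = ssl._create_unverified_context

from lambeq import BobcatParser
parser = BobcatParser()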

Training models is IMPOSSIBLE!!

Dear lambeq team,

I'm unable to train models (spider, cups, etc.) because this error occurs frequently!!

---------------------------------------------------------Code-------------------------------------------------------------------------------------

     49 val_dataset= Dataset(val_circuits, val_labels, shuffle=False)
---> 50 Spiders_trainer.fit(train_dataset, val_dataset, logging_step=12)

---------------------------------------------------------Error----------------------------------------------------------------------------------
8 frames
/usr/local/lib/python3.9/dist-packages/lambeq/training/quantum_trainer.py in fit(self, train_dataset, val_dataset, evaluation_step, logging_step)
    197             val_dataset: Optional[Dataset] = None,
    198             evaluation_step: int = 1,
--> 199             logging_step: int = 1) -> None:
    200 
    201         self.model._training = True

/usr/local/lib/python3.9/dist-packages/lambeq/training/trainer.py in fit(self, train_dataset, val_dataset, evaluation_step, logging_step)
    373                     x, y_label = batch
    374                     y_hat, loss = self.training_step(batch)
--> 375                     if (self.evaluate_on_train
    376                             and self.evaluate_functions is not None):
    377                         for metr, func in self.evaluate_functions.items():

/usr/local/lib/python3.9/dist-packages/lambeq/training/quantum_trainer.py in training_step(self, batch)
    161         -------
    162         Tuple of np.ndarray and float
--> 163             The model predictions and the calculated loss.
    164 
    165         """

/usr/local/lib/python3.9/dist-packages/lambeq/training/spsa_optimizer.py in backward(self, batch)
    139         xminus = self.project(x - self.ck * delta)
    140         self.model.weights = xminus
--> 141         y1 = self.model(diagrams)
    142         loss1 = self.loss_fn(y1, targets)
    143         if self.bounds is None:

/usr/local/lib/python3.9/dist-packages/lambeq/training/quantum_model.py in __call__(self, *args, **kwargs)
    144     @abstractmethod
    145     def forward(self, x: list[Diagram]) -> Any:
--> 146         """Compute the forward pass of the model using
    147         `get_model_output`
    148 

/usr/local/lib/python3.9/dist-packages/lambeq/training/tket_model.py in forward(self, x)
    131 
    132         """
--> 133         return self.get_diagram_output(x)

/usr/local/lib/python3.9/dist-packages/lambeq/training/tket_model.py in get_diagram_output(self, diagrams)
    100 
    101         lambdified_diagrams = [self._make_lambda(d) for d in diagrams]
--> 102         tensors = Circuit.eval(
    103             *[diag_f(*self.weights) for diag_f in lambdified_diagrams],
    104             **self.backend_config,

/usr/local/lib/python3.9/dist-packages/discopy/quantum/circuit.py in eval(self, backend, mixed, contractor, *others, **params)
    286             return type(box)(box.dom, box.cod, box.array + 0j)
    287         circuits = [circuit.to_tk() for circuit in (self, ) + others]
--> 288         results, counts = [], circuits[0].get_counts(
    289             *circuits[1:], backend=backend, **params)
    290         for i, circuit in enumerate(circuits):

/usr/local/lib/python3.9/dist-packages/discopy/quantum/tk.py in get_counts(self, backend, *others, **params)
    131         if compilation is not None:
    132             for circuit in (self, ) + others:
--> 133                 compilation.apply(circuit)
    134         handles = backend.process_circuits(
    135             (self, ) + others, n_shots=n_shots, seed=seed)

ValueError: Non-unitary matrix passed to two_qubit_canonical
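One plausible diagnosis, offered as a guess: two_qubit_canonical receives a non-unitary matrix when circuit parameters become NaN or infinite during SPSA updates. A quick sanity check (model here stands for whatever model object was passed to the trainer):

import numpy as np

# If any weight is non-finite, the compiled gates cannot be unitary;
# a smaller step size or parameter bounds may then help.
print(np.isfinite(model.weights).all())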

Trainer.fit stuck at Epoch 1!

trainer.fit(train_dataset, val_dataset, evaluation_step=1, logging_step=100)
Epoch 1: train/loss: 0.8875 valid/loss: 1.5880 train/acc: 0.6286 valid/acc: 0.5000
But it stops here and doesn't proceed any further!


WebParser error

Hi,
I'm trying to run the WebParser, but, very similarly to the BobcatParser issue, it gives this error:
Traceback (most recent call last):
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1342, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1010, in _send_output
self.send(msg)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 950, in send
self.connect()
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\http\client.py", line 1424, in connect
self.sock = self._context.wrap_socket(self.sock,
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1040, in _create
self.do_handshake()
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\elmm\Desktop\CQM\QNLP_test.py", line 6, in
new_diagram = parser.sentence2diagram('he was overtaken by the depression.')
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\text2diagram\ccg_parser.py", line 227, in sentence2diagram
return self.sentences2diagrams(
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\text2diagram\ccg_parser.py", line 157, in sentences2diagrams
trees = self.sentences2trees(sentences,
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\text2diagram\web_parser.py", line 159, in sentences2trees
raise e
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\site-packages\lambeq\text2diagram\web_parser.py", line 148, in sentences2trees
with urlopen(url) as f:
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 517, in open
response = self._open(req, data)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 534, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1385, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "C:\Users\elmm\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 1345, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)>

Process finished with exit code 1

I really need to get this working, because Bobcat fails to parse these sentences into diagrams. Can someone please help me with this?
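This looks like the same certificate-expiry failure as in the BobcatParser issue above, so the same hedged workaround applies: refresh the machine's CA certificates, or, insecurely, disable verification before calling the parser:

import ssl

# INSECURE stop-gap, as in the BobcatParser issue above.
ssl._create_default_https_context = ssl._create_unverified_context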

Install instruction fails on zsh

Problem

When I run the command pip install lambeq[depccg] in a zsh shell on Ubuntu 18.04 with pip=21.3, I get the following error message:
zsh: no matches found: lambeq[depccg].

Suggested solution

Use quotes on the package name:
pip install "lambeq[depccg]"
