
pytorch-crf's Introduction

pytorch-crf

Conditional random field in PyTorch.

[Badges: Python versions, PyPI project, build status, documentation status, code coverage, license, built with Spacemacs]

This package provides an implementation of a linear-chain conditional random field (CRF) in PyTorch. The implementation borrows mostly from the AllenNLP CRF module <https://github.com/allenai/allennlp/blob/master/allennlp/modules/conditional_random_field.py> with some modifications.

Documentation

https://pytorch-crf.readthedocs.io/

License

MIT

Contributing

Contributions are welcome! Please follow these instructions to install dependencies and run the tests and linter.

Installing dependencies

Make sure you set up a virtual environment with Python. Then, install all the dependencies in the requirements.txt file and install this package in development mode.

pip install -r requirements.txt
pip install -e .

Setup pre-commit hook

Simply run:

ln -s ../../pre-commit.sh .git/hooks/pre-commit

Running tests

Run pytest in the project root directory.

Running linter

Run flake8 in the project root directory. This will also run mypy, thanks to the flake8-mypy package.

pytorch-crf's People

Contributors

aravindmahadevan, fuzihaofzh, jeppehallgren, kmkurn, xu-song


pytorch-crf's Issues

Error in _compute_score when working with a batch

I got an error on line 186: score = self.start_transitions[tags[0]]
The error is
IndexError: The shape of the mask [32] at index 0 does not match the shape of the indexed tensor [21] at index 0
Here, 32 is the size of my batch and 21 is the number of tags.

raise ValueError('mask of the first timestep must all be on')

What exactly does this ValueError mean?
This is the mask tensor (shape [6, 512]) I am using:

tensor([[0, 1, 1,  ..., 0, 0, 0],
        [0, 1, 0,  ..., 1, 1, 0],
        [0, 1, 1,  ..., 0, 0, 0],
        [0, 1, 1,  ..., 0, 0, 0],
        [0, 1, 1,  ..., 1, 1, 0],
        [0, 1, 1,  ..., 0, 0, 0]], dtype=torch.uint8)

I assume it means that all 6 sequences should begin with the value 1, not 0.
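That is exactly what the check enforces: every sequence in the batch must start with an unmasked timestep. A minimal sketch of a valid mask, assuming batch_first=True and right-padded sequences (the shapes here are illustrative):

import torch
from torchcrf import CRF

crf = CRF(num_tags=5, batch_first=True)
# Every sequence starts with an unmasked position; padding sits at the end.
mask = torch.tensor([
    [1, 1, 1, 0, 0],  # length-3 sequence, right-padded
    [1, 1, 0, 0, 0],  # length-2 sequence, right-padded
], dtype=torch.uint8)
assert mask[:, 0].all()  # this is the condition the ValueError checks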

Error with _validate on gpu

I'm using this model with Python 3.6.5 and PyTorch 1.0.1 in Docker. Here is the traceback:

  ...
  File "/share/E4G0/models/up_crf.py", line 25, in forward
    scores = self.crf(emissions, target_tags, input_masks.long())
  File "/home/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/lib/python3.6/site-packages/torchcrf/__init__.py", line 90, in forward
    self._validate(emissions, tags=tags, mask=mask)
  File "/home/anaconda3/lib/python3.6/site-packages/torchcrf/__init__.py", line 165, in _validate
    no_empty_seq_bf = self.batch_first and mask[:, 0].all()
RuntimeError: _th_all is not implemented for type torch.cuda.LongTensor

It seems that mask[:, 0].all() doesn't work on a cuda.LongTensor.
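A workaround consistent with the traceback would be to pass the mask as a ByteTensor rather than a LongTensor, since .all() is implemented for (cuda) byte tensors and the CRF expects a uint8 mask anyway; a sketch against the call in the traceback:

# Sketch: pass the 0/1 mask as uint8 instead of long.
scores = self.crf(emissions, target_tags, input_masks.byte())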

[Question] Actual usage examples?

Besides the toy examples listed in the docs and tests, are there actual examples of this library available anywhere?

I'm interested in using this library for a sequence labeling project, but I'm curious to know if I'm using this library correctly. What I have is something like this:

class MyModel(nn.Module):
    def __init__(self, num_features, num_classes):
        super(MyModel, self).__init__()
        self.num_features = num_features
        self.num_classes = num_classes
        self.lstm = nn.LSTM(num_features, 128)
        self.fc = nn.Linear(128, num_classes)
        self.crf = CRF(num_classes)

# ----------------------------------------------------------
model = MyModel(...)

# Training loop:
y_hat = model(batch)  # The network's forward returns fc(lstm(batch))
loss = -model.crf(y_hat, y)
loss.backward()
optimizer.step()

Although this seems to work and the loss is decreasing, I have a feeling that I might be missing something.
Any help is appreciated. Thanks!
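For reference, a sketch of one common way to wire this up, under the same assumptions as the snippet above; the mask tensor is an addition for padded batches, and optimizer.zero_grad() is included for completeness:

# Training step (sketch): the CRF's forward returns the log likelihood,
# so the loss is its negation, exactly as in the snippet above.
emissions = model(batch)                    # fc(lstm(batch))
loss = -model.crf(emissions, y, mask=mask)  # mask: (seq_length, batch_size), dtype uint8
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference (sketch): Viterbi decoding returns the best tag sequence
# for each batch element as a list of lists.
best_tags = model.crf.decode(emissions, mask=mask)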

START TAG and STOP TAG?

Hello, your work on pytorch-crf impressed me a lot, but I have a question: should my tag set include a START tag and a STOP tag? I am confused about that.
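For what it's worth, the CRF module carries its own start_transitions and end_transitions parameters (they come up in other issues in this tracker), which play the role of START/STOP scores, so the tag set passed to the constructor would normally contain only the real tags. A sketch with a hypothetical tag set:

from torchcrf import CRF

# Hypothetical tag set: no explicit START/STOP entries are needed; the
# module's start_transitions / end_transitions parameters cover them.
tag2id = {'O': 0, 'B-PER': 1, 'I-PER': 2}
crf = CRF(len(tag2id))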

Loss not decreasing!

I use the CRF as my model's loss, as in issue #29, but I found the loss didn't decrease! When I replaced it with BCEWithLogitsLoss, the loss decreased.
I have y: (seq_length,) and y_pred: (seq_length, num_classes). Here is my code:

# features is a list here
for epoch in range(epochs):
    for fea in features:
        y_pred = model(fea)
        y_pred = y_pred.reshape(y_pred.shape[0], 1, -1)
        y = y.reshape(y.shape[0], -1)
        loss = -model_gat.crf(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

I wonder whether I'm using the library incorrectly?
Looking for your help, thanks!

RuntimeError: tensors used as indices must be long or byte tensors

I just followed the example in the documentation, but I got this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torchcrf/__init__.py", line 102, in forward
    numerator = self._compute_score(emissions, tags, mask)
  File "/opt/conda/lib/python3.6/site-packages/torchcrf/__init__.py", line 187, in _compute_score
    score += emissions[0, torch.arange(batch_size), tags[0]]
RuntimeError: tensors used as indices must be long or byte tensors

code:

import torch
from torchcrf import CRF

num_tags = 5
model = CRF(num_tags)
seq_length = 3  # maximum sequence length in a batch
batch_size = 2  # number of samples in the batch
emissions = torch.randn(seq_length, batch_size, num_tags)
tags = torch.tensor([
    [0, 1], [2, 4], [3, 1]
], dtype=torch.long)  # (seq_length, batch_size)
model(emissions, tags)

Unexpected IndexError as a result of dtype change

I noticed this in my own code and spent some time debugging it. Here's a version of the code in the docs as an example:

import torch
from torchcrf import CRF
num_tags = 5 
model = CRF(num_tags)

seq_length = 3 
batch_size = 2 
emissions = torch.randn(seq_length, batch_size, num_tags)
tags = torch.tensor([
  [0, 1], [2, 4], [3, 1]
], dtype=torch.uint8)

log_likelihood = model(emissions, tags)

The change in the dtype parameter of tags from torch.long to torch.uint8 introduces the following error:


IndexError: The shape of the mask [2] at index 0 does not match the shape of the indexed tensor [5] at index 0

Which is unexpected behavior, as far as I can tell.
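The underlying cause appears to be that PyTorch interprets a uint8 tensor used as an index as a boolean mask rather than as integer indices, which is where the "shape of the mask" wording comes from; keeping the tags as torch.long avoids it. A one-line sketch of the fix:

# Cast the tags back to long before calling the CRF; uint8 index
# tensors are treated as boolean masks, triggering the IndexError above.
log_likelihood = model(emissions, tags.long())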

Allow parameters to be manually set

Currently, the only way to manually update the CRF state (such as start_transitions) is to access its internal memory directly (crf.start_transitions.data = my_new_parameters). Let's consider whether there's a better way to achieve this, either through a setter function (see #18) or by allowing the parameters to be specified in the model constructor.
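Until such an API exists, a slightly tidier variant of the direct-access approach is an in-place copy under torch.no_grad(); a sketch, where my_new_parameters is assumed to be a tensor of shape (num_tags,):

import torch

with torch.no_grad():
    # Overwrite the learned values without recording the copy in autograd.
    crf.start_transitions.copy_(my_new_parameters)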

Get the score for the best_tags_list

Hello!

Thank you for your awesome work.

I know this question has been asked before, but somehow I cannot manage to get the score for the best_tags_list.

In issue #48 you said that manipulating forward would do the job, but I am not sure how to do that.

I thought that modifying the _viterbi_decode function would provide the score of the best sequence. I have actually printed score inside _viterbi_decode but still haven't found what I want.

Can you please be more specific about how to get the score for the best_tags_list?

Thanks
Regards
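One workable approach, sketched under the assumption of a release that has the reduction keyword (other issues in this tracker use it): feed the decoded tags back through forward, which returns log p(tags | emissions), i.e. the score of exactly that sequence minus the normalizer:

import torch

best_tags_list = crf.decode(emissions, mask=mask)
# Pad the decoded lists back into a (seq_length, batch_size) tensor;
# the pad value 0 is arbitrary because those positions are masked.
seq_length, batch_size = emissions.shape[:2]
best_tags = torch.zeros(seq_length, batch_size, dtype=torch.long)
for i, seq in enumerate(best_tags_list):
    best_tags[:len(seq), i] = torch.tensor(seq)
# One log-score per batch element for the decoded sequence.
log_score = crf(emissions, best_tags, mask=mask, reduction='none')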

Unchecked Initialize params in Init

When training the torchcrf module (version 0.3.1) with PyTorch version 0.2.4, I encounter a NaN loss from the forward computation, possibly because the CRF parameters were not initialized beforehand.

Here is the log when passing one instance from a 300-dim LSTM module through nn.Linear and then to the torchcrf module:

lstm hidden :
2017-12-29 09:07:17,238 - INFO - root - Variable containing:
( 0 ,.,.) =
[300 columns of per-token values elided; note that several columns are exactly 0.0000 across all three tokens]
[torch.FloatTensor of size 1x3x300]

logits / emission : 
2017-12-29 09:07:17,242 - INFO - root - Variable containing:
(0 ,.,.) =
 -0.0727 -0.1339 -0.0356  0.0746  0.0122 -0.0317 -0.0317  0.0196 -0.0424  0.0669  0.0864  0.0406  0.0853
 -0.1730 -0.0573  0.1282 -0.0520 -0.2719 -0.1338  0.1333  0.0369 -0.0064  0.1540  0.0856 -0.1543  0.2038
 -0.0886 -0.0453  0.0726 -0.2096  0.1002 -0.1149  0.0496  0.1408  0.1913  0.0946 -0.0216 -0.0473  0.1621
[torch.FloatTensor of size 1x3x13]

loss : 
2017-12-29 09:07:17,244 - INFO - root - Variable containing:
nan
[torch.FloatTensor of size 1]

issue with imbalanced data

Firstly, thank you for sharing the code and making it easy to use.
I'm using a CRF to classify EEG data, as the labels are sequential and have dependencies.

However, the labels are imbalanced, and the CRF seems to just produce the labels of the majority class.
Oversampling is not appropriate in this case, so I wonder if you have a solution or suggestion for this issue.

Thanks.

Add `size_average` argument in `forward`

This is to follow PyTorch's convention: when reduce is True, size_average being True means averaging the log likelihood over the batch and summing it otherwise; when reduce is False, size_average is ignored.
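Later releases expose this as a single reduction keyword (other issues in this tracker already use reduction='token_mean'); a usage sketch under that assumption:

llh_sum = crf(emissions, tags, mask=mask, reduction='sum')          # summed over the batch
llh_mean = crf(emissions, tags, mask=mask, reduction='mean')        # averaged over the batch
llh_tok = crf(emissions, tags, mask=mask, reduction='token_mean')   # averaged over tokens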

Something wrong

Hi, this is a nice project, but I've run into a problem. I followed the guide, but something went wrong:
[screenshot of the error message]
Could you help me? Thanks.

pytorch : 1.0.0
python: 3.6

A question about the "denominator" computed by _compute_normalizer

Thank you for your awesome work. I am confused about how the denominator is calculated.
Why do you need to perform this operation at each step of the loop?
https://github.com/kmkurn/pytorch-crf/blob/master/torchcrf/__init__.py#L245

next_score = torch.logsumexp(next_score, dim=1)

I think this operation could be performed once at the end of the loop rather than inside it.
I am confused about this. Can you give me some explanation?
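The short answer is that the logsumexp cannot be moved out of the loop: each step's scores are built by broadcasting against the already-collapsed scores of the previous step, which is what keeps the cost linear in the sequence length rather than exponential in it. A simplified sketch of the recursion the normalizer implements (mask handling omitted; this is a reading of the linked code, not a drop-in replacement):

import torch

def compute_normalizer(emissions, transitions, start_transitions, end_transitions):
    # emissions: (seq_length, batch_size, num_tags)
    seq_length = emissions.size(0)
    score = start_transitions + emissions[0]          # (batch, num_tags)
    for i in range(1, seq_length):
        # (batch, prev_tag, 1) + (prev_tag, next_tag) + (batch, 1, next_tag)
        next_score = score.unsqueeze(2) + transitions + emissions[i].unsqueeze(1)
        # Collapse over the previous tag *now*: the next iteration
        # broadcasts against this collapsed (batch, num_tags) score.
        score = torch.logsumexp(next_score, dim=1)
    return torch.logsumexp(score + end_transitions, dim=1)  # (batch,)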

The decoding module does not support multiple GPUs.

The decode function does not support multiple GPUs (see torchcrf/__init__.py#L117); the following bug appears on multiple GPUs:

File "/root/anaconda2/envs/pytorch1.0/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument # 1 must support iteration

Looking forward to your reply.

Loss decreases but f1 score remains unchanged

This is my model code. I want to fine-tune a pretrained language model (XLM from Facebook) for NER tasks, so I attached a BiLSTM and a CRF.

class XLM_BiLSTM_CRF(nn.Module):
    def __init__(self, config, num_labels, params, dico, reloaded):
        super().__init__()
        self.config = config
        self.num_labels = num_labels
        self.batch_size = config.batch_size
        self.hidden_dim = config.hidden_dim

        self.xlm = TransformerModel(params, dico, True, True)
        self.xlm.eval()
        self.xlm.load_state_dict(reloaded['model'])

        self.lstm = nn.LSTM(config.embedding_dim, config.hidden_dim // 2,
                            num_layers=1, bidirectional=True)
        self.dropout = nn.Dropout(config.dropout)
        self.classifier = nn.Linear(config.hidden_dim, config.num_class)
        self.apply(self.init_bert_weights)
        self.crf = CRF(config.num_class)

    def forward(self, word_ids, lengths, langs=None, causal=False):
        sequence_output = self.xlm('fwd', x=word_ids, lengths=lengths, causal=False).contiguous()
        sequence_output, _ = self.lstm(sequence_output)
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)
        return self.crf.decode(logits)

    def log_likelihood(self, word_ids, lengths, tags):
        sequence_output = self.xlm('fwd', x=word_ids, lengths=lengths, causal=False).contiguous()
        sequence_output, _ = self.lstm(sequence_output)
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)
        return - self.crf(logits, tags.transpose(0,1))

    def init_bert_weights(self, module):
        """ Initialize the weights.
        """
        if isinstance(module, (nn.Linear, nn.Embedding)):
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if isinstance(module, nn.Linear) and module.bias is not None:
            module.bias.data.zero_()

And this is my training code.

def train(model, train_iter, dev_iter, params):
    for param in model.xlm.parameters():  ## freeze layers
        param.requires_grad = False

    optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()),
                           lr=0.003, betas=(0.9, 0.999), eps=1e-08, weight_decay=1e-4)
    iteration, best_f1 = 0, 0

    for epoch in trange(params.n_epochs):
        for sentence, tags in train_iter:
            model.train()
            iteration += 1
            optimizer.zero_grad()

            # sentence = torch.tensor(sentence, dtype=torch.long)
            sentence = sentence.long().transpose(0, 1).to(device)  # slen * bs
            # tags = torch.tensor([tag2id[t] for t in tags], dtype=torch.long)
            tags = tags.long().to(device)

            lengths = torch.LongTensor([params.max_len] * sentence.size(1)).to(device)
            # langs = ''
            # logits = model(sentence, lengths)
            loss = model.log_likelihood(sentence, lengths, tags)

            loss.backward()

            torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=2)

            optimizer.step()

            if iteration % 20 == 0:
                logging.info(
                    '\rEpoch[{}] - Iteration[{}] - loss: {}'.format(epoch, iteration, loss.item()))

            if iteration % 20 == 0:
                _, _, eval_f1 = eval(model, dev_iter, params)
                if eval_f1 > best_f1:
                    best_f1 = eval_f1
                    save(model, "./dumped", iteration)


def eval(model, dev_iter, params):
    model.eval()

    aver_loss = 0
    preds, labels = [], []
    for sentence, tags in dev_iter:
        sentence = sentence.long().transpose(0, 1).to(device)
        tags = tags.long().to(device)

        lengths = torch.LongTensor([params.max_len] * sentence.size(1)).to(device)
        pred = model(sentence, lengths)
        loss = model.log_likelihood(sentence, lengths, tags)
        aver_loss += loss.item()

        for i in pred:
            preds += i
        for i in tags.tolist():
            labels += i

    aver_loss /= (len(dev_iter) * params.batch_size)
    precision = precision_score(labels, preds, average='macro')
    recall = recall_score(labels, preds, average='macro')
    f1 = f1_score(labels, preds, average='macro')
    report = classification_report(labels, preds)
    print(report)

    logging.info('\nEvaluation - loss: {:.6f}  precision: {:.4f}  recall: {:.4f}  f1: {:.4f} \n'.format(aver_loss,
                                                                                                        precision,
                                                                                                        recall, f1))
    return precision, recall, f1

During training the loss decreases, but the F1 score stays unchanged at 0.073; it looks like the decreasing loss doesn't help the model predict the correct entity labels.
I'm confused and don't know why this happens. Could anyone help? Thanks a lot.

Same token is predicted at each step during decoding

Hi,
I am using pytorch-crf for a token prediction task with an LSTM network. When I use a fully connected layer after the LSTM, it works fine.

x, _ = self.lstm(...)
x = self.linear(x)

This is trained with nn.CrossEntropyLoss in PyTorch.

Now, I want to add a CRF layer for a sequence prediction task.

x, _ = self.lstm(...)
x = self.linear(x)
crf_out = self.crf.forward(x, y, masks, reduction='token_mean')

-crf_out is used as the loss to train the network.
Decoding is done using
dec_out = self.crf.decode(x, masks)
However, this only predicts one category (the one with the maximum occurrence in the data). Perhaps I should mention that the dataset is heavily imbalanced: one target token makes up 85% of all tokens. The loss decreases during training.

Index out of bound error

crf = CRF(5, batch_first=True)
score = torch.randn(1, 3, 5)
target = torch.tensor([
    [1, 2, -100],
], dtype=torch.long)
mask = torch.tensor([
    [1, 1, 0]
], dtype=torch.uint8)

print(crf(score, target, mask=mask)) # index out of bound error

target = torch.empty_like(target).copy_(target)
target[target == -100] = 0
print(crf(score, target, mask=mask)) # fix

The forward algorithm relies on the masked values, and -100 is useful for the downstream task.
It is ugly to copy the target every time, so could this be fixed inside the CRF?
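Until or unless this is handled inside the library, a thin wrapper can hide the copy; a sketch:

import torch

def crf_nll(crf, score, target, mask):
    # Masked positions never contribute to the likelihood, so any
    # in-range tag id works as a stand-in for the -100 placeholder.
    safe_target = target.masked_fill(target == -100, 0)
    return -crf(score, safe_target, mask=mask)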

Cross Entropy as a loss function

Hi,

I would like to use Cross Entropy as a loss function so I wrote this code:

### __init__
    self.softmax = nn.LogSoftmax(dim=2)
    self.crf = CRF(hparams.num_classes, batch_first=True)

### forward
    if is_train:
        output = self.softmax(output)
        slot_loss = -1 * self.crf(output, y, mask=mask)  # negative log-likelihood
        return slot_loss
    return self.crf.decode(output)

That should work by the definition of cross entropy, but I'm getting losses on very different scales: something like 0.96 when using just PyTorch's cross-entropy loss (without the CRF) and something like 150.8 when using the code above.

Furthermore, I'm getting slightly worse performance when using the CRF than without it, around a 1% difference, while an earlier architecture of the same network on the same dataset showed a significant improvement.

Is there something wrong with my code?

Thank you
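Part of the scale gap is likely reduction: nn.CrossEntropyLoss averages over tokens by default, while the CRF log likelihood is summed over the batch, so the two numbers are not comparable as-is. A sketch of a per-token-comparable variant, assuming a release with the reduction keyword:

slot_loss = -self.crf(output, y, mask=mask, reduction='token_mean')

Separately, the LogSoftmax should be unnecessary: the CRF normalizes over whole tag sequences itself, and log-softmax only shifts each token's scores by a per-token constant, which cancels between the numerator and the normalizer.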

How can we get the probabilities out?

I see how to get the output of the CRF as well as the decoding. Is it possible to also get the computed probabilities for each of the predictions?
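The forward pass returns log p(tags | emissions), so exponentiating it for the decoded sequence gives the probability of the prediction as a whole; a sketch, assuming a release with the reduction keyword and no masking (so every decoded list has full length):

import torch

best_tags = crf.decode(emissions)                      # list of tag lists, one per batch element
tags_tensor = torch.tensor(best_tags).transpose(0, 1)  # (seq_length, batch_size)
# exp(log-likelihood) = probability of the decoded sequence.
seq_prob = crf(emissions, tags_tensor, reduction='none').exp()

Per-token marginal probabilities would need a separate forward-backward pass, which, as far as these issues indicate, the library does not expose.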

installation error

ERROR: Error checking for conflicts.
Traceback (most recent call last):
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 3021, in _dep_map
    return self.__dep_map
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2815, in __getattr__
    raise AttributeError(attr)
AttributeError: _DistInfoDistribution__dep_map

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 3012, in _parsed_pkg_info
    return self._pkg_info
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2815, in __getattr__
    raise AttributeError(attr)
AttributeError: _pkg_info

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_internal\commands\install.py", line 512, in _warn_about_conflicts
    package_set, dep_info = check_install_conflicts(to_install)
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_internal\operations\check.py", line 114, in check_install_conflicts
    package_set, _ = create_package_set_from_installed()
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_internal\operations\check.py", line 53, in create_package_set_from_installed
    package_set[name] = PackageDetails(dist.version, dist.requires())
  File "c:\users\suman\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 2736, in requires
    dm = self._dep_map
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 3023, in _dep_map
    self.__dep_map = self._compute_dependencies()
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 3032, in _compute_dependencies
    for req in self._parsed_pkg_info.get_all('Requires-Dist') or []:
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 3014, in _parsed_pkg_info
    metadata = self.get_metadata(self.PKG_INFO)
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 1420, in get_metadata
    value = self._get(path)
  File "c:\users\user\miniconda3\envs\py3_env\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 1616, in _get
    with open(path, 'rb') as stream:
FileNotFoundError: [Errno 2] No such file or directory: 'c:\\users\\user\\miniconda3\\envs\\py3_env\\lib\\site-packages\\s3transfer-0.3.3.dist-info\\METADATA'

Transition score for masked timesteps

Hi, thanks for the great library. I was going through your code and had a question about the computation of the score when a timestep is masked. Specifically, on line 192 of torchcrf/__init__.py:

score += self.transitions[tags[i - 1], tags[i]] * mask[i]

Shouldn't this be:

score += self.transitions[tags[i - 1], tags[i]] * mask[i] * mask[i-1]

since we do not want the transition score to be counted when the previous timestep was masked. Thanks!

Multi-label results?

Hi, I have a use case where I need to compute multi-label results, i.e. each token could be part of multiple sequences. Does this library support that?

Support half precision

This code does not support half precision, because it directly uses float() instead of type_as.
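A small sketch of the suggested pattern, with illustrative tensors: type_as derives the dtype from the input instead of hard-coding float32.

import torch

emissions = torch.randn(3, 2, 5, dtype=torch.half)
mask = torch.ones(3, 2, dtype=torch.uint8)
# Instead of mask.float() (always float32), follow the input's dtype:
mask_f = mask.type_as(emissions)  # float16 when emissions are half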

CRF library does not work as expected

I am using pytorch-crf to resolve tagging issues in NER, where the I-entity tag gets predicted after O (e.g. O, I-person_name) or before B-entity tags (e.g. I-person_name, B-person_name). Even after using pytorch-crf, I still see decoded tags with the same errors as before.

I inspected the transitions, start_transitions and end_transitions attributes of the CRF library (https://github.com/kmkurn/pytorch-crf/blob/master/torchcrf/__init__.py), and found that the library does not seem to update any of these variables. They seem to be randomly initialized and then used for scoring, but these state transition variables are never updated.

Can you please provide a better understanding of why this is the case, and perhaps some input on why these tag errors keep happening in spite of using the CRF layer?

Thanks and Sincerely,
Vijay Ramakrishnan
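The transition tensors are registered as nn.Parameters (direct access via crf.start_transitions.data works, as another issue here notes), so they do train, but only if they reach the optimizer. A quick sanity check, assuming the CRF lives at model.crf:

import torch.optim as optim

# Building the optimizer from the whole model includes the CRF's
# transitions, start_transitions and end_transitions parameters.
optimizer = optim.Adam(model.parameters(), lr=1e-3)
assert any(p is model.crf.start_transitions for p in model.parameters())

Note also that even well-trained transitions are soft scores, not hard constraints; the "Set some transitions to 0" issue below sketches how to pin forbidden transitions manually.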

Multiply -1 before loss.backward()?

Hi, thanks for providing this great tool!

I have one quick question about the log likelihood returned by the forward function:
loss = model(emissions, tags, mask=mask)

Is the returned value the negative log likelihood or the log likelihood? Should I multiply by -1 before loss.backward()?

Thanks!
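The forward function returns the log likelihood, a quantity to maximize, so yes: negate it to obtain a loss. A sketch:

log_likelihood = model(emissions, tags, mask=mask)
loss = -log_likelihood  # negative log likelihood, to be minimized
loss.backward()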

Setup proper documentation

All docstrings are already in numpy style, so the docs can be generated with Sphinx. The docs can be hosted on readthedocs.org. Adding a proper tutorial might be nice as well.

Rename kwargs to match PyTorch's convention

PyTorch uses reduce to indicate whether to sum or average the loss over a batch. Right now we use summed. It would be nice to rename this to reduce, but for backward compatibility maybe we can print a deprecation warning for now.

what pad symbol to use in tags tensor

Hi,

first of all, thanks for making this code available : )

I would like to check something: what padding symbol should we use in the tags tensor?

If my tags go from 0 to 11, I was using 12 as a pad symbol, but it throws an index error in _compute_score. It works if I replace 12 with, say, 11. But since 11 is a valid tag symbol, I just want to be sure that the mask takes care of not considering these values, or whether I should use another value.
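The mask does exclude padded positions from the likelihood, but the pad value is still used to index the emission and transition tensors before the mask zeroes out the contribution, hence the index error for 12. Any in-range id is safe; a sketch with hypothetical names (seq_length, batch_size, real_length and real_tags are placeholders):

import torch

PAD_TAG_ID = 0  # any id in [0, num_tags) works; masked positions contribute nothing
tags = torch.full((seq_length, batch_size), PAD_TAG_ID, dtype=torch.long)
tags[:real_length, :] = real_tags  # fill in the actual tag ids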

loss unstable

I'm interested in using this library for named entity recognition, but something bad happened. I'm using PyTorch to build a model with one embedding layer, one LSTM layer, and a CRF layer. The model structure is shown below.

class my_model(nn.Module):
    def __init__(self, vocab_size, embedding_size, hidden_size, num_classes, pad_idx):
        super(my_model, self).__init__()
        self.embed = nn.Embedding(vocab_size, embedding_size, padding_idx=pad_idx)
        self.num_classes = num_classes
        self.lstm = nn.LSTM(embedding_size, hidden_size)
        self.linear = nn.Linear(hidden_size, num_classes)
        self.crf = CRF(self.num_classes)

    def forward(self, texts, labels, masks):
        embedded = self.embed(texts)
        output, (hidden, cell) = self.lstm(embedded)
        out_vocab = self.linear(output)
        # out_vocab: [seq_len, batch_size, num_class]
        # labels: [seq_len, batch_size]
        # masks: [seq_len, batch_size]
        loss = -(self.crf(out_vocab, labels, mask=masks))
        return loss

The problem is that the loss is very high and very unstable during training: it often jumps from 200+ down to 20+ and then up to 500+. I wonder if this is because I'm using the library incorrectly?
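Part of the jumpiness is scale: by default the log likelihood is summed over the batch and over timesteps, so the loss magnitude tracks batch size and sequence length. A sketch of a per-token-normalized variant, assuming a release with the reduction keyword:

# Averaging per token removes the dependence of the loss magnitude
# on batch size and sequence length.
loss = -self.crf(out_vocab, labels, mask=masks, reduction='token_mean')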

The F1 score only increases when the batch size = 1

Hi, I have met the same problem as issue #40.
I use a BiLSTM+CRF to do NER tasks; the loss decreases but the F1 stays at 0.12.
I found that the outputs of the CRF layer are almost all O (a label in NER). So I changed my batch size from 8 to 1, and the F1 score increased to 0.91.

Now my model works well when the batch size is 1, but I am not sure whether there is a problem with the CRF loss function. Can you give me some help?

Set some transitions to 0

Hi,

let's say I know for sure that a particular transition is never going to happen, for example from label 1 to label 2. Is there any way I could force a 0 in the transition matrix?

I understand that the CRF layer is able to figure this out by itself; in fact it's going in the right direction, but not far enough.

Thank you
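Since the transition matrix holds log-domain scores, "never happens" corresponds to a very large negative score rather than 0. A sketch of pinning it manually; note that if training continues, the assignment should be re-applied after each optimizer step so gradient updates don't undo it:

import torch

with torch.no_grad():
    # -10000 acts as "effectively impossible" without producing NaNs.
    crf.transitions[1, 2] = -10000.0  # forbid the transition label 1 -> label 2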

Test for PyTorch 0.3.x

Right now the CI only tests against PyTorch 0.3.0. It should test against PyTorch 0.3.1 (and others, if any) as well.
