
revscoring's Introduction


Revision Scoring

⚠️ Warning: As of late 2023, the ORES infrastructure is being deprecated by the WMF Machine Learning team. Please check https://wikitech.wikimedia.org/wiki/ORES for more info.

While the code in this repository may still work, it is unmaintained and may break at any time. Be aware, too, that the machine learning models may drift, with prediction quality degrading over time.

The replacement for ORES and associated infrastructure is Lift Wing: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing

Some Revscoring models from ORES run on the Lift Wing infrastructure, but they are otherwise unsupported (no new training or code updates).

They can be downloaded from the links documented at: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Revscoring_models_(migrated_from_ORES)

In the long term, some or all of these models may be replaced by newer models specifically tailored to run on modern ML infrastructure like Lift Wing.

If you have any questions, contact the WMF Machine Learning team: https://wikitech.wikimedia.org/wiki/Machine_Learning

A generic, machine learning-based revision scoring system designed to help automate critical wiki-work — for example, vandalism detection and removal. This library powers ORES.

Example

Using a scorer_model to score a revision:

  import mwapi
  from revscoring import Model
  from revscoring.extractors.api.extractor import Extractor

  with open("models/enwiki.damaging.linear_svc.model") as f:
       scorer_model = Model.load(f)

  extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
                                          user_agent="revscoring demo"))

  feature_values = list(extractor.extract(123456789, scorer_model.features))

  print(scorer_model.score(feature_values))
  {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}

Installation

The easiest way to install is via the Python package installer (pip).

pip install revscoring

You may find that some of the dependencies fail to compile (namely scipy, numpy, and sklearn). In that case, you'll need to install some system-level dependencies using your operating system's package manager.

Ubuntu & Debian:

  • Run sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev enchant
  • Run sudo apt-get install aspell-ar aspell-bn aspell-el aspell-id aspell-is aspell-pl aspell-ro aspell-sv aspell-ta aspell-uk myspell-cs myspell-de-at myspell-de-ch myspell-de-de myspell-es myspell-et myspell-fa myspell-fr myspell-he myspell-hr myspell-hu myspell-lv myspell-nb myspell-nl myspell-pt-pt myspell-pt-br myspell-ru myspell-hr hunspell-bs hunspell-ca hunspell-en-au hunspell-en-us hunspell-en-gb hunspell-eu hunspell-gl hunspell-it hunspell-hi hunspell-sr hunspell-vi voikko-fi

MacOS:

Using Homebrew and pip, installing revscoring and enchant can be accomplished as follows:

brew install aspell --with-all-languages
brew install enchant
pip install --no-binary pyenchant revscoring

Adding languages in aspell (MacOS only)

cd /tmp
wget http://ftp.gnu.org/gnu/aspell/dict/pt/aspell-pt-0.50-2.tar.bz2
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make install

Caveat: differences between the aspell and myspell dictionaries can cause some of the tests to fail.

Finally, in order to make use of language features, you'll need to download some NLTK data. The following command will get the necessary corpora.

python -m nltk.downloader omw sentiwordnet stopwords wordnet

You'll also need to install enchant-compatible dictionaries of the languages you'd like to use. We recommend the following:

  • languages.arabic: aspell-ar
  • languages.basque: hunspell-eu
  • languages.bengali: aspell-bn
  • languages.bosnian: hunspell-bs
  • languages.catalan: myspell-ca
  • languages.czech: myspell-cs
  • languages.croatian: myspell-hr
  • languages.dutch: myspell-nl
  • languages.english: myspell-en-us myspell-en-gb myspell-en-au
  • languages.estonian: myspell-et
  • languages.finnish: voikko-fi
  • languages.french: myspell-fr
  • languages.galician: hunspell-gl
  • languages.german: myspell-de-at myspell-de-ch myspell-de-de
  • languages.greek: aspell-el
  • languages.hebrew: myspell-he
  • languages.hindi: aspell-hi
  • languages.hungarian: myspell-hu
  • languages.icelandic: aspell-is
  • languages.indonesian: aspell-id
  • languages.italian: myspell-it
  • languages.latvian: myspell-lv
  • languages.norwegian: myspell-nb
  • languages.persian: myspell-fa
  • languages.polish: aspell-pl
  • languages.portuguese: myspell-pt-pt myspell-pt-br
  • languages.serbian: hunspell-sr
  • languages.spanish: myspell-es
  • languages.swedish: aspell-sv
  • languages.tamil: aspell-ta
  • languages.russian: myspell-ru
  • languages.ukrainian: aspell-uk
  • languages.vietnamese: hunspell-vi

Development

To contribute, first install the dependencies:

$ pip install -r requirements.txt

Install necessary NLTK data:

python -m nltk.downloader omw sentiwordnet stopwords wordnet

Running tests

Make sure you install test dependencies:

$ pip install -r test-requirements.txt

Then run:

$ pytest . -vv

Reporting bugs

To report a bug, please use Phabricator.

Authors

Contributors

5uperpalo, accraze, adamwight, aikochou, chrisalbon, chtnnh, codez266, custozza, elukey, eranroz, haksoat, halfak, he7d3r, isaranto, jonasagx, kenrick95, kevinbazira, kizule, ladsgroup, marcoaureliowm, mariushoch, mdew192837, nealmcb, pix1234, sahethi, seanchen, toarushiroineko, urstrulykkr, xinbenlv, yuvipanda


revscoring's Issues

TypeError when extracting features from deleted revisions

As reported by Danilo on Trello, if we attempt to extract features from a deleted revision such as
https://pt.wikipedia.org/wiki/?diff=40837381&uselang=en
an error occurs:

Extracting features for http://pt.wikipedia.org/wiki/?oldid=40837381&diff=prev
Traceback (most recent call last):
  File "demonstrate_extractor.py", line 72, in <module>
    features = api_extractor.extract(40837381, extractors)
  File "mypath/Revision-Scoring/revscores/api_extractor.py", line 19, in extract
    return [solve(feature, cache) for feature in features]
  File "mypath/Revision-Scoring/revscores/api_extractor.py", line 19, in <listcomp>
    return [solve(feature, cache) for feature in features]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 111, in solve
    for dependency in dependencies]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 111, in <listcomp>
    for dependency in dependencies]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 111, in solve
    for dependency in dependencies]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 111, in <listcomp>
    for dependency in dependencies]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 111, in solve
    for dependency in dependencies]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 111, in <listcomp>
    for dependency in dependencies]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 111, in solve
    for dependency in dependencies]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 111, in <listcomp>
    for dependency in dependencies]
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 115, in solve
    value = dependent(*args)
  File "mypath/Revision-Scoring/revscores/util/dependencies.py", line 20, in __call__
    return self.f(*args, **kwargs)
  File "mypath/Revision-Scoring/revscores/datasources/revision_diff.py", line 16, in revision_diff
    b = tokenizer.tokenize(revision_text)
  File "/home/helder/.mypyvenv/lib/python3.4/site-packages/deltas/tokenizers/wikitext_split.py", line 10, in tokenize
    text
  File "/usr/lib/python3.4/re.py", line 206, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

Probability returned by SVC models is useless

The probability returned by SVC models seems to be weighted by the rate of the input labels. This causes problems: when predicting low-frequency events, it produces implausibly low probability scores.

For example, this doesn't make sense:

642215410: {
    'prediction': True, 
    'probabilities': [0.75884553,  0.24115447]
}

Note that the second probability, which corresponds to True, is 0.24 even though the prediction is True; it should be above 50%.
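
One possible mitigation (a sketch, not necessarily the project's fix) is to recalibrate the classifier's probability estimates, e.g. with scikit-learn's CalibratedClassifierCV; the dataset below is synthetic and illustrative:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Imbalanced two-class data, mimicking a low-rate event like "damaging".
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Sigmoid (Platt) calibration fits a logistic model over the SVC's scores.
clf = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
clf.fit(X, y)
print(clf.predict_proba(X[:1]))  # calibrated [P(False), P(True)]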

Many Portuguese badwords are not recognized anymore

Since we are no longer using stemming, and the badwords list generated for Portuguese was originally created from stems (21e410f), excluding variations of each word whose stem matched the stem of a word already kept in the list, those variations are no longer recognized as badwords.

For example:

>>> from revscoring.languages import portuguese
>>> [ portuguese.is_badword(x) for x in [ "mentiroso", "mentirosa", "mentirosos", "mentirosas" ] ]
[True, False, True, False]

This was the raw badwords list used in the filtering process:
https://gist.github.com/he7d3r/7e3718a43f5ce65e0dab
This was the script used for filtering:
https://gist.github.com/he7d3r/de5af63ac04338c3bfbf#file-stemtomostfrequentword-py

model_info utility

There should be a model_info utility for revscoring that allows you to view the name/version of a model and any test statistics that were generated.

Something like this.

Reads metadata from a model file. 

Usage:
    model_info -h | --help
    model_info <model-file> [--output=<path>]

Options:
    -h --help        Prints this documentation
    <model-file>     The path to a serialized model file from which to read statistics
    --output=<path>  The path to a file to write metadata [default: <stdout>]

Such that if you ran revscoring model_info models/my_wp10_rf.model, you'd get something like

Name: wp10
Version: 0.3.0
Type: RandomForest(n_estimators=501, min_samples_leaf=8)

Accuracy: 0.6086513418638886

ROC-AUC:
-----  --------
B      0.817992
C      0.836208
FA     0.942814
GA     0.900372
Start  0.89841
Stub   0.983154
-----  --------

          B     C    FA    GA    Start    Stub
-----  ----  ----  ----  ----  -------  ------
B      1121   526   156   153      203       3
C       774  1380    21   216      483      17
FA      285   118  1971   810        6       0
GA      435   404   480  1803       46       0
Start   340   532     1    16     1931     411
Stub     35    42     3     0      396    2544

Set-based badwords and misspellings detection

It occurs to me that using set operations to detect the introduction of new types of badwords, informals, and misspellings might yield a stronger signal than using the content of the diff.

E.g. revision.badwords_set - parent_revision.badwords_set = new_badwords_set

If the last revision contained an instance of the curse "shit", and the editor added a new instance of the word "shit", the set difference would be empty. But if the editor added an instance of the word "fuck", that would show up, since it wasn't in the article before.

These types of features should be pretty easy to add to the SpaceDelimited metafeatures. See https://github.com/wiki-ai/revscoring/blob/master/revscoring/languages/space_delimited/space_delimited.py#L14
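
A minimal sketch of the idea with plain Python sets (the *_badwords_set datasources above are proposals, so plain literals stand in for them here):

# Badword types present in each revision (hypothetical values).
parent_badwords = {"shit"}
revision_badwords = {"shit", "fuck"}

# Only badword types that are new in this revision survive the difference.
new_badwords = revision_badwords - parent_badwords
print(new_badwords)  # {'fuck'}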

"words_added.py" doesn't detect words containing accented characters

A user added three words in the edit
https://pt.wikipedia.org/w/index.php?diff=40692203
However, api_extractor.extract(40692203, [words_added]) returns [2]. I suspected this was due to the accented letter "É" and replicated the edit on
https://pt.wikipedia.org/w/index.php?diff=40697353
using "E" instead of "É". As expected, api_extractor.extract(40697353, [words_added]) returns [3].

I believe this is because of the regex used to detect words: [a-zA-Z]+.
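
A quick sketch of the problem and a Unicode-aware alternative:

import re

text = "É uma palavra"
print(re.findall(r"[a-zA-Z]+", text))        # ['uma', 'palavra'] ('É' is lost)
print(re.findall(r"\w+", text, re.UNICODE))  # ['É', 'uma', 'palavra']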

Constructing a model and scorer is a pain

I was just demonstrating the construction of a scorer and realized that this is too complicated.

    from mw.api import Session

    from revscoring.extractors import APIExtractor
    from revscoring.languages import english
    from revscoring.scorers import MLScorerModel

    api_session = Session("https://en.wikipedia.org/w/api.php")
    extractor = APIExtractor(api_session, english)

    filename = "models/reverts.halfak_mix.trained.model"
    model = MLScorerModel.load(open(filename, 'rb'))

    rev_ids = [105, 642215410, 638307884]
    feature_values = [extractor.extract(id, model.features) for id in rev_ids]
    scores = model.score(feature_values, probabilities=True)
    for rev_id, score in zip(rev_ids, scores):
        print("{0}: {1}".format(rev_id, score))
  1. We have to import the language. This should be captured in the model. The model is useless without knowing which language was used to train it.
  2. model.score() takes a 'probabilities' argument. This is silly and it should be the default. We should just let the scoring model decide what it does and how it outputs.
  3. There should be a simple, default MLScorer that is used to wrap all MLScoringModels so that we don't need to know that this should be wrapped in a LinearSVC(Scorer).

Multi-token badword detection (words with spaces in them)

Some badwords require ngrams. E.g. "cock" and "sucker" are not really bad words on their own. This seems to be much more common outside of English.

Right now, we handle badword detection by running the is_badword function on the output of revscoring.datasources.revision.words and revscoring.datasources.diff.words_added. This limits us to matching one word-token at a time against a regex. We need multi-token word support.
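
A rough sketch of n-gram matching over the token stream (BAD_PHRASES is a made-up example list, not the project's data):

# Multi-token badwords are matched as n-grams rather than single tokens.
BAD_PHRASES = {("cock", "sucker")}

def ngrams(tokens, n):
    return zip(*(tokens[i:] for i in range(n)))

tokens = ["you", "cock", "sucker"]
print([ng for ng in ngrams(tokens, 2) if ng in BAD_PHRASES])
# [('cock', 'sucker')]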

Feature user.is_bot errors out with None 'groups'

Jul 21 19:31:31 ores-worker-02 celery[10062]: RuntimeError: Failed to process <user.is_bot>: 'NoneType' object has no attribute 'groups'
Jul 21 20:55:17 ores-worker-02 celery[10062]: [2015-07-21 20:55:17,011: ERROR/MainProcess] Task ores.score_processors.celery._process[ptwiki:re
Jul 21 20:55:17 ores-worker-02 celery[10062]: Traceback (most recent call last):
Jul 21 20:55:17 ores-worker-02 celery[10062]: File "/srv/ores/venv/lib/python3.4/site-packages/celery/app/trace.py", line 240, in trace_task
Jul 21 20:55:17 ores-worker-02 celery[10062]: R = retval = fun(*args, **kwargs)
Jul 21 20:55:17 ores-worker-02 celery[10062]: File "/srv/ores/venv/lib/python3.4/site-packages/celery/app/trace.py", line 438, in __protected_c
Jul 21 20:55:17 ores-worker-02 celery[10062]: return self.run(*args, **kwargs)
Jul 21 20:55:17 ores-worker-02 celery[10062]: File "/srv/ores/venv/lib/python3.4/site-packages/ores/score_processors/celery.py", line 32, in _p
Jul 21 20:55:17 ores-worker-02 celery[10062]: score = scoring_context.score(model, cache)
Jul 21 20:55:17 ores-worker-02 celery[10062]: File "/srv/ores/venv/lib/python3.4/site-packages/ores/scoring_contexts/scoring_context.py", line 
Jul 21 20:55:17 ores-worker-02 celery[10062]: feature_values = list(self.solve(model, cache))
Jul 21 20:55:17 ores-worker-02 celery[10062]: File "/srv/ores/venv/lib/python3.4/site-packages/revscoring/dependencies/functions.py", line 240,
Jul 21 20:55:17 ores-worker-02 celery[10062]: value, cache, history = _solve(dependent, context, cache)
Jul 21 20:55:17 ores-worker-02 celery[10062]: File "/srv/ores/venv/lib/python3.4/site-packages/revscoring/dependencies/functions.py", line 231,
Jul 21 20:55:17 ores-worker-02 celery[10062]: .format(dependent, e), str(e))
Jul 21 20:55:17 ores-worker-02 celery[10062]: RuntimeError: Failed to process <user.is_bot>: 'NoneType' object has no attribute 'groups'
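
A hypothetical defensive sketch (the actual datasource code may differ): guard against a missing user document before reading its groups.

def is_bot(user_doc):
    # user_doc can be None (e.g. anonymous or suppressed users), which is the
    # likely source of the "'NoneType' object has no attribute 'groups'" error.
    if user_doc is None:
        return False
    return "bot" in (user_doc.get("groups") or [])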

revscoring-requests version issue.

Even running the plain ores binary does not work:

root@ores-worker-01:/srv/ores# ./venv/bin/ores 
Traceback (most recent call last):
  File "/srv/ores/venv/lib/python3.4/site-packages/pkg_resources/__init__.py", line 639, in _build_master
    ws.require(__requires__)
  File "/srv/ores/venv/lib/python3.4/site-packages/pkg_resources/__init__.py", line 940, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/srv/ores/venv/lib/python3.4/site-packages/pkg_resources/__init__.py", line 832, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (requests 2.7.0 (/srv/ores/venv/lib/python3.4/site-packages), Requirement.parse('requests==2.5.3'), {'revscoring'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./venv/bin/ores", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/srv/ores/venv/lib/python3.4/site-packages/pkg_resources/__init__.py", line 3057, in <module>
    working_set = WorkingSet._build_master()
  File "/srv/ores/venv/lib/python3.4/site-packages/pkg_resources/__init__.py", line 641, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/srv/ores/venv/lib/python3.4/site-packages/pkg_resources/__init__.py", line 654, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/srv/ores/venv/lib/python3.4/site-packages/pkg_resources/__init__.py", line 832, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (requests 2.7.0 (/srv/ores/venv/lib/python3.4/site-packages), Requirement.parse('requests==2.5.3'), {'revscoring'})

List of features from ORES

diff.added_badwords_ratio,
diff.added_markup_chars_ratio,
diff.added_misspellings_ratio,
diff.added_number_chars_ratio,
diff.added_symbolic_chars_ratio,
diff.added_uppercase_chars_ratio,
diff.badwords_added,
diff.badwords_removed,
diff.chars_added,
diff.chars_removed,
diff.longest_repeated_char_added,
diff.longest_token_added,
diff.markup_chars_added,
diff.markup_chars_removed,
diff.misspellings_added,
diff.misspellings_removed,
diff.numeric_chars_added,
diff.numeric_chars_removed,
diff.proportion_of_badwords_added,
diff.proportion_of_badwords_removed,
diff.proportion_of_chars_added,
diff.proportion_of_chars_removed,
diff.proportion_of_markup_chars_added,
diff.proportion_of_misspellings_added,
diff.proportion_of_misspellings_removed,
diff.proportion_of_numeric_chars_added,
diff.proportion_of_symbolic_chars_added,
diff.proportion_of_uppercase_chars_added,
diff.removed_badwords_ratio,
diff.removed_misspellings_ratio,
diff.segments_added,
diff.segments_removed,
diff.symbolic_chars_added,
diff.symbolic_chars_removed,
diff.uppercase_chars_added,
diff.uppercase_chars_removed,
diff.words_added,
diff.words_removed,
diff.bytes_changed,
diff.bytes_changed_ratio,
page.age,
page.is_mainspace,
page.is_content_namespace,
parent_revision.badwords,
parent_revision.bytes,
parent_revision.chars,
parent_revision.markup_chars,
parent_revision.misspellings,
parent_revision.numeric_chars,
parent_revision.proportion_of_badwords,
parent_revision.proportion_of_markup_chars,
parent_revision.proportion_of_misspellings,
parent_revision.proportion_of_numeric_chars,
parent_revision.proportion_of_symbolic_chars,
parent_revision.proportion_of_uppercase_chars,
parent_revision.revision_bytes,
parent_revision.seconds_since,
parent_revision.symbolic_chars,
parent_revision.uppercase_chars,
parent_revision.was_same_user,
parent_revision.words,
previous_user_revision.seconds_since,
revision.badwords,
revision.bytes,
revision.category_links,
revision.chars,
revision.cite_templates,
revision.day_of_week,
revision.has_custom_comment,
revision.has_section_comment,
revision.hour_of_day,
revision.image_links,
revision.infobox_templates,
revision.infonoise,
revision.internal_links,
revision.level_1_headings,
revision.level_2_headings,
revision.level_3_headings,
revision.level_4_headings,
revision.level_5_headings,
revision.level_6_headings,
revision.markup_chars,
revision.misspellings,
revision.numeric_chars,
revision.proportion_of_badwords,
revision.proportion_of_markup_chars,
revision.proportion_of_misspellings,
revision.proportion_of_numeric_chars,
revision.proportion_of_symbolic_chars,
revision.proportion_of_templated_references,
revision.proportion_of_uppercase_chars,
revision.ref_tags,
revision.symbolic_chars,
revision.templates,
revision.uppercase_chars,
revision.words,
user.age,
user.is_anon,
user.is_bot

All language utilities imported by 'revscoring.languages' (with proposal)

This causes a lot of trouble.

  1. A lot of stuff is loaded even if only one LanguageUtility is needed.
  2. All aspell and myspell packages must be installed for the system to work at all.

So, these language utilities should be imported on demand. This is difficult to accomplish due to the structure of the 'languages' module.

Right now, languages/__init__.py imports all of the languages. For example:

from .english import english
from .french import french
# etc.

I propose that we replace this pattern with a dynamic import module. I haven't been able to get that one to work in my python 3.4 environment yet though -- so I expect we'll need to do a bit of research.
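
A minimal sketch of what such on-demand loading could look like with importlib (module layout assumed from the description above):

import importlib

def load_language(name):
    # Import revscoring.languages.<name> only when it is first requested.
    module = importlib.import_module("revscoring.languages." + name)
    return getattr(module, name)

english = load_language("english")  # nothing else gets imported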

Is the wp10 model stable?

Can you build a weighted mean of the predicted classes and check for stability/slow progression towards quality over time? That would suggest that the weighted mean is a believable measure of sub-class quality improvements.
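
For concreteness, one way such a weighted mean could be computed (the ordinal class weights and probabilities here are illustrative):

# Assign ordinal weights to the wp10 assessment classes.
CLASS_WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}

def weighted_mean(probabilities):
    return sum(CLASS_WEIGHTS[c] * p for c, p in probabilities.items())

print(weighted_mean({"Stub": 0.05, "Start": 0.15, "C": 0.4,
                     "B": 0.25, "GA": 0.1, "FA": 0.05}))  # ~2.35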

Booleans returned by numpy are not JSON serializable.

>>> from revscoring.scorers import MLScorerModel
>>> import json
>>> scorer = MLScorerModel.load(open("models/reverts.halfak_mix.trained.model", 'rb'))
>>> score_doc = next(scorer.score([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]]))
>>> score_doc
{'prediction': True, 'probability': {False: 0.91828775500977355, True: 0.081712244990226446}}
>>> json.dumps(score_doc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.4/json/encoder.py", line 192, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.4/json/encoder.py", line 250, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.4/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: True is not JSON serializable

See http://bugs.python.org/issue18303
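
A workaround sketch: coerce numpy scalars to native Python types with a custom encoder. (Note that non-string dict keys, like the numpy booleans in the probability dict above, still need converting separately; default() only sees values.)

import json
import numpy

class NumpyJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        # numpy scalar types are not subclasses of Python's bool/int,
        # so the stock encoder rejects them.
        if isinstance(obj, numpy.bool_):
            return bool(obj)
        if isinstance(obj, numpy.integer):
            return int(obj)
        if isinstance(obj, numpy.floating):
            return float(obj)
        return super().default(obj)

print(json.dumps({"prediction": numpy.bool_(True)}, cls=NumpyJSONEncoder))
# {"prediction": true}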

AttributeError: type object 'Namespace' has no attribute 'from_doc'

I just updated my local copy of the repo and got this when running the tests:

$ nosetests
..E...........................................................
======================================================================
ERROR: revscores.datasources.tests.test_namespaces.test_namespaces
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/helder/.mypyvenv/lib/python3.4/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/helder/Revision-Scoring/revscores/datasources/tests/test_namespaces.py", line 39, in test_namespaces
    nses = namespaces(fake_si_doc)
  File "/home/helder/Revision-Scoring/revscores/dependent.py", line 18, in __call__
    return self.process(*args, **kwargs)
  File "/home/helder/Revision-Scoring/revscores/datasources/namespaces.py", line 18, in process
    namespaces[ns_id] = mw.Namespace.from_doc(ns_doc, aliases=aliases)
nose.proxy.AttributeError: type object 'Namespace' has no attribute 'from_doc'
-------------------- >> begin captured logging << --------------------
revscores.dependent: DEBUG: Executing <namespaces>.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 62 tests in 2.547s

FAILED (errors=1)

ImportError (cannot import name 'WikitextSplit')

(3.4) helder@std:~/projects/revscoring
$nosetests
EE....EE...
======================================================================
ERROR: Failure: ImportError (cannot import name 'WikitextSplit')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/usr/lib/python3.4/imp.py", line 245, in load_module
    return load_package(name, filename)
  File "/usr/lib/python3.4/imp.py", line 217, in load_package
    return methods.load()
  File "<frozen importlib._bootstrap>", line 1220, in load
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "/home/helder/projects/revscoring/revscoring/datasources/__init__.py", line 1, in <module>
    from .contiguous_segments_added import contiguous_segments_added
  File "/home/helder/projects/revscoring/revscoring/datasources/contiguous_segments_added.py", line 4, in <module>
    from .revision_diff import revision_diff
  File "/home/helder/projects/revscoring/revscoring/datasources/revision_diff.py", line 4, in <module>
    from deltas.tokenizers import WikitextSplit
ImportError: cannot import name 'WikitextSplit'

======================================================================
ERROR: Failure: ImportError (cannot import name 'WikitextSplit')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/usr/lib/python3.4/imp.py", line 245, in load_module
    return load_package(name, filename)
  File "/usr/lib/python3.4/imp.py", line 217, in load_package
    return methods.load()
  File "<frozen importlib._bootstrap>", line 1220, in load
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "/home/helder/projects/revscoring/revscoring/features/__init__.py", line 2, in <module>
    from .added_badwords_ratio import added_badwords_ratio
  File "/home/helder/projects/revscoring/revscoring/features/added_badwords_ratio.py", line 2, in <module>
    from .proportion_of_badwords_added import proportion_of_badwords_added
  File "/home/helder/projects/revscoring/revscoring/features/proportion_of_badwords_added.py", line 2, in <module>
    from .badwords_added import badwords_added
  File "/home/helder/projects/revscoring/revscoring/features/badwords_added.py", line 3, in <module>
    from ..datasources import contiguous_segments_added
  File "/home/helder/projects/revscoring/revscoring/datasources/__init__.py", line 1, in <module>
    from .contiguous_segments_added import contiguous_segments_added
  File "/home/helder/projects/revscoring/revscoring/datasources/contiguous_segments_added.py", line 4, in <module>
    from .revision_diff import revision_diff
  File "/home/helder/projects/revscoring/revscoring/datasources/revision_diff.py", line 4, in <module>
    from deltas.tokenizers import WikitextSplit
ImportError: cannot import name 'WikitextSplit'

======================================================================
ERROR: Failure: ImportError (cannot import name 'WikitextSplit')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/usr/lib/python3.4/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/usr/lib/python3.4/imp.py", line 171, in load_source
    module = methods.load()
  File "<frozen importlib._bootstrap>", line 1220, in load
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "/home/helder/projects/revscoring/revscoring/scorers/tests/test_scorer.py", line 5, in <module>
    from ...features import badwords_added, misspellings_added
  File "/home/helder/projects/revscoring/revscoring/features/__init__.py", line 2, in <module>
    from .added_badwords_ratio import added_badwords_ratio
  File "/home/helder/projects/revscoring/revscoring/features/added_badwords_ratio.py", line 2, in <module>
    from .proportion_of_badwords_added import proportion_of_badwords_added
  File "/home/helder/projects/revscoring/revscoring/features/proportion_of_badwords_added.py", line 2, in <module>
    from .badwords_added import badwords_added
  File "/home/helder/projects/revscoring/revscoring/features/badwords_added.py", line 3, in <module>
    from ..datasources import contiguous_segments_added
  File "/home/helder/projects/revscoring/revscoring/datasources/__init__.py", line 1, in <module>
    from .contiguous_segments_added import contiguous_segments_added
  File "/home/helder/projects/revscoring/revscoring/datasources/contiguous_segments_added.py", line 4, in <module>
    from .revision_diff import revision_diff
  File "/home/helder/projects/revscoring/revscoring/datasources/revision_diff.py", line 4, in <module>
    from deltas.tokenizers import WikitextSplit
ImportError: cannot import name 'WikitextSplit'

======================================================================
ERROR: Failure: ImportError (cannot import name 'WikitextSplit')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/usr/lib/python3.4/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/usr/lib/python3.4/imp.py", line 171, in load_source
    module = methods.load()
  File "<frozen importlib._bootstrap>", line 1220, in load
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "/home/helder/projects/revscoring/revscoring/scorers/tests/test_svc.py", line 8, in <module>
    from ...features import Feature
  File "/home/helder/projects/revscoring/revscoring/features/__init__.py", line 2, in <module>
    from .added_badwords_ratio import added_badwords_ratio
  File "/home/helder/projects/revscoring/revscoring/features/added_badwords_ratio.py", line 2, in <module>
    from .proportion_of_badwords_added import proportion_of_badwords_added
  File "/home/helder/projects/revscoring/revscoring/features/proportion_of_badwords_added.py", line 2, in <module>
    from .badwords_added import badwords_added
  File "/home/helder/projects/revscoring/revscoring/features/badwords_added.py", line 3, in <module>
    from ..datasources import contiguous_segments_added
  File "/home/helder/projects/revscoring/revscoring/datasources/__init__.py", line 1, in <module>
    from .contiguous_segments_added import contiguous_segments_added
  File "/home/helder/projects/revscoring/revscoring/datasources/contiguous_segments_added.py", line 4, in <module>
    from .revision_diff import revision_diff
  File "/home/helder/projects/revscoring/revscoring/datasources/revision_diff.py", line 4, in <module>
    from deltas.tokenizers import WikitextSplit
ImportError: cannot import name 'WikitextSplit'

----------------------------------------------------------------------
Ran 11 tests in 2.422s

FAILED (errors=4)

Add documentation for features

Right now, finding out which features are implemented and what they return is very difficult. Add documentation for features and get it up on pythonhosted.org.

Allow scoring the diff between two arbitrary revisions

A vandal may edit the same page many times, with each edit having a low probability of being reverted, while the whole set of edits, viewed as a single edit (e.g. via the enhanced recent changes preference), would have a high probability of being reverted.

In order to have predictions for these sequential edits, it seems necessary to be able to score a revision by comparing it with an older revision than the previous (parent) revision.

E.g.: I want to know the probability of this diff being reverted:
https://pt.wikipedia.org/w/index.php?diff=42204427&oldid=42203059
instead of the probabilities of each of the intermediate diffs for that page.

Numpy install required before scipy dep. can be built

For some reason, scipy doesn't express a dependency on numpy -- yet it will not install without numpy being installed first.

Either document that this is required or figure out how to upstream a fix to scipy (and close this as invalid).
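
In the meantime, the workaround is simply to install numpy before scipy:

pip install numpy
pip install scipy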

revscoring.dependent.DependencyError: Failed to process <is_stopword>

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 102, in _solve
    value = dependent(*args)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 25, in __call__
    return self.process(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/languages/language.py", line 22, in not_implemented_processor
    raise NotImplementedError()
NotImplementedError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/utilities/extract_features.py", line 89, in run
    for v in list(values) + [label]))
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 39, in solve_many
    value, cache, history = _solve(dependent, cache)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 96, in _solve
    value, cache, history = _solve(dependency, cache, history)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 96, in _solve
    value, cache, history = _solve(dependency, cache, history)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 96, in _solve
    value, cache, history = _solve(dependency, cache, history)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 105, in _solve
    .format(dependent, e), e)
revscoring.dependent.DependencyError: Failed to process <is_stopword>:

`RevisionDocumentNotFound` when scoring new pages

It seems revscoring is not able to score new pages:
http://ores.wmflabs.org/scores/ptwiki/?models=reverted&revids=42835908|40200979

{
  "40200979": {
    "reverted": {
      "error": {
        "message": "Failed to process <parent_revision.metadata>: RevisionDocumentNotFound",
        "type": "<class 'revscoring.dependencies.errors.DependencyError'>"
      }
    }
  },
  "42835908": {
    "reverted": {
      "error": {
        "message": "Failed to process <parent_revision.metadata>: RevisionDocumentNotFound",
        "type": "<class 'revscoring.dependencies.errors.DependencyError'>"
      }
    }
  }
}

Content replacement features -- use removed content to inform measures of added content in diffs

revscoring and AbuseFilter (and other tools) make it easy to catch vandalism that matches some "bad regex"/badword list. However, the existing tools have no ability to identify word replacements.
E.g. "Barack Obama is president" => "Barack Obama is terrorist". While "terrorist" is not a bad word on its own, replacing some other word with "terrorist" is most probably bad.

While an "alignment" between words in the previous and the new revisions isn't always obvious, the tool can make use of one where it exists.
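
A small sketch of finding replacements from an alignment, using the standard library's difflib over whitespace tokens (a simplification of whatever tokenizer revscoring would actually use):

from difflib import SequenceMatcher

old = "Barack Obama is president".split()
new = "Barack Obama is terrorist".split()

# get_opcodes() yields an alignment; 'replace' spans are word substitutions.
for op, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
    if op == "replace":
        print(old[i1:i2], "->", new[j1:j2])  # ['president'] -> ['terrorist']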

Bug from refactoring: max() arg is an empty sequence

....Traceback (most recent call last):
  File "/home/eva/Github/Objective-Revision-Evaluation-Service/ores/features_reverted.py", line 94, in run
    print('\t'.join(str(v) for v in (list(values) + [reverted])))
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 34, in solve_many
    value, cache, history = _solve(dependent, cache)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 91, in _solve
    value, cache, history = _solve(dependency, cache, history)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 91, in _solve
    value, cache, history = _solve(dependency, cache, history)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 95, in _solve
    value = dependent(*values)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/features/feature.py", line 31, in __call__
    value = super().__call__(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/dependent.py", line 20, in __call__
    return self.process(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/revscoring-0.1.0-py3.4.egg/revscoring/features/diff.py", line 137, in process_longest_repeated_char_added
    for segment in diff_added_segments
ValueError: max() arg is an empty sequence

Can't pickle languages

$ python
Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.languages import english
>>> import pickle
>>> foo = pickle.dumps(english)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <function is_misspelled_process at 0x7f85a844f378>: attribute lookup is_misspelled_process on revscoring.languages.english failed

PEP 8 issues in revscoring/languages/persian.py

I'd just solve these issues myself, but it's really hard to work in LTR and know that I'm not breaking any of the regexes.

$ flake8 revscoring
revscoring/languages/persian.py:117:80: E501 line too long (88 > 79 characters)
revscoring/languages/persian.py:118:80: E501 line too long (88 > 79 characters)
revscoring/languages/persian.py:118:88: E225 missing whitespace around operator
revscoring/languages/persian.py:119:80: E501 line too long (85 > 79 characters)
revscoring/languages/persian.py:120:80: E501 line too long (86 > 79 characters)
revscoring/languages/persian.py:120:86: E225 missing whitespace around operator
revscoring/languages/persian.py:123:80: E501 line too long (80 > 79 characters)
revscoring/languages/persian.py:145:80: E501 line too long (80 > 79 characters)

turkish.py uses STEMMER before defining it

Since a32214a I'm getting this:

(3.4) helder@std:~/projects/revscoring$
nosetests
.............................................................E...
======================================================================
ERROR: Failure: NameError (name 'STEMMER' is not defined)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/helder/env/3.4/lib/python3.4/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/usr/lib/python3.4/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/usr/lib/python3.4/imp.py", line 171, in load_source
    module = methods.load()
  File "<frozen importlib._bootstrap>", line 1220, in load
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "/home/helder/projects/revscoring/revscoring/languages/tests/test_turkish.py", line 3, in <module>
    from ..turkish import turkish
  File "/home/helder/projects/revscoring/revscoring/languages/turkish.py", line 56, in <module>
    "yarrak"
  File "/home/helder/projects/revscoring/revscoring/languages/turkish.py", line 10, in <genexpr>
    BADWORDS = set(STEMMER.stem(w) for w in [
NameError: name 'STEMMER' is not defined

----------------------------------------------------------------------
Ran 65 tests in 19.920s

FAILED (errors=1)

Proposal: Multi-lingual feature sets

Right now, a feature extraction is limited to the use of a single language. For example, revscoring.features.diff.badwords_added depends on the language utility languages.is_badword. As a result, a feature list can only have a count of "badwords_added" as identified by one "language". The result is that we have a lot of mixture in our badwords sets, and we're not poised to support multi-lingual wikis like Commons and Wikidata.

I propose that we convert the concept of a languages from a context (in which feature extraction happens) to a feature set with the necessary context baked in. This would mean that we can use multiple language features in parallel. E.g.

badwords = [
    revision.bytes,
    diff.bytes_changed,
    english.diff.badwords_added,
    portuguese.diff.badwords_added,
    persian.diff.badwords_added,
    ...
]

This would also mean that we wouldn't need to associate a revscoring.languages.Language with a model -- just the set of features that were used to build the model. That would substantially reduce the complication and potential mistakes involved in generating and using model files.

Math domain error when processing imported revisions (user.age)

Error when processing rev_id 408030634 in enwiki. It looks like the revision is an import with a very old timestamp.

RuntimeError('Failed to process <log((user.age + 1))>: math domain error',)
Traceback (most recent call last):
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/ores-0.2.0-py3.4.egg/ores/score_processors/celery.py", line 33, in _process
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/ores-0.2.0-py3.4.egg/ores/scoring_contexts/scoring_context.py", line 46, in score
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring-0.4.0-py3.4.egg/revscoring/dependencies/functions.py", line 240, in _solve_many
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/revscoring-0.4.0-py3.4.egg/revscoring/dependencies/functions.py", line 231, in _solve
RuntimeError: Failed to process <log((user.age + 1))>: math domain error

See the original report: https://github.com/wiki-ai/ores/issues/60
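
A hypothetical sketch of the obvious guard (the real feature code may differ): clamp the age at zero so imported revisions with timestamps predating the user's registration can't hand log() a value below 1.

import math

def log_age(registration_ts, revision_ts):
    # revision_ts - registration_ts can be negative for imported revisions.
    age = max(0, revision_ts - registration_ts)
    return math.log(age + 1)

print(log_age(1_400_000_000, 1_300_000_000))  # 0.0 instead of a math domain error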

Make all datasources/features (dependencies generally) and their values JSONable

Right now, many datasources return values that cannot be encoded in JSON.

This is a bummer because it would be better if we could use the JSON serializer within ORES's celery.

This is the error we get when trying to use the JSON serializer within ORES for non-JSON serializable datasources:

3784623 HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/halfak/projects/ores/ores/wsgi/routes/scores.py", line 102, in score_revisions
    precache=precache)
  File "/home/halfak/projects/ores/ores/score_processors/score_processor.py", line 25, in score
    scores = self._score(context, model, rev_ids, caches=caches)
  File "/home/halfak/projects/ores/ores/score_processors/celery.py", line 146, in _score
    caches=caches))
  File "/home/halfak/projects/ores/ores/score_processors/celery.py", line 97, in _score_in_celery
    task_id=id_string
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/celery/app/task.py", line 559, in apply_async
    **dict(self._get_exec_options(), **options)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/celery/app/base.py", line 353, in send_task
    reply_to=reply_to or self.oid, **options
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/celery/app/amqp.py", line 305, in publish_task
    **kwargs
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/kombu/messaging.py", line 165, in publish
    compression, headers)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/kombu/messaging.py", line 241, in _prepare
    body) = dumps(body, serializer=serializer)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/kombu/serialization.py", line 164, in dumps
    payload = encoder(data)
  File "/usr/lib/python3.4/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/kombu/serialization.py", line 59, in _reraise_errors
    reraise(wrapper, wrapper(exc), sys.exc_info()[2])
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/kombu/five.py", line 132, in reraise
    raise value.with_traceback(tb)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/kombu/serialization.py", line 55, in _reraise_errors
    yield
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/kombu/serialization.py", line 164, in dumps
    payload = encoder(data)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/anyjson/__init__.py", line 141, in dumps
    return implementation.dumps(value)
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/anyjson/__init__.py", line 89, in dumps
    raise TypeError(TypeError(*exc.args)).with_traceback(sys.exc_info()[2])
  File "/home/halfak/env/3.4/lib/python3.4/site-packages/anyjson/__init__.py", line 87, in dumps
    return self._encode(data)
  File "/usr/lib/python3.4/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.4/json/encoder.py", line 192, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.4/json/encoder.py", line 250, in iterencode
    return _iterencode(o, 0)
kombu.exceptions.EncodeError: keys must be a string
