$ python main.py --epoch-millis 1686070222583 --headers -o ./
Timestamp of latest rating in data: 2023-05-28 00:31:49.395000
Timestamp of latest note in data: 2023-05-28 00:30:46.699000
total notes added to noteStatusHistory: 0
Preprocess Data: Filter misleading notes, starting with 4243086 ratings on 98052 notes
Keeping 3077820 ratings on 67233 misleading notes
Keeping 219979 ratings on 8448 deleted notes that were previously scored (in note status history)
Removing 69184 ratings on 3421 older notes that aren't deleted, but are not-misleading.
Removing 11994 ratings on 1727 notes that were deleted and not in note status history (e.g. old).
Num Ratings: 4161908, Num Unique Notes Rated: 92904, Num Unique Raters: 111825
Identifying core notes and ratings
Total ratings: 4161908
Ratings from user without modelingPopulation: 0
Total notes: 112020
Total notes with ratings: 92904
Total core notes: 106666
Total expansion notes: 5354
Core ratings: 3803097
Filter notes and ratings with too few ratings
After Filtering Notes w/less than 5 Ratings, Num Ratings: 3761285, Num Unique Notes Rated: 70266, Num Unique Raters: 78386
After Filtering Raters w/less than 10 Notes, Num Ratings: 3621122, Num Unique Notes Rated: 70266, Num Unique Raters: 39076
After Final Filtering of Notes w/less than 5 Ratings, Num Ratings: 3619999, Num Unique Notes Rated: 69975, Num Unique Raters: 39076
------------------
Users: 39076, Notes: 69975
cpu
epoch 0 6.531041145324707
TRAIN FIT LOSS: 6.066281795501709
epoch 50 0.12861372530460358
TRAIN FIT LOSS: 0.09712684154510498
epoch 100 0.11354893445968628
TRAIN FIT LOSS: 0.08736852556467056
Num epochs: 144
epoch 144 0.11346116662025452
TRAIN FIT LOSS: 0.08727223426103592
Global Intercept: 0.15957853198051453
Applying scoring rule: InitialNMR (v1.0)
Applying scoring rule: GeneralCRH (v1.0)
Applying scoring rule: LcbCRH (v1.0)
Applying scoring rule: GeneralCRNH (v1.0)
Applying scoring rule: UcbCRNH (v1.0)
Applying scoring rule: NmCRNH (v1.0)
Total ratings: 3579098 post-tombstones and 223999 pre-tombstones
Total ratings created before statuses: 556087, including 486157 post-tombstones and 69930 pre-tombstones.
Total valid ratings: 192160
Unique Raters: 39076
People (Authors or Raters) With Helpfulness Scores: 32863
Raters Included Based on Helpfulness Scores: 26273
Included Raters who have rated at least 1 note in the final dataset: 23496
Number of Ratings Used For 1st Training: 3619999
Number of Ratings for Final Training: 2698602
------------------
Users: 23496, Notes: 69970
initializing notes
initializing users
cpu
epoch 0 0.40559813380241394
TRAIN FIT LOSS: 0.3399314880371094
epoch 50 0.11206676810979843
TRAIN FIT LOSS: 0.08626192808151245
epoch 100 0.11129889637231827
TRAIN FIT LOSS: 0.08425898849964142
Num epochs: 115
epoch 115 0.11129649728536606
TRAIN FIT LOSS: 0.08428216725587845
Global Intercept: 0.1628568470478058
------------------
Re-scoring all notes with extra rating added: {'internalRaterIntercept': None, 'internalRaterFactor1': None, 'helpfulNum': None}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.14149416983127594
TRAIN FIT LOSS: 0.11726519465446472
epoch 50 0.11138982325792313
TRAIN FIT LOSS: 0.0844133123755455
Num epochs: 96
epoch 96 0.11130642890930176
TRAIN FIT LOSS: 0.08429824560880661
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-1', 'raterIndex': 23496, 'internalRaterIntercept': -0.20972191, 'internalRaterFactor1': -0.9928637, 'helpfulNum': 0.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.16229085624217987
TRAIN FIT LOSS: 0.13822472095489502
epoch 50 0.11394057422876358
TRAIN FIT LOSS: 0.08843617886304855
Num epochs: 85
epoch 85 0.11384273320436478
TRAIN FIT LOSS: 0.08817018568515778
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-1', 'raterIndex': 23496, 'internalRaterIntercept': -0.20972191, 'internalRaterFactor1': -0.9928637, 'helpfulNum': 1.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.1719895601272583
TRAIN FIT LOSS: 0.12891653180122375
epoch 50 0.12944582104682922
TRAIN FIT LOSS: 0.10115989297628403
Num epochs: 84
epoch 84 0.12934477627277374
TRAIN FIT LOSS: 0.10133105516433716
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-2', 'raterIndex': 23497, 'internalRaterIntercept': -0.20972191, 'internalRaterFactor1': 0.0, 'helpfulNum': 0.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.1564764380455017
TRAIN FIT LOSS: 0.13679614663124084
epoch 50 0.11024019122123718
TRAIN FIT LOSS: 0.08390671759843826
Num epochs: 65
epoch 65 0.11015045642852783
TRAIN FIT LOSS: 0.08361467719078064
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-2', 'raterIndex': 23497, 'internalRaterIntercept': -0.20972191, 'internalRaterFactor1': 0.0, 'helpfulNum': 1.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.1625971496105194
TRAIN FIT LOSS: 0.1218295469880104
epoch 50 0.12860900163650513
TRAIN FIT LOSS: 0.0998450443148613
Num epochs: 80
epoch 80 0.12854285538196564
TRAIN FIT LOSS: 0.09984190762042999
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-3', 'raterIndex': 23498, 'internalRaterIntercept': -0.20972191, 'internalRaterFactor1': 0.8534382, 'helpfulNum': 0.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.16249951720237732
TRAIN FIT LOSS: 0.1381371021270752
epoch 50 0.11279866099357605
TRAIN FIT LOSS: 0.08706667274236679
Num epochs: 79
epoch 79 0.11270956695079803
TRAIN FIT LOSS: 0.08679283410310745
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-3', 'raterIndex': 23498, 'internalRaterIntercept': -0.20972191, 'internalRaterFactor1': 0.8534382, 'helpfulNum': 1.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.17296093702316284
TRAIN FIT LOSS: 0.12987180054187775
epoch 50 0.1286552995443344
TRAIN FIT LOSS: 0.09953145682811737
Num epochs: 93
epoch 93 0.1285542994737625
TRAIN FIT LOSS: 0.09970027208328247
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-4', 'raterIndex': 23499, 'internalRaterIntercept': 0.5991039, 'internalRaterFactor1': -0.9928637, 'helpfulNum': 0.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.17510266602039337
TRAIN FIT LOSS: 0.1509304940700531
epoch 50 0.13192355632781982
TRAIN FIT LOSS: 0.10659254342317581
Num epochs: 76
epoch 76 0.131842240691185
TRAIN FIT LOSS: 0.10650181025266647
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-4', 'raterIndex': 23499, 'internalRaterIntercept': 0.5991039, 'internalRaterFactor1': -0.9928637, 'helpfulNum': 1.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.16212572157382965
TRAIN FIT LOSS: 0.13665850460529327
epoch 50 0.11367649585008621
TRAIN FIT LOSS: 0.08795454353094101
Num epochs: 79
epoch 79 0.11358115077018738
TRAIN FIT LOSS: 0.08765255659818649
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-5', 'raterIndex': 23500, 'internalRaterIntercept': 0.5991039, 'internalRaterFactor1': 0.0, 'helpfulNum': 0.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.17736336588859558
TRAIN FIT LOSS: 0.1557299792766571
epoch 50 0.13043750822544098
TRAIN FIT LOSS: 0.1049032136797905
Num epochs: 66
epoch 66 0.1303536295890808
TRAIN FIT LOSS: 0.10477226972579956
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-5', 'raterIndex': 23500, 'internalRaterIntercept': 0.5991039, 'internalRaterFactor1': 0.0, 'helpfulNum': 1.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.15023091435432434
TRAIN FIT LOSS: 0.12547419965267181
epoch 50 0.11003502458333969
TRAIN FIT LOSS: 0.08325959742069244
Num epochs: 97
epoch 97 0.10993592441082001
TRAIN FIT LOSS: 0.0831153616309166
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-6', 'raterIndex': 23501, 'internalRaterIntercept': 0.5991039, 'internalRaterFactor1': 0.8534382, 'helpfulNum': 0.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.17631010711193085
TRAIN FIT LOSS: 0.15190833806991577
epoch 50 0.13062815368175507
TRAIN FIT LOSS: 0.10499828308820724
Num epochs: 88
epoch 88 0.13054625689983368
TRAIN FIT LOSS: 0.10486648976802826
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-6', 'raterIndex': 23501, 'internalRaterIntercept': 0.5991039, 'internalRaterFactor1': 0.8534382, 'helpfulNum': 1.0}
------------------
Users: 23502, Notes: 69970
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.16241049766540527
TRAIN FIT LOSS: 0.13594377040863037
epoch 50 0.11258627474308014
TRAIN FIT LOSS: 0.08652327209711075
Num epochs: 79
epoch 79 0.11249891668558121
TRAIN FIT LOSS: 0.08625540137290955
/Users/[REDACTED]/git/communitynotes/sourcecode/scoring/incorrect_filter.py:59: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals.drop(
/Users/[REDACTED]/git/communitynotes/sourcecode/scoring/incorrect_filter.py:59: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals.drop(
Applying scoring rule: InitialNMR (v1.0)
Applying scoring rule: GeneralCRH (v1.0)
Applying scoring rule: LcbCRH (v1.0)
Applying scoring rule: GeneralCRNH (v1.0)
Applying scoring rule: UcbCRNH (v1.0)
Applying scoring rule: NmCRNH (v1.0)
Applying scoring rule: GeneralCRHInertia (v1.0)
Applying scoring rule: TagFilter (v1.0)
CRH notes prior to tag filtering: 7349
CRH notes above crhSuperThreshold: 1894
Checking note tags:
notHelpfulOther
ratio threshold: 0.05310741536536798
notHelpfulIncorrect
ratio threshold: 0.029573177485621466
notHelpfulSourcesMissingOrUnreliable
ratio threshold: 0.09141551110962542
notHelpfulOpinionSpeculationOrBias
ratio threshold: 0.0
notHelpfulMissingKeyPoints
ratio threshold: 0.10171619204365726
notHelpfulOutdated
ratio threshold: 0.0
notHelpfulHardToUnderstand
ratio threshold: 0.05382063690199473
outlier filtering disabled for tag: notHelpfulHardToUnderstand
notHelpfulArgumentativeOrBiased
ratio threshold: 0.05033946436374779
notHelpfulOffTopic
ratio threshold: 0.0
notHelpfulSpamHarassmentOrAbuse
ratio threshold: 0.0002994736018899559
notHelpfulIrrelevantSources
ratio threshold: 0.03902540974158701
notHelpfulOpinionSpeculation
ratio threshold: 0.07229388729970718
notHelpfulNoteNotNeeded
ratio threshold: 0.11037072746701154
outlier filtering disabled for tag: notHelpfulNoteNotNeeded
Total {note, tag} pairs where tag filter logic triggered: 381
Total unique notes impacted by tag filtering: 315
Applying scoring rule: CRHSuperThreshold (v1.0)
Applying scoring rule: ElevatedCRHInertia (v1.0)
Applying scoring rule: FilterIncorrect (v1.0)
Total notes impacted by incorrect filtering: 89
Traceback (most recent call last):
  File "/Users/[REDACTED]/git/communitynotes/sourcecode/main.py", line 96, in <module>
    main()
  File "/Users/[REDACTED]/git/communitynotes/sourcecode/main.py", line 84, in main
    scoredNotes, helpfulnessScores, newStatus, auxNoteInfo = run_scoring(
                                                             ^^^^^^^^^^^^
  File "/Users/[REDACTED]/git/communitynotes/sourcecode/scoring/run_scoring.py", line 469, in run_scoring
    scoredNotes, helpfulnessScores, auxiliaryNoteInfo = _run_scorers(
                                                        ^^^^^^^^^^^^^
  File "/Users/[REDACTED]/git/communitynotes/sourcecode/scoring/run_scoring.py", line 151, in _run_scorers
    modelResultsAndTimes = [
                           ^
  File "/Users/[REDACTED]/git/communitynotes/sourcecode/scoring/run_scoring.py", line 152, in <listcomp>
    _run_scorer_parallelizable(s, ratings, noteStatusHistory, userEnrollment) for s in scorers
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/[REDACTED]/git/communitynotes/sourcecode/scoring/run_scoring.py", line 103, in _run_scorer_parallelizable
    result = ModelResult(*scorer.score(ratings, noteStatusHistory, userEnrollment))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/[REDACTED]/git/communitynotes/sourcecode/scoring/scorer.py", line 108, in score
    assert set(noteScores.columns) == set(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: all columns must be either dropped or explicitly defined in an output.
Extra columns that were in noteScores: {'raterParticipantId_interval', 'raterParticipantId_same'}
Missing expected columns that should've been in noteScores: set()
$ python --version
Python 3.11.3
contourpy==1.0.7
cycler==0.11.0
filelock==3.12.0
fonttools==4.39.4
Jinja2==3.1.2
kiwisolver==1.4.4
MarkupSafe==2.1.3
matplotlib==3.7.1
mpmath==1.3.0
networkx==3.1
numpy==1.24.3
packaging==23.1
pandas==2.0.2
Pillow==9.5.0
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2023.3
six==1.16.0
sympy==1.12
torch==2.0.1
typing_extensions==4.6.3
tzdata==2023.3
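
Judging only from the traceback above, the assertion in scorer.py appears to compare the set of columns in noteScores against an expected set and to report the difference in both directions. The following is a minimal sketch of that kind of check in plain Python, not the project's actual code. The helper name column_diff and the expected column names are hypothetical; only the two extra names are taken from the log.

```python
def column_diff(actual, expected):
    """Return (extra, missing) column names, each as a sorted list.

    Hypothetical helper for illustration; it is not part of the scorer.
    """
    actual_set, expected_set = set(actual), set(expected)
    extra = sorted(actual_set - expected_set)      # present but not expected
    missing = sorted(expected_set - actual_set)    # expected but not present
    return extra, missing


# Assumed, illustrative schema. The real expected columns are not shown in the log.
expected_columns = ["noteId", "score"]

# The two suffixed names below are the ones reported by the AssertionError.
actual_columns = expected_columns + [
    "raterParticipantId_interval",
    "raterParticipantId_same",
]

extra, missing = column_diff(actual_columns, expected_columns)
print("Extra columns:", extra)
print("Missing columns:", missing)

# Dropping the unexpected names makes the two sets equal again.
cleaned = [c for c in actual_columns if c not in extra]
print("Sets match after dropping extras:", set(cleaned) == set(expected_columns))
```

A diff like this reproduces the shape of the reported message (two extra columns, an empty set of missing ones), which suggests the scorer output gained columns rather than lost any. Where those columns come from in this environment is not established by the log.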