kippnorcal / google_classroom
Google Classroom Data Pipeline
License: GNU General Public License v3.0
The current job would benefit from a way to report success and errors when running as a scheduled job. Our standard approach is to add a mailer that emails a Slack channel, but I am wondering whether this project needs a broader notification approach.
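As a sketch of the Slack-channel route, something like the following could run at the end of the job. This assumes a Slack incoming-webhook URL rather than the email bridge (both land a message in the channel); `build_notification` and `notify` are hypothetical names, not part of this codebase:

```python
import json
import urllib.request

def build_notification(job_name, success, error=None):
    """Build a Slack-style message payload summarizing a job run."""
    status = "succeeded" if success else f"failed: {error}"
    return {"text": f"{job_name} {status}"}

def notify(webhook_url, payload):
    """POST the payload to a Slack incoming webhook (network call)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Wrapping main() in a try/except that calls notify() on both paths would cover the success and error cases without changing any endpoint code.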
Now that all endpoints have been added with PR #62, it would be good to have full test coverage, with all of the endpoints mocked out.
We're also now getting the following error downloading student submissions:
[SQL: INSERT INTO dbo.[GoogleClassroom_StudentSubmissions] ([courseId], [courseWorkId], id, [userId], [creationTime], [updateTime], state, [draftGrade], [assignedGrade], [courseWorkType], [createdTime], [turnedInTimestamp], [returnedTimestamp], [draftMaxPoints], [draftGradeTimestamp], [draftGraderId], [assignedMaxPoints], [assignedGradeTimestamp], [assignedGraderId], late) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (('231301848111', '238611370303', 'Cg4I3LPjrqgEEL-i4vL4Bg', '115842805580895735978', datetime.datetime(2020, 12, 1, 16, 41, 25, 957000), datetime.datetime(2020, 12, 1, 16, 57, 22, 45000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 41, 25, 924000), datetime.datetime(2020, 12, 1, 16, 57, 22, 45000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4I4rfjrqgEEL-i4vL4Bg', '107177606335613004747', datetime.datetime(2020, 12, 1, 16, 41, 18, 113000), datetime.datetime(2020, 12, 1, 16, 55, 44, 996000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 41, 18, 98000), datetime.datetime(2020, 12, 1, 16, 55, 44, 996000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IneGKr6gEEL-i4vL4Bg', '102514046507660962400', datetime.datetime(2020, 12, 1, 16, 40, 47, 472000), datetime.datetime(2020, 12, 1, 16, 52, 35, 223000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 47, 458000), datetime.datetime(2020, 12, 1, 16, 52, 35, 223000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4I7umKr6gEEL-i4vL4Bg', '105987267483095051848', datetime.datetime(2020, 12, 1, 16, 40, 47, 544000), datetime.datetime(2020, 12, 1, 17, 21, 46, 949000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 47, 432000), datetime.datetime(2020, 12, 1, 17, 21, 46, 948000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IzIvEr6gEEL-i4vL4Bg', '108912011662447611108', datetime.datetime(2020, 12, 1, 16, 40, 22, 776000), datetime.datetime(2020, 12, 1, 21, 24, 5, 623000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 22, 713000), datetime.datetime(2020, 12, 1, 21, 24, 5, 623000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IwYj2r6gEEL-i4vL4Bg', '110557822443959198991', 
datetime.datetime(2020, 12, 1, 16, 40, 30, 982000), datetime.datetime(2020, 12, 1, 16, 58, 13, 592000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 30, 977000), datetime.datetime(2020, 12, 1, 16, 58, 13, 592000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IzNyas6gEEL-i4vL4Bg', '104140230101238714287', datetime.datetime(2020, 12, 4, 20, 44, 52, 870000), datetime.datetime(2020, 12, 4, 20, 44, 56, 67000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 4, 20, 44, 52, 856000), datetime.datetime(2020, 12, 4, 20, 44, 56, 66000), None, None, None, None, None, None, None, 1), ('231301848111', '238611370303', 'Cg4Imdavs6gEEL-i4vL4Bg', '107722742003146079646', datetime.datetime(2020, 12, 1, 16, 42, 56, 994000), datetime.datetime(2020, 12, 1, 17, 1, 19, 356000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 42, 56, 939000), datetime.datetime(2020, 12, 1, 17, 1, 19, 356000), None, None, None, None, None, None, None, 0) ... displaying 10 of 10000 total bound parameter sets ... ('203613708078', '231661121210', 'Cg4I26-ThtgDELrd0IDfBg', '108281326632597912647', datetime.datetime(2020, 11, 18, 13, 3, 37, 457000), datetime.datetime(2020, 11, 18, 20, 36, 35, 543000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 11, 18, 13, 3, 37, 394000), datetime.datetime(2020, 11, 18, 20, 36, 35, 543000), None, None, None, None, None, None, None, 0), ('203613708078', '231661121210', 'Cg4I_siYh9gDELrd0IDfBg', '104154892059107388602', datetime.datetime(2020, 11, 18, 13, 3, 39, 694000), datetime.datetime(2020, 11, 18, 14, 53, 46, 714000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 11, 18, 13, 3, 39, 635000), datetime.datetime(2020, 11, 18, 14, 53, 46, 713000), None, None, None, None, None, None, None, 0))]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
Traceback (most recent call last):
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1258, in _execute_context
cursor, statement, parameters, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 417, in do_executemany
cursor, statement, parameters, context=context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_executemany
cursor.executemany(statement, parameters)
pyodbc.DataError: ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 171, in <module>
main(Config)
File "main.py", line 162, in main
StudentSubmissions(classroom_service, sql, config).batch_pull_data(course_ids)
File "S:\Projects\google_classroom\timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "S:\Projects\google_classroom\api.py", line 259, in batch_pull_data
self._write_to_db(df)
File "S:\Projects\google_classroom\api.py", line 96, in _write_to_db
self.sql.insert_into(self.table_name, df, chunksize=10000)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlsorcery\__init__.py", line 210, in insert_into
dtype=dtype,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\core\generic.py", line 2663, in to_sql
method=method,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 521, in to_sql
method=method,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 1317, in to_sql
table.insert(chunksize, method=method)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 755, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 669, in _execute_insert
conn.execute(self.table.insert(), data)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1014, in execute
return meth(self, multiparams, params)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\sql\elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1133, in _execute_clauseelement
distilled_params,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1318, in _execute_context
e, statement, parameters, cursor, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1512, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
raise exception
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1258, in _execute_context
cursor, statement, parameters, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 417, in do_executemany
cursor, statement, parameters, context=context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_executemany
cursor.executemany(statement, parameters)
sqlalchemy.exc.DataError: (pyodbc.DataError) ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
[SQL: INSERT INTO dbo.[GoogleClassroom_StudentSubmissions] ([courseId], [courseWorkId], id, [userId], [creationTime], [updateTime], state, [draftGrade], [assignedGrade], [courseWorkType], [createdTime], [turnedInTimestamp], [returnedTimestamp], [draftMaxPoints], [draftGradeTimestamp], [draftGraderId], [assignedMaxPoints], [assignedGradeTimestamp], [assignedGraderId], late) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (('231301848111', '238611370303', 'Cg4I3LPjrqgEEL-i4vL4Bg', '115842805580895735978', datetime.datetime(2020, 12, 1, 16, 41, 25, 957000), datetime.datetime(2020, 12, 1, 16, 57, 22, 45000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 41, 25, 924000), datetime.datetime(2020, 12, 1, 16, 57, 22, 45000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4I4rfjrqgEEL-i4vL4Bg', '107177606335613004747', datetime.datetime(2020, 12, 1, 16, 41, 18, 113000), datetime.datetime(2020, 12, 1, 16, 55, 44, 996000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 41, 18, 98000), datetime.datetime(2020, 12, 1, 16, 55, 44, 996000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IneGKr6gEEL-i4vL4Bg', '102514046507660962400', datetime.datetime(2020, 12, 1, 16, 40, 47, 472000), datetime.datetime(2020, 12, 1, 16, 52, 35, 223000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 47, 458000), datetime.datetime(2020, 12, 1, 16, 52, 35, 223000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4I7umKr6gEEL-i4vL4Bg', '105987267483095051848', datetime.datetime(2020, 12, 1, 16, 40, 47, 544000), datetime.datetime(2020, 12, 1, 17, 21, 46, 949000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 47, 432000), datetime.datetime(2020, 12, 1, 17, 21, 46, 948000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IzIvEr6gEEL-i4vL4Bg', '108912011662447611108', datetime.datetime(2020, 12, 1, 16, 40, 22, 776000), datetime.datetime(2020, 12, 1, 21, 24, 5, 623000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 22, 713000), datetime.datetime(2020, 12, 1, 21, 24, 5, 623000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IwYj2r6gEEL-i4vL4Bg', '110557822443959198991', 
datetime.datetime(2020, 12, 1, 16, 40, 30, 982000), datetime.datetime(2020, 12, 1, 16, 58, 13, 592000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 30, 977000), datetime.datetime(2020, 12, 1, 16, 58, 13, 592000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IzNyas6gEEL-i4vL4Bg', '104140230101238714287', datetime.datetime(2020, 12, 4, 20, 44, 52, 870000), datetime.datetime(2020, 12, 4, 20, 44, 56, 67000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 4, 20, 44, 52, 856000), datetime.datetime(2020, 12, 4, 20, 44, 56, 66000), None, None, None, None, None, None, None, 1), ('231301848111', '238611370303', 'Cg4Imdavs6gEEL-i4vL4Bg', '107722742003146079646', datetime.datetime(2020, 12, 1, 16, 42, 56, 994000), datetime.datetime(2020, 12, 1, 17, 1, 19, 356000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 42, 56, 939000), datetime.datetime(2020, 12, 1, 17, 1, 19, 356000), None, None, None, None, None, None, None, 0) ... displaying 10 of 10000 total bound parameter sets ... ('203613708078', '231661121210', 'Cg4I26-ThtgDELrd0IDfBg', '108281326632597912647', datetime.datetime(2020, 11, 18, 13, 3, 37, 457000), datetime.datetime(2020, 11, 18, 20, 36, 35, 543000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 11, 18, 13, 3, 37, 394000), datetime.datetime(2020, 11, 18, 20, 36, 35, 543000), None, None, None, None, None, None, None, 0), ('203613708078', '231661121210', 'Cg4I_siYh9gDELrd0IDfBg', '104154892059107388602', datetime.datetime(2020, 11, 18, 13, 3, 39, 694000), datetime.datetime(2020, 11, 18, 14, 53, 46, 714000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 11, 18, 13, 3, 39, 635000), datetime.datetime(2020, 11, 18, 14, 53, 46, 713000), None, None, None, None, None, None, None, 0))]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
With the new get_and_write_to_db method, when a request fails the entire loop starts over. Retries should happen per request, so that if a single request fails only that request is retried. For example, if the students query loops over 1,200 courses and a single course request fails midway, it currently starts back at course 1; it should just retry that one course request and continue on.
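The per-course retry shape could look something like this sketch. The project already uses tenacity (it shows up in the tracebacks below), which could decorate the inner fetch instead; `pull_with_retries` and `fetch` here are hypothetical names:

```python
import logging
import time

def pull_with_retries(course_ids, fetch, max_attempts=3, backoff=1.0):
    """Fetch each course independently; retry only the failing course.

    `fetch` is any callable taking a course_id and returning its rows.
    Courses that still fail after max_attempts are collected and skipped,
    so one bad course never restarts the whole loop.
    """
    results, failures = [], []
    for course_id in course_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                results.extend(fetch(course_id))
                break
            except Exception as exc:
                if attempt == max_attempts:
                    failures.append((course_id, exc))
                    logging.warning("Giving up on course %s: %s", course_id, exc)
                else:
                    time.sleep(backoff * attempt)  # simple linear backoff
    return results, failures
```

The key point is that the retry boundary is the single course request, not the outer loop over all 1,200 courses.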
We're getting the following error now when trying to download coursework:
2020-12-08 01:16:47PM Central Standard Time | ERROR: (pyodbc.DataError) ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
[SQL: INSERT INTO dbo.[GoogleClassroom_CourseWork] ([courseId], id, title, description, state, [alternateLink], [creationTime], [updateTime], [dueDate], [maxPoints], [workType], [assigneeMode], [submissionModificationMode], [creatorUserId], [topicId]) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (('222693886482', '223016268500', 'to do', None, 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkzODg2NDgy/a/MjIzMDE2MjY4NTAw/details', datetime.datetime(2020, 11, 11, 13, 29, 5, 499000), datetime.datetime(2020, 11, 11, 13, 31, 44, 627000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '115624344439250234383', None), ('222693886482', '222718588596', 'class class', None, 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkzODg2NDgy/a/MjIyNzE4NTg4NTk2/details', datetime.datetime(2020, 11, 10, 20, 16, 31, 164000), datetime.datetime(2020, 11, 10, 20, 16, 44, 358000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '115624344439250234383', None), ('222691921702', '232238999150', 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?', 'ANSWER THE QUESTION!', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkxOTIxNzAy/sa/MjMyMjM4OTk5MTUw/details', datetime.datetime(2020, 11, 19, 13, 44, 26, 403000), datetime.datetime(2020, 11, 19, 13, 48, 37, 97000), None, 70, 'SHORT_ANSWER_QUESTION', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '104144867849088240281', None), ('222691921702', '228735845599', 'SPELL IT RIGHT!', 'You have to spell all the questions right!', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkxOTIxNzAy/a/MjI4NzM1ODQ1NTk5/details', datetime.datetime(2020, 11, 11, 15, 24, 38, 974000), datetime.datetime(2020, 11, 11, 15, 39, 12, 928000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '104144867849088240281', None), ('222689364757', '229324082553', 'Commercial Script ', 'Directions inside', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjg5MzY0NzU3/a/MjI5MzI0MDgyNTUz/details', datetime.datetime(2020, 11, 12, 18, 53, 12, 849000), datetime.datetime(2020, 11, 12, 18, 55, 45, 685000), datetime.datetime(2020, 11, 14, 0, 0), 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '101375802740655167828', None), ('205884161781', '228719101204', 
'nez perce', "\nHey it's Thursday already! Bam! You are an overcomer, an overachiever, and blessed with all that you need! Just like you put your clothes on put l ... (121 characters truncated) ... put your mind and heart into. Con ganas todo se puede! We believe in you! It is better to give than to receive. Being thankful makes your day better!", 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/a/MjI4NzE5MTAxMjA0/details', datetime.datetime(2020, 11, 11, 14, 38, 51, 239000), datetime.datetime(2020, 11, 11, 14, 39, 51, 698000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None), ('205884161781', '228719101157', 'interviw your sibling', "\nHey it's Thursday already! Bam! You are an overcomer, an overachiever, and blessed with all that you need! Just like you put your clothes on put l ... (121 characters truncated) ... put your mind and heart into. Con ganas todo se puede! We believe in you! It is better to give than to receive. 
Being thankful makes your day better!", 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/a/MjI4NzE5MTAxMTU3/details', datetime.datetime(2020, 11, 11, 14, 37, 32, 584000), datetime.datetime(2020, 11, 11, 14, 37, 49, 542000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None), ('205884161781', '223029072321', "what did you learn about the labraant family today are they still famouse i don't know you need to answer that question", 'today if you watch the video about the labrant family well then you would have no problem with this question you needed two wtch both of the video if you took the test if you have not go take the test then aswer this question', 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/sa/MjIzMDI5MDcyMzIx/details', datetime.datetime(2020, 11, 11, 13, 36, 30, 870000), datetime.datetime(2020, 11, 11, 13, 39, 35, 641000), None, 100, 'SHORT_ANSWER_QUESTION', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None) ... displaying 10 of 2227 total bound parameter sets ... ('202185157887', '230526788854', 'basketball', 'look at it', 'PUBLISHED', 'https://classroom.google.com/c/MjAyMTg1MTU3ODg3/a/MjMwNTI2Nzg4ODU0/details', datetime.datetime(2020, 11, 16, 13, 31, 25, 455000), datetime.datetime(2020, 11, 16, 13, 31, 22, 341000), datetime.datetime(2020, 11, 19, 5, 59), 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '110665100727005067242', '230526788855'), ('202185157887', '229922436189', 'idk', None, 'PUBLISHED', 'https://classroom.google.com/c/MjAyMTg1MTU3ODg3/a/MjI5OTIyNDM2MTg5/details', datetime.datetime(2020, 11, 13, 20, 35, 49, 875000), datetime.datetime(2020, 11, 13, 20, 35, 51, 242000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '110665100727005067242', None))]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
Traceback (most recent call last):
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1258, in _execute_context
cursor, statement, parameters, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 417, in do_executemany
cursor, statement, parameters, context=context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_executemany
cursor.executemany(statement, parameters)
pyodbc.DataError: ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 171, in <module>
main(Config)
File "main.py", line 150, in main
CourseWork(classroom_service, sql, config).batch_pull_data(course_ids)
File "S:\Projects\google_classroom\timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "S:\Projects\google_classroom\api.py", line 259, in batch_pull_data
self._write_to_db(df)
File "S:\Projects\google_classroom\api.py", line 96, in _write_to_db
self.sql.insert_into(self.table_name, df, chunksize=10000)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlsorcery\__init__.py", line 210, in insert_into
dtype=dtype,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\core\generic.py", line 2663, in to_sql
method=method,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 521, in to_sql
method=method,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 1317, in to_sql
table.insert(chunksize, method=method)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 755, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 669, in _execute_insert
conn.execute(self.table.insert(), data)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1014, in execute
return meth(self, multiparams, params)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\sql\elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1133, in _execute_clauseelement
distilled_params,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1318, in _execute_context
e, statement, parameters, cursor, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1512, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
raise exception
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1258, in _execute_context
cursor, statement, parameters, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 417, in do_executemany
cursor, statement, parameters, context=context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_executemany
cursor.executemany(statement, parameters)
sqlalchemy.exc.DataError: (pyodbc.DataError) ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
[SQL: INSERT INTO dbo.[GoogleClassroom_CourseWork] ([courseId], id, title, description, state, [alternateLink], [creationTime], [updateTime], [dueDate], [maxPoints], [workType], [assigneeMode], [submissionModificationMode], [creatorUserId], [topicId]) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (('222693886482', '223016268500', 'to do', None, 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkzODg2NDgy/a/MjIzMDE2MjY4NTAw/details', datetime.datetime(2020, 11, 11, 13, 29, 5, 499000), datetime.datetime(2020, 11, 11, 13, 31, 44, 627000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '115624344439250234383', None), ('222693886482', '222718588596', 'class class', None, 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkzODg2NDgy/a/MjIyNzE4NTg4NTk2/details', datetime.datetime(2020, 11, 10, 20, 16, 31, 164000), datetime.datetime(2020, 11, 10, 20, 16, 44, 358000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '115624344439250234383', None), ('222691921702', '232238999150', 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?', 'ANSWER THE QUESTION!', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkxOTIxNzAy/sa/MjMyMjM4OTk5MTUw/details', datetime.datetime(2020, 11, 19, 13, 44, 26, 403000), datetime.datetime(2020, 11, 19, 13, 48, 37, 97000), None, 70, 'SHORT_ANSWER_QUESTION', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '104144867849088240281', None), ('222691921702', '228735845599', 'SPELL IT RIGHT!', 'You have to spell all the questions right!', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkxOTIxNzAy/a/MjI4NzM1ODQ1NTk5/details', datetime.datetime(2020, 11, 11, 15, 24, 38, 974000), datetime.datetime(2020, 11, 11, 15, 39, 12, 928000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '104144867849088240281', None), ('222689364757', '229324082553', 'Commercial Script ', 'Directions inside', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjg5MzY0NzU3/a/MjI5MzI0MDgyNTUz/details', datetime.datetime(2020, 11, 12, 18, 53, 12, 849000), datetime.datetime(2020, 11, 12, 18, 55, 45, 685000), datetime.datetime(2020, 11, 14, 0, 0), 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '101375802740655167828', None), ('205884161781', '228719101204', 
'nez perce', "\nHey it's Thursday already! Bam! You are an overcomer, an overachiever, and blessed with all that you need! Just like you put your clothes on put l ... (121 characters truncated) ... put your mind and heart into. Con ganas todo se puede! We believe in you! It is better to give than to receive. Being thankful makes your day better!", 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/a/MjI4NzE5MTAxMjA0/details', datetime.datetime(2020, 11, 11, 14, 38, 51, 239000), datetime.datetime(2020, 11, 11, 14, 39, 51, 698000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None), ('205884161781', '228719101157', 'interviw your sibling', "\nHey it's Thursday already! Bam! You are an overcomer, an overachiever, and blessed with all that you need! Just like you put your clothes on put l ... (121 characters truncated) ... put your mind and heart into. Con ganas todo se puede! We believe in you! It is better to give than to receive. 
Being thankful makes your day better!", 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/a/MjI4NzE5MTAxMTU3/details', datetime.datetime(2020, 11, 11, 14, 37, 32, 584000), datetime.datetime(2020, 11, 11, 14, 37, 49, 542000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None), ('205884161781', '223029072321', "what did you learn about the labraant family today are they still famouse i don't know you need to answer that question", 'today if you watch the video about the labrant family well then you would have no problem with this question you needed two wtch both of the video if you took the test if you have not go take the test then aswer this question', 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/sa/MjIzMDI5MDcyMzIx/details', datetime.datetime(2020, 11, 11, 13, 36, 30, 870000), datetime.datetime(2020, 11, 11, 13, 39, 35, 641000), None, 100, 'SHORT_ANSWER_QUESTION', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None) ... displaying 10 of 2227 total bound parameter sets ... ('202185157887', '230526788854', 'basketball', 'look at it', 'PUBLISHED', 'https://classroom.google.com/c/MjAyMTg1MTU3ODg3/a/MjMwNTI2Nzg4ODU0/details', datetime.datetime(2020, 11, 16, 13, 31, 25, 455000), datetime.datetime(2020, 11, 16, 13, 31, 22, 341000), datetime.datetime(2020, 11, 19, 5, 59), 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '110665100727005067242', '230526788855'), ('202185157887', '229922436189', 'idk', None, 'PUBLISHED', 'https://classroom.google.com/c/MjAyMTg1MTU3ODg3/a/MjI5OTIyNDM2MTg5/details', datetime.datetime(2020, 11, 13, 20, 35, 49, 875000), datetime.datetime(2020, 11, 13, 20, 35, 51, 242000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '110665100727005067242', None))]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
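For what it's worth, this '22018' cast error from pyodbc is commonly triggered when a chunk mixes types within one bound-parameter column (for example NaN floats alongside strings, or string timestamps alongside datetimes), so one avenue to investigate is coercing the DataFrame before the insert. A sketch, with `coerce_for_mssql` being a hypothetical helper rather than part of this codebase:

```python
import pandas as pd

def coerce_for_mssql(df, datetime_cols, numeric_cols):
    """Normalize dtypes before a pyodbc executemany insert.

    Coerce each column to a single type, with None (not NaN/NaT,
    which pyodbc does not translate cleanly) for missing values.
    """
    df = df.copy()
    for col in datetime_cols:
        df[col] = pd.to_datetime(df[col], errors="coerce")
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Convert to object dtype so missing values can be plain None
    return df.astype(object).where(df.notna(), None)
```

Running something like this on dueDate/maxPoints (and the submission timestamp/grade columns) before insert_into would at least rule out mixed-type chunks as the cause.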
Courses in Google Classroom now have Google Calendars associated with them. It would be nice to have that ID available so that we can construct links to the calendar from the database. The Course resource in the Classroom API has a calendarId field, so it seems like a simple matter of adding that to the models in this code.
The current library assumes in a number of places that MSSQL is used as the library. One of the broader goals is to make this library database-agnostic, so that it can be used with everything from MSSQL to SQLite to cloud-based DBs like BigQuery or Redshift.
Right now to configure which data gets pulled requires modifying the dockerfile to exec different commands. This should be moved to booleans in an env file so that code does not need to be modified to configure the call, and so configurations can be saved locally.
The courseWork endpoint includes a nested dataset that is populated when an assignment is assigned to a subset of students. Pulling this in is necessary to determine which assignments were not assigned to all students, so that missing-assignment analysis can be more exact. However, adding it directly to the CourseWork table would blow up the row count, and the table would no longer be at the same level of granularity.
I propose we add a new table that maps the courseWorkId to the studentId from this individualStudentsOptions list.
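A sketch of the flattening step that would feed such a mapping table (the `individualStudentsOptions.studentIds` field names come from the Classroom API's CourseWork resource; `coursework_student_map` is a hypothetical helper):

```python
def coursework_student_map(coursework_records):
    """Flatten each courseWork record's individualStudentsOptions into
    (courseWorkId, studentId) rows for a separate mapping table, keeping
    the CourseWork table itself at one row per assignment."""
    rows = []
    for record in coursework_records:
        options = record.get("individualStudentsOptions", {})
        for student_id in options.get("studentIds", []):
            rows.append({"courseWorkId": record["id"], "studentId": student_id})
    return rows
```

Assignments assigned to all students simply contribute no rows, so the mapping table stays small.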
Anyone know if it's possible to get only new/updated data from the API? I have a similar script running to pull data, but StudentSubmissions has become a bottleneck since it now holds a huge amount of data after roughly two months of remote learning. Curious whether it's possible to pull new and/or recently updated data incrementally.
Modify the student usage request to use the max AsOfDate in the database to query a range of dates between then and the max date available (two days prior to today). This should also delete the latest day in the database table to ensure completeness of data (some days can load partial data depending on availability).
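A sketch of the window computation described above (hypothetical `usage_pull_dates` helper; the deletion of the latest loaded day would happen in SQL before requesting these dates):

```python
from datetime import date, timedelta

def usage_pull_dates(max_asofdate_in_db, today=None):
    """Dates to (re)request from the student usage report.

    Starts at the max AsOfDate already in the database -- that day is
    deleted and re-pulled because it may have loaded only partial data --
    and ends two days before today, the latest date the report exposes.
    """
    today = today or date.today()
    end = today - timedelta(days=2)
    start = max_asofdate_in_db
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]
```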
It would be useful to see the name of the org unit along with its ID in the database table for auditing purposes.
@zkagin I am encountering an issue running the latest branch (traceback below)
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 394, in call
result = fn(*args, **kwargs)
File "/code/timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "/code/api.py", line 104, in get_and_write_to_db
results = self.request_data().execute()
File "/code/api.py", line 169, in request_data
if self.org_unit_id:
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/pandas/core/generic.py", line 1479, in __nonzero__
f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 159, in <module>
main(Config)
File "main.py", line 97, in main
sql, overwrite=False, debug=config.DEBUG
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 311, in wrapped_f
return self.call(f, *args, **kw)
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 391, in call
do = self.iter(retry_state=retry_state)
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 351, in iter
six.raise_from(retry_exc, fut.exception())
File "<string>", line 3, in raise_from
tenacity.RetryError: RetryError[<Future at 0x7f1a4e5b4910 state=finished raised ValueError>]
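For what it's worth, the ValueError suggests self.org_unit_id is a pandas Series rather than a scalar at that point, so the plain `if self.org_unit_id:` check blows up. A defensive truthiness check along these lines would avoid the crash (sketch only, not the repo's actual fix):

```python
def is_org_unit_set(org_unit_id):
    """Explicit truthiness check that is safe for scalars, None, and
    list-like objects such as a pandas Series (where a bare
    `if x:` raises "The truth value of a Series is ambiguous")."""
    if org_unit_id is None:
        return False
    if isinstance(org_unit_id, str):
        return bool(org_unit_id)
    try:
        return len(org_unit_id) > 0  # Series, list, tuple
    except TypeError:
        return bool(org_unit_id)     # plain scalar
```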
Classroom has an integration with Meet, which provides video chat. It would be great to pull in Meet usage data as well via the Reports API for the call_ended event.
From my perspective, the relevant columns are:
Some schools use the course alias to map their classes to their SIS. It would be useful to include this in the connector for full coverage: https://developers.google.com/classroom/reference/rest/v1/courses.aliases
Now that batching is in place, it would be good to add a way to call the end-to-end job for all endpoints.
Currently this would need to run as:
docker run --rm -t google_classroom --usage --courses --topics --coursework --students --teachers --guardians --invites --submissions
Or by setting all of these in the .env file.
It would be preferable either to add an optional --all flag to run them, or to configure it so that if no flag is passed (and no env vars are set) it defaults to running all endpoints.
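A sketch of the flag handling with argparse (the endpoint list mirrors the flags above; the parser wiring is an assumption about how main.py is structured):

```python
import argparse

ENDPOINTS = ["usage", "courses", "topics", "coursework", "students",
             "teachers", "guardians", "invites", "submissions"]

def parse_endpoint_flags(argv):
    """Return the set of endpoints to run. With no flags at all,
    default to running everything; --all forces the same."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--all", action="store_true")
    for name in ENDPOINTS:
        parser.add_argument(f"--{name}", action="store_true")
    args = vars(parser.parse_args(argv))
    if args.pop("all") or not any(args.values()):
        return set(ENDPOINTS)
    return {name for name, on in args.items() if on}
```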
Currently the dueDate timestamp is parsed into the following fields:
It would be preferable for this to be a single datetime field.
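A sketch of how the split fields could be collapsed into one value (the year/month/day and hours/minutes field names follow the courseWork resource; the helper itself is hypothetical):

```python
from datetime import datetime

def combine_due_datetime(due_date, due_time):
    """Collapse the API's split dueDate/dueTime dicts into a single
    UTC datetime; returns None when no due date is set."""
    if not due_date:
        return None
    due_time = due_time or {}
    return datetime(
        due_date["year"], due_date["month"], due_date["day"],
        due_time.get("hours", 0), due_time.get("minutes", 0),
    )
```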
Now that we've incorporated the Meets endpoint, it would be good to add test coverage.
@zkagin Here's a traceback for that memory error I mentioned:
2020-09-26 12:09:40AM UTC | INFO: StudentSubmissions: Generating requests...
2020-09-26 12:09:46AM UTC | INFO: StudentSubmissions: 1923 requests remaining.
2020-09-26 12:09:57AM UTC | INFO: StudentSubmissions: 1630 requests remaining.
2020-09-26 12:10:25AM UTC | INFO: StudentSubmissions: 1382 requests remaining.
2020-09-26 12:10:36AM UTC | INFO: StudentSubmissions: Quota exceeded. Pausing for 20 seconds...
2020-09-26 12:10:56AM UTC | INFO: StudentSubmissions: 1287 requests remaining.
2020-09-26 12:11:30AM UTC | INFO: StudentSubmissions: 1101 requests remaining.
2020-09-26 12:12:01AM UTC | INFO: StudentSubmissions: 874 requests remaining.
2020-09-26 12:12:40AM UTC | INFO: StudentSubmissions: 671 requests remaining.
2020-09-26 12:14:42AM UTC | ERROR: RetryError[<Future at 0x7ff71ff4d250 state=finished raised MemoryError>]
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 412, in call
result = fn(*args, **kwargs)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/http.py", line 1528, in execute
self._execute(http, self._order, self._requests)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/http.py", line 1473, in _execute
parser.feed(for_parser)
File "/usr/local/lib/python3.7/email/feedparser.py", line 175, in feed
self._input.push(data)
File "/usr/local/lib/python3.7/email/feedparser.py", line 110, in push
parts = self._partial.readlines()
MemoryError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 167, in <module>
main(Config)
File "main.py", line 158, in main
StudentSubmissions(classroom_service, sql, config).batch_pull_data(course_ids)
File "/google_classroom/timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "/google_classroom/api.py", line 259, in batch_pull_data
self._execute_batch_with_retry(batch)
File "/google_classroom/api.py", line 161, in _execute_batch_with_retry
retryer(batch.execute)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 409, in call
do = self.iter(retry_state=retry_state)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 369, in iter
six.raise_from(retry_exc, fut.exception())
File "<string>", line 3, in raise_from
tenacity.RetryError: RetryError[<Future at 0x7ff71ff4d250 state=finished raised MemoryError>]
If a student is removed from a course, all of the submission data for that course is lost. We should create a snapshot of the prior day's data before truncating the table.
Possible logic:
This logic would also help maintain year-over-year data. It's potentially something we should consider for other data sources as well: teachers, students, courses, coursework, etc.
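One possible shape for the snapshot step, sketched against sqlite for illustration (the real pipeline writes through sqlsorcery, and the table names here are made up):

```python
import sqlite3

def snapshot_then_truncate(conn, table, archive_table):
    """Before truncating the live table for a fresh pull, append its
    current rows to an archive table so history (e.g. submissions for
    students later removed from a course) survives."""
    # Create the archive with the same columns if it doesn't exist yet.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {archive_table} AS SELECT * FROM {table} WHERE 0"
    )
    conn.execute(f"INSERT INTO {archive_table} SELECT * FROM {table}")
    conn.execute(f"DELETE FROM {table}")  # sqlite has no TRUNCATE
    conn.commit()
```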
@zkagin What strategy would you recommend for approving multiple PRs? Do you generally prefer to approve the dependent branches and merge them into the parent branch before merging all of them into master?
Otherwise, it seems it's necessary to keep rebasing after each merge.
As we near stability and other users want to leverage this connector, it would be useful to include a basic level of documentation. We should consider adding Sphinx autodoc to auto-generate documentation from docstrings.
Right now all calls to the Google Classroom API are done sequentially and individually. As we identify bottlenecks, the goal is to optimize these calls and minimize the time to run this from hours to minutes. This can potentially be done through both batch calling the endpoints and through parallelization of calls.
Further details are pending a performance test.
The course.aliases.list API responds with an array of CourseAlias objects. CourseAlias does not contain a courseId property though. Unfortunately, to be useful from a data reporting perspective, we need the courseId in the CourseAliases table.
I've been able to work around this issue for now by specifically handling aliases in the internal callback method of api.py -> EndPoint.batch_pull_data. If you insert the following lines starting at line 220 of api.py (right after the nextPageToken conditional block) it will add the courseId property to each CourseAlias object in the response["aliases"] array.
"""Fix up alias records to include course Id"""
if "aliases" in response:
for alias in response["aliases"]:
if "courseId" not in alias:
alias["courseId"] = course_id
I don't think this should be the permanent fix, as it breaks inheritance (the parent shouldn't know about things that don't apply to all children). The only other approach I could see for now was to reimplement the batch_pull_data callback as a class method on EndPoint that CourseAliases could override with the updated callback code, but that looked like a lot more work and would require refactoring external references like batch_data. I'm sure there's a better way to do this, given I'm not much of a Python coder yet.
Our morning sync errored today with the following:
2020/09/04 11:00:36 - Google Classroom - Running on platform : Windows Server 2012 R2
2020/09/04 11:00:36 - Google Classroom - Executing command : cmd.exe /C "C:\Users\SZ\AppData\Local\Temp\kettle_637ecd0b-eebf-11ea-bf57-dfabbe81a29eshell.bat"
2020/09/04 11:00:36 - Google Classroom - (stdout)
2020/09/04 11:00:36 - Google Classroom - (stdout) c:\users\sz\google_classroom\google_classroom>pipenv run python c:\users\sz\google_classroom\google_classroom\main.py
2020/09/04 11:00:37 - Google Classroom - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : (stderr) Loading .env environment variables…
2020/09/04 11:00:46 - Google Classroom - (stdout) 2020-09-04 11:00:46AM Eastern Daylight Time | INFO: OrgUnits: Generating requests...
2020/09/04 11:00:46 - Google Classroom - (stdout) 2020-09-04 11:00:46AM Eastern Daylight Time | INFO: OrgUnits: 1 requests remaining. On page 0.
2020/09/04 11:00:48 - Google Classroom - (stdout) 2020-09-04 11:00:48AM Eastern Daylight Time | INFO: EndPoint.batch_pull_data completed in 5.26 seconds.
2020/09/04 11:01:05 - Google Classroom - (stdout) 2020-09-04 11:01:05AM Eastern Daylight Time | INFO: StudentUsage: Generating requests...
2020/09/04 11:01:05 - Google Classroom - (stdout) 2020-09-04 11:01:05AM Eastern Daylight Time | INFO: StudentUsage: 6 requests remaining.
2020/09/04 11:01:11 - Google Classroom - (stdout) 2020-09-04 11:01:11AM Eastern Daylight Time | ERROR: (pyodbc.ProgrammingError) ('42S22', "[42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid column name 'ImportDate'. (207) (SQLExecute); [42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
2020/09/04 11:01:11 - Google Classroom - (stdout) [SQL: INSERT INTO custom.[GoogleClassroom_StudentUsage] ([Email], [AsOfDate], [LastUsedTime], [ImportDate]) VALUES (?, ?, ?, ?)]
[...]
2020/09/04 11:01:11 - Google Classroom - (stdout) (Background on this error at: http://sqlalche.me/e/f405)
2020/09/04 11:01:11 - Google Classroom - (stdout) Traceback (most recent call last):
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1228, in _execute_context
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor, statement, parameters, context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 413, in do_executemany
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor, statement, parameters, context=context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\default.py", line 587, in do_executemany
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor.executemany(statement, parameters)
2020/09/04 11:01:11 - Google Classroom - (stdout) pyodbc.ProgrammingError: ('42S22', "[42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid column name 'ImportDate'. (207) (SQLExecute); [42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
2020/09/04 11:01:11 - Google Classroom - (stdout)
2020/09/04 11:01:11 - Google Classroom - (stdout) The above exception was the direct cause of the following exception:
2020/09/04 11:01:11 - Google Classroom - (stdout)
2020/09/04 11:01:11 - Google Classroom - (stdout) Traceback (most recent call last):
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\main.py", line 174, in <module>
2020/09/04 11:01:11 - Google Classroom - (stdout) main(Config)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\main.py", line 69, in main
2020/09/04 11:01:11 - Google Classroom - (stdout) pull_data(config, creds, sql)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\main.py", line 96, in pull_data
2020/09/04 11:01:11 - Google Classroom - (stdout) usage.batch_pull_data(dates=date_range_string, overwrite=False)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\timer.py", line 22, in wrapper
2020/09/04 11:01:11 - Google Classroom - (stdout) results = func(*args, **kwargs)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\endpoints\base.py", line 265, in batch_pull_data
2020/09/04 11:01:11 - Google Classroom - (stdout) self._write_to_db(df)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\endpoints\base.py", line 97, in _write_to_db
2020/09/04 11:01:11 - Google Classroom - (stdout) self.sql.insert_into(self.table_name, df, chunksize=10000)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlsorcery\__init__.py", line 210, in insert_into
2020/09/04 11:01:11 - Google Classroom - (stdout) dtype=dtype,
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\core\generic.py", line 2663, in to_sql
2020/09/04 11:01:11 - Google Classroom - (stdout) method=method,
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\io\sql.py", line 521, in to_sql
2020/09/04 11:01:11 - Google Classroom - (stdout) method=method,
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\io\sql.py", line 1317, in to_sql
2020/09/04 11:01:11 - Google Classroom - (stdout) table.insert(chunksize, method=method)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\io\sql.py", line 755, in insert
2020/09/04 11:01:11 - Google Classroom - (stdout) exec_insert(conn, keys, chunk_iter)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\io\sql.py", line 669, in _execute_insert
2020/09/04 11:01:11 - Google Classroom - (stdout) conn.execute(self.table.insert(), data)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 984, in execute
2020/09/04 11:01:11 - Google Classroom - (stdout) return meth(self, multiparams, params)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\sql\elements.py", line 293, in _execute_on_connection
2020/09/04 11:01:11 - Google Classroom - (stdout) return connection._execute_clauseelement(self, multiparams, params)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1103, in _execute_clauseelement
2020/09/04 11:01:11 - Google Classroom - (stdout) distilled_params,
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1288, in _execute_context
2020/09/04 11:01:11 - Google Classroom - (stdout) e, statement, parameters, cursor, context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1482, in _handle_dbapi_exception
2020/09/04 11:01:11 - Google Classroom - (stdout) sqlalchemy_exception, with_traceback=exc_info[2], from_=e
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
2020/09/04 11:01:11 - Google Classroom - (stdout) raise exception
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1228, in _execute_context
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor, statement, parameters, context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 413, in do_executemany
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor, statement, parameters, context=context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\default.py", line 587, in do_executemany
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor.executemany(statement, parameters)
2020/09/04 11:01:11 - Google Classroom - (stdout) sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42S22', "[42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid column name 'ImportDate'. (207) (SQLExecute); [42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
2020/09/04 11:01:11 - Google Classroom - (stdout) [SQL: INSERT INTO custom.[GoogleClassroom_StudentUsage] ([Email], [AsOfDate], [LastUsedTime], [ImportDate]) VALUES (?, ?, ?, ?)]
[...]
2020/09/04 11:01:11 - Google Classroom - (stdout) (Background on this error at: http://sqlalche.me/e/f405)
A couple days ago I pulled down the last few months of commits (in an attempt to see if that would address the deadlocking crash that we started to see again), so I'm guessing that's the proximal cause. I can easily create what appears to be the missing column in our DB, just passing along as an FYI that there may have been a breaking change in the DB structure at some point.
When running the StudentSubmissions endpoint query, the following error occurs (and is only visible after retries are exhausted or disabled):
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('40001', '[40001] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 85) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLExecDirectW)')
[SQL: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]]
[parameters: ('custom', 'BASE TABLE')]
(Background on this error at: http://sqlalche.me/e/dbapi)
Right now the database writes are happening as part of the batch request callback. This may be creating performance bottlenecks. One option would be to move this to the batch level and cache the data for writing until the entire batch request is finished.
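A sketch of what batch-level caching could look like (the response shape, the "items" key, and the writer hook are all assumptions for illustration):

```python
class BatchWriteCache:
    """Move DB writes out of the per-request callback: responses are
    accumulated in memory and flushed once per batch, so each batch
    issues one insert instead of one insert per callback."""

    def __init__(self, writer):
        self.writer = writer  # e.g. lambda rows: sql.insert_into(table, rows)
        self.rows = []

    def callback(self, request_id, response, exception):
        # Collect successful responses; error handling elided.
        if exception is None and response:
            self.rows.extend(response.get("items", []))

    def flush(self):
        if self.rows:
            self.writer(self.rows)
            self.rows = []
```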
There is a late boolean field in the API that would be good to add, for simpler flagging of late assignments in analysis. I'm not sure why we didn't include it initially, but it seems like a simple addition.
Task: Add ability to add, delete, and update student items based on the source of truth provided.
@dchess - Was thinking more about the error we looked at today, and (admittedly not having fully immersed myself in the code), I don't think it was due to not having quotes around the date in the .env file.
I think it will always fail if there is no data in the student usage table, and that it only worked today because my workaround fed data into the student usage table prior to uncommenting the original code.
usage = StudentUsage(admin_reports_service, sql, config, org_unit_id)
last_date = usage.get_last_date()
start_date = last_date + timedelta(days=1) or datetime.strptime(config.SCHOOL_YEAR_START, "%Y-%m-%d")
My quick read of StudentUsage.get_last_date() is that it will return None if there's no data previously loaded in the table, at which point the next line will throw an error because you can't add a timedelta to None.
I had enough energy to type up this issue, but not enough to clear my database and try from scratch again to double check, so hopefully this is useful, and apologies if it's a false alarm.
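A None-safe version of that start-date logic might look like this (sketch only; argument names are mine). The original line raises TypeError on the `+` before the `or` fallback is ever evaluated:

```python
from datetime import datetime, timedelta

def compute_start_date(last_date, school_year_start):
    """Fall back to the configured school-year start when the usage
    table is empty and get_last_date() returns None."""
    if last_date is None:
        return datetime.strptime(school_year_start, "%Y-%m-%d")
    return last_date + timedelta(days=1)
```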
Task: create functions for comparing the most recently pulled list of items with a provided "master list" of items. This should result in a list of changes that need to be made in the form of deletes, adds, and updates.
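A possible shape for the diff step (items as dicts keyed on a unique field; all names are illustrative, not the repo's API):

```python
def diff_items(pulled, master, key="id"):
    """Compare the most recently pulled items against a provided
    master list, returning the adds, deletes, and updates needed to
    reconcile the pulled data with the master list."""
    pulled_by_key = {item[key]: item for item in pulled}
    master_by_key = {item[key]: item for item in master}
    adds = [m for k, m in master_by_key.items() if k not in pulled_by_key]
    deletes = [p for k, p in pulled_by_key.items() if k not in master_by_key]
    updates = [
        m for k, m in master_by_key.items()
        if k in pulled_by_key and pulled_by_key[k] != m
    ]
    return adds, deletes, updates
```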
One previously ignored endpoint that could be valuable is the Invitations resource. This could be useful for identifying students who have been sent an invite, but haven't shown up in the Students by course dataset.
Task: Take the existing batching logic and adapt it for the new write endpoints. This may also include handling additional error types.
@zkagin When a course is archived in Google Classroom it doesn't get pulled by the API. The connector currently only pulls active courses.
Task: Add a new endpoint for adding or updating new courses, and test it in both sandbox and via new written tests.
This library currently assumes a lot about the structure of data coming from the Google Classroom API and what files/tables exist in the integrated database. The goal here would be to make sure to check and no-op rather than crash in these cases, and potentially to warn the user with a clear error message if something is wrong and needs to be fixed to continue.
Right now, when the library is called multiple times, it can duplicate some data. The goal is eventually to allow the user to run the script as many times as needed without worry that they may accidentally delete old data or double-count new data.
One option to solve this may involve transitioning away from the files caches in /data which currently are causing some of these issues and instead using the database as the source of truth, potentially instead using file logs only for auditing purposes.
Rather than querying the StudentSubmissions endpoint by looping over Courses, it could be preferable to loop over individual assignments (CourseWork). That way the query could be date-bound by pre-filtering the assignments on creationTime. While this might result in more calls, it may reduce pagination because there would be less data volume overall. Querying by Course pulls in every assignment ever created in that class, and many may no longer be relevant to monitor after a certain point. If history needs to be maintained, one approach would be to handle it like the StudentUsage data: truncate a subset and append.
https://developers.google.com/admin-sdk/reports/v1/appendix/usage/customer/classroom
Will this report give us better usage data than the userUsage endpoint we are using now?
StudentSubmissions(classroom_service).get_and_write_to_db(
    sql, course_ids, debug=config.DEBUG
)
When this call encounters certain courses, it loads far more records than exist in the course. For instance, a course with 22 assignments and 29 students returns hundreds of thousands of records when it should only pull back roughly 600.
The google_classroom library doesn't currently have any tests. The goal of this issue is to add testing coverage for the API processing and saving into a DB. For each call to the API, this would include:
Task: Add ability to add, delete, and update teacher items based on the source of truth provided.
Task: Add ability to add, delete, and update topics items based on the source of truth provided.
Pulling Google Meet data currently takes about 15 minutes for us, a relatively small network of schools. For a mid- to large-sized network, this could take an extremely long time.
It's probably a good idea eventually to change the Meet logic to query only for new data since the last update, and append that to the table, rather than dropping the table and re-adding everything.
Traceback (most recent call last):
File "main.py", line 131, in
main(Config)
File "main.py", line 126, in main
StudentSubmissions(classroom_service, sql, config).batch_pull_data(course_ids)
File "/code/timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "/code/api.py", line 182, in batch_pull_data
batch.add(request, request_id=request_id)
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/googleapiclient/http.py", line 1398, in add
raise KeyError("A request with this ID already exists: %s" % request_id)
KeyError: 'A request with this ID already exists: 41010390897;None;2'
When importing the CourseWork endpoint, the following error is occurring:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Error converting data type nvarchar to float. (8114) (SQLExecDirectW)')
This is either coming from the topicId or one of the dueDate/dueTime fields.
Some courses use announcements to share resources and it may be a useful datapoint for analysis: https://developers.google.com/classroom/reference/rest/v1/courses.announcements#Announcement
When running the usage report for multiple dates, if more than just the most recent date has partial data, reruns only refresh the latest date and do not account for partial data on past dates.
One option would be to flag any dates that have partial data warnings and when rerunning the job truncate those dates.
Even though the tests are failing with exit code 1, the GitHub Action is still allowing the check to pass.
If a date in Google Classroom is outside the bounds pandas can parse, add error handling that logs the record id (and course id if relevant) and inserts the record with a null date. For example, if a coursework timestamp looks like "9999-10-10 04:59:00", it would be stored as a NULL date instead, with the coursework id and corresponding course id logged.
This issue was identified by KIPP Texas.
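A sketch of the error handling using pandas' coercing parser (the column names and logging hook are illustrative; year-9999 timestamps exceed the nanosecond Timestamp bounds, so they coerce to NaT):

```python
import logging
import pandas as pd

def parse_coursework_dates(df, column="creationTime"):
    """Coerce timestamps pandas cannot represent to NULL instead of
    crashing, logging the affected record and course ids."""
    parsed = pd.to_datetime(df[column], errors="coerce")
    bad = df[parsed.isna() & df[column].notna()]
    for _, row in bad.iterrows():
        logging.warning(
            "Unparseable %s for record %s (course %s); stored as NULL",
            column, row.get("id"), row.get("courseId"),
        )
    df[column] = parsed
    return df
```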