kippnorcal / google_classroom
Google Classroom Data Pipeline
License: GNU General Public License v3.0
The current job would benefit from a way to report success and errors when running as a scheduled job. Our standard approach is to add a mailer that emails a Slack channel, but I am wondering whether this project needs a broader notification approach.
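As a sketch of the Slack-channel route, something like the following could run at the end of the job. This assumes a Slack incoming-webhook URL rather than the email bridge (both land a message in the channel); `build_notification` and `notify` are hypothetical names, not part of this codebase:

```python
import json
import urllib.request

def build_notification(job_name, success, error=None):
    """Build a Slack-style message payload summarizing a job run."""
    status = "succeeded" if success else f"failed: {error}"
    return {"text": f"{job_name} {status}"}

def notify(webhook_url, payload):
    """POST the payload to a Slack incoming webhook (network call)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Wrapping main() in a try/except that calls notify() on both paths would cover the success and error cases without changing any endpoint code.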
Now that all endpoints have been added with PR #62, it would be good to have full test coverage, with all of the endpoints mocked out.
We're also now getting the following error downloading student submissions:
[SQL: INSERT INTO dbo.[GoogleClassroom_StudentSubmissions] ([courseId], [courseWorkId], id, [userId], [creationTime], [updateTime], state, [draftGrade], [assignedGrade], [courseWorkType], [createdTime], [turnedInTimestamp], [returnedTimestamp], [draftMaxPoints], [draftGradeTimestamp], [draftGraderId], [assignedMaxPoints], [assignedGradeTimestamp], [assignedGraderId], late) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (('231301848111', '238611370303', 'Cg4I3LPjrqgEEL-i4vL4Bg', '115842805580895735978', datetime.datetime(2020, 12, 1, 16, 41, 25, 957000), datetime.datetime(2020, 12, 1, 16, 57, 22, 45000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 41, 25, 924000), datetime.datetime(2020, 12, 1, 16, 57, 22, 45000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4I4rfjrqgEEL-i4vL4Bg', '107177606335613004747', datetime.datetime(2020, 12, 1, 16, 41, 18, 113000), datetime.datetime(2020, 12, 1, 16, 55, 44, 996000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 41, 18, 98000), datetime.datetime(2020, 12, 1, 16, 55, 44, 996000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IneGKr6gEEL-i4vL4Bg', '102514046507660962400', datetime.datetime(2020, 12, 1, 16, 40, 47, 472000), datetime.datetime(2020, 12, 1, 16, 52, 35, 223000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 47, 458000), datetime.datetime(2020, 12, 1, 16, 52, 35, 223000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4I7umKr6gEEL-i4vL4Bg', '105987267483095051848', datetime.datetime(2020, 12, 1, 16, 40, 47, 544000), datetime.datetime(2020, 12, 1, 17, 21, 46, 949000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 47, 432000), datetime.datetime(2020, 12, 1, 17, 21, 46, 948000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IzIvEr6gEEL-i4vL4Bg', '108912011662447611108', datetime.datetime(2020, 12, 1, 16, 40, 22, 776000), datetime.datetime(2020, 12, 1, 21, 24, 5, 623000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 22, 713000), datetime.datetime(2020, 12, 1, 21, 24, 5, 623000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IwYj2r6gEEL-i4vL4Bg', '110557822443959198991', 
datetime.datetime(2020, 12, 1, 16, 40, 30, 982000), datetime.datetime(2020, 12, 1, 16, 58, 13, 592000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 30, 977000), datetime.datetime(2020, 12, 1, 16, 58, 13, 592000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IzNyas6gEEL-i4vL4Bg', '104140230101238714287', datetime.datetime(2020, 12, 4, 20, 44, 52, 870000), datetime.datetime(2020, 12, 4, 20, 44, 56, 67000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 4, 20, 44, 52, 856000), datetime.datetime(2020, 12, 4, 20, 44, 56, 66000), None, None, None, None, None, None, None, 1), ('231301848111', '238611370303', 'Cg4Imdavs6gEEL-i4vL4Bg', '107722742003146079646', datetime.datetime(2020, 12, 1, 16, 42, 56, 994000), datetime.datetime(2020, 12, 1, 17, 1, 19, 356000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 42, 56, 939000), datetime.datetime(2020, 12, 1, 17, 1, 19, 356000), None, None, None, None, None, None, None, 0) ... displaying 10 of 10000 total bound parameter sets ... ('203613708078', '231661121210', 'Cg4I26-ThtgDELrd0IDfBg', '108281326632597912647', datetime.datetime(2020, 11, 18, 13, 3, 37, 457000), datetime.datetime(2020, 11, 18, 20, 36, 35, 543000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 11, 18, 13, 3, 37, 394000), datetime.datetime(2020, 11, 18, 20, 36, 35, 543000), None, None, None, None, None, None, None, 0), ('203613708078', '231661121210', 'Cg4I_siYh9gDELrd0IDfBg', '104154892059107388602', datetime.datetime(2020, 11, 18, 13, 3, 39, 694000), datetime.datetime(2020, 11, 18, 14, 53, 46, 714000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 11, 18, 13, 3, 39, 635000), datetime.datetime(2020, 11, 18, 14, 53, 46, 713000), None, None, None, None, None, None, None, 0))]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
Traceback (most recent call last):
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1258, in _execute_context
cursor, statement, parameters, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 417, in do_executemany
cursor, statement, parameters, context=context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_executemany
cursor.executemany(statement, parameters)
pyodbc.DataError: ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 171, in <module>
main(Config)
File "main.py", line 162, in main
StudentSubmissions(classroom_service, sql, config).batch_pull_data(course_ids)
File "S:\Projects\google_classroom\timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "S:\Projects\google_classroom\api.py", line 259, in batch_pull_data
self._write_to_db(df)
File "S:\Projects\google_classroom\api.py", line 96, in _write_to_db
self.sql.insert_into(self.table_name, df, chunksize=10000)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlsorcery\__init__.py", line 210, in insert_into
dtype=dtype,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\core\generic.py", line 2663, in to_sql
method=method,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 521, in to_sql
method=method,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 1317, in to_sql
table.insert(chunksize, method=method)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 755, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 669, in _execute_insert
conn.execute(self.table.insert(), data)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1014, in execute
return meth(self, multiparams, params)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\sql\elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1133, in _execute_clauseelement
distilled_params,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1318, in _execute_context
e, statement, parameters, cursor, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1512, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
raise exception
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1258, in _execute_context
cursor, statement, parameters, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 417, in do_executemany
cursor, statement, parameters, context=context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_executemany
cursor.executemany(statement, parameters)
sqlalchemy.exc.DataError: (pyodbc.DataError) ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
[SQL: INSERT INTO dbo.[GoogleClassroom_StudentSubmissions] ([courseId], [courseWorkId], id, [userId], [creationTime], [updateTime], state, [draftGrade], [assignedGrade], [courseWorkType], [createdTime], [turnedInTimestamp], [returnedTimestamp], [draftMaxPoints], [draftGradeTimestamp], [draftGraderId], [assignedMaxPoints], [assignedGradeTimestamp], [assignedGraderId], late) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (('231301848111', '238611370303', 'Cg4I3LPjrqgEEL-i4vL4Bg', '115842805580895735978', datetime.datetime(2020, 12, 1, 16, 41, 25, 957000), datetime.datetime(2020, 12, 1, 16, 57, 22, 45000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 41, 25, 924000), datetime.datetime(2020, 12, 1, 16, 57, 22, 45000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4I4rfjrqgEEL-i4vL4Bg', '107177606335613004747', datetime.datetime(2020, 12, 1, 16, 41, 18, 113000), datetime.datetime(2020, 12, 1, 16, 55, 44, 996000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 41, 18, 98000), datetime.datetime(2020, 12, 1, 16, 55, 44, 996000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IneGKr6gEEL-i4vL4Bg', '102514046507660962400', datetime.datetime(2020, 12, 1, 16, 40, 47, 472000), datetime.datetime(2020, 12, 1, 16, 52, 35, 223000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 47, 458000), datetime.datetime(2020, 12, 1, 16, 52, 35, 223000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4I7umKr6gEEL-i4vL4Bg', '105987267483095051848', datetime.datetime(2020, 12, 1, 16, 40, 47, 544000), datetime.datetime(2020, 12, 1, 17, 21, 46, 949000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 47, 432000), datetime.datetime(2020, 12, 1, 17, 21, 46, 948000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IzIvEr6gEEL-i4vL4Bg', '108912011662447611108', datetime.datetime(2020, 12, 1, 16, 40, 22, 776000), datetime.datetime(2020, 12, 1, 21, 24, 5, 623000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 22, 713000), datetime.datetime(2020, 12, 1, 21, 24, 5, 623000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IwYj2r6gEEL-i4vL4Bg', '110557822443959198991', 
datetime.datetime(2020, 12, 1, 16, 40, 30, 982000), datetime.datetime(2020, 12, 1, 16, 58, 13, 592000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 40, 30, 977000), datetime.datetime(2020, 12, 1, 16, 58, 13, 592000), None, None, None, None, None, None, None, 0), ('231301848111', '238611370303', 'Cg4IzNyas6gEEL-i4vL4Bg', '104140230101238714287', datetime.datetime(2020, 12, 4, 20, 44, 52, 870000), datetime.datetime(2020, 12, 4, 20, 44, 56, 67000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 4, 20, 44, 52, 856000), datetime.datetime(2020, 12, 4, 20, 44, 56, 66000), None, None, None, None, None, None, None, 1), ('231301848111', '238611370303', 'Cg4Imdavs6gEEL-i4vL4Bg', '107722742003146079646', datetime.datetime(2020, 12, 1, 16, 42, 56, 994000), datetime.datetime(2020, 12, 1, 17, 1, 19, 356000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 12, 1, 16, 42, 56, 939000), datetime.datetime(2020, 12, 1, 17, 1, 19, 356000), None, None, None, None, None, None, None, 0) ... displaying 10 of 10000 total bound parameter sets ... ('203613708078', '231661121210', 'Cg4I26-ThtgDELrd0IDfBg', '108281326632597912647', datetime.datetime(2020, 11, 18, 13, 3, 37, 457000), datetime.datetime(2020, 11, 18, 20, 36, 35, 543000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 11, 18, 13, 3, 37, 394000), datetime.datetime(2020, 11, 18, 20, 36, 35, 543000), None, None, None, None, None, None, None, 0), ('203613708078', '231661121210', 'Cg4I_siYh9gDELrd0IDfBg', '104154892059107388602', datetime.datetime(2020, 11, 18, 13, 3, 39, 694000), datetime.datetime(2020, 11, 18, 14, 53, 46, 714000), 'TURNED_IN', None, None, 'ASSIGNMENT', datetime.datetime(2020, 11, 18, 13, 3, 39, 635000), datetime.datetime(2020, 11, 18, 14, 53, 46, 713000), None, None, None, None, None, None, None, 0))]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
With the new get_and_write_to_db method, when a request fails the entire loop starts over. Retries should happen per request, so that if a single request fails only that request is retried. For example, if the students query loops over 1,200 courses and a single course request fails midway, it currently starts back at course 1; it should just retry that one course request and continue on.
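The per-course retry shape could look something like this sketch. The project already uses tenacity (it shows up in the tracebacks below), which could decorate the inner fetch instead; `pull_with_retries` and `fetch` here are hypothetical names:

```python
import logging
import time

def pull_with_retries(course_ids, fetch, max_attempts=3, backoff=1.0):
    """Fetch each course independently; retry only the failing course.

    `fetch` is any callable taking a course_id and returning its rows.
    Courses that still fail after max_attempts are collected and skipped,
    so one bad course never restarts the whole loop.
    """
    results, failures = [], []
    for course_id in course_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                results.extend(fetch(course_id))
                break
            except Exception as exc:
                if attempt == max_attempts:
                    failures.append((course_id, exc))
                    logging.warning("Giving up on course %s: %s", course_id, exc)
                else:
                    time.sleep(backoff * attempt)  # simple linear backoff
    return results, failures
```

The key point is that the retry boundary is the single course request, not the outer loop over all 1,200 courses.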
We're getting the following error now when trying to download coursework:
2020-12-08 01:16:47PM Central Standard Time | ERROR: (pyodbc.DataError) ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
[SQL: INSERT INTO dbo.[GoogleClassroom_CourseWork] ([courseId], id, title, description, state, [alternateLink], [creationTime], [updateTime], [dueDate], [maxPoints], [workType], [assigneeMode], [submissionModificationMode], [creatorUserId], [topicId]) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (('222693886482', '223016268500', 'to do', None, 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkzODg2NDgy/a/MjIzMDE2MjY4NTAw/details', datetime.datetime(2020, 11, 11, 13, 29, 5, 499000), datetime.datetime(2020, 11, 11, 13, 31, 44, 627000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '115624344439250234383', None), ('222693886482', '222718588596', 'class class', None, 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkzODg2NDgy/a/MjIyNzE4NTg4NTk2/details', datetime.datetime(2020, 11, 10, 20, 16, 31, 164000), datetime.datetime(2020, 11, 10, 20, 16, 44, 358000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '115624344439250234383', None), ('222691921702', '232238999150', 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?', 'ANSWER THE QUESTION!', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkxOTIxNzAy/sa/MjMyMjM4OTk5MTUw/details', datetime.datetime(2020, 11, 19, 13, 44, 26, 403000), datetime.datetime(2020, 11, 19, 13, 48, 37, 97000), None, 70, 'SHORT_ANSWER_QUESTION', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '104144867849088240281', None), ('222691921702', '228735845599', 'SPELL IT RIGHT!', 'You have to spell all the questions right!', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkxOTIxNzAy/a/MjI4NzM1ODQ1NTk5/details', datetime.datetime(2020, 11, 11, 15, 24, 38, 974000), datetime.datetime(2020, 11, 11, 15, 39, 12, 928000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '104144867849088240281', None), ('222689364757', '229324082553', 'Commercial Script ', 'Directions inside', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjg5MzY0NzU3/a/MjI5MzI0MDgyNTUz/details', datetime.datetime(2020, 11, 12, 18, 53, 12, 849000), datetime.datetime(2020, 11, 12, 18, 55, 45, 685000), datetime.datetime(2020, 11, 14, 0, 0), 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '101375802740655167828', None), ('205884161781', '228719101204', 
'nez perce', "\nHey it's Thursday already! Bam! You are an overcomer, an overachiever, and blessed with all that you need! Just like you put your clothes on put l ... (121 characters truncated) ... put your mind and heart into. Con ganas todo se puede! We believe in you! It is better to give than to receive. Being thankful makes your day better!", 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/a/MjI4NzE5MTAxMjA0/details', datetime.datetime(2020, 11, 11, 14, 38, 51, 239000), datetime.datetime(2020, 11, 11, 14, 39, 51, 698000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None), ('205884161781', '228719101157', 'interviw your sibling', "\nHey it's Thursday already! Bam! You are an overcomer, an overachiever, and blessed with all that you need! Just like you put your clothes on put l ... (121 characters truncated) ... put your mind and heart into. Con ganas todo se puede! We believe in you! It is better to give than to receive. 
Being thankful makes your day better!", 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/a/MjI4NzE5MTAxMTU3/details', datetime.datetime(2020, 11, 11, 14, 37, 32, 584000), datetime.datetime(2020, 11, 11, 14, 37, 49, 542000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None), ('205884161781', '223029072321', "what did you learn about the labraant family today are they still famouse i don't know you need to answer that question", 'today if you watch the video about the labrant family well then you would have no problem with this question you needed two wtch both of the video if you took the test if you have not go take the test then aswer this question', 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/sa/MjIzMDI5MDcyMzIx/details', datetime.datetime(2020, 11, 11, 13, 36, 30, 870000), datetime.datetime(2020, 11, 11, 13, 39, 35, 641000), None, 100, 'SHORT_ANSWER_QUESTION', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None) ... displaying 10 of 2227 total bound parameter sets ... ('202185157887', '230526788854', 'basketball', 'look at it', 'PUBLISHED', 'https://classroom.google.com/c/MjAyMTg1MTU3ODg3/a/MjMwNTI2Nzg4ODU0/details', datetime.datetime(2020, 11, 16, 13, 31, 25, 455000), datetime.datetime(2020, 11, 16, 13, 31, 22, 341000), datetime.datetime(2020, 11, 19, 5, 59), 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '110665100727005067242', '230526788855'), ('202185157887', '229922436189', 'idk', None, 'PUBLISHED', 'https://classroom.google.com/c/MjAyMTg1MTU3ODg3/a/MjI5OTIyNDM2MTg5/details', datetime.datetime(2020, 11, 13, 20, 35, 49, 875000), datetime.datetime(2020, 11, 13, 20, 35, 51, 242000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '110665100727005067242', None))]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
Traceback (most recent call last):
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1258, in _execute_context
cursor, statement, parameters, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 417, in do_executemany
cursor, statement, parameters, context=context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_executemany
cursor.executemany(statement, parameters)
pyodbc.DataError: ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 171, in <module>
main(Config)
File "main.py", line 150, in main
CourseWork(classroom_service, sql, config).batch_pull_data(course_ids)
File "S:\Projects\google_classroom\timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "S:\Projects\google_classroom\api.py", line 259, in batch_pull_data
self._write_to_db(df)
File "S:\Projects\google_classroom\api.py", line 96, in _write_to_db
self.sql.insert_into(self.table_name, df, chunksize=10000)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlsorcery\__init__.py", line 210, in insert_into
dtype=dtype,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\core\generic.py", line 2663, in to_sql
method=method,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 521, in to_sql
method=method,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 1317, in to_sql
table.insert(chunksize, method=method)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 755, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\pandas\io\sql.py", line 669, in _execute_insert
conn.execute(self.table.insert(), data)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1014, in execute
return meth(self, multiparams, params)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\sql\elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1133, in _execute_clauseelement
distilled_params,
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1318, in _execute_context
e, statement, parameters, cursor, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1512, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
raise exception
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\base.py", line 1258, in _execute_context
cursor, statement, parameters, context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 417, in do_executemany
cursor, statement, parameters, context=context
File "C:\Users\WBartley.virtualenvs\google_classroom-J9HVh9GF\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_executemany
cursor.executemany(statement, parameters)
sqlalchemy.exc.DataError: (pyodbc.DataError) ('22018', '[22018] [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification (0) (SQLParamData)')
[SQL: INSERT INTO dbo.[GoogleClassroom_CourseWork] ([courseId], id, title, description, state, [alternateLink], [creationTime], [updateTime], [dueDate], [maxPoints], [workType], [assigneeMode], [submissionModificationMode], [creatorUserId], [topicId]) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (('222693886482', '223016268500', 'to do', None, 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkzODg2NDgy/a/MjIzMDE2MjY4NTAw/details', datetime.datetime(2020, 11, 11, 13, 29, 5, 499000), datetime.datetime(2020, 11, 11, 13, 31, 44, 627000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '115624344439250234383', None), ('222693886482', '222718588596', 'class class', None, 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkzODg2NDgy/a/MjIyNzE4NTg4NTk2/details', datetime.datetime(2020, 11, 10, 20, 16, 31, 164000), datetime.datetime(2020, 11, 10, 20, 16, 44, 358000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '115624344439250234383', None), ('222691921702', '232238999150', 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?', 'ANSWER THE QUESTION!', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkxOTIxNzAy/sa/MjMyMjM4OTk5MTUw/details', datetime.datetime(2020, 11, 19, 13, 44, 26, 403000), datetime.datetime(2020, 11, 19, 13, 48, 37, 97000), None, 70, 'SHORT_ANSWER_QUESTION', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '104144867849088240281', None), ('222691921702', '228735845599', 'SPELL IT RIGHT!', 'You have to spell all the questions right!', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjkxOTIxNzAy/a/MjI4NzM1ODQ1NTk5/details', datetime.datetime(2020, 11, 11, 15, 24, 38, 974000), datetime.datetime(2020, 11, 11, 15, 39, 12, 928000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '104144867849088240281', None), ('222689364757', '229324082553', 'Commercial Script ', 'Directions inside', 'PUBLISHED', 'https://classroom.google.com/c/MjIyNjg5MzY0NzU3/a/MjI5MzI0MDgyNTUz/details', datetime.datetime(2020, 11, 12, 18, 53, 12, 849000), datetime.datetime(2020, 11, 12, 18, 55, 45, 685000), datetime.datetime(2020, 11, 14, 0, 0), 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '101375802740655167828', None), ('205884161781', '228719101204', 
'nez perce', "\nHey it's Thursday already! Bam! You are an overcomer, an overachiever, and blessed with all that you need! Just like you put your clothes on put l ... (121 characters truncated) ... put your mind and heart into. Con ganas todo se puede! We believe in you! It is better to give than to receive. Being thankful makes your day better!", 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/a/MjI4NzE5MTAxMjA0/details', datetime.datetime(2020, 11, 11, 14, 38, 51, 239000), datetime.datetime(2020, 11, 11, 14, 39, 51, 698000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None), ('205884161781', '228719101157', 'interviw your sibling', "\nHey it's Thursday already! Bam! You are an overcomer, an overachiever, and blessed with all that you need! Just like you put your clothes on put l ... (121 characters truncated) ... put your mind and heart into. Con ganas todo se puede! We believe in you! It is better to give than to receive. 
Being thankful makes your day better!", 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/a/MjI4NzE5MTAxMTU3/details', datetime.datetime(2020, 11, 11, 14, 37, 32, 584000), datetime.datetime(2020, 11, 11, 14, 37, 49, 542000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None), ('205884161781', '223029072321', "what did you learn about the labraant family today are they still famouse i don't know you need to answer that question", 'today if you watch the video about the labrant family well then you would have no problem with this question you needed two wtch both of the video if you took the test if you have not go take the test then aswer this question', 'PUBLISHED', 'https://classroom.google.com/c/MjA1ODg0MTYxNzgx/sa/MjIzMDI5MDcyMzIx/details', datetime.datetime(2020, 11, 11, 13, 36, 30, 870000), datetime.datetime(2020, 11, 11, 13, 39, 35, 641000), None, 100, 'SHORT_ANSWER_QUESTION', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '105301322954643330860', None) ... displaying 10 of 2227 total bound parameter sets ... ('202185157887', '230526788854', 'basketball', 'look at it', 'PUBLISHED', 'https://classroom.google.com/c/MjAyMTg1MTU3ODg3/a/MjMwNTI2Nzg4ODU0/details', datetime.datetime(2020, 11, 16, 13, 31, 25, 455000), datetime.datetime(2020, 11, 16, 13, 31, 22, 341000), datetime.datetime(2020, 11, 19, 5, 59), 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '110665100727005067242', '230526788855'), ('202185157887', '229922436189', 'idk', None, 'PUBLISHED', 'https://classroom.google.com/c/MjAyMTg1MTU3ODg3/a/MjI5OTIyNDM2MTg5/details', datetime.datetime(2020, 11, 13, 20, 35, 49, 875000), datetime.datetime(2020, 11, 13, 20, 35, 51, 242000), None, 100, 'ASSIGNMENT', 'ALL_STUDENTS', 'MODIFIABLE_UNTIL_TURNED_IN', '110665100727005067242', None))]
(Background on this error at: http://sqlalche.me/e/13/9h9h)
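For what it's worth, this '22018' cast error from pyodbc is commonly triggered when a chunk mixes types within one bound-parameter column (for example NaN floats alongside strings, or string timestamps alongside datetimes), so one avenue to investigate is coercing the DataFrame before the insert. A sketch, with `coerce_for_mssql` being a hypothetical helper rather than part of this codebase:

```python
import pandas as pd

def coerce_for_mssql(df, datetime_cols, numeric_cols):
    """Normalize dtypes before a pyodbc executemany insert.

    Coerce each column to a single type, with None (not NaN/NaT,
    which pyodbc does not translate cleanly) for missing values.
    """
    df = df.copy()
    for col in datetime_cols:
        df[col] = pd.to_datetime(df[col], errors="coerce")
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Convert to object dtype so missing values can be plain None
    return df.astype(object).where(df.notna(), None)
```

Running something like this on dueDate/maxPoints (and the submission timestamp/grade columns) before insert_into would at least rule out mixed-type chunks as the cause.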
Courses in Google Classroom now have Google Calendars associated with them. It would be nice to have that ID available so that we can construct links to the calendar from the database. The Course resource in the Classroom API has a calendarId field, so it seems like a simple matter of adding that to the models in this code.
The current library assumes in a number of places that MSSQL is used as the library. One of the broader goals is to make this library database-agnostic, so that it can be used with everything from MSSQL to SQLite to cloud-based DBs like BigQuery or Redshift.
Right now to configure which data gets pulled requires modifying the dockerfile to exec different commands. This should be moved to booleans in an env file so that code does not need to be modified to configure the call, and so configurations can be saved locally.
The courseWork endpoint includes a nested dataset that is populated when an assignment is assigned to a subset of students. Pulling this in is necessary to determine which assignments were not assigned to all students, so that missing-assignment analysis can be more exact. However, adding it directly to the CourseWork table would blow up the row count, and the table would no longer be at the same level of granularity.
I propose we add a new table that maps the courseWorkId to the studentId from this individualStudentsOptions list.
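A sketch of the flattening step that would feed such a mapping table (the `individualStudentsOptions.studentIds` field names come from the Classroom API's CourseWork resource; `coursework_student_map` is a hypothetical helper):

```python
def coursework_student_map(coursework_records):
    """Flatten each courseWork record's individualStudentsOptions into
    (courseWorkId, studentId) rows for a separate mapping table, keeping
    the CourseWork table itself at one row per assignment."""
    rows = []
    for record in coursework_records:
        options = record.get("individualStudentsOptions", {})
        for student_id in options.get("studentIds", []):
            rows.append({"courseWorkId": record["id"], "studentId": student_id})
    return rows
```

Assignments assigned to all students simply contribute no rows, so the mapping table stays small.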
Anyone know if it's possible to get only new/updated data from the API? I have a similar script running to pull data, but StudentSubmissions has become a bottleneck since it now holds a huge amount of data after roughly two months of remote learning. Curious whether it's possible to pull new and/or recently updated data incrementally.
Modify the student usage request to use the max AsOfDate in the database to query a range of dates between then and the max date available (two days prior to today). This should also delete the latest day in the database table to ensure completeness of data (some days can load partial data depending on availability).
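A sketch of the window computation described above (hypothetical `usage_pull_dates` helper; the deletion of the latest loaded day would happen in SQL before requesting these dates):

```python
from datetime import date, timedelta

def usage_pull_dates(max_asofdate_in_db, today=None):
    """Dates to (re)request from the student usage report.

    Starts at the max AsOfDate already in the database -- that day is
    deleted and re-pulled because it may have loaded only partial data --
    and ends two days before today, the latest date the report exposes.
    """
    today = today or date.today()
    end = today - timedelta(days=2)
    start = max_asofdate_in_db
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]
```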
It would be useful to see the name of the org unit along with its ID in the database table for auditing purposes.
@zkagin I am encountering an issue running the latest branch (traceback below)
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 394, in call
result = fn(*args, **kwargs)
File "/code/timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "/code/api.py", line 104, in get_and_write_to_db
results = self.request_data().execute()
File "/code/api.py", line 169, in request_data
if self.org_unit_id:
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/pandas/core/generic.py", line 1479, in __nonzero__
f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 159, in <module>
main(Config)
File "main.py", line 97, in main
sql, overwrite=False, debug=config.DEBUG
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 311, in wrapped_f
return self.call(f, *args, **kw)
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 391, in call
do = self.iter(retry_state=retry_state)
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 351, in iter
six.raise_from(retry_exc, fut.exception())
File "<string>", line 3, in raise_from
tenacity.RetryError: RetryError[<Future at 0x7f1a4e5b4910 state=finished raised ValueError>]
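For what it's worth, the ValueError suggests self.org_unit_id is a pandas Series rather than a scalar at that point, so the plain `if self.org_unit_id:` check blows up. A defensive truthiness check along these lines would avoid the crash (sketch only, not the repo's actual fix):

```python
def is_org_unit_set(org_unit_id):
    """Explicit truthiness check that is safe for scalars, None, and
    list-like objects such as a pandas Series (where a bare
    `if x:` raises "The truth value of a Series is ambiguous")."""
    if org_unit_id is None:
        return False
    if isinstance(org_unit_id, str):
        return bool(org_unit_id)
    try:
        return len(org_unit_id) > 0  # Series, list, tuple
    except TypeError:
        return bool(org_unit_id)     # plain scalar
```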
Classroom has an integration with Meet, which provides video chat. It would be great to pull in Meet usage data as well via the Reports API for the call_ended event.
From my perspective, the relevant columns are:
Some schools use the course alias to map their classes to their SIS. It would be useful to include this in the connector for full coverage: https://developers.google.com/classroom/reference/rest/v1/courses.aliases
Now that batching is in place, it would be good to add a way to call the end-to-end job for all endpoints.
Currently this would need to run as:
docker run --rm -t google_classroom --usage --courses --topics --coursework --students --teachers --guardians --invites --submissions
Or by setting all of these in the .env file.
It would be preferable either to add an optional --all flag to run them, or to configure it so that if no flag is passed (and no env vars are set) it defaults to running all endpoints.
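A sketch of the flag handling with argparse (the endpoint list mirrors the flags above; the parser wiring is an assumption about how main.py is structured):

```python
import argparse

ENDPOINTS = ["usage", "courses", "topics", "coursework", "students",
             "teachers", "guardians", "invites", "submissions"]

def parse_endpoint_flags(argv):
    """Return the set of endpoints to run. With no flags at all,
    default to running everything; --all forces the same."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--all", action="store_true")
    for name in ENDPOINTS:
        parser.add_argument(f"--{name}", action="store_true")
    args = vars(parser.parse_args(argv))
    if args.pop("all") or not any(args.values()):
        return set(ENDPOINTS)
    return {name for name, on in args.items() if on}
```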
Currently the dueDate timestamp is parsed into the following fields:
It would be preferable for this to be a single datetime field.
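A sketch of how the split fields could be collapsed into one value (the year/month/day and hours/minutes field names follow the courseWork resource; the helper itself is hypothetical):

```python
from datetime import datetime

def combine_due_datetime(due_date, due_time):
    """Collapse the API's split dueDate/dueTime dicts into a single
    UTC datetime; returns None when no due date is set."""
    if not due_date:
        return None
    due_time = due_time or {}
    return datetime(
        due_date["year"], due_date["month"], due_date["day"],
        due_time.get("hours", 0), due_time.get("minutes", 0),
    )
```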
Now that we've incorporated the Meets endpoint, it would be good to add test coverage.
@zkagin Here's a traceback for that memory error I mentioned:
2020-09-26 12:09:40AM UTC | INFO: StudentSubmissions: Generating requests...
2020-09-26 12:09:46AM UTC | INFO: StudentSubmissions: 1923 requests remaining.
2020-09-26 12:09:57AM UTC | INFO: StudentSubmissions: 1630 requests remaining.
2020-09-26 12:10:25AM UTC | INFO: StudentSubmissions: 1382 requests remaining.
2020-09-26 12:10:36AM UTC | INFO: StudentSubmissions: Quota exceeded. Pausing for 20 seconds...
2020-09-26 12:10:56AM UTC | INFO: StudentSubmissions: 1287 requests remaining.
2020-09-26 12:11:30AM UTC | INFO: StudentSubmissions: 1101 requests remaining.
2020-09-26 12:12:01AM UTC | INFO: StudentSubmissions: 874 requests remaining.
2020-09-26 12:12:40AM UTC | INFO: StudentSubmissions: 671 requests remaining.
2020-09-26 12:14:42AM UTC | ERROR: RetryError[<Future at 0x7ff71ff4d250 state=finished raised MemoryError>]
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 412, in call
result = fn(*args, **kwargs)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/http.py", line 1528, in execute
self._execute(http, self._order, self._requests)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/http.py", line 1473, in _execute
parser.feed(for_parser)
File "/usr/local/lib/python3.7/email/feedparser.py", line 175, in feed
self._input.push(data)
File "/usr/local/lib/python3.7/email/feedparser.py", line 110, in push
parts = self._partial.readlines()
MemoryError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 167, in <module>
main(Config)
File "main.py", line 158, in main
StudentSubmissions(classroom_service, sql, config).batch_pull_data(course_ids)
File "/google_classroom/timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "/google_classroom/api.py", line 259, in batch_pull_data
self._execute_batch_with_retry(batch)
File "/google_classroom/api.py", line 161, in _execute_batch_with_retry
retryer(batch.execute)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 409, in call
do = self.iter(retry_state=retry_state)
File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 369, in iter
six.raise_from(retry_exc, fut.exception())
File "<string>", line 3, in raise_from
tenacity.RetryError: RetryError[<Future at 0x7ff71ff4d250 state=finished raised MemoryError>]
If a student is removed from a course, all of the submission data for that course is lost. We should create a snapshot of the prior day's data before truncating the table.
Possible logic:
This logic would also help maintain year-over-year data. It's potentially something we should consider for other data sources as well: teachers, students, courses, coursework, etc.
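One possible shape for the snapshot step, sketched against sqlite for illustration (the real pipeline writes through sqlsorcery, and the table names here are made up):

```python
import sqlite3

def snapshot_then_truncate(conn, table, archive_table):
    """Before truncating the live table for a fresh pull, append its
    current rows to an archive table so history (e.g. submissions for
    students later removed from a course) survives."""
    # Create the archive with the same columns if it doesn't exist yet.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {archive_table} AS SELECT * FROM {table} WHERE 0"
    )
    conn.execute(f"INSERT INTO {archive_table} SELECT * FROM {table}")
    conn.execute(f"DELETE FROM {table}")  # sqlite has no TRUNCATE
    conn.commit()
```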
@zkagin What strategy would you recommend for approving multiple PRs? Do you generally prefer to approve the dependent branches and merge them into the parent branch before merging all of them into master?
Otherwise, it seems it's necessary to keep rebasing after each merge.
As we near stability and other users want to leverage this connector, it would be useful to include a basic level of documentation. We should consider adding Sphinx autodoc to auto-generate documentation from docstrings.
Right now all calls to the Google Classroom API are done sequentially and individually. As we identify bottlenecks, the goal is to optimize these calls and minimize the time to run this from hours to minutes. This can potentially be done through both batch calling the endpoints and through parallelization of calls.
Further details are pending a performance test.
The course.aliases.list API responds with an array of CourseAlias objects. CourseAlias does not contain a courseId property though. Unfortunately, to be useful from a data reporting perspective, we need the courseId in the CourseAliases table.
I've been able to work around this issue for now by specifically handling aliases in the internal callback method of api.py -> EndPoint.batch_pull_data. If you insert the following lines starting at line 220 of api.py (right after the nextPageToken conditional block) it will add the courseId property to each CourseAlias object in the response["aliases"] array.
"""Fix up alias records to include course Id"""
if "aliases" in response:
for alias in response["aliases"]:
if "courseId" not in alias:
alias["courseId"] = course_id
I don't think this should be the permanent fix, as it breaks inheritance (the parent shouldn't know about things that don't apply to all children). The only other approach I could see for now was to reimplement the batch_pull_data callback as a class method on EndPoint that CourseAliases could override with the updated callback code, but that looked like a lot more work and would require refactoring external references like batch_data. I'm sure there's a better way to do this, given I'm not much of a Python coder yet.
Our morning sync errored today with the following:
2020/09/04 11:00:36 - Google Classroom - Running on platform : Windows Server 2012 R2
2020/09/04 11:00:36 - Google Classroom - Executing command : cmd.exe /C "C:\Users\SZ\AppData\Local\Temp\kettle_637ecd0b-eebf-11ea-bf57-dfabbe81a29eshell.bat"
2020/09/04 11:00:36 - Google Classroom - (stdout)
2020/09/04 11:00:36 - Google Classroom - (stdout) c:\users\sz\google_classroom\google_classroom>pipenv run python c:\users\sz\google_classroom\google_classroom\main.py
2020/09/04 11:00:37 - Google Classroom - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : (stderr) Loading .env environment variables…
2020/09/04 11:00:46 - Google Classroom - (stdout) 2020-09-04 11:00:46AM Eastern Daylight Time | INFO: OrgUnits: Generating requests...
2020/09/04 11:00:46 - Google Classroom - (stdout) 2020-09-04 11:00:46AM Eastern Daylight Time | INFO: OrgUnits: 1 requests remaining. On page 0.
2020/09/04 11:00:48 - Google Classroom - (stdout) 2020-09-04 11:00:48AM Eastern Daylight Time | INFO: EndPoint.batch_pull_data completed in 5.26 seconds.
2020/09/04 11:01:05 - Google Classroom - (stdout) 2020-09-04 11:01:05AM Eastern Daylight Time | INFO: StudentUsage: Generating requests...
2020/09/04 11:01:05 - Google Classroom - (stdout) 2020-09-04 11:01:05AM Eastern Daylight Time | INFO: StudentUsage: 6 requests remaining.
2020/09/04 11:01:11 - Google Classroom - (stdout) 2020-09-04 11:01:11AM Eastern Daylight Time | ERROR: (pyodbc.ProgrammingError) ('42S22', "[42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid column name 'ImportDate'. (207) (SQLExecute); [42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
2020/09/04 11:01:11 - Google Classroom - (stdout) [SQL: INSERT INTO custom.[GoogleClassroom_StudentUsage] ([Email], [AsOfDate], [LastUsedTime], [ImportDate]) VALUES (?, ?, ?, ?)]
[...]
2020/09/04 11:01:11 - Google Classroom - (stdout) (Background on this error at: http://sqlalche.me/e/f405)
2020/09/04 11:01:11 - Google Classroom - (stdout) Traceback (most recent call last):
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1228, in _execute_context
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor, statement, parameters, context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 413, in do_executemany
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor, statement, parameters, context=context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\default.py", line 587, in do_executemany
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor.executemany(statement, parameters)
2020/09/04 11:01:11 - Google Classroom - (stdout) pyodbc.ProgrammingError: ('42S22', "[42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid column name 'ImportDate'. (207) (SQLExecute); [42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
2020/09/04 11:01:11 - Google Classroom - (stdout)
2020/09/04 11:01:11 - Google Classroom - (stdout) The above exception was the direct cause of the following exception:
2020/09/04 11:01:11 - Google Classroom - (stdout)
2020/09/04 11:01:11 - Google Classroom - (stdout) Traceback (most recent call last):
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\main.py", line 174, in <module>
2020/09/04 11:01:11 - Google Classroom - (stdout) main(Config)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\main.py", line 69, in main
2020/09/04 11:01:11 - Google Classroom - (stdout) pull_data(config, creds, sql)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\main.py", line 96, in pull_data
2020/09/04 11:01:11 - Google Classroom - (stdout) usage.batch_pull_data(dates=date_range_string, overwrite=False)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\timer.py", line 22, in wrapper
2020/09/04 11:01:11 - Google Classroom - (stdout) results = func(*args, **kwargs)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\endpoints\base.py", line 265, in batch_pull_data
2020/09/04 11:01:11 - Google Classroom - (stdout) self._write_to_db(df)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "c:\users\sz\google_classroom\google_classroom\endpoints\base.py", line 97, in _write_to_db
2020/09/04 11:01:11 - Google Classroom - (stdout) self.sql.insert_into(self.table_name, df, chunksize=10000)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlsorcery\__init__.py", line 210, in insert_into
2020/09/04 11:01:11 - Google Classroom - (stdout) dtype=dtype,
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\core\generic.py", line 2663, in to_sql
2020/09/04 11:01:11 - Google Classroom - (stdout) method=method,
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\io\sql.py", line 521, in to_sql
2020/09/04 11:01:11 - Google Classroom - (stdout) method=method,
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\io\sql.py", line 1317, in to_sql
2020/09/04 11:01:11 - Google Classroom - (stdout) table.insert(chunksize, method=method)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\io\sql.py", line 755, in insert
2020/09/04 11:01:11 - Google Classroom - (stdout) exec_insert(conn, keys, chunk_iter)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\pandas\io\sql.py", line 669, in _execute_insert
2020/09/04 11:01:11 - Google Classroom - (stdout) conn.execute(self.table.insert(), data)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 984, in execute
2020/09/04 11:01:11 - Google Classroom - (stdout) return meth(self, multiparams, params)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\sql\elements.py", line 293, in _execute_on_connection
2020/09/04 11:01:11 - Google Classroom - (stdout) return connection._execute_clauseelement(self, multiparams, params)
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1103, in _execute_clauseelement
2020/09/04 11:01:11 - Google Classroom - (stdout) distilled_params,
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1288, in _execute_context
2020/09/04 11:01:11 - Google Classroom - (stdout) e, statement, parameters, cursor, context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1482, in _handle_dbapi_exception
2020/09/04 11:01:11 - Google Classroom - (stdout) sqlalchemy_exception, with_traceback=exc_info[2], from_=e
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
2020/09/04 11:01:11 - Google Classroom - (stdout) raise exception
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\base.py", line 1228, in _execute_context
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor, statement, parameters, context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\dialects\mssql\pyodbc.py", line 413, in do_executemany
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor, statement, parameters, context=context
2020/09/04 11:01:11 - Google Classroom - (stdout) File "C:\Users\SZ\.virtualenvs\google_classroom-Q34DGSlj\lib\site-packages\sqlalchemy\engine\default.py", line 587, in do_executemany
2020/09/04 11:01:11 - Google Classroom - (stdout) cursor.executemany(statement, parameters)
2020/09/04 11:01:11 - Google Classroom - (stdout) sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42S22', "[42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid column name 'ImportDate'. (207) (SQLExecute); [42S22] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)")
2020/09/04 11:01:11 - Google Classroom - (stdout) [SQL: INSERT INTO custom.[GoogleClassroom_StudentUsage] ([Email], [AsOfDate], [LastUsedTime], [ImportDate]) VALUES (?, ?, ?, ?)]
[...]
2020/09/04 11:01:11 - Google Classroom - (stdout) (Background on this error at: http://sqlalche.me/e/f405)
A couple days ago I pulled down the last few months of commits (in an attempt to see if that would address the deadlocking crash that we started to see again), so I'm guessing that's the proximal cause. I can easily create what appears to be the missing column in our DB, just passing along as an FYI that there may have been a breaking change in the DB structure at some point.
When running the StudentSubmissions endpoint query, the following error occurs (and is only visible after retries are exhausted or disabled):
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('40001', '[40001] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 85) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLExecDirectW)')
[SQL: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]]
[parameters: ('custom', 'BASE TABLE')]
(Background on this error at: http://sqlalche.me/e/dbapi)
Right now the database writes are happening as part of the batch request callback. This may be creating performance bottlenecks. One option would be to move this to the batch level and cache the data for writing until the entire batch request is finished.
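A sketch of what batch-level caching could look like (the response shape, the "items" key, and the writer hook are all assumptions for illustration):

```python
class BatchWriteCache:
    """Move DB writes out of the per-request callback: responses are
    accumulated in memory and flushed once per batch, so each batch
    issues one insert instead of one insert per callback."""

    def __init__(self, writer):
        self.writer = writer  # e.g. lambda rows: sql.insert_into(table, rows)
        self.rows = []

    def callback(self, request_id, response, exception):
        # Collect successful responses; error handling elided.
        if exception is None and response:
            self.rows.extend(response.get("items", []))

    def flush(self):
        if self.rows:
            self.writer(self.rows)
            self.rows = []
```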
There is a late boolean field in the API that would be good to add, for simpler flagging of late assignments in analysis. I'm not sure why we didn't include it initially, but it seems like a simple addition.
Task: Add ability to add, delete, and update student items based on the source of truth provided.
@dchess - Was thinking more about the error we looked at today, and (admittedly not having fully immersed myself in the code), I don't think it was due to not having quotes around the date in the .env file.
I think it will always fail if there is no data in the student usage table, and that it only worked today because my workaround fed data into the student usage table prior to uncommenting the original code.
usage = StudentUsage(admin_reports_service, sql, config, org_unit_id)
last_date = usage.get_last_date()
start_date = last_date + timedelta(days=1) or datetime.strptime(config.SCHOOL_YEAR_START, "%Y-%m-%d")
My quick read of StudentUsage.get_last_date() is that it will return None if there's no data previously loaded in the table, at which point the next line will throw an error because you can't add a timedelta to None.
I had enough energy to type up this issue, but not enough to clear my database and try from scratch again to double check, so hopefully this is useful, and apologies if it's a false alarm.
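A None-safe version of that start-date logic might look like this (sketch only; argument names are mine). The original line raises TypeError on the `+` before the `or` fallback is ever evaluated:

```python
from datetime import datetime, timedelta

def compute_start_date(last_date, school_year_start):
    """Fall back to the configured school-year start when the usage
    table is empty and get_last_date() returns None."""
    if last_date is None:
        return datetime.strptime(school_year_start, "%Y-%m-%d")
    return last_date + timedelta(days=1)
```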
Task: create functions for comparing the most recently pulled list of items with a provided "master list" of items. This should result in a list of changes that need to be made in the form of deletes, adds, and updates.
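A possible shape for the diff step (items as dicts keyed on a unique field; all names are illustrative, not the repo's API):

```python
def diff_items(pulled, master, key="id"):
    """Compare the most recently pulled items against a provided
    master list, returning the adds, deletes, and updates needed to
    reconcile the pulled data with the master list."""
    pulled_by_key = {item[key]: item for item in pulled}
    master_by_key = {item[key]: item for item in master}
    adds = [m for k, m in master_by_key.items() if k not in pulled_by_key]
    deletes = [p for k, p in pulled_by_key.items() if k not in master_by_key]
    updates = [
        m for k, m in master_by_key.items()
        if k in pulled_by_key and pulled_by_key[k] != m
    ]
    return adds, deletes, updates
```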
One previously ignored endpoint that could be valuable is the Invitations resource. This could be useful for identifying students who have been sent an invite, but haven't shown up in the Students by course dataset.
Task: Take the existing batching logic and adapt it for the new write endpoints. This may also include handling additional error types.
@zkagin When a course is archived in Google Classroom it doesn't get pulled by the API. The connector currently only pulls active courses.
Task: Add a new endpoint for adding or updating new courses, and test it in both sandbox and via new written tests.
This library currently assumes a lot about the structure of data coming from the Google Classroom API and what files/tables exist in the integrated database. The goal here would be to make sure to check and no-op rather than crash in these cases, and potentially to warn the user with a clear error message if something is wrong and needs to be fixed to continue.
Right now, when the library is called multiple times, it can duplicate some data. The goal is eventually to allow the user to run the script as many times as needed without worry that they may accidentally delete old data or double-count new data.
One option to solve this may involve transitioning away from the files caches in /data which currently are causing some of these issues and instead using the database as the source of truth, potentially instead using file logs only for auditing purposes.
Rather than querying the StudentSubmissions endpoint by looping over Courses, it could be preferable to loop over individual assignments (CourseWork). That way the query could be date-bound by pre-filtering the assignments on creationTime. While this might result in more calls, it may reduce pagination because there would be less data volume overall. Querying by Course pulls in every assignment ever created in that class, and many may no longer be relevant to monitor after a certain point. If history needs to be maintained, one approach would be to handle it like the StudentUsage data: truncate a subset and append.
https://developers.google.com/admin-sdk/reports/v1/appendix/usage/customer/classroom
Will this report give us better usage data than the userUsage endpoint we are using now?
StudentSubmissions(classroom_service).get_and_write_to_db(
    sql, course_ids, debug=config.DEBUG
)
When this call encounters certain courses, it loads far more records than exist in the course. For instance, a course with 22 assignments and 29 students returns hundreds of thousands of records when it should only pull back roughly 600.
The google_classroom library doesn't currently have any tests. The goal of this issue is to add testing coverage for the API processing and saving into a DB. For each call to the API, this would include:
Task: Add ability to add, delete, and update teacher items based on the source of truth provided.
Task: Add ability to add, delete, and update topics items based on the source of truth provided.
Pulling Google Meet data currently takes about 15 minutes for us, a relatively small network of schools. For a mid- to large-sized network, this could take an extremely long time.
It's probably a good idea eventually to change the Meet logic to query only for new data since the last update, and append that to the table, rather than dropping the table and re-adding everything.
Traceback (most recent call last):
File "main.py", line 131, in
main(Config)
File "main.py", line 126, in main
StudentSubmissions(classroom_service, sql, config).batch_pull_data(course_ids)
File "/code/timer.py", line 22, in wrapper
results = func(*args, **kwargs)
File "/code/api.py", line 182, in batch_pull_data
batch.add(request, request_id=request_id)
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/googleapiclient/http.py", line 1398, in add
raise KeyError("A request with this ID already exists: %s" % request_id)
KeyError: 'A request with this ID already exists: 41010390897;None;2'
When importing the CourseWork endpoint, the following error is occurring:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Error converting data type nvarchar to float. (8114) (SQLExecDirectW)')
This is either coming from the topicId or one of the dueDate/dueTime fields.
Some courses use announcements to share resources and it may be a useful datapoint for analysis: https://developers.google.com/classroom/reference/rest/v1/courses.announcements#Announcement
When running the usage report for multiple dates, if more than just the most recent date has partial data, reruns only refresh the latest date and do not account for partial data on past dates.
One option would be to flag any dates that have partial data warnings and when rerunning the job truncate those dates.
Even though the tests are failing with exit code 1, the GitHub Action is still allowing the check to pass.
If a date in Google Classroom is outside the bounds pandas can parse, add error handling that logs the record id (and course id if relevant) and inserts the record with a null date. For example, if a coursework timestamp looks like "9999-10-10 04:59:00", it would be stored as a NULL date instead, with the coursework id and corresponding course id logged.
This issue was identified by KIPP Texas.
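A sketch of the error handling using pandas' coercing parser (the column names and logging hook are illustrative; year-9999 timestamps exceed the nanosecond Timestamp bounds, so they coerce to NaT):

```python
import logging
import pandas as pd

def parse_coursework_dates(df, column="creationTime"):
    """Coerce timestamps pandas cannot represent to NULL instead of
    crashing, logging the affected record and course ids."""
    parsed = pd.to_datetime(df[column], errors="coerce")
    bad = df[parsed.isna() & df[column].notna()]
    for _, row in bad.iterrows():
        logging.warning(
            "Unparseable %s for record %s (course %s); stored as NULL",
            column, row.get("id"), row.get("courseId"),
        )
    df[column] = parsed
    return df
```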