University of Chicago
MACS 37000
Write-up draft (Medium): HERE
Write-up draft (Google Doc): HERE
Presentation link: HERE
Meeting notes: HERE
NEXT MEETING: Thurs 05/16/24, 11-12:20
TO DO BEFORE NEXT MEETING
- Make multimodal embeddings (everyone: pick what you're interested in or can do)
  - Text embed
  - Text to Graph embed
  - Image & Text embed
  - Image & Text to Graph embed (extract features from images?)
- Knowledge graph (Tian)
  - URL: https://workspace-preview.neo4j.io/connection/connect
  - Username and password are in the shared drive
- Play with Neo4j to write queries & test embeddings (everyone)
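The Neo4j work could start from a small batch loader like the one below. It is a sketch under assumptions: the graph schema (`User`, `Product`, `REVIEWED`), the field names `user_id`, `parent_asin`, `title` and `rating`, and all helper names are hypothetical choices, not a settled design. It uses the official `neo4j` Python driver (5.x), imported lazily so the pure-Python helpers run without it. Note that the workspace URL above is the browser console; the driver needs the instance's connection URI (for AuraDB typically `neo4j+s://<instance-id>.databases.neo4j.io`), and the credentials should be read from the shared drive rather than hardcoded.

```python
import math
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical schema (an assumption for this sketch):
#   (:User {user_id})-[:REVIEWED {rating}]->(:Product {parent_asin, title})
LOAD_REVIEWS_CYPHER = """
UNWIND $rows AS row
MERGE (u:User {user_id: row.user_id})
MERGE (p:Product {parent_asin: row.parent_asin})
  ON CREATE SET p.title = row.title
MERGE (u)-[r:REVIEWED]->(p)
SET r.rating = row.rating
"""


def _missing(value: Any) -> bool:
    """Treat None and float NaN (as produced by pandas) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def to_rows(records: Iterable[Dict[str, Any]], title_field: str = "title") -> List[Dict[str, Any]]:
    """Keep only the fields the Cypher reads; skip records missing a node key,
    because MERGE cannot match on a null property."""
    rows = []
    for rec in records:
        if _missing(rec.get("user_id")) or _missing(rec.get("parent_asin")):
            continue
        rating = rec.get("rating")
        title = rec.get(title_field)
        rows.append({
            "user_id": str(rec["user_id"]),
            "parent_asin": str(rec["parent_asin"]),
            "title": None if _missing(title) else title,
            "rating": None if _missing(rating) else float(rating),
        })
    return rows


def batched(rows: List[Any], size: int = 1000) -> Iterator[List[Any]]:
    """Yield consecutive slices so each transaction stays small."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def load_into_neo4j(uri: str, user: str, password: str,
                    rows: List[Dict[str, Any]], batch_size: int = 1000) -> None:
    """Send the rows to Neo4j in batches (requires `pip install neo4j`)."""
    from neo4j import GraphDatabase  # imported here so the helpers work without it

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        for batch in batched(rows, batch_size):
            driver.execute_query(LOAD_REVIEWS_CYPHER, parameters_={"rows": batch})
```

For a pandas DataFrame, `to_rows(df.to_dict("records"))` produces the input for `load_into_neo4j`. Using `UNWIND` over a parameter list sends one query per batch instead of one per review, and `MERGE` keeps the load idempotent if it is re-run.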
NEW DATA
- A large-scale Amazon Reviews dataset collected in 2023 by the McAuley Lab at UCSD, containing product and review information from 1996 to 2023.
- We will focus on the "Amazon_Fashion" product category.
- The Review and Metadata files can be merged on their common key, "parent_asin".
- A sample merged tabular file with 1000 items, amazon_subset_0512.csv, is in the "data" folder.
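A minimal pandas sketch of the review/metadata merge, assuming both files are JSON Lines (one record per line) and that the metadata should have exactly one row per `parent_asin`. The file paths, helper names, and the `_review`/`_product` suffixes are illustrative choices, not part of the dataset; the demo uses tiny made-up frames so it runs without the real files.

```python
import pandas as pd


def load_jsonl(path: str) -> pd.DataFrame:
    """Read a JSON Lines file (one JSON record per line) into a DataFrame."""
    return pd.read_json(path, lines=True)


def merge_reviews_with_meta(reviews: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Attach product metadata to every review through the shared "parent_asin" key.

    how="left" keeps reviews whose product has no metadata row (metadata columns
    become NaN). validate="many_to_one" makes pandas raise MergeError if a
    parent_asin appears more than once in the metadata. Columns present in both
    frames (e.g. "title") get the suffixes _review and _product.
    """
    return reviews.merge(
        meta,
        on="parent_asin",
        how="left",
        suffixes=("_review", "_product"),
        validate="many_to_one",
    )


if __name__ == "__main__":
    # With the real files this might be (hypothetical paths):
    #   reviews = load_jsonl("Amazon_Fashion.jsonl")
    #   meta = load_jsonl("meta_Amazon_Fashion.jsonl")
    # Tiny in-memory stand-ins so the sketch runs anywhere:
    reviews = pd.DataFrame({
        "parent_asin": ["B001", "B001", "B002"],
        "rating": [5, 3, 4],
        "title": ["Love it", "Runs small", "Nice fabric"],
    })
    meta = pd.DataFrame({
        "parent_asin": ["B001"],
        "title": ["Linen shirt"],
        "average_rating": [4.1],
    })
    merged = merge_reviews_with_meta(reviews, meta)
    print(merged[["parent_asin", "rating", "title_review", "title_product"]])
```

Validating instead of silently de-duplicating is a deliberate choice: if the metadata ever has repeated `parent_asin` values, the merge fails loudly rather than multiplying review rows.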
Research framing:
![Screenshot 2024-05-12 at 6 19 33 PM](https://private-user-images.githubusercontent.com/143452850/329886613-01efb021-a697-4d27-83c3-1239e268acbf.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMyNDY4NjYsIm5iZiI6MTcyMzI0NjU2NiwicGF0aCI6Ii8xNDM0NTI4NTAvMzI5ODg2NjEzLTAxZWZiMDIxLWE2OTctNGQyNy04M2MzLTEyMzllMjY4YWNiZi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwODA5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDgwOVQyMzM2MDZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hZjc2MGU4ODM1MmUxYWRjMmM1M2YzY2IyM2IxODQxMzg0NTI3N2YwYjNkZDc4NzIwMTRmY2YwYzY3MjY4ZjU5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.NpJjTA6fDE9dKMwZaNtFc8ir7eIxzUGDyCxFoE51bzY)
PROJECT
- 3 forms of data in a SINGLE model
- 3 types of models (from 3 separate weeks)
- Validate inferences, predictions, results
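One simple way to get text, images, and graph structure into a single model is early fusion: embed each modality separately, L2-normalize each embedding, and concatenate them into one feature vector per product. The sketch assumes the per-modality embeddings already exist as NumPy arrays whose rows are aligned by product; the random arrays and dimensions in the demo are stand-ins, not outputs of any particular encoder, and the equal weights are an arbitrary starting point. Concatenation is only one option; attention-based fusion or a shared embedding space are common alternatives.

```python
from typing import Sequence

import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so no modality dominates by magnitude."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def fuse(text_emb: np.ndarray, image_emb: np.ndarray, graph_emb: np.ndarray,
         weights: Sequence[float] = (1.0, 1.0, 1.0)) -> np.ndarray:
    """Early fusion: normalize each modality, weight it, and concatenate.

    Inputs are (n_items, d_modality) arrays whose rows refer to the same items
    in the same order. Output shape: (n_items, d_text + d_image + d_graph).
    """
    if len(weights) != 3:
        raise ValueError("expected one weight per modality")
    parts = [w * l2_normalize(np.asarray(e, dtype=float))
             for w, e in zip(weights, (text_emb, image_emb, graph_emb))]
    if len({p.shape[0] for p in parts}) != 1:
        raise ValueError("all modalities must cover the same items")
    return np.concatenate(parts, axis=1)


def cosine_top_k(query: np.ndarray, matrix: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k rows of `matrix` most cosine-similar to `query`."""
    sims = l2_normalize(matrix) @ l2_normalize(query.reshape(-1))
    return np.argsort(-sims)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(100, 384))   # stand-in for text-encoder output
    image = rng.normal(size=(100, 512))  # stand-in for image-encoder output
    graph = rng.normal(size=(100, 64))   # stand-in for knowledge-graph node embeddings
    z = fuse(text, image, graph)
    print(z.shape, cosine_top_k(z[0], z, k=5))  # the top hit is item 0 itself
```

Normalizing before concatenating keeps a 512-dimensional image vector from outweighing a 64-dimensional graph vector purely because of scale; the `weights` argument is where that balance could later be tuned.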
PRESENTATION
- 8 minutes / 32 slides for a 4-person team
- Slides due Thursday 5/23, 4:00pm
- Ignite-style presentation: Thursday 5/23, 4:30pm (should we aim to start the presentation over the weekend?)
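Assuming the slides follow the usual Ignite convention of auto-advancing at a fixed interval, and that the 32 slides are split evenly among the four of us, the timing budget works out as:

```latex
\frac{8\ \text{min} \times 60\ \text{s/min}}{32\ \text{slides}} = 15\ \text{s per slide}

\frac{32\ \text{slides}}{4\ \text{presenters}} = 8\ \text{slides per presenter},
\qquad 8 \times 15\ \text{s} = 120\ \text{s} = 2\ \text{min per presenter}
```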
WRITEUP
- Due Friday 5/24, 5:00pm
- Medium post: a detailed, entertaining, and informative public-facing Medium.com blog post about the project, covering the motivation, methodological justification and detail, descriptive data and deep learning modeling, interpretation of findings (e.g., discovered structures, predictions, generations), conclusion, and an annotated code appendix. It should not read like an academic paper but as a mixture of (1) an explanatory tutorial and (2) a digital museum exhibit, balancing intermittent text with figures, description boxes, equations, and/or conceptual diagrams. It needs at least one visual element (e.g., figure, graph, conceptual diagram) for every 300 words of text, and a minimum of 5000 words and 17 visual elements in total.
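The two length rules interact: 5000 / 300 ≈ 16.7, so rounding up gives the 17-visual minimum, and each further started block of 300 words would need one more visual. Assuming that rounding-up reading of "one visual element for every 300 words", a small checker for a draft could look like this; counting words as whitespace-separated tokens is an assumption and may differ from Medium's own count.

```python
import math
import re
from typing import List

MIN_WORDS = 5000
MIN_VISUALS = 17
WORDS_PER_VISUAL = 300


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a rough proxy for Medium's count)."""
    return len(re.findall(r"\S+", text))


def required_visuals(word_count: int) -> int:
    """One visual per started block of 300 words, but never fewer than 17."""
    return max(MIN_VISUALS, math.ceil(word_count / WORDS_PER_VISUAL))


def check_draft(word_count: int, visual_count: int) -> List[str]:
    """Return the unmet requirements (an empty list means the draft passes)."""
    problems = []
    if word_count < MIN_WORDS:
        problems.append(f"{MIN_WORDS - word_count} words short of {MIN_WORDS}")
    needed = required_visuals(word_count)
    if visual_count < needed:
        problems.append(f"{needed - visual_count} visual element(s) short of {needed}")
    return problems


if __name__ == "__main__":
    print(check_draft(word_count=5400, visual_count=17))
    # prints ['1 visual element(s) short of 18']
```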
Amazon Reviews
- McAuley Lab Amazon Reviews 2023 dataset
Amazon KDD
- Amazon KDD Cup: https://www.aicrowd.com/challenges/amazon-kdd-cup-2024-multi-task-online-shopping-challenge-for-llms
- Training data (maybe):
  - WebShop: https://github.com/princeton-nlp/webshop
  - An agent benchmark using WebShop: https://github.com/THUDM/AgentBench
  - LangChain: https://python.langchain.com/docs/
  - Dataset and instructions: https://gitlab.aicrowd.com/aicrowd/challenges/amazon-kdd-cup-2024/amazon-kdd-cup-2024-starter-kit/-/tree/master
  - AIcrowd forum thread on where to find the ShopBench Amazon dataset: https://discourse.aicrowd.com/t/where-is-the-shopbench-amazon-dataset/9730