
fashion-shopping-assistant's Introduction

AI Confident for Amazon Fashion

$${\color{CadetBlue}Mingxuan\space He,\space \color{Maroon}Erika\space Zhang,\space \color{Thistle}Tianze\space Zhang,\space \color{MidnightBlue}Tianfang\space Zhu}$$

University of Chicago

MACS 37000

Write-up draft Medium: HERE

Write-up draft google-doc: HERE

Presentation link: HERE

Latest Team Update: Sunday 05/12/24

Meeting Notes here

NEXT MEETING: Thurs 05/16/24, 11-12:20

TO DO BEFORE NEXT MEETING

  1. Make multimodal embeddings (Everyone: pick what you're interested in or can do)
    • Text embed
    • Text to Graph embed
    • Image & Text embed
    • Image & Text to Graph embed (extract features from images?)
  2. Knowledge graph (Tian)
  3. Play with Neo4j to make queries & test embeddings (Everyone)

NEW DATA

  1. A large-scale Amazon Reviews dataset collected in 2023 by the McAuley Lab at UCSD. This dataset contains product and review information from 1996-2023.
  2. We will be focusing on the "Amazon_Fashion" category of products
  3. The Review and Metadata data files can be merged with a common key called "parent_asin"
  4. A sample merged tabular data file with 1000 items, amazon_subset_0512.csv, can be found in the "data" folder
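
The merge on "parent_asin" described above can be sketched as follows. This is a minimal illustration in plain Python, not the team's actual pipeline: the records, the values, and the helper name `merge_on_parent_asin` are hypothetical, and only the key name "parent_asin" comes from the notes. With pandas, the equivalent would be a `DataFrame.merge` call with `on="parent_asin"`.

```python
# Hypothetical toy records; only the "parent_asin" key is taken from the notes.
reviews = [
    {"parent_asin": "B001", "rating": 5.0, "text": "Fits well"},
    {"parent_asin": "B001", "rating": 3.0, "text": "Runs small"},
    {"parent_asin": "B002", "rating": 4.0, "text": "Nice fabric"},
    {"parent_asin": "B999", "rating": 1.0, "text": "No metadata for this one"},
]
metadata = [
    {"parent_asin": "B001", "title": "Linen Shirt"},
    {"parent_asin": "B002", "title": "Wool Scarf"},
]


def merge_on_parent_asin(reviews, metadata):
    """Inner-join review rows to product metadata rows on parent_asin."""
    # Index metadata by key so each review lookup is O(1).
    meta_by_key = {m["parent_asin"]: m for m in metadata}
    merged = []
    for review in reviews:
        meta = meta_by_key.get(review["parent_asin"])
        if meta is None:
            continue  # inner join: skip reviews with no matching product
        merged.append({**meta, **review})
    return merged


merged = merge_on_parent_asin(reviews, metadata)
for row in merged:
    print(row["parent_asin"], row["title"], row["rating"])
```

An inner join is assumed here, so reviews without a metadata row are dropped; a left join would keep them with missing product fields.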

Research framing:

[Figure: screenshot of the research framing, taken 2024-05-12 at 6:19 PM]

FINALS details

PROJECT

  • 3 forms of data in a SINGLE model
  • 3 types of models (from 3 separate weeks)
  • Validate inferences, predictions, results

PRESENTATION

  • 8 minutes/32 slides for 4-person team
  • Slides due Thursday 5/23 4:00pm
  • Ignite-style presentation Thursday 5/23, 4:30pm (Should we aim to start on the presentation over the weekend?)

WRITEUP

  • Due Friday 5/24, 5:00pm
  • Medium post: A detailed, entertaining and informative public-facing Medium.com blog-post about their projects that includes the motivation, methodological justification and detail, descriptive data and deep learning modeling, interpretation of findings (e.g., discovered structures, predictions, generations), conclusion, and annotated code appendix. These should not read like an academic paper, but a mixture of (1) explanatory tutorial; and (2) digital museum exhibit, balancing intermittent text with figures, description boxes, equations, and/or conceptual diagrams including at least one visual element (e.g., figure, graph, conceptual diagram) for every 300 words of text; and a minimum total of 5000 words and 17 visual elements.

Resources:

Amazon Reviews

Amazon KDD

FAQ

https://discourse.aicrowd.com/t/where-is-the-shopbench-amazon-dataset/9730

Team Google Drive

Link here

fashion-shopping-assistant's People

Contributors

erikaz1, mingxuan-he, beilrz, marugannwg


fashion-shopping-assistant's Issues

Meeting 05/12/24 4pm Zoom link

Erika Zhang is inviting you to a scheduled Zoom meeting.

Topic: DL Group Meeting!
Time: May 12, 2024 04:00 PM Central Time (US and Canada)

Join Zoom Meeting
https://uchicago.zoom.us/j/98374523894?pwd=WWxnMzVscnBqUE5KdW1sYU9KejlLUT09

Meeting ID: 983 7452 3894
Passcode: 565602


One tap mobile
+13092053325,,98374523894#,,,,*565602# US
+13126266799,,98374523894#,,,,*565602# US (Chicago)


Dial by your location
• +1 309 205 3325 US
• +1 312 626 6799 US (Chicago)
• +1 646 558 8656 US (New York)
• +1 646 931 3860 US
• +1 301 715 8592 US (Washington DC)
• +1 305 224 1968 US
• +1 346 248 7799 US (Houston)
• +1 360 209 5623 US
• +1 386 347 5053 US
• +1 507 473 4847 US
• +1 564 217 2000 US
• +1 669 444 9171 US
• +1 669 900 9128 US (San Jose)
• +1 689 278 1000 US
• +1 719 359 4580 US
• +1 253 205 0468 US
• +1 253 215 8782 US (Tacoma)


Find your local number: https://uchicago.zoom.us/u/abQOVLuLw5


Join by SIP
[email protected]


Join by H.323
• 162.255.37.11 (US West)
• 162.255.36.11 (US East)
• 115.114.131.7 (India Mumbai)
• 115.114.115.7 (India Hyderabad)
• 213.19.144.110 (Amsterdam Netherlands)
• 213.244.140.110 (Germany)
• 103.122.166.55 (Australia Sydney)
• 103.122.167.55 (Australia Melbourne)
• 149.137.40.110 (Singapore)
• 64.211.144.160 (Brazil)
• 149.137.68.253 (Mexico)
• 69.174.57.160 (Canada Toronto)
• 65.39.152.160 (Canada Vancouver)
• 207.226.132.110 (Japan Tokyo)
• 149.137.24.110 (Japan Osaka)


Meeting with James 04/26

Here are some notes from Friday's meeting with James, alongside my thoughts.

James recommended two things:

  1. Scrape Amazon for the list of "products similar to this" to create a distance metric between products
  2. Create multimodal embeddings on product image + description, and use vector search to query the product DB
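
Recommendation (2) boils down to nearest-neighbor search over embedding vectors. The sketch below shows that idea with brute-force cosine similarity in plain Python. The product IDs and 3-dimensional vectors are made up for illustration; real multimodal embeddings would come from an image and text encoder, and a product DB of this size would likely need a vector index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (product_id, score) pairs most similar to the query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical product embeddings (assumption: one vector per product).
index = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_jeans": [0.0, 1.0, 0.2],
    "red_scarf": [0.8, 0.0, 0.3],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a user query

results = top_k(query, index, k=2)
for product_id, score in results:
    print(product_id, round(score, 3))
```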

I personally think (2) is more doable than (1), unless we have an easy way to scrape millions of Amazon products ourselves. I'm doing the coding presentation for multimodal learning next week, so hopefully I will get to experiment with that.

James also mentioned he likes the idea of a natural language (LLM agent) interface, but he said we will still need a "social scientific research question" rather than a general goal for a real-world application.

Any thoughts are absolutely welcome! Please also feel free to add notes on anything I missed from the meeting.

Some preliminary work on the 5/16 parquet

  • title, features and description columns contain a lot of text related to the product
  • details column is a little tricky: it is a dictionary in which only a few keys hold non-None values, but those values carry information as well
  • bought_together and categories columns seem to be completely empty (or contain empty lists only)
  • title_review, text, object, asin, user_id, timestamp, helpful_vote, verified_purchase: each holds a list of the same length in every row, with one entry per review of the product
  • images column contains a list of dictionaries whose values are URLs to images
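
To work with the per-review lists and the sparse details dictionary described above, one option is to explode each product row into one record per review and drop the None entries from details. The sketch below assumes a row shaped like the notes describe; the values are invented, `explode_reviews` and `non_null_details` are hypothetical helper names, and only a subset of the listed review columns is used.

```python
# Hypothetical product row; column names follow the notes, values are made up.
row = {
    "parent_asin": "B001",
    "title": "Linen Shirt",
    "details": {"Department": "Mens", "Date First Available": None, "Manufacturer": None},
    "title_review": ["Great", "Too small"],
    "text": ["Fits well", "Runs small"],
    "user_id": ["u1", "u2"],
    "helpful_vote": [3, 0],
    "verified_purchase": [True, False],
}

# Subset of the per-review columns, chosen for illustration.
REVIEW_COLUMNS = ["title_review", "text", "user_id", "helpful_vote", "verified_purchase"]


def explode_reviews(row, review_columns=REVIEW_COLUMNS):
    """Turn one product row with parallel review lists into one dict per review."""
    lengths = {len(row[col]) for col in review_columns}
    if len(lengths) != 1:
        raise ValueError("review columns must all have the same length")
    n_reviews = lengths.pop()
    return [
        {"parent_asin": row["parent_asin"], **{col: row[col][i] for col in review_columns}}
        for i in range(n_reviews)
    ]


def non_null_details(details):
    """Keep only the details keys that hold a non-None value."""
    return {key: value for key, value in (details or {}).items() if value is not None}


per_review = explode_reviews(row)
clean_details = non_null_details(row["details"])
print(len(per_review), clean_details)
```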

These columns are ready to use:

  • average_rating (float64)
  • rating_number (int64)
  • price (float64)
  • store (string object)

Final Project Group Meeting TODAY 6:15pm ish?

Hi guys, we need to find a time to meet and discuss the final project. I can't find some of your emails. Tian and I have class until 6pm and will be at 1155 afterwards. We are thinking of meeting to discuss this around 6-6:15pm to 7 ish.

Here's a zoom link if convenient. Please let us know if you can or can't make it.

Erika Zhang is inviting you to a scheduled Zoom meeting.

Topic: 37000 Final Project Team Meeting
Time: Apr 24, 2024 06:15 PM Central Time (US and Canada)

Join Zoom Meeting
https://uchicago.zoom.us/j/7568137988?pwd=Ulgxbk9HZUpXaU0rWTJmVzBlNkV5UT09&omn=94177260673

Meeting ID: 756 813 7988
Passcode: 566544


One tap mobile
+13092053325,,7568137988#,,,,*566544# US
+13126266799,,7568137988#,,,,*566544# US (Chicago)


Dial by your location
• +1 309 205 3325 US
• +1 312 626 6799 US (Chicago)
• +1 301 715 8592 US (Washington DC)
• +1 305 224 1968 US
• +1 646 558 8656 US (New York)
• +1 646 931 3860 US
• +1 669 444 9171 US
• +1 669 900 9128 US (San Jose)
• +1 689 278 1000 US
• +1 719 359 4580 US
• +1 253 205 0468 US
• +1 253 215 8782 US (Tacoma)
• +1 346 248 7799 US (Houston)
• +1 360 209 5623 US
• +1 386 347 5053 US
• +1 507 473 4847 US
• +1 564 217 2000 US


Find your local number: https://uchicago.zoom.us/u/acAKuQrmv


Join by SIP
[email protected]


Join by H.323
• 162.255.37.11 (US West)
• 162.255.36.11 (US East)
• 115.114.131.7 (India Mumbai)
• 115.114.115.7 (India Hyderabad)
• 213.19.144.110 (Amsterdam Netherlands)
• 213.244.140.110 (Germany)
• 103.122.166.55 (Australia Sydney)
• 103.122.167.55 (Australia Melbourne)
• 149.137.40.110 (Singapore)
• 64.211.144.160 (Brazil)
• 149.137.68.253 (Mexico)
• 69.174.57.160 (Canada Toronto)
• 65.39.152.160 (Canada Vancouver)
• 207.226.132.110 (Japan Tokyo)
• 149.137.24.110 (Japan Osaka)

