alexander-moore / ds502_final

DS502 Final Project

HTML 41.41% Python 0.09% Jupyter Notebook 58.50% R 0.01%

ds502_final's Introduction

Postdoctoral Computer Vision 🔭 Research Scientist at Lawrence Livermore National Laboratory. Personal github account.

Interested in transformers for video processing. Two-way attention. Multimodal LLMs. Parameter-efficient fine-tuning.

I'm currently working on

Multimodal 🚀 Large Language Models:

Uploading code ~~soon~~ for adapting any image transformer and transformer language model into a multimodal-llm (MLLM)
Train a custom adapter to link the latent representations of the two token sequences
Potentially fine-tune with parameter-efficient fine-tuning (peft, LoRA)
Custom training pipeline for bootstrapping text-image pairs into <text, image, text>, <text, image>, <image, text> as an augmentation
Hosting as a Gradio or Hugging Face Space to demo

Chemical 🚨 Sensing:

Novel architectures for multitask learning + early classification of time series
Optimized preprocessing
Learning from Samples Worth Learning

Molecular 🔬 Representations:

VicReg over molecular images for augmentation-invariant embeddings
Graph Transformers over 3D molecular structure for unsupervised property embeddings

I'm interested and have ongoing projects in

Fine-tuning LLMs for multimodality in images, video, or domain-specific data types
Graph contrastive representation learning
Sequential representations of time series

Past work:

Publications and some public projects on my page

ds502_final's Issues

Final Meeting (ever?) :crying:

https://www.when2meet.com/?8436517-scqjV

Let's talk about having a final meeting
I know Mia and Quincy have mentioned things they are still implementing

I wanted to meet to make sure we're all on the same page and ready to present, write, and submit our findings. Quick meeting to discuss everything and create a cohesive presentation goal?

If you can't meet, no big deal, just post here the specific topics you want to present and write on for the report

Let's meet for 501 502

https://www.when2meet.com/?8396742-DxE3U

Let's get unified on our goals, work division, and presentation themes

object now or forever hold your peace

https://www.kaggle.com/c/nfl-big-data-bowl-2020/data

Guys I got my TDA working on the nn_input data Ethan put together!!! I'm going to need more time to interpret it but I've got some lovely clustering in my simplicial complex going on right now. Check it out.

Project Proposal posted

Please read:
https://www.kaggle.com/c/nfl-big-data-bowl-2020/discussion/111918#latest-652437

https://www.kaggle.com/c/nfl-big-data-bowl-2020/data

https://www.kaggle.com/statsbymichaellopez/nfl-tracking-wrangling-voronoi-and-sonars

https://www.kaggle.com/c/nfl-big-data-bowl-2020/discussion/112303#latest-655229

You may need a Kaggle account to see the links above

Hey all
Let's discuss here the 502 Final Project Proposal
If we've all agreed on the NFL data, let's move forward and talk about what methods we will use.
Feel free to edit the document proposal I pushed.
One concern I have with the NFL data is how noisy it will be: I'm worried we will have trouble building any meaningful models, since the yardage might be barely correlated with the predictors.
We can talk about it. idk haha.
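
One quick way to check the noise concern before picking methods is to see how strongly each numeric column correlates with Yards. This is only a minimal sketch: `correlation_with_target` is a made-up helper name and the toy values below are invented for illustration, not taken from the competition data.

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str = "Yards") -> pd.Series:
    """Pearson correlation of every numeric column with the target,
    sorted from strongest to weakest by absolute value."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.drop(columns=[target]).corrwith(numeric[target])
    return corr.sort_values(key=abs, ascending=False)


# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "Yards": [1, 2, 3, 4, 5],
    "Distance": [2, 4, 6, 8, 10],   # perfectly linear in Yards
    "Down": [1, 3, 2, 3, 1],        # no linear relationship with Yards
    "Team": ["home", "away", "home", "away", "home"],  # non-numeric, skipped
})

result = correlation_with_target(toy)
print(result)
```

Low values here would only say the linear relationship is weak. A nonlinear model could still find signal, so this is a sanity check and not a verdict.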

Due Tuesday

Hi all - this proposal is due Tuesday in-class

Looks like people have been working in the README - let's try to get a cohesive submission together by Monday. Feel free to comment your ideas and concerns here. I put all the parts from the submission as headers.

It's not graded, but let's have a clear idea of our goals

Need a wizard

Hi all
I'm working with Ethan's code from his version_2, but I want to work with X, Y, ... data instead of mean(X), etc.

To do this, I'm trying to append the entire team's X coordinates instead of the mean. However, this gives an error in the np.amax(np.isnan(...)) line, and I don't see why

Any thoughts?

nn_input = []
nn_target = []
for _,play in raw_data.groupby(['PlayId']):
    state_features = []
    state_features.append(get_distance_to_touchdown(play['YardLine'].iloc[0], play['PossessionTeam'].iloc[0], play['FieldPosition'].iloc[0]))
    state_features.append(get_time(play['Quarter'].iloc[0],play['GameClock'].iloc[0]))
    state_features.append(play['Down'].iloc[0])
    state_features.append(play['Distance'].iloc[0])
    state_features.append(get_time_since_snap(play['TimeHandoff'].iloc[0], play['TimeSnap'].iloc[0]))
    offense_features = get_offense_features(play['OffenseFormation'].iloc[0], play['OffensePersonnel'].iloc[0])
    defense_features = get_defense_features(play['DefendersInTheBox'].iloc[0], play['DefensePersonnel'].iloc[0])
    for t,team in play.groupby(['Team']):
        team_features = []
        print(team['X'])
        team_features = team_features + list(team['X'])
        team_features = team_features + list(team['Y'])
        team_features = team_features + list(team['A'])
        team_features = team_features + list(team['Dis'])
        team_features = team_features + list(team['Orientation'])
        team_features = team_features + list(team['Dir'])
        team_features = team_features + list(team['PlayerHeight'])
        team_features = team_features + list(team['PlayerWeight'])

        if t == 'home':
            team_features.append(team['HomeScoreBeforePlay'].iloc[0])
            if team['PossessionTeam'].iloc[0] == team['HomeTeamAbbr'].iloc[0]:
                offense_features = offense_features + team_features
            else:
                defense_features = defense_features + team_features
        elif t == 'away':
            team_features.append(team['VisitorScoreBeforePlay'].iloc[0])
            if team['PossessionTeam'].iloc[0] == team['VisitorTeamAbbr'].iloc[0]:
                offense_features = offense_features + team_features
            else:
                defense_features = defense_features + team_features
    print(type(state_features), type(offense_features), type(defense_features))
    print(type(state_features + offense_features + defense_features))
    if np.amax(np.isnan(state_features + offense_features + defense_features)) == 0:
        nn_input.append(state_features + offense_features + defense_features)
        nn_target.append(play['Yards'].iloc[0])
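
One possible cause, offered as a guess and not a diagnosis: np.isnan only accepts numeric input, so it raises a TypeError as soon as the combined list contains a string. If PlayerHeight is stored as text such as '6-2' (an assumption about the CSV worth checking with `raw_data.dtypes`), then appending `list(team['PlayerHeight'])` directly would trigger exactly that, whereas a version that converts heights to numbers first would not. The sketch below reproduces the failure on a toy list and shows two workarounds. `isnan_fails`, `has_missing`, and `height_to_inches` are hypothetical helper names, not functions from the project.

```python
import numpy as np
import pandas as pd


def isnan_fails(features):
    """Return True if np.isnan raises TypeError on this list."""
    try:
        np.isnan(features)
        return False
    except TypeError:
        return True


def has_missing(features):
    """NaN/None check that also tolerates strings in the list."""
    return bool(pd.isna(pd.Series(features, dtype=object)).any())


def height_to_inches(height):
    """Convert a 'feet-inches' string such as '6-2' to inches.
    The 'feet-inches' format is an assumption about the data."""
    feet, inches = height.split("-")
    return int(feet) * 12 + int(inches)


numeric = [1.0, 2.5, float("nan")]
mixed = [1.0, 2.5, "6-2"]  # toy values, invented for illustration

print(isnan_fails(numeric))     # False
print(isnan_fails(mixed))       # True
print(has_missing(numeric))     # True
print(has_missing(mixed))       # False
print(height_to_inches("6-2"))  # 74
```

If the height column does turn out to be text, converting it before building `team_features` keeps the whole feature vector numeric, which the network input needs anyway.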

General questions

I've been reading a lot about football strategy today. I noticed in the competition rules that we're not allowed to use external data when running the analysis. Would it count as external data if we classified all the teams as aggressive/offensive or more mild/defensive? I was thinking that if we lumped them into general categories like that, it might help predict how they'd respond to different field situations. I'm just not sure how we could classify them that way using only the CSV. I know the Pats are a fairly aggressive team, but that comes from my own background knowledge, and I'd need to watch more games to solidify my opinion on most of the other teams. Thoughts?
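
One way to get labels like these without outside knowledge would be to derive them from the CSV itself, for example by comparing each team's average yards per play against the league median. This is a rough sketch under stated assumptions: it uses the PlayId, PossessionTeam, and Yards columns, `label_team_styles` is a made-up helper name, mean yards is only a crude stand-in for "aggressive", and the toy rows are invented for illustration.

```python
import pandas as pd


def label_team_styles(plays: pd.DataFrame) -> pd.Series:
    """Label each possession team 'aggressive' if its mean Yards per play
    is above the median of all team means, else 'conservative'."""
    # The tracking data has one row per player per play, so keep one row per play.
    per_play = plays.drop_duplicates("PlayId")
    team_mean = per_play.groupby("PossessionTeam")["Yards"].mean()
    cutoff = team_mean.median()
    return team_mean.gt(cutoff).map({True: "aggressive", False: "conservative"})


# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "PlayId": [1, 1, 2, 3, 4, 5, 6],
    "PossessionTeam": ["NE", "NE", "NE", "NYJ", "NYJ", "MIA", "MIA"],
    "Yards": [8, 8, 6, 2, 1, 4, 3],
})

styles = label_team_styles(toy)
print(styles)
```

Since the labels come only from the provided data, this should stay within the no-external-data rule, though the labels would need to be computed on training plays only to avoid leaking the target into the features.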