charlesdedampierre / bunkatopics

122.0 122.0 13.0 232.94 MB

๐Ÿ—บ๏ธ Data Cleaning and Textual Data Visualization ๐Ÿ—บ๏ธ

Home Page: https://charlesdedampierre.github.io/BunkaTopics/index.html

License: MIT License

Python 61.85% Makefile 1.49% Dockerfile 0.76% Shell 0.41% CSS 2.59% JavaScript 32.90%
cartography data-cleaning explainability fine-tuning llms machine-learning natural-language-processing nlp summarization topic-modeling

bunkatopics's People

Contributors

alcime, charlesdedampierre, elishowk, stjohn96, tiphainelaurent

bunkatopics's Issues

bunka.get_topics

Hello there,
I successfully embedded the documents and created topics, but when I call bunka.get_topics to extract them I get the following error message, which I can neither solve nor understand. What is the problem? Thank you!

2024-03-07 17:44:14 - Bunka - INFO - Computing the topics
INFO:Bunka:Computing the topics

TypeError Traceback (most recent call last)
in <cell line: 2>()
1 # Get the list of topics
----> 2 df_topics = bunka.get_topics(n_clusters = 15, name_length=5, min_count_terms = 2)
3 df_topics = df_topics[['topic_id', 'name', 'size']].copy()

4 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value, sort)
504 return _unstack_frame(obj, level, fill_value=fill_value, sort=sort)
505 else:
--> 506 return obj.T.stack(future_stack=True)
507 elif not isinstance(obj.index, MultiIndex):
508 # GH 36113

TypeError: DataFrame.stack() got an unexpected keyword argument 'future_stack'
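
In the traceback above, pandas' own `unstack` passes `future_stack=True` to `DataFrame.stack`, a keyword that exists in pandas 2.1 and later. One possible cause (an assumption, not confirmed in this thread) is a session where pandas was upgraded without restarting the runtime, so modules from two versions are loaded together. The sketch below is a minimal diagnostic under that assumption; `accepts_keyword` and `report_pandas_stack` are hypothetical helper names, not part of bunkatopics or pandas.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if param is not None and param.kind in keyword_kinds:
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def report_pandas_stack() -> None:
    """Print which pandas is loaded and whether its stack() knows future_stack."""
    try:
        import pandas as pd
    except ImportError:
        print("pandas is not installed")
        return
    print("pandas version:", pd.__version__)
    print("loaded from:", pd.__file__)
    print(
        "DataFrame.stack accepts future_stack:",
        accepts_keyword(pd.DataFrame.stack, "future_stack"),
    )


if __name__ == "__main__":
    report_pandas_stack()
```

If the last line printed is `False` while the installed pandas is 2.1 or newer, restarting the runtime (or reinstalling pandas in a clean environment) is a reasonable next step to try.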

Guidance on Predicting Cluster for Trained Topic Model

@charlesdedampierre First and foremost, thank you for creating such an excellent library.


Description

I have successfully trained a topic model using your library on our dataset, which has been instrumental in uncovering various topics across the corpus. The model training and initial analysis have provided valuable insights into the dominant themes within my data.

However, we've identified a gap in documentation and usage guidelines, specifically regarding the application of the trained model to predict the cluster (or topic distribution) for new, unseen documents.

Is it possible to add a reference document on how to predict cluster membership with this library?
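
Until such a reference exists, one generic way to assign new documents to existing clusters is nearest-centroid matching on embeddings. The sketch below is not the bunkatopics API: it assumes you can obtain the training embeddings, their topic ids, and embeddings of the new documents from the same embedding model. All function names are hypothetical, and plain lists are used only to keep the example self-contained.

```python
import math
from collections import defaultdict

Vector = list[float]


def fit_centroids(embeddings: list[Vector], topic_ids: list[int]) -> dict[int, Vector]:
    """Average the training embeddings of each topic into one centroid."""
    groups: dict[int, list[Vector]] = defaultdict(list)
    for vector, topic_id in zip(embeddings, topic_ids):
        groups[topic_id].append(vector)
    return {
        topic_id: [sum(column) / len(vectors) for column in zip(*vectors)]
        for topic_id, vectors in groups.items()
    }


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_topic(embedding: Vector, centroids: dict[int, Vector]) -> int:
    """Return the id of the topic whose centroid is most similar to `embedding`."""
    return max(centroids, key=lambda tid: cosine_similarity(embedding, centroids[tid]))


# Toy data standing in for real document embeddings (assumed, for illustration).
train_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_topics = [0, 0, 1, 1]
centroids = fit_centroids(train_embeddings, train_topics)
print(predict_topic([0.8, 0.2], centroids))  # closest to topic 0
print(predict_topic([0.2, 0.7], centroids))  # closest to topic 1
```

Nearest-centroid assignment gives a single topic per document; a topic distribution could be approximated by normalizing the similarities to all centroids instead of taking the maximum.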


Problem in the Bourdieu notebook

Hello Charles,

I get this error:

Creating new labels for clusters: 0%| | 0/10 [00:00<?, ?it/s]

ValueError Traceback (most recent call last)
in <cell line: 13>()
11 }
12
---> 13 bourdieu_fig = bunka.visualize_bourdieu(
14 generative_model=llm,
15 x_left_words=["this is a positive content"],

12 frames

/usr/local/lib/python3.10/dist-packages/langchain/llms/huggingface_hub.py in _call(self, prompt, stop, run_manager, **kwargs)
112 response = self.client(inputs=prompt, params=params)
113 if "error" in response:
--> 114 raise ValueError(f"Error raised by inference API: {response['error']}")
115 if self.client.task == "text-generation":
116 # Text generation return includes the starter text.

ValueError: Error raised by inference API: Internal Server Error
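
The `ValueError` above wraps an "Internal Server Error" returned by the remote inference API. If the failure is transient (an assumption; the thread does not say), retrying the call with a growing delay is one way to work around it. The sketch below is a generic helper, not part of bunkatopics or langchain; `call_with_retries` and its parameters are hypothetical names.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, retry_on=(ValueError,)):
    """Call `func()` and retry on the given exceptions with exponential backoff.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in function that fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("Error raised by inference API: Internal Server Error")
    return "ok"


print(call_with_retries(flaky, base_delay=0.0))
```

In the notebook, the `bunka.visualize_bourdieu(...)` call from the traceback could be wrapped in a `lambda` and passed as `func`. If the error persists across retries, the hosted model endpoint itself is likely unavailable and a different model or backend may be needed.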
