lnlp's Introduction

Large Natural Language Processing

The goal of this package is to run several distinct NLP algorithms in parallel, either within users' own projects or through the provided CLI, which is currently a wrapper for the BERTopic topic modeling package. There is also infrastructure for adding new features to the CLI and for using existing CLI components within users' own projects.

Installation and Usage

  • Make sure that you have Python 3.9 installed

clone the repo

git clone https://github.com/Jayman391/lnlp.git

create and activate virtual environment

python3.9 -m venv venv

source venv/bin/activate

install requirements

python -m pip install -r requirements.txt

python -m spacy download en

run CLI

python main.py

For now, only the topic modeling section is functional.

For data sets under ~5000 documents, you might also need to rerun the script a few times to get a good partition of the data, because the clustering algorithm sometimes gets stuck in a local optimum with only two clusters.

Some runtime errors also occur fairly regularly, which is another reason to rerun the script. Several of these bugs are already open on the repo's issues page.
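Since reruns are sometimes needed, they can be automated with a small wrapper. The following is a minimal sketch and not part of LNLP: `run_with_retries` is a hypothetical helper, and it assumes the command exits with a non-zero status when a run fails, which the CLI may not do in every case.

```python
import subprocess
import sys


def run_with_retries(command, max_attempts=3):
    """Run `command` until it exits with status 0 or attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
    return None


# Hypothetical usage (adjust the arguments to your own data):
# run_with_retries([sys.executable, "main.py", "--data=my_data.csv"])
```

A fixed cap on attempts keeps a persistent bug from looping forever; a run that still fails after the cap is probably worth reporting as an issue.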

add data

the file can have any number of columns, but it must have a column named text

python main.py --data=tests/test_data/usa-vaccine-comments.csv
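The column requirement can be checked before launching the CLI. This is a minimal sketch using only the standard library; `has_text_column` is a hypothetical helper and does not reflect how LNLP itself loads data.

```python
import csv


def has_text_column(csv_path):
    """Return True if the CSV header row contains a column named 'text'."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    return "text" in header
```

For example, `has_text_column("tests/test_data/usa-vaccine-comments.csv")` should return True if the file follows the format described above.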

add an output directory

python main.py --data=tests/test_data/usa-vaccine-comments.csv --save_dir=output

specify number of samples

python main.py --data=tests/test_data/usa-vaccine-comments.csv --num_samples=1000

run a CLI menu sequence (see the documentation)

python main.py --data=tests/test_data/usa-vaccine-comments.csv --sequence='1,11,21,31,41,9'
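A sequence string like the one above is a comma-separated list of menu choices. The sketch below only shows how such a string could be split and validated; `parse_sequence` is a hypothetical helper, and how LNLP maps each number to a menu is covered by the project documentation, not by this example.

```python
def parse_sequence(sequence):
    """Split a comma-separated menu sequence into a list of integers."""
    choices = []
    for token in sequence.split(","):
        token = token.strip()
        if not token.isdigit():
            raise ValueError(f"not a menu choice: {token!r}")
        choices.append(int(token))
    return choices
```

Rejecting non-numeric tokens early gives a clearer error than a failure partway through a menu walk.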

all together now

python main.py --save_dir=output --data=tests/test_data/usa-vaccine-comments.csv --num_samples=1000 --sequence='1,11,21,31,41,9'
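When the CLI is driven from another script, the flags shown above can be assembled programmatically. This is a minimal sketch: `build_command` is a hypothetical helper, and only the four flags that appear in this README are used.

```python
import sys


def build_command(data, save_dir=None, num_samples=None, sequence=None):
    """Assemble the argument list for `python main.py` from the README flags."""
    command = [sys.executable, "main.py", f"--data={data}"]
    if save_dir is not None:
        command.append(f"--save_dir={save_dir}")
    if num_samples is not None:
        command.append(f"--num_samples={num_samples}")
    if sequence is not None:
        command.append(f"--sequence={sequence}")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting problems with the sequence argument.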

Contributing

Your contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make LNLP better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

lnlp's People

Contributors

jayman391

lnlp's Issues

IndexError: arrays used as indices must be of integer (or boolean) type

(nllp) MacBook-Air-3:lnlp user$ python main.py tests/test_data/data.csv

Welcome to the NLLP CLI!
Loaded data from tests/test_data/data.csv

  1. Run a Topic Model
  2. Run an Optimization routine for a Topic Model (GPU reccomended)
  3. Run a Classification Model
  4. Load Global Configuration Files
  5. Exit
    Choose an option: 1
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 1
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}]}
  1. all-MiniLM-L6-v2
  2. all-MiniLM-L12-v2
  3. multi-qa-MiniLM-L6-cos-v1
  4. all-mpnet-base-v2
  5. Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit
  6. Muennighoff/SGPT-125M-weightedmean-nli-bitfit
  7. Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit
  8. Add huggingface Model
  9. Back
  10. Exit
    Choose an option: 2
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'all-MiniLM-L12-v2'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 2
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'all-MiniLM-L12-v2'}, {'Topic': 'Dimensionality Reduction'}]}
  1. UMAP
  2. PCA
  3. t-SNE
  4. Truncated SVD
  5. Factor Analysis
  6. Back
  7. Exit
    Choose an option: 2
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'all-MiniLM-L12-v2'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 3
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'all-MiniLM-L12-v2'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}]}
  1. hdbscan
  2. kmeans
  3. spectral clustering
  4. dbscan
  5. agglomerative clustering
  6. birch
  7. affinity propagation
  8. mean shift
  9. Back
  10. Exit
    Choose an option: 3
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'all-MiniLM-L12-v2'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 7
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'all-MiniLM-L12-v2'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}, {'Topic': 'BERTopic(calculate_probabilities=False, ctfidf_model=ClassTfidfTransformer(...), embedding_model=SentenceTransformer(...), hdbscan_model=HDBSCAN(...), language=None, low_memory=False, min_topic_size=10, n_gram_range=(1, 1), nr_topics=None, representation_model=None, seed_topic_list=None, top_n_words=10, umap_model=PCA(...), vectorizer_model=CountVectorizer(...), verbose=False, zeroshot_min_similarity=0.7, zeroshot_topic_list=None)'}]}
    Number of boolean topics: 0
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    []
    No plotting options selected. Visualizing all topics, documents, and terms.
    An error occurred. Please try again.
    Would you like to see the error trace? (y/n): y
    Traceback (most recent call last):
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 46, in run
    self._process_responses(self.landing, self.driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 66, in _process_responses
    driver.run_topic_model()
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 112, in run_topic_model
    self._visualize_topics(model, topics, dir)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 119, in _visualize_topics
    topic_viz = model.visualize_topics()
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 2249, in visualize_topics
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/plotting/_topics.py", line 78, in visualize_topics
    IndexError: arrays used as indices must be of integer (or boolean) type
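In the run above, the list printed just before the error contains only the label 1, which suggests the model found a single topic to plot. One possible mitigation is to skip topic visualization when too few topics exist. The following is a minimal sketch and not a fix for the root cause: `count_topics` and `safe_visualize` are hypothetical helpers, the `min_topics=3` threshold is an assumption, and `model.visualize_topics()` is the call shown in the traceback.

```python
def count_topics(topics, outlier_label=-1):
    """Count distinct topic labels, ignoring the outlier label."""
    return len({topic for topic in topics if topic != outlier_label})


def safe_visualize(model, topics, min_topics=3):
    """Call model.visualize_topics() only when enough topics exist.

    Returns the figure, or None when visualization is skipped or fails.
    """
    if count_topics(topics) < min_topics:
        return None
    try:
        return model.visualize_topics()
    except (IndexError, ValueError):
        # Both exception types appear in traces reported on the issues page.
        return None
```

Returning None lets the caller continue with the remaining outputs instead of aborting the whole session.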

Driver : add params to choice sequence

Allow the comma-separated choice sequence to be a dictionary. If it is a dictionary, the value will also be parsed during sequence execution.
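One way this request could be met is to accept a JSON list in which a plain number is a menu choice and a one-key object pairs a choice with the value to enter at the following prompt. This is a sketch only: the input format and `parse_choice_sequence` are hypothetical, since the issue does not specify either.

```python
import json


def parse_choice_sequence(raw):
    """Parse a sequence of menu choices, some of which carry a value.

    Returns a list of (choice, value) tuples; value is None for plain choices.
    """
    steps = []
    for item in json.loads(raw):
        if isinstance(item, dict):
            # A dictionary entry pairs a menu choice with its parameter.
            for choice, value in item.items():
                steps.append((int(choice), value))
        else:
            steps.append((int(item), None))
    return steps
```

For instance, `parse_choice_sequence('[1, 5, {"5": "tests"}]')` yields two plain choices followed by choice 5 with the value "tests".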

Topic Model: vaccify

Add documentation and scripting for running topic models on the VACC.

This needs a shell script with Slurm configurations and a setup.txt file.

Menu : Learn More

Add a "Learn More" option that opens a browser with more information on the specific feature that a menu implements.
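A possible shape for this feature uses Python's standard `webbrowser` module. This is a minimal sketch: `LEARN_MORE_URLS`, `learn_more_url`, and `open_learn_more` are hypothetical names, and the choice of pages in the mapping is an assumption for illustration.

```python
import webbrowser

# Hypothetical mapping from menu features to documentation pages.
LEARN_MORE_URLS = {
    "UMAP": "https://umap-learn.readthedocs.io/",
    "BERTopic": "https://maartengr.github.io/BERTopic/",
}


def learn_more_url(feature):
    """Return the documentation URL for a feature, or None if unknown."""
    return LEARN_MORE_URLS.get(feature)


def open_learn_more(feature):
    """Open the feature's documentation in a browser; False if unknown."""
    url = learn_more_url(feature)
    if url is None:
        return False
    return webbrowser.open(url)
```

Keeping the lookup separate from the browser call makes the mapping easy to test without opening a window.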

ValueError: zero-size array to reduction operation maximum which has no identity

(nllp) MacBook-Air-3:lnlp user$ python main.py tests/test_data/data.csv

Welcome to the NLLP CLI!
Loaded data from tests/test_data/data.csv

  1. Run a Topic Model
  2. Run an Optimization routine for a Topic Model (GPU reccomended)
  3. Run a Classification Model
  4. Load Global Configuration Files
  5. Exit
    Choose an option: 1
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 1
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}]}
  1. all-MiniLM-L6-v2
  2. all-MiniLM-L12-v2
  3. multi-qa-MiniLM-L6-cos-v1
  4. all-mpnet-base-v2
  5. Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit
  6. Muennighoff/SGPT-125M-weightedmean-nli-bitfit
  7. Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit
  8. Add huggingface Model
  9. Back
  10. Exit
    Choose an option: 6
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 2
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}]}
  1. UMAP
  2. PCA
  3. t-SNE
  4. Truncated SVD
  5. Factor Analysis
  6. Back
  7. Exit
    Choose an option: 2
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 3
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}]}
  1. hdbscan
  2. kmeans
  3. spectral clustering
  4. dbscan
  5. agglomerative clustering
  6. birch
  7. affinity propagation
  8. mean shift
  9. Back
  10. Exit
    Choose an option: 3
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 4
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}, {'Topic': 'Fine Tuning'}]}
  1. Enable 2-grams
  2. Enable 3-grams
  3. Ignore Words
  4. Enable BM25 weighting
  5. Reduce frequent words
  6. Enable KeyBERT algorithm
  7. Enable ZeroShotClassification
  8. Enable Maximal Marginal Relevance
  9. Enable Part of Speech filtering
  10. Back
  11. Exit
    Choose an option: 1
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}, {'Topic': 'Fine Tuning'}, {'Fine Tuning': 'Enable 2-grams'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 5
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}, {'Topic': 'Fine Tuning'}, {'Fine Tuning': 'Enable 2-grams'}, {'Topic': 'Plotting'}]}
  1. Enable Topic Visualizations
  2. Enable Document Visualizations
  3. Enable Term Visualizations
  4. Enable All Visualizations
  5. Specify Plot Directory
  6. Specify Web Browser
  7. Back
  8. Exit
    Choose an option: 4
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}, {'Topic': 'Fine Tuning'}, {'Fine Tuning': 'Enable 2-grams'}, {'Topic': 'Plotting'}, {'Plotting': 'Enable All Visualizations'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 5
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}, {'Topic': 'Fine Tuning'}, {'Fine Tuning': 'Enable 2-grams'}, {'Topic': 'Plotting'}, {'Plotting': 'Enable All Visualizations'}, {'Topic': 'Plotting'}]}
  1. Enable Topic Visualizations
  2. Enable Document Visualizations
  3. Enable Term Visualizations
  4. Enable All Visualizations
  5. Specify Plot Directory
  6. Specify Web Browser
  7. Back
  8. Exit
    Choose an option: 5
    Enter the plot directory: tests
    directory : tests
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}, {'Topic': 'Fine Tuning'}, {'Fine Tuning': 'Enable 2-grams'}, {'Topic': 'Plotting'}, {'Plotting': 'Enable All Visualizations'}, {'Topic': 'Plotting'}, {'Plotting': 'tests'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 7
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Cluster Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-nli-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Cluster'}, {'Cluster': 'spectral clustering'}, {'Topic': 'Fine Tuning'}, {'Fine Tuning': 'Enable 2-grams'}, {'Topic': 'Plotting'}, {'Plotting': 'Enable All Visualizations'}, {'Topic': 'Plotting'}, {'Plotting': 'tests'}, {'Topic': 'BERTopic(calculate_probabilities=False, ctfidf_model=ClassTfidfTransformer(...), embedding_model=SentenceTransformer(...), hdbscan_model=HDBSCAN(...), language=None, low_memory=False, min_topic_size=10, n_gram_range=(1, 1), nr_topics=None, representation_model=None, seed_topic_list=None, top_n_words=10, umap_model=PCA(...), vectorizer_model=CountVectorizer(...), verbose=False, zeroshot_min_similarity=0.7, zeroshot_topic_list=None)'}]}
    Number of boolean topics: 0
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
    [{'Plotting': 'Enable All Visualizations'}, {'Plotting': 'tests'}]
    An error occurred. Please try again.
    Would you like to see the error trace? (y/n): y
    Traceback (most recent call last):
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 46, in run
    self._process_responses(self.landing, self.driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
    self._process_responses(response, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 68, in _process_responses
    self._process_responses(menu.parent, driver)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 66, in _process_responses
    driver.run_topic_model()
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 106, in run_topic_model
    self._visualize_topics(model, topics, dir)
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 119, in _visualize_topics
    topic_viz = model.visualize_topics()
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 2249, in visualize_topics
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/plotting/_topics.py", line 79, in visualize_topics
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2887, in fit_transform
    self.fit(X, y, force_all_finite)
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2780, in fit
    self.embedding_, aux_data = self._fit_embed_data(
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2826, in _fit_embed_data
    return simplicial_set_embedding(
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 1086, in simplicial_set_embedding
    graph.data[graph.data < (graph.data.max() / float(n_epochs_max))] = 0.0
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/numpy/core/_methods.py", line 41, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
    ValueError: zero-size array to reduction operation maximum which has no identity

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

(nllp) MacBook-Air-3:lnlp user$ python main.py tests/test_data/data.csv

Welcome to the NLLP CLI!
Loaded data from tests/test_data/data.csv

  1. Run a Topic Model
  2. Run an Optimization routine for a Topic Model (GPU reccomended)
  3. Run a Classification Model
  4. Load Global Configuration Files
  5. Exit
    Choose an option: 1
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Topic Model Configuration
  7. Load Topic Model Configuration
  8. Save Session Data
  9. Run Topic Model
  10. Back
  11. Exit
    Choose an option: 8
    Please enter the path of the directory to save this sessions data: tests
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Topic Model Configuration
  7. Load Topic Model Configuration
  8. Save Session Data
  9. Run Topic Model
  10. Back
  11. Exit
    Choose an option: 9
    2024-03-25 15:44:48,448 - BERTopic - Embedding - Transforming documents to embeddings.
    Batches: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:43<00:00, 1.45it/s]
    2024-03-25 15:45:31,909 - BERTopic - Embedding - Completed ✓
    2024-03-25 15:45:31,909 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
    UMAP( verbose=True)
    Mon Mar 25 15:45:31 2024 Construct fuzzy simplicial set
    Mon Mar 25 15:45:33 2024 Finding Nearest Neighbors
    Mon Mar 25 15:45:34 2024 Finished Nearest Neighbor Search
    Mon Mar 25 15:45:34 2024 Construct embedding
    Epochs completed: 0%| 0/500 [00:00]completed 0 / 500 epochs
    Epochs completed: 0%| ▎ 1/500 [00:00]completed 50 / 500 epochs
    Epochs completed: 19%| █████████████████████████████████▋ 94/500 [00:00]completed 100 / 500 epochs
    completed 150 / 500 epochs
    Epochs completed: 37%| ██████████████████████████████████████████████████████████████████▏ 186/500 [00:00]completed 200 / 500 epochs
    completed 250 / 500 epochs
    Epochs completed: 55%| ██████████████████████████████████████████████████████████████████████████████████████████████████▎ 276/500 [00:01]completed 300 / 500 epochs
    completed 350 / 500 epochs
    Epochs completed: 73%| ██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ 367/500 [00:01]completed 400 / 500 epochs
    completed 450 / 500 epochs
    Epochs completed: 100%| ██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 500/500 [00:01]
    Mon Mar 25 15:45:36 2024 Finished embedding
    2024-03-25 15:45:36,318 - BERTopic - Dimensionality - Completed ✓
    2024-03-25 15:45:36,318 - BERTopic - Cluster - Start clustering the reduced embeddings
    2024-03-25 15:45:36,343 - BERTopic - Cluster - Completed ✓
    2024-03-25 15:45:36,346 - BERTopic - Representation - Extracting topics from clusters using representation models.
    2024-03-25 15:45:36,454 - BERTopic - Representation - Completed ✓
    No plotting options selected. Visualizing all topics, documents, and terms.
    /Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/spectral.py:521: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
    eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh(
    Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.
    An error occurred. Please try again.
    Would you like to see the error trace? (y/n): y
    Traceback (most recent call last):
    File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 162, in _visualize_topics
    topic_viz = model.visualize_topics()
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 2249, in visualize_topics
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/plotting/_topics.py", line 79, in visualize_topics
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2887, in fit_transform
    self.fit(X, y, force_all_finite)
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2780, in fit
    self.embedding_, aux_data = self._fit_embed_data(
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 2826, in _fit_embed_data
    return simplicial_set_embedding(
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/umap_.py", line 1106, in simplicial_set_embedding
    embedding = spectral_layout(
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/spectral.py", line 304, in spectral_layout
    return _spectral_layout(
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/umap_learn-0.5.5-py3.9.egg/umap/spectral.py", line 521, in _spectral_layout
    eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh(
    File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 1608, in eigsh
    raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
    TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.
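
The trace above fails inside `model.visualize_topics()`, which seems to happen when the fitted model has too few topics for UMAP to lay out. One way to keep the CLI from aborting is to guard the plotting call. The sketch below is only an illustration: `safe_visualize`, `min_topics`, and the default threshold of 4 are hypothetical names and values, not part of lnlp, BERTopic, or UMAP, and the threshold is a guess based on the `k >= N` message.

```python
from typing import Any, Callable, Optional


def safe_visualize(
    visualize: Callable[[], Any],
    n_topics: int,
    min_topics: int = 4,  # assumed threshold, not taken from BERTopic or UMAP docs
) -> Optional[Any]:
    """Run a plotting callable only when there are enough topics to lay out.

    Returns the figure on success, or None if plotting was skipped or failed.
    """
    if n_topics < min_topics:
        print(f"Skipping topic plot: only {n_topics} topic(s) found.")
        return None
    try:
        return visualize()
    except (TypeError, ValueError) as exc:
        # The trace above ends in a TypeError (eigsh with k >= N); a ValueError
        # (zero-size array) is caught as well and treated the same way.
        print(f"Topic plot failed: {exc}")
        return None
```

In the driver this might be called as `safe_visualize(model.visualize_topics, n_topics)`, with `n_topics` taken from the fitted model, so that a failed plot is reported without ending the session.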

Fine Tune : Menus

Build out the fine-tuning menus for the CLI.

From the landing menu, option 2 ("Fine Tune a LLM") opens:

  1. Upload Model Weights -> enter filepath
  2. Upload Data -> enter filepath
  3. Specify System prompt -> enter system prompt
  4. Specify Training and Validation -> submenu:
    1. Learning Rate
    2. Num Epochs
    3. Output Dir
    4. ?
  5. Commence Fine Tuning
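
The menu layout above could be written down as a small tree before wiring it into the CLI. This is a rough sketch only: `MenuNode`, `prompt`, and `render` are hypothetical names and do not reflect the menu classes that lnlp actually uses. The undecided fourth training option ("?") is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MenuNode:
    """Hypothetical menu node; not the menu class used in lnlp."""

    label: str
    prompt: Optional[str] = None  # text shown when the option asks for input
    children: List["MenuNode"] = field(default_factory=list)


fine_tune_menu = MenuNode(
    "Fine Tune a LLM",
    children=[
        MenuNode("Upload Model Weights", prompt="enter filepath"),
        MenuNode("Upload Data", prompt="enter filepath"),
        MenuNode("Specify System prompt", prompt="enter system prompt"),
        MenuNode(
            "Specify Training and Validation",
            children=[
                MenuNode("Learning Rate"),
                MenuNode("Num Epochs"),
                MenuNode("Output Dir"),
            ],
        ),
        MenuNode("Commence Fine Tuning"),
    ],
)


def render(menu: MenuNode) -> List[str]:
    """Return the numbered option lines for one menu level."""
    return [f"{i}. {child.label}" for i, child in enumerate(menu.children, start=1)]


for line in render(fine_tune_menu):
    print(line)
```

Keeping the structure as data makes it easy to add or reorder options without touching the code that prints and reads the menu.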

N-gram range in JSON file must be converted to a tuple

Traceback (most recent call last):
File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 46, in run
self._process_responses(self.landing, self.driver)
File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 64, in _process_responses
self._process_responses(response, driver)
File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 67, in _process_responses
driver.run_topic_model(from_file=True)
File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 72, in run_topic_model
topics, _ = model.fit_transform(self.session.data)
File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 433, in fit_transform
File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 3778, in _extract_topics
File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 3977, in _c_tf_idf
File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scikit_learn-1.4.1.post1-py3.9-macosx-11.1-arm64.egg/sklearn/feature_extraction/text.py", line 1340, in fit
self.fit_transform(raw_documents)
File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scikit_learn-1.4.1.post1-py3.9-macosx-11.1-arm64.egg/sklearn/base.py", line 1467, in wrapper
estimator._validate_params()
File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scikit_learn-1.4.1.post1-py3.9-macosx-11.1-arm64.egg/sklearn/base.py", line 666, in _validate_params
validate_parameter_constraints(
File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scikit_learn-1.4.1.post1-py3.9-macosx-11.1-arm64.egg/sklearn/utils/_param_validation.py", line 95, in validate_parameter_constraints
raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'ngram_range' parameter of CountVectorizer must be an instance of 'tuple'. Got [1, 2] instead.
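
JSON has no tuple type, so a range saved as `(1, 2)` is read back as the list `[1, 2]`, which `CountVectorizer` rejects. A minimal sketch of the conversion is below. It assumes the saved configuration stores the value under a top-level `ngram_range` key; `load_vectorizer_params` is a hypothetical helper, not a function in lnlp.

```python
import json
from typing import Any, Dict


def load_vectorizer_params(raw_json: str) -> Dict[str, Any]:
    """Parse a JSON config string and coerce `ngram_range` back to a tuple."""
    params = json.loads(raw_json)
    # Assumed key name; adjust to wherever the config actually stores the range.
    if "ngram_range" in params:
        params["ngram_range"] = tuple(params["ngram_range"])
    return params


params = load_vectorizer_params('{"ngram_range": [1, 2]}')
print(params)  # {'ngram_range': (1, 2)}
```

The resulting dictionary can then be passed on to the vectorizer, for example as `CountVectorizer(**params)`.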

ValueError: The number of observations cannot be determined on an empty distance matrix.

python main.py tests/test_data/data.csv

Welcome to the NLLP CLI!
Loaded data from tests/test_data/data.csv

  1. Run a Topic Model
  2. Run an Optimization routine for a Topic Model (GPU reccomended)
  3. Run a Classification Model
  4. Load Global Configuration Files
  5. Exit
    Choose an option: 1
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 1
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}]}
  1. all-MiniLM-L6-v2
  2. all-MiniLM-L12-v2
  3. multi-qa-MiniLM-L6-cos-v1
  4. all-mpnet-base-v2
  5. Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit
  6. Muennighoff/SGPT-125M-weightedmean-nli-bitfit
  7. Muennighoff/SGPT-1.3B-weightedmean-msmarco-specb-bitfit
  8. Add huggingface Model
  9. Back
  10. Exit
    Choose an option: 5
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 2
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}, {'Topic': 'Dimensionality Reduction'}]}
  1. UMAP
  2. PCA
  3. t-SNE
  4. Truncated SVD
  5. Factor Analysis
  6. Back
  7. Exit
    Choose an option: 2
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 3
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Clustering'}]}
  1. hdbscan
  2. kmeans
  3. spectral clustering
  4. dbscan
  5. agglomerative clustering
  6. birch
  7. affinity propagation
  8. mean shift
  9. Back
  10. Exit
    Choose an option: 3
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Clustering'}, {'Clustering': 'spectral clustering'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 4
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Clustering'}, {'Clustering': 'spectral clustering'}, {'Topic': 'Fine Tuning'}]}
  1. Enable 2-grams
  2. Enable 3-grams
  3. Ignore Words
  4. Enable BM25 weighting
  5. Reduce frequent words
  6. Enable KeyBERT algorithm
  7. Enable ZeroShotClassification
  8. Enable Maximal Marginal Relevance
  9. Enable Part of Speech filtering
  10. Back
  11. Exit
    Choose an option: 2
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Clustering'}, {'Clustering': 'spectral clustering'}, {'Topic': 'Fine Tuning'}, {'Fine Tuning': 'Enable 3-grams'}]}
  1. Select LLM to generate Embeddings
  2. Select Dimensionality Reduction Technique
  3. Select Clustering Technique
  4. Fine Tuning
  5. Plotting
  6. Save Session Configuration
  7. Run Topic Model
  8. Load Session Configuration
  9. Back
  10. Exit
    Choose an option: 7
    Building topic model from logs
    [{'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}]
    [{'Dimensionality Reduction': 'PCA'}]
    [{'Clustering': 'spectral clustering'}]
    {}
    {}
    {'errors': [], 'info': ['Initialized Global Session Object and Global Driver', 'Initialized Landing Menu', 'Initialized Embeddings Menu', 'Initialized Dimensionality Reduction Menu', 'Initialized Clustering Menu', 'Initialized Fine Tuning Menu', 'Initialized Plotting Menu', 'Initialized ConfigMenu Menu', 'Initialized Topic Menu', 'Initialized ConfigMenu Menu'], 'data': [{'Landing': 'Topic'}, {'Topic': 'Embeddings'}, {'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}, {'Topic': 'Dimensionality Reduction'}, {'Dimensionality Reduction': 'PCA'}, {'Topic': 'Clustering'}, {'Clustering': 'spectral clustering'}, {'Topic': 'Fine Tuning'}, {'Fine Tuning': 'Enable 3-grams'}, {'Topic': 'BERTopic(calculate_probabilities=False, ctfidf_model=ClassTfidfTransformer(...), embedding_model=SentenceTransformer(...), hdbscan_model=SpectralClustering(...), language=None, low_memory=False, min_topic_size=10, n_gram_range=(1, 1), nr_topics=None, representation_model=None, seed_topic_list=None, top_n_words=10, umap_model=PCA(...), vectorizer_model=CountVectorizer(...), verbose=False, zeroshot_min_similarity=0.7, zeroshot_topic_list=None)'}]}
    Building topic model from logs
    [{'Embeddings': 'Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit'}]
    [{'Dimensionality Reduction': 'PCA'}]
    [{'Clustering': 'spectral clustering'}]
    {}
    {}
    SentenceTransformer(
    (0): Transformer({'max_seq_length': 300, 'do_lower_case': False}) with Transformer model: GPTNeoModel
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': True, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    )
    PCA()
    SpectralClustering()
    /Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/numpy/core/_methods.py:176: RuntimeWarning: overflow encountered in multiply
    x = um.multiply(x, x, out=x)
    /Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scikit_learn-1.4.1.post1-py3.9-macosx-11.1-arm64.egg/sklearn/utils/extmath.py:208: RuntimeWarning: overflow encountered in matmul
    ret = a @ b
    /Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scikit_learn-1.4.1.post1-py3.9-macosx-11.1-arm64.egg/sklearn/metrics/pairwise.py:383: RuntimeWarning: invalid value encountered in add
    distances += XX
    /Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scikit_learn-1.4.1.post1-py3.9-macosx-11.1-arm64.egg/sklearn/cluster/_kmeans.py:704: RuntimeWarning: overflow encountered in square
    lloyd_iter(
    /Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/numpy/core/_methods.py:49: RuntimeWarning: overflow encountered in reduce
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)
    [[-3.37943465e-01 6.29494131e-01 -2.42340565e-01 -1.21323444e-01
    6.43052876e-01 1.01818740e+00 1.51308015e-01 -5.05418062e-01
    3.90750527e-01 -8.81013811e-01 6.41503096e-01 1.00391102e+00
    -5.65844953e-01 5.61737001e-01 -8.23109388e-01 -1.97309121e-01
    1.73583314e-01 -3.83237422e-01 4.92538244e-01 -7.57770061e-01
    -1.37655810e-01 8.14577520e-01 5.41032970e-01 -6.14629328e-01
    -1.14306025e-02 1.35825467e+00 -7.58565366e-01 -9.36706781e-01
    3.01892400e-01 1.70184040e+00 1.37070215e+00 -8.28083456e-01
    -1.17205095e+00 9.42910135e-01 -5.10190070e-01 1.00378203e+00
    3.77682686e-01 -2.76817656e+00 5.09495795e-01 -2.88637459e-01
    -1.28382492e+00 -2.95686543e-01 9.23165083e-02 -1.57267833e+00
    7.27139652e-01 3.96969587e-01 2.08877230e+00 8.09231937e-01
    -1.44847140e-01 1.07452214e+00 4.58445966e-01 7.32348502e-01
    3.16043317e-01 -2.94261038e-01 7.70114601e-01 1.63893417e-01
    -8.42540383e-01 -6.49182558e-01 1.44444421e-01 6.59512058e-02
    -1.55640483e-01 4.84877795e-01 -4.29784209e-01 -1.85507834e-01
    2.31727795e-03 -2.71137834e-01 8.63851845e-01 2.18423262e-01
    -6.16317153e-01 7.75537789e-02 -9.91273880e-01 -1.83185749e-02
    6.47128224e-01 2.77608186e-01 2.95994788e-01 1.12721466e-01
    2.52481490e-01 -1.05544746e+00 -1.29722631e+00 2.27479115e-01
    -6.74743056e-01 1.00208335e-02 6.91715837e-01 3.10045749e-01
    1.50649194e-02 4.14506227e-01 1.23148608e+00 1.46657184e-01
    1.93784893e-01 1.07864931e-01 -1.42837167e+00 -2.32765228e-01
    9.64416444e-01 5.56423247e-01 2.25867301e-01 -3.91246527e-02
    6.96675241e-01 -9.61884201e-01 -1.09361261e-01 3.42048436e-01
    1.13335729e+00 1.14738417e+00 3.02453727e-01 -9.78551388e-01
    -1.32836297e-01 3.10879558e-01 8.63382041e-01 -6.38108909e-01
    1.28739759e-01 5.18443048e-01 -4.01991576e-01 1.57989949e-01
    1.38126984e-01 3.22906494e-01 1.00935206e-01 6.62976027e-01
    7.77774692e-01 -1.45812500e+00 -2.47111663e-01 4.61435676e-01
    -9.72481966e-01 6.87723398e-01 -8.21399212e-01 7.46142805e-01
    3.24629009e-01 2.66669303e-01 1.91775823e+00 -7.23790586e-01
    2.66863137e-01 -2.17597391e-02 -4.44849998e-01 1.81813985e-01
    5.80988407e-01 -2.58265465e-01 1.80867165e-01 1.46584070e+00
    -2.23950133e-01 5.62133491e-01 2.90566008e-04 -8.51132989e-01
    -2.43258402e-01 5.12126684e-02 -1.67548895e-01 3.04725349e-01
    -7.13925660e-01 7.88617134e-01 2.15596890e+00 4.18819785e-01
    -4.97516900e-01 -7.08875835e-01 4.01430666e-01 -1.28385320e-01
    8.71903479e-01 -1.47732460e+00 -5.96004963e-01 5.78400567e-02
    -1.39094904e-01 9.80079249e-02 -7.34725893e-01 2.32699350e-01
    -1.64560780e-01 1.55819929e+00 -8.20916653e-01 9.76328313e-01
    -1.11916518e+00 1.22750592e+00 7.43890822e-01 -8.39753687e-01
    -1.10286854e-01 1.82059184e-01 -1.12587428e+00 -2.73650318e-01
    6.73122287e-01 -4.47719246e-01 2.97807604e-01 2.56420672e-01
    8.62642527e-01 -6.03688002e-01 -3.38254198e-02 2.08692551e-02
    -4.42939818e-01 -1.22754565e-02 5.06591201e-01 -9.98921171e-02
    1.23989308e+00 -2.60250235e+00 -3.84024680e-01 2.67864287e-01
    2.01749384e-01 -1.04009068e+00 -9.21131730e-01 -5.04072070e-01
    6.17250681e-01 4.49983716e-01 3.66548568e-01 -1.81106284e-01
    -7.63888136e-02 -1.08568788e+00 4.58486497e-01 1.27838421e+00
    -6.50748610e-02 1.39156699e-01 -1.70831800e+00 2.60781258e-01
    -3.00476104e-01 4.19166297e-01 -2.34760642e-01 -1.49472266e-01
    2.05067053e-01 1.40029895e+00 1.31475359e-01 6.72907352e-01
    -5.56384146e-01 1.32434392e+00 1.93755880e-01 6.55524880e-02
    2.81185818e+00 9.47750449e-01 -1.77618399e-01 -4.08562750e-01
    -1.48725733e-01 2.32389688e-01 -6.01434350e-01 7.52474368e-02
    5.96738279e-01 3.17690074e-01 -3.82173471e-02 -7.15790987e-01
    -2.84574572e-02 1.12515122e-01 9.36254025e-01 -9.55303311e-01
    -6.13571882e-01 6.50005519e-01 -2.84423679e-01 7.60944009e-01
    9.00086820e-01 -2.79017508e-01 -1.20299911e+00 -3.30405706e-03
    3.40526640e-01 5.95904469e-01 -4.56937522e-01 -1.36522424e+00
    -4.88013357e-01 4.35721368e-01 -6.21253192e-01 -5.46013653e-01
    4.09269243e-01 2.18451118e+00 1.13539791e+00 1.63425946e+00
    -1.68452740e-01 4.06841815e-01 6.71297021e-04 2.60694599e+00
    -2.09918201e-01 -6.14889443e-01 -1.15703273e+00 -6.52586222e-01
    -4.29258138e-01 6.46545351e-01 -8.05244327e-01 5.34235835e-01
    -1.48764104e-01 -1.44551158e-01 -6.67386711e-01 -4.78590131e-01
    -2.15693563e-01 -5.21787524e-01 4.29671019e-01 4.13086116e-01
    9.09409642e-01 -2.47707352e-01 -9.77410316e-01 -4.25094128e-01
    -3.35184485e-01 5.52541137e-01 -1.08747208e+00 -1.52805734e+00
    -7.12055922e-01 -5.85885406e-01 -5.61130047e-01 2.65131623e-01
    -1.49403799e+00 -6.31830096e-02 1.53296277e-01 -5.31115115e-01
    4.87351805e-01 -1.97646320e+00 5.18457830e-01 2.46920988e-01
    -2.92548358e-01 -4.11337674e-01 -4.24177527e-01 5.08536577e-01
    -3.13080812e+00 -3.44313914e-04 5.01981154e-02 -5.27625717e-02
    6.72170967e-02 1.10187933e-01 -6.24061048e-01 -4.11194772e-01
    -3.15123987e+00 -3.74887824e-01 4.08594251e-01 -4.49939817e-01
    -1.19413488e-01 -2.50189286e-02 9.37682986e-01 -3.93960595e-01
    -5.27336597e-01 5.51487267e-01 -1.47688806e+00 7.12489903e-01
    6.90142691e-01 -2.40683034e-02 5.49990296e-01 -6.90340877e-01
    -4.41329539e-01 -5.65167844e-01 -1.86065510e-01 -1.57932997e+00
    -1.29620755e+00 -1.15663409e+00 1.33154774e+00 -1.21553689e-01
    -5.46818316e-01 6.91201806e-01 -1.45055562e-01 -9.98109281e-01
    -6.43511653e-01 -9.70657706e-01 2.40699232e-01 -3.73212188e-01
    -3.02014768e-01 6.99740291e-01 4.91380185e-01 1.84379733e+00
    -1.96821487e+00 -7.71244586e-01 3.58918488e-01 -1.34294465e-01
    -2.84716785e-01 -4.95457977e-01 9.27896500e-02 2.30687410e-02
    -5.92639506e-01 -3.22450697e-01 1.82965386e+00 -1.83327883e-01
    -2.71054476e-01 -1.46929219e-01 1.02496839e+00 1.72983184e-01
    -3.84793758e-01 9.56017911e-01 1.62617409e+00 -2.91777372e-01
    -9.84682560e-01 3.24638337e-01 -2.73915499e-01 -8.71294200e-01
    2.82114446e-01 6.22368991e-01 -1.39403665e+00 5.40873289e-01
    9.61468279e-01 1.46700692e+00 -4.32328045e-01 2.84664810e-01
    6.13873936e-02 -8.56770277e-01 6.16494596e-01 -4.39931482e-01
    -3.05552334e-01 4.21034038e-01 -8.03260803e-02 -5.54818690e-01
    7.34049559e-01 -8.67500961e-01 1.48211122e+00 2.32802331e-01
    1.90908265e+00 -7.59900928e-01 1.14120507e+00 -1.00547004e+00
    -2.81067342e-01 5.72543383e-01 2.30269122e+00 -3.88862997e-01
    -6.74269259e-01 1.69545904e-01 -1.00190067e+00 -3.50272655e-01
    3.66039760e-02 3.19608361e-01 -1.27267733e-01 1.08831191e+00
    -2.68272996e-01 -5.57655632e-01 3.49523008e-01 2.79507756e-01
    5.06069481e-01 2.18722537e-01 -1.11982548e+00 4.10205126e-01
    2.04713464e-01 1.73169756e+00 -2.36251041e-01 1.18155468e+00
    -1.71682701e-01 -2.48745516e-01 -9.65398729e-01 -1.53930521e+00
    -1.98730096e-01 -3.00291181e-01 -1.55742121e+00 1.00151837e+00
    4.90132362e-01 -4.15347385e+00 4.14509296e-01 -2.03903109e-01
    -5.32471061e-01 5.51049598e-02 1.61208296e+00 -1.88612640e-02
    1.89928269e+00 -4.53855217e-01 1.72608519e+00 7.65823066e-01
    -1.34455657e+00 2.22323518e-02 5.81889451e-01 5.28319240e-01
    7.12977171e-01 5.66961467e-01 -6.27630413e-01 3.27288032e-01
    5.16720772e-01 7.42399693e-01 -1.08582425e+00 1.28099695e-01
    -1.03752188e-01 1.77149236e-01 -2.14935288e-01 -5.95093071e-01
    1.25597697e-02 1.58918634e-01 -5.40666103e-01 -1.71444714e-01
    -1.05586278e+00 -3.10373098e-01 9.52655196e-01 -9.13583338e-01
    3.87289464e-01 2.70682722e-01 5.94775736e-01 -7.87819922e-01
    -6.45827651e-01 -3.89958096e+00 2.78211117e-01 6.95817471e-01
    -1.25178635e+00 3.62283438e-01 -2.13176265e-01 9.93891716e-01
    -5.36469877e-01 8.89246047e-01 1.55409193e+00 9.32397544e-02
    2.78432083e+00 -1.15548871e-01 -5.67032576e-01 -1.80769414e-01
    3.77191566e-02 1.37392461e-01 9.26756579e-03 -3.77971768e-01
    -5.18062294e-01 2.95534164e-01 9.00202751e-01 -9.81250703e-02
    -4.90664452e-01 -1.40208796e-01 5.52160561e-01 1.75584763e-01
    3.70164007e-01 2.05595821e-01 -4.31409717e-01 6.33025169e-01
    -1.34640300e+00 -8.60677183e-01 -1.01965058e+00 2.62665600e-01
    -1.16671526e+00 4.30600691e+00 -1.36300170e+00 5.16966224e-01
    8.36010695e-01 1.25814962e+00 -1.31524226e-03 -5.27025104e-01
    -4.38752413e-01 -5.98696768e-01 2.55219609e-01 3.48926671e-02
    -4.16281968e-02 -7.11234212e-01 -1.01182199e+00 -2.80551910e-02
    -8.02766979e-01 -7.94797912e-02 2.63770550e-01 -2.14467227e-01
    9.30322766e-01 -9.06426981e-02 1.78003818e-01 6.56476378e-01
    -5.48458546e-02 -9.77396592e-02 7.26030648e-01 3.12412009e-02
    2.29005776e-02 9.66184735e-01 1.16580677e+00 -4.00092959e-01
    9.15391147e-01 -4.55992997e-01 2.50750452e-01 5.93621247e-02
    1.11350957e-02 -3.10403442e+00 -7.69121170e-01 -8.41681898e-01
    -2.47106060e-01 3.22103441e-01 1.07813966e+00 -4.12613928e-01
    1.75055206e+00 -5.71600080e-01 -2.08311424e-01 -1.13739550e+00
    -1.05675590e+00 6.40082002e-01 -6.50879085e-01 2.21636224e+00
    -5.86179018e-01 -7.32955635e-01 8.49182606e-01 3.40307117e-01
    -4.56918389e-01 1.35852158e+00 -7.71059275e-01 -6.38076663e-01
    -1.86105773e-01 6.91289485e-01 3.15849304e-01 7.27778524e-02
    -3.20127457e-01 -1.27278537e-01 -2.89274365e-01 -7.63292909e-01
    7.96829760e-01 5.93696713e-01 -9.13206860e-02 8.42064321e-01
    7.25407004e-01 -1.41559243e-01 -1.80914730e-01 1.21023929e+00
    -7.70465255e-01 2.78777659e-01 -5.27730882e-02 -5.03107131e-01
    5.02018809e-01 1.41955554e+00 8.34496140e-01 -1.27388239e-02
    -5.69947004e-01 5.94920576e-01 -1.74724042e-01 3.77857596e-01
    -5.68922639e-01 4.19139951e-01 -2.75706917e-01 -2.06479669e+00
    -2.23343790e-01 -5.38141690e-02 5.25683403e-01 2.30987400e-01
    -3.73130053e-01 4.67138700e-02 -6.82258070e-01 -2.31083974e-01
    9.24109697e-01 -6.34425223e-01 9.61616576e-01 5.32897472e-01
    -1.79584175e-02 9.23209667e-01 -3.94392103e-01 1.20633090e+00
    1.32039154e+00 3.03081483e-01 -1.51912645e-02 -1.77623904e+00
    4.18116003e-01 -2.10219717e+00 6.15286757e-04 7.99640954e-01
    -1.07328737e+00 1.80457222e+00 7.37768650e-01 -1.15914035e+00
    -7.96180725e-01 -7.61470050e-02 1.54384780e+00 5.14905155e-01
    -1.32012308e-01 4.20175254e-01 2.79083550e-01 -5.51756144e-01
    -2.34146923e-01 1.25614572e+00 -8.02416503e-01 -9.12095845e-01
    5.18799877e+00 -6.72910452e-01 6.14131168e-02 -6.04476929e-01
    -3.21294874e-01 -7.15059638e-01 5.86848378e-01 2.51936764e-01
    -1.86807543e-01 -4.13822412e-01 3.95015717e-01 -1.56982198e-01
    3.78910631e-01 -9.70112920e-01 6.13179266e-01 8.74606192e-01
    -6.58938736e-02 -1.07379782e+00 8.43066573e-01 -5.74609995e+00
    3.51057231e-01 6.06372952e-01 6.31575108e-01 -3.03595424e-01
    9.40038741e-01 -1.34424233e+00 7.05159903e-01 4.72247034e-01
    -4.63819265e-01 -3.58555257e-01 8.11325908e-02 -7.79408589e-03
    -1.36327311e-01 4.69275832e-01 -9.06477869e-02 2.82235622e-01
    -1.39438808e-02 8.52530822e-02 -7.92428136e-01 1.95214546e+00
    1.10460863e-01 3.25053483e-01 -1.52235591e+00 -6.81127071e-01
    7.67830312e-01 4.95087147e-01 6.59111798e-01 1.09067392e+00
    -1.12956035e+00 -8.00565064e-01 -4.25624430e-01 -5.12296379e-01
    5.28731763e-01 -1.54080415e+00 -8.80116582e-01 -5.65894604e-01
    1.02661349e-01 -1.43948078e-01 1.44971955e+00 2.25885734e-01
    -6.27577126e-01 2.04492360e-02 -8.21592152e-01 -4.02381010e-02
    -1.70298055e-01 1.25701702e+00 1.82263041e+00 -2.24724621e-01
    -6.77974403e-01 5.09839416e-01 -6.83805525e-01 -7.74089754e-01
    -1.23348010e+00 9.38402653e-01 4.10215348e-01 5.46513379e-01
    2.89907515e-01 -1.64224899e+00 -1.20345187e+00 6.94046378e-01
    2.86890179e-01 -9.67657864e-02 -1.36825785e-01 -2.64689350e+00
    -1.95882730e-02 -5.44935241e-02 -4.26379107e-02 -4.52003181e-02
    8.81192446e-01 -3.59131962e-01 -1.51897177e-01 -1.61933827e+00
    -3.04078013e-01 3.46597433e-01 -2.91673827e+00 6.55099034e-01
    -1.19330800e+00 6.50656343e-01 2.26371270e-02 -7.35934734e-01
    8.46882701e-01 -7.09249198e-01 2.75138587e-01 -1.30410930e-02
    3.49485800e-02 4.49981928e-01 -2.02950910e-01 -5.25047898e-01
    -1.07650745e+00 3.80161516e-02 -7.72672296e-01 9.88527298e-01
    1.03450787e+00 -1.41643775e+00 1.51979113e-02 -1.09312572e-01
    -4.71795678e-01 -3.75048727e-01 -1.00997055e+00 -9.34393764e-01
    9.58709270e-02 1.56782940e-01 -2.84477144e-01 -2.10036799e-01
    -1.11573124e+00 -6.27978221e-02 -8.79191816e-01 7.11585879e-02
    -1.23872411e+00 -9.56774727e-02 -1.51974425e-01 1.41490436e+00
    -4.88654822e-01 2.88828284e-01 -4.45558220e-01 7.34821975e-01
    -2.20754370e-01 5.46759248e-01 -5.12399197e-01 5.31018198e-01]]
    No plotting options selected. Visualizing all topics, documents, and terms.
    An error occurred. Please try again.
    Would you like to see the error trace? (y/n): y
    Traceback (most recent call last):
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 47, in run
        self._process_responses(self.landing, self.driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 65, in _process_responses
        self._process_responses(response, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 65, in _process_responses
        self._process_responses(response, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 72, in _process_responses
        self._process_responses(menu.parent, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 65, in _process_responses
        self._process_responses(response, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 72, in _process_responses
        self._process_responses(menu.parent, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 65, in _process_responses
        self._process_responses(response, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 72, in _process_responses
        self._process_responses(menu.parent, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 65, in _process_responses
        self._process_responses(response, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 72, in _process_responses
        self._process_responses(menu.parent, driver)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/_lnlpcli.py", line 70, in _process_responses
        driver.run_topic_model()
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 128, in run_topic_model
        self._visualize_topics(model, dir)
      File "/Users/user/Desktop/Spring_2024/Research/lnlp/src/drivers/_driver.py", line 135, in _visualize_topics
        hierarchical_topics = model.hierarchical_topics(docs=self.session.data)
      File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 980, in hierarchical_topics
      File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/bertopic-0.16.0-py3.9.egg/bertopic/_bertopic.py", line 972, in
      File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1033, in linkage
        n = int(distance.num_obs_y(y))
      File "/Users/user/anaconda3/envs/nllp/lib/python3.9/site-packages/scipy/spatial/distance.py", line 2657, in num_obs_y
        raise ValueError("The number of observations cannot be determined on "
    ValueError: The number of observations cannot be determined on an empty distance matrix.
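
The trace above ends with `scipy.cluster.hierarchy.linkage` receiving an empty distance matrix. A plausible cause, though the log does not confirm it, is that the fitted model has fewer than two topics, leaving no pairwise distances to cluster. The sketch below reproduces the same kind of failure with scipy alone and shows one possible guard. `safe_linkage` is a hypothetical helper written for illustration; it is not part of lnlp or BERTopic.

```python
import numpy as np
from scipy.cluster import hierarchy as sch
from scipy.spatial.distance import pdist


def safe_linkage(topic_vectors):
    """Return a Ward linkage matrix, or None when there are fewer than two rows.

    Hypothetical helper: `topic_vectors` stands in for one vector per topic.
    """
    topic_vectors = np.asarray(topic_vectors, dtype=float)
    if topic_vectors.ndim != 2 or topic_vectors.shape[0] < 2:
        # With zero or one row there are no pairwise distances to link.
        return None
    condensed = pdist(topic_vectors, metric="euclidean")
    return sch.linkage(condensed, method="ward")


# An empty condensed distance matrix, as produced when only one row exists.
empty_condensed = np.empty(0)

try:
    sch.linkage(empty_condensed, method="ward")
except ValueError as err:
    print(f"unguarded call failed: {err}")

# The guarded helper returns None instead of raising.
print(safe_linkage([[0.1, 0.2, 0.3]]))
```

In a pipeline like the one in the trace, a similar check on the number of topics before requesting the hierarchical visualization would let the run skip that plot rather than abort. Whether that fits the CLI's flow is a design decision for the maintainers.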
