
whatsnext's Introduction

CityIO_CityMatrix_Reader

Display the CityMatrix through the TCP/IP protocol.

CityTable

A hello world example that lets you create particles (red, green, and blue) and add attractors and detractors. It was written to test the usage of colortizer.

Abstract City

A generic representation of the CityMatrix with different rendering from a .json file.

AndorraDemo

A template to show what could be displayed on the Andorra Data Observatory.


whatsnext's Issues

Parentheses in search strings prevent analex from working

Unbalanced parentheses cause analex to fail to find a string. For example, the string 'otto von schirach (liv' cannot be found.

The error appears at least in the function getTagContainingAllStrings in analexUtilities.

To fix: replace parentheses and other special characters with escaped characters in both the source file and the scraped strings.
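
A minimal sketch of the escaping step, assuming the search string is turned into a regular expression somewhere in analexUtilities (an assumption; the actual code of getTagContainingAllStrings is not shown here). The helper names escapeRegExp and containsLiteral are hypothetical.

```javascript
// Escape every character that has a special meaning in a regular expression,
// so that the string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: true if `source` contains `needle` as literal text.
function containsLiteral(source, needle) {
  return new RegExp(escapeRegExp(needle)).test(source);
}

const html = '<div class="artist">otto von schirach (liv</div>';
console.log(containsLiteral(html, 'otto von schirach (liv')); // true
```

Without the escaping, new RegExp('otto von schirach (liv') throws a SyntaxError because of the unterminated group. If no regex feature is needed at all, String.prototype.includes would avoid the problem entirely.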

Handle Style

So far I have created the file styleConversion.json,
but I am not sure how to handle it in scrapex.

I guess it should be around this part of the code:

 eventInfo.eventStyle = venue.hasOwnProperty('defaultStyle')?venue.defaultStyle:globalDefaultStyle;

but I would prefer to check this with a regex.
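
A possible sketch of a regex-based check, falling back to the venue default and then to the global default as in the line above. The content and shape of styleConversion (target style mapped to a regex source) and the value of globalDefaultStyle are assumptions made for illustration, not the actual content of styleConversion.json.

```javascript
// Hypothetical shape for styleConversion.json: target style -> regex source.
const styleConversion = {
  Electro: 'clubbing|techno|house|electro',
  Rock: 'rock|punk|metal',
  Jazz: 'jazz|swing',
};

// Assumed value; the real global default is defined in scrapex.
const globalDefaultStyle = 'Live';

// Return the first style whose pattern matches the scraped text,
// else the venue default, else the global default.
function resolveStyle(rawStyle, venue) {
  const text = (rawStyle || '').trim();
  if (text !== '') {
    for (const [style, pattern] of Object.entries(styleConversion)) {
      if (new RegExp(pattern, 'i').test(text)) {
        return style;
      }
    }
  }
  return Object.prototype.hasOwnProperty.call(venue, 'defaultStyle')
    ? venue.defaultStyle
    : globalDefaultStyle;
}

console.log(resolveStyle('Clubbing', { name: 'Le Sucre' })); // Electro
```

In this sketch an unmatched style falls back to the defaults; keeping the raw scraped text instead would be an equally valid choice.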

make a GUI

A GUI would be useful in order to:

  • deal with places, cities, countries and languages
  • analyse a web source
  • scrape a web source

Remove duplicate events from different sources

We have to define the process to remove duplicates.

For places for which scrapex scrapes the dedicated site (e.g. Transbordeur), we can assume that events are fully and correctly listed there. Unless some other sites provide better info, I propose to skip events from other sources: e.g. events from Petit Bulletin for Transbordeur will be skipped.

For places with no dedicated site scraped, a priority order and a way to process the events should be defined.
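
A minimal sketch of the first rule (dedicated site wins), assuming each event carries place and source fields and that events scraped from a venue's own site have a source equal to the venue name. Both the field names and that convention are assumptions.

```javascript
// Keep all events from dedicated sites; drop events from other sources
// for any place that is already covered by its own site.
function removeDuplicateSources(events) {
  // Places for which at least one event comes from the venue's own site.
  const placesWithOwnSite = new Set(
    events.filter((e) => e.source === e.place).map((e) => e.place)
  );
  return events.filter(
    (e) => !placesWithOwnSite.has(e.place) || e.source === e.place
  );
}

const sample = [
  { place: 'Transbordeur', name: 'Band A', source: 'Transbordeur' },
  { place: 'Transbordeur', name: 'Band A', source: 'Petit Bulletin' },
  { place: 'Le Sucre', name: 'DJ B', source: 'Petit Bulletin' },
];
console.log(removeDuplicateSources(sample).length); // 2
```

Note that with this approach, a dedicated site that currently lists no event does not hide the events found on other sources. The priority order for places without a dedicated site is not covered here.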

Popup window update on iOS

Problem on iOS: when a popup is already open in a window and I try to open a new one, it only updates the content of the already open window instead of opening a new window (which would be more intuitive for the user).

@tnguyenh does this also happen on Android?

Differentiate venue by its style

Implement a way to easily differentiate the types of places with an icon, especially for projects like Hanoi where there are not only music venues but also art, painting, theatre, etc.

This information will have to be placed in the venues file from the beginning (on top of the GPS location and URL).
Screenshot 2024-03-11 at 05 40 31

Style detection improvement

This is the distribution of styles that we have so far (60% of the events are "live", which doesn't mean anything, 10% are rock, 7% electro, 3% jazz, etc.)

Screenshot 2024-05-06 at 16 55 27

If the info is not there, it is not there, but how can we infer it, or at least guess it from the venue?
While looking at it, I discovered the style |Concert, which seems to be used only for Le Sucre, as in this one: "https://le-sucre.eu/agenda/high-lo-presente-8ruki/". Is there a way to get a better style? At least maybe Electro, as it is Le Sucre.

Make an App

Work in progress and notes on the creation of a mobile app for WhatsNext.

It seems that React Native could be the option to port the JavaScript code to an iOS and Android app without having to change too much in the browser version (and ideally keeping both working in parallel). Some famous apps (Facebook, Walmart, Bloomberg, SoundCloud, Instagram) are built with this technology, which is a good sign for choosing it as well.

Apps like Airbnb, Twitter, Lyft, Test Center or SlideShare are developed using Swift, the dedicated language for iOS devices (iPhone, iPad, Apple Watch, etc.).

Some useful links:
https://radixweb.com/blog/react-native-vs-swift

Replace generic link with the place link

We realized that having the link to Le Petit Bulletin or Infoconcert is in the end not useful at all, because it doesn't give any more information than what we already have (name, time, style), and it also shows our sources.

So what about (if easy to do) putting the link of the place instead? At least the user can then reach the event page in a few clicks. As it is, you get the link to Le Petit Bulletin, but if you want to know more about the event you have to google it.

Default style in venues broken

I am trying to set the default style to Theatre for Espace Gerson like this:

{
    "name": "Espace Gerson",
    "country": "France",
    "city": "Lyon",
    "defaultStyle": "Theatre"
}

but it doesn't work. Maybe it needs other scraping information? defaultStyle works for Terminal, for instance, but that venue has more info in its scraping configuration.

Generate an extra field with a better date in the scraped results

Is it possible for the regex to generate a more readable date string, so that we get a better UI in the final app?

Currently we have this
Screenshot 2024-02-26 at 12 14 15

In the end we could almost show only the hour, as the date is the current day (maybe for now we keep something like dd/mm - hh:mm?)
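
A possible sketch of the dd/mm - hh:mm format, computed from the Unix timestamp (in milliseconds) that already appears in the scraped results rather than from the regex. The function name formatEventDate and the default time zone Europe/Paris are assumptions.

```javascript
// Format a Unix timestamp (ms) as "dd/mm - hh:mm" in the venue's time zone.
function formatEventDate(unixMs, timeZone = 'Europe/Paris') {
  const parts = new Intl.DateTimeFormat('en-GB', {
    timeZone,
    day: '2-digit',
    month: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23',
  }).formatToParts(new Date(unixMs));
  // Pick one part by type and pad it to two digits.
  const get = (type) =>
    parts.find((p) => p.type === type).value.padStart(2, '0');
  return `${get('day')}/${get('month')} - ${get('hour')}:${get('minute')}`;
}

console.log(formatEventDate(1708293600000)); // 18/02 - 23:00
```

The time zone is passed explicitly so that the result does not depend on the machine running the script, which matters once venues from other countries are added.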

going worldwide

Code is not yet ready for different places/countries, there will be some bugs.

It would be nice to add venues from different cities/countries.

Cannot run Reparatorex

/Users/arno/Projects/GitHub/WhatsNext/scraping/reparatorex.js:31
return v.hasOwnProperty('linkedPage');
^

TypeError: Cannot read properties of undefined (reading 'hasOwnProperty')
at hasLinkedPage (/Users/arno/Projects/GitHub/WhatsNext/scraping/reparatorex.js:31:14)
at /Users/arno/Projects/GitHub/WhatsNext/scraping/reparatorex.js:36:40
at Array.filter (<anonymous>)
at Object.<anonymous> (/Users/arno/Projects/GitHub/WhatsNext/scraping/reparatorex.js:36:27)
at Module._compile (node:internal/modules/cjs/loader:1378:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1437:10)
at Module.load (node:internal/modules/cjs/loader:1212:32)
at Module._load (node:internal/modules/cjs/loader:1028:12)
at Function.executeUserEntryPoint [as runMain]
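
The trace shows that hasLinkedPage is called by Array.filter with an undefined element. A defensive sketch of the predicate, assuming the filtered array holds venue objects (the sample data below is hypothetical, and this does not explain why an element is undefined in the first place):

```javascript
// True only for defined venues that have their own 'linkedPage' property.
function hasLinkedPage(venue) {
  return (
    venue != null &&
    Object.prototype.hasOwnProperty.call(venue, 'linkedPage')
  );
}

// Hypothetical list with a hole, e.g. after a failed lookup.
const venues = [
  { name: 'Ville Morte', linkedPage: {} },
  undefined,
  { name: 'Espace Gerson' },
];
const withLinkedPage = venues.filter(hasLinkedPage);
console.log(withLinkedPage.map((v) => v.name)); // [ 'Ville Morte' ]
```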

Deal with several events at the same place on the same day

It now happens a lot that the same place offers 2 events on the same day. Right now only one dot is displayed, with the text of the first event (tatina in Paris plays at 7pm), and when you click on it, it shows the second event (Trio Grande plays at 9pm).

We need to find a way to deal with this. Is this something that could be done at the regex level?
Screenshot 2024-03-29 at 11 02 53
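
One option that does not involve the regex level is to group events by place and day before display, so that a dot can list all its events. A minimal sketch, assuming events have place and unixDate (ms) fields; the field names, the venue name and the time zone default are assumptions.

```javascript
// Local calendar day of a timestamp, as "yyyy-mm-dd".
function dayKey(unixMs, timeZone) {
  const parts = new Intl.DateTimeFormat('en-GB', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(new Date(unixMs));
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}`;
}

// Map "place|day" -> list of events, each list sorted by start time.
function groupByPlaceAndDay(events, timeZone = 'Europe/Paris') {
  const groups = new Map();
  for (const event of events) {
    const key = `${event.place}|${dayKey(event.unixDate, timeZone)}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(event);
  }
  for (const list of groups.values()) {
    list.sort((a, b) => a.unixDate - b.unixDate);
  }
  return groups;
}

const events = [
  { place: 'Venue A', name: 'Trio Grande', unixDate: Date.UTC(2024, 2, 29, 20, 0) },
  { place: 'Venue A', name: 'tatina in Paris', unixDate: Date.UTC(2024, 2, 29, 18, 0) },
  { place: 'Venue B', name: 'Other', unixDate: Date.UTC(2024, 2, 29, 19, 0) },
];
const groups = groupByPlaceAndDay(events);
console.log(groups.get('Venue A|2024-03-29').map((e) => e.name));
```

The popup of a dot could then iterate over the whole group instead of showing a single event.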

Start time and end time

Most events have a start time. Some have an end time. We have to define how to handle this.

I propose to keep only the start time, in order to reduce work. We can keep the event displayed for something like 8 hours after the start time.

Some venues have incorrect time settings. For example, events at Terminal Club start at 0:00 in the night between Feb 2 and Feb 3, but are indicated at 0:00 on Feb 2. In this case we can indicate 23:59 on Feb 2, using a default start time.
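
Both rules can be sketched as follows, assuming start times are stored as Unix timestamps in milliseconds. The function names are hypothetical, and the midnight shift ignores daylight saving changes.

```javascript
const DISPLAY_WINDOW_MS = 8 * 60 * 60 * 1000; // 8 hours

// An event stops being displayed 8 hours after its start time.
function isExpired(startMs, nowMs) {
  return nowMs > startMs + DISPLAY_WINDOW_MS;
}

// For venues that announce late-night events at 0:00 of the wrong day:
// move the start from 0:00 to 23:59 on the same calendar day.
function shiftMidnightToEndOfDay(midnightMs) {
  return midnightMs + (23 * 60 + 59) * 60 * 1000;
}

const announced = Date.UTC(2024, 1, 2, 0, 0); // Feb 2, 0:00
const corrected = shiftMidnightToEndOfDay(announced);
console.log(new Date(corrected).toISOString()); // 2024-02-02T23:59:00.000Z
```

The caller decides which venues need the shift, for instance through a per-venue setting in the venues file (an assumption; no such setting is described here).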

keep history of scraping + change .csv to .json

Scraping should not completely change the results:

  • Events that have already been scraped should not be changed.
  • New info from scraping should be merged with existing events.
  • Old events should be deleted.

In order to do so, using a json file instead of csv is more appropriate, as we can keep more information and in a more flexible way.
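
The three rules above could be sketched like this, assuming events are plain objects identified by place, name and unixDate (a hypothetical key; the real identification of an event may differ).

```javascript
// Hypothetical identity of an event across scraping runs.
function eventKey(event) {
  return `${event.place}|${event.name}|${event.unixDate}`;
}

// Keep known values; only fill fields that are missing or empty.
function fillMissing(known, fresh) {
  const merged = { ...known };
  for (const [field, value] of Object.entries(fresh)) {
    if (merged[field] === undefined || merged[field] === '') {
      merged[field] = value;
    }
  }
  return merged;
}

// Merge a new scraping run into the stored events and drop past events.
function mergeEvents(existing, scraped, nowMs) {
  const byKey = new Map();
  for (const e of existing) byKey.set(eventKey(e), e);
  for (const s of scraped) {
    const key = eventKey(s);
    byKey.set(key, byKey.has(key) ? fillMissing(byKey.get(key), s) : s);
  }
  return [...byKey.values()].filter((e) => e.unixDate >= nowMs);
}

const stored = [
  { place: 'A', name: 'X', unixDate: 2000, style: '' },
  { place: 'A', name: 'Z', unixDate: 2500, style: 'Rock' },
  { place: 'A', name: 'Old', unixDate: 500, style: 'Rock' },
];
const fresh = [
  { place: 'A', name: 'X', unixDate: 2000, style: 'Jazz', url: 'u' },
  { place: 'A', name: 'Z', unixDate: 2500, style: 'Electro' },
  { place: 'B', name: 'Y', unixDate: 3000, style: 'Rock' },
];
const merged = mergeEvents(stored, fresh, 1000);
console.log(merged.map((e) => `${e.name}:${e.style}`)); // [ 'X:Jazz', 'Z:Rock', 'Y:Rock' ]
```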

Place to replace

CCO- La Rayonne, CCO La Rayonne => La Rayonne
Toï Toï le zinc, Toï Toï le Zinc => Toï Toï Le Zinc
Épicerie Moderne => Epicerie Moderne
Jack Jack - MJC Aragon => Jack Jack
Bar Rock'n eat, Rock n Eat => Rock N Eat
Amphithéâtre - Salle 3000 => Salle 3000
Grrrnd Zero => Grrrnd Zéro

Hour of event

How should we handle events that have no date? So far they seem to be pushed to today's date, but we should change this.

Error on analex

Only one venue should correspond to the arguments. Use the optional arguments to ensure that the venue is unique
/Users/arno/Projects/GitHub/WhatsNext/scraping/import/stringUtilities.js:23
const res = removeBlanks(removeSpecialCharacters(removeAccents(string.toLowerCase()),' '));
^

TypeError: Cannot read properties of undefined (reading 'toLowerCase')
at simplify (/Users/arno/Projects/GitHub/WhatsNext/scraping/import/stringUtilities.js:23:73)
at /Users/arno/Projects/GitHub/WhatsNext/scraping/analex.js:22:79
at Array.filter (<anonymous>)
at Object.<anonymous> (/Users/arno/Projects/GitHub/WhatsNext/scraping/analex.js:22:39)
at Module._compile (node:internal/modules/cjs/loader:1368:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1426:10)
at Module.load (node:internal/modules/cjs/loader:1205:32)
at Module._load (node:internal/modules/cjs/loader:1021:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:142:12)
at node:internal/main/run_main_module:28:49

Aspiratorex better console output

As Aspiratorex is in theory meant to be run many times, like once a week per city, it would be nice (if possible) to have a better output (in the console or in a log) to know what worked and what didn't, maybe with an option to make it verbose or not.

Right now I have to admit that I don't really know whether the aspiration went well or not, and if not, what went wrong.

Extension to other countries/cities: venues must be identified by a unique ID

Venues must be identified with a unique identifier. The venue name is not appropriate, since several venues may have the same name around the world.

This identifier should be defined now, because if we change it later, it will be a lot of code to change.

Right now, each venue has fields name, city and country, which is sufficient for a unique identification. It does not require a lot of code change. However, the way to communicate the event place through the csv file should be discussed now.
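
A minimal sketch of such a composite identifier, built from the three existing fields. The function name venueId and the '|' separator are assumptions; '|' is chosen here because the csv samples in this tracker use ';' as field separator.

```javascript
// Composite identifier built from the fields every venue already has.
function venueId(venue) {
  return [venue.country, venue.city, venue.name].join('|');
}

// Index venues by identifier and report collisions.
function indexVenues(venues) {
  const index = new Map();
  for (const venue of venues) {
    const id = venueId(venue);
    if (index.has(id)) {
      throw new Error(`Duplicate venue identifier: ${id}`);
    }
    index.set(id, venue);
  }
  return index;
}

const index = indexVenues([
  { name: 'Espace Gerson', city: 'Lyon', country: 'France' },
  { name: 'Terminal', city: 'Lyon', country: 'France' },
]);
console.log(index.has('France|Lyon|Espace Gerson')); // true
```

The event file could then carry either this identifier in one column, or the three fields in separate columns; which one is preferable is the point to discuss.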

Problem with some aliases (capital case and accent?)

I might be wrong, but these aliases do not seem to work:
  {
    "name": "Epicerie Moderne",
    "country": "France",
    "city": "Lyon",
    "aliases": [
      "L'epicerie Moderne","L’Épicerie Moderne"
    ]
  },

I keep getting the name L'epicerie Moderne instead of Epicerie Moderne in scrapex.csv.

The same happens with:

  {
    "name": "Bourse du Travail",
    "country": "France",
    "city": "Lyon",
    "aliases": [
      "Bourse Du Travail"
    ]
  },
  {
    "name": "LDLC Arena",
    "country": "France",
    "city": "Lyon",
    "aliases": [
      "Ldlc Arena"
    ]
  },
  {
    "name": "Le Périscope",
    "aliases": [
      "périscope", "Le Periscope"
    ]
  },
  {
    "name": "Salle 3000",
    "country": "France",
    "city": "Lyon",
    "aliases": [
      "Amphithéâtre - Salle 3000", "Cite Internationale (Amphitheatre/Salle 3000/Auditorium Lumiere)"
    ]
  }
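
If the problem comes from case, accents or typographic apostrophes, comparing normalized names would make most of these aliases unnecessary. A sketch under that assumption; normalizeName and findVenue are hypothetical helpers, not the functions of stringUtilities.js.

```javascript
// Lowercase, strip accents, unify apostrophes and collapse spaces.
function normalizeName(text) {
  return (text || '')
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '') // remove combining accents
    .replace(/[\u2019\u2018`]/g, "'") // typographic apostrophes -> '
    .toLowerCase()
    .replace(/\s+/g, ' ')
    .trim();
}

// Find the venue whose name or one of its aliases matches the scraped place.
function findVenue(placeName, venues) {
  const target = normalizeName(placeName);
  return venues.find((venue) =>
    [venue.name, ...(venue.aliases || [])].some(
      (candidate) => normalizeName(candidate) === target
    )
  );
}

const venues = [
  { name: 'Epicerie Moderne', aliases: ["L'epicerie Moderne", 'L\u2019Épicerie Moderne'] },
  { name: 'LDLC Arena' },
  { name: 'Le Périscope', aliases: ['périscope', 'Le Periscope'] },
];
console.log(findVenue('L\u2019épicerie moderne', venues).name); // Epicerie Moderne
console.log(findVenue('Ldlc Arena', venues).name); // LDLC Arena
```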

avoid horizontal scrolling on iOS

Right now on iPhone the user can by default scroll the page horizontally, which messes up everything if the slider is not properly selected. We need to find a way to avoid this.

@tnguyenh does this happen on Android? (Just slide left or right on the lower part of the page, not on the slider.)

cancelled gigs

A first solution was implemented, but then removed, because some non-cancelled events had strings like "cancelled" or "reporté" in their description (for instance when they are replacement events).

First we should identify where the word "cancelled" appears in the description, in order to find an appropriate pattern. Please add cancelled events to this thread so I can analyze the patterns.

If no universal pattern can be found, we should discuss how to treat cancelled events. One way is to raise a flag so that someone can check manually whether the event is cancelled or not.

Data doc for data

For information, this is currently the list of places that appear at least 3 times in the results of scrapex but are not referenced or treated as valid places.

It needs to be fixed by hand, step by step, either by adding the places, if they do not exist yet, to the following file:
https://github.com/agrignard/WhatsNext/blob/main/www/lyon_place.csv

or by editing the places in this file https://github.com/agrignard/WhatsNext/blob/main/scraping/venues.json
and adding the corresponding aliases.
For example, if Opéra National de Lyon appears as Opéra de Lyon, the following must be added:

{
    "name": "Opéra National de Lyon",
    "country": "France",
    "city": "Lyon",
    "aliases": [
      "Opéra de Lyon"
    ]
  },

Place Salle Planete Culture : 175 fois
Place Le Boui Boui : 92 fois
Place Espace Gerson : 88 fois
Place Theatre A L'ouest : 78 fois
Place Cafe Theatre Le Complexe : 19 fois
Place Le Nombril Du Monde : 16 fois
Place Theatre De La Renaissance : 9 fois
Place Salle Paul Garcin : 7 fois
Place Theatre Theo Argence : 7 fois
Place Théâtre de la Renaissance : 6 fois
Place Centre Culturel L'aqueduc : 6 fois
Place La Sucriere : 6 fois
Place Groupama Stadium Lyon : 6 fois
Place Salle Des Rancy : 5 fois
Place Theatre De Venissieux (La Machinerie) : 5 fois
Place L'intervalle : 5 fois
Place Centre Culturel : 5 fois
Place Centre Charlie Chaplin : 5 fois
Place Les Grandes Locos : 5 fois
Place Rita-Plage : 4 fois
Place Vache Rouge : 4 fois
Place L'agend'arts : 4 fois
Place Salle Edouard Herriot : 4 fois
Place Theatres Romains De Fourviere : 4 fois
Place Theatre Comedie Odeon : 4 fois
Place Le Repaire de la Comédie : 3 fois
Place Théâtre Théo Argence : 3 fois
Place Musée des Confluences : 3 fois
Place Institut Lumière : 3 fois
Place Théâtre Cinéma Jean Carmet : 3 fois
Place Les Grandes Voisines : 3 fois
Place O Totem Live : 3 fois
Place Maison Du Peuple De Pierre Benite : 3 fois
Place Espace Jean Poperen : 3 fois
Place Chapelle De La Trinite : 3 fois
Place Espace Culturel L'atrium : 3 fois
Place Le Briscope : 3 fois
Place Auditorium - Orchestre National De Lyon : 3 fois
Place Theatre De Villefranche Sur Saone : 3 fois
Place Heat (h7) : 3 fois
Place Parc Naturel De Miribel Jonage : 3 fois

  1. Le Boui Boui is heavily represented because many of its events run for a long time, e.g. https://www.infoconcert.com/ticket/concert-felix-le-braz-lyon/1642663.html
    scrapex does its job well by creating one event per day, but do we want to keep them? They seem to be mostly theatre shows. In general, do we want to treat recurring events differently?

  2. Planète Culture has lots of events (175), but more or less always with the same name. Do we add it?

handle extra events

Some events are undefined because of extra tags (empty tags?) in some pages. They need to be removed in a generic way.

RegEx Scraping

Right now some regexes are still missing to finalize terminal.

To run the script:

cd scraping

Then run the following line:
node terminal.js

Terminal: Another error

Error while loading a linked page does not generate an error message: scrapex cannot parse the date

When using aspiratorex, some linked pages seem to be downloaded correctly, but their content is an error message.

For Ville Morte: event bistro-cine-une-ile-et-une-nuit seemed to be correctly downloaded. But the content is:

"https://villemorte.fr/agenda/bistro-cine-une-ile-et-une-nuit/": "<body> <h1>Internal Server Error</h1> <p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p> <p>Please contact the server administrator at [email protected] to inform them of the time this error occurred, and the actions you performed just before this error.</p> <p>More information about this error may be available in the server error log.</p> </body>"

Since no error was raised during aspiratorex, the user cannot be informed that an error occurred. However, scrapex cannot find the information it needs. The error log is the following:

Date:
Event:
Place:
Style:
DetailedStyle:
source: Ville Morte
errorLog: Format de date invalide pour Ville Morte. Reçu "", converti en "" (attendu "dd-MM-HH:mm")
unixDate: 0
https://villemorte.fr/agenda/bistro-cine-une-ile-et-une-nuit/

The good point is that there is a workaround: deleting the corresponding entry (bistro-cine-une-ile-et-une-nuit) from linkedPages.json forces aspiratorex to download the file again, which can then be scraped.

The difficulty is to find a safe way to detect that the download has failed when running aspiratorex, and to correct this automatically. We could check for the keyword "error"; however, if a valid event page contains the word "error", aspiratorex would never accept its download. A dirty solution would be to try downloading 3 times, and keep the last file in any case.
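
A sketch of the "try 3 times" idea with a narrower check than the bare word "error": only a title or main heading consisting of a typical server error message is treated as a failure. The list of messages is an assumption, and download stands for whatever function aspiratorex uses to fetch a page (passed as a parameter here, since it is not shown in this thread).

```javascript
// True if the page title or main heading is a typical server error message.
function looksLikeErrorPage(html) {
  const pattern =
    /<(title|h1)[^>]*>\s*(?:\d{3}\s+)?(?:Internal Server Error|Bad Gateway|Service Unavailable|Gateway Time-?out)\s*<\/\1>/i;
  return pattern.test(html);
}

// Try up to `attempts` times; the last download is kept in any case.
async function downloadWithRetry(download, url, attempts = 3) {
  let html = '';
  for (let i = 0; i < attempts; i++) {
    html = await download(url);
    if (!looksLikeErrorPage(html)) break;
  }
  return html;
}

const errorPage = '<body> <h1>Internal Server Error</h1> <p>The server encountered an internal error.</p> </body>';
console.log(looksLikeErrorPage(errorPage)); // true
console.log(looksLikeErrorPage('<h1>Trial and Error live</h1>')); // false
```

If the download code has access to the HTTP status code, checking for a 5xx status would likely be more reliable than inspecting the page content. Raising a warning when the last attempt still looks like an error page would also let the user know that something went wrong.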

Give the possibility to define aliased style for place website scrap

The style alias seems to work for generic sites like Petit Bulletin or Infoconcert, but not for scraped venue sites.

E.g. Clubbing is correctly transformed to Electro for all the events from Petit Bulletin, Infoconcert, etc., but not if the event is scraped from Le Sucre:

Le Sucre;S.society x Dave Clarke 𝐩𝐫𝐞𝐬𝐞𝐧𝐭 𝐀𝐫𝐜𝐡𝐢𝐯𝐞 𝐎𝐧𝐞 𝐓𝐨𝐮𝐫 Dave Clarke NƵM 99;1707670800000;100;;Clubbing;https://le-sucre.eu/agenda/s-society-62/;11 fév. 18:00 — 00:00

vs

Le Sucre;Lumbago + Nicolas Lutz;1708293600000;100;Electro;Clubbing;https://www.petit-bulletin.fr/lyon/agenda-308833-lumbago-nicolas-lutz.html;Dimanche 18 février 2024 à 23h
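
One way to make the behaviour independent of the source would be to apply the aliases as a post-processing step on every result line. A sketch, assuming from the two lines above that the 5th field is the style and the 6th the detailed style; the alias table content is hypothetical.

```javascript
// Hypothetical alias table: target style -> detailed styles (lowercase).
const styleAliases = {
  Electro: ['clubbing', 'techno'],
};

const STYLE = 4; // assumed column of the style
const DETAILED_STYLE = 5; // assumed column of the detailed style

// Fill an empty style from the detailed style, whatever the source is.
function applyStyleAlias(csvLine) {
  const fields = csvLine.split(';');
  if (fields[STYLE] === '') {
    const detailed = (fields[DETAILED_STYLE] || '').toLowerCase();
    for (const [style, aliases] of Object.entries(styleAliases)) {
      if (aliases.includes(detailed)) {
        fields[STYLE] = style;
        break;
      }
    }
  }
  return fields.join(';');
}

const line =
  'Le Sucre;Some Event;1707670800000;100;;Clubbing;https://le-sucre.eu/agenda/s-society-62/;11 fév. 18:00';
console.log(applyStyleAlias(line).split(';')[STYLE]); // Electro
```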

Detect cancelled/postponed events

Currently it is difficult to detect postponed/cancelled events:

  • keywords may not appear in the scraped tags. If the keyword is elsewhere, it may not be related to the event (for example if a button links to another, cancelled event);
  • there might be a band or an event called 'annulé';
  • if the keyword only appears in the linked page, and the linked page is not aspiratorexed, we might miss it.

Since those problems may be rare, the proposed solution is to:

  • detect a keyword in a list:

["annule","reporte"]

  • if detected, define an action to do (remove the event, or display it as cancelled)
  • raise a warning, so Data can manually check that the removal is not a mistake.
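
The proposed steps can be sketched as follows, assuming events have name, place and an optional description field (field names are assumptions). Accents are stripped before the comparison so that 'annule' also matches 'annulé' or 'annulée'.

```javascript
const CANCEL_KEYWORDS = ['annule', 'reporte'];

// Lowercase and remove accents before looking for keywords.
function stripAccents(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// action: 'flag' marks the event as cancelled, 'remove' drops it.
// A warning is returned in both cases so the result can be checked by hand.
function checkCancellation(event, action = 'flag') {
  const text = stripAccents(`${event.name} ${event.description || ''}`);
  const keyword = CANCEL_KEYWORDS.find((k) => text.includes(k));
  if (!keyword) return { event, warning: null };
  const warning =
    `Possible cancellation ("${keyword}") for "${event.name}" ` +
    `at ${event.place}: please check manually`;
  if (action === 'remove') return { event: null, warning };
  return { event: { ...event, cancelled: true }, warning };
}

const result = checkCancellation({ name: 'Concert ANNULÉ', place: 'Le Sucre' });
console.log(result.event.cancelled, result.warning);
```

A band actually called 'annulé' would still be flagged here, which is why the warning for a manual check is kept in every case.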
