repeated_phrases's People

Contributors

fiveham

repeated_phrases's Issues

multi-directional information access (FileBox, PhraseBox) can be organized more sensibly

Currently, various pieces of information are related, but the connections have to be built explicitly: efficient access from one piece to the pieces connected to it requires constructing, for example, Maps from pieces of information of the first type to lists of pieces of the second type.

Access from one piece of connected information to all pieces connected to it can be organized as a graph where each node can report a list (or lists) of the pieces of information to which it connects.

FileBox maps from Chapter (via filename) to a list of Quotes (phrase-at-Location).
PhraseBox maps from raw phrase (text) to a list of Locations.

Example graph of connections.
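The node-based alternative could look roughly like the following minimal sketch. It assumes hypothetical ChapterNode, LocationNode, and PhraseNode classes, which are not the project's existing Chapter, Location, or Quote types. Each edge is recorded at both ends. A chapter can therefore list its Locations and each Location its phrases (what FileBox provides), and a phrase can list its Locations (what PhraseBox provides), without building separate Maps.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: nodes that report their own connections. */
public class ConnectionGraph {

    /** Hypothetical node for a chapter file; lists its Locations directly. */
    static final class ChapterNode {
        final String filename;
        final List<LocationNode> locations = new ArrayList<>();

        ChapterNode(String filename) {
            this.filename = filename;
        }
    }

    /** Hypothetical node for a word position; knows its chapter and its phrases. */
    static final class LocationNode {
        final ChapterNode chapter;
        final int wordIndex;
        final List<PhraseNode> phrases = new ArrayList<>();

        LocationNode(ChapterNode chapter, int wordIndex) {
            this.chapter = chapter;
            this.wordIndex = wordIndex;
            chapter.locations.add(this); // register with the owning chapter
        }
    }

    /** Hypothetical node for a phrase's text; lists every Location where it occurs. */
    static final class PhraseNode {
        final String text;
        final List<LocationNode> locations = new ArrayList<>();

        PhraseNode(String text) {
            this.text = text;
        }
    }

    /** Adds an edge in both directions so that each end can report the other. */
    static void link(LocationNode location, PhraseNode phrase) {
        location.phrases.add(phrase);
        phrase.locations.add(location);
    }
}
```

Because link() is the only place edges are created, the two directions cannot drift out of sync. That is the bookkeeping the separate Maps currently have to reproduce by hand.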

Lazily-created data (in DataManager) is not recreated when different inputs are supplied

For instance, if a new Trail is specified, chapters that were linked using a previously used Trail are returned, saved, etc., instead of chapters linked using the data of the new Trail.

Managed data should be lazily (re)created if either of two checks succeeds:

  1. the member is null, or
  2. the inputs for the previously cached result differ from the current inputs.

Currently, only the first check is used.
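The two checks can be sketched as a small generic cache. This sketch assumes a hypothetical InputAwareCache class that is not part of the project. It also assumes that the input type (a Trail, say) implements value-based equals(). With identity equality, check 2 would only notice a different object, not different content.

```java
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical helper (not in the project): caches a value computed from an input
 * and recomputes it when either of the two checks applies.
 */
public class InputAwareCache<I, T> {
    private final Function<I, T> factory;
    private I cachedInput;
    private T cachedValue;

    public InputAwareCache(Function<I, T> factory) {
        this.factory = factory;
    }

    public T get(I input) {
        // Check 1: nothing cached yet. Check 2: the cached result came from different inputs.
        // Assumes I implements value-based equals().
        if (cachedValue == null || !Objects.equals(cachedInput, input)) {
            cachedValue = factory.apply(input);
            cachedInput = input;
        }
        return cachedValue;
    }
}
```

DataManager could then hold, for example, a hypothetical linkedChapters cache and call linkedChapters.get(trail), so that supplying a new Trail triggers relinking automatically.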

Operation LINK_CHAPTERS writeTo is incorrect

LINK_CHAPTERS declares that it writes to Folder.LINKED_CHAPTERS, but the method that performs the operation actually overwrites the original HTML files in Folder.HTML_CHAPTERS rather than writing to a purpose-specific folder.

Either

  • the method used by LINK_CHAPTERS should write its output to a purpose-specific folder, or
  • the writeTo folder for LINK_CHAPTERS should be changed to HTML_CHAPTERS, in which case
      ◦ the readFrom for SET_TRAIL should be set to HTML_CHAPTERS

The term "phrase instance", when referring to quotes from the text, should be renamed

For example, the concept of a phrase paired with a location could be called a Quote, while a phrase independent of a location is simply "phrase". It may be a good idea to call the un-located phrase something other than "phrase" since the purpose behind this renaming is to resolve ambiguity surrounding uses of the word "phrase".

Need to get Chapter objects from chapters' filenames in PhraseBox, FileBox, and LinkChapters

This could be resolved by

  • integrating the operation steps into one another so that data such as a collection of Chapters can be passed from one operation to the next, or
  • integrating the operations and a RepeatedPhrasesApp instance into each other so that data such as a collection of Chapters can be passed from the RepeatedPhrasesApp instance to each operation as the operation demands

HtmlBooks determine which book's chapterization process to use via an if-else cascade.

An if-less technique should be used instead. For example, each HtmlBook could have a member, assigned in the constructor, whose method chapterizes an arbitrary HtmlBook according to that member's own chapterization rules.

For example

import java.util.Collection;
import java.util.function.Function;

enum Chapterizer {
    AS_NOVEL( /*process for a novel*/ ),
    AS_NOVELLA( /*process for non-PQ novellas*/ ),
    AS_PQ( /*process for PQ*/ );

    // The chapterization rule this element applies
    private final Function<HtmlBook, Collection<HtmlChapter>> func;

    Chapterizer(Function<HtmlBook, Collection<HtmlChapter>> func) {
        this.func = func;
    }

    static Chapterizer get(HtmlBook book) {
        /* chooses the right enum element for a book */
    }

    // Instance method, not static: it reads this element's func
    Collection<HtmlChapter> split(HtmlBook h) {
        return func.apply(h);
    }
}

public class HtmlBook {
    private final Chapterizer chapterizer;

    public HtmlBook(...) {
        ...
        // Assign last, after any fields that Chapterizer.get inspects have been set
        this.chapterizer = Chapterizer.get(this);
    }
    ...
    public Collection<HtmlChapter> splitChapters(...) {
        return chapterizer.split(this);
    }
}

PhraseProducer can track multiple phrase end-points and thus crawl a Chapter only once for all phrase sizes

Only the largest phrase size needs to find its next position after each batch of words is returned. When the endpoints update, the endpoint for the longest phrase becomes the endpoint for the second-longest phrase, and so on down through the sizes.

A map from size to endpoint will be needed. An array or an ArrayList could do it well.

The code that uses PhraseProducer would also need to be able to handle getting all the phrases at once like that.
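The cascading-endpoint idea can be sketched as follows. The sketch assumes a hypothetical MultiSizePhraseProducer that works on plain whitespace-separated text and emits every phrase of 1 to maxSize words. The real PhraseProducer's tokenization and range of sizes may differ. The array ends plays the role of the size-to-endpoint map. When the start moves forward one word, each endpoint shifts down one size, and only the longest size scans the text for a new word end.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: all phrase sizes from one crawl of the text. */
public class MultiSizePhraseProducer {

    /** Index of the first non-whitespace char at or after pos. */
    static int skipSpaces(String text, int pos) {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    /** End index of the next word at or after pos, or -1 if there is none. */
    static int nextWordEnd(String text, int pos) {
        if (pos < 0) {
            return -1;
        }
        pos = skipSpaces(text, pos);
        if (pos >= text.length()) {
            return -1;
        }
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static List<String> producePhrases(String text, int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        List<String> phrases = new ArrayList<>();
        // ends[k] = end index of the (k + 1)-word phrase beginning at start, or -1 if the text runs out
        int[] ends = new int[maxSize];
        int start = skipSpaces(text, 0);
        int pos = start;
        for (int k = 0; k < maxSize; k++) {
            pos = nextWordEnd(text, pos);
            ends[k] = pos;
        }
        while (ends[0] != -1) {
            for (int k = 0; k < maxSize; k++) {
                if (ends[k] != -1) {
                    phrases.add(text.substring(start, ends[k]));
                }
            }
            start = skipSpaces(text, ends[0]);
            // Cascade: the (k + 2)-word phrase from the old start ends where
            // the (k + 1)-word phrase from the new start ends
            for (int k = 0; k < maxSize - 1; k++) {
                ends[k] = ends[k + 1];
            }
            // Only the longest size scans forward in the text
            ends[maxSize - 1] = nextWordEnd(text, ends[maxSize - 1]);
        }
        return phrases;
    }
}
```

For "a bb c" with maxSize 2, this returns a, a bb, bb, bb c, c, crawling the text once. Callers receive all sizes interleaved by start position, which is the change the last paragraph above anticipates.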

Need a command-line interface

Ideally it should be possible to call RepeatedPhrasesApp from the command line with some options in order to get results, rather than requiring the use of the GUI.

In addition to script-friendly uses of options in main's args, it'd be nice to have a human-friendly interactive text menu.

Parameters separated by commas should also feature whitespace after each separating comma

This includes but is not limited to method headers, method calls, parameter types, varargs, array construction, and parameters in lambda expressions.

Wrong                                                 Right
Map<String,Integer>                                   Map<String, Integer>
map.put("",0);                                        map.put("", 0);
public void apply(Integer i,Integer j)                public void apply(Integer i, Integer j)
System.out.printf(formatString, i,j,k,l,m,n);         System.out.printf(formatString, i, j, k, l, m, n);
String[] s = new String[]{"-a",file,"--under","5"};   String[] s = new String[]{"-a", file, "--under", "5"};
(a,b) -> a + b * a                                    (a, b) -> a + b * a

the use of messages is inconsistent

When this project was first given a GUI, messages that had originally been sent only to the console (System.out) were instead sent to a Consumer. That Consumer was System.out::println when the program was run from the command line, and a JLabel's setText method when the GUI was run.

Since then, the use of messages has declined. Many behaviors that once reported plenty of messages now report none, and the corresponding code is no longer given (and no longer needs) access to a Consumer for that purpose. However, some code still holds a reference to a message handler. This situation is inconsistent and should be made consistent.

Either

  • all uses of a message-handler should be removed, or
  • all methods that could reasonably need to report on their status to the user should use a message-handler
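Under the second option, one consistent pattern is to give every status-reporting method a Consumer<String> parameter, as the original GUI change did. In this sketch, MessageSketch, countWords, and SILENT are hypothetical names. From the command line the caller would pass System.out::println, and from the GUI a JLabel's setText, as described above.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: every status-reporting method takes a message handler. */
public class MessageSketch {
    /** Handler that discards messages, for callers that want no status reports. */
    static final Consumer<String> SILENT = msg -> { };

    /** Hypothetical long-running step that reports its progress through msg. */
    static int countWords(List<String> chapters, Consumer<String> msg) {
        int total = 0;
        for (int i = 0; i < chapters.size(); i++) {
            msg.accept("Counting words in chapter " + (i + 1) + " of " + chapters.size());
            String text = chapters.get(i).trim();
            if (!text.isEmpty()) {
                total += text.split("\\s+").length;
            }
        }
        msg.accept("Done");
        return total;
    }
}
```

Passing SILENT instead of null keeps callers that do not care about status from needing null checks.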

What the program does can be expressed more clearly as a graph operation

Initially, each Location connects to multiple phrases (except the final Location of any Chapter, which connects to only one phrase). Removing unique phrases, and removing repeated phrases that are subsumed by a larger repeated phrase, both amount to removing edges that link a Location with a phrase. These operations are complete once each Location connects to at most one phrase.

By building a runtime graph that represents these connections and operating on that graph, what the program does may become much clearer.
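The graph formulation can be sketched with a hypothetical PhraseGraph in which a Location is just a word index and a phrase is its text. Edges are kept in both directions. Removing unique phrases deletes the edge of any phrase with only one Location. Removing subsumed phrases keeps only the longest phrase at each Location, since the shorter phrases starting there are its prefixes. This simplification does not remove a phrase covered by a longer phrase that starts at an earlier Location (in the test below, "b" at index 1 survives inside "a b" at index 0). A full implementation would need an extra pass for that case.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: Location-phrase edges, pruned until each Location has at most one phrase. */
public class PhraseGraph {
    // Edges stored in both directions: Location (word index) -> phrases, phrase -> Locations
    private final Map<Integer, Set<String>> phrasesAt = new TreeMap<>();
    private final Map<String, Set<Integer>> locationsOf = new HashMap<>();

    /** Connects every Location to the phrases of 1..maxSize words that start there. */
    static PhraseGraph fromWords(List<String> words, int maxSize) {
        PhraseGraph g = new PhraseGraph();
        for (int loc = 0; loc < words.size(); loc++) {
            g.phrasesAt.put(loc, new HashSet<>());
            for (int size = 1; size <= maxSize && loc + size <= words.size(); size++) {
                String phrase = String.join(" ", words.subList(loc, loc + size));
                g.phrasesAt.get(loc).add(phrase);
                g.locationsOf.computeIfAbsent(phrase, k -> new HashSet<>()).add(loc);
            }
        }
        return g;
    }

    static int wordCount(String phrase) {
        return phrase.split(" ").length;
    }

    /** Removes the edge of every phrase that occurs at only one Location. */
    void removeUniquePhrases() {
        for (Map.Entry<String, Set<Integer>> e : locationsOf.entrySet()) {
            if (e.getValue().size() == 1) {
                int loc = e.getValue().iterator().next();
                phrasesAt.get(loc).remove(e.getKey());
                e.getValue().clear();
            }
        }
    }

    /** At each Location, keeps only the longest remaining phrase; shorter ones are its prefixes. */
    void removeSubsumedPhrases() {
        for (Map.Entry<Integer, Set<String>> e : phrasesAt.entrySet()) {
            Set<String> phrases = e.getValue();
            if (phrases.size() <= 1) {
                continue;
            }
            String longest = Collections.max(phrases, Comparator.comparingInt(PhraseGraph::wordCount));
            for (String p : phrases) {
                if (!p.equals(longest)) {
                    locationsOf.get(p).remove(e.getKey());
                }
            }
            phrases.clear();
            phrases.add(longest);
        }
    }

    /** True once every Location connects to at most one phrase. */
    boolean isDone() {
        return phrasesAt.values().stream().allMatch(s -> s.size() <= 1);
    }

    Set<String> phrasesAtLocation(int loc) {
        return phrasesAt.get(loc);
    }
}
```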

Binary operators must have whitespace padding on either side

Wrong       Right
a=b         a = b
if(a==b)    if(a == b)
if(a!=b)    if(a != b)
a<b         a < b
a>b         a > b
a<=b        a <= b
a>=b        a >= b
a&b         a & b
a&&b        a && b
a|b         a | b
a||b        a || b
a^b         a ^ b
a+b         a + b
a-b         a - b
a*b         a * b
a/b         a / b
a%b         a % b
a<<1        a << 1
a>>1        a >> 1
a>>>1       a >>> 1
a&=b        a &= b
a|=b        a |= b
a^=b        a ^= b
a+=b        a += b
a-=b        a -= b
a*=b        a *= b
a/=b        a /= b
a%=b        a %= b
a<<=1       a <<= 1
a>>=1       a >>= 1
a>>>=1      a >>>= 1

DataManager does not properly encapsulate data

Currently, DataManager keeps all the bulk data separate from RepeatedPhrasesApp, which uses the data, but the idea behind DataManager is not yet fully realized.

DataManager was supposed to restrict access to lazily-created members so that the only way to reach them would be through a method that can intelligently create or recreate the data if it does not exist or pertains to different requirements.

DataManager does prevent code in other classes from accessing the null-initialized members directly. However, DataManager itself has direct access to those members, since they are its own fields. DataManager therefore has to be written carefully so that the data is only ever accessed through the intelligent methods.

Ideally, there should be a separate manager for each lazily-created member that RepeatedPhrasesApp needs:

  • a manager for html chapters,
  • a manager for Chapter objects,
  • a manager for anchor data, and
  • a manager for linked chapters
