Git Product home page Git Product logo

Comments (3)

Mathebibel avatar Mathebibel commented on July 17, 2024

After some testing I managed to identify the bug.

In the following minimal working examples (MWE) I will use the command cachekill -s 'my-project/**/!(*.html)' -t 'my-project/**/*.html'

MWE 1

.
└── my-project/
    ├── a.jpg
    ├── b-a.jpg
    └── index.html

with index.html is

<img src="a.jpg">
<img src="b-a.jpg">

This works.

MWE 2

.
└── my-project/
    ├── assets/
    │   └── a.jpg
    ├── a.jpg
    └── index.html

with index.html is

<img src="a.jpg">
<img src="assets/a.jpg">

This doesn't work. Both relative paths get the same hash.

Solution for MWE 2

In line 126-133

async function replaceReferences(sourceBases: SourceBases[],
                                         targets: string[]): Promise<void> {
          await replace.replaceInFile({
            from: sourceBases.map(obj => new RegExp(obj.base, 'g')),
            to: sourceBases.map(obj => obj.newBase),
            files: targets
          });
}

we can see that you search for obj.base and replace it with obj.newBase.

  • You should search for obj.path and replace it with obj.newPath.
  • You should start your search with the lowest element in the file hierarchy.

To illustrate the last point:

.
└── my-project/
    ├── assets/
    │   ├── assets/
    │   │   └── a.jpg
    │   └── a.jpg
    ├── a.jpg
    └── index.html

The search/replace order should be:

  1. my-project/assets/assets/a.jpg (This is the lowest element in the file hierarchy!)
  2. my-project/assets/a.jpg
  3. my-project/a.jpg

Additional information

Of course the .jpg files in the examples above have only the same name but they look different - otherwise they would generate the same hash.

MWE 3

.
└── my-project/
    ├── a-b.jpg
    ├── b.jpg
    └── index.html

with index.html is

<img src="b.jpg">
<img src="a-b.jpg">

This doesn't work. Both relative paths get the same hash.

Solution for MWE 3

With line 73

sources.sort().reverse();

you solve the issue

  // Make sure that source files are sorted in a reverse alphabetical order,
  // so files with common filename endings don't get wrongly replaced. E.g.:
  // ['file.js', 'other-file.js'] --> ['other-file.js', 'file.js'].

but it doesn't solve the issue

// ['file-other.js', 'other.js'] --> ['other.js', 'file-other.js'].

Idea 1

I think you can delete this line and solve the issue if you add to your regex in line 129

from: sourcePaths.map(obj => new RegExp(obj.path, 'g')),

the delimiters of the path.

For example instead of searching for other.js we search for "other.js" (and 'other.js' and (other.js) and [other.js]).

I am not very happy with my solution because there are more delimiters and sometimes there are no delimiters (if the file name is mentioned in plain text for example).

Idea 2

Let's look at the following example:

.
└── my-project/
    ├── a.jpg
    ├── a-b.jpg
    ├── b.jpg
    ├── b-a.jpg
    └── index.html

with

sources.sort().reverse();

we get this order

1. b-a.jpg
2. b.jpg
3. a-b.jpg
4. a.jpg

The problem is if we search for b.jpg then a-b.jpg would be a match.

  • Group the file names by their character number
  • Start search/replace with the group with the most characters
  1. Group: a-b.jpg and b-a.jpg
  2. Group: a.jpg and b.jpg
  • Delete .sort().reverse() (Not necessary anymore as soon as we process the group with the most characters first)

Summary

Fix for MWE 2

  • You should search for obj.path and replace it with obj.newPath.
  • You should start your search with the lowest element in the file hierarchy.

On each hierarchy level we have to....

Fix for MWE 3

  • Group the file names by their character number
  • Start search/replace with the group with the most characters
  • Delete .sort().reverse() (Not necessary anymore as soon as we process the group with the most characters first)

Test case

my-project/
  assets/
    assets/
      a.jpg
      a-b.jpg
      b.jpg
      b-a.jpg
    a.jpg
    a-b.jpg
    b.jpg
    b-a.jpg
  a.jpg
  a-b.jpg
  b.jpg
  b-a.jpg
  index.html

with index.html is

<img src="a.jpg">
<img src="a-b.jpg">
<img src="b.jpg">
<img src="b-a.jpg">
<img src="assets/a.jpg">
<img src="assets/a-b.jpg">
<img src="assets/b.jpg">
<img src="assets/b-a.jpg">
<img src="assets/assets/a.jpg">
<img src="assets/assets/a-b.jpg">
<img src="assets/assets/b.jpg">
<img src="assets/assets/b-a.jpg">

(Of course the .jpg files in the example above have only the same name but they look different - otherwise they would generate the same hash.)

from cachekill.

Mathebibel avatar Mathebibel commented on July 17, 2024

In my test case above I forgot that there are absolute and relative path on all hierarchy levels.

Test case (Updated)

my-project/
  assets/
    assets/
      a.jpg
      a-b.jpg
      b.jpg
      b-a.jpg
      index.html
    a.jpg
    a-b.jpg
    b.jpg
    b-a.jpg
    index.html
  a.jpg
  a-b.jpg
  b.jpg
  b-a.jpg
  index.html

with index.html is

<!-- Relative Path -->
<img src="a.jpg">
<img src="a-b.jpg">
<img src="b.jpg">
<img src="b-a.jpg">
<img src="assets/a.jpg">
<img src="assets/a-b.jpg">
<img src="assets/b.jpg">
<img src="assets/b-a.jpg">
<img src="assets/assets/a.jpg">
<img src="assets/assets/a-b.jpg">
<img src="assets/assets/b.jpg">
<img src="assets/assets/b-a.jpg">
<!-- Absolute Path -->
<img src="/a.jpg">
<img src="/a-b.jpg">
<img src="/b.jpg">
<img src="/b-a.jpg">
<img src="/assets/a.jpg">
<img src="/assets/a-b.jpg">
<img src="/assets/b.jpg">
<img src="/assets/b-a.jpg">
<img src="/assets/assets/a.jpg">
<img src="/assets/assets/a-b.jpg">
<img src="/assets/assets/b.jpg">
<img src="/assets/assets/b-a.jpg">

with assets/index.html is

<!-- Relative Path -->
<img src="../a.jpg">
<img src="../a-b.jpg">
<img src="../b.jpg">
<img src="../b-a.jpg">
<img src="a.jpg">
<img src="a-b.jpg">
<img src="b.jpg">
<img src="b-a.jpg">
<img src="assets/a.jpg">
<img src="assets/a-b.jpg">
<img src="assets/b.jpg">
<img src="assets/b-a.jpg">
<!-- Absolute Path -->
<img src="/a.jpg">
<img src="/a-b.jpg">
<img src="/b.jpg">
<img src="/b-a.jpg">
<img src="/assets/a.jpg">
<img src="/assets/a-b.jpg">
<img src="/assets/b.jpg">
<img src="/assets/b-a.jpg">
<img src="/assets/assets/a.jpg">
<img src="/assets/assets/a-b.jpg">
<img src="/assets/assets/b.jpg">
<img src="/assets/assets/b-a.jpg">

with assets/assets/index.html is

<!-- Relative Path -->
<img src="../../a.jpg">
<img src="../../a-b.jpg">
<img src="../../b.jpg">
<img src="../../b-a.jpg">
<img src="../a.jpg">
<img src="../a-b.jpg">
<img src="../b.jpg">
<img src="../b-a.jpg">
<img src="a.jpg">
<img src="a-b.jpg">
<img src="b.jpg">
<img src="b-a.jpg">
<!-- Absolute Path -->
<img src="/a.jpg">
<img src="/a-b.jpg">
<img src="/b.jpg">
<img src="/b-a.jpg">
<img src="/assets/a.jpg">
<img src="/assets/a-b.jpg">
<img src="/assets/b.jpg">
<img src="/assets/b-a.jpg">
<img src="/assets/assets/a.jpg">
<img src="/assets/assets/a-b.jpg">
<img src="/assets/assets/b.jpg">
<img src="/assets/assets/b-a.jpg">

from cachekill.

eneko89 avatar eneko89 commented on July 17, 2024

Thanks a lot for the great explanation of the problem and solution proposals.

This will be prioritized in the following version I release. As you mention, sorting the array was not an integral solution to the problem. I already thought on the delimiters idea, but abandoned it for the same reason, it was more a dirty workaround than a solution.

Haven't had time to analyze your findings in depth, but grouping by character count and replacing starting with the longest to the shortest group looks like a good idea. Thanks again!

Of course, PRs are always welcome!

from cachekill.

Related Issues (4)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.