Comments (3)
After some testing I managed to identify the bug.
In the following minimal working examples (MWE) I will use the command cachekill -s 'my-project/**/!(*.html)' -t 'my-project/**/*.html'
MWE 1
.
└── my-project/
    ├── a.jpg
    ├── b-a.jpg
    └── index.html
where index.html is:
<img src="a.jpg">
<img src="b-a.jpg">
This works.
MWE 2
.
└── my-project/
    ├── assets/
    │   └── a.jpg
    ├── a.jpg
    └── index.html
where index.html is:
<img src="a.jpg">
<img src="assets/a.jpg">
This doesn't work. Both relative paths get the same hash.
Solution for MWE 2
In lines 126-133
async function replaceReferences(sourceBases: SourceBases[],
targets: string[]): Promise<void> {
await replace.replaceInFile({
from: sourceBases.map(obj => new RegExp(obj.base, 'g')),
to: sourceBases.map(obj => obj.newBase),
files: targets
});
}
we can see that you search for obj.base and replace it with obj.newBase.
- You should search for obj.path and replace it with obj.newPath.
- You should start your search with the lowest element in the file hierarchy.
To illustrate the last point:
.
└── my-project/
    ├── assets/
    │   ├── assets/
    │   │   └── a.jpg
    │   └── a.jpg
    ├── a.jpg
    └── index.html
The search/replace order should be:
1. my-project/assets/assets/a.jpg (This is the lowest element in the file hierarchy!)
2. my-project/assets/a.jpg
3. my-project/a.jpg
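A minimal sketch of that ordering, assuming paths use "/" as separator. The helper name sortDeepestFirst is hypothetical and not part of cachekill; it only shows how the sources could be ordered before the search/replace step.

```typescript
// Hypothetical helper (not part of cachekill): order paths so that the
// entries lowest in the file hierarchy (most path segments) come first.
function sortDeepestFirst(paths: string[]): string[] {
  const depth = (p: string): number => p.split('/').length;
  // Copy before sorting so the caller's array is left untouched.
  return [...paths].sort((a, b) => depth(b) - depth(a));
}

const orderedPaths = sortDeepestFirst([
  'my-project/a.jpg',
  'my-project/assets/assets/a.jpg',
  'my-project/assets/a.jpg',
]);
console.log(orderedPaths);
```

Paths on the same level keep their original relative order, because Array.prototype.sort is stable in current JavaScript engines.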
Additional information
Of course, the .jpg files in the examples above only share a name; their contents differ, otherwise they would generate the same hash.
MWE 3
.
└── my-project/
    ├── a-b.jpg
    ├── b.jpg
    └── index.html
where index.html is:
<img src="b.jpg">
<img src="a-b.jpg">
This doesn't work. Both relative paths get the same hash.
Solution for MWE 3
With line 73
sources.sort().reverse();
you solve the issue for this case:
// Make sure that source files are sorted in a reverse alphabetical order,
// so files with common filename endings don't get wrongly replaced. E.g.:
// ['file.js', 'other-file.js'] --> ['other-file.js', 'file.js'].
but it doesn't solve this case, where the shorter name is still replaced first:
// ['file-other.js', 'other.js'] --> ['other.js', 'file-other.js'].
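To make the failure concrete, here is a small sketch that applies the renames one after another in the order produced by sort().reverse(). The helper applyInOrder and the hashed names (other-111.js, file-other-222.js) are made up for illustration; cachekill's real naming scheme may differ.

```typescript
// Hypothetical helper: apply literal renames sequentially, in the given order.
function applyInOrder(text: string, renames: Array<[string, string]>): string {
  return renames.reduce((out, [from, to]) => out.split(from).join(to), text);
}

const scriptHtml =
  '<script src="other.js"></script><script src="file-other.js"></script>';

// Order after sort().reverse(): 'other.js' is processed before 'file-other.js'.
const brokenHtml = applyInOrder(scriptHtml, [
  ['other.js', 'other-111.js'],
  ['file-other.js', 'file-other-222.js'],
]);

// 'other.js' also matched inside 'file-other.js', so both references carry
// hash 111 and the second rename no longer finds anything to replace.
console.log(brokenHtml);
// <script src="other-111.js"></script><script src="file-other-111.js"></script>
```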
Idea 1
I think you can delete this line and solve the issue if you add to your regex in line 129
from: sourcePaths.map(obj => new RegExp(obj.path, 'g')),
the delimiters of the path.
For example, instead of searching for other.js we search for "other.js" (and 'other.js' and (other.js) and [other.js]).
I am not very happy with this solution because there are more delimiters than these, and sometimes there are no delimiters at all (for example, if the file name is mentioned in plain text).
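A rough sketch of the delimiter idea, limited to the four delimiter pairs listed above. Both helper names are hypothetical. The sketch also escapes the file name before building the regex, since an unescaped "." matches any character.

```typescript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: match `name` only when it directly follows an opening
// delimiter (" ' ( [) and is directly followed by a closing one (" ' ) ]).
function delimitedPattern(name: string): RegExp {
  return new RegExp(`(?<=["'(\\[])${escapeRegExp(name)}(?=["')\\]])`, 'g');
}

const delimitedInput =
  '<script src="other.js"></script><script src="file-other.js"></script>';
const delimitedOutput = delimitedInput.replace(
  delimitedPattern('other.js'),
  'other-111.js', // made-up hashed name, for illustration only
);
console.log(delimitedOutput);
// <script src="other-111.js"></script><script src="file-other.js"></script>
```

As noted above, a reference without delimiters, such as a file name in plain text, is not matched by this pattern. The same holds for a name that follows a directory prefix such as assets/.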
Idea 2
Let's look at the following example:
.
└── my-project/
    ├── a.jpg
    ├── a-b.jpg
    ├── b.jpg
    ├── b-a.jpg
    └── index.html
With
sources.sort().reverse();
we get this order:
1. b-a.jpg
2. b.jpg
3. a-b.jpg
4. a.jpg
The problem is that if we search for b.jpg, then a-b.jpg is also a match.
- Group the file names by their character count
- Start search/replace with the group with the most characters
  - Group: a-b.jpg and b-a.jpg
  - Group: a.jpg and b.jpg
- Delete .sort().reverse() (Not necessary anymore as soon as we process the group with the most characters first)
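The grouping above can be sketched as a sort by length, longest first: two different names of the same length can never contain each other, so the order inside a group does not matter. The helper name and the hashed names below are made up for illustration.

```typescript
// Hypothetical helper: apply literal renames, longest source name first.
// Sorting by length is equivalent to processing the "most characters"
// group first, then the next group, and so on.
function replaceLongestFirst(
  text: string,
  renames: Array<[string, string]>,
): string {
  const ordered = [...renames].sort((a, b) => b[0].length - a[0].length);
  return ordered.reduce((out, [from, to]) => out.split(from).join(to), text);
}

const imgHtml =
  '<img src="a.jpg"><img src="a-b.jpg"><img src="b.jpg"><img src="b-a.jpg">';

const fixedHtml = replaceLongestFirst(imgHtml, [
  ['a.jpg', 'a-1.jpg'],
  ['a-b.jpg', 'a-b-2.jpg'],
  ['b.jpg', 'b-3.jpg'],
  ['b-a.jpg', 'b-a-4.jpg'],
]);
console.log(fixedHtml);
// <img src="a-1.jpg"><img src="a-b-2.jpg"><img src="b-3.jpg"><img src="b-a-4.jpg">
```

This relies on the new names not containing any shorter name that is still waiting to be replaced. That holds here because the made-up hash sits between the base name and the extension.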
Summary
Fix for MWE 2
- You should search for obj.path and replace it with obj.newPath.
- You should start your search with the lowest element in the file hierarchy.
On each hierarchy level we have to....
Fix for MWE 3
- Group the file names by their character count
- Start search/replace with the group with the most characters
- Delete .sort().reverse() (Not necessary anymore as soon as we process the group with the most characters first)
Test case
my-project/
  assets/
    assets/
      a.jpg
      a-b.jpg
      b.jpg
      b-a.jpg
    a.jpg
    a-b.jpg
    b.jpg
    b-a.jpg
  a.jpg
  a-b.jpg
  b.jpg
  b-a.jpg
  index.html
where index.html is:
<img src="a.jpg">
<img src="a-b.jpg">
<img src="b.jpg">
<img src="b-a.jpg">
<img src="assets/a.jpg">
<img src="assets/a-b.jpg">
<img src="assets/b.jpg">
<img src="assets/b-a.jpg">
<img src="assets/assets/a.jpg">
<img src="assets/assets/a-b.jpg">
<img src="assets/assets/b.jpg">
<img src="assets/assets/b-a.jpg">
(Of course, the .jpg files in the example above only share a name; their contents differ, otherwise they would generate the same hash.)
from cachekill.
In my test case above I forgot that there can be absolute and relative paths on all hierarchy levels.
Test case (Updated)
my-project/
  assets/
    assets/
      a.jpg
      a-b.jpg
      b.jpg
      b-a.jpg
      index.html
    a.jpg
    a-b.jpg
    b.jpg
    b-a.jpg
    index.html
  a.jpg
  a-b.jpg
  b.jpg
  b-a.jpg
  index.html
where index.html is:
<!-- Relative Path -->
<img src="a.jpg">
<img src="a-b.jpg">
<img src="b.jpg">
<img src="b-a.jpg">
<img src="assets/a.jpg">
<img src="assets/a-b.jpg">
<img src="assets/b.jpg">
<img src="assets/b-a.jpg">
<img src="assets/assets/a.jpg">
<img src="assets/assets/a-b.jpg">
<img src="assets/assets/b.jpg">
<img src="assets/assets/b-a.jpg">
<!-- Absolute Path -->
<img src="/a.jpg">
<img src="/a-b.jpg">
<img src="/b.jpg">
<img src="/b-a.jpg">
<img src="/assets/a.jpg">
<img src="/assets/a-b.jpg">
<img src="/assets/b.jpg">
<img src="/assets/b-a.jpg">
<img src="/assets/assets/a.jpg">
<img src="/assets/assets/a-b.jpg">
<img src="/assets/assets/b.jpg">
<img src="/assets/assets/b-a.jpg">
where assets/index.html is:
<!-- Relative Path -->
<img src="../a.jpg">
<img src="../a-b.jpg">
<img src="../b.jpg">
<img src="../b-a.jpg">
<img src="a.jpg">
<img src="a-b.jpg">
<img src="b.jpg">
<img src="b-a.jpg">
<img src="assets/a.jpg">
<img src="assets/a-b.jpg">
<img src="assets/b.jpg">
<img src="assets/b-a.jpg">
<!-- Absolute Path -->
<img src="/a.jpg">
<img src="/a-b.jpg">
<img src="/b.jpg">
<img src="/b-a.jpg">
<img src="/assets/a.jpg">
<img src="/assets/a-b.jpg">
<img src="/assets/b.jpg">
<img src="/assets/b-a.jpg">
<img src="/assets/assets/a.jpg">
<img src="/assets/assets/a-b.jpg">
<img src="/assets/assets/b.jpg">
<img src="/assets/assets/b-a.jpg">
where assets/assets/index.html is:
<!-- Relative Path -->
<img src="../../a.jpg">
<img src="../../a-b.jpg">
<img src="../../b.jpg">
<img src="../../b-a.jpg">
<img src="../a.jpg">
<img src="../a-b.jpg">
<img src="../b.jpg">
<img src="../b-a.jpg">
<img src="a.jpg">
<img src="a-b.jpg">
<img src="b.jpg">
<img src="b-a.jpg">
<!-- Absolute Path -->
<img src="/a.jpg">
<img src="/a-b.jpg">
<img src="/b.jpg">
<img src="/b-a.jpg">
<img src="/assets/a.jpg">
<img src="/assets/a-b.jpg">
<img src="/assets/b.jpg">
<img src="/assets/b-a.jpg">
<img src="/assets/assets/a.jpg">
<img src="/assets/assets/a-b.jpg">
<img src="/assets/assets/b.jpg">
<img src="/assets/assets/b-a.jpg">
Thanks a lot for the great explanation of the problem and solution proposals.
This will be prioritized in the next version I release. As you mention, sorting the array was not a complete solution to the problem. I had already thought about the delimiters idea, but abandoned it for the same reason: it was more of a dirty workaround than a solution.
Haven't had time to analyze your findings in depth, but grouping by character count and replacing starting with the longest to the shortest group looks like a good idea. Thanks again!
Of course, PRs are always welcome!