Comments (18)
I read up on sitemap submissions and it seems like this is okay to do. So I can add an option for it. Something like SitemapGenerator::Sitemap.create_index = false
. Seems pretty easy to do.
from sitemap_generator.
kjvarga,
The (or at least my) preferred way would be to default to no sitemap index, unless you have more than 50,000 pages, then it automatically creates an index (like dynamic_sitemaps)
Of course you could always add a SitemapGenerator::Sitemap.force_index = true
to keep the old format for users that prefer.
You could even allow users to specify less than 50,000 the pagination like dynamic_sitemaps does.
from sitemap_generator.
I'm not a fan of making this the default mainly because you would have to modify your robots.txt file to point to either the sitemap or the sitemap index depending on how many links you have. Most webmasters I think simply don't care if there is the extra file when they have < 50 000 urls and if you have dynamic numbers of links it's quite feasible to go back and forth below and above 50000 urls which means you would have to constantly change your robots.txt file. So for those reasons I would like to keep it as an option.
from sitemap_generator.
I guess I forgot about the very important aspect of legacy support. I was thinking that it would just be /sitemap.xml.gz as the sitemap if <= 50,000 pages and /sitemap.xml.gz would be the index if it was more than 50,000.
You could keep the _index.html as default for legacy support...
from sitemap_generator.
Heya,
It would be great to have this functionality:
SitemapGenerator::Sitemap.create_index = false
Thank you for creating such a great plugin, btw =).
Best,
- V
from sitemap_generator.
I also would like option to not create index files
from sitemap_generator.
Ok I'll get to this hopefully this week. I'm thinking:
SitemapGenerator::Sitemap.create_index = (:auto, :always, :never, true, false)
:always and true are the same thing. :auto means create as needed. :never suppresses index creation altogether. I'm just not decided on what false should be. I think it should be the same as :auto i.e. don't create until needed.
from sitemap_generator.
Sure, that works for me. Maybe all you really need is?
true - Forces an index (current / default)
false - Force no index. Even if it exceeds 50,000 pages.
auto - Create an index only if needed. Probably my description above where sitemap.xml becomes the index and sitemap1.xml becomes the first sitemap...
from sitemap_generator.
Just pinging here to see if there's any update...
Also, thanks for taking my PR on autoplay (and cleaning it up). All warnings gone for me...
from sitemap_generator.
Hey Eric, I've released the feature in v3.3. Thanks again for generously paying for this feature! I'm sure everyone that uses it will appreciate that. I've added a shout-out to you on the readme.
And I'm glad that you are enjoying using the gem!
The naming convention hasn't changed. So the first sitemap will be sitemap1.xml.gz. However, I could add a simple namer that will generate sitemap.xml.gz, sitemap1.xml.gz etc but I wanted to get the main stuff out first.
from sitemap_generator.
Soo... you're going to hate me. But now I'm going into client mode. This wasn't 100% how I envisioned and doesn't quite accomplish my goals due to the naming.
I need a static place that I can always point my robots / 302 (which I use on my root to point to where they are on my s3 bucket) to where I don't need to worry if I'm above or below 50k.
So I envisioned create_sitemap = :auto to create a "sitemap.xml.gz" for its first sitemap. If we get more than 50k, that sitemap.xml.gz would become the index and point to the sitemap[1-9]+.xml.gz and so on...
The problem with status quo is I still need to know where I'm pointing, /sitemap1.xml.gz or /sitemap_index.xml.gz
from sitemap_generator.
Damn, you had to go there :/ Yeah that bit is more tricky. But I'm glad that I broke them into separate chunks because they are conceptually different features. Thinking about it I really like the idea though. In fact I would like it to be the default for the next major version. That way you always point to sitemap.xml.gz, no matter how many links your site has.
I can see a way to do it, but it is tricky because of the way things happen now. The index has to be written out last. And currently each sitemap is written out as it becomes full so that it doesn't occupy memory. What would have to happen is we need to delay writing out the first sitemap until we know with certainty whether there will be two sitemaps (or more, but two is enough to trigger the index). Then there's the issue that the index and the sitemaps will share the same namer instance. So I'm thinking there needs to be another configuration option that indicates that this naming convention is being used (any ideas of what to call this convention? 'zero-based' or 'simple' is all that comes to mind). However, whether or not the special naming convention is being used it doesn't hurt to adopt the same logic for all cases, which simplifies things a bit. It just means that the first sitemap sits in memory for a teeny bit longer.
So I think the sitemap index will need to keep an object reference to the sitemaps added to it, at least the first one. When those sitemaps become full they are not automatically written out. They are 'finalized' and added to the index but not written out. When the index has more than one sitemap, it reserves the first name, assigns the other names to the added sitemaps and writes them out. From then on new sitemaps get names and are written out when full as per usual. Then when the index is finalized there needs to be some handling of the different values for create_index.
So there's keeping track of sitemaps added to the index and keeping track of whether a name has been reserved. As well as delaying writing out of some sitemaps, so some tracking of whether writing has occurred. And a new namer instance to handle the new sequence.
from sitemap_generator.
Okay, so this is sounding like a 4.0 release. I think :simple could work. Since it's looking like it's not happening until the next major release, you could also just make this the new default and put a massive bold warning in the readme. That way, doesn't matter what the name of the configuration option is. No one will ever be specifying it...
And semver clearly states you're allow to do this...
I guess just give me an estimate on when you think you'll be able to do this. And hope I didn't ruin your vacation...
from sitemap_generator.
Hey man, i've made some progress on this already. I think I'll be done in a week or so, haven't got too much on the go. I'll probably get out a prerelease and then we can see where we're at. Adding unit tests will take a bit longer.
from sitemap_generator.
Can't wait to see it. Just holding off on some of my multi-sitemaps until it's released. Thanks again!
from sitemap_generator.
Okay try out 4.0.alpha.
You'll need these two config settings:
SitemapGenerator::Sitemap.create_index = :auto
SitemapGenerator::Sitemap.sitemaps_namer = SitemapGenerator::Sitemap.sitemap_index_namer = SitemapGenerator::SimpleNamer.new(:sitemap)
From my testing it seems to work as it should! I've not made these the defaults yet, just want to get it out and get some more people trying it out.
from sitemap_generator.
It works! Beautifully!!
from sitemap_generator.
Hi guys, the 4.0 release is finally out! It was more work than I anticipated making this change official. I also want to feel confident that it was working correctly and that I had covered all the cases. I think I have and it feels like it does the intuitive thing.
So @ehoch you can remove the create_index
and namer lines from your config and things should be working the same for you now, since that is the new default behaviour. The thing I like most about this is that it's clearer for everyone: point search engines to sitemap.xml.gz in all cases. Doesn't matter how many links you have. And now it is even more standards compliant because that is the filename that search engines expect. So win-win.
Also the ability to turn off index creation opens up some opportunities for more complex setups where one wants to generate sitemaps without affecting the index. The index can even be built "manually" by adding links to it using add_to_index
and no sitemaps need to be written even. So one could separate the two tasks or do a hybrid setup.
from sitemap_generator.
Related Issues (20)
- Ping failed for Bing: #<OpenURI::HTTPError: 410 Gone> HOT 2
- Feature Request: Remove unwanted options HOT 6
- Is it possible to override the default link options? HOT 3
- How to remove default priority, lastmod and changefreq from all urls HOT 1
- GoogleStorageAdapter - Can't create sitemap on buckets with uniform bucket-level access enabled HOT 3
- Allow disable acl for S3 bucket HOT 1
- Rails on Heroku with S3 sitemap hosting - google didn't like it HOT 2
- 【Worked】Ping failed for Bing: #<OpenURI::HTTPError: 410 Gone> HOT 4
- Is it possible to add additional metadata fields to the xml HOT 5
- Feature request : Make `default_host` optional / Allow full URL instead of path HOT 2
- Entity escaping in XML output
- google search console couldn't fetch sitemap but index is ok HOT 7
- Google search correct ping url HOT 2
- Readme install instructions fail HOT 1
- [BUG] Mobile sitemap link in Readme.md is giving 404
- TimeZone is ignored
- sitemap.xml.gz file not found on server
- Testing with frozen time does not work
- should sitemap show up automatically in google search console? HOT 2
- Should we drop default `lastmod` of `Time.now` as Google says "it needs to consistently match reality"? HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sitemap_generator.