Comments (19)
Let me do a dry run on some of the tests in Lucene.Net.Tests._A-D to get a sense of the time requirement for this. Lucene.Net has 7735 tests, which is no small amount. I need some sense of the time it's gonna take before I can know what I can sign up for. I'll report back here after I review some and see what level of effort is required.
from lucenenet.
I will begin working on the Lucene.Net.Tests.Facet tests.
from lucenenet.
I'll review Lucene.Net.Tests.Expressions
from lucenenet.
That's great. However, IMO it might make sense to hold off on reviewing Lucene.Net tests until after we have converted the exceptions and catch blocks in #446.
If you have already started, just continue to the end of the tests in that assembly, but we certainly should wait on the index reader/writer tests until after this is done. There may also be issues that are resolved by #446 in codecs.
from lucenenet.
@paulirwin Super cool. I'm also very interested in LLMs and RAGs. Well, welcome back. Lucene and the community have missed you! :-)
from lucenenet.
@paulirwin - Thanks for your help on this.
This issue is certainly a good place to start after being inactive for a while. But do note we have set up a Slack channel, #lucenenet-dev, where you can review more of the recent activity or discuss other aspects of the project that are off-topic for a specific GitHub issue. The dev mailing list also works, but I find it is easier to share code on Slack.
from lucenenet.
The first paragraph and the second paragraph don't seem quite congruent. Is the primary issue here that some lines of the tests may be commented out because the functionality didn't exist at the time the test was ported? If so, then I suggest the course of action is to visually inspect all tests for the projects listed to ensure no needed lines of code are commented out. If any commented lines are found, then uncomment them and run the test to ensure it still passes. If that approach is sufficient for this issue, then I'm willing to do that work and this issue can be assigned to me.
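As a rough aid to the visual inspection described above, a small script can list lines that look like commented-out code so they are harder to miss. This is only a sketch under stated assumptions: the regular expression is a heuristic of my own (a `//` comment, not a `///` doc comment, whose text ends in `;`, `{`, or `}`), the function names are hypothetical, and it does not replace reading each test against the Java original.

```python
import re
from pathlib import Path

# Heuristic (an assumption, not a project convention): a `//` comment that is
# not a `///` doc comment and whose text ends like a C# statement or block.
CODE_LIKE = re.compile(r"^\s*//(?!/)\s*\S.*[;{}]\s*$")


def find_commented_code(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that look like commented-out code."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if CODE_LIKE.match(line)
    ]


def scan_project(root: str) -> None:
    """Print every suspicious line in the .cs files found under `root`."""
    for path in sorted(Path(root).rglob("*.cs")):
        source = path.read_text(encoding="utf-8", errors="replace")
        for number, line in find_commented_code(source):
            print(f"{path}:{number}: {line}")
```

Calling `scan_project` on a test project directory would print candidate lines as `path:line: text`. Expect both false positives (prose comments that happen to end in a semicolon) and misses (multi-line `/* ... */` blocks), so treat the output as a checklist for manual review.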
from lucenenet.
I have updated the text of the original task to make it clearer. I have also broken down the task by test project. If necessary, we can further break it down by namespace to divide up the work between multiple people.
Let me know if you are still on board with doing (at least some of) this task.
from lucenenet.
During #411 I found 2 API issues and 4 bugs (places where the code diverged from Java or were converted incorrectly to .NET). I also timed my progress and it took an average of about 3 minutes per test to do a line-by-line review. For someone less familiar with the project, that number can probably be adjusted to 3.5-4 minutes per test for an estimate of the remaining projects.
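To put those per-test figures in perspective, here is a back-of-the-envelope calculation. It is a sketch only: it assumes the 7735 total quoted earlier in this thread as an upper bound (projects already reviewed are not subtracted), and the function name is made up for illustration.

```python
def review_hours(test_count: int, minutes_per_test: float) -> float:
    """Return the estimated line-by-line review time in hours."""
    return test_count * minutes_per_test / 60


# Total test count quoted earlier in this thread; treated here as an upper
# bound because some projects have already been reviewed.
TOTAL_TESTS = 7735

for rate in (3.0, 3.5, 4.0):
    hours = review_hours(TOTAL_TESTS, rate)
    print(f"{rate} min/test -> about {hours:.0f} hours")
```

At 3 minutes per test this works out to 386.75 hours for the full set, which is why splitting the work by test project (and by namespace where needed) matters.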
from lucenenet.
@NightOwl888 thanks, that's excellent info. I think I will learn in the test reviewing process. I see that you can check off the test set you reviewed in the list contained in this issue. That's cool.
from lucenenet.
I will begin working on the Lucene.Net.Tests.Join tests.
from lucenenet.
I will begin working on the Lucene.Net.Tests.Queries tests.
from lucenenet.
I'll work on Lucene.Net.Tests._A-D
from lucenenet.
@rclabo - #446 has been merged, so we can resume reviewing Lucene.Net (core) tests.
from lucenenet.
Thanks for the nudge. Also THANK YOU for the huge work you did in reviewing all the exceptions and the catch clauses to make sure they are faithfully following the behavior of Java Lucene. That was a huge amount of super important work.
from lucenenet.
I have begun reviewing the core T-Z tests. So far I've made it through Automaton and it was all good apart from some very minor formatting that I'll PR once complete.
- Util.Automaton
- Util.Fst
- Util.Packed
- Util B-TestF
- Util TestI-TestQ
- Util TestR-TestW
Update 1/12: Reviewed Util.Fst, also mostly a few minor formatting changes
Update 1/12: Reviewed Util.Packed, same
Update 1/15: Reviewed files with names starting with B-TestF in Util
Update 1/17: Reviewed through TestQ in Util
Update 1/18: Finished last segment of tests, PR coming soon
from lucenenet.
@paulirwin Thank you Paul, much appreciated! This is such a great project, so I get excited when I see others jumping in to help. Thank you!
from lucenenet.
@rclabo haha I'm jumping in again after a very long absence. My colleagues and I did a lot of work to finish porting the core of 4.8 about 10 years ago, then another group came along with a branch, and that became the work we see today. We used my branch to build an app that supported ~1B-document indexes under heavy load in production, so it was pretty rock solid at the time, but it was stuck on .NET Framework. I'm happy to see the state of things today and to be able to build this on arm64 macOS in .NET 6! I haven't needed Lucene.NET in a while, so I have been pretty quiet the last several years. Now, though, I have a project that is a local emulator for Azure Search which uses 4.8 beta, and I'm also getting into RAG with LLMs, so I'd love to see if I can help move the project along to be able to use the new vector database fields in modern Java Lucene.
The tests we ported at the time can be referenced if desired: https://github.com/paulirwin/lucene.net/tree/branch_4x
from lucenenet.
T-Z done and merged, moving backwards to the J-S monster (1,158 tests):
- Search/Payloads
- Search/Similarities
- Search/Spans
- Search/B-TestB
- Search/TestC
- Search/TestD-TestE
- Search/TestF-TestL
- Search/TestM-TestN
- Search/TestP-TestQ
- Search/TestR-TestSh
- Search/TestSi-TestSu
- Search/TestT-TestW
- Store/TestB-TestH
- Store/TestL-TestW
- Support/Codecs
- Support/Diagnostics
- Support/Document/Extensions
- Support/IO
- Support/Threading
- Support/Util
- Support
from lucenenet.