AI Note: Ditching RAG and Going to Grep
You can read the background in my earlier post, My AI note system. In it I mention that for searching notes I reached for a vector database, and I had ChromaDB set up. But after a while I ripped it out and replaced it with grep.
Why I dropped RAG
Running a vector database, keeping an embedding model warm, indexing every note on write — that’s a lot to ask from a $5 VPS. The thing would just fall over. I want this tool to be lightweight enough to run comfortably on the smallest server possible, and RAG was not that. I also just wanted simplicity.
And from a bit of research, I found that grep is actually what a lot of AI companies use. Tools like Cursor, GPT, and Copilot all have tools that search code and doc files using grep.
How the current grep search works
Step 1 — Keyword extraction
Before grepping anything, the query gets tokenized:
```python
words = re.split(r'[^\w]+', query)
```
It tokenizes on non-word characters, lowercases everything, drops stop words, and then runs a simple suffix stemmer on each word.
The stemmer handles the basics:
- running → run (strips ing)
- noted → note (strips ed)
- notes → note (strips s)
Right now the stemmer rules are hardcoded. I think we could improve this by handing the query to the LLM and letting it do the parsing; it would probably do better.
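To make this concrete, here is a minimal sketch of what the whole Step 1 pipeline could look like. The names (STOP_WORDS, stem, extract_keywords) and the exact stemming rules are my guesses, not the actual code:

```python
import re

# Hypothetical stop-word list; the real one would be longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "in", "and", "about"}
VOWELS = "aeiou"

def _ends_cvc(word):
    # Ends consonant-vowel-consonant (a crude Porter-style check).
    return (len(word) >= 3
            and word[-3] not in VOWELS
            and word[-2] in VOWELS
            and word[-1] not in VOWELS
            and word[-1] not in "wxy")

def stem(word):
    # Naive suffix stripping, tweaked to reproduce the examples above.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
            if suffix != "s":
                if word[-1] == word[-2]:   # "runn" -> "run"
                    word = word[:-1]
                elif _ends_cvc(word):      # "not" -> "note"
                    word += "e"
            return word
    return word

def extract_keywords(query):
    # Tokenize on non-word characters, lowercase, drop stop words, stem.
    words = re.split(r'[^\w]+', query.lower())
    return [stem(w) for w in words if w and w not in STOP_WORDS]

# extract_keywords("Running notes about Python") -> ["run", "note", "python"]
```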
Step 2 — Scanning and scoring
This is the key part, and the interesting one. There are a lot of ways to approach it, and plenty of room to do it better.
The search loops over every .md file in the notes directory, sorted newest first, and scores each one:
```
score = (body_hits / word_count * 100) + title_hits + tag_hits
```
The weights:
| Field | Weight |
|---|---|
| Title keyword hit | 5× per keyword |
| Tag keyword hit | 5× per keyword |
| Body keyword hit | 1× per occurrence, normalised by word count |
The body normalization is important — without it, a long note with 10 casual mentions of a word would beat a short note that’s actually about that topic. Dividing by word count and scaling by 100 keeps things comparable across note sizes.
Note: files that score 0 are skipped entirely.
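Putting the formula and the weights together, the whole scanning pass could look roughly like this. All the names here (score_note, search_notes, parse_note) are mine, and parse_note is a deliberately minimal stand-in for whatever front-matter parsing the real code does:

```python
from pathlib import Path

TITLE_WEIGHT = 5
TAG_WEIGHT = 5

def parse_note(path):
    # Hypothetical, minimal parser: first line is the title, no tag support.
    text = path.read_text(encoding="utf-8")
    lines = text.splitlines()
    title = lines[0].lstrip("# ") if lines else ""
    return title, [], text

def score_note(keywords, title, tags, body):
    body_lower = body.lower()
    word_count = max(len(body_lower.split()), 1)   # guard against empty notes
    # Substring counting, grep-style, so the stem "note" also hits "notes".
    body_hits = sum(body_lower.count(k) for k in keywords)
    title_hits = sum(TITLE_WEIGHT for k in keywords if k in title.lower())
    tags_lower = {t.lower() for t in tags}
    tag_hits = sum(TAG_WEIGHT for k in keywords if k in tags_lower)
    return (body_hits / word_count * 100) + title_hits + tag_hits

def search_notes(notes_dir, keywords, limit=5):
    files = sorted(Path(notes_dir).glob("*.md"),
                   key=lambda p: p.stat().st_mtime, reverse=True)  # newest first
    results = []
    for path in files:
        title, tags, body = parse_note(path)
        score = score_note(keywords, title, tags, body)
        if score > 0:                              # zero-score files are skipped
            results.append((score, path))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:limit]
```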
Step 3 — matched_on explanation
Each result gets a matched_on field that explains why it was returned — something like "tags: python, body (4 hits)" or "title". I pass this to the LLM so it understands the context of each result. Without it the LLM would sometimes doubt results that matched on tags and look for something else unnecessarily.
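Building that field is simple; a sketch of roughly how it could be assembled (the function and argument names are mine, not the real code):

```python
def build_matched_on(title_hit, matched_tags, body_hits):
    # Produces strings like "tags: python, body (4 hits)" or "title".
    parts = []
    if title_hit:
        parts.append("title")
    if matched_tags:
        parts.append("tags: " + ", ".join(matched_tags))
    if body_hits:
        parts.append(f"body ({body_hits} hits)")
    return ", ".join(parts)
```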
Step 4 — Snippet extraction
For each matched note, it tries to show a useful preview. It scans the body for the first keyword hit and returns 300 characters starting 60 characters before that hit, so you get a bit of context around the match. Falls back to the start of the note if nothing is found.
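A sketch of that logic, assuming the 300/60 character windows described above (the function name is hypothetical):

```python
def extract_snippet(body, keywords, length=300, lead=60):
    # Find the earliest keyword hit and return ~300 chars of context,
    # starting 60 chars before the hit.
    body_lower = body.lower()
    hits = [p for p in (body_lower.find(k) for k in keywords) if p != -1]
    if not hits:
        return body[:length]              # fall back to the note's start
    start = max(min(hits) - lead, 0)
    return body[start:start + length]
```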
Then the top n results (default 5), sorted by score descending, go back to the LLM as JSON.
What’s actually pretty good about this approach
- Zero dependencies for the search itself. No running services, no index files to manage, no sync issues.
- Transparent — I can read the code and understand exactly why a note was returned.
- Fast enough — for a personal note collection (I have ~200 notes) it’s basically instant.
What I’d improve
1. Build a proper index
Right now every search scans every file. That’s fine at 200 notes, but at 20,000 it’ll start to feel slow. (Actually I don’t think I can write that many notes HAHA, so it's probably fine.) If it ever does, the fix would be an inverted index, sketched below.
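A sketch of the idea, reusing extract_keywords from the Step 1 sketch (this is hypothetical, not something I've built):

```python
from collections import defaultdict

def build_index(notes):
    """notes: {path: body}. Maps each stemmed keyword to the notes containing it."""
    index = defaultdict(set)
    for path, body in notes.items():
        for word in set(extract_keywords(body)):
            index[word].add(path)
    return index

# At query time, only score the candidate files instead of all of them:
# candidates = set().union(*(index.get(k, set()) for k in keywords))
```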
2. Better stemming
As I said, the stemmer rules are hardcoded, so some vocabulary may not hit. In the future we could improve this by letting the LLM handle it.
3. Phrase matching
Right now keywords are matched independently. If I search "machine learning", the system looks for machine and learning separately. A note that mentions machine in one paragraph and learning in another would score the same as one that talks about ML throughout.
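One cheap improvement would be a phrase bonus added on top of the per-keyword score. A sketch, with a made-up weight:

```python
def phrase_bonus(body, query, weight=3):
    # Reward notes that contain the raw query as a contiguous phrase.
    return weight * body.lower().count(query.lower().strip())
```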
4. Typo tolerance
Type "pytohn" and you get nothing. Fuzzy matching with something like wrod distance approach for the keyword lookup would help. It’s a nice-to-have.
5. Smarter snippet selection
The current snippet just grabs 300 chars around the first keyword hit. That works but it could be smarter — score each paragraph by keyword density and show the most relevant passage, not just the first match. Kind of like how Google shows excerpts.
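A sketch of that idea, scoring paragraphs by keyword density (the names are mine):

```python
def best_paragraph(body, keywords, fallback_length=300):
    # Score each paragraph by keyword density and return the densest one.
    paragraphs = [p for p in body.split("\n\n") if p.strip()]

    def density(paragraph):
        words = max(len(paragraph.split()), 1)
        hits = sum(paragraph.lower().count(k) for k in keywords)
        return hits / words

    return max(paragraphs, key=density, default=body[:fallback_length])
```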
6. Cache the last N results
Good to have. Search results for the same query don’t need to be recomputed every time.
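A sketch of a small LRU cache keyed on the raw query string. It would also need invalidating whenever a note is written, which isn't shown here:

```python
from collections import OrderedDict

class SearchCache:
    """Tiny LRU cache for search results, keyed by the raw query string."""

    def __init__(self, max_size=32):
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, query):
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)       # mark as most recently used
        return self._entries[query]

    def put(self, query, results):
        self._entries[query] = results
        self._entries.move_to_end(query)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```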