TerryFunggg Blog

AI Note: Ditching RAG for Grep

You can read about this in my earlier post, My AI note system, where I mention that for searching notes I reached for a vector database. I had ChromaDB set up. But after a while I ripped it out and replaced it with grep.

Why I dropped RAG

Running a vector database, keeping an embedding model warm, and indexing every note on write is a lot to ask from a $5 VPS. The thing would just fall over. I want this tool to be lightweight enough to run comfortably on the smallest server possible, and RAG was not that. I also just wanted simplicity.

And when I did some research, I found that grep is actually what a lot of AI companies use. Tools like Cursor, GPT, and Copilot all do code/doc file search with grep-style tools.

How the current grep search works

Step 1 — Keyword extraction

Before grepping anything, we will tokenize the query:

```python
words = re.split(r'[^\w]+', query)
```

It tokenises on non-word characters, lowercases everything, drops stop words and then runs a simple suffix stemmer on each word.

The stemmer handles the basics:

  • running → run (strips ing)
  • noted → note (strips ed)
  • notes → note (strips s)

Currently the stemmer rules are hardcoded. I think this could be improved by handing keyword parsing to the LLM, which would probably do a better job.
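Step 1 can be sketched roughly like this. The stop-word list and the exact suffix rules are assumptions on my part, not the project's real code; the stemmer here is just naive enough to reproduce the three examples above:

```python
import re

# Hypothetical stop-word list; the real one in the project may differ.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "my", "it"}

def stem(word):
    # Naive suffix stemmer matching the examples above; a real stemmer
    # (e.g. Porter) handles many more cases.
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) > 2 and word[-1] == word[-2]:
            word = word[:-1]          # running -> runn -> run
        return word
    if word.endswith("ed") and len(word) > 4:
        return word[:-1]              # noted -> note (strips just the "d")
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]              # notes -> note
    return word

def extract_keywords(query):
    # Tokenize on non-word characters, lowercase, drop stop words, stem.
    words = re.split(r"[^\w]+", query.lower())
    return [stem(w) for w in words if w and w not in STOP_WORDS]
```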

Step 2 — Scanning and scoring

This is the key and most interesting part, and there are plenty of approaches that could do it better.

The search loops over every .md file in the notes directory, sorted newest first, and scores each one:

```python
score = (body_hits / word_count * 100) + title_hits + tag_hits
```

The weights:

| Field | Weight |
| --- | --- |
| Title keyword hit | 5× per keyword |
| Tag keyword hit | 5× per keyword |
| Body keyword hit | 1× per occurrence, normalised by word count |

The body normalization is important — without it, a long note with 10 casual mentions of a word would beat a short note that’s actually about that topic. Dividing by word count and scaling by 100 keeps things comparable across note sizes.

Note: Files that score 0 are skipped entirely.
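Put together, the scoring pass looks something like the sketch below. How titles and tags are actually parsed out of each note is an assumption here; only the formula and the 5× weights come from the description above:

```python
import re

def score_note(title, tags, body, keywords):
    # Sketch of the scoring formula above; how titles and tags are
    # extracted from a note is assumed, not taken from the real code.
    body_words = [w for w in re.split(r"[^\w]+", body.lower()) if w]
    word_count = max(len(body_words), 1)
    body_hits = sum(body_words.count(k) for k in keywords)
    title_hits = 5 * sum(k in title.lower() for k in keywords)              # 5x per keyword
    tag_hits = 5 * sum(any(k in t.lower() for t in tags) for k in keywords)  # 5x per keyword
    # Normalise body hits by note length and scale by 100, so long notes
    # don't win on raw occurrence counts alone.
    return (body_hits / word_count * 100) + title_hits + tag_hits
```

A short note that is actually about a topic outscores a long note with a few stray mentions, which is exactly the point of the normalisation.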

Step 3 — Matched_on explanation

Each result gets a matched_on field that explains why it was returned — something like "tags: python, body (4 hits)" or "title". I pass this to the LLM so it understands the context of each result. Without it the LLM would sometimes doubt results that matched on tags and look for something else unnecessarily.

Step 4 — Snippet extraction

For each matched note, it tries to show a useful preview. It scans the body for the first keyword hit and returns 300 characters starting 60 characters before that hit, so you get a bit of context around the match. Falls back to the start of the note if nothing is found.
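A minimal sketch of that windowing logic, with the 300/60 character numbers taken from the description above (the function name is mine):

```python
def make_snippet(body, keywords, width=300, lead=60):
    # Find the first keyword hit, back up 60 characters for context,
    # and return a 300-character window around it.
    lower = body.lower()
    for kw in keywords:
        pos = lower.find(kw)
        if pos != -1:
            start = max(pos - lead, 0)
            return body[start:start + width]
    return body[:width]  # fall back to the start of the note
```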

Then the top n results (default 5), sorted by score descending, go back to the LLM as JSON.

What’s actually pretty good about this approach

  • Zero dependencies for the search itself. No running services, no index files to manage, no sync issues.
  • Transparent — I can read the code and understand exactly why a note was returned.
  • Fast enough — for a personal note collection (I have ~200 notes) it’s basically instant.

What I’d improve

1. Build a proper index

Right now every search scans every file. That's fine at 200 notes, but at 20,000 it'll start to feel slow. (Actually, I don't think I'll ever write that many notes, haha, so it's probably fine.)

2. Better stemming

As I said, the stemmer is hardcoded, so some vocabulary may not get a hit. In the future this could be improved by letting an LLM handle it.

3. Phrase matching

Right now keywords are matched independently. If I search "machine learning", the system looks for machine and learning separately. A note that mentions machine in one paragraph and learning in another would score the same as one that talks about ML throughout.
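One cheap way to approximate phrase matching, purely as an illustration (none of this is in the current code, and the window size and bonus values are made up), is a proximity bonus:

```python
def phrase_bonus(body_words, keywords, window=5, bonus=3):
    # Award a bonus only when every keyword occurs within `window` words
    # of some occurrence of the first keyword. Illustrative only; the
    # window and bonus values here are arbitrary.
    positions = {k: [i for i, w in enumerate(body_words) if w == k] for k in keywords}
    if any(not p for p in positions.values()):
        return 0
    for i in positions[keywords[0]]:
        if all(any(abs(j - i) <= window for j in positions[k]) for k in keywords[1:]):
            return bonus
    return 0
```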

4. Typo tolerance

Type "pytohn" and you get nothing. Fuzzy matching with something like wrod distance approach for the keyword lookup would help. It’s a nice-to-have.

5. Smarter snippet selection

The current snippet just grabs 300 chars around the first keyword hit. That works but it could be smarter — score each paragraph by keyword density and show the most relevant passage, not just the first match. Kind of like how Google shows excerpts.

6. Cache the last N results

Good to have. Search results for the same query don’t need to be recomputed every time.
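Since the whole search is a pure function of the query (as long as the notes haven't changed), functools.lru_cache would be a one-line way to get this. The corpus and search body below are stand-ins, not the real code:

```python
from functools import lru_cache

NOTES = {"grep.md": "grep beats rag", "rag.md": "rag needs a vector db"}  # stand-in corpus

@lru_cache(maxsize=32)  # remember the last 32 distinct queries
def cached_search(query):
    # Stand-in for the real scan-and-score pass. Results must be hashable
    # (hence the tuple), and the cache should be invalidated with
    # cached_search.cache_clear() whenever a note changes on disk.
    q = query.lower()
    return tuple(name for name, body in NOTES.items() if q in body)
```

The cache_clear() hook matters: without it, editing a note would silently serve stale results for repeated queries.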