Idea: Crawlee-friendly “docs → clean Markdown for RAG” pipeline (example + config) #3302
Hi @HEDELKA, and thank you for using Crawlee! We'll be happy to accept a PR with such an example. The location depends on the format of the article: either a blog post (https://github.com/apify/crawlee/tree/master/website/blog/2025) or a documentation guide (https://github.com/apify/crawlee/tree/master/docs/guides) is a viable option. If you're going end-to-end from crawling to integration with LangChain/..., a blog post would probably be the best fit. Would you mind joining us on Discord and going over the details with @souravjain540?
@HEDELKA I am waiting for your reply on Discord :) Cheers!
Hi Crawlee community!
I’m building a docs-focused scraper that outputs clean Markdown for RAG pipelines (navigation stripped, code blocks preserved), with metadata + optional chunking.
Proposal: publish a Crawlee-friendly example pipeline:
Questions:
I can share a minimal working config and a small sample output to validate the approach.
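To make the "code blocks preserved" and "optional chunking" parts concrete, here is a minimal, dependency-free sketch of what the chunking step could look like. It is not Crawlee API and not the actual implementation: the names `Chunk`, `chunk_markdown`, and `max_chars` are hypothetical, and it assumes the page has already been converted to Markdown with navigation stripped. It splits at ATX headings and, once a chunk grows past `max_chars`, at blank lines, but never inside a fenced code block.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str      # page the chunk came from (metadata for the RAG index)
    heading: str  # nearest heading above the chunk, "" if none
    text: str     # Markdown body with code fences kept intact


def chunk_markdown(markdown: str, url: str, max_chars: int = 1200) -> list[Chunk]:
    """Split Markdown at headings and blank lines, never inside a code fence.

    Hypothetical helper for illustration only; max_chars is a soft limit,
    so a single long code block can still exceed it.
    """
    chunks: list[Chunk] = []
    buffer: list[str] = []
    heading = ""
    in_fence = False

    def flush() -> None:
        # Emit the buffered lines as one chunk, skipping empty buffers.
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(url=url, heading=heading, text=text))
        buffer.clear()

    for line in markdown.splitlines():
        # Toggle fence state on ``` lines so code is never split.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            buffer.append(line)
            continue
        if not in_fence:
            is_heading = line.startswith("#") and line.lstrip("#").startswith(" ")
            if is_heading:
                # A heading always starts a new chunk.
                flush()
                heading = line.lstrip("#").strip()
            elif line.strip() == "" and sum(len(l) + 1 for l in buffer) >= max_chars:
                # Chunk is large enough: cut at this paragraph break.
                flush()
                continue
        buffer.append(line)
    flush()
    return chunks
```

Each returned `Chunk` carries the source URL and nearest heading, so the same records could be written out as Markdown files with front matter or passed to an embedding step.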