Universal web content extraction — any URL to LLM-ready markdown. HTML, YouTube, PDF, DOCX.
Verification confirms publisher identity (repo ownership), not code safety. The security scan covers known CVEs and suspicious install scripts — it cannot prove the absence of malicious code.
Universal web content extraction: any URL to LLM-ready markdown.

- HTML: BeautifulSoup + content density filtering (removes nav, sidebar, ads)
- YouTube: transcript extraction with timestamps
- PDF: text extraction with page structure
- DOCX: paragraph and heading extraction
- Auto-fallback: tries lightweight httpx first, falls back to Playwright for JS-heavy pages
- Async-first: built on httpx and Playwright async APIs

Optional extras for specific content types:

For HTML pages, if the initial httpx…
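The auto-fallback idea from the feature list (lightweight fetch first, browser rendering only when needed) can be sketched roughly as follows. This is an illustration, not the package's actual code: `fetch_with_fallback`, `visible_text_length`, the `light`/`heavy` fetcher parameters, and the `min_chars` threshold are all hypothetical names and values chosen for the example, and the "too little visible text" check is an assumed heuristic. The fetchers are passed in as plain async callables so the sketch runs with the standard library alone.

```python
import asyncio
import re
from typing import Awaitable, Callable, Tuple

# Hypothetical type for this sketch: any async callable that maps a URL to HTML.
Fetcher = Callable[[str], Awaitable[str]]


def visible_text_length(html: str) -> int:
    """Rough count of visible characters: drop script/style blocks, then all tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return len(" ".join(text.split()))


async def fetch_with_fallback(
    url: str,
    light: Fetcher,
    heavy: Fetcher,
    min_chars: int = 200,  # assumed threshold, not taken from the package
) -> Tuple[str, str]:
    """Try the lightweight fetcher; use the heavy one if it fails or looks empty.

    Returns the HTML together with a label saying which fetcher produced it.
    """
    try:
        html = await light(url)
        if visible_text_length(html) >= min_chars:
            return html, "light"
    except Exception:
        pass  # any fetch error: fall through to the heavy fetcher
    return await heavy(url), "heavy"


async def _demo() -> None:
    # Stub fetchers standing in for an HTTP client and a headless browser.
    async def js_shell(url: str) -> str:
        return "<div id='root'></div><script>render()</script>"

    async def rendered(url: str) -> str:
        return "<p>content rendered by a browser</p>"

    html, source = await fetch_with_fallback("https://example.com", js_shell, rendered)
    print(source, visible_text_length(html))


if __name__ == "__main__":
    asyncio.run(_demo())
```

In a real setup, `light` could wrap an `httpx.AsyncClient` GET request and `heavy` could load the page in Playwright and return `page.content()`; keeping both behind the same callable signature means the fallback logic can be tested without network access or a browser.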