LLM Scraper is a TypeScript library that turns unstructured web page content into structured data using Large Language Models (LLMs). It relies on LLM function calling to get output that matches a schema you define, and it works with hosted providers such as OpenAI and Groq as well as local models served through Ollama or loaded from GGUF files. A key feature is code generation: the library can produce reusable Playwright scripts tailored to a specific data schema, so the same extraction can be repeated efficiently and consistently. Developers looking for a flexible web data extraction tool also benefit from its streaming support, built on the Vercel AI SDK, and from its active open-source community.
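To make the function-calling idea concrete, here is a minimal, library-free TypeScript sketch of the general pattern: the desired data shape is written as a JSON-Schema-style object, offered to the model as a tool's parameters, and the arguments the model returns are parsed and validated before use. This is not LLM Scraper's API. `extract`, `validate`, `htmlToText`, the `extract_data` tool name, and the regex-based `stubModel` are all hypothetical stand-ins so the example runs offline, and the model call is synchronous only to keep the sketch short.

```typescript
// Library-free sketch of schema-driven extraction via LLM function calling.
// Hypothetical throughout: these names are NOT LLM Scraper's API, and the
// "model" is a regex stub so the example runs offline.

type FieldType = "string" | "number";

// A minimal JSON-Schema-style description of the fields we want.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: FieldType }>;
  required: string[];
}

// Function-calling APIs generally take a tool name, a description, and a
// JSON Schema for the arguments the model should produce.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: ObjectSchema;
}

// The model receives page text plus a tool and returns the tool-call
// arguments as a JSON string. Real model calls are async; this one is
// synchronous for brevity.
type Model = (pageText: string, tool: ToolDefinition) => string;

// Crude tag stripping so the model sees text rather than markup.
function htmlToText(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Never trust model output blindly: check it against the schema.
function validate(value: unknown, schema: ObjectSchema): Record<string, string | number> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("expected an object");
  }
  const obj = value as Record<string, unknown>;
  for (const key of schema.required) {
    if (!(key in obj)) throw new Error(`missing field: ${key}`);
  }
  const out: Record<string, string | number> = {};
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (!(key in obj)) continue; // optional field not supplied
    const v = obj[key];
    if (typeof v !== spec.type) throw new Error(`field ${key} is not a ${spec.type}`);
    out[key] = v as string | number;
  }
  return out;
}

// Offer the schema as a tool, then parse and validate the returned arguments.
function extract(html: string, schema: ObjectSchema, model: Model): Record<string, string | number> {
  const tool: ToolDefinition = {
    name: "extract_data",
    description: "Extract the requested fields from the page.",
    parameters: schema,
  };
  return validate(JSON.parse(model(htmlToText(html), tool)), schema);
}

// --- Usage with a stub model ---
const productSchema: ObjectSchema = {
  type: "object",
  properties: { title: { type: "string" }, price: { type: "number" } },
  required: ["title", "price"],
};

// Stand-in for a real LLM: "reads" text shaped like "<title> $<price>".
const stubModel: Model = (pageText) => {
  const m = pageText.match(/^(.*) \$(\d+(?:\.\d+)?)$/);
  return JSON.stringify(m ? { title: m[1], price: Number(m[2]) } : {});
};

const pageHtml = '<article><h1>Blue Mug</h1><span class="price">$12.50</span></article>';
const result = extract(pageHtml, productSchema, stubModel);
console.log(result); // { title: 'Blue Mug', price: 12.5 }
```

Passing the schema as the tool's `parameters` is what makes function calling useful here: the model is steered toward arguments of the right shape, and the `validate` step catches the cases where it still gets them wrong.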
The library provides a solid foundation for data extraction and suits data scientists, web developers, and AI enthusiasts who need a dependable way to convert HTML pages into structured data automatically. Because it is open source, the community can contribute improvements and keep the tool in step with emerging best practices. Support for multiple LLM providers, combined with streamed results, gives users flexibility both in choosing between hosted and local models and in consuming output as it is generated.