A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing.
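The "systematic browsing" in this definition is usually a graph traversal. The sketch below shows a minimal breadth-first crawl: fetch a page, extract its links, resolve them to absolute URLs, and queue any URL not seen before. It is an illustration under stated assumptions, not how any of the frameworks listed here work internally. The `crawl` and `LinkExtractor` names and the in-memory `PAGES` dictionary are hypothetical and stand in for real HTTP fetching. A production crawler would also download pages over the network, honor robots.txt, and throttle its requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url.

    fetch(url) is a caller-supplied (hypothetical) function that returns
    the page's HTML, or None if the page cannot be retrieved.
    Returns the URLs that were fetched, in visit order.
    """
    seen = {start_url}           # URLs already queued, so none is queued twice
    queue = deque([start_url])   # FIFO queue gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue             # unreachable page: skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is visited once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web" used in place of real HTTP requests.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="https://example.com/">home</a>',
    "https://example.com/b": '<a href="c">C</a>',
    "https://example.com/c": "no links here",
}

if __name__ == "__main__":
    print(crawl("https://example.com/", PAGES.get))
```

Running the script prints `['https://example.com/', 'https://example.com/a', 'https://example.com/b', 'https://example.com/c']`. Marking a URL as seen when it is queued, rather than when it is fetched, keeps duplicates out of the queue, and the `max_pages` cap bounds the crawl.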

📂Crawling Frameworks

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python · 38.4 k

pyspider

A powerful spider (web crawler) system in Python.

Python · 13.91 k

colly

Elegant Scraper and Crawler Framework for Golang

Go · 12.05 k

scrapy-redis

Redis-based components for Scrapy.

Python · 3.19 k

📂Spider Application

SinaSpider

Sina Weibo crawler (Scrapy, Redis)

Python · 2.87 k

p2pspider

DHT Spider + BitTorrent Client = P2P Spider

Go · 2.81 k
Python · 2.46 k

webporter

A Java crawler application built on webmagic

Java · 2.09 k

scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python · 38.4 k

huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

Ruby · 29.83 k

annie

👾 Fast, simple and clean video downloader

Go · 12.61 k

colly

Elegant Scraper and Crawler Framework for Golang

Go · 12.05 k

proxy_pool

Proxy IP pool for Python crawlers

Python · 10.61 k

newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:

Python · 10.09 k

webmagic

A scalable web crawler framework for Java.

Java · 9.01 k

examples-of-web-crawlers

Some very interesting Python crawler examples that are beginner-friendly, mainly crawling sites such as Taobao, Tmall, WeChat, Douban, and QQ.

Python · 8.5 k

twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

Python · 8.32 k

crawlab

Distributed web crawler admin platform for managing spiders regardless of language or framework.

Go · 6.72 k

© 2020 GitHub Index