This is a blog about web crawling (also known as web scraping), with everything done in the Python programming language whenever possible. It is the result of a severe lack of public resources on web crawling and is intended to cover web crawling as an educational subject.
Web crawling has a bad reputation, and discussions about it are often avoided in public because it's a misunderstood medium, heavily damaged by big corporations that want the benefits of public data without keeping it public.
The purpose of this blog is to cover the full range of web crawling subjects: from crawling methods to avoiding crawler detection and reverse engineering public websites, as well as specific interesting cases.
Hi, I'm Bernard, better known as Granitosaurus on Stack Exchange, and I'm very passionate about web crawling and Python. I've been working with web crawling for over 5 years now and I still learn new things every day! I'm most active on Stack Overflow, on web-scraping subjects, and I love to help people out whenever I have the chance.
I'm currently writing a book on web-crawling projects for Packt Publishing; it should be out by the end of 2019!
You can find out more about me through the links in the footer. I'm currently living in Chiang Mai, Thailand, so if you want to grab a beer, feel free to email me :)
This blog is open source and publicly available at https://gitlab.com/granitosaurus/crawl.blog
If you're interested in contributing or fixing my mistakes, feel free to open an issue/MR or just email me.