Building an HTML parser can be time-consuming and often complicated. parsel is a Python library for parsing HTML with CSS or XPath selectors, while parselcli is an interactive shell wrapper around it.

Disclaimer: I wrote parselcli

In short, parselcli lets you do this:

At the moment (version 0.31), parselcli has these features:

My Workflow

I built parselcli to get a faster and more convenient workflow for building HTML parsers.

The first thing I do is find the product or item with the highest coverage, meaning the one that fills in the most fields. For example, if I'm crawling a clothing shop, I look for a product that has all of the fields, such as variants, colours, sizes and multiple prices. Shoes often meet these criteria.
This becomes my genesis HTML, which I use to build my parser.
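Picking the genesis page is a judgement call, but if you already have rough field data for a few candidate pages, the choice boils down to "most non-empty fields wins". The sketch below is hypothetical: the URLs, the field names and the `coverage`/`pick_genesis` helpers are illustrative and are not part of parselcli.

```python
from __future__ import annotations

# Hypothetical field data for three candidate pages (illustrative URLs)
candidates = {
    "https://shop.example/socks": {"title": "Socks", "price": "5", "sizes": [], "colours": []},
    "https://shop.example/t-shirt": {"title": "Tee", "price": "10", "sizes": ["M", "L"], "colours": []},
    "https://shop.example/shoes": {
        "title": "Runner",
        "price": "80",
        "old_price": "100",
        "sizes": ["41", "42"],
        "colours": ["red", "blue"],
        "variants": ["red-41", "blue-42"],
    },
}


def coverage(fields: dict) -> int:
    """Count the fields that are present and not empty."""
    return sum(1 for value in fields.values() if value not in (None, "", [], {}))


def pick_genesis(pages: dict[str, dict]) -> str:
    """Return the URL whose page fills in the most fields."""
    return max(pages, key=lambda url: coverage(pages[url]))


genesis = pick_genesis(candidates)
print(genesis)  # https://shop.example/shoes
```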

I pass this URL to the parsel command and cache it with --cache in case I need to run it again later.

$ parsel "" --cache
using cached version
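With --cache, the page only has to be downloaded once. I'm not describing how parselcli stores its cache internally here; this is just a minimal sketch of the general idea, using a hypothetical `fetch_cached` helper that keys cached pages on a SHA-256 hash of the URL. The fetch function can be swapped out, so the usage example runs without network access.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_live(url: str) -> str:
    """Download a page over the network."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_cached(
    url: str, cache_dir: Path, fetch: Callable[[str], str] = fetch_live
) -> tuple[str, bool]:
    """Return (html, was_cached), downloading only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any URL becomes a safe, fixed-length file name
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8"), True
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html, False


# Usage with a fake fetcher, so no network access is needed
calls = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "<h1>Herman Melville - Moby-Dick</h1>"


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("https://example.com/book", Path(tmp), fetch=fake_fetch)
    second = fetch_cached("https://example.com/book", Path(tmp), fetch=fake_fetch)

print(first[1], second[1], len(calls))  # False True 1
```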

Afterwards, if the website works without JavaScript, I use the -view command to open the page source in my browser, or the -open command to open the live URL in my browser.
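If you're curious what a -view style command boils down to, here is a rough standard-library approximation: write the HTML to a temporary .html file and hand its file:// URI to the default browser. `view_source` is a hypothetical helper, not parselcli's implementation. Opening the live URL, as -open does, would be a plain `webbrowser.open(url)`.

```python
import tempfile
import webbrowser
from pathlib import Path


def view_source(html: str, open_browser: bool = True) -> Path:
    """Save HTML to a temporary .html file and optionally open it in a browser."""
    # delete=False keeps the file around after closing so the browser can read it
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html)
    path = Path(handle.name)
    if open_browser:
        # as_uri() turns the absolute path into a file:// URL
        webbrowser.open(path.as_uri())
    return path


# open_browser=False only writes the file, which is handy on a headless machine
saved = view_source("<h1>Herman Melville - Moby-Dick</h1>", open_browser=False)
print(saved.suffix)  # .html
```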

In my browser (Chromium or qutebrowser) I open the inspector, click around the HTML and work out my selectors: CSS where possible, XPath when something more complicated is required.
I then try these ideas in the parsel shell to see what they return.

> h1::text
['Herman Melville - Moby-Dick']

If I'm satisfied I save the selectors in my crawler code and move on.

Quite simple really!

Contributions Are Welcome!

Parselcli is still in fairly early development, but I've been using it in production for a while now. Any contributions are welcome!
