Parsing HTML with Go using stream processing
Over a month ago I created a handy tool that builds sitemaps containing only URLs that match specific rules. I want to talk about the problems and challenges I faced, but mainly to explain the art of stream processing. Data scraping, and more precisely web scraping, involves parsing documents that are often quite slow to access. More often than not the flow looks like this: make a request, parse the whole body into html....