Internet crawling has several use cases, but the basic idea is this: collect information from a diverse pool of websites and use it without having to search each site by hand. But what for? You can turn that data into information that supports a decision, either your own or your visitors'.

For example, say you want to buy a pair of sneakers, but you don't know which store offers the best price or sale. In this case, you can crawl a group of store websites that sell sneakers, collect the price from each one, and see which store gives you the best deal. (Have you ever heard of Google Shopping? https://www.google.com/shopping?hl=es). To automate this, you can create a program that visits each page, fills in the fields, makes a few clicks, retrieves the information, and saves it. But how easy is it?
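The sneaker comparison above could be sketched roughly like this in Python. This is only a minimal illustration under some assumptions: the store names and HTML snippets are made up, and we assume each store marks its price with an element whose class is "price". Real stores use very different markup, and a real crawler would download the pages instead of reading them from a dictionary.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of the first element whose class is "price" (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # Start capturing only if no price has been found yet.
        if self.price is None and dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            # Assumes prices look like "$59.99".
            self.price = float(data.strip().lstrip("$"))
            self._in_price = False


def extract_price(html):
    """Returns the price found in the page, or None if there is none."""
    parser = PriceParser()
    parser.feed(html)
    return parser.price


def best_deal(pages):
    """pages maps store name -> HTML; returns the (store, price) pair with the lowest price."""
    prices = {store: extract_price(html) for store, html in pages.items()}
    prices = {store: price for store, price in prices.items() if price is not None}
    return min(prices.items(), key=lambda item: item[1])


# Hypothetical store pages; a real crawler would download these (e.g. with urllib.request).
SAMPLE_PAGES = {
    "store-a": '<div><span class="price">$64.90</span></div>',
    "store-b": '<div><span class="price">$59.99</span></div>',
    "store-c": '<div><p>Out of stock</p></div>',
}

print(best_deal(SAMPLE_PAGES))
```

With these sample pages, store-b comes out as the cheapest, and store-c is skipped because no price was found on its page.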

It can be challenging, because the difficulty depends on the technologies each website uses. Depending on its purpose, a website may or may not use technologies that make access to the data difficult: asking for a username and password, blocking your IP if you make more than a certain number of requests in a given period, or even something as simple as adding a reCAPTCHA element. This happens because some businesses depend on you visiting and using their site directly, so they actively work to keep you from crawling it. Others do not know how to block crawlers, or do not even realize that they need to, and in that case crawling can be easy.
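One way to avoid running into request limits is to crawl politely: respect the site's robots.txt and leave a pause between requests. The sketch below uses Python's standard urllib.robotparser for the first part and a small helper for the second. The Throttle class, the robots.txt content, the URLs, the "my-crawler" name, and the 0.2 second interval are all assumptions made for illustration. It does not get around logins or reCAPTCHA, and every site sets its own limits.

```python
import time
from urllib.robotparser import RobotFileParser


class Throttle:
    """Waits so that consecutive calls are at least min_interval seconds apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def allowed_urls(robots_txt, user_agent, urls):
    """Returns the URLs that the given robots.txt text permits for user_agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in urls if rules.can_fetch(user_agent, url)]


# Hypothetical robots.txt and URLs, inlined so the sketch runs without network access.
ROBOTS = """User-agent: *
Disallow: /checkout/
"""
URLS = [
    "https://shop.example/sneakers",
    "https://shop.example/checkout/cart",
]

throttle = Throttle(min_interval=0.2)
for url in allowed_urls(ROBOTS, "my-crawler", URLS):
    throttle.wait()  # space out requests
    print("would fetch", url)  # a real crawler would download the page here
```

Here only the sneakers page would be fetched, because the sample robots.txt disallows everything under /checkout/.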

But to know how easy it is going to be, you need to spend time and effort analyzing each website and working out the best way to get the information you require. Does this give you a business idea?

If you are interested in learning more about web crawling, and considering the current Coronavirus situation, please contact us and we can give you support with zero expectations…

admin
