This article explains how the two worker types differ in behavior and how to choose the right one for your project.
What is the difference between Browser Worker and Code Worker?
- Browser workers:
  - Simulate a user's interaction with the website via a headless browser
  - Handle complex scraping tasks such as filling in forms and loading dynamic content
  - Are more expensive to use in terms of CPM (Cost Per Thousand page loads)
- Code workers:
  - Work by sending HTTP requests to the target website
  - Are roughly equivalent to running curl or Python's `requests.get(url)`
  - Are much cheaper than browser workers
  - Only work in situations that don't require interacting with the website UI
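To make the comparison concrete, here is a minimal sketch of the kind of plain HTTP fetch a code worker is roughly equivalent to. It is an illustration only, not the platform's implementation: it uses Python's standard `urllib.request` in place of `requests`, and the function name `fetch_html` and the `User-Agent` value are assumptions made for this example.

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL with a single request and return the body as text.

    Hypothetical helper: no JavaScript runs and no UI interaction happens,
    so content that only appears after clicks or scrolling is not returned.
    """
    # The User-Agent value is an arbitrary placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "example-code-worker/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

If the data you need is already present in the text returned by a fetch like this, a code worker is likely sufficient.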
How to choose the optimal worker type for your scraper
You should choose the right worker type based on the technology used by the website you want to scrape, and the navigation needed for scraping the data you need.
It's good to start with the cheaper code workers and switch to a browser worker only if you find that you can't get the data you want. A browser worker is needed in these cases:
- If you need to click on an element to load more data
- If you need to scroll to load more elements
- If you need to use tag_script or tag_response (capture network traffic from inside the browser)
- If you need to type text on the website, for example to run a search
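The checklist above can be sketched as a small decision helper. This is only an illustration of the reasoning: `ScrapeNeeds` and `choose_worker_type` are hypothetical names invented for this example and are not part of the library.

```python
from dataclasses import dataclass


@dataclass
class ScrapeNeeds:
    """Hypothetical description of what the scraper must do on the site."""
    click_to_load: bool = False    # click an element to load more data
    scroll_to_load: bool = False   # scroll to load more elements
    capture_network: bool = False  # use tag_script / tag_response
    type_text: bool = False        # type text, e.g. to run a search


def choose_worker_type(needs: ScrapeNeeds) -> str:
    """Return "browser" if any UI interaction is required, else "code"."""
    requires_browser = any((
        needs.click_to_load,
        needs.scroll_to_load,
        needs.capture_network,
        needs.type_text,
    ))
    # Default to the cheaper code worker when no interaction is needed.
    return "browser" if requires_browser else "code"
```

For example, `choose_worker_type(ScrapeNeeds(scroll_to_load=True))` returns `"browser"`, while a scraper with no interaction needs stays on `"code"`.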
Your code should be aligned with the worker type
Some functions in our library are only available when using browsers and will throw an error if you try to use them from code workers.
Below is a list of functions that you can only use from browser workers:
- wait_* (any wait function)
- scroll_* (any scroll function)
- tag_* (any tag function)
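As a rough illustration of the rule above, the following sketch flags function names that match the browser-only prefixes, which could help when reviewing a scraper before running it on a code worker. The helper names are hypothetical, and apart from `tag_script` and `tag_response` the function names in the usage note are made up for the example; the real library raises its own error at call time.

```python
# Prefixes taken from the list above: wait_*, scroll_*, tag_*.
BROWSER_ONLY_PREFIXES = ("wait_", "scroll_", "tag_")


def is_browser_only(function_name: str) -> bool:
    """Return True if the name matches a browser-only prefix."""
    return function_name.startswith(BROWSER_ONLY_PREFIXES)


def browser_only_calls(function_names):
    """Return the names that would need a browser worker, in input order."""
    return [name for name in function_names if is_browser_only(name)]
```

For instance, `browser_only_calls(["parse_page", "tag_script", "scroll_down"])` returns `["tag_script", "scroll_down"]`, so a scraper using those calls would need a browser worker.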