IDE - API Documentation

Available commands

navigate

Navigate the browser to a URL. A 404 status code will throw a dead_page error by default. Use opt.allow_status to override this.

Examples

navigate(<url>);
navigate(input.url);
navigate('https://example.com');
navigate(<url>, {wait_until: 'domcontentloaded'}); // waits until DOM content loaded event is fired in the browser
navigate(<url>, {referer: <url>}); // adds a referer to the navigation
navigate(<url>, {timeout: 45000}); // the number of milliseconds to wait for. Default is 30000 ms
// Don't throw an error if this URL sends a 404 status code
navigate(<url>, {allow_status: [404]});
// Specify browser width/height
navigate(<url>, {
    fingerprint: {screen: {width: 400, height: 400}},
});

Arguments

url string | URL required
A URL to navigate to
'https://example.com'
new URL("https://example.com")
options object optional
navigate options
{
  "timeout": 20000
}
{
  "wait_until": "domcontentloaded"
}
{
  "allow_status": [
    404
  ]
}
allow_status array | string optional
Allow navigation to URLs with these status codes
'all'
[
  404,
  /400/
]
timeout number optional
Maximum number of milliseconds to wait for the navigation to complete.

Default: 30000

20000
wait_until array | string optional

Default: [ "domcontentloaded" ]

'domcontentloaded'
'navigate'
'load'
'networkidle0'
'networkidle2'
fingerprint object optional deprecated
Specify browser fingerprint
{
  "screen": {
    "width": 400,
    "height": 400
  }
}
{
  "hide_scrollbars": true
}
{
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}
hide_scrollbars boolean optional
Hide scrollbars in the browser
user_agent string optional
Specify browser user agent
action_delay number optional
Action delay
screen object optional
Specify browser screen size
width number required
height number required
device_scale_factor number optional
solve_captcha boolean optional
referer string optional

Return value

Type: undefined
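
The allow_status values shown above ('all', or an array mixing numbers and regular expressions) can be read as a simple matching rule. The sketch below is one plausible interpretation, not the IDE's actual implementation: status_allowed is a hypothetical helper, and testing a regular expression against the status code as a string is an assumption.

```javascript
// Hypothetical helper, not part of the IDE API.
// Decides whether a status code is covered by an allow_status value.
// Assumption: RegExp entries are tested against the status code as a string.
function status_allowed(status, allow_status) {
    if (allow_status === 'all')
        return true;
    if (!Array.isArray(allow_status))
        return false;
    return allow_status.some(rule=>
        rule instanceof RegExp ? rule.test(String(status)) : rule === status);
}

console.log(status_allowed(404, [404])); // true
console.log(status_allowed(400, [404, /400/])); // true
console.log(status_allowed(500, [404, /400/])); // false
console.log(status_allowed(503, 'all')); // true
```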

close_popup

Popups can appear at any time during a crawl, and it's not always clear when you should wait for or close them. Add close_popup() at the top of your code to start a background watcher that closes the popup whenever it appears. If a popup appears multiple times, it is closed each time.

Examples

close_popup('.popup', '.popup_close');
close_popup('iframe.with-popup', '.popup_close', {click_inside: 'iframe.with-popup'});

Arguments

popup_selector string required
A valid CSS selector
close_btn_selector string required
A valid CSS selector
options object optional
close_popup options
click_inside string optional
An iframe selector which contains the close button selector

input

Global object available to the interaction code. Provided by trigger input or next_stage() calls

Examples

navigate(input.url);

tag_html

Save the HTML of the current page under a tagged field

Examples

tag_html('html');
tag_html('html', {iframe: true});

Arguments

name string required
The name of the tagged field. Must be alphanumeric. You can access the value in interaction code using wait_for_parser_value('some_key'). To access the value in parser code, use parser['some_key'] or parser.some_key. The URL of the page is also available as parser['some_key_url'] or parser.some_key_url (the field name with a _url suffix).
options object optional
tag options
iframe boolean optional
Set to true to get the html of all iframes in the page
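
The naming rule for tagged fields can be sketched as a small helper. This is only an illustration: tagged_field_keys is a hypothetical function, and accepting underscores is an assumption based on example names in this document such as 'my_graphql' and 'sitemap_key'.

```javascript
// Hypothetical helper, not part of the IDE API.
// Assumption: underscores are accepted in addition to letters and digits,
// because examples in this document use names such as 'my_graphql'.
function tagged_field_keys(name) {
    if (!/^[A-Za-z0-9_]+$/.test(name))
        throw new Error(`Invalid tag name: ${name}`);
    // tag_html stores the page URL next to the value, under <name>_url.
    return {value_key: name, url_key: `${name}_url`};
}

console.log(tagged_field_keys('html'));
// { value_key: 'html', url_key: 'html_url' }
```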

tag_request

Make a direct HTTP request

Examples

tag_request('req1', 'http://www.example.com');
tag_request('req1', 'http://www.example.com');
let response_body = wait_for_parser_value('req1');
tag_request('req1', {
    url: 'http://www.example.com',
    method: 'POST',
    headers: {'Content-type': 'application/json; charset=utf-8'},
    body: {hello: 'world'},
})

Arguments

name string required
The name of the tagged field. Must be alphanumeric. You can access the value in interaction using wait_for_parser_value('some_key'). To access the value in parser code use parser['some_key'] or parser.some_key.
url_or_object string | URL | object required
The URL to make the request to, or a request options object (see examples)

tag_response

Save the response data from a browser request

Examples

tag_response(<field>, <pattern>, <options>);
tag_response('resp', /url/, {jsonp: true});
tag_response('resp', /url/, {allow_error: true});
tag_response('resp', (req, res)=>{
    if (req.url.includes('/api/'))
    {
        let request_body = req.body;
        let request_headers = req.headers;
        let response_body = res.body;
        let response_headers = res.headers;
    }
});
tag_response('teams', /\/api\/teams/);
navigate('https://example.com/sports');

Arguments

name string required
The name of the tagged field. Must be alphanumeric. You can access the value in interaction using wait_for_parser_value('some_key'). To access the value in parser code use parser['some_key'] or parser.some_key.
pattern RegExp | function required
The URL pattern to match. Can be a regex or a function that takes a request and response and returns a boolean
options object optional
Set options.jsonp=true to parse response bodies that are in jsonp format. This will be automatically detected when possible
jsonp boolean optional
Set to true to parse response bodies that are in jsonp format.
json boolean optional
Set to true to parse response bodies that are in json format.
allow_error boolean optional
Set to true to allow responses with status code >= 400 to be saved as null
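
When the pattern is a function, it is described above as returning a boolean. A minimal sketch of such a predicate follows. The function name is hypothetical, and it assumes that req.url is a string and that res.headers is a plain object keyed by lowercase header names.

```javascript
// Hypothetical predicate for the function form of tag_response.
// Assumptions: req.url is a string and res.headers is a plain object
// keyed by lowercase header names.
function is_teams_api_json(req, res) {
    const is_api = /\/api\/teams/.test(req.url);
    const content_type = (res.headers && res.headers['content-type']) || '';
    return is_api && content_type.includes('application/json');
}

// In the IDE this would be passed as: tag_response('teams', is_teams_api_json);
const sample_req = {url: 'https://example.com/api/teams?page=1'};
const sample_res = {headers: {'content-type': 'application/json; charset=utf-8'}};
console.log(is_teams_api_json(sample_req, sample_res)); // true
```

Keeping the predicate as a named function makes it easy to check against sample request and response objects before using it in a crawl.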

tag_graphql

Capture GraphQL requests, replay them with changed variables, and save their responses

Examples

let q = tag_graphql('my_graphql', {
    payload: {id: 'ProfileQuery'},
    // you may need to pass url opt as RegExp in case when graphql
    // endpoint is not "*/graphql" which is default value
    // url: /\bgraphql\b|\bgql\b/ // default
});
navigate('https://example.com');
let [first_query, first_response] = q.wait_captured();
let second = q.replay({
    variables: {other_id: 2},
});
// in parser
console.log(parser.my_graphql);
// [{data}, {data}]
let q = tag_graphql('my_graphql', {
    payload: {id: 'SearchQuery'},
    // you may need to pass url opt as RegExp in case when graphql
    // endpoint is not "*/graphql" which is default value
    // url: /\bgraphql\b|\bgql\b/ // default
});
navigate('https://example.com');
if (!q.is_captured())
    click('#load_more');
let [first_query, first_response] = q.wait_captured();
let second = q.replay({
    variables: {other_id: 2},
});
// in parser
let profiles = parser.my_graphql.map(v=>v.data.profile);

Arguments

name string required
The name of the tagged field. Must be alphanumeric. You can access the value in interaction using wait_for_parser_value('some_key'). To access the value in parser code use parser['some_key'] or parser.some_key.
options object optional
Params to control graphql request to capture (see examples)
url RegExp optional
payload object | array optional
json payload to send with the graphql

tag_all_responses

Save the responses from all browser requests that match the pattern

Examples

tag_all_responses(<field>, <pattern>, <options>);
tag_all_responses('resp', /url/, {jsonp: true});
tag_all_responses('resp', /url/, {allow_error: true});
tag_all_responses('profiles', /\/api\/profile/);
navigate('https://example.com/sports');
let profiles = wait_for_parser_value('profiles');

// parser code
return {profiles: parser.profiles};

Arguments

field string required
The name of the tagged field. Must be alphanumeric. You can access the value in interaction using wait_for_parser_value('some_key'). To access the value in parser code use parser['some_key'] or parser.some_key.
pattern RegExp required
The URL pattern to match
options object optional
Set options.jsonp=true to parse response bodies that are in jsonp format. This will be automatically detected when possible
jsonp boolean optional
Set to true to parse response bodies that are in jsonp format.
json boolean optional
Set to true to parse response bodies that are in json format.
allow_error boolean optional
Set to true to allow responses with status code >= 400 to be saved as null

Return value

Type: undefined

tag_script

Extract some JSON data saved in a script on the page

Examples

tag_script(<field>, <selector>);
tag_script('teams', '#preload-data');
tag_script('ssr_state', '#__SSR_DATA__');
navigate('https://example.com/');

Arguments

name string required
The name of the tagged field. Must be alphanumeric. You can access the value in interaction using wait_for_parser_value('some_key'). To access the value in parser code use parser['some_key'] or parser.some_key.
selector string required
A selector of the script to tag

tag_window_field

Tag a javascript value from the browser page

Examples

tag_window_field(<field>, <key>);
tag_window_field('initData', '__INIT_DATA__');

Arguments

field string required
The name of the tagged field. Must be alphanumeric. You can access the value in interaction using wait_for_parser_value('some_key'). To access the value in parser code use parser['some_key'] or parser.some_key.
key string required
The path to the relevant data

tag_sitemap

Save a list of urls from a sitemap xml (supports sitemap indexes and .gz compressed sitemaps. See examples.)

Examples

tag_sitemap('sitemap_key', {url: 'https://example.com/sitemap.xml.gz'});
// in parser:
let {pages} = parser.sitemap_key;
tag_sitemap('sitemap_key', {url: 'https://example.com/sitemap-index.xml'});
// in parser:
let {children} = parser.sitemap_key;

Arguments

field string required
The name of the tagged field. Must be alphanumeric. You can access the value in interaction using wait_for_parser_value('some_key'). To access the value in parser code use parser['some_key'] or parser.some_key.
sitemap_options object required
Object containing url of sitemap xml
url string | URL required

Return value

Type: object

wait_for_parser_value

Wait for a parser field to contain a value. This can be useful after you click something to wait for some data to appear

Examples

wait_for_parser_value(<field>[, <validate_fn>][, opt]);
wait_for_parser_value('profile');
navigate('https://example.com');
tag_html('page1')
let html = wait_for_parser_value('page1');
let $ = load_html(html);
let title = $('title').text();
wait_for_parser_value('listings.0.price', v=>{
    return parseInt(v)>0;
}, {timeout: 5000});

Arguments

field string required
The parser value path to wait on
validate_fn function | object optional
An optional callback function to validate that the value is correct
opt object optional
Extra options
timeout number optional
Timeout in milliseconds
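
The field argument is a path such as 'listings.0.price'. The sketch below shows one way a dotted path and an optional validation callback could be evaluated against a parser object. Both helpers are hypothetical and only illustrate the idea; they do not describe how the IDE resolves paths internally.

```javascript
// Hypothetical helpers, not part of the IDE API.
// Walks a dotted path such as 'listings.0.price' through nested data.
function get_path(obj, path) {
    return path.split('.').reduce(
        (cur, key)=>cur == null ? undefined : cur[key], obj);
}

// True once the path holds a value that passes the optional validator.
function is_ready(parser, path, validate_fn) {
    const value = get_path(parser, path);
    if (value === undefined)
        return false;
    return validate_fn ? Boolean(validate_fn(value)) : true;
}

const parser_snapshot = {listings: [{price: '120'}, {price: '0'}]};
console.log(get_path(parser_snapshot, 'listings.0.price')); // prints 120
console.log(is_ready(parser_snapshot, 'listings.0.price', v=>parseInt(v)>0)); // true
console.log(is_ready(parser_snapshot, 'listings.1.price', v=>parseInt(v)>0)); // false
console.log(is_ready(parser_snapshot, 'listings.5.price')); // false
```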

load_html

Load an HTML string and return a Cheerio instance

Examples

load_html(html())
load_html('<div>Text</div>');

Arguments

html string required
HTML string

load_more

Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites

Examples

load_more(<selector>);
load_more('.search-results');
load_more('.search-results', {children: '.result-item', trigger_selector: '.btn-load-more', timeout: 10000});

Arguments

selector string required
Selector for the element that contains the lazy-loaded items
options object optional
load_more options
direction string optional
scroll direction, default is 'end'
inline string optional
same as native el.scrollIntoView({inline})
children string optional
Selector for the children of the lazy-loaded items
trigger_selector string optional
Selector for the element that triggers loading more items
timeout number optional
Maximum time to wait for the loading to complete

Default: 30000

click

Click on an element (will wait for the element to appear before clicking on it)

Examples

click(<selector>);
click('#show-more');
$('#show-more').click()
// Click the closest match to the passed coordinates
// (relative to the page).
// For example, clicking the center pin in a map
let box = bounding_box('#map')
let center = {x: (box.left+box.right)/2, y: (box.top+box.bottom)/2};
click('.map-pin', {coordinates: center});

Arguments

selector string | array | object | any required
Element selector
timeout_options object optional
Timeout options
timeout number optional
Timeout in milliseconds. Time to wait for element to appear
visible boolean optional
Wait for element to be visible

Default: true

coordinates object optional
Click coordinates on the target element. Default is the center of the element
x number required
X coordinate
y number required
Y coordinate
click_options object optional
Click options
delay number optional
Delay in milliseconds
button string optional
Mouse button to use

Default: left

clickCount number optional
Number of clicks

Default: 1

timeout number optional
Timeout in milliseconds. Time to wait for click

Default: 60000

animation_timeout number optional
Timeout in milliseconds. Time to wait for animation to finish

Default: 5000

drag_dx number optional
Drag distance in pixels
drag_dy number optional
Drag distance in pixels
steps number optional
Number of steps for drag

hover

Hover over an element (will wait for the element to appear before hovering over it). An alias for click(selector, {}, {clickCount: 0})

Examples

hover(<selector>);
hover('#item');

Arguments

selector string required
Element selector
timeout_options object optional
Timeout options, same as "click" command
click_options object optional
Click options, same as "click" command

select

Pick a value from a select element

Examples

select(<selector>, <value>);
select('#country', 'Canada');

Arguments

selector string | array required
Element selector
value string optional
Value to select
options object optional
Select options
timeout number optional
Timeout in milliseconds

type

Enter text into an input (will wait for the input to appear before typing)

Examples

type(<selector>, <text>);
type('#location', 'New York');
type(<selector>, <text>, {replace: true}); // replacing text in input if it is not empty
type('[id$=input-box]', <text>); // type text to an element with id ending "input-box" (e.g. <input id="c2E57-input-box">)
type(<selector>, ['Enter']); // dispatching 'Enter' key press
type(<selector>, ['Some text', 'Enter']); // typing text and then dispatching 'Enter' key press
type(<selector>, ['Backspace']); // deleting 1 char from input

Arguments

selector string | array required
Element selector
text string | array optional
Text to enter. Can be a string or an array of strings
options object optional
Type options
timeout number optional
Timeout in milliseconds
replace boolean optional
Replace the text in the input if it is not empty

press_key

Type special characters like Enter or Backspace in the currently focused input (usually used after typing something in a search box)

Examples

press_key(<key>);
press_key('Enter');
press_key('Backspace');

Arguments

key string required
Key to press

wait

Wait for an element to appear on the page

Examples

wait(<selector>, <options>);
wait('#welcome-splash');
wait('.search-results .product');
wait('[href^="/product"]');
wait(<selector>, {timeout: 5000});
wait(<selector>, {hidden: true});

Arguments

selector string | number | array | function | object required
Element selector
options object optional
wait options (see examples)
{timeout: 5000}
{hidden: true}
{visible: false}
timeout number optional
Timeout in milliseconds
timeout_message string optional
Error message on timeout

Default: timeout

hidden boolean optional
Wait for element to be hidden
visible boolean optional
Wait for element to be visible
inside string optional
Iframe selector if target element is inside an iframe
iframe#my-iframe
inside_nested array optional
[
  "iframe#parent",
  "iframe#child"
]

wait_hidden

Wait for an element to not be visible on the page (removed or hidden)

Examples

wait_hidden(<selector>);
wait_hidden('#welcome-splash');
wait_hidden(<selector>, {timeout: 5000});

Arguments

selector string | number | function required
Element selector
options object optional
wait options (see examples)

wait_visible

Wait for an element to be visible on the page. Alias for wait(selector, {visible: true})

Examples

wait_visible(<selector>);
wait_visible('#welcome-splash');
wait_visible(<selector>, {timeout: 5000});

Arguments

selector string | number | function required
Element selector
options object optional
wait options (see examples)

wait_for_text

Wait for an element on the page to include some text

Examples

wait_for_text(<selector>, <text>);
wait_for_text('.location', 'New York');
wait_for_text('.location', /new york/i, {timeout: 5000});

Arguments

selector string required
Element selector
text string | object required
The text to wait for
options object optional
wait options (see examples)

wait_network_idle

Wait until the browser network has been idle for a given time

Examples

wait_network_idle();
wait_network_idle({
    timeout: 1e3,
    ignore: [/long_request/, 'https://example.com'],
});

Arguments

options object optional
options for wait_network_idle
timeout number optional
Wait for browser network to be idle for X milliseconds
whitelist RegExp | string | array optional deprecated
An array of patterns to exclude requests from monitoring
ignore RegExp | string | array optional
An array of patterns to exclude requests from monitoring

wait_page_idle

Wait until no changes are being made on the DOM tree for a given time

Examples

wait_page_idle();
wait_page_idle(5000);
wait_page_idle({
    ignore: [<selector1>, <selector2>],
    idle_timeout: 1000,
});

Arguments

timeout_or_options number | object optional
Milliseconds to wait for no changes
options object optional
An object that can accept an ignore argument to exclude some elements from monitoring
timeout number optional
idle_timeout number optional
ignore array optional
whitelist array optional

el_exists

Check if an element exists on page, and return a boolean accordingly

Examples

el_exists('#example');
el_exists('.does_not_exist'); // => false
el_exists('.does_not_exist', 5e3); // => false (after 5 seconds)

Arguments

selector string required
Element selector
timeout number optional
Timeout duration to wait for the element to appear on the page. Default is 2 seconds
options object optional
Wait options

Return value

Type: boolean
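
Because el_exists returns a boolean instead of throwing, it fits optional elements. The sketch below combines it with click and wait_hidden to dismiss a banner only when one is present. The ide object is a hypothetical stand-in that lets the sketch run outside the IDE (inside the IDE these commands are globals), and the selectors are made up for illustration.

```javascript
// `ide` is a hypothetical object exposing the IDE commands, used only so the
// sketch can run outside the IDE. Inside the IDE these are global functions.
// The selectors are made up for illustration.
function dismiss_banner(ide) {
    if (!ide.el_exists('#cookie-banner', 3e3))
        return false; // nothing to dismiss
    ide.click('#cookie-banner .accept');
    ide.wait_hidden('#cookie-banner');
    return true;
}

// Minimal fake that records calls, standing in for the real commands.
function make_fake_ide(banner_present) {
    const calls = [];
    return {
        calls,
        el_exists: (sel, timeout)=>{
            calls.push(['el_exists', sel, timeout]);
            return banner_present;
        },
        click: sel=>{ calls.push(['click', sel]); },
        wait_hidden: sel=>{ calls.push(['wait_hidden', sel]); },
    };
}

const fake = make_fake_ide(true);
console.log(dismiss_banner(fake)); // true
console.log(fake.calls.map(c=>c[0])); // [ 'el_exists', 'click', 'wait_hidden' ]
```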

el_is_visible

Check if element is visible on page

Examples

el_is_visible('#example');
el_is_visible('.is_not_visible', 5e3);

Arguments

selector string required
Element selector
timeout number optional
Timeout duration to wait for the element to be visible on the page. Default is 2s

Return value

Type: boolean

next_stage

Run the next stage of the crawler with the specified input

Examples

next_stage({url: 'http://example.com', page: 1});
next_stage({category_id: 15}, 'self');

Arguments

input object required
Input object to pass to the next browser session
type string optional
When "self" is provided, it will rerun the same stage
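
A common use of next_stage is to fan out one input per results page. The sketch below only builds the input objects, so it can run anywhere. page_inputs is a hypothetical helper, and the name of the page query parameter is an assumption about the target site.

```javascript
// Hypothetical helper: builds one input object per results page.
// The `page` query parameter name is an assumption about the target site.
function page_inputs(base_url, last_page) {
    const inputs = [];
    for (let page = 1; page <= last_page; page++) {
        const u = new URL(base_url);
        u.searchParams.set('page', String(page));
        inputs.push({url: u.href, page});
    }
    return inputs;
}

const inputs = page_inputs('https://example.com/search?q=shoes', 3);
console.log(inputs[2]);
// { url: 'https://example.com/search?q=shoes&page=3', page: 3 }

// In the IDE, each object would be handed to the next stage:
// for (const input of inputs) next_stage(input);
```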

rerun_stage

Run this stage of the crawler again with new input

Examples

rerun_stage({url: 'http://example.com/other-page'});

Arguments

input object required
Input object to pass to the next browser session

Return value

Type: undefined

run_stage

Run a specific stage of the crawler with a new browser session

Examples

run_stage(2, {url: 'http://example.com', page: 1});

Arguments

stage number required
Which stage to run (1 is first stage)
input object required
Input object to pass to the next browser session

dead_page

Mark a page as a dead link so you can filter it from your future collections (error_code=dead_page)

Examples

dead_page('Product was removed');
dead_page();

Arguments

message string optional
A specific error message

Default: Dead page detected

bad_input

Mark the collector input as bad. Will prevent any crawl retries (error_code=bad_input)

Examples

bad_input();
bad_input('Missing search term');

Arguments

message string optional
A specific error message

blocked

Mark the page as failed because of the website refusing access (error_code=blocked)

Examples

blocked();
blocked('Login page was shown');

Arguments

message string optional
A specific error message

detect_block

Detects a block on the page

Examples

detect_block({selector: '.foo'}, {exists: true});
detect_block({selector: '.bar'}, {has_text: 'text'});
detect_block({selector: '.baz'}, {has_text: /regex_pattern/});

Arguments

Selector object required
Object with a selector field
{
  "selector": ".foo"
}
selector string required
Condition object required
Object with a condition; at least one of 'exists' or 'has_text' is required
{
  "exists": true
}
{
  "has_text": "text"
}
{
  "has_text": /regex_pattern/
}
exists boolean optional
Whether the element should exist
has_text string | RegExp optional
Whether the element should have text
'text'
/[a-z]/

country

Configure your crawl to run from a specific country

Examples

country(<code>);
country('us');

Arguments

country_code string required
2-character ISO country code

proxy_location

Configure your crawl to run from a specific location. Unless you need high resolution control over where your crawl is running from, you probably want to use `country(code)` instead

Examples

proxy_location({country: 'us'});
proxy_location({lat: 37.7749, long: -122.4194});
proxy_location({lat: 37.7749, long: -122.4194, country: 'US', radius: 100});

Arguments

configuration object optional
Object with a desired proxy location, check examples for more info
lat number optional
Latitude in range [-85, 85]
long number optional
Longitude in range [-180, 180]
country string optional
2-character ISO country code
radius number optional
Radius in KM
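
The documented ranges can be checked before calling proxy_location, for example when coordinates come from input. The sketch below is a hypothetical pre-check, not part of the IDE API. The requirement that radius be positive is an assumption.

```javascript
// Hypothetical pre-check, not part of the IDE API.
// Returns a list of problems; an empty list means the object looks valid.
function check_proxy_location(cfg) {
    const problems = [];
    if (cfg.lat !== undefined && !(cfg.lat >= -85 && cfg.lat <= 85))
        problems.push('lat must be within [-85, 85]');
    if (cfg.long !== undefined && !(cfg.long >= -180 && cfg.long <= 180))
        problems.push('long must be within [-180, 180]');
    if (cfg.country !== undefined && !/^[A-Za-z]{2}$/.test(cfg.country))
        problems.push('country must be a 2-character code');
    // Assumption: a radius, when given, must be a positive number.
    if (cfg.radius !== undefined && !(cfg.radius > 0))
        problems.push('radius must be a positive number of kilometers');
    return problems;
}

console.log(check_proxy_location(
    {lat: 37.7749, long: -122.4194, country: 'US', radius: 100})); // []
console.log(check_proxy_location({lat: 95, long: 200}).length); // 2
```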

scroll_to

Scroll the page so that an element is visible. If you're doing this to trigger loading more elements from a lazy-loaded list, use load_more() instead. By default the page scrolls in a natural way, which may take several seconds. To jump immediately, use {immediate: true}.

Examples

scroll_to(<selector>);
scroll_to('.author-profile');
scroll_to('top'); // scroll to the top of the page
scroll_to('bottom'); // scroll to the bottom of the page
scroll_to('top', {immediate: true}); // jump to top of page immediately

Arguments

selector string required
Selector for the element you want to scroll to
options object optional deprecated
scroll options
immediate boolean optional
duration any optional

verify_requests

Monitor failed requests with a callback function

Examples

verify_requests(({url, error, type, response})=>{
    if (response.status!=404 && type=='Font')
        throw new Error('Font failed to load');
});

Arguments

callback function required
A function which will be called on each failed request with an object in format: {url, error, type, response}
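
The callback can be written as a standalone function and checked with sample objects before it is passed to verify_requests. The sketch below is illustrative: the function name is hypothetical, the type value 'XHR' is an assumed example, and it assumes response may be missing when a request fails before any response arrives.

```javascript
// Callback in the shape documented above: {url, error, type, response}.
// Assumption: `response` can be missing when the request failed before any
// response arrived, so it is read defensively.
function on_failed_request({url, error, type, response}) {
    const status = response ? response.status : undefined;
    if (type == 'Font' || status == 404)
        return; // tolerated failures
    throw new Error(`Request failed: ${url} (${status ?? error})`);
}

// In the IDE: verify_requests(on_failed_request);
on_failed_request({url: 'https://example.com/a.woff', type: 'Font'}); // ignored
try {
    on_failed_request({
        url: 'https://example.com/api',
        type: 'XHR', // assumed type name, for illustration only
        error: 'net::ERR_FAILED',
    });
} catch (e) {
    console.log(e.message);
    // Request failed: https://example.com/api (net::ERR_FAILED)
}
```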

console

Log messages from the interaction code

Examples

console.log(1, 'brightdata', [1, 2], {key: 'value'});
console.error(1, 'brightdata', [1, 2], {key: 'value'});

set_session_cookie

Sets a cookie with the given cookie data; may overwrite equivalent cookies if they exist

Examples

set_session_cookie(<domain>, <name>, <value>);

Arguments

domain string required
Cookie domain
name string required
Cookie name
value string required
Cookie value

set_session_headers

Set extra headers for all the HTTP requests

Examples

set_session_headers({'HEADER_NAME': 'HEADER_VALUE'});

Arguments

headers object required
Object with extra headers in key-value format

click_at

Move the mouse to the specified (x,y) position and click

Examples

click_at(<x position>, <y position>);
click_at(100, 100);

Arguments

target_x_position number required
Target x position
target_y_position number required
Target y position
click_options object optional
Click options
delay number optional
Delay in milliseconds
button string optional
Mouse button to use

Default: left

clickCount number optional
Number of clicks

Default: 1

timeout number optional
Timeout in milliseconds. Time to wait after move before click

Default: 300

html

Get the HTML of the current page

Examples

html();
html({iframe: true});

Arguments

options object optional
html options
iframe boolean optional
Set to true to get the html of all iframes in the page, an array will be returned

solve_captcha

Solve any captchas shown on the page

Examples

solve_captcha();
solve_captcha({type: 'simple', selector: '#image', input: '#input'});
solve_captcha({type: 'hcaptcha', url: 'https://hcaptcha.com', captcha_key: '{hash_string}'});

URL

URL class from NodeJS standard "url" module

Examples

let u = new URL('https://example.com');

Arguments

url optional
URL string
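
Since navigate accepts either a string or a URL object, the URL class is a convenient way to build query strings without manual escaping. The short sketch below uses only the standard URL and URLSearchParams behavior; the search path and parameter names are made up for illustration.

```javascript
// Build a search URL with query parameters using the standard URL class.
// The path and parameter names are made up for illustration.
const search_url = new URL('https://example.com/search');
search_url.searchParams.set('q', 'new york');
search_url.searchParams.set('page', '2');

console.log(search_url.href); // https://example.com/search?q=new+york&page=2
console.log(search_url.hostname); // example.com

// In the IDE, the URL object can be passed directly: navigate(search_url);
```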

location

Object with info about current location. Available fields: href

Examples

navigate('https://example.com');
location.href; // "https://example.com/"

capture_graphql

Capture and replay graphql requests with changed variables

Examples

let q = capture_graphql({
    payload: {id: 'ProfileQuery'},
    // you may need to pass url opt as RegExp in case when graphql
    // endpoint is not "*/graphql" which is default value
    // url: /\bgraphql\b|\bgql\b/ // default
});
navigate('https://example.com');
let [first_query, first_response] = q.wait_captured();
let second = q.replay({
    variables: {other_id: 2},
});
let q = capture_graphql({
    payload: {id: 'SearchQuery'},
    // you may need to pass url opt as RegExp in case when graphql
    // endpoint is not "*/graphql" which is default value
    // url: /\bgraphql\b|\bgql\b/ // default
});
navigate('https://example.com');
if (!q.is_captured())
    click('#load_more');
let [first_query, first_response] = q.wait_captured();
let second = q.replay({
    variables: {other_id: 2},
});

Arguments

options optional
Params to control graphql request to capture

scroll_to_all

Scroll through the page so that all the elements matching the selector will be visible on screen

Examples

scroll_to_all(<selector>);
scroll_to_all('.author-profiles');

Arguments

selector optional
Selector of the elements you want to scroll through

bounding_box

The box of coordinates that describes the area of an element (relative to the page, not the browser viewport). Only the first element matched will be measured

Examples

let box = bounding_box('.product-list');
// box == {
//   top: 10,
//   right: 800,
//   bottom: 210,
//   left: 200,
//   x: 200,
//   y: 10,
//   width: 600,
//   height: 200,
// }

Arguments

Selector optional
A valid CSS selector for the element
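
The fields of the returned box are related: x and y repeat left and top, and width and height are the differences between opposite edges. The sketch below uses the sample box from the example above to compute a center point, which is the value the click coordinates example needs. box_center is a hypothetical helper.

```javascript
// Hypothetical helper, not part of the IDE API.
function box_center(box) {
    return {x: (box.left + box.right) / 2, y: (box.top + box.bottom) / 2};
}

// Sample box copied from the example above.
const box = {
    top: 10, right: 800, bottom: 210, left: 200,
    x: 200, y: 10, width: 600, height: 200,
};
console.log(box_center(box)); // { x: 500, y: 110 }
console.log(box.right - box.left === box.width); // true
console.log(box.bottom - box.top === box.height); // true
```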

$

Helper for jQuery-like expressions

Examples

$(<selector>);
wait($('.store-card'))

Arguments

selector optional
Element selector

request

Make a direct HTTP request

Examples

let res = request('http://www.example.com');
let res = request({url: 'http://www.example.com', method: 'POST', headers: {'Content-type': 'application/json; charset=utf-8'}, body: {hello: 'world'}})

Arguments

url | options optional
the url to make the request to, or request options (see examples)
