There are many authentication schemes on the web, but two of the most common are username+password HTML forms and HTTP basic authentication.
Form-based authentication works by setting a cookie in your browser using the Set-Cookie header. Here is a full tutorial on how to use login cookies to access content behind login walls in individual APIs. Follow the same procedure to retrieve the login cookie for a crawl.
If you're using the old Diffbot dashboard to create the crawljob, place the cookie value into the Cookie field:
If you're using the new dashboard, use the "Custom headers" text field and add the Cookie as a single line, like so:
Save the crawljob and it will use this cookie when crawling.
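As a rough illustration of what "a single line" means here, the sketch below turns the Set-Cookie values returned by a login response into one Cookie header line. It assumes the "Custom headers" field accepts a standard `Cookie: name=value; name2=value2` line; the function name and the sample cookie names are hypothetical and not part of Diffbot's tooling.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Build one 'Cookie: ...' line from a list of raw Set-Cookie header values.

    Attributes such as Path or HttpOnly are dropped, because only the
    name=value pairs are sent back to the server.
    """
    pairs = []
    for raw in set_cookie_values:
        jar = SimpleCookie()
        jar.load(raw)  # parses "name=value; Path=/; HttpOnly"
        for name, morsel in jar.items():
            pairs.append(f"{name}={morsel.value}")
    return "Cookie: " + "; ".join(pairs)


# Hypothetical Set-Cookie values copied from a login response
login_cookies = [
    "sessionid=abc123; Path=/; HttpOnly",
    "csrftoken=xyz789; Path=/",
]
print(cookie_header_from_set_cookie(login_cookies))
```

If the site sets only one cookie, the resulting line simply contains a single name=value pair.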
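For HTTP Basic login, the browser sends an Authorization header computed from the username and password. The header has the format
Authorization: Basic $hash
where $hash is the Base64 encoding of the string username:password (the username and password joined by a single colon). Despite the name, $hash is a reversible encoding rather than a cryptographic hash, so treat the header value as you would the password itself.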
Once you have the Authorization header, you can supply it via the Custom Headers field in Crawlbot's UI or via the Crawlbot API to perform authenticated crawling.
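For reference, the Authorization header can be computed by hand. The following is a minimal sketch that assumes the standard HTTP Basic scheme (the Base64 encoding of the username and password joined by a colon, as defined in RFC 7617); the function name and the sample credentials are illustrative only.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the full 'Authorization: Basic ...' header line."""
    # Join the credentials with a single colon, then Base64-encode the bytes.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


# Sample credentials taken from the HTTP authentication RFC example
print(basic_auth_header("Aladdin", "open sesame"))
# prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The printed line is what would be pasted into the custom headers field. Note that a username containing a colon cannot be represented in this scheme.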