Sometimes a single site needs multiple custom rules, perhaps due to template differences or because you wish to extract different data from different types of pages.
If you’re creating a completely custom API, you can always create multiple APIs for the same site. For instance:
/api/categories for category extraction
/api/item for item extraction
These APIs could then be used where needed on the sites that have been customized.
If, however, you need to apply the same API to different parts of the same site, you can customize where your rule is in effect by tailoring your rule’s Domain Regex (URL pattern) in the API Toolkit.
By default, when you create a new rule, its Domain Regex applies the rule to the entire domain. By writing a customized regular expression, you can determine which subset of the website is affected by your rule. For example:
- Adjusting the default Domain Regex to
(http(s)?://)?(.*\.)?diffbot.com/products.* will prevent the rule from being applied unless a URL contains diffbot.com/products
- Adjusting a Domain Regex to
(http(s)?://)?(.*\.)?diffbot.com/company.* will restrict the rule to those URLs that contain diffbot.com/company
Using these different Domain Regex values will allow you to apply multiple rulesets within the same API to the same site.
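Before saving a Domain Regex, it can help to check locally which URLs it would cover. The following is a minimal sketch using Python's standard re module with the two patterns from the examples above. The rule names and sample URLs are hypothetical, and it assumes the pattern must match the whole URL; the API Toolkit's own regex engine may differ in detail, so treat the result as an approximation.

```python
import re

# Patterns copied from the two Domain Regex examples above.
PRODUCTS_REGEX = r"(http(s)?://)?(.*\.)?diffbot.com/products.*"
COMPANY_REGEX = r"(http(s)?://)?(.*\.)?diffbot.com/company.*"

# Hypothetical rule names, used only for illustration.
RULES = {
    "products-rule": re.compile(PRODUCTS_REGEX),
    "company-rule": re.compile(COMPANY_REGEX),
}


def matching_rules(url):
    """Return the names of the rules whose pattern matches the whole URL.

    Assumption: the Domain Regex is applied to the full URL. This is a
    local approximation, not the API Toolkit's actual matching logic.
    """
    return [name for name, pattern in RULES.items() if pattern.fullmatch(url)]


# Hypothetical sample URLs.
sample_urls = [
    "https://www.diffbot.com/products/automatic/",
    "http://diffbot.com/company/",
    "https://www.diffbot.com/pricing/",
]

for url in sample_urls:
    print(url, "->", matching_rules(url))
```

In this sketch each sample URL matches at most one rule, which is the situation you want when several rulesets share one API: the patterns should not overlap. Note that the unescaped dot in diffbot.com matches any character; writing diffbot\.com is stricter.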