Diffbot’s Article API attempts to concatenate (string together) multiple-page articles, returning up to twenty pages of content in a single response.
On some sites or pages, our algorithm is unable to concatenate the pages automatically. In these cases you can use the Custom API to create a
nextPage rule that provides the CSS selector of the article’s next-page link.
To do this:
- Create a new custom field named nextPage.
- Select the element that contains the link to the next page.
The Article API will subsequently use this value to concatenate up to 20 pages, creating a single text (and html) field response.
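To make the behavior concrete, here is a minimal sketch of what following a next-page link and joining the results can look like. It is not Diffbot's actual implementation: the function `concatenate_pages`, its `fetch_text` and `find_next` callbacks, and the in-memory `PAGES` data are all hypothetical names chosen for illustration. The only details taken from the text are the 20-page limit and the idea of following a next-page link.

```python
from typing import Callable, Dict, List, Optional

MAX_PAGES = 20  # page limit stated in the text above


def concatenate_pages(
    start_url: str,
    fetch_text: Callable[[str], str],
    find_next: Callable[[str], Optional[str]],
    max_pages: int = MAX_PAGES,
) -> str:
    """Follow next-page links from start_url and join the page texts."""
    seen = set()  # URLs already processed, so a page is never read twice
    parts: List[str] = []
    url: Optional[str] = start_url
    # Stop when there is no next page, a page repeats, or the limit is hit.
    while url is not None and url not in seen and len(parts) < max_pages:
        seen.add(url)
        parts.append(fetch_text(url))
        url = find_next(url)
    return "\n".join(parts)


# Hypothetical three-page article held in memory instead of fetched over HTTP.
PAGES: Dict[str, dict] = {
    "/story": {"text": "Page one.", "next": "/story?p=2"},
    "/story?p=2": {"text": "Page two.", "next": "/story?p=3"},
    "/story?p=3": {"text": "Page three.", "next": None},
}

article_text = concatenate_pages(
    "/story",
    fetch_text=lambda u: PAGES[u]["text"],
    find_next=lambda u: PAGES[u]["next"],
)
print(article_text)
```

In this sketch, the nextPage rule plays the role of `find_next`: it tells the crawler which single link to follow from each page.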
A note on tricky selectors
Sometimes sites don’t identify the next page link using unique CSS selectors (particularly on sites that have links to individually-numbered pages).
For instance, an older layout of Slate.com used the same class, .sl-art-pag-link, for all links to individual pages, even pages prior to the current page. Using this class alone could result in multiple ‘nextPage’ values and an infinite processing loop.
Our concatenation algorithm will generally prevent infinite loops and repeated content, but writing better CSS selectors will ensure the best performance. In this case, using the following selector will ensure that only the correct next page is identified:
.sl-art-curpage + .sl-art-pag-link
This uses the adjacent sibling combinator (+) to match only the page link that immediately follows the current-page element (.sl-art-curpage), so only the next page, if it exists, is identified.
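The effect of this selector can be checked with a small sketch that uses only Python's standard-library `html.parser`. The pagination markup in `SAMPLE` is an assumption made for illustration (the original Slate.com markup is not reproduced here), and `NextPageFinder` and `find_next_page` are hypothetical helpers that hard-code this one selector rather than implementing a general CSS engine. The sketch also assumes well-formed markup with matching close tags.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Elements that never have a closing tag, so they never open a new nesting level.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class NextPageFinder(HTMLParser):
    """Collect hrefs matching `.sl-art-curpage + .sl-art-pag-link`."""

    def __init__(self) -> None:
        super().__init__()
        # One entry per open element: classes of its most recent child element.
        self.prev_sibling_classes: List[List[str]] = [[]]
        self.matches: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        prev = self.prev_sibling_classes[-1]
        # Match only when the immediately preceding sibling is the current page.
        if "sl-art-curpage" in prev and "sl-art-pag-link" in classes:
            if attr.get("href"):
                self.matches.append(attr["href"])
        self.prev_sibling_classes[-1] = classes
        if tag not in VOID_TAGS:
            self.prev_sibling_classes.append([])

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.prev_sibling_classes) > 1:
            self.prev_sibling_classes.pop()


def find_next_page(html: str) -> Optional[str]:
    """Return the href of the next-page link, or None if there is none."""
    finder = NextPageFinder()
    finder.feed(html)
    finder.close()
    return finder.matches[0] if finder.matches else None


# Assumed markup: numbered page links with the current page in a span.
SAMPLE = """
<div class="pagination">
  <a class="sl-art-pag-link" href="/story?p=1">1</a>
  <span class="sl-art-curpage">2</span>
  <a class="sl-art-pag-link" href="/story?p=3">3</a>
  <a class="sl-art-pag-link" href="/story?p=4">4</a>
</div>
"""

next_page = find_next_page(SAMPLE)
print(next_page)
```

With this sample markup, selecting on .sl-art-pag-link alone would match three links, while the combined selector matches only the link to page 3.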