patch(report_news): added tericcabrel blog as source #127
xml_raw = ET.fromstring(body)
articles = []

for url in xml_raw.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url')[-MAX_ARTICLES:]:
Restricting this to MAX_ARTICLES feels like it would make us skip a bunch of articles published on that blog. Maybe we can have an upper limit of 100? MAX_ARTICLES is the maximum number of articles we want to publish in the Telegram message, not the maximum number of articles we want to choose from.
The sitemap contains a huge number of articles. Parsing all of them to extract their content would take time and could overload the server.
That's why I opted to take the last n articles.
This makes sense. However, I still want us to draw from a set larger than 10 articles; otherwise an initial run on a blog with more than 10 articles could skip potentially interesting ones.
The articles are listed from latest to oldest.
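One way to reconcile the two concerns above is to cap how many sitemap entries are considered separately from how many are published. The sketch below is only an illustration, not the PR's actual code: `MAX_CANDIDATES`, `candidate_urls`, and the value of `MAX_ARTICLES` are assumptions, and it takes entries from the start of the sitemap on the assumption (stated in this thread) that entries are listed from latest to oldest.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

MAX_ARTICLES = 10     # assumed value: max articles published in the Telegram message
MAX_CANDIDATES = 100  # hypothetical cap on how many sitemap entries are considered


def candidate_urls(body: str, limit: int = MAX_CANDIDATES) -> list[str]:
    """Return up to `limit` article URLs taken from the start of a sitemap.

    Assumes the sitemap lists entries from latest to oldest, so the first
    `limit` entries are the most recent ones.
    """
    root = ET.fromstring(body)
    urls = []
    for url in root.findall(f"{SITEMAP_NS}url")[:limit]:
        loc = url.find(f"{SITEMAP_NS}loc")
        # Skip entries without a usable <loc> element.
        if loc is not None and loc.text:
            urls.append(loc.text.strip())
    return urls
```

With this split, only the bounded candidate list would be fetched and parsed, and the final selection of at most `MAX_ARTICLES` items would happen afterwards. If the sitemap were ordered oldest first instead, the slice would need to be `[-limit:]`.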
Related issues
report_newsbot #126
Change
Added tericcabrel's blog as a source.