fix: article pages, refactoring and better test data generation
crawlers:
- Remove constructors where they are not necessary
- Move `get_or_create_source` and `get_or_create_periode` into static methods
- Use `article.fpage` and `article.lpage` when possible instead of `article.page_range`
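Preferring `article.fpage` and `article.lpage` over `article.page_range` means a crawler should split a raw page string into a first and last page whenever it can, and keep the raw range only as a fallback. The following is a minimal sketch of that logic. The `Article` dataclass and the `set_pages` helper are hypothetical stand-ins, not the project's actual classes, and the accepted separators (hyphen, en dash) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    # Hypothetical stand-in for the crawler's article object; only the
    # page-related attributes named in this change are modelled.
    fpage: str = ""
    lpage: str = ""
    page_range: str = ""


def set_pages(article: Article, pages: Optional[str]) -> None:
    """Fill fpage/lpage from a raw page string, falling back to page_range."""
    if not pages:
        return
    # Assumption: a hyphen or an en dash separates first and last page.
    parts = [p.strip() for p in pages.replace("\u2013", "-").split("-")]
    if len(parts) == 2 and all(parts):
        article.fpage, article.lpage = parts
    elif len(parts) == 1 and parts[0]:
        # Single page: only the first page is known.
        article.fpage = parts[0]
    else:
        # Anything that cannot be split cleanly is kept verbatim.
        article.page_range = pages.strip()
```

With this approach, `set_pages(a, "12-34")` fills `fpage="12"` and `lpage="34"`, while an irregular value such as `"12-34-56"` ends up in `page_range` untouched.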
cleanup_str:
- Normalize strings using Unicode `NFKC` normalization
- Remove some characters (`\xf7`, `\r`) from strings
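NFKC folds compatibility characters (ligatures such as `ﬁ`, full-width letters, no-break spaces) into their plain equivalents. It leaves `\xf7` (÷) and `\r` untouched, which is presumably why they are removed in a separate step. The following is a minimal sketch built on the standard library's `unicodedata.normalize`. The character list and the order of operations in the real `cleanup_str` are assumptions, and the real function may do more.

```python
import unicodedata

# Characters stripped after normalization; this mirrors the two named in the
# change description and is not necessarily the project's full list.
REMOVED_CHARS = ("\xf7", "\r")


def cleanup_str(value: str) -> str:
    """Sketch: normalize to NFKC, then strip unwanted characters."""
    # NFKC maps compatibility forms, e.g. "\ufb01" -> "fi", "\u00a0" -> " ".
    value = unicodedata.normalize("NFKC", value)
    for char in REMOVED_CHARS:
        value = value.replace(char, "")
    return value
```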
tests:
- Update test data
- Sort test data JSON files by key
- Allow regenerating the same dataset: use `--keep`
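Sorting keys makes the serialized JSON deterministic, so a regenerated test file should only differ from the committed one where the data itself changed. The following is a minimal sketch using the standard library's `json.dumps(..., sort_keys=True)`. The helper names `dumps_sorted` and `write_test_data`, the two-space indent, and the UTF-8 and trailing-newline choices are assumptions, not the project's actual test tooling.

```python
import json
from pathlib import Path
from typing import Any


def dumps_sorted(data: Any) -> str:
    """Serialize with keys sorted at every nesting level, plus a trailing newline."""
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def write_test_data(path: Path, data: Any) -> None:
    """Write test data so that regenerating it yields a stable, diffable file."""
    path.write_text(dumps_sorted(data), encoding="utf-8")
```

`sort_keys` applies recursively, so nested objects are ordered too. `ensure_ascii=False` keeps accented titles readable in the fixture files instead of escaping them.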
Edited by Nathan Tien You