Editorconfig & Format chapter number to %03d #30

Merged: 6 commits, Jan 5, 2017

.editorconfig (7 changes: 7 additions & 0 deletions)

```diff
@@ -0,0 +1,7 @@
+[*.py]
+indent_style = tab
+indent_size = 4
+
+[*.rst]
+indent_style = space
+indent_size = 3
```

comiccrawler/core/__init__.py (17 changes: 16 additions & 1 deletion)

```diff
@@ -637,7 +637,15 @@ def analyze_info(mission, mod):
 
 	while True:
 		duplicate = False
-		for e in reversed(mod.get_episodes(html, url)):
+
+		# format title number on analyzed
+		eps = mod.get_episodes(html, url)
+		format = mod.config.get("titlenumberformat")
+		if format:
+			for e in eps:
+				e.title = format_number(e.title, format)
+
+		for e in reversed(eps):
 			if e.url in old_urls or e.title in old_titles:
 				duplicate = True
 				continue
```
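In the first hunk the titles are formatted before the duplicate check, presumably so that freshly scraped titles are compared with previously saved (already formatted) titles in the same form. A minimal sketch of that ordering, assuming a simplified `Episode` class, a `"{:03d}"` format value and placeholder URLs (all hypothetical stand-ins, not comiccrawler's real types or config):

```python
import re
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical stand-in for comiccrawler's episode objects.
    title: str
    url: str


def format_number(title, fmt):
    """Pad every run of digits in title using a str.format template."""
    return re.sub(r"\d+", lambda m: fmt.format(int(m.group())), title)


def collect_new_episodes(scraped, old_titles, old_urls, fmt=None):
    """Format titles first, then skip episodes already known by URL or title."""
    if fmt:
        for e in scraped:
            e.title = format_number(e.title, fmt)
    new = []
    for e in reversed(scraped):
        if e.url in old_urls or e.title in old_titles:
            continue  # already saved under the same URL or formatted title
        new.append(e)
    return new


old_titles = {"第003卷"}  # titles saved by an earlier, formatted analysis
eps = [Episode("第3卷", "u3"), Episode("第4卷", "u4")]
print([e.title for e in collect_new_episodes(eps, old_titles, set(), "{:03d}")])
```

If the titles were compared before formatting, `第3卷` would not match the saved `第003卷`, so the title-based check would only help when the URL also matches.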
```diff
@@ -672,3 +680,10 @@ def analyze_info(mission, mod):
 	remove_duplicate_episode(mission)
 
 	print("Analyzing success!")
+
+def format_number(title, format):
+	"""第3卷 --> 第003卷"""
+	def replacer(match):
+		number = match.group()
+		return format.format(int(number))
+	return re.sub("\d+", replacer, title)
```

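`format_number` replaces every run of digits with `format.format(int(number))`, so the `titlenumberformat` value is expected to be a `str.format` template. The diff does not show the actual value; `"{:03d}"` below is an assumption that matches the `%03d` in the PR title. A standalone sketch of the same function, using a raw string for the regex (in a plain string literal `"\d"` is an invalid escape that newer Python versions warn about) and `fmt` instead of shadowing the built-in `format`:

```python
import re


def format_number(title, fmt):
    """Reformat every run of digits in title, e.g. 第3卷 --> 第003卷."""
    def replacer(match):
        # int() drops any existing leading zeros before re-padding.
        return fmt.format(int(match.group()))
    return re.sub(r"\d+", replacer, title)


print(format_number("第3卷", "{:03d}"))
print(format_number("第28-29話", "{:03d}"))
print(format_number("泰安路47号 第6回", "{:03d}"))
```

The last call illustrates the reviewers' concern: digits that belong to the series name (`47`) are padded as well, giving `泰安路047号 第006回`.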
Contributor Author

I think this may cause wrong replacements when the title itself contains digits.

Owner

Maybe we should try to trim the mission title from the episode title first?

Some title examples:

泰安路47号 第006回
http://www.dm5.com/manhua-taianlu-47-hao/

原作版108(7)
第28-29話
http://tw.ikanman.com/comic/9637/

Contributor Author (@kuanyui, Jan 5, 2017)

Every site has different rules, so I used a conservative pattern, `第(\d+)[話话卷]?` (第 followed by digits and an optional chapter marker 話/话 or volume marker 卷). Or should each site have its own digit formatter?
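A minimal sketch contrasting the conservative pattern with the generic `\d+` replacement, and showing how a per-site formatter could plug in. The `format_title` hook name is hypothetical (this PR adds no such API), site modules are faked with `types.SimpleNamespace`, and `"{:03d}"` is an assumed format value:

```python
import re
from types import SimpleNamespace

# Only the number directly after 第, optionally followed by 話/话/卷.
CONSERVATIVE = re.compile(r"第(\d+)([話话卷]?)")


def format_conservative(title, fmt):
    return CONSERVATIVE.sub(
        lambda m: "第" + fmt.format(int(m.group(1))) + m.group(2), title)


def format_generic(title, fmt):
    # Same behaviour as format_number in the diff: pad every run of digits.
    return re.sub(r"\d+", lambda m: fmt.format(int(m.group())), title)


def format_title(mod, title, fmt):
    # Prefer the site module's own formatter (hypothetical hook), else fall back.
    hook = getattr(mod, "format_title", None)
    if hook is not None:
        return hook(title, fmt)
    return format_generic(title, fmt)


strict_site = SimpleNamespace(format_title=format_conservative)
plain_site = SimpleNamespace()

print(format_title(strict_site, "泰安路47号 第6回", "{:03d}"))
print(format_title(plain_site, "泰安路47号 第6回", "{:03d}"))
print(format_title(strict_site, "第28-29話", "{:03d}"))
```

The trade-off: the conservative pattern protects `泰安路47号`, but it leaves `原作版108(7)` untouched and pads only the first number of a range (`第028-29話`).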

Owner

Adding a new API for formatting titles is the safest option, but let's first consider whether there is a universal solution.

With the trimming approach above (trim the mission title first, then format every number),

泰安路47号 第006回
原作版108(7)
第28-29話

becomes

泰安路47号 第006回
原作版108(007)
第028-029話

which looks good to me.
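The three results above can be reproduced with a small sketch of the trimming idea: leave the mission (series) title untouched and pad every other number. The helper name `format_episode_title`, the `"{:03d}"` template and passing the mission title as a plain string are assumptions for illustration; the second and third examples are called without a mission title, since none is given for them in the comment.

```python
import re


def format_number(title, fmt):
    # Pad every run of digits, as format_number in the diff does.
    return re.sub(r"\d+", lambda m: fmt.format(int(m.group())), title)


def format_episode_title(title, fmt, mission_title=""):
    """Pad numbers in title, leaving the first occurrence of mission_title as-is."""
    if mission_title and mission_title in title:
        head, sep, tail = title.partition(mission_title)
        return format_number(head, fmt) + sep + format_number(tail, fmt)
    return format_number(title, fmt)


print(format_episode_title("泰安路47号 第006回", "{:03d}", "泰安路47号"))
print(format_episode_title("原作版108(7)", "{:03d}"))
print(format_episode_title("第28-29話", "{:03d}"))
```

Numbers that already have three or more digits (`006`, `108`) come out unchanged, because `{:03d}` only sets a minimum width.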

Contributor Author

Take a look at this extreme example: http://www.dm5.com/manhua-bianchengnageta/

Owner

(screenshot: http://i.imgur.com/LzP609B.png)

I didn't see the problem.

Contributor Author

Sorry, my fault. I misunderstood something.