feat: add custom regular expression #10

Merged: 3 commits, Jul 31, 2023
22 changes: 22 additions & 0 deletions README.md
@@ -265,6 +265,28 @@ def main():
    md_converter.convert(md_folders, output_directory="converted")


if __name__ == "__main__":
    main()
```

### Custom Regular Expression
`imarkdown` uses a regular expression to find your images. It supports the `![](image_url)` and `<img src="image_url"/>` formats, but there are other formats that `imarkdown` cannot find.

To address this, `imarkdown` supports a custom regular expression. You can write a regular expression that matches the image URLs in your Markdown files and pass it to `MdImageConverter.convert()` through the `re_rule` argument. The following example shows how to use it.

```python
from imarkdown import MdImageConverter, LocalFileAdapter, MdFolder


def main():
    # Custom rule passed via re_rule (here identical to the default rule).
    custom_re = r"(?:!\[(.*?)\]\((.*?)\))|<img.*?src=[\'\"](.*?)[\'\"].*?>"
    adapter = LocalFileAdapter()
    md_converter = MdImageConverter(adapter=adapter)

    md_folder = MdFolder(name="mds")
    md_converter.convert(md_folder, output_directory="output_mds", re_rule=custom_re)


if __name__ == "__main__":
    main()
```
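For reference, the default rule has three capture groups: the Markdown alt text, the Markdown URL, and the `<img>` `src` value. Because `_find_img_and_replace` calls `re.findall` with this rule, each match comes back as a 3-tuple, with empty strings in the groups that did not take part in that match. The sketch below uses only the standard library and sample Markdown made up for illustration; it also shows that a reference-style image such as `![ref style][logo-ref]` is not matched by the default rule, which is the kind of case a custom `re_rule` is meant for.

```python
import re

# Default rule copied from BaseMdImageConverter.re_rule in this PR.
DEFAULT_RE_RULE = r"(?:!\[(.*?)\]\((.*?)\))|<img.*?src=[\'\"](.*?)[\'\"].*?>"

# Sample Markdown invented for this illustration.
md = """
![logo](images/logo.png)
<img src="https://example.com/a.png" width="200"/>
![ref style][logo-ref]
"""

matches = re.findall(DEFAULT_RE_RULE, md)
for alt, md_url, html_src in matches:
    # Markdown images fill alt/md_url; <img> tags fill html_src only.
    print(repr(alt), repr(md_url), repr(html_src))
```

Since the groups are positional, a custom rule that keeps the same three-group layout is probably the safest choice; the PR does not document what layout a custom rule must have, so treat that as an assumption.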
26 changes: 26 additions & 0 deletions README_zh.md
@@ -270,6 +270,30 @@ if __name__ == "__main__":
    main()
```

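### Custom Regular Expression

`imarkdown` uses a regular expression to recognize image URLs. It currently supports two image URL formats, `![](image_url)` and `<img src="image_url"/>`. If your image URLs are written in an unusual way, the default regular expression of `imarkdown` may fail to recognize them.

In that case, you can define your own regular expression that recognizes your images and pass it to `imarkdown`. The following example shows how to use a custom regular expression to find images.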

```python
from imarkdown import MdImageConverter, LocalFileAdapter, MdFolder


def main():
    # Custom rule passed via re_rule (here identical to the default rule).
    custom_re = r"(?:!\[(.*?)\]\((.*?)\))|<img.*?src=[\'\"](.*?)[\'\"].*?>"
    adapter = LocalFileAdapter()
    md_converter = MdImageConverter(adapter=adapter)

    md_folder = MdFolder(name="mds")
    md_converter.convert(md_folder, output_directory="output_mds", re_rule=custom_re)


if __name__ == "__main__":
    main()
```
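As one possible custom rule, the sketch below handles Markdown images that carry an optional title, such as `![chart](images/chart.png "Monthly sales")`. With the default rule the title ends up inside the URL group; the hypothetical `CUSTOM_RE_RULE` stops the URL at whitespace and skips the quoted title. It deliberately keeps the default rule's three-group layout (alt, Markdown URL, `<img>` `src`), on the assumption that `imarkdown` reads the `re.findall` tuples by position.

```python
import re

# Default rule copied from this PR, for comparison.
DEFAULT_RE_RULE = r"(?:!\[(.*?)\]\((.*?)\))|<img.*?src=[\'\"](.*?)[\'\"].*?>"

# Hypothetical custom rule: same three groups, but the Markdown URL group
# stops at whitespace and an optional "title" before ")" is not captured.
CUSTOM_RE_RULE = (
    r'(?:!\[(.*?)\]\((\S*?)(?:\s+"[^"]*")?\))'
    r"|<img.*?src=[\'\"](.*?)[\'\"].*?>"
)

md = '![chart](images/chart.png "Monthly sales")'

print(re.findall(DEFAULT_RE_RULE, md))  # URL group still contains the title
print(re.findall(CUSTOM_RE_RULE, md))   # URL group holds only the path
```

Such a rule would then be passed as `re_rule=CUSTOM_RE_RULE` to `md_converter.convert(...)`, as in the example above.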


## Development Plan

- [ ] Add client support
@@ -282,6 +306,8 @@ if __name__ == "__main__":
- [ ] Support custom file naming
- [ ] Support custom formatted naming of images
- [ ] Build a PDF converter
- [ ] Support replacing other Markdown elements


## FAQ

23 changes: 20 additions & 3 deletions imarkdown/converter.py
@@ -89,6 +89,9 @@ class BaseMdImageConverter(BaseModel):
md_file_output_directory: Optional[str] = None
"""The storage directory of converted markdown file."""
converted_md_file_name: Optional[str] = None
"""The converted markdown file name."""
re_rule: str = r"(?:!\[(.*?)\]\((.*?)\))|<img.*?src=[\'\"](.*?)[\'\"].*?>"
"""Default regular expression used to find images; it can be overridden with a custom re_rule."""

@root_validator(pre=True)
def variables_check(
@@ -167,6 +170,7 @@ def convert(
image_local_storage_directory: Optional[str] = None,
output_md_directory: Optional[str] = None,
is_local_images: Optional[bool] = None,
re_rule: Optional[str] = None,
**kwargs,
):
"""Convert Markdown image url and generate a new Markdown file.
@@ -176,7 +180,8 @@ def convert(
image_local_storage_directory(Optional[str]): Specified image storage path. You can pass an absolute or a
relative path. Default image directory path is the Markdown directory named `markdown_dir/images`.
output_md_directory(Optional[str]): The storage directory of converted markdown file.
is_local_images:
re_rule(Optional[str]): Custom regular expression used to find images. If given, it replaces the current re_rule.
is_local_images: Whether the images are local images.
**kwargs:
enable_rename(bool): Default is true, it means the generated markdown file will receive a new name.
name_prefix(Optional[str]): Prefix name of generated markdown file.
@@ -189,6 +194,9 @@ def convert(
return
if is_local_images:
self.is_local_images = is_local_images
if re_rule:
logger.debug(f"[imarkdown] reset regular expression <{re_rule}>")
self.re_rule = re_rule

self.set_converted_md_file_name(md_file_path, **kwargs)
self.set_md_file_original_directory(md_file_path)
@@ -212,7 +220,7 @@ def convert(
_write_data(converted_md_path, modified_data)
logger.info(f"[imarkdown] <{md_file_path}> converted task end")

def _find_img_and_replace(self, md_str: str) -> str:
def _find_img_and_replace(self, md_str: str, re_rule: Optional[str] = None) -> str:
"""Input original markdown str and replace images address
It can find `[]()` type image url and `<img/>` type image url

@@ -223,7 +231,7 @@ def _find_img_and_replace(self, md_str: str) -> str:
Markdown data for the image url has been changed.
"""
_images = re.findall(
r"(?:!\[(.*?)\]\((.*?)\))|<img.*?src=[\'\"](.*?)[\'\"].*?>", md_str
self.re_rule, md_str
)

images = []
@@ -311,6 +319,15 @@ def convert(
enable_save_images: bool = True,
**kwargs,
):
"""Convert Markdown images.

Args:
mediums(Union[MdFile, MdFolder, List[Union[MdFile, MdFolder]]]): MdFile or MdFolder you need to convert.
output_directory(Optional[str]): Output directory for the converted files.
enable_save_images(bool): Whether to save the images.
**kwargs:
re_rule(Optional[str]): Custom regular expression used to find specific elements, such as images.
"""
def check_warning(medium: Union[MdFile, MdFolder]):
if not output_directory and isinstance(medium, MdFolder):
raise ValueError(
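Because `convert()` runs `if re_rule: self.re_rule = re_rule`, a rule passed in one call is stored on the converter instance and would still apply to later calls that omit `re_rule`, while an empty string is ignored because it is falsy. The sketch below models only that branch with a hypothetical `SketchConverter` class; it is not `imarkdown`'s class, and whether the stored rule survives across `MdImageConverter.convert()` calls depends on how that method builds its per-file converter, which this diff does not show.

```python
from typing import Optional

# Default rule copied from this PR.
DEFAULT_RE_RULE = r"(?:!\[(.*?)\]\((.*?)\))|<img.*?src=[\'\"](.*?)[\'\"].*?>"
# Hypothetical custom rule used only for this illustration.
IMG_ONLY_RULE = r"<img.*?src=[\'\"](.*?)[\'\"].*?>"


class SketchConverter:
    """Hypothetical stand-in for the re_rule handling in BaseMdImageConverter.convert()."""

    def __init__(self) -> None:
        self.re_rule: str = DEFAULT_RE_RULE

    def convert(self, re_rule: Optional[str] = None) -> str:
        # Same branch as in this PR: a truthy value overwrites the
        # rule stored on the instance.
        if re_rule:
            self.re_rule = re_rule
        # Return the rule this call would hand to re.findall.
        return self.re_rule


conv = SketchConverter()
first = conv.convert(re_rule=IMG_ONLY_RULE)  # custom rule is stored
second = conv.convert()                      # no rule passed: custom rule still used
empty = conv.convert(re_rule="")             # "" is falsy, so it is ignored
```

Under this reading, restoring the default after a custom call means passing the default rule explicitly or using a fresh converter instance.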
4 changes: 2 additions & 2 deletions imarkdown/schema.py
@@ -18,7 +18,7 @@
class BaseMdMedium(BaseModel):
name: str
"""markdown medium name"""
absolute_path_name: str
absolute_path_name: str = ""
"""path + name"""
image_directory: Optional[str] = None
"""image storage path if it exists"""
@@ -32,7 +32,7 @@ class BaseMdMedium(BaseModel):


class MdFile(BaseMdMedium):
absolute_path: str
absolute_path: str = ""
"""absolute path of markdown file"""

def update_config(self, **kwargs):
2 changes: 1 addition & 1 deletion setup.py
@@ -21,7 +21,7 @@

setuptools.setup(
name="imarkdown",
version="1.1.2",
version="1.2.1",
author="Zeeland",
author_email="zeeland@foxmail.com",
description="A practical Markdown image url converter",