Releases: Joungkyun/libchardet
1.0.6
Security Issues
In 1.0.5 and earlier, the chardet and chardet_r APIs could access heap memory that had already been freed. This bug has been fixed. (#18) Thanks to @gaoxiang-ut
Changes:
- fixed #9 configure.ac needs subdir-objects
- fixed #10 autogen failure because AM_PROG_AR with automake 1.11.1
- fixed #12 No include guard
- fixed #13 bom member has been added to the DetectObj structure
  - New Unicode language models: BOCU-1, GB-18030, SCSU, UTF-1, UTF-7, UTF-EBCDIC
    diff --git a/src/chardet.h b/src/chardet.h
    index 84975a3..f603a37 100644
    --- a/src/chardet.h
    +++ b/src/chardet.h
    @@ -89,6 +89,7 @@ extern "C" {
     typedef struct DetectObject {
         char * encoding;
         float confidence;
    +    short bom;
     } DetectObj;
     CHARDET_API char * detect_version (void);
    #ifdef CHARDET_BOM_CHECK
        printf ("#1 %s : %s : %f : %d\n", string, obj->encoding, obj->confidence, obj->bom);
    #else
        printf ("#1 %s : %s : %f\n", string, obj->encoding, obj->confidence);
    #endif
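The release notes do not spell out how the bom member is filled in. As a rough illustration of what a byte order mark check involves, here is a minimal standalone sketch that does not use libchardet at all. The function name `sketch_bom_length` and its return convention are assumptions made for this example, not part of the library's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of libchardet: returns the length in bytes
 * of a leading byte order mark, or 0 if the buffer does not start with one. */
size_t sketch_bom_length(const unsigned char *buf, size_t len) {
    static const struct { size_t n; unsigned char b[4]; } marks[] = {
        /* 4-byte marks come first: the UTF-32LE BOM begins with the
         * same two bytes as the UTF-16LE BOM. */
        { 4, { 0xFF, 0xFE, 0x00, 0x00 } },  /* UTF-32LE */
        { 4, { 0x00, 0x00, 0xFE, 0xFF } },  /* UTF-32BE */
        { 3, { 0xEF, 0xBB, 0xBF, 0x00 } },  /* UTF-8 (last byte unused) */
        { 2, { 0xFF, 0xFE, 0x00, 0x00 } },  /* UTF-16LE (last two unused) */
        { 2, { 0xFE, 0xFF, 0x00, 0x00 } },  /* UTF-16BE (last two unused) */
    };
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < sizeof marks / sizeof marks[0]; i++) {
        if (len >= marks[i].n && memcmp(buf, marks[i].b, marks[i].n) == 0)
            return marks[i].n;
    }
    return 0;
}
```

The length is passed explicitly because a BOM for UTF-32 contains zero bytes, so the check cannot rely on NUL termination.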
- fixed #14 can't detect short euc-kr
- fixed #15 support automake style 'make check'
- fixed #18 SECURITY! Invalid memory access (heap-use-after-free) (@gaoxiang-ut)
1.0.5
Changes:
- #8 fixed: UTF-16/32 could not be detected.
  - This was a binary-safety problem: UTF-16/32 data contains NUL bytes, which end a NUL-terminated string early.
  - To solve this problem, the _detect_r_ and _detect_handledata_r_ APIs were added.
  - The _CHARDET_BINARY_SAFE_ constant indicates whether _detect_r_ and _detect_handledata_r_ are supported.
    #ifdef CHARDET_BINARY_SAFE
        if ( detect_r (str[i], strlen (str[i]), &obj) == CHARDET_OUT_OF_MEMORY )
    #else
        if ( detect (str[i], &obj) == CHARDET_OUT_OF_MEMORY )
    #endif
        {
            fprintf (stderr, "On handle processing, occurred out of memory\n");
            return CHARDET_OUT_OF_MEMORY;
        }

    #ifdef CHARDET_BINARY_SAFE
        if ( detect_handledata_r (&d, str[i], strlen (str[i]), &obj) == CHARDET_OUT_OF_MEMORY )
    #else
        if ( detect_handledata (&d, str[i], &obj) == CHARDET_OUT_OF_MEMORY )
    #endif
        {
            fprintf (stderr, "On handle processing, occurred out of memory\n");
            return CHARDET_OUT_OF_MEMORY;
        }
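To show why an explicit length matters for UTF-16/32 input, the following standalone sketch measures how much of a UTF-16LE buffer a NUL-terminated interface would see. It does not call libchardet. The names `utf16le_sample` and `sketch_visible_length` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" encoded as UTF-16LE with a BOM: FF FE 48 00 69 00 (6 bytes). */
const unsigned char utf16le_sample[6] = { 0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00 };

/* Hypothetical helper, NOT a libchardet function: the number of bytes a
 * NUL-terminated interface would read before stopping at the first 0x00. */
size_t sketch_visible_length(const unsigned char *buf) {
    assert(buf != NULL);
    return strlen((const char *)buf);
}
```

For `utf16le_sample`, `sketch_visible_length` stops at the zero byte that follows `H`, so it reports 3 bytes even though the buffer holds 6. A caller that knows the real size of its data should pass that size, for example `sizeof utf16le_sample`, as the length argument instead of a `strlen` result.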
- Merged uchardet's improvements
  - #6 fixed extended character range on EUC-KR and EUC-TW
    - can detect CP949 (for example, "똠방각하", "뷁")
    - can detect extended EUC-TW ("灣,是,台" and so on)
  - #2, #5 Improved single-byte charset detection confidence algorithm
  - #4 New single-byte language models
    - Arabic
    - Danish
    - Esperanto
    - German
    - Spanish
    - Turkish
    - Vietnamese
- #3 Updated language models for Greek, Hungarian and Thai
- fixed wrong macro bug in man pages (martin.gansser@gmail.com)
1.0.4
1.0.3
- add chardet.pc (Lee ByungYoung <darklin20@gamil.com>) @2fd56976
- applied automake @0fdd1e28
- add English man page @da00ca24
- fixed comparison on JpCntx.cpp @7f624fb2