add unicode upcase, downcase #2547

Open: wants to merge 20 commits into base: master
Changes from 15 commits
96 changes: 96 additions & 0 deletions COPYING
@@ -134,3 +134,99 @@ STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.



jq uses parts of the open source C library "utf8proc", which is distributed
under the following license:

**utf8proc** is a software package originally developed
by Jan Behrens and the rest of the Public Software Group, who
deserve nearly all of the credit for this library, that is now maintained by the Julia-language developers. Like the original utf8proc,
whose copyright and license statements are reproduced below, all new
work on the utf8proc library is licensed under the [MIT "expat"
license](http://opensource.org/licenses/MIT):

*Copyright © 2014-2021 by Steven G. Johnson, Jiahao Chen, Tony Kelman, Jonas Fonseca, and other contributors listed in the git history.*

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

## Original utf8proc license ##

*Copyright (c) 2009, 2013 Public Software Group e. V., Berlin, Germany*

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

## Unicode data license ##

This software contains data (`utf8proc_data.c`) derived from processing
the Unicode data files. The following license applies to that data:

**COPYRIGHT AND PERMISSION NOTICE**

*Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
under the Terms of Use in http://www.unicode.org/copyright.html.*

Permission is hereby granted, free of charge, to any person obtaining a
copy of the Unicode data files and any associated documentation (the "Data
Files") or Unicode software and any associated documentation (the
"Software") to deal in the Data Files or Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, and/or sell copies of the Data Files or Software, and
to permit persons to whom the Data Files or Software are furnished to do
so, provided that (a) the above copyright notice(s) and this permission
notice appear with all copies of the Data Files or Software, (b) both the
above copyright notice(s) and this permission notice appear in associated
documentation, and (c) there is clear notice in each modified Data File or
in the Software as well as in the documentation associated with the Data
File(s) or Software that the data or software has been modified.

THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS
INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR
CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THE DATA FILES OR SOFTWARE.

Except as contained in this notice, the name of a copyright holder shall
not be used in advertising or otherwise to promote the sale, use or other
dealings in these Data Files or Software without prior written
authorization of the copyright holder.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be
registered in some jurisdictions. All other trademarks and registered
trademarks mentioned herein are the property of their respective owners.
34 changes: 31 additions & 3 deletions Makefile.am
@@ -14,7 +14,7 @@ LIBJQ_SRC = src/builtin.c src/bytecode.c src/compile.c src/execute.c \
src/jv_dtoa.c src/jv_file.c src/jv_parse.c src/jv_print.c \
src/jv_unicode.c src/linker.c src/locfile.c src/util.c \
src/decNumber/decContext.c src/decNumber/decNumber.c \
src/jv_dtoa_tsd.c \
src/jv_dtoa_tsd.c \
Contributor

[nitpick] Unexpected tab spaces.

${LIBJQ_INCS}

### C build options
@@ -175,13 +175,37 @@ endif
jq.1: jq.1.prebuilt
$(AM_V_GEN) cp $(srcdir)/jq.1.prebuilt $@

SUBDIRS =

AM_CFLAGS += ${utf8proc_CFLAGS}

if BUNDLE_UTF8PROC
LIBJQ_SRC += modules/utf8proc/utf8proc.c
Contributor

modules/utf8proc/** should be marked as vendored in .gitattributes (ref: #2705)

AM_CFLAGS += -I${srcdir}/modules/utf8proc
AM_CPPFLAGS += -I$(srcdir)/modules/utf8proc
else
if BUILD_UTF8PROC
BUILT_SOURCES += $(builddir)/libutf8proc.a
CLEANFILES += $(builddir)/libutf8proc.a
jq_LDADD += $(builddir)/libutf8proc.a

AM_CFLAGS += -I${srcdir}/modules/utf8proc
AM_CPPFLAGS += -I$(srcdir)/modules/utf8proc

$(builddir)/libutf8proc.a:
$(MAKE) $(AM_MAKEFLAGS) -C $(srcdir)/modules/utf8proc builddir=$(shell cd $(builddir) && pwd) libutf8proc.a
Contributor

Using a non-POSIX feature may cause portability problems. Hopefully FreeBSD users can test this.

Makefile.am:196: warning: shell cd $(builddir: non-POSIX variable name
Makefile.am:196: (probably a GNU make extension)
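
One portable alternative, offered as a sketch rather than as what the PR does: resolve the absolute path in the recipe's own shell instead of through GNU make's `$(shell ...)`. Automake also presets `$(abs_builddir)`, which may remove the need for the computation altogether. The helper name `abs_dir` is hypothetical and only illustrates the `cd ... && pwd` idiom.

```shell
# abs_dir DIR: print the absolute path of DIR.
# The subshell keeps the caller's working directory unchanged.
abs_dir() {
  (cd "$1" && pwd)
}

# Example: absolute path of the current directory.
abs_dir .
```

Inside a make recipe the same idiom would be written with a doubled dollar sign, for example `builddir=$$(cd $(builddir) && pwd)`, so that the command substitution is run by the shell and not by make.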

else
jq_LDADD += ${utf8proc_LIBS}
endif
endif

CLEANFILES += jq.1
Contributor

[nitpick] Put this line above the added lines.


### Build oniguruma

if BUILD_ONIGURUMA
libjq_la_LIBADD += modules/oniguruma/src/.libs/libonig.la
SUBDIRS = modules/oniguruma
SUBDIRS += modules/oniguruma
endif

AM_CFLAGS += $(onig_CFLAGS)
@@ -202,6 +226,7 @@ DOC_FILES = docs/content docs/public docs/templates docs/site.yml \
EXTRA_DIST = $(DOC_FILES) $(man_MANS) $(TESTS) $(TEST_LOG_COMPILER) \
jq.1.prebuilt jq.spec src/lexer.c src/lexer.h src/parser.c \
src/parser.h src/version.h src/builtin.jq scripts/version \
modules/utf8proc \
libjq.pc \
tests/base64.test tests/jq-f-test.sh tests/jq.test \
tests/modules/a.jq tests/modules/b/b.jq tests/modules/c/c.jq \
@@ -220,7 +245,7 @@ EXTRA_DIST = $(DOC_FILES) $(man_MANS) $(TESTS) $(TEST_LOG_COMPILER) \
tests/onig.test tests/base64.test tests/utf8-truncate.jq \
tests/jq-f-test.sh

AM_DISTCHECK_CONFIGURE_FLAGS=--disable-maintainer-mode --with-oniguruma=builtin
AM_DISTCHECK_CONFIGURE_FLAGS=--disable-maintainer-mode --with-oniguruma=builtin --with-utf8proc=builtin

# README.md is expected in GitHub projects, good stuff in it, so we'll
# distribute it and install it with the package in the doc directory.
@@ -237,3 +262,6 @@ rpm: dist jq.spec
rpmbuild -tb --define "_topdir ${PWD}/rpm" --define "_prefix /usr" --define "myver $(VERSION)" --define "myrel ${RELEASE}" rpm/SOURCES/jq-$(VERSION).tar.gz
find rpm/RPMS/ -name "*.rpm" -exec mv {} ./ \;
rm -rf rpm

dist-hook:
make -C $(distdir)/modules/utf8proc clean
65 changes: 64 additions & 1 deletion configure.ac
@@ -289,9 +289,72 @@ AC_SUBST(onig_CFLAGS)
AC_SUBST(onig_LDFLAGS)

AM_CONDITIONAL([BUILD_ONIGURUMA], [test "x$build_oniguruma" = xyes])

dnl utf8proc
AC_ARG_WITH([utf8proc],
[AS_HELP_STRING([--with-utf8proc=prefix],
[specify the location of custom-installed utf8proc library to use, 'builtin' to force built-in, 'bundled' to use built-in AND bundle into libjq, or 'auto' to use default prefix or fallback to builtin])], ,
[with_utf8proc=auto])

utf8proc_CFLAGS=-DUTF8PROC_STATIC
utf8proc_LIBS=
build_utf8proc=yes
bundle_utf8proc=no

AS_IF([test "x$with_utf8proc" = "xbundled" ], [
bundle_utf8proc=yes
], [
AS_IF([test "x$with_utf8proc" = "xauto" ], [
test_prefix=
if test "x$prefix" != xNONE ; then
test_prefix=$prefix
elif test "x$ac_default_prefix" != x ; then
test_prefix=$ac_default_prefix
fi
if test "x$test_prefix" != "x" ; then
save_CFLAGS="$CFLAGS"
CFLAGS="$CFLAGS -I$test_prefix/include"
AC_CHECK_HEADER("utf8proc.h", [
with_utf8proc=$test_prefix
AC_MSG_NOTICE([utf8proc.h found in $test_prefix])
], [
with_utf8proc=builtin
])
CFLAGS="$save_CFLAGS"
fi
])
AS_IF([test "x$with_utf8proc" != xyes -a "x$with_utf8proc" != xbuiltin -a "x$with_utf8proc" != "x" ], [
save_CFLAGS="$CFLAGS"
save_LDFLAGS="$LDFLAGS"

utf8proc_CFLAGS="$utf8proc_CFLAGS -I$with_utf8proc/include"
# utf8proc_LIBS="-L$with_utf8proc/lib -l:libutf8proc.a" # -l: not supported by some compilers
utf8proc_LIBS="$with_utf8proc/lib/libutf8proc.a"

CFLAGS="$CFLAGS $utf8proc_CFLAGS"
AC_CHECK_HEADER("utf8proc.h", [
build_utf8proc=no
], [
AC_MSG_NOTICE([utf8proc.h not found in $with_utf8proc. Will use the packaged utf8proc.])
])

CFLAGS="$save_CFLAGS"
LDFLAGS="$save_LDFLAGS"
])
AS_IF([test "x$build_utf8proc" = xyes ], [
# utf8proc_LIBS used only for libjq.pc
utf8proc_LIBS=`pwd`/libutf8proc.a
])
])

AC_SUBST(utf8proc_CFLAGS)
AC_SUBST(utf8proc_LIBS)

AM_CONDITIONAL([BUILD_UTF8PROC], [test "x$build_utf8proc" = xyes])
AM_CONDITIONAL([BUNDLE_UTF8PROC], [test "x$bundle_utf8proc" = xyes])

AC_SUBST([BUNDLER], ["$bundle_cmd"])

AC_CONFIG_MACRO_DIRS([config/m4 m4])
AC_CONFIG_FILES([Makefile libjq.pc])
AC_OUTPUT
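
The mode resolution in this hunk is easier to follow in plain shell. The following is a minimal sketch under stated assumptions, not the PR's code: `resolve_utf8proc` is a hypothetical name, and a simple file test stands in for `AC_CHECK_HEADER`, which actually compiles a test program. It also probes the given prefix directly and so ignores the case where no prefix is known.

```shell
# Sketch of how --with-utf8proc is resolved (simplified from the hunk above).
# $1 = option value: auto, builtin, bundled, or an install prefix
# $2 = prefix to probe when $1 is "auto"
# Prints "build=<yes|no> bundle=<yes|no>".
resolve_utf8proc() {
  with=$1
  probe_prefix=$2
  build=yes
  bundle=no

  if [ "$with" = bundled ]; then
    # utf8proc.c is compiled directly into libjq.
    bundle=yes
  else
    if [ "$with" = auto ]; then
      # Prefer an installed copy under the probe prefix, else fall back.
      if [ -f "$probe_prefix/include/utf8proc.h" ]; then
        with=$probe_prefix
      else
        with=builtin
      fi
    fi
    case $with in
      yes|builtin|'')
        # Keep build=yes: build the packaged modules/utf8proc.
        ;;
      *)
        # Anything else is treated as an install prefix.
        if [ -f "$with/include/utf8proc.h" ]; then
          build=no
        fi
        ;;
    esac
  fi

  echo "build=$build bundle=$bundle"
}

resolve_utf8proc builtin /usr/local   # prints: build=yes bundle=no
```

Note that a prefix without `utf8proc.h` falls back to building the packaged copy, which matches the `AC_MSG_NOTICE` branch in the hunk.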

11 changes: 11 additions & 0 deletions docs/content/manual/manual.yml
@@ -1726,6 +1726,17 @@ sections:
input: '"useful but not for é"'
output: '"USEFUL BUT NOT FOR é"'

- title: "`downcase`, `upcase`"
body: |

Emit a copy of the input string with its characters (unicode) converted to the
specified case.

example:
- program: 'upcase'
input: '"useful for é"'
output: '"USEFUL FOR É"'
Contributor

The output of a manual example needs to be an array (make sure to run `make tests/man.test`).

Author

I was wondering why they did not show up in jq.1. These examples were just copied from the existing examples for ascii_downcase and ascii_upcase, so I suppose this comment would apply to those as well?

Contributor

Yes, those examples should be fixed as well.

Member

Those ascii_upcase/ascii_downcase examples, as well as a bunch of other examples, are not even generated because they are using `example:` instead of `examples:`.

You should put examples in `examples:`, not `example:`.

Author

Is there any definition of the schema that manual.yml should conform to? If so (or if not, and it's easy to create), I will run a validator on it to check for any other issues (and the validation steps would be available for future reuse as well).

Contributor

I added a schema for manual files in #2745, and validate both on building the manual locally and on GitHub Actions.
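
For a quick local check of the specific mistake discussed in this thread, a line-based search is enough. This is a rough sketch only and no substitute for schema validation; the function name `find_singular_example_keys` is hypothetical, and the pattern assumes the key sits alone on its line, as it does in the hunk above.

```shell
# find_singular_example_keys FILE
# Print "LINENO:line" for every line whose only content is the singular
# key `example:`; print nothing (and return non-zero) when the file is clean.
find_singular_example_keys() {
  grep -nE '^[[:space:]]*example:[[:space:]]*$' "$1"
}
```

It could be run as `find_singular_example_keys docs/content/manual/manual.yml`, after which each reported key can be renamed to `examples:`.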


- title: "`while(cond; update)`"
body: |

3 changes: 3 additions & 0 deletions jq.1.prebuilt

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions libjq.pc.in
@@ -8,4 +8,5 @@ URL: https://jqlang.github.io/jq/
Description: Library to process JSON using a query language
Version: @VERSION@
Libs: -L${libdir} -ljq
Libs.private: @utf8proc_LIBS@
Cflags: -I${includedir}
113 changes: 113 additions & 0 deletions modules/utf8proc/CMakeLists.txt
@@ -0,0 +1,113 @@
cmake_minimum_required (VERSION 3.0.0)

include (utils.cmake)

disallow_intree_builds()

if (POLICY CMP0048)
cmake_policy (SET CMP0048 NEW)
endif ()
project (utf8proc VERSION 2.8.0 LANGUAGES C)

# This is the ABI version number, which may differ from the
# API version number (defined in utf8proc.h and above).
# Be sure to also update these in Makefile and MANIFEST!
set(SO_MAJOR 2)
set(SO_MINOR 6)
set(SO_PATCH 0)

option(UTF8PROC_INSTALL "Enable installation of utf8proc" On)
option(UTF8PROC_ENABLE_TESTING "Enable testing of utf8proc" Off)
option(LIB_FUZZING_ENGINE "Fuzzing engine to link against" Off)

add_library (utf8proc
utf8proc.c
utf8proc.h
)

# expose header path, for when this is part of a larger cmake project
target_include_directories(utf8proc PUBLIC .)

if (BUILD_SHARED_LIBS)
# Building shared library
else()
# Building static library
target_compile_definitions(utf8proc PUBLIC "UTF8PROC_STATIC")
if (MSVC)
set_target_properties(utf8proc PROPERTIES OUTPUT_NAME "utf8proc_static")
endif()
endif()

target_compile_definitions(utf8proc PRIVATE "UTF8PROC_EXPORTS")

if (NOT MSVC)
set_target_properties(
utf8proc PROPERTIES
COMPILE_FLAGS "-O2 -std=c99 -pedantic -Wall"
)
endif ()

set_target_properties (utf8proc PROPERTIES
POSITION_INDEPENDENT_CODE ON
VERSION "${SO_MAJOR}.${SO_MINOR}.${SO_PATCH}"
SOVERSION ${SO_MAJOR}
)

if (UTF8PROC_INSTALL)
include(GNUInstallDirs)
install(FILES utf8proc.h DESTINATION "${CMAKE_INSTALL_FULL_INCLUDEDIR}")
install(TARGETS utf8proc
ARCHIVE DESTINATION "${CMAKE_INSTALL_FULL_LIBDIR}"
LIBRARY DESTINATION "${CMAKE_INSTALL_FULL_LIBDIR}"
RUNTIME DESTINATION "${CMAKE_INSTALL_FULL_BINDIR}"
)
configure_file(libutf8proc.pc.cmakein libutf8proc.pc @ONLY)
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/libutf8proc.pc" DESTINATION "${CMAKE_INSTALL_FULL_LIBDIR}/pkgconfig")
endif()

if(UTF8PROC_ENABLE_TESTING)
enable_testing()
file(MAKE_DIRECTORY data)
set(UNICODE_VERSION 15.0.0)
file(DOWNLOAD https://www.unicode.org/Public/${UNICODE_VERSION}/ucd/NormalizationTest.txt ${CMAKE_BINARY_DIR}/data/NormalizationTest.txt SHOW_PROGRESS)
file(DOWNLOAD https://www.unicode.org/Public/${UNICODE_VERSION}/ucd/auxiliary/GraphemeBreakTest.txt ${CMAKE_BINARY_DIR}/data/GraphemeBreakTest.txt SHOW_PROGRESS)
add_executable(case test/tests.h test/tests.c utf8proc.h test/case.c)
target_link_libraries(case utf8proc)
add_executable(custom test/tests.h test/tests.c utf8proc.h test/custom.c)
target_link_libraries(custom utf8proc)
add_executable(iterate test/tests.h test/tests.c utf8proc.h test/iterate.c)
target_link_libraries(iterate utf8proc)
add_executable(misc test/tests.h test/tests.c utf8proc.h test/misc.c)
target_link_libraries(misc utf8proc)
add_executable(printproperty test/tests.h test/tests.c utf8proc.h test/printproperty.c)
target_link_libraries(printproperty utf8proc)
add_executable(valid test/tests.h test/tests.c utf8proc.h test/valid.c)
target_link_libraries(valid utf8proc)
add_test(utf8proc.testcase case)
add_test(utf8proc.testcustom custom)
add_test(utf8proc.testiterate iterate)
add_test(utf8proc.testmisc misc)
add_test(utf8proc.testprintproperty printproperty)
add_test(utf8proc.testvalid valid)

if (NOT WIN32)
# no wcwidth function on Windows
add_executable(charwidth test/tests.h test/tests.c utf8proc.h test/charwidth.c)
target_link_libraries(charwidth utf8proc)
add_test(utf8proc.testcharwidth charwidth)
endif()
add_executable(graphemetest test/tests.h test/tests.c utf8proc.h test/graphemetest.c)
target_link_libraries(graphemetest utf8proc)
add_executable(normtest test/tests.h test/tests.c utf8proc.h test/normtest.c)
target_link_libraries(normtest utf8proc)
add_test(utf8proc.testgraphemetest graphemetest data/GraphemeBreakTest.txt)
add_test(utf8proc.testnormtest normtest data/NormalizationTest.txt)

if(LIB_FUZZING_ENGINE)
add_executable(fuzzer utf8proc.h test/fuzzer.c)
target_link_libraries(fuzzer ${LIB_FUZZING_ENGINE} utf8proc)
else()
add_executable(fuzzer utf8proc.h test/fuzz_main.c test/fuzzer.c)
target_link_libraries(fuzzer utf8proc)
endif()
endif()