----------------------------------------------
save as .git/hooks/pre-commit (must be executable: chmod +x .git/hooks/pre-commit)
#!/bin/bash
# Block the commit unless RELEASE_VERSION in release_version.h has been incremented.
# Assumes the version is the third whitespace-separated field, wrapped in double quotes.
FILE=release_version.h
# version recorded in the last commit
CUR_NUM=$(git show "HEAD:./$FILE" | awk '{print $3}' | sed 's/"//g')
# version in the working tree
NEW_NUM=$(awk '{print $3}' "./$FILE" | sed 's/"//g')
if [ "$NEW_NUM" -le "$CUR_NUM" ]; then
  echo '***********************************************************************'
  echo 'Failure: increment RELEASE_VERSION before committing (release_version.h)'
  echo ' do `git commit --no-verify` to skip this check and commit anyway'
  echo '***********************************************************************'
  exit 1
fi
exit 0
----------------------------------------------
NEW TODO 2018-12-07
- TEXINFO documentation (like CGDB)
- mailing list
- man page
- move player_ao over to new format, just to do it.
- Windows Build
- gprof, valgrind
s change project name to NanoTTS (exe is still nanotts)
X write Sound Playback Interface
- OpenAudioDevice()
- SubmitSamples()
- CloseAudioDevice()
X write Alsa module that exports this interface
S add ./configure script that detects ALSA, with --enable-alsa and --disable-alsa flags
- the configure script determines which module to build: ALSA or a dummy.
  The main code always calls SubmitSamples(); the dummy does nothing, the ALSA
  module plays the samples.
s -v, --version flag reports MAJOR.MINOR.PATCH and OUTPUT_MODULE_LINKED_IN
X all inputs and outputs should be explicitly specified.
x WAV output optional
x PLAYBACK optional, must be explicit
X make install, make uninstall
X dynamic lingware detection (install won't really work without this)
X PLAYBACK: stream PCM to audio device instead of writing file and reading it
S make libao detection dynamic
- compile player_ao as a dynamic lib
- (can this be made to allow failure, or check deps first?)
- loader function that, if dlopen("ao") succeeds, calls dlopen("player_ao") to load the player class
- PlayFile() wrapper that can send bytes to multiple Output modules
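A rough sketch of the Sound Playback Interface above, in C. The three function names come from the list; their arguments and return codes are assumptions, since the notes do not spell them out. Only the dummy module is shown: it accepts samples and throws them away (keeping a count so the behaviour can be checked). An ALSA module would export the same three functions, and the configure script would pick which file gets compiled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signatures: the TODO names these functions but not their
 * arguments. All return 0 on success and -1 on failure. */
int  OpenAudioDevice(unsigned sample_rate, unsigned channels);
int  SubmitSamples(const int16_t *samples, size_t count);
void CloseAudioDevice(void);

/* ---- dummy module: built when no audio backend is enabled ---- */

static int    dummy_is_open = 0;
static size_t dummy_samples_seen = 0;   /* only for debugging/inspection */

int OpenAudioDevice(unsigned sample_rate, unsigned channels)
{
    (void)sample_rate;                   /* the dummy has no device to configure */
    (void)channels;
    dummy_is_open = 1;
    dummy_samples_seen = 0;
    return 0;
}

int SubmitSamples(const int16_t *samples, size_t count)
{
    (void)samples;                       /* samples are discarded, never played */
    if (!dummy_is_open)
        return -1;                       /* caller forgot OpenAudioDevice() */
    dummy_samples_seen += count;
    return 0;
}

void CloseAudioDevice(void)
{
    dummy_is_open = 0;
}
```

The main code would call OpenAudioDevice(16000, 1) once, SubmitSamples() for every buffer that comes out of the synthesizer, and CloseAudioDevice() at exit, without knowing which module was linked in.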
=============================================================================
- playback keys: spacebar, left+right arrow, ESC, +/- (playback speed)
- some kind of playback callback that returns a number showing how far we are in the
playback. For instance, if there are 110,000 samples, it returns the last sample number
played, so we can ... write other software and display a progress bar.
- gprof, valgrind
- write man page
- autonaming func
- look for more Lingware/voices. Are there proprietary ones?
- submit to google code, other places
X free() pico buffs,
X separate out playAudio class,
X add wav header code,
- extend playAudio class with possible modules
- libmp3lame
- oggvorbis
- libfaad
- flac
- Monkey's Audio
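One way the playback progress callback from the list above could look, as a sketch in C. Every name here (PlaybackProgressFn, PlaybackPercent, PlayWithProgress, CountCalls) is made up for illustration. The loop only walks a sample count in chunks; a real player would hand each chunk to the audio device at the marked spot.

```c
#include <stddef.h>

/* Hypothetical callback type: receives the last sample number played and
 * the total number of samples, plus an opaque pointer for the caller. */
typedef void (*PlaybackProgressFn)(size_t last_sample, size_t total_samples,
                                   void *user);

/* Percentage (0 to 100) of playback completed; 0 when total is 0. */
unsigned PlaybackPercent(size_t last_sample, size_t total_samples)
{
    if (total_samples == 0)
        return 0;
    if (last_sample > total_samples)
        last_sample = total_samples;
    return (unsigned)((unsigned long long)last_sample * 100 / total_samples);
}

/* Steps through total_samples in chunks and reports after each chunk. */
void PlayWithProgress(size_t total_samples, size_t chunk,
                      PlaybackProgressFn report, void *user)
{
    size_t done = 0;
    if (chunk == 0)
        chunk = 1;                       /* avoid an endless loop */
    while (done < total_samples) {
        size_t n = total_samples - done;
        if (n > chunk)
            n = chunk;
        /* a real player would submit n samples to the audio device here */
        done += n;
        if (report)
            report(done, total_samples, user);
    }
}

/* Example callback: counts how many times it was called. */
void CountCalls(size_t last_sample, size_t total_samples, void *user)
{
    (void)last_sample;
    (void)total_samples;
    ++*(int *)user;
}
```

A front end that wants a progress bar would pass its own callback and feed the two numbers to PlaybackPercent().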
==============================================================================
- new design. Because output is 16-bit WAV and can get large, we are going with a design
built around WRITING_FILES, not just sending data in memory to PCM. For this
we need to have:
- DEFAULT BYTE SIZE to segment files = 50mb
- takes optional prefix
- auto-file naming routine:
PREFIX=PREFIX_DEFAULT
if ( user_prefix )
PREFIX=user_prefix
<PREFIX>-001.wav (50mb)
<PREFIX>-002.wav (50mb)
- argument to set OUTPUT_DIRECTORY (default: ".")
- argument to set SEGMENT_SIZE (default: "50mb")
- argument to set PREFIX (default: "nanotts00")
- argument to playback in PCM (default: "off")
- So we write these files; they go to a directory; the program knows where it wrote
them. Then, if we told it to play back the result, it just starts loading them
after the WAV generation is complete. It's neat and simple. We could also set it to
delete the WAVs after it plays, but I think we leave that to bash instead.
- code to read common size suffix:
(B | b) = bytes
(K | k) = kilobytes
(M | m) = Megabytes
(G | g) = gigabytes
- code to check current directory for "prefix*", to prevent overwrites:
if the last two characters of prefix are [0-9][0-9], it increments that number and makes that the new prefix;
else it adds a '02' to the end of prefix, and makes prefix02 the new prefix.
So we could reasonably have audiobook-00x.wav, and then audiobook02-00x.wav, audiobook03-00x.wav
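The size-suffix and naming rules above could be sketched in C roughly like this. The function names (ParseSize, NextPrefix, SegmentName) are invented for the sketch, and two things are assumptions the notes do not settle: that K, M and G mean powers of 1024, and that an optional trailing "b" is allowed so the default "50mb" parses. Scanning the directory for "prefix*" is left to the caller, which would call NextPrefix() whenever a match is found.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parses "512", "7B", "10k", "50mb", "1G" into bytes; 0 means malformed. */
unsigned long long ParseSize(const char *text)
{
    char *end;
    unsigned long long value, mult;

    if (text == NULL || !isdigit((unsigned char)text[0]))
        return 0;
    value = strtoull(text, &end, 10);
    switch (tolower((unsigned char)*end)) {
    case '\0': return value;                       /* no suffix: bytes */
    case 'b':  mult = 1ULL; break;
    case 'k':  mult = 1024ULL; break;
    case 'm':  mult = 1024ULL * 1024; break;
    case 'g':  mult = 1024ULL * 1024 * 1024; break;
    default:   return 0;
    }
    end++;
    if (mult != 1 && tolower((unsigned char)*end) == 'b')
        end++;                                     /* "mb", "kb", "gb" */
    return *end == '\0' ? value * mult : 0;
}

/* Next prefix to try when "prefix*" already exists: a trailing two-digit
 * number is incremented, otherwise "02" is appended.
 * Returns 0 on success, -1 if out is too small. */
int NextPrefix(const char *prefix, char *out, size_t out_len)
{
    size_t len = strlen(prefix);
    int n;

    if (len >= 2 && isdigit((unsigned char)prefix[len - 1])
                 && isdigit((unsigned char)prefix[len - 2])) {
        int num = (prefix[len - 2] - '0') * 10 + (prefix[len - 1] - '0') + 1;
        n = snprintf(out, out_len, "%.*s%02d", (int)(len - 2), prefix, num);
    } else {
        n = snprintf(out, out_len, "%s02", prefix);
    }
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Builds "<PREFIX>-001.wav" style names. Returns 0 on success, -1 if
 * out is too small. */
int SegmentName(const char *prefix, unsigned index, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%s-%03u.wav", prefix, index);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With these, "audiobook" becomes "audiobook02", then "audiobook03", and segment 1 of prefix "nanotts00" is "nanotts00-001.wav". A prefix ending in 99 rolls over to three digits, which the notes do not cover.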
-----------------------------------------------------------------------------
- 16 bits per sample (mono)     16 bits = 2 bytes
- 16000 samples per second      2 * 16000        =     32000 bytes per second
- 60 seconds per minute         2 * 16000 * 60   =   1920000 bytes = about 1.9 megabytes per minute
- 60 minutes per hour           2 * 16000 * 3600 = 115200000 bytes = about 115 megabytes per hour
-----------------------------------------------------------------------------
Ok, I take it back, the data space requirements are huge. We do need a codec. Since it's already
lo-fi spoken word, we could head straight for 32kbps mp3 as the target.
Of course, as usual, I know next to nothing about codecs. I should know more. The other thing
is, it should be possible to drop a codec's object file into a folder and load it into the program, so
that I am not bound to one codec over another.
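To put numbers on the codec question, here is a small C sketch that works out raw PCM size against a constant-bitrate stream. The function names are made up, and the mp3 figure is only an estimate (bitrate times duration, with 1 kbps taken as 1000 bits per second). For 16-bit mono at 16000 samples per second, one hour of raw PCM is 115200000 bytes, while one hour at 32 kbps is 14400000 bytes, an 8 to 1 saving.

```c
#include <stdint.h>

/* Bytes of raw PCM audio for the given format and duration. */
uint64_t PcmBytes(unsigned bits_per_sample, unsigned sample_rate,
                  unsigned channels, uint64_t seconds)
{
    return (uint64_t)(bits_per_sample / 8) * sample_rate * channels * seconds;
}

/* Approximate bytes of a constant-bitrate stream (1 kbps = 1000 bits/s). */
uint64_t CbrBytes(unsigned kbps, uint64_t seconds)
{
    return (uint64_t)kbps * 1000 / 8 * seconds;
}
```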
------------------------------------------------------------------
lame:
echo "eenie meany miny moh" | ./nanotts -c | lame -r -s 16 --bitwidth 16 --signed --little-endian -m m -b 32 -h - out.mp3
------------------------------------------------------------------
20141111
lightbulb idea!
* What if you integrate a video-game-like sound mixer system with the voice output,
* so that not only can you play back text, but you can format text in such a way that
small, unobtrusive sound effects are inserted as cues?
* Better yet, you extend the markup style already supported in Pico, <speed level=200>words</speed>,
to include sound effects. And you set the path at the beginning:
<meta sfxpath="/path/to/files">
- the system loads these files, as well as any that are packaged with the system.
Then, in markup, you can set spaces and sfx cues IN THE TEXT. And all the parser-system has to do is play it back.
<s name="bing"/><a name="bong"/>Good Morning Sir.
------------------------------------------------------------------