PdfText2NormalText

Creator: Frank_Xiao
Time: 2023/12
Update Time: 2024/4

Origin

When you copy text from a PDF file, every visual line usually ends with a redundant line break ('\r\n' on Windows). This causes trouble when you paste the text into an online translator such as Google Translate or into a Word document.

With this simple Java program, you can strip the unnecessary newline characters so that the text pastes cleanly wherever you need it.

Usage

Install a Java runtime environment and a Java IDE such as IntelliJ IDEA.
Download the project and import it into your IDE.
Copy the text into original.txt, run the program, and check output.txt.
To start a new paragraph, insert a blank line before its first line in original.txt; the program keeps that blank line as a paragraph break.
Copy the text from output.txt to any place you want.
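
The behavior described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/pack: the class name LineJoinSketch and the method joinLines are hypothetical, and the sketch assumes that consecutive non-blank lines are joined with a single space while a blank line ends the current paragraph. It also assumes Java 11 or later and UTF-8 files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch; the real project code may be organized differently.
public class LineJoinSketch {

    // Joins consecutive non-blank lines with one space.
    // A blank line closes the current paragraph; paragraphs are
    // separated by one empty line in the result.
    public static String joinLines(List<String> lines) {
        StringBuilder out = new StringBuilder();
        StringBuilder paragraph = new StringBuilder();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) {
                flush(paragraph, out);
                continue;
            }
            if (paragraph.length() > 0) {
                paragraph.append(' ');
            }
            paragraph.append(line);
        }
        flush(paragraph, out);
        return out.toString();
    }

    // Appends the finished paragraph to the output and resets the buffer.
    private static void flush(StringBuilder paragraph, StringBuilder out) {
        if (paragraph.length() == 0) {
            return;
        }
        if (out.length() > 0) {
            out.append("\n\n");
        }
        out.append(paragraph);
        paragraph.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        // File names taken from the usage steps; both are resolved
        // against the current working directory.
        List<String> lines = Files.readAllLines(Path.of("original.txt"), StandardCharsets.UTF_8);
        Files.writeString(Path.of("output.txt"), joinLines(lines), StandardCharsets.UTF_8);
    }
}
```

In this sketch, several blank lines in a row collapse into a single paragraph break, and words hyphenated across a line break are not rejoined; whether the actual program behaves the same way is not stated here.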

Example

Suppose original.txt contains the following:

The basic method for implementing paging involves breaking physical memory (BREAK!)
into fixed-sized blocks called frames and breaking logical memory into (BREAK!)
blocks of the same size called pages.When a process is to be executed, its pages
are loaded into any available memory frames from their source (a file system or
the backing store). The backing store is divided into fixed-sized blocks that are
the same size as the memory frames or clusters of multiple frames. This rather
simple idea has great functionality and wide ramifications. For example, the
logical address space is now totally separate from the physical address space,
so a process can have a logical 64-bit address space even though the system has
less than 264 bytes of physical memory. (NO LINE BELOW)
Every address generated by the CPU is divided into two parts: a page
number (p) and a page offset (d):(ONE LINE BELOW)

The page number is used as an index into a per-process page table. This is
illustrated in Figure 9.8. The page table contains the base address of each frame
in physical memory, and the offset is the location in the frame being referenced.
Thus, the base address of the frame is combined with the page offset to define
the physical memory address. The paging model of memory is shown in Figure
9.9.

The following outlines the steps taken by the MMU to translate a logical
address generated by the CPU to a physical address:

Extract the page number p and use it as an index into the page table.

Extract the corresponding frame number f from the page table.

Replace the page number p in the logical address with the frame number
f .

As the offset d does not change, it is not replaced, and the frame number and
offset now comprise the physical address.

The page size (like the frame size) is defined by the hardware. The size
of a page is a power of 2, typically varying between 4 KB and 1 GB per page,
depending on the computer architecture. The selection of a power of 2 as a
page size makes the translation of a logical address into a page number and
page offset particularly easy. If the size of the logical address space is 2m, and a
page size is 2n bytes, then the high-order m−n bits of a logical address designate
the page number, and the n low-order bits designate the page offset. Thus, the
logical address is as follows:

You will get output.txt like the following:

The basic method for implementing paging involves breaking physical memory into fixed-sized blocks called frames and breaking logical memory into blocks of the same size called pages.When a process is to be executed, its pages are loaded into any available memory frames from their source (a file system or the backing store). The backing store is divided into fixed-sized blocks that are the same size as the memory frames or clusters of multiple frames. This rather simple idea has great functionality and wide ramifications. For example, the logical address space is now totally separate from the physical address space, so a process can have a logical 64-bit address space even though the system has less than 264 bytes of physical memory. Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d):

The page number is used as an index into a per-process page table. This is illustrated in Figure 9.8. The page table contains the base address of each frame in physical memory, and the offset is the location in the frame being referenced. Thus, the base address of the frame is combined with the page offset to define the physical memory address. The paging model of memory is shown in Figure 9.9.

The following outlines the steps taken by the MMU to translate a logical address generated by the CPU to a physical address:

Extract the page number p and use it as an index into the page table.

Extract the corresponding frame number f from the page table.

Replace the page number p in the logical address with the frame number f .

As the offset d does not change, it is not replaced, and the frame number and offset now comprise the physical address.

The page size (like the frame size) is defined by the hardware. The size of a page is a power of 2, typically varying between 4 KB and 1 GB per page, depending on the computer architecture. The selection of a power of 2 as a page size makes the translation of a logical address into a page number and page offset particularly easy. If the size of the logical address space is 2m, and a page size is 2n bytes, then the high-order m−n bits of a logical address designate the page number, and the n low-order bits designate the page offset. Thus, the logical address is as follows:

You can then copy the text in output.txt directly into an online translation website or a Word document.

The original.txt and output.txt files in the repository contain the example shown above.
