
Frank-Xiao2002/PdfText2TxtText


PdfText2NormalText

  • Creator: Frank_Xiao
  • Time: 2023/12
  • Update Time: 2024/4

Origin


When you copy text from a PDF file, the copied text usually carries a redundant line break ('\r\n' on Windows) at the end of each line. These breaks cause trouble when you paste the text into an online translator such as Google Translate or into a Word document.


Don't worry: with the help of this simple Java program, you can easily get text without the unnecessary newline characters and paste it wherever you want.

Usage

  1. Prepare a Java runtime environment and a Java IDE such as IntelliJ IDEA.
  2. Download the project and import it into your IDE.
  3. Copy the text into original.txt, run the program, and check the result in output.txt.
  4. To start a new paragraph, add a blank line before its first line in original.txt yourself; the program keeps that blank line for you.
  5. Copy the text from output.txt to any place you want.
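
The behavior described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual source: the class name `LineJoiner` and the method `joinLines` are hypothetical, and the sketch assumes Java 11 or later, UTF-8 files, and that both files live in the working directory. It also assumes that consecutive blank lines may be collapsed into a single paragraph break.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; not taken from the repository.
final class LineJoiner {

    /**
     * Joins the wrapped lines of each paragraph with single spaces.
     * A blank line in the input ends the current paragraph and is kept
     * as one blank line in the output.
     */
    static String joinLines(String text) {
        // "\\R" matches any line break ("\r\n", "\n" or "\r").
        String[] lines = text.split("\\R", -1);
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();

        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) {
                // Blank line: close the paragraph collected so far, if any.
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Non-blank line: append it to the current paragraph.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line);
            }
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        // Separate paragraphs with exactly one blank line.
        return String.join("\n\n", paragraphs);
    }

    public static void main(String[] args) throws IOException {
        // File names follow the Usage section; their location is an assumption.
        Path input = Path.of("original.txt");
        Path output = Path.of("output.txt");
        String text = Files.readString(input, StandardCharsets.UTF_8);
        Files.writeString(output, joinLines(text), StandardCharsets.UTF_8);
    }
}
```

For instance, `LineJoiner.joinLines("first line\r\nsecond line\r\n\r\nnext paragraph")` returns `"first line second line\n\nnext paragraph"`: the wrapped lines are merged, and the blank line still separates the two paragraphs.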

Example

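Input original.txt like this (the notes in parentheses, such as (BREAK!), only mark the line breaks and are not part of the text):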

The basic method for implementing paging involves breaking physical memory (BREAK!)
into fixed-sized blocks called frames and breaking logical memory into (BREAK!)
blocks of the same size called pages.When a process is to be executed, its pages
are loaded into any available memory frames from their source (a file system or
the backing store). The backing store is divided into fixed-sized blocks that are
the same size as the memory frames or clusters of multiple frames. This rather
simple idea has great functionality and wide ramifications. For example, the
logical address space is now totally separate from the physical address space,
so a process can have a logical 64-bit address space even though the system has
less than 264 bytes of physical memory. (NO LINE BELOW)
Every address generated by the CPU is divided into two parts: a page
number (p) and a page offset (d):(ONE LINE BELOW)

The page number is used as an index into a per-process page table. This is
illustrated in Figure 9.8. The page table contains the base address of each frame
in physical memory, and the offset is the location in the frame being referenced.
Thus, the base address of the frame is combined with the page offset to define
the physical memory address. The paging model of memory is shown in Figure
9.9.

The following outlines the steps taken by the MMU to translate a logical
address generated by the CPU to a physical address:

  1. Extract the page number p and use it as an index into the page table.

  2. Extract the corresponding frame number f from the page table.

  3. Replace the page number p in the logical address with the frame number
    f .

As the offset d does not change, it is not replaced, and the frame number and
offset now comprise the physical address.

The page size (like the frame size) is defined by the hardware. The size
of a page is a power of 2, typically varying between 4 KB and 1 GB per page,
depending on the computer architecture. The selection of a power of 2 as a
page size makes the translation of a logical address into a page number and
page offset particularly easy. If the size of the logical address space is 2m, and a
page size is 2n bytes, then the high-order m−n bits of a logical address designate
the page number, and the n low-order bits designate the page offset. Thus, the
logical address is as follows:

You will get output.txt like the following:

The basic method for implementing paging involves breaking physical memory into fixed-sized blocks called frames and breaking logical memory into blocks of the same size called pages.When a process is to be executed, its pages are loaded into any available memory frames from their source (a file system or the backing store). The backing store is divided into fixed-sized blocks that are the same size as the memory frames or clusters of multiple frames. This rather simple idea has great functionality and wide ramifications. For example, the logical address space is now totally separate from the physical address space, so a process can have a logical 64-bit address space even though the system has less than 264 bytes of physical memory. Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d):

The page number is used as an index into a per-process page table. This is illustrated in Figure 9.8. The page table contains the base address of each frame in physical memory, and the offset is the location in the frame being referenced. Thus, the base address of the frame is combined with the page offset to define the physical memory address. The paging model of memory is shown in Figure 9.9.

The following outlines the steps taken by the MMU to translate a logical address generated by the CPU to a physical address:

  1. Extract the page number p and use it as an index into the page table.

  2. Extract the corresponding frame number f from the page table.

  3. Replace the page number p in the logical address with the frame number f .

As the offset d does not change, it is not replaced, and the frame number and offset now comprise the physical address.

The page size (like the frame size) is defined by the hardware. The size of a page is a power of 2, typically varying between 4 KB and 1 GB per page, depending on the computer architecture. The selection of a power of 2 as a page size makes the translation of a logical address into a page number and page offset particularly easy. If the size of the logical address space is 2m, and a page size is 2n bytes, then the high-order m−n bits of a logical address designate the page number, and the n low-order bits designate the page offset. Thus, the logical address is as follows:

Then you can copy the text in output.txt directly into an online translation website or a Word document.

See the original.txt and output.txt files for the example above.

About

Transform text copied from a PDF into text free of redundant '\n' line breaks.
