kalle07/pdf2txt_parser_converter

kalle07 committed on May 30

Commit 3bc61ac · verified · 1 parent: 2778ed1

Update README.md

Files changed (1)

README.md +4 -2

@@ -32,8 +32,10 @@ I work with "<b>pdfplumber/pdfminer</b>" none OCR, so its very fast!<br>
 <li>Intelligent multiprocessing</li>
 <li>Error tolerant, that means if your PDF is not convertible, it will be skipped, no special handling</li>
 <li>Instant view of the result, hit one pdf on top of the list</li>
-<li>Converts some common tables as json inside the txt file</li>
-<li>It adds the absolute PAGE number to each page</li>
+<li>Converts some common tables as json-foramt inside the txt file, readable for embedder</li>
+<li>Adds the absolute PAGE number to each page</li>
+<li>Adds the label “Chapter” for large font and/or “important” for bold font</li>
+<li>tested on 300 PDF files ~30000 pages</li>
 <li>All txt files will be created in original folder of PDF</li>
 <li>All previous txt files are overwritten</li>
 <li>aprox 5 to 20 Pages/sec - depends on complexity</li>
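
The three features added in this commit (tables as JSON, a PAGE marker, and "Chapter"/"important" labels) could be approximated as in the sketch below. It is not the project's actual code. It assumes the text has already been grouped into lines that carry `size` and `fontname` values (pdfplumber reports both per character) and that tables arrive as lists of rows whose cells may be `None`, as `extract_tables()` returns them. The function names, the 1.2 size ratio, the check for "bold" in the font name, and the output layout are all hypothetical choices made for illustration.

```python
import json
from statistics import median


def table_to_json(rows):
    """Serialize a table (first row = header) as a JSON list of records."""
    header, *body = rows
    # Empty or missing header cells get a placeholder name (assumption).
    keys = [(cell or "").strip() or f"col{i}" for i, cell in enumerate(header, 1)]
    records = [
        {key: (cell or "").strip() for key, cell in zip(keys, row)}
        for row in body
    ]
    return json.dumps(records, ensure_ascii=False)


def label_line(text, size, fontname, body_size, ratio=1.2):
    """Prefix a line with "Chapter" and/or "important" based on its font."""
    labels = []
    if size >= body_size * ratio:      # clearly larger than the body text
        labels.append("Chapter")
    if "bold" in fontname.lower():     # heuristic: bold fonts say so in their name
        labels.append("important")
    return f"[{', '.join(labels)}] {text}" if labels else text


def render_page(page_number, lines, tables=()):
    """Build the text for one page: PAGE marker, labeled lines, JSON tables."""
    sizes = [line["size"] for line in lines]
    # The median font size stands in for the body text size (assumption).
    body_size = median(sizes) if sizes else 0.0
    out = [f"PAGE {page_number}"]
    for line in lines:
        out.append(label_line(line["text"], line["size"], line["fontname"], body_size))
    for table in tables:
        out.append(table_to_json(table))
    return "\n".join(out)
```

Calling `render_page(3, lines, tables)` with a list of line dicts (`text`, `size`, `fontname`) returns a single string that starts with `PAGE 3`, followed by the labeled lines and one JSON line per table. Keeping each table on one line is a design choice of this sketch so that the table stays together when the txt file is later split into chunks for an embedder.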