How to Extract Text from PDF using PHP

July 3rd, 2022, by Huzoor Bux, in PHP

PDF (Portable Document Format) files are a common way to save text and image data for offline use, and they can also be displayed online by embedding them in a web page with a browser-based viewer. However, the text and graphics of an embedded PDF are not part of the page's own HTML, so the content is not rendered on the page itself and SEO suffers. To overcome this, extract the text from the PDF and add it to the web page.

The PDF Parser library can be used to extract elements from PDF files with PHP. It parses a PDF file and pulls the text content from all of its pages; it can also extract headers and metadata. This tutorial demonstrates how to extract text from PDF files with PHP.

This example script shows how to use the PDF Parser library to extract text from a PDF using PHP. It also shows how to upload a PDF file and extract its text on the fly.

Install PDF Parser Library

Use the following command to install the PDF Parser library with Composer.

composer require smalot/pdfparser

Note: You don’t have to install the PDF Parser library separately, as all required files are included in the source code. Download the source code if you want to use PDF Parser without Composer.

Include the Composer autoloader to load the PDF Parser library and its helper functions in your PHP script.

include 'vendor/autoload.php';

Extract Text from PDF

The following code snippet extracts all the text content from a PDF file using PHP.

  • Load and initialize the PDF Parser library.
  • Specify the source PDF file from which the text content will be retrieved.
  • Parse the PDF file using the parseFile() method of the Parser class.
  • Extract the text from the parsed document using its getText() method.
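<?php
// Load the Composer autoloader so the PDF Parser classes are available
include 'vendor/autoload.php';
// Initialize the parser
$parser = new \Smalot\PdfParser\Parser();
// Source PDF file
$PDFfile = 'test.pdf';
// Parse the file and extract the text from all pages
$PDF = $parser->parseFile($PDFfile);
$PDFContent = $PDF->getText();
// Convert newlines to <br> tags for display in HTML
echo nl2br($PDFContent);
?>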

See the PDF Parser library documentation to explore more features.
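As one example of those additional features, the parsed document also offers getDetails() for metadata and getPages() for page-by-page access. The sketch below is a minimal illustration, not part of the original script: the helper name summarizePdf() is hypothetical, and it assumes it receives the object returned by parseFile().

```php
<?php
/**
 * Hypothetical helper (not part of the library): collects the metadata
 * and the text of each page from a parsed PDF document.
 *
 * $pdf is assumed to be the object returned by Parser::parseFile().
 */
function summarizePdf($pdf): array
{
    $pages = [];
    $pageNumber = 1;

    // getPages() returns the page objects; each page has its own getText()
    foreach ($pdf->getPages() as $page) {
        $pages[$pageNumber] = $page->getText();
        $pageNumber++;
    }

    return [
        'details' => $pdf->getDetails(), // metadata as an associative array
        'pages'   => $pages,             // text keyed by 1-based page number
    ];
}

// Usage (assumes vendor/autoload.php and test.pdf exist):
// include 'vendor/autoload.php';
// $parser  = new \Smalot\PdfParser\Parser();
// $summary = summarizePdf($parser->parseFile('test.pdf'));
// print_r($summary['details']);
```

Keeping the text per page makes it possible to show or index each page separately instead of as one long block.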

Upload PDF File and Extract Text

This code snippet will show how to upload PDFs and extract the text with PHP.

PDF Form for Uploading Files:

Define the HTML form elements for uploading a file.

<form action="parse.php" method="POST" enctype="multipart/form-data">
   <div class="pdf-input"> 
      <label for="pdf">PDF File</label> 
      <input type="file" id="pdf" name="pdf" placeholder="Select a PDF file" required=""> 
   </div> 
   <input type="submit" name="submit" class="btn btn-large" value="Submit"> 
</form>

When the form is submitted, the selected file is uploaded to the server-side script for further processing.

Server-side script (parse.php) to extract text from PDF File:

The code below can be used for uploading the document and extracting the information from the PDF.

  • Retrieve the name of the uploaded file from $_FILES.
  • Get the file extension using the pathinfo() function with the PATHINFO_EXTENSION flag.
  • Check the extension to verify that the file is a PDF.
  • Get the path of the uploaded file from tmp_name in $_FILES.
  • Parse the uploaded PDF file and extract the text content using the PDF Parser library.
  • Format the text content by replacing newlines (\n) with line breaks (<br>) using PHP's nl2br() function.
<?php
$PDFContent = '';
if(isset($_POST['submit'])){
   if(!empty($_FILES["pdf"]["name"])){
      $PDFfileName = basename($_FILES["pdf"]["name"]);
      // Lowercase the extension so that "FILE.PDF" is accepted as well
      $PDFfileType = strtolower(pathinfo($PDFfileName, PATHINFO_EXTENSION));
      $allowTypes = array('pdf');
      if(in_array($PDFfileType, $allowTypes)){
         include 'vendor/autoload.php';
         $parser = new \Smalot\PdfParser\Parser();

         // Source file
         $PDFfile = $_FILES["pdf"]["tmp_name"];
         $PDF = $parser->parseFile($PDFfile);
         $fileText = $PDF->getText();

         // Escape HTML special characters, then convert newlines to <br>
         $PDFContent = nl2br(htmlspecialchars($fileText));
      }
      else
      {
         $PDFContent = '<p>Only PDF files are allowed.</p>';
      }
   }
   else
   {
      $PDFContent = '<p>Please select a file.</p>';
   }
}
// Display content
echo $PDFContent;
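
The script above decides whether a file is a PDF from its extension alone. As a hedged sketch of a slightly stricter check, the function below also looks at the upload error code and at the "%PDF-" signature that PDF files normally begin with. The function name validatePdfUpload() and its messages are hypothetical and not part of the original script.

```php
<?php
/**
 * Hypothetical helper: returns an error message, or null when the
 * upload looks like a PDF.
 *
 * $file is one entry of $_FILES, for example $_FILES['pdf'].
 */
function validatePdfUpload(array $file): ?string
{
    // Any error code other than UPLOAD_ERR_OK means nothing usable arrived
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return 'Upload failed or no file was selected.';
    }

    // Compare the extension case-insensitively
    $extension = strtolower(pathinfo($file['name'] ?? '', PATHINFO_EXTENSION));
    if ($extension !== 'pdf') {
        return 'Only PDF files are allowed.';
    }

    // Read the first five bytes and compare them with the PDF signature
    $header = (string) file_get_contents($file['tmp_name'], false, null, 0, 5);
    if ($header !== '%PDF-') {
        return 'The file does not look like a valid PDF.';
    }

    return null;
}

// Usage inside parse.php (assumes the form field is named "pdf"):
// $error = validatePdfUpload($_FILES['pdf']);
// if ($error !== null) { $PDFContent = '<p>' . $error . '</p>'; }
```

In a real upload handler it is also worth confirming the temporary path with is_uploaded_file() before reading it. That call is left out here so the function can be tried with ordinary files.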