> > I downloaded the latest iText and iTextSharp and found a bug. If I burst
> > a PDF file into pages and merge them into one PDF file again using
> > pdfsam (http://sourceforge.net/projects/pdfsam), getPageContent will
> > not return the correct content of the remerged PDF file, while
> > ExtractText from PDFBox (http://www.pdfbox.org) can
> > extract all the text correctly from the same PDF file.
>
> It's not a bug.
> You are mixing two different concepts.
> 1. You DO get the correct content of the remerged PDF,
> but it's different from the content of the original PDF.
> In the merged PDF the content is added as a PDF Form XObject.
> 2. The text extracted with PDFBox is the text that is in the
> Form XObject. PDFBox parses the page content and discovers
> that the real content is in a different object. It gets
> that object to retrieve the text.
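
The difference between the two behaviours described above can be illustrated with a toy model. This is only a sketch under loud assumptions: it uses no iText or PDFBox calls, the class and method names (FormXObjectSketch, pageTextOnly, textFollowingXObjects) are hypothetical, content streams are treated as plain uncompressed strings, and only the Tj and Do operators are recognised. A real parser has to handle filters, escaped strings, TJ arrays, fonts and encodings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: NOT the iText or PDFBox API. All names are hypothetical.
final class FormXObjectSketch {
    // Group 1: a literal string shown with Tj, e.g. "(Hello) Tj".
    // Group 2: the name of an XObject invoked with Do, e.g. "/Xf1 Do".
    private static final Pattern OPERATORS =
            Pattern.compile("\\(([^)]*)\\)\\s*Tj|/(\\w+)\\s+Do\\b");

    // Guard against XObjects that reference each other in a cycle.
    private static final int MAX_DEPTH = 8;

    // Looks only at the page's own content stream and ignores any
    // Form XObject it invokes.
    static List<String> pageTextOnly(String pageContent) {
        return textFollowingXObjects(pageContent, Map.of());
    }

    // Also resolves each "/Name Do" against the given XObject map and
    // collects the text found inside the referenced stream.
    static List<String> textFollowingXObjects(String content,
                                              Map<String, String> xobjects) {
        List<String> out = new ArrayList<>();
        collect(content, xobjects, 0, out);
        return out;
    }

    private static void collect(String content, Map<String, String> xobjects,
                                int depth, List<String> out) {
        if (depth > MAX_DEPTH) {
            return;
        }
        Matcher m = OPERATORS.matcher(content);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1));           // text shown directly
            } else {
                String nested = xobjects.get(m.group(2));
                if (nested != null) {          // unknown names are skipped
                    collect(nested, xobjects, depth + 1, out);
                }
            }
        }
    }
}
```

With a page stream of "q 1 0 0 1 0 0 cm /Xf1 Do Q" and an XObject Xf1 whose stream is "BT /F1 12 Tf (Hello) Tj ET", pageTextOnly returns an empty list, while textFollowingXObjects returns a list containing "Hello". That mirrors the point made in the reply: the page content is correct, but the text lives in the object the page refers to.
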
System.IO.EndOfStreamException: Trying to read content after the end of the stream
iTextSharp.text.pdf.RandomAccessFileOrArray.ReadFully(Byte[] b, Int32 off, Int32 len)
iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw(PRStream stream,RandomAccessFileOrArray file)
iTextSharp.text.pdf.PdfReader.GetStreamBytes(PRStream stream,RandomAccessFileOrArray file)
iTextSharp.text.pdf.PdfReader.GetPageContent(Int32 pageNum,RandomAccessFileOrArray file)
But there IS a picture (and nothing else) on the page, and GetImportedPage runs fine.