A string tokeniser that recognises PDF grammar. When passed an IO stream or a string, repeated calls to token() will return the next token from the source.
This is very low level, and getting the raw tokens is not very useful in itself.
This will usually be used in conjunction with PDF::Reader::Parser, which converts the raw tokens into objects we can work with (strings, ints, arrays, etc).
Creates a new buffer.
Params:
io - an IO stream or string with the raw data to tokenise
options:
:seek - a byte offset to seek to before starting to tokenise
:content_stream - set to true if the buffer will be tokenising a content stream. Defaults to false.
# File lib/pdf/reader/buffer.rb, line 54
def initialize (io, opts = {})
  @io = io
  @tokens = []
  @in_content_stream = opts[:content_stream]

  @io.seek(opts[:seek]) if opts[:seek]
  @pos = @io.pos
end
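The listing above calls seek and pos on whatever it is given, and a bare Ruby String provides neither. A minimal sketch of one way to prepare raw data and a start offset, assuming the data is wrapped in a StringIO first; the helper name open_at is hypothetical and not part of PDF::Reader:

```ruby
require "stringio"

# Hypothetical helper (not part of PDF::Reader): wrap raw data in a StringIO
# and position the cursor at a byte offset, as the :seek option does.
def open_at(raw, offset = nil)
  io = StringIO.new(raw)
  io.seek(offset) if offset
  io
end

raw = "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

# "%PDF-1.4\n" is 9 bytes long, so the first object starts at byte 9.
io = open_at(raw, 9)
puts io.pos      # cursor position after seeking
puts io.read(7)  # the next 7 bytes from that position
```

An object wrapped this way responds to the same seek, pos and read calls that the constructor and the other methods on this page rely on.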
return true if there are no more tokens left
# File lib/pdf/reader/buffer.rb, line 65
def empty?
  prepare_tokens if @tokens.size < 3

  @tokens.empty?
end
return the byte offset where the first XRef table in the source can be found.
# File lib/pdf/reader/buffer.rb, line 116
def find_first_xref_offset
  @io.seek(-1024, IO::SEEK_END) rescue @io.seek(0)
  data = @io.read(1024)

  # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
  lines = data.split(/[\n\r]+/).reverse
  eof_index = lines.index { |l| l.strip == "%%EOF" }

  raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
  raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
  lines[eof_index+1].to_i
end
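The lookup above reads the last 1024 bytes of the source, splits them into lines, and takes the line immediately before the %%EOF marker as the offset. A self-contained sketch of the same idea, runnable without the library; the names first_xref_offset and MissingEOFError are hypothetical stand-ins, and the explicit rescue of SystemCallError is an assumption about how a too-short source fails on seek:

```ruby
require "stringio"

# Hypothetical error class standing in for MalformedPDFError.
class MissingEOFError < StandardError; end

# Hypothetical helper mirroring the listing: scan the tail of the source
# and return the number on the line that precedes the %%EOF marker.
def first_xref_offset(io, tail_size = 1024)
  begin
    io.seek(-tail_size, IO::SEEK_END)
  rescue SystemCallError
    # Source is shorter than tail_size, so scan it from the start instead.
    io.seek(0)
  end
  data = io.read(tail_size).to_s

  # EOL markers may be \r, \n, or both, so split on any run of them.
  lines = data.split(/[\r\n]+/).reverse
  eof_index = lines.index { |l| l.strip == "%%EOF" }

  raise MissingEOFError, "no EOF marker found" if eof_index.nil?
  raise MissingEOFError, "no offset line before EOF marker" if eof_index >= lines.size - 1

  lines[eof_index + 1].to_i
end

tail = "trailer\n<< /Size 1 >>\nstartxref\n1234\n%%EOF\n"
puts first_xref_offset(StringIO.new(tail))
```

Because the lines are reversed before searching, the marker found is the last %%EOF in the scanned tail, and the entry after it in the reversed list is the line that precedes it in the file.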
return raw bytes from the underlying IO stream.
bytes - the number of bytes to read
options:
:skip_eol - if true, the IO stream is advanced past a CRLF or LF that is sitting under the io cursor.
# File lib/pdf/reader/buffer.rb, line 80
def read(bytes, opts = {})
  reset_pos

  if opts[:skip_eol]
    @io.seek(-1, IO::SEEK_CUR)
    str = @io.read(2)
    if str.nil?
      return nil
    elsif str == "\r\n"
      # do nothing
    elsif str[0,1] == "\n"
      @io.seek(-1, IO::SEEK_CUR)
    else
      @io.seek(-2, IO::SEEK_CUR)
    end
  end

  bytes = @io.read(bytes)
  save_pos
  bytes
end
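The :skip_eol branch steps back one byte, reads two, and then adjusts the cursor depending on what it found. A standalone sketch of that logic on a StringIO follows. It assumes the cursor starts one byte past the delimiter that ended the previous token (for example, just after the byte following a stream keyword); that starting position is an assumption about the caller, and read_after_eol is a hypothetical name:

```ruby
require "stringio"

# Hypothetical helper mirroring the :skip_eol branch of the listing above.
def read_after_eol(io, bytes)
  io.seek(-1, IO::SEEK_CUR)   # step back onto the previous byte
  str = io.read(2)            # previous byte plus the byte under the cursor
  if str.nil?
    return nil                # nothing left to read
  elsif str == "\r\n"
    # CRLF pair: both bytes are consumed, cursor is now past the LF
  elsif str[0, 1] == "\n"
    io.seek(-1, IO::SEEK_CUR) # lone LF: give back the extra byte we read
  else
    io.seek(-2, IO::SEEK_CUR) # no EOL: rewind to the previous byte
  end
  io.read(bytes)
end

crlf = StringIO.new("stream\r\nDATA")
crlf.seek(7)                  # cursor sits on the LF of the CRLF pair
puts read_after_eol(crlf, 4)

lf = StringIO.new("stream\nDATA")
lf.seek(7)                    # cursor sits just past the lone LF
puts read_after_eol(lf, 4)
```

In both cases the read starts at the first byte of DATA. When the previous byte is not an end-of-line character, the cursor ends up one byte earlier than where it started, so that byte is included in the result.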
return the next token from the source. Returns a string if a token is found, nil if there are no tokens left.
# File lib/pdf/reader/buffer.rb, line 105
def token
  reset_pos
  prepare_tokens if @tokens.size < 3
  merge_indirect_reference
  prepare_tokens if @tokens.size < 3

  @tokens.shift
end
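Since token returns nil once the source is exhausted, callers can drain a buffer with a simple loop. The sketch below shows that calling pattern with a hypothetical stand-in class, WhitespaceBuffer, which only splits on whitespace and does not implement PDF grammar or indirect-reference merging; it refills an internal token queue lazily, in the same spirit as the prepare_tokens calls above:

```ruby
require "stringio"

# Hypothetical stand-in for illustration only: splits on whitespace and
# knows nothing about PDF grammar.
class WhitespaceBuffer
  def initialize(io)
    @io = io
    @tokens = []
  end

  # Return the next token, or nil when the source is exhausted.
  def token
    prepare_tokens if @tokens.empty?
    @tokens.shift
  end

  # True when no tokens remain.
  def empty?
    prepare_tokens if @tokens.empty?
    @tokens.empty?
  end

  private

  # Refill the queue one line at a time until it holds tokens or EOF is hit.
  def prepare_tokens
    while @tokens.empty? && (line = @io.gets)
      @tokens.concat(line.split)
    end
  end
end

# Collect every token by calling token until it returns nil.
def drain(buffer)
  tokens = []
  while (tok = buffer.token)
    tokens << tok
  end
  tokens
end

source = StringIO.new("1 0 obj\n<< /Type /Catalog >>\nendobj\n")
p drain(WhitespaceBuffer.new(source))
```

The same loop shape should apply to any object with a token method that returns nil at the end of its input.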