jerous' ::1

Trying to find out how the pdfloc(...) locators used for PDF annotations work.

Findings so far:

  pdfloc(docid,page,line,x1,x2,a,b,c), where
  • docid: some sort of hash, the same throughout the whole document
  • page: page number, starting at 0
  • line: line indicator; the real line number is (line-1)/8
  • x1: number of characters from the start of the line
  • x2: correction to x1; update x1 := x1+2*x2
  • a: some number, always zero?
  • b: start marker (0) or end marker (1)
  • c: always 1
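To make the rules above concrete, here is a minimal sketch of a hypothetical helper, decode_pdfloc, that splits one locator on commas and applies them. The column rule x1+2*x2 and the meaning of b are only the tentative findings listed above. The locator in the usage line is made up to match the "Tijeras" sample in the script below.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode one pdfloc(...) string with the tentative
# rules found so far. Prints: page (1-based), real line, column, marker b.
decode_pdfloc() {
  local loc=${1#"pdfloc("}   # strip the leading "pdfloc("
  loc=${loc%")"}             # strip the trailing ")"
  local docid page line x1 x2 a b c
  IFS=, read -r docid page line x1 x2 a b c <<< "$loc"
  echo "$((page + 1)) $(( (line - 1) / 8 )) $((x1 + 2 * x2)) $b"
}

decode_pdfloc 'pdfloc(cfa0,0,9,8,0,0,0,1)'   # prints: 1 1 8 0
```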

Simple bash script that fetches those annotations and prints the text they select. Not everything works yet :(

#!/usr/bin/env bash 
 
# Sample annotations: start line,x  end line,x  selected text
# 9,0 9,6 Rosario 
# 9,8 9,14 Tijeras
# 17,0 17,4 Jorge
# 33,0 33,6 Oración
# 49,13 49,15 que
# 57,20 57,27 alcancen
# 97,0 97,4 sabes
# 121,0 121,3 Amén
# 169,10 169,19 UNO
 
# Extract the pdfloc(...) locators, keeping only document cfa0, page 0
grep fragment /tmp/foo.anot | grep -o 'pdfloc([^)]*)' | grep 'pdfloc(cfa0,0,' > /tmp/annots
 
while read -r start; do
 read -r end   # locators come in pairs: start of the selection, then its end
 x1_=0
 x2_=0
 # docid,page,line,x1,x2,a,b,c
 loc1=$(echo "$start" | sed 's/pdfloc(//; s/)//')
 loc2=$(echo "$end" | sed 's/pdfloc(//; s/)//')
 id1=$(echo "$loc1" | cut -f1 -d,)                  # docid is a hash, not a number
 p1=$(( $(echo "$loc1" | cut -f2 -d,) + 1 ))        # 1-based page
 l1=$(( ($(echo "$loc1" | cut -f3 -d,) - 1) / 8 ))  # real line number
 #x1_=$(echo "$loc1" | cut -f5 -d,)
 x1=$(( $(echo "$loc1" | cut -f4 -d,) + 2*x1_ ))

 p2=$(( $(echo "$loc2" | cut -f2 -d,) + 1 ))
 l2=$(( ($(echo "$loc2" | cut -f3 -d,) - 1) / 8 ))
 #x2_=$(echo "$loc2" | cut -f5 -d,)
 x2=$(( $(echo "$loc2" | cut -f4 -d,) + 2*x2_ + 2*x1_ ))

 #a=$(echo "$loc1" | cut -f6 -d,)            # unknown, always zero?
 #annot_start=$(echo "$loc1" | cut -f7 -d,)  # start (0) or end (1) of the annotation
 #c=$(echo "$loc1" | cut -f8 -d,)            # always 1?

 printf '%s\t%s\t(P%d,%03d,%d)\t-> (P%d,%d,%d)\t' \
  "$loc1" "$loc2" "$p1" "$l1" "$x1" "$p2" "$l2" "$x2"
 sel=$(awk -v n="$l1" 'NR == n' /tmp/text)  # line l1 of the page text in /tmp/text
 sel=${sel:x1:x2-x1+1}  # end column is inclusive. Don't use cut -c here: it counts bytes, not characters!
 echo "$sel"
done < /tmp/annots
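The sample comments at the top of the script suggest the end column is inclusive (e.g. 9,0 to 9,6 gives the 7-letter "Rosario"), which is why the selection length is x2-x1+1. A minimal check, using a hypothetical select_range helper and a text line reconstructed from the first two samples (an assumption, not the real page text):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print characters x1..x2 (inclusive) of a text line.
# ${var:offset:length} treats offset and length as arithmetic expressions.
select_range() {
  local text=$1 x1=$2 x2=$3
  echo "${text:x1:x2-x1+1}"
}

select_range "Rosario Tijeras" 0 6    # prints: Rosario
select_range "Rosario Tijeras" 8 14   # prints: Tijeras
```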

@ Sun, 15 Apr 2012 09:23:52 -0700
interesting!
