Grid Division

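This section examines how the number of grid divisions affects the flattening process by comparing processing time and flattening results across different division counts. Figure 4.4 shows the results of 150 and 600 divisions for a 2400×3200 input image. Figure 4.5 shows the results of 150 and 15000 divisions for a 300×400 input image.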

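Figure 4.4: Division comparison results (a)

Figure 4.5: Division comparison results (b)

Figure 4.6: Comparison of results on a straight line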

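In Figure 4.4, the results for 150 and 600 divisions do not differ greatly, yet flattening took 8.0 s and 18.2 s, respectively. For the more strongly curved 300×400 input image, flattening took 1.8 s with 150 divisions and 44.9 s with 15000 divisions, so a 100-fold increase in the number of divisions lengthened the processing time considerably (about 25-fold). Figure 4.6 compares the results after flattening for a part that was originally a straight line. The 15000-division result resembles a straight line, but the 150-division result is the smoother line.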
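The relation between division count and processing time can be made explicit with a little arithmetic. The snippet below is only an illustrative calculation on the four timings reported in this section; the names `runs` and `growth` are introduced here for illustration and do not come from the original implementation.

```python
# Timings reported in this section: (number of divisions, seconds).
runs = {
    "2400x3200": [(150, 8.0), (600, 18.2)],
    "300x400": [(150, 1.8), (15000, 44.9)],
}


def growth(coarse, fine):
    """Return (division ratio, time ratio) between a coarse and a fine run."""
    (n0, t0), (n1, t1) = coarse, fine
    return n1 / n0, t1 / t0


for name, (coarse, fine) in runs.items():
    div_ratio, time_ratio = growth(coarse, fine)
    print(f"{name}: {div_ratio:.0f}x divisions -> {time_ratio:.1f}x time")
```

In both cases the time ratio is smaller than the division ratio, which suggests that part of the processing cost does not depend on the number of divisions. With only two measurements per image, this is an observation rather than a fitted model.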

4.4 Character Recognition

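This section evaluates optical character recognition (OCR) on the text in the images. In this study, Cloud Vision [30], published by Google, is used as the OCR engine. The evaluation compares recognition performance on the captured input image and on the image flattened by the proposed method.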

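Figure 4.7 shows the OCR results obtained with Cloud Vision for the input image before processing and for the image after processing. Light-blue boxes mark recognized text blocks, green boxes mark content recognized as paragraphs, and yellow lines mark recognized character strings.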

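Cloud Vision can detect slightly curved text lines, but before processing the title is curved enough that parts of it were difficult to recognize. The recognition accuracy was 73.9% before processing and 82.6% after processing, which shows that flattening improves recognition accuracy.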
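The text does not state how the recognition accuracy was computed. As a minimal sketch of one common character-level definition (an assumption, not necessarily the measure used in this study), the fraction of ground-truth characters that the OCR output reproduces in order can be computed with Python's standard `difflib`. The function name `char_accuracy` and the sample strings are hypothetical and are not taken from the experiment.

```python
from difflib import SequenceMatcher


def char_accuracy(reference: str, recognized: str) -> float:
    """Fraction of reference characters matched, in order, by the OCR output."""
    if not reference:
        return 0.0
    # autojunk=False keeps frequent characters from being ignored in long texts.
    matcher = SequenceMatcher(None, reference, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Made-up strings for illustration only.
reference = "document image"
recognized = "docunent imag"
print(f"{char_accuracy(reference, recognized):.3f}")
```

Applying such a function to the OCR text of the image before and after flattening, against the same ground-truth transcription, would give two directly comparable scores.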

Figure 4.7: OCR results with Cloud Vision

Chapter 5

Conclusion

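This study addressed the difficulty of segmenting page regions in a single camera-captured image of an open book, focusing on a machine-learning solution. The proposed method takes an RGB book image as input, trains an FCN on manually annotated page regions, and performs segmentation that predicts, for each pixel, whether it belongs to the foreground (right page or left page) or the background. The trained network can separate the pages, but its accuracy along the contours is low. To detect more accurate contours, image processing is applied to the FCN output to separate the individual pages, and contour detection is performed on the image within each ROI. Four vertices are detected on each contour, dividing it into top, bottom, left, and right lines. Each of the four lines is classified as curved or straight, which allows erroneous vertex detections to be corrected. A grid is then generated from the curved parts of the contour, the page region is divided by the grid into small rectangles, and a projective transformation is applied to each rectangle image.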
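The grid generation step summarized above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: the top and bottom curves are assumed to be given as polylines, both are resampled evenly by arc length, and corresponding points are joined by straight lines that are again divided evenly. The names `resample`, `make_grid`, and `cells`, the parameters `cols` and `rows` (both at least 1), and the sample coordinates are all hypothetical.

```python
from math import dist


def resample(curve, n):
    """Return n + 1 points spaced evenly by arc length along a polyline."""
    lengths = [0.0]
    for p, q in zip(curve, curve[1:]):
        lengths.append(lengths[-1] + dist(p, q))
    total = lengths[-1]
    points, seg = [], 0
    for i in range(n + 1):
        target = total * i / n
        # Advance to the polyline segment that contains the target length.
        while seg < len(curve) - 2 and lengths[seg + 1] < target:
            seg += 1
        span = lengths[seg + 1] - lengths[seg]
        t = (target - lengths[seg]) / span if span else 0.0
        (x0, y0), (x1, y1) = curve[seg], curve[seg + 1]
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


def make_grid(top, bottom, cols, rows):
    """Grid vertices: grid[r][c] lies between the top and bottom curves."""
    top_pts, bottom_pts = resample(top, cols), resample(bottom, cols)
    grid = []
    for r in range(rows + 1):
        s = r / rows  # 0 on the top curve, 1 on the bottom curve
        grid.append([
            (tx + s * (bx - tx), ty + s * (by - ty))
            for (tx, ty), (bx, by) in zip(top_pts, bottom_pts)
        ])
    return grid


def cells(grid):
    """Yield each cell as corners: top-left, top-right, bottom-right, bottom-left."""
    for r in range(len(grid) - 1):
        for c in range(len(grid[r]) - 1):
            yield grid[r][c], grid[r][c + 1], grid[r + 1][c + 1], grid[r + 1][c]


# Hypothetical page whose top and bottom edges sag at both ends.
top = [(0, 10), (50, 0), (100, 10)]
bottom = [(0, 110), (50, 100), (100, 110)]
grid = make_grid(top, bottom, cols=4, rows=2)
print(len(list(cells(grid))))
```

Each quadrilateral yielded by `cells` would then be mapped to an axis-aligned rectangle by a projective transformation, for example with OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`. The text does not state which library was actually used.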

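As for remaining issues, the FCN output is a blurry image, and current machine-learning methods do not fundamentally solve this problem. As a workaround, this study used morphological processing as post-processing. In addition, combining the FCN with a CRF (conditional random field) [31], a method widely used in region segmentation, as post-processing should make it possible to detect smoother contours. Contour detection within the ROI still fails in some cases because of fingers and the background. For the finger problem, one possible remedy is to add fingers as a classification class. The training data also needs to be expanded. The grid is currently generated by dividing the top and bottom curves evenly into a specified number of parts; a method that divides them unevenly according to the degree of curvature is desirable.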
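To illustrate the kind of morphological post-processing mentioned above, the following sketch applies a binary opening (erosion followed by dilation) with a 3×3 structuring element to a single-class mask. The text does not state which morphological operation or kernel size was used, so both are assumptions made here, and the function names and the sample mask are hypothetical.

```python
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def erode(mask):
    """3x3 erosion: a pixel stays 1 only if its whole neighborhood is 1.

    Pixels outside the image count as 0, so border pixels are removed.
    """
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in NEIGHBORS))
             for x in range(w)] for y in range(h)]


def dilate(mask):
    """3x3 dilation: a pixel becomes 1 if any pixel in its neighborhood is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in NEIGHBORS))
             for x in range(w)] for y in range(h)]


def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the kernel."""
    return dilate(erode(mask))


# A 3x3 region plus one isolated noise pixel at row 2, column 5.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
for row in opening(mask):
    print(row)
```

The opening keeps the 3×3 region and removes the isolated pixel. In practice a library routine such as OpenCV's `cv2.morphologyEx` would normally be used instead of hand-written loops.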

Looking ahead, we expect deep-learning-based flattening to be realized as an end-to-end method rather than the two-stage approach used in this study.

Acknowledgments

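In concluding this study, I would like to express my gratitude to the professors who provided guidance and valuable advice. I am also deeply grateful to the members of the laboratory, who were always willing to discuss many matters. In particular, I thank Wang (王) of the Kakimoto (柿本) Laboratory for help with the FCN implementation and for valuable comments.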

References

[1] Impress. 2017年度の市場規模は電子書籍、電子雑誌合わせて2500億円を突破！2022年には3500億円規模に『電子書籍ビジネス調査報告書2018』7月30日発売. https://www.impress.co.jp/newsrelease/2018/07/. Accessed: 2019.02.22.

[2] National Diet Library. 資料デジタル化基本計画 2016-2020. https://www.ndl.go.jp/jp/preservation/digitization/. Accessed: 2019.02.22.

[3] Michael S Brown and W Brent Seales. Document restoration using 3d shape: a general deskewing algorithm for arbitrarily warped documents. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, Vol. 2, pp. 367–374. IEEE, 2001.

[4] Li Zhang, Yu Zhang, and Chew Tan. An improved physically-based method for geometric restoration of distorted document images. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 4, pp. 728–734, 2008.

[5] Gaofeng Meng, Ying Wang, Shenquan Qu, Shiming Xiang, and Chunhong Pan. Active flattening of curved document images via two structured beams. 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3890–3897, 2014.

[6] Adrian Ulges, Christoph H. Lampert, and Thomas M. Breuel. Document capture using stereo vision. In ACM Symposium on Document Engineering, 2004.

[7] Atsushi Yamashita, Atsushi Kawarago, Toru Kaneko, and Kenjiro T. Miura. Shape reconstruction and image restoration for non-flat surfaces of documents with a stereo vision system. Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., Vol. 1, pp. 482–485 Vol.1, 2004.

[8] Yau-Chat Tsoi and Michael S. Brown. Multi-view document rectification using boundary. 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2007.

[9] Hyung Il Koo, Jinho Kim, and Nam Ik Cho. Composition of a dewarped and enhanced document image from two view images. IEEE Transactions on Image Processing, Vol. 18, No. 7, pp. 1551–1562, 2009.

[10] Shaodi You, Yasuyuki Matsushita, Sudipta Sinha, Yusuke Bou, and Katsushi Ikeuchi. Multiview rectification of folded documents. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40, No. 2, pp. 505–511, 2018.

[11] Toshikazu Wada, Hiroyuki Ukida, and Takashi Matsuyama. Shape from shading with interreflections under a proximal light source: Distortion-free copying of an unfolded book. International Journal of Computer Vision, Vol. 24, No. 2, pp. 125–135, 1997.

[12] Frédéric Courteille, Alain Crouzil, Jean-Denis Durou, and Pierre Gurdjos. Shape from shading for the digitization of curved documents. Machine Vision and Applications, Vol. 18, No. 5, pp. 301–316, 2007.

[13] Li Zhang, Andy M Yip, Michael S Brown, and Chew Lim Tan. A unified framework for document restoration using inpainting and shape-from-shading. Pattern Recognition, Vol. 42, No. 11, pp. 2961–2978, 2009.

[14] Huaigu Cao, Xiaoqing Ding, and Changsong Liu. A cylindrical surface model to rectify the bound document image. In null, p. 228. IEEE, 2003.

[15] Jian Liang, Daniel DeMenthon, and David Doermann. Geometric rectification of camera-captured document images. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 4, pp. 591–605, 2008.

[16] Sagnik Das, Gaurav Mishra, Akshay Sudharshana, and Roy Shilkrot. The common fold: Utilizing the four-fold to dewarp printed documents from a single image. In Proceedings of the 2017 ACM Symposium on Document Engineering, pp. 125–128. ACM, 2017.

[17] Fujitsu. スキャナー ScanSnap SV600. http://scansnap.fujitsu.com/jp/product/sv600/. Accessed: 2019.02.22.

[18] Adobe. Adobe Scan アプリで、文書をスキャンして PDF に変換. https://acrobat.adobe.com/jp/ja/mobile/. Accessed: 2019.02.22.

[19] Syed Ammar Abbas and Sibt ul Hussain. Recovering homography from camera captured documents using convolutional neural networks. CoRR, Vol. abs/1709.03524, 2017.

[20] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

[21] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[22] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.

[23] wkentaro. Image Polygonal Annotation with Python. https://github.com/wkentaro/labelme/. Accessed: 2019.02.22.

[24] Nikolaos Stamatopoulos, Basilios Gatos, Ioannis Pratikakis, and Stavros J. Perantonis. A two-step dewarping of camera document images. 2008 The Eighth IAPR International Workshop on Document Analysis Systems, pp. 209–216, 2008.

[25] Satoshi Suzuki, et al. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing, Vol. 30, No. 1, pp. 32–46, 1985.

