How to Create an OCR Application in Android
Optical Character Recognition (OCR) gives a computer the ability to read text that appears in an image, letting applications make sense of signs, articles, flyers, pages of text, menus, or any other place that text appears as part of an image. The Mobile Vision Text API gives Android developers a powerful and reliable OCR capability that works with most Android devices and won't increase the size of your app.
In this codelab, you will build an app that shows a live camera preview and speaks any text it sees there. Along the way, you'll learn how to use the Mobile Vision API to delight and empower your users.
What you'll learn
- Initializing the Mobile Vision TextRecognizer.
- Setting up a Processor to receive frames from a camera as they come in and look for text.
- Rendering that text to the screen at its location.
- Sending that text to Android's TextToSpeech engine to speak it aloud.
What you'll need
- Android Studio version 3.1+
- The sample code.
- A test device with Android 4.1+ and a rear-facing camera.
- A micro-USB or USB-C cable (whichever your device requires).
You can download all the sample code to your computer.
You may need to update your installed version of Google Repository in order to use the Mobile Vision Text API.
Open Android Studio, and then open the SDK Manager. There, make sure your version of Google Repository is up to date; it should be at least version 26.
Now you're ready to open the starter project.
- Select the `ocr-reader-start` directory from your sample code download (File > Open > ocr-codelab/ocr-reader-start).
- Add the Google Play Services dependency to the app. Without this dependency, the Text API won't be available and you won't be able to build.
Open the `build.gradle` file in the `app` module and change the dependencies block to include the play-services-vision dependency. When you're done, it should look like this:
```groovy
dependencies {
    implementation fileTree(dir: 'libs', include: ['*.jar'])
    implementation 'com.android.support:support-v4:26.1.0'
    implementation 'com.android.support:design:26.1.0'
    implementation 'com.google.android.gms:play-services-vision:15.0.0'
}
```
- Enable USB debugging on your Android device.
- Click the Gradle sync button.
- Click the Run button.
After a few seconds, you should see the Read Text screen come up, but it's just a black screen. Right now, it doesn't do anything, and the `CameraSource` isn't set up. But we're going to change that next.
Frequently Asked Questions
- How do I enable USB debugging?
- Why doesn't Android Studio see my device?
- Android error: Failed to install *.apk on device *: timeout?
To start things out, we're going to create our `TextRecognizer`. This detector object processes images and determines what text appears within them. Once it's initialized, a `TextRecognizer` can be used to detect text in all types of images. Find the `createCameraSource` method and build a `TextRecognizer`:
OcrCaptureActivity.java
```java
private void createCameraSource(boolean autoFocus, boolean useFlash) {
    Context context = getApplicationContext();

    // TODO: Create the TextRecognizer
    TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
    // TODO: Set the TextRecognizer's Processor.
    // TODO: Check if the TextRecognizer is operational.
    // TODO: Create the mCameraSource using the TextRecognizer.
}
```
Just like that, the `TextRecognizer` is built. However, it might not work yet. If the device does not have enough storage, or Google Play Services can't download the OCR dependencies, the `TextRecognizer` object may not be operational. Before we start using it to recognize text, we should check that it's ready. We'll add this check to `createCameraSource` after we initialize the `TextRecognizer`:
OcrCaptureActivity.java
```java
// TODO: Check if the TextRecognizer is operational.
if (!textRecognizer.isOperational()) {
    Log.w(TAG, "Detector dependencies are not yet available.");

    // Check for low storage. If there is low storage, the native library will not be
    // downloaded, so detection will not become operational.
    IntentFilter lowstorageFilter = new IntentFilter(Intent.ACTION_DEVICE_STORAGE_LOW);
    boolean hasLowStorage = registerReceiver(null, lowstorageFilter) != null;

    if (hasLowStorage) {
        Toast.makeText(this, R.string.low_storage_error, Toast.LENGTH_LONG).show();
        Log.w(TAG, getString(R.string.low_storage_error));
    }
}
```
Now that we've checked that the `TextRecognizer` is operational, we could use it to detect individual frames. But we want to do something a little more interesting: read text live in the camera view. In order to do that, we'll create a `CameraSource`, which is a camera manager pre-configured for Vision processing. We're going to set the resolution high and turn autofocus on, because that's a good match for recognizing small text. If you knew your users would be looking at large blocks of text, like signage, you might use a lower resolution, which would be able to process frames more quickly.
OcrCaptureActivity.java
```java
// TODO: Create the cameraSource using the TextRecognizer.
cameraSource = new CameraSource.Builder(getApplicationContext(), textRecognizer)
        .setFacing(CameraSource.CAMERA_FACING_BACK)
        .setRequestedPreviewSize(1280, 1024)
        .setRequestedFps(15.0f)
        .setFlashMode(useFlash ? Camera.Parameters.FLASH_MODE_TORCH : null)
        .setFocusMode(autoFocus ? Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO : null)
        .build();
```
Here's what the complete `createCameraSource` method should look like when you're done:
OcrCaptureActivity.java
```java
private void createCameraSource(boolean autoFocus, boolean useFlash) {
    Context context = getApplicationContext();

    // Create the TextRecognizer
    TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
    // TODO: Set the TextRecognizer's Processor.

    // Check if the TextRecognizer is operational.
    if (!textRecognizer.isOperational()) {
        Log.w(TAG, "Detector dependencies are not yet available.");

        // Check for low storage. If there is low storage, the native library will not be
        // downloaded, so detection will not become operational.
        IntentFilter lowstorageFilter = new IntentFilter(Intent.ACTION_DEVICE_STORAGE_LOW);
        boolean hasLowStorage = registerReceiver(null, lowstorageFilter) != null;

        if (hasLowStorage) {
            Toast.makeText(this, R.string.low_storage_error, Toast.LENGTH_LONG).show();
            Log.w(TAG, getString(R.string.low_storage_error));
        }
    }

    // Create the cameraSource using the TextRecognizer.
    cameraSource = new CameraSource.Builder(getApplicationContext(), textRecognizer)
            .setFacing(CameraSource.CAMERA_FACING_BACK)
            .setRequestedPreviewSize(1280, 1024)
            .setRequestedFps(15.0f)
            .setFlashMode(useFlash ? Camera.Parameters.FLASH_MODE_TORCH : null)
            .setFocusMode(autoFocus ? Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO : null)
            .build();
}
```
If you build the app now, you should see a live camera view! But in order to process images from the camera, we need to handle that last TODO in `createCameraSource`: create a `Processor` to handle the text detections as they come in. We'll do that in the next step.
By now, your app could detect text on individual frames using the detect method on the `TextRecognizer`. That's what you would do if you wanted to find text in a photograph or other image file. But in order to read text straight from the camera, it's useful to implement a `Processor`, which will handle detections as often as they become available.
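In case you're curious about that single-image path, here's a minimal sketch, assuming the `textRecognizer` built earlier and a `Bitmap` named `bitmap`; the bitmap name is our own illustration, since the codelab never loads a still image:

```java
// A minimal sketch (not part of this codelab's flow): detect text in a single
// image, assuming `textRecognizer` is already built and `bitmap` holds a photo.
Frame frame = new Frame.Builder().setBitmap(bitmap).build();
SparseArray<TextBlock> blocks = textRecognizer.detect(frame);
for (int i = 0; i < blocks.size(); i++) {
    Log.d(TAG, "Found text: " + blocks.valueAt(i).getValue());
}
```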
Go to the `OcrDetectorProcessor` class and have it implement `Detector.Processor<TextBlock>`:
OcrDetectorProcessor.java
```java
public class OcrDetectorProcessor implements Detector.Processor<TextBlock> {

    private GraphicOverlay<OcrGraphic> graphicOverlay;

    OcrDetectorProcessor(GraphicOverlay<OcrGraphic> ocrGraphicOverlay) {
        graphicOverlay = ocrGraphicOverlay;
    }
}
```
That interface requires two methods be implemented. The first, `receiveDetections`, will receive `TextBlock`s from the `TextRecognizer` as they become available. The second, `release`, can be used to cleanly get rid of resources when the `TextRecognizer` is disposed of. In this case, we only have to clear the graphic overlay, which cleans up all the `OcrGraphic` objects.

We'll get the `TextBlock`s from the detection and create `OcrGraphic` objects for each text block that the processor detects. For now, they won't render; we'll implement their draw behavior in the next step.
OcrDetectorProcessor.java
```java
@Override
public void receiveDetections(Detector.Detections<TextBlock> detections) {
    graphicOverlay.clear();
    SparseArray<TextBlock> items = detections.getDetectedItems();
    for (int i = 0; i < items.size(); ++i) {
        TextBlock item = items.valueAt(i);
        if (item != null && item.getValue() != null) {
            Log.d("Processor", "Text detected! " + item.getValue());
            OcrGraphic graphic = new OcrGraphic(graphicOverlay, item);
            graphicOverlay.add(graphic);
        }
    }
}

@Override
public void release() {
    graphicOverlay.clear();
}
```
Now that the processor is ready, we have to set the `textRecognizer` to use it. Head back to the last remaining TODO in the `createCameraSource` method in `OcrCaptureActivity`:
OcrCaptureActivity.java
```java
// Create the TextRecognizer
TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();

// TODO: Set the TextRecognizer's Processor.
textRecognizer.setProcessor(new OcrDetectorProcessor(graphicOverlay));
```
Now, build and run the app. At this point, you should be able to point the camera at text and see the "Text detected!" debug messages appear in Android Monitor's Logcat a few times a second! But that's not a very intuitive way to visualize what the `TextRecognizer` is seeing.
In the next step, we'll put that text on screen.
The debug message tells us that text is being recognized. We'd like the user to see that, too, by drawing the text on top of the camera preview.
Let's implement the `OcrGraphic` draw method. We want to see if the graphic has text, translate its bounding box to the appropriate coordinates for the canvas, and then draw the box and text.
OcrGraphic.java
```java
@Override
public void draw(Canvas canvas) {
    // TODO: Draw the text onto the canvas.
    if (text == null) {
        return;
    }

    // Draws the bounding box around the TextBlock.
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    canvas.drawRect(rect, rectPaint);

    // Render the text at the bottom of the box.
    canvas.drawText(text.getValue(), rect.left, rect.bottom, textPaint);
}
```
Build and try it out on our sample text:
You should see the box appear on screen with the text in it! Feel free to play with the color values using `TEXT_COLOR`.
But how about this one?
The bounding box looks right, but the text is all at the bottom of the box.
That's because the engine puts all the text it recognizes in a `TextBlock` into one complete sentence, even if it sees the sentence broken over multiple lines. If you want the complete sentence, that's very useful. But what if you want to know where each individual line of text actually is?
You can get the `Line`s from a `TextBlock` by calling `getComponents`, and then you can iterate over each line to get the location and values of the text within it. This lets you put the text in the place it actually appears.
OcrGraphic.java
```java
@Override
public void draw(Canvas canvas) {
    // TODO: Draw the text onto the canvas.
    if (text == null) {
        return;
    }

    // Draws the bounding box around the TextBlock.
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    canvas.drawRect(rect, rectPaint);

    // Break the text into multiple lines and draw each one according to its own bounding box.
    List<? extends Text> textComponents = text.getComponents();
    for (Text currentText : textComponents) {
        float left = translateX(currentText.getBoundingBox().left);
        float bottom = translateY(currentText.getBoundingBox().bottom);
        canvas.drawText(currentText.getValue(), left, bottom, textPaint);
    }
}
```
Try that again with the text:
Better! You can choose how granular you want to go based on your application's needs. If you like, you can call `getComponents` on each `Line` and get the positions of the actual `Element`s (words, in Latin languages). You can also customize the `textSize` to fill as much space as the actual text does on screen.
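For instance, here's a hedged sketch of that word-level approach inside `draw`; the line-height-based `setTextSize` call is an approximation of our own, not something the codelab prescribes:

```java
// An illustrative sketch (not part of the codelab's starter code): iterate down
// to the word-level Elements of each Line, sizing the paint to the line's
// on-screen height so drawn text roughly matches the real text size.
for (Text line : text.getComponents()) {
    RectF lineBox = translateRect(new RectF(line.getBoundingBox()));
    textPaint.setTextSize(lineBox.height()); // rough match for the real text size
    for (Text word : line.getComponents()) {
        float left = translateX(word.getBoundingBox().left);
        float bottom = translateY(word.getBoundingBox().bottom);
        canvas.drawText(word.getValue(), left, bottom, textPaint);
    }
}
```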
Now that we can see where the text is and what it says, let's do something with it.
Now, text from the camera has been converted to useful, structured `String`s, and these `String`s are being displayed on screen. Let's do something else with them.
By using the TextToSpeech API built into Android, and implementing the `contains` method in `OcrGraphic`, we can make the app speak the text out loud when the `TextBlock` graphic is touched.
First, let's implement the `contains` method in `OcrGraphic`. The `getGraphicAtLocation` method in `GraphicOverlay` is already translating the x and y values into the `View`'s coordinates, so what's left for us to do is check whether those coordinates fall within the bounds of this graphic's displayed bounding box.
OcrGraphic.java
```java
public boolean contains(float x, float y) {
    // TODO: Check if this graphic's text contains this point.
    if (text == null) {
        return false;
    }
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    return rect.contains(x, y);
}
```
Now that we can check whether text is contained at a location, let's go to the `onTap` method of `OcrCaptureActivity` and handle the tap action by checking for a graphic.
OcrCaptureActivity.java
```java
private boolean onTap(float rawX, float rawY) {
    // TODO: Speak the text when the user taps on screen.
    OcrGraphic graphic = graphicOverlay.getGraphicAtLocation(rawX, rawY);
    TextBlock text = null;
    if (graphic != null) {
        text = graphic.getTextBlock();
        if (text != null && text.getValue() != null) {
            Log.d(TAG, "text data is being spoken! " + text.getValue());
            // TODO: Speak the string.
        } else {
            Log.d(TAG, "text data is null");
        }
    } else {
        Log.d(TAG, "no text detected");
    }
    return text != null;
}
```
With that in place, you should be able to build and check via Android Monitor (adb logcat) that text is being tapped when you expect it to.
How about speaking the text out loud? Go to the top of the `Activity` and find the `onCreate` method. When we start the app, we should initialize the `TextToSpeech` engine so it's ready when we need it. There's already a `TextToSpeech` class variable, so we just need to initialize it with the context and a generic `OnInitListener`:
OcrCaptureActivity.java
```java
@Override
public void onCreate(Bundle bundle) {
    // (Portions of this method omitted)

    // TODO: Set up the Text To Speech engine.
    TextToSpeech.OnInitListener listener = new TextToSpeech.OnInitListener() {
        @Override
        public void onInit(final int status) {
            if (status == TextToSpeech.SUCCESS) {
                Log.d("TTS", "Text to speech engine started successfully.");
                tts.setLanguage(Locale.US);
            } else {
                Log.d("TTS", "Error starting the text to speech engine.");
            }
        }
    };
    tts = new TextToSpeech(this.getApplicationContext(), listener);
}
```
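The codelab doesn't cover teardown, but in your own apps you would normally release the engine when the `Activity` goes away. A minimal sketch, assuming the same `tts` field:

```java
// A minimal teardown sketch (not shown in the codelab): release the
// TextToSpeech engine when the Activity is destroyed, using the `tts` field.
@Override
protected void onDestroy() {
    if (tts != null) {
        tts.stop();
        tts.shutdown();
    }
    super.onDestroy();
}
```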
Okay, all that's left is to add the code to speak the string out loud in the `onTap` method.
OcrCaptureActivity.java
```java
private boolean onTap(float rawX, float rawY) {
    // TODO: Speak the text when the user taps on screen.
    OcrGraphic graphic = graphicOverlay.getGraphicAtLocation(rawX, rawY);
    TextBlock text = null;
    if (graphic != null) {
        text = graphic.getTextBlock();
        if (text != null && text.getValue() != null) {
            Log.d(TAG, "text data is being spoken! " + text.getValue());
            // Speak the string.
            tts.speak(text.getValue(), TextToSpeech.QUEUE_ADD, null, "DEFAULT");
        } else {
            Log.d(TAG, "text data is null");
        }
    } else {
        Log.d(TAG, "no text detected");
    }
    return text != null;
}
```
Now, when you build the app and tap on detected text, it should be spoken aloud. Try it out!
You've got an app that can read text straight from the camera and speak it out loud!
From here, you're well set up to explore lots of other possible uses for Text Detection in your own apps. Read addresses and phone numbers from business cards, make images of documents useful or searchable, and assist with translation or accessibility. Apply OCR wherever you want to understand the text contained in an image.
What we've covered
- We set up and used the Mobile Vision Text API to read the text visible from the device's camera.
- We rendered that text to the screen, and read it out loud using Android's TextToSpeech engine.
Next Steps
- Explore other features of the Text API: render the individual words at their locations on screen, or make a real-time "redacter" that renders a black box on top of a particular word wherever it appears (see the sketch after this list).
- The bounding boxes around text give you the information you need to approximate the size of the text as well as its location. Try modifying the `OcrGraphic` to make the text sizes match their real-life counterparts.
- Try using the other Vision APIs and find ways to combine these capabilities to enable entirely new uses.
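For the "redacter" idea above, here's a hypothetical sketch of what the core of `OcrGraphic.draw` might become; `REDACT_TARGET` and `blackPaint` are illustrative names we're introducing, not part of the codelab:

```java
// Hypothetical redacter sketch: inside OcrGraphic.draw, cover a chosen word
// with a filled black box instead of rendering it. REDACT_TARGET and blackPaint
// are illustrative names, not part of the codelab's code.
for (Text line : text.getComponents()) {
    for (Text word : line.getComponents()) {
        if (REDACT_TARGET.equalsIgnoreCase(word.getValue())) {
            RectF box = translateRect(new RectF(word.getBoundingBox()));
            canvas.drawRect(box, blackPaint); // blackPaint: a Paint with Style.FILL
        }
    }
}
```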
Learn More
- Learn about other Vision capabilities in the Mobile Vision developer documentation.
- Post questions and find answers on Stack Overflow under the android-vision tag.