Comments (17)
Hi @SeaDude, not sure if you still need a solution, but I found one using the Computer Vision SDK. We use the Batch Read File API.
This sample takes in a list of images with text, uploads them, crops them around the last line of text (the location is adjustable) and then runs the API on the list of cropped images. I'll include the images here if you want to download and use them, but feel free to use your own as well. The images are identical for the sake of testing (and to show a crop in the exact same place).
Add the images (named differently) to an Image folder you create in your working directory.
Here is the sample:
https://github.com/Azure-Samples/cognitive-services-quickstart-code/blob/master/python/ComputerVision/ExtractText.py
from cognitive-services-rest-api-samples.
Wow @wiazur, thank you for this information! This looks like exactly what I needed. I'm going to dive in and test today.
Take care
Any takers?
Anyone?
Hi @SeaDude , thanks for the query. Sorry this repo was not being maintained too often. But if you still have your question, I can help you. Are you trying to avoid the time it would take to scan the entire image? Or share more details... thanks.
Hi @wiazur, all good, thanks for reaching out.
The use case involves thousands of images that have LOTS of handdrawn text all over the image (very old engineering drawings). I just want to focus on a particular area (the title block) to grab the metadata about the print (who drew it, date of drawing, date of revision, etc.).
When I run the whole image through the API, the results are too much. Ideally, I'd like to say "read only from X pixel to X1 pixel and Y pixel to Y1 pixel" (like a bounding box).
Do you think this is possible with the API or would I need to do some preprocessing and only send the titleblock image?
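One way to approximate "read only from X to X1 and Y to Y1" without cropping is to run OCR on the whole image and then keep only the lines whose bounding box falls inside the region. The sketch below assumes each recognized line comes back with a flat list of eight numbers (x1, y1, x2, y2, x3, y3, x4, y4) as its bounding box; the function names and the `(text, bounding_box)` pair shape are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

Region = Tuple[float, float, float, float]  # left, top, right, bottom (pixels)


def box_inside_region(bounding_box: Sequence[float], region: Region) -> bool:
    """Return True if every corner of an 8-number bounding box lies inside region."""
    left, top, right, bottom = region
    xs = bounding_box[0::2]  # x1, x2, x3, x4
    ys = bounding_box[1::2]  # y1, y2, y3, y4
    return (left <= min(xs) and max(xs) <= right
            and top <= min(ys) and max(ys) <= bottom)


def lines_in_region(lines: Iterable[Tuple[str, Sequence[float]]],
                    region: Region) -> List[str]:
    """Keep the text of (text, bounding_box) pairs that fall inside region."""
    return [text for text, box in lines if box_inside_region(box, region)]
```

Note that this only trims the results: the full image is still sent and processed, so it does not reduce the time or cost of the call.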
Hi @SeaDude ,
This feature isn't built into the Computer Vision API, but you should be able to use a third-party library to programmatically crop each image before performing OCR.
@SeaDude which API call were you using for OCR, Recognize Text or Batch Read File? And is the title block always at the top of the image or could it potentially be anywhere?
Hi @wiazur, using Recognize Text. Title block is always in the bottom, right corner of the image.
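Because the title block always sits in the bottom-right corner, the crop box can be derived from each image's size instead of being hard-coded. A minimal sketch, assuming the block has a known pixel width and height; `title_block_box` and its parameters are hypothetical names, not part of any SDK.

```python
from typing import Tuple


def title_block_box(image_width: int, image_height: int,
                    block_width: int, block_height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for a block anchored at the bottom-right corner."""
    # Clamp to zero so the box never extends past the image edges.
    left = max(0, image_width - block_width)
    top = max(0, image_height - block_height)
    return (left, top, image_width, image_height)
```

The tuple uses the same left, top, right, bottom order that Pillow's `Image.crop` expects, so with Pillow it could be used as `image.crop(title_block_box(*image.size, 1200, 600))`, where 1200 and 600 are placeholder dimensions.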
Hi @wiazur, thanks for continuing the chat. I'm using Recognize Text. The title block is always in the bottom-right corner of the image, within a predictable bounding box size.
Hi @PatrickFarley, the challenge is that the entire image needs to be associated with the metadata pulled from the title block. I'm trying to get away from creating a complex pipeline of crop image-->store cropped image-->associate cropped image with original-->send cropped image to API-->Get metadata--> Associate metadata with original image--> etc.
Challenging!
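The crop-store-associate chain can be shortened if the cropped image never leaves memory and the results are keyed by the original file path. The sketch below shows only that shape: `crop_to_jpeg_bytes` and `read_text` are hypothetical callables standing in for the cropping step and the OCR call, neither of which is defined here.

```python
from io import BytesIO
from typing import Callable, Dict, Iterable, List


def extract_title_block_text(
    image_paths: Iterable[str],
    crop_to_jpeg_bytes: Callable[[str], bytes],  # hypothetical: opens and crops one image
    read_text: Callable[[BytesIO], List[str]],   # hypothetical: wraps the OCR call
) -> Dict[str, List[str]]:
    """Map each original image path to the text read from its cropped title block."""
    results: Dict[str, List[str]] = {}
    for path in image_paths:
        # The crop exists only as an in-memory stream; nothing is written to disk.
        with BytesIO(crop_to_jpeg_bytes(path)) as stream:
            results[path] = read_text(stream)
    return results
```

Keying the dictionary by the original path keeps the metadata tied to the full drawing, so no separate lookup between cropped and original files is needed.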
Hi @SeaDude, what language are you using and would you consider using an SDK or does it need to be a REST call?
Believe it or not, I was using Power Automate (the artist formerly known as "Flow"), another Microsoft product to make things happen. Being a citizen dev, I'm not formally trained in any particular language...unless "hardcore low-code" is considered a lingo :)
I'm up for scripting a solution in python if need be.
Hm. The OCR API docs are quite different from, say, the Azure Maps API docs, where all available query parameters are shown along with which ones are required, etc.
Does anyone know where the OCR API Batch Read query parameters are listed?
Thanks again for the continued engagement!
Hi @wiazur ,
I'm working through the example you provided (thanks again!).
I'm no Python dev, but so far I have been able to get:
- A Jupyter notebook spun up and your code sample pasted in
- The `computervision` `pip` package installed and imported
- A Computer Vision instance in Azure spun up and my `key` and `endpoint` authenticated in your sample
Stuck on what to do for the `images_list`, `cropped_images_path`, `working_directory` and `cropped_images_list` in the script...
Will you guide me here?
'''
Load and crop images
'''
images_list = [] <-----
cropped_images_path = [] <-----
working_directory = os.path.dirname(__file__) <-----
# Create an Image object from each image in a directory
for filename in glob.glob('Images\*.jpg'): # assuming all images are jpg
    imageObject = Image.open(filename)
    images_list.append(imageObject)
    path = os.path.join(working_directory, filename.replace('Images\\', 'CroppedImages\\'))
    cropped_images_path.append(path)
# Optional, draw bounding box around desired line of text, show image
# original_image = Image.open('Images\coffee1.jpg').convert("RGBA")
# draw = ImageDraw.Draw(original_image)
# draw.rectangle(((110, 540), (425, 630)), outline="red")
# original_image.show()
# Crop each image in your list at the same place
cropped_images_list = [] <-----
for image in images_list:
    # Don't exceed your image height and width
    # w, h = image.size
    # print('Image width & height:', w, h)
    cropped = image.crop((110,540,425,630)) # edges: left, top, right, bottom
    cropped_images_list.append(cropped)
    # Optional, to display cropped image
    cropped.show()
# Save the cropped images
for i in range(len(cropped_images_list)):
    # Convert cropped images back to PIL.JpegImagePlugin.JpegImageFile type
    b = BytesIO()
    cropped_images_list[i].save(b, format="jpeg")
b.close()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-19-45102c6bc387> in <module>()
65 cropped_images_list[i].save(b, format="jpeg")
66
---> 67 b.close()
68
69 '''
NameError: name 'b' is not defined
Hi @SeaDude, yes the API site is not as informative as some others. I think this is because the Batch Read File is still fairly new. I would expect it to be added to that document you sent the link for, at some point. We are using the SDK though, so much of the variance is through the `batch_read_file_in_stream` parameters, which are explained a bit more in the SDK reference and the SDK source code; see the `batch_read_file_in_stream` function.
If I understand you correctly, you are curious to know what to use for `images_list`, `cropped_images_path`, `working_directory` and `cropped_images_list`. The images list would be the images you have, your engineering images, the full image (you can always change the image type to PNG if needed in the code, it's currently set to use JPGs). The cropped images are created by the script, so you don't need to provide anything, just create a folder for them if you haven't already. If you've created and placed the folders named Images and CroppedImages in your working directory, the code is all set to read from and write images into these folders, so you would not need to change any of the path variables. You just need to add your own images to the Images folder you created.
The working directory is the name of the directory your python script is in and where you'll put your 2 images folders.
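The folder layout described above can be sketched in a way that does not depend on Windows-style backslashes. This is an assumption about one possible cause of trouble rather than something confirmed in this thread: on Linux or macOS a pattern such as `'Images\*.jpg'` generally matches nothing, and in a notebook `__file__` is usually undefined, so the current directory is used instead. `list_image_pairs` is a hypothetical helper name.

```python
import glob
import os
from typing import List, Tuple


def list_image_pairs(working_directory: str,
                     source_folder: str = "Images",
                     cropped_folder: str = "CroppedImages") -> List[Tuple[str, str]]:
    """Return (source_path, cropped_path) pairs for every .jpg in source_folder."""
    # os.path.join picks the right separator for the current operating system.
    pattern = os.path.join(working_directory, source_folder, "*.jpg")
    pairs = []
    for source_path in sorted(glob.glob(pattern)):
        cropped_path = os.path.join(working_directory, cropped_folder,
                                    os.path.basename(source_path))
        pairs.append((source_path, cropped_path))
    return pairs


# In a notebook, the current directory stands in for the script's directory.
working_directory = os.getcwd()
```

Calling `list_image_pairs(working_directory)` and checking that the result is not empty is a quick way to confirm the Images folder is where the script expects it.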
Concerning the stacktrace error, my guess is the IDE you are using is not seeing the `b` because it considers it out of scope. So if you move the `b = BytesIO()` line back into the for loop, this will fix it. Like this:
# Save the cropped images
for i in range(len(cropped_images_list)):
    # Convert cropped images back to PIL.JpegImagePlugin.JpegImageFile type
    b = BytesIO()
    cropped_images_list[i].save(b, format="jpeg")
    b.close()
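A `NameError` on `b` can also show up when the list being looped over is empty, because a name assigned only inside a loop body is never created if the loop runs zero times. That is a possible explanation, not one confirmed in this thread. The sketch below keeps the buffer entirely inside each iteration and fails early on an empty list; `images_to_jpeg_bytes` is a hypothetical helper that only assumes each item has a Pillow-style `save(fp, format=...)` method.

```python
from io import BytesIO
from typing import Any, List, Sequence


def images_to_jpeg_bytes(images: Sequence[Any]) -> List[bytes]:
    """Encode each image to JPEG bytes using its save(fp, format=...) method."""
    if not images:
        # With an empty list the loop body never runs, so fail with a clear message.
        raise ValueError("No images to encode; check that the Images folder was found.")
    encoded = []
    for image in images:
        # The buffer is created, used and closed within a single iteration.
        with BytesIO() as buffer:
            image.save(buffer, format="jpeg")
            encoded.append(buffer.getvalue())
    return encoded
```

Using `with BytesIO() as buffer` closes the buffer automatically, so there is no separate `close()` call to misplace.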
Btw, is there a better language you would like to use? This SDK is available in C#, Java, and Node too. Since we create all our samples in these languages, I could create specific Batch Read File samples for them too.
Let me know if you need more information, thanks!
Awesome info @wiazur, you're the best!
- I'll add Images and CroppedImages folders to my working directory
- I'll populate Images with some sample images (they're actually multipage PDF's, which will be interesting, but I'll figure it out)
- I'll move the `b = BytesIO()` into the loop and see if that helps. I'm using Azure Notebooks so still getting my directory structure down
RE: "... is there a better language you would like to use?":
- I'm a PowerPlatform (low-ish code) guy (PowerApps, PowerAutomate, PowerBI) so I'm used to creating a Custom Connector (API wrapper) and hitting API's then displaying responses in PowerApps/parsing responses in PowerAutomate.
- I'm also very proficient with PostMan for testing API's and chaining responses-requests together to perform quasi-app workflows before implementing.
- I've dabbled with Jupyter Notebooks and the associated libraries ([requests](https://2.python-requests.org/en/master/)) for making API calls and manipulating responses in data vis tools like Altair, but I'm rudimentary at best in Python.
Will report back on progress~! Thanks again!
Glad to help!
There is also a way to add a PDF stream, if that helps. See code below.
'''
Read and extract from the image
'''
def pdf_text():
    # Images PDF with text
    filepath = open('TextImages.pdf','rb')
    # Async SDK call that "reads" the image
    response = client.batch_read_file_in_stream(filepath, raw=True)
    # Don't forget to close the file
    filepath.close()
    # Get ID from returned headers
    operation_location = response.headers["Operation-Location"]
    operation_id = operation_location.split("/")[-1]
    # SDK call that gets what is read
    while True:
        result = client.get_read_operation_result(operation_id)
        if result.status not in ['NotStarted', 'Running']:
            break
        time.sleep(1)
    return result
'''
Display extracted text and bounding box
'''
# Displays text captured and its bounding box (position in the image)
result = pdf_text()
if result.status == TextOperationStatusCodes.succeeded:
    for textResult in result.recognition_results:
        for line in textResult.lines:
            print(line.text)
            #print(line.bounding_box)
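The polling loop above waits indefinitely if the operation never leaves the `NotStarted` or `Running` state. A minimal sketch of the same loop with a timeout follows; `wait_for_result` and its parameters are hypothetical, and the only thing it assumes about the result object is a `status` attribute that compares equal to the status strings used above.

```python
import time
from typing import Any, Callable


def wait_for_result(fetch_result: Callable[[], Any],
                    timeout_seconds: float = 60.0,
                    poll_interval: float = 1.0) -> Any:
    """Call fetch_result until its status leaves NotStarted/Running or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch_result()
        if result.status not in ("NotStarted", "Running"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("Read operation did not finish in time.")
        time.sleep(poll_interval)
```

Inside `pdf_text()` this could replace the `while True` loop with something like `wait_for_result(lambda: client.get_read_operation_result(operation_id))`.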