Comments (17)

wiazur avatar wiazur commented on July 19, 2024 1

Hi @SeaDude, not sure if you still need a solution, but I found one using the Computer Vision SDK. We use the Batch Read File API.

This sample takes in a list of images with text, uploads them, crops them around the last line of text (the location is adjustable), and then runs the API on the list of cropped images. I'll include the images here in case you want to download and use them, but feel free to use your own as well. The images are identical for the sake of testing (and to show a crop in the exact same place).

Add the images (named differently) to an Image folder you create in your working directory.

Here is the sample:
https://github.com/Azure-Samples/cognitive-services-quickstart-code/blob/master/python/ComputerVision/ExtractText.py

[Images: coffee1, coffee2]

from cognitive-services-rest-api-samples.

SeaDude avatar SeaDude commented on July 19, 2024 1

Wow @wiazur, thank you for this information! This looks like exactly what I needed. I'm going to dive in and test today.
Take care

SeaDude avatar SeaDude commented on July 19, 2024

Any takers?

SeaDude avatar SeaDude commented on July 19, 2024

Anyone?

wiazur avatar wiazur commented on July 19, 2024

Hi @SeaDude, thanks for the query. Sorry, this repo has not been maintained very actively. But if you still have your question, I can help you. Are you trying to avoid the time it would take to scan the entire image? Or share more details... thanks.

SeaDude avatar SeaDude commented on July 19, 2024

Hi @wiazur, all good, thanks for reaching out.

The use case involves thousands of images that have LOTS of hand-drawn text all over the image (very old engineering drawings). I just want to focus on a particular area (the title block) to grab the metadata about the print (who drew it, date of drawing, date of revision, etc.).

When I run the whole image through the API, the results are too much. Ideally, I'd like to say "read only from X pixel to X1 pixel and Y pixel to Y1 pixel" (like a bounding box).

Do you think this is possible with the API or would I need to do some preprocessing and only send the titleblock image?

PatrickFarley avatar PatrickFarley commented on July 19, 2024

Hi @SeaDude ,
This feature isn't built into the Computer Vision API, but you should be able to use a third-party library to programmatically crop each image before performing OCR.

wiazur avatar wiazur commented on July 19, 2024

@SeaDude which API call were you using for OCR, Recognize Text or Batch Read File? And is the title block always at the top of the image or could it potentially be anywhere?

SeaDude avatar SeaDude commented on July 19, 2024

Hi @wiazur, using Recognize Text. Title block is always in the bottom, right corner of the image.

SeaDude avatar SeaDude commented on July 19, 2024

Hi @wiazur, thanks for continuing the chat. I'm using Recognize Text. The title block is always in the bottom-right corner of the image, within a predictable bounding box size.
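
Since the title block sits in the bottom-right corner with a predictable size, the crop box can be computed from each image's own dimensions rather than hard-coded. The following is only a sketch: `title_block_box` is a hypothetical helper, and the 1200x600 block size is a made-up example value.

```python
def title_block_box(image_width, image_height, block_width, block_height):
    """Return (left, top, right, bottom) for a block anchored at the bottom-right corner."""
    # Clamp so the box never extends past the top or left edge of a small image.
    left = max(0, image_width - block_width)
    top = max(0, image_height - block_height)
    return (left, top, image_width, image_height)


# Example: a 4000x3000 scan with a title block of about 1200x600 pixels.
box = title_block_box(4000, 3000, 1200, 600)
print(box)  # (2800, 2400, 4000, 3000)
```

The tuple uses the same left, top, right, bottom order that Pillow's `Image.crop` expects, so it could be passed along as `image.crop(title_block_box(*image.size, 1200, 600))`.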

Hi @PatrickFarley, the challenge is that the entire image needs to be associated with the metadata pulled from the title block. I'm trying to get away from creating a complex pipeline of crop image-->store cropped image-->associate cropped image with original-->send cropped image to API-->Get metadata--> Associate metadata with original image--> etc.
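
One way the pipeline above might be shortened is to crop in memory and key the extracted text by the original file path, so that no cropped file is stored or re-associated later. This is a sketch under assumptions: `crop_to_stream` and `collect_metadata` are hypothetical helpers, `read_text` stands in for whatever OCR call is used, and it is assumed that call accepts a file-like stream.

```python
from io import BytesIO


def crop_to_stream(path, box):
    """Crop one image and return the result as an in-memory JPEG stream."""
    from PIL import Image  # Pillow; imported here so the rest runs without it

    with Image.open(path) as image:
        buffer = BytesIO()
        # JPEG has no alpha channel, so convert to RGB before saving.
        image.crop(box).convert("RGB").save(buffer, format="JPEG")
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer


def collect_metadata(paths, box, read_text, crop=crop_to_stream):
    """Return {original_path: text} without writing any cropped file to disk."""
    return {path: read_text(crop(path, box)) for path in paths}
```

The returned dictionary ties each result directly to the original image, which replaces the "store cropped image" and "associate cropped image with original" steps.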

Challenging!

wiazur avatar wiazur commented on July 19, 2024

Hi @SeaDude, what language are you using and would you consider using an SDK or does it need to be a REST call?

SeaDude avatar SeaDude commented on July 19, 2024

Believe it or not, I was using Power Automate (the artist formerly known as "Flow"), another Microsoft product to make things happen. Being a citizen dev, I'm not formally trained in any particular language...unless "hardcore low-code" is considered a lingo :)

I'm up for scripting a solution in python if need be.

SeaDude avatar SeaDude commented on July 19, 2024

Hm. The OCR API docs are quite different from, say, the Azure Maps API docs, which show all available query parameters, which ones are required, etc.
Does anyone know where the OCR API Batch Read query parameters are listed?
Thanks again for the continued engagement!

SeaDude avatar SeaDude commented on July 19, 2024

Hi @wiazur ,
I'm working through the example you provided (thanks again!).
I'm no python dev, but so far I have been able to get:

  • A Jupyter notebook spun up and your code sample pasted in
  • The computervision pip package installed and imported
  • A Computer Vision instance in Azure spun up and my key and endpoint authenticated in your sample

Stuck on what to do for the images_list, cropped_images_path, working_directory and cropped_images_list in the script...

Will you guide me here?

'''
Load and crop images
'''
images_list = []  <-----
cropped_images_path = []  <-----
working_directory = os.path.dirname(__file__)  <-----

# Create an Image object from each image in a directory

for filename in glob.glob('Images\*.jpg'):  # assuming all images are jpg
    imageObject = Image.open(filename)
    images_list.append(imageObject)
    path = os.path.join(working_directory, filename.replace('Images\\', 'CroppedImages\\'))
    cropped_images_path.append(path)

# Optional, draw bounding box around desired line of text, show image
# original_image = Image.open('Images\coffee1.jpg').convert("RGBA")
# draw = ImageDraw.Draw(original_image)
# draw.rectangle(((110, 540), (425, 630)), outline="red")
# original_image.show()
 
# Crop each image in your list at the same place
cropped_images_list = []  <-----
for image in images_list:
    # Don't exceed your image height and width
    # w, h = image.size
    # print('Image width & height:', w, h)
    cropped = image.crop((110,540,425,630)) # edges: left, top, right, bottom
    cropped_images_list.append(cropped)

    # Optional, to display cropped image
    cropped.show()

# Save the cropped images
for i in range(len(cropped_images_list)):
    # Convert cropped images back to PIL.JpegImagePlugin.JpegImageFile type
    b = BytesIO()
    cropped_images_list[i].save(b, format="jpeg")
b.close()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-19-45102c6bc387> in <module>()
     65     cropped_images_list[i].save(b, format="jpeg")
     66 
---> 67 b.close()
     68 
     69 '''

NameError: name 'b' is not defined

wiazur avatar wiazur commented on July 19, 2024

Hi @SeaDude, yes the API site is not as informative as some others. I think this is because the Batch Read File is still fairly new. I would expect it to be added to that document you sent the link for, at some point. We are using the SDK though, so much of the variance is through the batch_read_file_in_stream parameters, which are explained a bit more in the SDK reference and the SDK source code--see the batch_read_file_in_stream function.

If I understand you correctly, you want to know what to use for images_list, cropped_images_path, working_directory and cropped_images_list. The images list holds the images you have, i.e. your full engineering drawings (the code is currently set to use JPGs, but you can change the image type to PNG in the code if needed). The cropped images are created by the script, so you don't need to provide anything; just create a folder for them if you haven't already. If you've created folders named Images and CroppedImages in your working directory, the code is all set to read from and write to these folders, so you would not need to change any of the path variables. You just need to add your own images to the Images folder you created.

The working directory is the directory your Python script is in and where you'll put your two image folders.
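
As a rough illustration of that folder layout, the sketch below creates both folders if they are missing and pairs each input image with its output path. `prepare_folders` is a hypothetical helper, not part of the sample, and it builds paths with `os.path.join` instead of hard-coded backslashes so it should also work outside Windows.

```python
import glob
import os


def prepare_folders(working_directory):
    """Create Images/ and CroppedImages/ if missing; return (inputs, outputs)."""
    images_dir = os.path.join(working_directory, "Images")
    cropped_dir = os.path.join(working_directory, "CroppedImages")
    os.makedirs(images_dir, exist_ok=True)
    os.makedirs(cropped_dir, exist_ok=True)

    # Assumes all images are JPGs, as in the sample.
    inputs = sorted(glob.glob(os.path.join(images_dir, "*.jpg")))
    outputs = [os.path.join(cropped_dir, os.path.basename(p)) for p in inputs]
    return inputs, outputs
```

In a notebook, where `__file__` may not be defined, calling `prepare_folders(os.getcwd())` is one option. An empty `inputs` list is a quick sign that the images are not where the script is looking.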

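Concerning the stack trace error, my guess is that b was never created before b.close() ran. In your paste, b.close() sits outside the for loop, so if the loop body never executes (for example, because cropped_images_list is empty), b is not defined at that point. So if you move the b.close() line back into the for loop, this will fix it. Like this: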

# Save the cropped images
for i in range(len(cropped_images_list)):
    # Convert cropped images back to PIL.JpegImagePlugin.JpegImageFile type
    b = BytesIO()
    cropped_images_list[i].save(b, format="jpeg")
    b.close()
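
For what it's worth, the NameError can be reproduced in isolation whenever the list being looped over is empty, which may be what happened if no images were found. A minimal sketch, with the empty list as an assumption about the failing run:

```python
from io import BytesIO

cropped_images_list = []  # empty, e.g. because no files matched the glob pattern

for image in cropped_images_list:
    b = BytesIO()  # never runs when the list is empty

failed = False
try:
    b.close()  # outside the loop, so 'b' was never assigned
except NameError:
    failed = True

print("b.close() failed:", failed)
```

With `b.close()` indented inside the loop, it only runs for iterations where `b` has just been created, so an empty list simply does nothing.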

Btw, is there a better language you would like to use? This SDK is available in C#, Java, and Node too. Since we create all our samples in these languages, I could create specific Batch Read File samples for them too.

Let me know if you need more information, thanks!

SeaDude avatar SeaDude commented on July 19, 2024

Awesome info @wiazur, you're the best!

  1. I'll add Images and CroppedImages folders to my working directory
  2. I'll populate Images with some sample images (they're actually multipage PDFs, which will be interesting, but I'll figure it out)
  3. I'll move the b.close() call into the loop and see if that helps. I'm using Azure Notebooks so still getting my directory structure down

RE: ... is there a better language you would like to use? :

  • I'm a PowerPlatform (low-ish code) guy (PowerApps, PowerAutomate, PowerBI) so I'm used to creating a Custom Connector (API wrapper) and hitting APIs, then displaying responses in PowerApps/parsing responses in PowerAutomate.
  • I'm also very proficient with Postman for testing APIs and chaining responses-requests together to perform quasi-app workflows before implementing.
  • I've dabbled with Jupyter Notebooks and the associated libraries ([requests](https://2.python-requests.org/en/master/)) for making API calls and manipulating responses in data vis tools like Altair, but I'm rudimentary at best in Python.

Will report back on progress~! Thanks again!

wiazur avatar wiazur commented on July 19, 2024

Glad to help!

There is also a way to add a PDF stream, if that helps. See code below.

'''
Read and extract from the image
'''
def pdf_text():
    # PDF of images with text
    with open('TextImages.pdf', 'rb') as pdf_file:
        # Async SDK call that "reads" the PDF; the file is closed when the block exits
        response = client.batch_read_file_in_stream(pdf_file, raw=True)

    # Get ID from returned headers
    operation_location = response.headers["Operation-Location"]
    operation_id = operation_location.split("/")[-1]

    # SDK call that gets what is read
    while True:
        result = client.get_read_operation_result(operation_id)
        if result.status not in ['NotStarted', 'Running']:
            break
        time.sleep(1)
    return result

'''
Display extracted text and bounding box
'''
# Displays text captured and its bounding box (position in the image)
result = pdf_text()
if result.status == TextOperationStatusCodes.succeeded:
    for textResult in result.recognition_results:
        for line in textResult.lines:
            print(line.text)
            #print(line.bounding_box)
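
Because each line comes back with a bounding box, another option is to read the whole page and keep only the lines that fall inside a region of interest. The sketch below assumes the bounding box is a flat list of eight numbers (x1, y1, x2, y2, x3, y3, x4, y4); `lines_in_region` is a hypothetical helper and the sample values are made up for illustration.

```python
def lines_in_region(lines, region):
    """Keep the text of (text, bounding_box) pairs lying entirely inside region.

    bounding_box is assumed to be eight numbers: x1, y1, x2, y2, x3, y3, x4, y4.
    region is (left, top, right, bottom) in the same units as the boxes.
    """
    left, top, right, bottom = region
    kept = []
    for text, box in lines:
        xs, ys = box[0::2], box[1::2]  # split corner points into x and y values
        inside_x = min(xs) >= left and max(xs) <= right
        inside_y = min(ys) >= top and max(ys) <= bottom
        if inside_x and inside_y:
            kept.append(text)
    return kept


# Made-up sample: one line inside a bottom-right title block, one outside it.
sample = [
    ("DRAWN BY: J. SMITH", [2850, 2500, 3400, 2500, 3400, 2550, 2850, 2550]),
    ("SECTION A-A", [300, 200, 700, 200, 700, 260, 300, 260]),
]
print(lines_in_region(sample, (2800, 2400, 4000, 3000)))  # ['DRAWN BY: J. SMITH']
```

With the result above, the input could be built as `[(line.text, line.bounding_box) for line in textResult.lines]`. Note that for PDFs the coordinates may not be in pixels, so the region would need to use whatever unit the service reports.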
