Having full control over your files at any time is vital to ease your daily tasks and increase your efficiency. Achieve any goal with DocHub tools for document management and practical PDF editing. Access, modify, and save your files, and integrate your workflows with other secure cloud storage services.
DocHub provides lossless editing, the ability to use any formatting, and secure eSigning without the need to search for a third-party eSignature option. Get the maximum benefit of document management solutions in one place. Explore all DocHub features right now with your free account.
Hello everyone, I'm Chirag, and welcome to part 4 of the tutorial series on Amazon Textract. In this video I will take you through how to extract the text from a multi-page PDF file and how we can save the output as a CSV. The CSV will have two columns, page number and text, so we will store the text of each page as an individual row along with its page number. So let's get started.
We will go through this diagram first, as you can see on my screen, and I will quickly explain the overall flow. Here we have the PDF file. The user will upload a PDF file to the S3 bucket, within the async-doc-text directory (or folder), which will in turn trigger the Lambda function responsible for creating the Amazon Textract job. Here we are going to perform simple OCR, that is, detect document text, but since we are dealing with a multi-page PDF file, this is going to be an asynchronous invocation. So earlier what was happe