HW4 recovery help

asked Jan 6, 2021 in HW4 2nd chance by CameronR (220 points)
Some help on the recovery homework would be greatly appreciated.

I can read in and preprocess all the files, translate the Greek text to English, and then translate the English to Italian. The problem is that this only works once the punctuation has been stripped, so the result would not count as valid output.

If I instead translate the Greek text to English and then the Italian text to English, I run into the issue that lexicon-it-en maps phrases to phrases. The only way I can see to look up a match is to test whether each phrase in the dictionary occurs in the paragraph of Italian text that I am searching. I feel like this would take too much time, and it still doesn't let me keep the original punctuation of the Italian text.

I would like some ideas on how to make the translations and check for their presence in the texts while still being able to output them with punctuation. Also, do all of the tests check for perfect accuracy, or would it be possible to pass enough of them for a passing grade?

I want to say that this homework is too difficult but, of course, I could have missed something very obvious, which is why I am asking here. I am hoping a teacher could offer advice, as it doesn't seem many people have found the solution yet!

Thanks in advance

1 Answer

answered Jan 7, 2021 by Claudio.DiCiccio (2,770 points)

Dear CameronR,

If I understood you correctly, you are able to solve the exercise with your first strategy, but you lose the punctuation along the way. Is that right? If that is the problem, you may want to consider the following strategy, though we can't exchange details on the solution: how about preparing auxiliary data structures that, for every paragraph, map the indices of the words in the punctuation-free text to their indices in the original text? That way, you should be able to map a list of words in the punctuation-free text back to (so to speak) slices of the original one. Maybe this helps.
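To make the index-mapping idea concrete in general terms (this is not the assignment's solution, only a minimal sketch), here is one way it could look in Python. The function names, the sample sentence, and the choice of `\w+` as the definition of a "word" are all assumptions made for illustration; the real files may need a different tokenization, for instance for apostrophes.

```python
import re

def tokenize_with_spans(paragraph):
    """Return (words, spans): lowercased words plus their (start, end)
    character offsets in the original, punctuated paragraph."""
    words, spans = [], []
    for match in re.finditer(r"\w+", paragraph):  # assumed word definition
        words.append(match.group().lower())
        spans.append((match.start(), match.end()))
    return words, spans

def find_phrase(words, phrase_words):
    """Return the index of the first occurrence of phrase_words
    inside words, or -1 if the phrase does not occur."""
    n = len(phrase_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase_words:
            return i
    return -1

def original_slice(paragraph, spans, first, last):
    """Map word indices first..last (inclusive) back to the matching
    slice of the original text, punctuation included."""
    return paragraph[spans[first][0]:spans[last][1]]

# Hypothetical example paragraph, not taken from the homework data.
text = "Ciao, mondo! Come stai oggi?"
words, spans = tokenize_with_spans(text)
start = find_phrase(words, ["mondo", "come", "stai"])
result = original_slice(text, spans, start, start + 2)
print(result)  # mondo! Come stai
```

The search runs on the punctuation-free word list, while `spans` remembers where each word sits in the original, so any matched run of words can be turned back into a slice that still contains its punctuation.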

Best,
CDC

commented Jan 7, 2021 by CameronR (220 points)
Thank you! I will give that idea a go.