Should Everything be Transformed into Spaces?

asked Nov 12, 2021 in HW4 by Luigi Pizza (6,120 points)
recategorized Nov 19, 2021 by Luigi Pizza

I wrote some code to count the occurrences of every character across all the files we have been given, and this is the result:

('\n', 1596)
(' ', 9117)
('!', 108)
('"', 155)
("'", 438)
('(', 12)
(')', 12)
(',', 1157)
('-', 29)
('.', 353)
(':', 101)
(';', 90)
('?', 42)
('[', 3)
(']', 3)
('`', 20)
('—', 1)
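
A count like the one above can be produced with `collections.Counter`. This is only a minimal sketch, not the code actually used for the listing: the function name `count_non_alpha` and the in-memory sample strings are hypothetical stand-ins for the assignment's text files, and filtering out letters is an assumption based on the fact that the listing shows only non-alphabetic characters.

```python
from collections import Counter


def count_non_alpha(texts):
    """Count every non-alphabetic character across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # str.isalpha() is True for letters, so only the other characters are counted.
        counts.update(ch for ch in text if not ch.isalpha())
    return counts


# Hypothetical sample data standing in for the contents of the text files.
samples = ["Hello, world!\n", "It's a test-case.\n"]
for item in sorted(count_non_alpha(samples).items()):
    print(item)
```

Printing the sorted `(character, count)` pairs gives the same tuple format as the listing.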

Should every character above be replaced with a space, including, for example, the quotes and the grave accent ('`')?

Note: there are no digits in any of the text files we have been given.

2 Answers

answered Nov 12, 2021 by gianluca5539 (9,820 points)
Some characters need to be replaced with a space, while others should simply be removed. Which is which depends on how you separate the words afterwards. You could also just replace everything with a space; as I said, it depends on how you then split the words. Do what you think is best, but make sure the rest of your code is adjusted accordingly, so that you don't end up with, for example, three spaces in a row that you then feed to the pronouncing functions, producing three zeros in the computed sequence of accents.
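
The problem with consecutive spaces can be illustrated in Python. This is a sketch that assumes the words are separated with `str.split`; the sample line is made up. Splitting on an explicit `" "` produces an empty string for each extra space, whereas `split()` with no argument treats any run of whitespace as a single separator.

```python
line = "wait   for it"  # three spaces in a row, e.g. left behind by replaced punctuation

# An explicit separator keeps the empty strings between adjacent spaces.
with_empty = line.split(" ")

# With no argument, runs of whitespace collapse and no empty strings are produced.
without_empty = line.split()

print(with_empty)     # ['wait', '', '', 'for', 'it']
print(without_empty)  # ['wait', 'for', 'it']
```

If the empty strings from the first variant were passed on as words, each one could end up contributing a spurious entry to the result.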
commented Nov 12, 2021 by Luigi Pizza (6,120 points)
I understand what you are saying, but because the matrix we output is checked (as can be seen in lines 44 and 45 of test_01), we might end up with different results, which would cause an error. I'm just wondering if there is a general rule for this.
commented Nov 12, 2021 by gianluca5539 (9,820 points)
If the rest of your code is consistent with how you handle these special characters, the output should be the same. Remember: a word is a sequence of alphabetic characters of any length, as stated in the assignment.
answered Nov 19, 2021 by laertleba (2,840 points)
Some characters can simply be removed, for example '?' or '!'; others need to be turned into a space (for example '-' and "'"). To simplify the code, you can just replace every non-alphabetic character with a space. It won't have negative consequences in any test. This follows from the definition of a word given in the assignment: every character that is not alphabetic can act as a word separator.
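
The "replace everything with a space" approach could look like the sketch below. The function name `split_words` and the sample sentence are hypothetical. Note that `str.isalpha()` is Unicode-aware, so it also treats accented letters as alphabetic; whether that matches the assignment's definition is an assumption.

```python
def split_words(line):
    """Replace every non-alphabetic character with a space, then split into words."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in line)
    # split() with no argument ignores runs of spaces, so no empty words appear.
    return cleaned.split()


print(split_words("Don't stop -- it's well-known!"))
# ['Don', 't', 'stop', 'it', 's', 'well', 'known']
```

With this rule the apostrophe and the hyphen act as separators, so "Don't" becomes two words rather than "Dont".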