
Re: Make the changelog parser more lenient, please!

Posted: Thu Apr 11, 2019 8:05 am
by Pi-C
badtouchatr wrote:
Thu Apr 11, 2019 7:32 am
Pi-C: I agree 99.9999993% (sorry, rounding error ;) )

I'm going to reserve judgment until I make the next update, but I'm leaning towards one of these possibilities:

- something doesn't work right for the very first time you upload a new mod, or
- the parser is just acting moody.
:-)

Re: Make the changelog parser more lenient, please!

Posted: Thu Apr 11, 2019 8:06 am
by Pi-C
Bilka wrote:
Thu Apr 11, 2019 7:34 am
As a sidenote, some of the errors have already been improved a bit. I changed the "no colon after category error" to instead say "category line does not end in colon", because that is what it is actually checking - a space after the colon will throw this error. The "duplicate date" error now says "duplicate date or version" and the basically empty "error on line x" now says what it expects the line to start with.
Thanks, that will help a lot!

Re: Make the changelog parser more lenient, please!

Posted: Thu Apr 11, 2019 5:51 pm
by orzelek
I looked at the RSO changelog that is maintained on the mod portal description page and then at the changelog tutorial... and I'm sorry, that's not happening.
The game has stricter requirements for a changelog than any of the programming languages I'm using. It's actually easier to code a mod than a changelog.

While I see the benefit of in-game changelogs and I might start a new one from some version, converting the whole thing seems like a lot of work, with error messages about as helpful as those of a C++ compiler when using nested templates and a lot of std types.

Anyone planning to write some kind of changelog preparation tool where you could enter versions and text and it would format the whole thing?

Re: Make the changelog parser more lenient, please!

Posted: Thu Apr 11, 2019 5:59 pm
by badtouchatr
orzelek wrote:
Thu Apr 11, 2019 5:51 pm
Anyone planning to write some kind of changelog preparation tool where you could enter versions and text and it would format the whole thing?
That actually sounds like an awesome idea, and I would even be willing to do that, or help with that. With the stipulation, of course, that we get complete documentation from the devs on the changelog parser rules, which I believe Bilka is working on. :)
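
A minimal sketch of what such a preparation tool could look like, in Python. It only applies the rules reported in this thread (a divider of exactly 99 dashes, a "Version: " line, category lines indented by two spaces and ending in a colon, entries starting with "    - " and continuation lines with six spaces). Those rules are taken from posts here, not from official documentation, and the function name `format_changelog` and its input shape are made up for illustration.

```python
DIVIDER = "-" * 99  # the thread reports that the parser wants exactly 99 dashes


def format_changelog(releases):
    """Render releases as changelog text.

    `releases` is a list of (version, date_or_None, {category: [entries]})
    tuples. This input shape is an assumption made for this sketch.
    """
    lines = []
    for version, date, categories in releases:
        lines.append(DIVIDER)
        lines.append(f"Version: {version}")
        if date:
            lines.append(f"Date: {date}")
        for category, entries in categories.items():
            # Two spaces of indentation, colon as the very last character.
            lines.append(f"  {category.strip()}:")
            for entry in entries:
                parts = entry.strip().splitlines()
                if not parts:
                    continue  # skip empty entries instead of emitting a bare dash
                lines.append(f"    - {parts[0].strip()}")
                # Further lines of the same entry are indented by six spaces.
                lines.extend(f"      {more.strip()}" for more in parts[1:])
    return "\n".join(lines) + "\n"


text = format_changelog([("0.0.1", None, {"Info": ["ported to 0.17"]})])
print(text, end="")
```

Generating the file from structured input means nobody has to count dashes or spaces by hand; whether the game accepts the result still depends on the real parser rules.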

Re: Make the changelog parser more lenient, please!

Posted: Fri Apr 12, 2019 5:32 pm
by JAetherwing
Rseding91 wrote:
Wed Apr 10, 2019 10:20 pm
So, I read all of the posts so far here. I agree the error(s) should be more explicit about what is incorrect. I don't agree that any of them should be made more lenient.

Factorio doesn't do "it's close, so I'll fix it" logic - the thing is either correct or incorrect - and in the case of changelog files they're incorrect if they don't match the exact formatting required by the changelog system.

We specifically don't allow any variance because there's no reason for it. If the thing is wrong... then it's wrong.
I agree with this regarding ambiguous syntax or changelogs that do something different than the parser expects.

But please, enlighten me on why on earth the parser requires exactly 99 dashes for a divider and fails for a divider made from, say, 80 dashes. Especially since this is not documented anywhere.

I'm totally on your side that the changelogs should be in a standardized syntax, but those arbitrary restrictions are just ludicrous.

Re: Make the changelog parser more lenient, please!

Posted: Fri Apr 12, 2019 7:16 pm
by Rseding91
JAetherwing wrote:
Fri Apr 12, 2019 5:32 pm
I agree with this regarding ambiguous syntax or changelogs that do something different than the parser expects.

But please, enlighten me on why on earth the parser requires exactly 99 dashes for a divider and fails for a divider made from, say, 80 dashes. Especially since this is not documented anywhere.

I'm totally on your side that the changelogs should be in a standardized syntax, but those arbitrary restrictions are just ludicrous.
Look at the changelog section here: https://www.boost.org/users/history/version_1_70_0.html for the "Beast" section. THAT is why we force one and exactly one format on the entire changelog.

Re: Make the changelog parser more lenient, please!

Posted: Fri Apr 12, 2019 8:43 pm
by BlueTemplar
bobingabout wrote:
Tue Apr 09, 2019 3:42 pm
I think the fact that it specifically has to be UTF-8 without BOM, or it FAILS is pretty brutal.
If your text editor doesn't by default save text as UTF-8 (without BOM), then it's your text editor that is to blame.
Even Microsoft ended up making it the default in Windows Shell & Notepad
(though internally it still uses an obsolete version of Unicode).
(The update for Windows 10 should be available this spring.)

jamiechi1 wrote:
Wed Apr 10, 2019 3:08 pm
I have been using notepad++ for years and had to set it to utf-8 (without BOM) to edit html and javascript properly. So I haven't seen some of the issues others see.
I believe the reason for the differences in how things are parsed versus in-game, is due to the different environments.
The web page most likely uses javascript (in html) and the game uses c++ internally. (Different parsers)
Yeah there might be an issue with C++, which handles UTF-8 poorly :
https://stackoverflow.com/questions/171 ... 5#17106065

And it looks like it doesn't even handle it the same way depending on what OS it was compiled on / is being run on ?
https://alfps.wordpress.com/2011/11/22/ ... pproaches/
https://alfps.wordpress.com/2011/12/08/ ... ream-mode/

(so, C++ might choke on a BOM ?)
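
Whatever the parser does internally, checking a file for a BOM before uploading is easy. Here is a minimal Python sketch that strips a leading UTF-8 byte order mark from raw bytes. The helper name `strip_utf8_bom` is made up for this example; `codecs.BOM_UTF8` is the standard library constant for the three BOM bytes.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Example input: a changelog line saved by an editor that prepends a BOM.
with_bom = codecs.BOM_UTF8 + b"Version: 0.0.1\n"
cleaned = strip_utf8_bom(with_bom)

# Decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
decoded = cleaned.decode("utf-8")
```

When reading the file as text instead, Python's "utf-8-sig" codec drops a leading BOM on decode, so `open(path, encoding="utf-8-sig")` followed by writing back with `encoding="utf-8"` has the same effect.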

jamiechi1 wrote:
Wed Apr 10, 2019 3:08 pm
There definitely needs to be a formal document somewhere that explicitly defines the required markup.
White space, including linefeeds should be ignored in processing the change log information, just as it is ignored for the most part in many languages such as Lua and C++.

Maybe XML should be used for the change logs. This will make it easier to define exactly where things go and what they are.

Maybe a better option is to make it simple like windows used to do in an 'ini' file. An example of a simple 'ini' file format is what they use in the fallout games ini files. Square brackets to delineate sections and simple text with no need to worry about encoding styles.

Keep it simple.
Orv wrote:XML proved handily that it's possible to make something verbose and inefficient for computers without actually making it human-readable.
(Also, in my experience, while XML in theory now supports UTF-8, hardly any XML tool / library does...)

The go-to human-readable format these days seems to be YAML :
(the current changelog syntax might already be a stricter version of it ?)

Re: Make the changelog parser more lenient, please!

Posted: Sun Apr 14, 2019 10:43 am
by Pi-C
Bilka wrote:
Thu Apr 11, 2019 7:34 am
As a sidenote, some of the errors have already been improved a bit. I changed the "no colon after category error" to instead say "category line does not end in colon", because that is what it is actually checking - a space after the colon will throw this error. The "duplicate date" error now says "duplicate date or version" and the basically empty "error on line x" now says what it expects the line to start with.
Just noticed another new error message:

Code: Select all

invalid changelog file, error on line 1, line does not start with exactly '    - ' or exactly '      '
Thanks for implementing it, that should be very helpful!

For the sake of an example, I'll document a complete debug session for a very basic changelog file. Let's see if I can get it to work by following the error messages. This is what I'll start with:

Code: Select all

v0.0.1
-ported to 0.17
It shows this error:

Code: Select all

invalid changelog file, error on line 1, line does not start with exactly '    - ' or exactly '      '..
Not quite as expected, as an error on line 1 always means that the first changelog entry doesn't start with a proper header line. But just relying on the error message, I'll correct that anyway:

Code: Select all

    - v0.0.1
-ported to 0.17
The parser now reports:

Code: Select all

invalid changelog file, error on line 1, missing category.
Let's add one:

Code: Select all

Info:
    - v0.0.1
-ported to 0.17
I still get the same error:

Code: Select all

invalid changelog file, error on line 1, line does not start with exactly '    - ' or exactly '      '
However, this line ends with a colon, so it should be regarded as an incorrect category line. The error message is misleading because category lines must be indented with only two spaces. With the current error message, we'd just end up in a vicious circle. Let's try again, this time with a correct category line:

Code: Select all

  Info:
    - v0.0.1
-ported to 0.17
Now it's better:

Code: Select all

invalid changelog file, error on line 1, missing version
I'll just move the version line up, and because it didn't work previously, I fiddle around a bit and end up with a correct Version line:

Code: Select all

Version: 0.0.1
  Info:
-ported to 0.17
Doesn't help though, we're back where we started:

Code: Select all

invalid changelog file, error on line 1, line does not start with exactly '    - ' or exactly '      '..
In my assumed role as a mod author, I give up at this point. :-)


Let's start all over again:

Code: Select all

v0.0.1
-ported to 0.17
Now, let's assume I get a message like this:

Code: Select all

invalid changelog file, error on line 1, line is not a valid header line.
So I add a header line:

Code: Select all

--------------------------------------------------------------------------------------------------
v0.0.1
-ported to 0.17
I still get an error, because this header line contains only 98 dashes. :-) So, it would make sense to make the error message more explicit:

Code: Select all

invalid changelog file, error on line 1, line is not a valid header line (must only contain exactly 99 dashes)
I finally have figured out what's wrong, so the file now is:

Code: Select all

---------------------------------------------------------------------------------------------------
v0.0.1
-ported to 0.17
This time, the error message is actually helpful again:

Code: Select all

invalid changelog file, error on line 2, missing Version: line.
So I take the clue and correct line 2:

Code: Select all

---------------------------------------------------------------------------------------------------
Version: 0.0.1
-ported to 0.17
Now I get

Code: Select all

invalid changelog file, error on line 3, line does not start with exactly '    - ' or exactly '      '..
The missing category isn't mentioned yet, but let's just ignore that for now and fix the error:

Code: Select all

---------------------------------------------------------------------------------------------------
Version: 0.0.1
    - ported to 0.17
Ah, here's the expected message:

Code: Select all

invalid changelog file, error on line 3, missing category
So we add a category:

Code: Select all

---------------------------------------------------------------------------------------------------
Version: 0.0.1
Info:
    - ported to 0.17
There's still an error:

Code: Select all

invalid changelog file, error on line 3, line does not start with exactly '    - ' or exactly '      '..
This line is supposed to be a Category line, however, because it ends with a colon. But the patterns suggested by the current error message are not correct in this case. Something like

Code: Select all

invalid changelog file, error on line 3, line does not start with exactly '  ' or exactly  '    - ' or exactly '      '.
would be better. We could then proceed with

Code: Select all

---------------------------------------------------------------------------------------------------
Version: 0.0.1
  Info:
    - ported to 0.17
and the changelog would be parsed without an error.

Summary:
An error on line 1 should always result in

Code: Select all

invalid changelog file, error on line 1, line is not a valid header line (must only contain 99 dashes)
An error message for lines after the Version line should also include the 2-space indentation for Category lines:

Code: Select all

invalid changelog file, error on line x, line does not start with exactly '  ' (Category:) or exactly '    - ' or exactly '      ' (list of entries below a category)
I've not enough time now for more testing with more entries, but the suggested changes should improve your changes to the error messages even further! :-)
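
To make the proposal concrete, here is a rough Python sketch of a checker that reports the first problem using the messages suggested above. It is not the game's parser. The rules are only the ones inferred in this thread (99-dash header, "Version: " on the next line, an optional "Date: " line, two-space category lines ending in a colon, entry lines starting with '    - ' or '      '), and treating blank lines as errors is an assumption. The name `check_changelog` is made up.

```python
DIVIDER = "-" * 99


def check_changelog(text):
    """Return the first problem found as a message string, or None if none is found."""
    expect_version = False  # True right after a divider line
    have_category = False   # True once the current version has a category line
    for number, line in enumerate(text.splitlines(), start=1):
        where = f"invalid changelog file, error on line {number}, "
        if number == 1 and line != DIVIDER:
            return where + "line is not a valid header line (must only contain exactly 99 dashes)"
        if line == DIVIDER:
            expect_version, have_category = True, False
        elif expect_version:
            if not line.startswith("Version: "):
                return where + "missing Version: line"
            expect_version = False
        elif line.startswith("Date: ") and not have_category:
            continue
        elif line.startswith("    - ") or line.startswith("      "):
            if not have_category:
                return where + "missing category"
        elif line.startswith("  ") and not line.startswith("   "):
            # Exactly two spaces of indentation: this has to be a category line.
            if not line.endswith(":"):
                return where + "category line does not end in colon"
            have_category = True
        else:
            return where + ("line does not start with exactly '  ' (Category:) "
                            "or exactly '    - ' or exactly '      '")
    return None


good = "\n".join([DIVIDER, "Version: 0.0.1", "  Info:", "    - ported to 0.17"])
print(check_changelog(good))
print(check_changelog("v0.0.1\n-ported to 0.17"))
```

Run against the steps of the second debug session, each intermediate file gets a message that points at the next fix, which is the behaviour the summary asks for.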

Edit: Had to tag the error messages as code because a series of spaces is reduced to one space otherwise. If only there was a tag that would allow inline code (prints everything between start and end tag as is, preserving multiple spaces, but doesn't add an extra box like the code tag) in a post! :-)

Re: Make the changelog parser more lenient, please!

Posted: Wed Jul 24, 2019 8:28 pm
by Trebor
I made a sed script that can be used to clean up some problems with changelogs. It was inspired by the sed snippets from the original post. Since I don't have GNU sed, this uses POSIX sed.

Code: Select all

# Clean up tabs, spaces and blank lines.
s/	/        /g
s/ +$//
/^$/d

# Clean up lines just containing dashes.
/^ +-+$/s/.*/----/
/^-+$/s/-+/---------------------------------------------------------------------------------------------------/
${/^-+$/d
  # Note detecting dashes on the last line only works if there are no blank lines after!
}

# Make sure the first line is a header, all headers are 99 dashes and the last line is not a header.
1{/^-+$/!i\
---------------------------------------------------------------------------------------------------
}

# Fix version and date lines.
/^ *[Vv][Ee][Rr][Ss][Ii][Oo][Nn][ :]/{
  s/^ *[Vv][Ee][Rr][Ss][Ii][Oo][Nn] *:? *(.+)$/Version: \1/
  b
}
/^ *[Dd][Aa][Tt][Ee][ :]/{
  /^ *[Dd][Aa][Tt][Ee] *:?$/d
  s/^ *[Dd][Aa][Tt][Ee] *:? *(.+)$/Date: \1/
  b
}

# Make sure other lines are indented correctly.
s/^ *([^-].+)/      \1/
/^-+$/!s/^ *- *(.+)$/    - \1/

# Clean up any categories.
s/^ *(( *[^ :])+) *: *-+$/  \1:/
/^ *[^:]+ *: *-+/{
  h
  s/^[^:]+: *- *(.+)/    - \1/
  x
  s/^ *(( *[^ :])+) *:.*$/  \1:/
  G
}
If we run it against this ugly change log:

Code: Select all

   versION : 0.1.0

Date:
    Date : 2019/07/22

  category          :-

othercat    : - stuff

hello
------------
Using the command: sed -Ef cleanup.sed changelog

We get this new change log:

Code: Select all

---------------------------------------------------------------------------------------------------
Version: 0.1.0
Date: 2019/07/22
  category:
  othercat:
    - stuff
      hello
Edit: Found a bug, updated sed script.

Re: Make the changelog parser more lenient, please!

Posted: Thu Jul 25, 2019 6:18 pm
by ssilk
The trend for many newer programming languages is to have a pretty printer included. You run the prettifier over your code and it formats everything as your (company, team, whatever) rules tell. Works also for text-formats. Even for JSON there are lots of online-pretty-printers. E.g. https://jsonformatter.curiousconcept.com/

The same could be implemented as a service for the changelog file on the mod portal page.

Re: Make the changelog parser more lenient, please!

Posted: Thu Jul 25, 2019 6:37 pm
by Trebor
ssilk wrote:
Thu Jul 25, 2019 6:18 pm
The trend for many newer programming languages is to have a pretty printer included. You run the prettifier over your code and it formats everything as your (company, team, whatever) rules tell. Works also for text-formats. Even for JSON there are lots of online-pretty-printers. E.g. https://jsonformatter.curiousconcept.com/
This was done not so much for pretty printing it for the mod portal, but to get the formatting correct so Factorio would accept it.