
duolicious-backend's Introduction

Duolicious Backend

There are screenshots of the app at https://github.com/duolicious.

Contributing

There are three ways you can contribute:

  1. Tell your friends about Duolicious and share on social media! This is the best way to make it grow.
  2. Donate on Ko-fi: https://ko-fi.com/duolicious
  3. Raise a pull request. Developer instructions can be found at DEVELOPER.md.

duolicious-backend's People

Contributors

duogenesis


duolicious-backend's Issues

Deal with old chats

Nobody should be able to participate in a chat where any of the following is true (a rough check is sketched after this list):

  • One person is blocked by the other
  • One person deleted their account
  • One person deactivated their account
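
A rough sketch of the kind of check this implies. The blocked table and the activated column are assumptions, not the actual schema:

-- Sketch only: `blocked` and `person.activated` are assumed, not actual schema.
-- True means the chat between :person_a and :person_b should be closed.
SELECT
  -- one person is blocked by the other
  EXISTS (
    SELECT 1
    FROM blocked
    WHERE (blocker_id = :person_a AND blocked_id = :person_b)
       OR (blocker_id = :person_b AND blocked_id = :person_a)
  )
  -- one person deactivated their account
  OR EXISTS (
    SELECT 1
    FROM person
    WHERE id IN (:person_a, :person_b) AND NOT activated
  )
  -- one person deleted their account (their person row no longer exists)
  OR (SELECT count(*) FROM person WHERE id IN (:person_a, :person_b)) < 2
  AS chat_is_closed;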

Filter explicit images automatically

During my pilot deployment, users uploaded more explicit images than I expected: about 4% of all uploads were explicit. That doesn't sound like a lot, but even one explicit image can be enough to ruin the experience for most users. Less than 0.1% seems like a good goal. There must be a tiny, free neural net I can run each time an image is uploaded.

Deactivate accounts of people who have never been online

If someone's never been online before, they'll have no entry in the duo_chat.last table. The account deactivator requires them to be online at least once to figure out when they were last online. One solution would be to insert a row into the last table when someone signs up and set it to now().

Related: #60.
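
A sketch of that fix. The column names here are assumptions, not the actual duo_chat.last schema:

-- Sketch only: column names are assumed. Run at sign-up so the account
-- deactivator always has a last-online time to compare against.
INSERT INTO last (username, seconds, state)
VALUES (:person_uuid, EXTRACT(EPOCH FROM NOW())::BIGINT, '');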

Feature: More specific bio tags about kids

Include options which communicate something to the effect of:

  • I'm open to becoming a step parent
  • I'm on the fence but maybe I can be convinced

In general, a dictionary disambiguating the meaning of each bio option would be nice.

Here are some suggestions from ChatGPT:

  1. Definitely want kids: I'm looking to start or expand a family in the future.
  2. Do not want kids: I'm certain I don't want children.
  3. Open to being a step-parent: I'd welcome the opportunity to be a part of a pre-existing family.
  4. Undecided about biological kids: I'm uncertain about having my own children, but I could be open to the idea.
  5. Open to adoption or fostering: I'd consider adopting or fostering children.
  6. Depends on partner: My decision largely depends on my future partner's wishes.
  7. It's complicated: My feelings on this are complex and are best discussed in person.

Use ejabberd

MongooseIM is using 12% CPU with 34 open connections. But ejabberd doesn't have mod_inbox. Someone on Stack Exchange said it should take about a day for someone who knows Elixir to port the module, though.

Traits' orders are only partially sorted

Traits for which the app doesn't have enough data tend to appear near the middle of the list for some users. Not sure why this is happening. I think I might've assigned those traits a numerical value of 0 instead of -1, which is what it used to be.

Photo deletion strategy

Consider having a "photo graveyard" table. In the same transaction where photos are deleted from their usual tables, add uuids to the graveyard. Delete old photos in a batch job. Make sure also to handle updated photos.
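
A sketch of that idea. The table and column names (photo, photo_graveyard, uuid, position) are assumptions, not the actual schema:

BEGIN;

-- Sketch only: move the photo's uuid into the graveyard in the same
-- transaction that deletes it from its usual table, so nothing is lost
-- between the two steps.
INSERT INTO photo_graveyard (uuid)
SELECT uuid FROM photo WHERE person_id = :person_id AND position = :position;

DELETE FROM photo WHERE person_id = :person_id AND position = :position;

COMMIT;

-- A batch job can later delete the underlying image files for every uuid in
-- photo_graveyard and remove the rows it has processed. Photo updates need
-- the same treatment: the replaced uuid goes to the graveyard too.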

Search: Bigger `LIMIT` during initial pass, smaller `LIMIT` during final pass

Change this to something like 2500, then add a limit here of something like 250 (a rough sketch of where the two limits sit follows this list). Why? Because:

  • Having a bigger limit in the initial pass makes the approximation less wrong.
  • Bigger limits make the query take more time, but inserting into search_cache is the slowest part of the query. So we can maintain a similar query speed while improving search result accuracy by making the initial pass much bigger and the final pass only slightly smaller.
  • More concretely, when there are 1000 profiles, the entire query takes about 120 ms. Just the selection, without inserting into search_cache, takes about 30 ms, so insertion takes about 90 ms. If those two parts of the query scale linearly with the sizes of the limits, the limits suggested above would make the query take about 2500 * (30 / 1000) + 250 * (90 / 1000) = 97.5 ms.
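
The sketch below only shows the shape being described. Apart from search_cache, the table, column, and score names are hypothetical stand-ins for whatever the real query uses:

-- Sketch only: `candidate`, `approximate_score` and `final_score` are
-- hypothetical stand-ins for the real search query's internals.
WITH initial_pass AS (
  SELECT prospect_person_id, final_score
  FROM candidate
  ORDER BY approximate_score DESC
  LIMIT 2500  -- bigger limit: selection is cheap and the approximation gets less wrong
)
INSERT INTO search_cache (searcher_person_id, prospect_person_id)
SELECT :searcher_person_id, prospect_person_id
FROM initial_pass
ORDER BY final_score DESC
LIMIT 250;    -- smaller limit: inserting into search_cache is the slow part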

Do something about the reply rate

The reply rate sucks. Consider matching people who have a high reply rate. It'll compound the issue for people who never talk, but they never talk anyway.

Ideally, the app would get the tight-lipped folks speaking too.
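
One way to estimate a reply rate from data the backend already has is sketched below, using the mam_message table that appears later on this page. Treat it as an approximation rather than the app's actual metric:

-- Sketch only: approximates a user's reply rate as the fraction of
-- conversations containing an incoming message that also contain at least
-- one outgoing message.
SELECT
  user_id,
  AVG(CASE WHEN has_outgoing THEN 1.0 ELSE 0.0 END) AS reply_rate
FROM (
  SELECT
    user_id,
    remote_bare_jid,
    BOOL_OR(direction = 'O') AS has_outgoing
  FROM mam_message
  GROUP BY user_id, remote_bare_jid
  HAVING BOOL_OR(direction = 'I')
) AS conversation
GROUP BY user_id;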

`begin; drop table mam_message_backup; commit;`

I might've fixed the root cause of #82. Either way, I cleaned up the DB a little bit with this:

begin;

CREATE TABLE mam_message_backup(
  -- Message UID (64 bits)
  -- A server-assigned UID that MUST be unique within the archive.
  id BIGINT NOT NULL,
  user_id INT NOT NULL,
  -- FromJID used to form a message without looking into stanza.
  -- This value will be sent to the client "as is".
  from_jid varchar(250) NOT NULL,
  -- The remote JID that the stanza is to (for an outgoing message) or from (for an incoming message).
  -- This field is for sorting and filtering.
  remote_bare_jid varchar(250) NOT NULL,
  remote_resource varchar(250) NOT NULL,
  -- I - incoming, remote_jid is a value from From.
  -- O - outgoing, remote_jid is a value from To.
  -- Has no meaning for MUC-rooms.
  direction mam_direction NOT NULL,
  -- Term-encoded message packet
  message bytea NOT NULL,
  search_body text,
  origin_id varchar,
  PRIMARY KEY(user_id, id)
);

WITH t1 AS (
  -- Number each group of duplicates: rows with the same sender, recipient,
  -- direction and body get consecutive row numbers, ordered by id.
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY
        user_id, from_jid, remote_bare_jid, remote_resource, direction, search_body
      ORDER BY
        user_id, from_jid, remote_bare_jid, remote_resource, direction, search_body, id
    ) AS rn
  FROM mam_message
  where search_body <> ''
), t2 AS (
  -- Keep only the duplicates (rn > 1) among incoming messages, preserving
  -- the earliest copy of each.
  SELECT
      *
  FROM t1
  where rn > 1 and direction = 'I'
  order by search_body, rn
), t3 AS (
  -- Delete the duplicates from mam_message...
  delete from mam_message where id in (select id from t2) returning *
)
-- ...and stash them in the backup table in case this turns out to be wrong.
insert into mam_message_backup
select * from t3;

commit;

Now the situation's like this:

duo_chat=# select count(*), direction
from mam_message
group by from_jid, remote_bare_jid, direction, search_body
having count(*) > 1
order by direction, count desc;
 count | direction 
-------+-----------
     5 | I
     5 | I
     4 | I
     4 | I
     3 | I
     3 | I
     3 | I
     3 | I
     3 | I
     3 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     2 | I
     5 | O
     4 | O
     4 | O
     2 | O
     2 | O
     2 | O
     2 | O
     2 | O
     2 | O
     2 | O
     2 | O
     2 | O
     2 | O
(46 rows)

That query once returned 593 rows.

Anywho, I'm gonna let that cook for a little bit and keep monitoring the situation. At some point I should hopefully be able to give it one of these:

begin;
drop table mam_message_backup;
commit;

Start shuffling questions after 100, not 250

The question bank has been ordered so that questions of better quality come first. Consequently, users would have the best experience if they answered them in order, given the current matching algorithm.

But I'd like to improve the algorithm in the future. I plan to do that with an autoencoder, but I need training data for that. If only a very small minority of users ever answer the later questions, I wouldn't have enough training data to predict answers to those later questions, given users' initial answers. So I'm currently randomising the order of the questions, but only after users have answered 250 questions. Having the order of the initial questions be fixed and the later questions be random strikes a balance between (1) optimising the performance of the current algorithm, and (2) optimising the size of the training set I can use to build a better model.
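
As a rough sketch of what that ordering might look like in SQL (person.count_answers is real, per the query further down, but the question and answer tables and their columns are assumed):

-- Sketch only: serve unanswered questions in their curated order until the
-- user has answered :threshold of them, then shuffle the remainder.
SELECT q.id, q.question
FROM question AS q
LEFT JOIN answer AS a
  ON a.question_id = q.id AND a.person_id = :person_id
WHERE a.question_id IS NULL
ORDER BY
  CASE
    WHEN (SELECT count_answers FROM person WHERE id = :person_id) < :threshold
    THEN q.position  -- below the threshold: fixed, curated order
  END,
  RANDOM();          -- at or past the threshold: random order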

The "250" threshold was chosen by feel, after I tested the personality assessment on myself, to check how few answers I needed to give before the system accurately determined my personality. But while 250 questions was better, 100 worked well too. I've also gotten more data since my initial self-test. I had 50 other users sign up by posting the app on a social media website. Based on their usage, a threshold of 100 would increase the percentage of users who get past the initial, fixed portion of the question bank from about 3% to about 35%:

postgres=# select row_number() over (order by count_answers desc), percent_rank() over (order by count_answers desc), count_answers from person;
 row_number | percent_rank | count_answers 
------------+--------------+---------------
          1 |            0 |          1200
          2 |         0.02 |           564 <- Only 2 out of 51 people answered 250 questions or more
          3 |         0.04 |           242    (and I suspect the person who answered 1200 answered randomly)
          4 |         0.06 |           241
          5 |         0.08 |           226
          6 |          0.1 |           217
          7 |         0.12 |           203
          8 |         0.14 |           186
          9 |         0.16 |           183
         10 |         0.18 |           158
         11 |          0.2 |           148
         12 |         0.22 |           132
         13 |         0.24 |           115
         14 |         0.26 |           114
         15 |         0.28 |           113
         16 |          0.3 |           111
         17 |         0.32 |           110
         18 |         0.34 |           105  <- 18 out of 51 people answered at least 100 questions
         19 |         0.36 |            90
         20 |         0.38 |            87
         21 |          0.4 |            84
         22 |         0.42 |            83
         23 |         0.44 |            72
         24 |         0.46 |            62
         25 |         0.48 |            53
         26 |          0.5 |            50
         27 |         0.52 |            45
         28 |         0.54 |            42
         29 |         0.56 |            41
         30 |         0.56 |            41
         31 |          0.6 |            40
         32 |         0.62 |            35
         33 |         0.64 |            30
         34 |         0.66 |            25
         35 |         0.68 |            24
         36 |         0.68 |            24
         37 |         0.72 |            21
         38 |         0.74 |            16
         39 |         0.76 |            14
         40 |         0.76 |            14
         41 |          0.8 |            12
         42 |         0.82 |             9
         43 |         0.84 |             8
         44 |         0.86 |             5
         45 |         0.88 |             3
         46 |         0.88 |             3
         47 |         0.92 |             1
         48 |         0.94 |             0
         49 |         0.94 |             0
         50 |         0.94 |             0
         51 |         0.94 |             0

Some notifications are doubled

Immediately upon deploying #55, I noticed that 59 emails had been sent to 58 distinct recipients. That means one recipient received two notification emails. That person had at least one unread chat and intro. The two notifications were sent at 05-09-2023 04:10:45 and 05-09-2023 04:08:04.

Edit: I've been looking at the notifications sent. I've noticed that at least five other people had unread chats and intros, yet they only got one notification email each.

Implement moveToChats using the XMPP proxy

This logic should be in the XMPP proxy. That should make it more reliable.

The motivation for this ticket is that some conversations remain in the "intros" box of the person who sent them. They should have been in the "chats" box.

Traits approach 50% as people answer more questions

Someone who's answered a few questions tends to have quite extreme values for traits (near 0% or 100%). As they answer more questions, the value tends towards 50%. This shouldn't affect the order of matches too much, but it makes the in-depth screen less useful. Consider re-introducing the information > 0.2 threshold as a fix.

Related: #148
